Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2190877.1
Update Date:2016-10-11

Solution Type  Problem Resolution Sure

Solution  2190877.1 :   Oracle Key Manager (OKM) - Key Not Found During Restore Because Replication Operation is Running Very Slowly  


Related Items
  • Sun StorageTek Crypto Key Management System
Related Categories
  • PLA-Support>Sun Systems>TAPE>Backup Software-Filesystems>SN-TP: Encryption




In this Document
Symptoms
Changes
Cause
Solution


Created from <SR 3-13432404971>

Applies to:

Sun StorageTek Crypto Key Management System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

A KMA is unable to find an encryption key for a file restore job. This KMA's hard drive has recently been replaced.
The KMA successfully joined the cluster but the key replication process is running very slowly.  The replication lag size remains large.

The audit log contains a large number of SOAP errors with SOAP Fault String = Connection refused. For example:

-
CD082FDB31A2BB47000000000020CD39 CD082FDB31A2BB47 aueis30kma102 Cluster Client Communication Medium Term Retention Request Peer Replication Service SOAP Error Warning 000088000025 2016-10-04 14:03:02.650147+00 sleis01kma01 10.202.51.7 Peer KMA ID = B4CE55A8E66D4EBE, Anti-Entropy Push = FALSE, Function Name = PushUpdates, SOAP Fault Code = SOAP-ENV:Client, SOAP Fault String = Connection refused, SOAP Fault Detail = connect failed in tcp_connect(), Error Code = 28 If the peer KMA is currently applying initial updates after joining the Cluster, then wait until it completes applying these updates. Otherwise, check the management network connection on the KMA reporting the issue as well as the peer KMA. If this condition persists, then one of these KMAs might need to be rebooted.
-


Rebooting the KMAs restored the management port status to "responding". However, the SOAP errors returned after a few hours.
After a while, the replication operation that had been running slowly on the affected KMA stopped altogether.


The audit log is filled with audit events that look like this:

---
8F6CB46BE273173C00000000041446BA 8F6CB46BE273173C uceiskma101 Backup Management Operations Medium Term Retention Get Backup In Progress The Job ID could not be found Error 000117000260 2016-10-05 14:38:07.50707+00 backup 152.69.7.132 Job ID = -7641045483588797578 Retry this operation using different input
---

 

Changes

The KMA's HDD was replaced.

Cause

A hung OKM backup process was generating an excessive number of audit log events, which adversely affected the key replication process.
Shortly after the hung OKM backup process was killed, the replication lag size started to decrease at a much faster rate.

Solution

1. Monitor the replication lag size from the KMA list.
  If the lag size is decreasing very slowly (or not at all), search the audit event log for these errors:
  - "SOAP Fault String = Connection refused"
  - "Backup Management Operations" and "Get Backup In Progress" and "The Job ID could not be found"

2. If the SOAP errors and Backup Management Operations errors are filling up the audit event log,
  check whether a hung OKM backup operation is running in the cluster. If one is found, terminate the hung backup process.
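
The audit log search in steps 1 and 2 can be sketched as a short script. This is an illustrative sketch only, not an OKM tool. It assumes the audit events have already been exported to a plain-text file with one event per line (the export mechanism is not covered here), and the function names and the file path passed on the command line are hypothetical. The search strings are copied from the audit events quoted in this document.

```python
import sys
from collections import Counter

# Substrings copied from the audit events quoted in this document.
SOAP_REFUSED = "SOAP Fault String = Connection refused"
BACKUP_MARKERS = (
    "Backup Management Operations",
    "Get Backup In Progress",
    "The Job ID could not be found",
)


def classify_event(line):
    """Return a label for the two error patterns of interest, else None."""
    if SOAP_REFUSED in line:
        return "soap_refused"
    # The backup error is identified only when all three markers are present.
    if all(marker in line for marker in BACKUP_MARKERS):
        return "backup_job_not_found"
    return None


def summarize(lines):
    """Count non-empty events and how many match each error pattern."""
    counts = Counter()
    total = 0
    for line in lines:
        if not line.strip():
            continue
        total += 1
        label = classify_event(line)
        if label is not None:
            counts[label] += 1
    return total, counts


# Assumption: the first argument is the path of an exported audit log
# (hypothetical file, one event per line).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        total, counts = summarize(handle)
    print("events scanned:", total)
    for label in ("soap_refused", "backup_job_not_found"):
        print(label + ":", counts[label])
```

If both counts make up a large share of the scanned events while the replication lag size is not decreasing, that matches the condition described in step 2.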
 


  Copyright © 2018 Oracle, Inc.  All rights reserved.