Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-2127559.1
Update Date: 2017-12-13
Keywords:

Solution Type  Sun Alert Sure

Solution  2127559.1 :   Oracle Key Manager (OKM) - Technical Service Bulletin - Joining Any Pre OKM 3.1 KMA's To An Existing Pre OKM 3.1 KMA Cluster May Cause An Interruption In Service  


Related Items
  • Oracle Key Manager
Related Categories
  • PLA-Support>Sun Systems>TAPE>Backup Software-Filesystems>SN-TP: Encryption




In this Document
Description
Occurrence
Symptoms
Workaround
Patches
History


Applies to:

Oracle Key Manager - Version 2.0.0 to 3.1 [Release 2.0 to 3.0]
Information in this document applies to any platform.

Description

Regardless of the KMA platform architecture (x86 or SPARC), when a pre-3.1 KMA joins an existing cluster in which some or all KMAs run a pre-3.1 version, the joining KMA may fail to join the cluster properly, or replication errors may occur. The problem may also be triggered by performing a RESET on any member of the cluster and then re-joining that KMA to the cluster.

Replication errors can prevent KMAs in the OKM cluster from achieving synchronization. Lack of synchronization can lead to encryption/decryption keys being unavailable to OKM end points, e.g. tape devices, encrypted file systems and databases. Over time the cluster drifts further and further out of sync.

The root cause of the issue is that pre-3.1 OKM code runs in 32-bit mode, which is subject to integer overflow. This was resolved in OKM 3.1 and later releases.
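
The effect of a 32-bit signed integer limit can be illustrated with a minimal sketch. This is not OKM code, and this bulletin does not state which internal counter is affected. The helper name to_int32 and the input values are assumptions chosen for illustration. The last input is picked only so that the wrapped result resembles the kind of negative Extended Replication Version shown under Symptoms.

```python
def to_int32(value: int) -> int:
    """Interpret an integer as a 32-bit signed two's-complement value.

    Hypothetical helper for illustration; it mimics what happens when a
    value no longer fits in a 32-bit signed field.
    """
    value &= 0xFFFFFFFF        # keep only the low 32 bits
    if value >= 0x80000000:    # high bit set means negative in two's complement
        value -= 0x100000000
    return value


INT32_MAX = 2**31 - 1  # largest value a 32-bit signed integer can hold

print(to_int32(INT32_MAX))      # 2147483647
print(to_int32(INT32_MAX + 1))  # -2147483648
print(to_int32(2282135822))     # -2012831474
```

A value just past the 32-bit signed maximum wraps to a large negative number, which is consistent with a version figure suddenly appearing as a "large minus number" in the audit events.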

Occurrence

The following configurations and situations are at risk:

* OKM clusters with a mixture of SPARC (Netra T4-1) and x86 (SunFire 2x00 or 4170M2) KMAs.
* OKM clusters consisting of a single architecture, for example X4170 M2 KMAs that are all being upgraded to a 3.0.x version prior to the 3.1 release.
* Clusters in which a KMA needs to be removed and re-added for any reason, including replacement of disks, motherboards or an entire KMA.
* Clusters in which a KMA needs to be RESET to factory default state, e.g. for testing a Disaster Recovery scenario or per Oracle service recommendation.
* Clusters that need to be grown through addition of a new KMA.

Symptoms

Customers can check if they have already encountered the problem using the following procedure:

1. Using the OKM Management GUI, navigate to the “KMA List” screen.
2. Check the “Replication Lag Size” column. High values indicate a problem, especially if they are persistent or growing while the network is healthy.
3. Using the OKM Management GUI, navigate to the “Audit Event List” and search for events indicating this problem:
    a. Set a filter for “Condition equals SOAP Error” or “Operation equals Accept Replication Service Connection”.
    b. Specify “All Retentions” in the filter pull-down so that events with short term retention are included.
    c. Click “Use” to apply the filter.
    d. Look for SOAP errors with a message value similar to “Validation constraint violation; data type mismatch xsd:int in element ‘ReplicationSchemaVersion’”.

4. The following warnings may also be reported in the Audit Event Log:

Database Command Cannot load a database record for Table Name = Key & Table Name = DataUnit

Also, the following errors may be reported:

Database Command Invalid result set size

5. The Audit Event Log may also report an Extended Replication Version that is a very large positive number or a large negative number. For example:

Extended Replication Version = 1994429966, Replication Version = 14

Extended Replication Version = -2012831474, Replication Version = 14
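
The checks in steps 3 to 5 can be sketched as a small script. This assumes the audit events have been copied or saved into plain text, one event per line, which is an assumption about the site's workflow rather than a documented OKM feature. The function name flag_suspicious_lines and the SUSPICIOUS_MAGNITUDE cutoff are hypothetical and are not Oracle-defined values. Only the message fragments are taken from this bulletin.

```python
import re

# Matches the "Extended Replication Version = N" fragment quoted in this bulletin.
EXT_VERSION_RE = re.compile(r"Extended Replication Version = (-?\d+)")

# Message fragments quoted in this bulletin as symptoms.
SYMPTOM_FRAGMENTS = (
    "data type mismatch xsd:int in element",
    "Cannot load a database record",
    "Invalid result set size",
)

# Hypothetical cutoff for "very large"; not an Oracle-documented threshold.
SUSPICIOUS_MAGNITUDE = 1_000_000_000


def flag_suspicious_lines(lines):
    """Return (line_number, reason) pairs for lines that match a symptom."""
    findings = []
    for number, line in enumerate(lines, start=1):
        # Check for any of the quoted warning or error fragments.
        for fragment in SYMPTOM_FRAGMENTS:
            if fragment in line:
                findings.append((number, f"message contains '{fragment}'"))
        # Check for a negative or very large extended replication version.
        match = EXT_VERSION_RE.search(line)
        if match:
            value = int(match.group(1))
            if value < 0:
                findings.append((number, f"negative extended replication version {value}"))
            elif value >= SUSPICIOUS_MAGNITUDE:
                findings.append((number, f"very large extended replication version {value}"))
    return findings


sample = [
    "Extended Replication Version = 1994429966, Replication Version = 14",
    "Extended Replication Version = -2012831474, Replication Version = 14",
    "Database Command Invalid result set size",
]
for number, reason in flag_suspicious_lines(sample):
    print(number, reason)
```

A match from such a script is only a hint that the audit events deserve a closer look. Confirmation and remediation still go through Oracle Support as described in this bulletin.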

 

Workaround

Upgrade any joining pre-OKM 3.1 KMA to at least version 3.1 before attempting to join it to the existing KMA cluster.

NOTE: This issue may also be encountered when replacing the hard drive in an existing KMA that belongs to a pre-OKM 3.1 cluster. For an example, see the following:

OKM 3.0 - Replacing Hard Drive In NS T4-1 KMA Which Is In A Mixed Cluster Including 2.x KMA's May Fail To Join The Existing Cluster (Doc ID 2181897.1)

NOTE: If a customer site has already encountered the issue, OKM Engineering has made a fix_schemaver script available that can be used to mitigate the replication lag size issue. An SR should be raised with OKM L2 Support to resolve the issue.

Some customer sites may also insist on continuing to run a pre-OKM 3.1 environment, in which a joining KMA may encounter this issue. In that case, the site will likewise need to initiate an SR with OKM L2 Support to resolve the replication lag issue via the fix_schemaver script.

Patches

If the problem has already occurred, upgrade all the joining KMAs to OKM 3.1. The KMAs will then resume replicating data correctly between cluster members.

It is recommended to initiate a Service Request with Oracle Support if this issue is encountered.

History

Created Alert - 4/15/16

Updated Alert - 12/12/17


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.