Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1983647.1
Update Date: 2018-02-02

Solution Type  Problem Resolution Sure

Solution  1983647.1 :   Oracle Key Manager (OKM) - Seeing Certificate Errors That May be Causing KMAs to Drop Connections With Other KMAs  


Related Items
  • Sun StorageTek Crypto Key Management System
Related Categories
  • PLA-Support>Sun Systems>TAPE>Backup Software-Filesystems>SN-TP: Encryption




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-10313723601>

Applies to:

Sun StorageTek Crypto Key Management System - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

We are having intermittent OKM GUI login errors on our KMAs.
After we reset or reboot the KMAs, we are able to log in to the GUI; however, a few hours later it will not let us log in again.

We are seeing many "Peer Certificate serial number does not match" and "Peer Certificate is invalid" errors on one of our KMAs.
This KMA was recently removed from the cluster, had parts replaced, and then joined back to the cluster from a clean state.

We have tried resetting each KMA in the cluster multiple times; however, we repeatedly lose the ability to log in
to the GUI (for any KMA), and some KMAs soon drop their connections with the other KMAs. Both problems clear up soon after
each KMA is reset, but both return shortly afterward.


Certificate errors in the /var/adm/messages file:
-----------------------------------------------------------

Feb 18 21:43:30 a-kma01 OKM: [ID 249117 local7.error] Certificate Verification Peer Certificate is invalid u-kma02 22.222.222.134 Certificate Serial Number = 866B070F94660D13000000000000001E If the Entity is a KMA, then verify that an Entity for this KMA has already been created in this Cluster. If so, then retry the Join Cluster request with valid information. If not, then check for a possible break in attempt. If the Entity is a User or an Agent, then verify that this Entity is using valid credentials.
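
To see how often each peer triggers these errors, the messages file can be tallied with a short script. The following is a minimal Python sketch, not an Oracle-supplied tool. It assumes each entry occupies a single line in the layout shown above (error text, then peer host name, then peer IP address); the function name count_cert_errors is hypothetical.

```python
import re
from collections import Counter

# Error texts quoted in this document. In the messages-file layout shown
# above, the peer host name and IP address follow the error text directly.
CERT_ERROR = re.compile(
    r"(Peer Certificate is invalid|Peer Certificate serial number does not match)"
    r"\s+(\S+)\s+(\d{1,3}(?:\.\d{1,3}){3})"
)

def count_cert_errors(lines):
    """Return a Counter keyed by (error text, peer host, peer IP)."""
    counts = Counter()
    for line in lines:
        match = CERT_ERROR.search(line)
        if match:
            counts[match.groups()] += 1
    return counts
```

Passing an open file handle for /var/adm/messages as the lines argument yields one count per error and peer combination. Lines that do not follow the assumed layout are skipped rather than guessed at.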


Certificate errors in the KMA Audit log:
-----------------------------------------------------------

8FFC72BE685BE88D0000000000032BA9 8FFC72BE685BE88D a-kma01 Cluster Client Communication Medium Term Retention Request Peer Replication Service SOAP Error Warning 000088000025 2015-02-21 06:45:59.50596+00 a-kma02 11.111.111.25 Peer KMA ID = E2207FDBA432E901, Anti-Entropy Push = FALSE, Function Name = PushUpdates, SOAP Fault Code = SOAP-ENV:Client, SOAP Fault String = SSL_ERROR_SSLerror:14094416:SSL routines:SSL3_READ_BYTES:sslv3 alert certificate unknown, SOAP Fault Detail = SSL_connect error in tcp_connect(), Error Code = 30 If the peer KMA is currently applying initial updates after joining the Cluster, then wait until it completes applying these updates. Otherwise, check the management network connection on the KMA reporting the issue as well as the peer KMA. If this condition persists, then one of these KMAs might need to be rebooted.

E2207FDBA432E901000000000211523E E2207FDBA432E901 a-kma02 Security Violation Medium Term Retention Certificate Verification Peer Certificate serial number does not match Error 000082000314 2015-02-21 06:45:59.501902+00 a-kma01 11.111.111.24 Certificate Serial Number = E2207FDBA432E9010000000000000084, Enrolled = TRUE If the Entity is a KMA and it was not recently added or added back to this Cluster, log it back into the Cluster. If the Entity is a User, delete it and create it again. If the Entity is an Agent, re-enroll



Cluster client communication "Connection refused" errors between KMAs:
-----------------------------------------------------------

DD1446CFE3CFD76F0000000002168272 DD1446CFE3CFD76F u-kma01 Cluster Client Communication Medium Term Retention Request Peer Replication Service SOAP Error Warning 000088000025 2015-02-22 04:12:25.277922+00 a-kma02 11.111.111.25 Peer KMA ID = E2207FDBA432E901, Anti-Entropy Push = FALSE, Function Name = PushUpdates, SOAP Fault Code = SOAP-ENV:Client, SOAP Fault String = Connection refused, SOAP Fault Detail = connect failed in tcp_connect(), Error Code = 28 If the peer KMA is currently applying initial updates after joining the Cluster, then wait until it completes applying these updates. Otherwise, check the management network connection on the KMA reporting the issue as well as the peer KMA. If this condition persists, then one of these KMAs might need to be rebooted.

E2207FDBA432E90100000000021168D5 E2207FDBA432E901 a-kma02 Cluster Client Communication Medium Term Retention Request Peer Replication Service SOAP Error Warning 000088000025 2015-02-21 13:51:54.585806+00 u-kma02 22.222.222.134 Peer KMA ID = C24B4174506FA63A, Anti-Entropy Push = FALSE, Function Name = PushUpdates, SOAP Fault Code = SOAP-ENV:Client, SOAP Fault String = Connection refused, SOAP Fault Detail = connect failed in tcp_connect(), Error Code = 28 If the peer KMA is currently applying initial updates after joining the Cluster, then wait until it completes applying these updates. Otherwise, check the management network connection on the KMA reporting the issue as well as the peer KMA. If this condition persists, then one of these KMAs might need to be rebooted.
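
When many such audit entries have to be reviewed, it can help to pull out only the peer KMA ID, the SOAP fault string, and the error code. The following is a minimal Python sketch under the assumption that each audit entry has been exported as a single line of text with the field labels shown above; the function name summarize_soap_error is hypothetical and is not part of OKM.

```python
import re

# Field labels exactly as they appear in the audit log entries quoted above.
PEER_ID = re.compile(r"Peer KMA ID = ([0-9A-F]{16})")
FAULT = re.compile(r"SOAP Fault String = (.*?), SOAP Fault Detail = ")
ERROR_CODE = re.compile(r"Error Code = (\d+)")

def summarize_soap_error(entry):
    """Extract peer KMA ID, SOAP fault string and error code from one entry.

    Returns None when the entry contains no SOAP fault string.
    """
    fault = FAULT.search(entry)
    if fault is None:
        return None
    peer = PEER_ID.search(entry)
    code = ERROR_CODE.search(entry)
    return {
        "peer_kma_id": peer.group(1) if peer else None,
        "fault": fault.group(1),
        "error_code": int(code.group(1)) if code else None,
    }
```

Grouping the results by fault string separates the SSL certificate failures (Error Code = 30 in the entry above) from the "Connection refused" failures (Error Code = 28).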

 

Changes

One KMA had multiple parts replaced, including the disk and system board. Replacing a KMA's hard drive and system board requires a QuickStart
and regeneration of the security certificate when the KMA entity is recreated.

Cause

a-kma01 had multiple parts replaced, including the disk, system board and CPU.
When this KMA was QuickStarted and rejoined the cluster, the KMA ID may not have been recreated correctly.

a-kma02's KMA passphrase was changed as well, but the change may not have been replicated to the other KMAs
because of the ongoing communication issues among the KMAs.

The "connection refused" errors suggest that the KMA is not accepting new connections because the listening socket's queue is already full.
This is most likely a side effect of the ongoing communication issues in the management network among the KMAs.
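
To tell a refused connection apart from an unreachable or slow peer, a single TCP connection attempt can be made from a host on the management network. The following is a minimal Python sketch using only the standard socket module. The host and port are supplied by the caller; this document does not state which ports a KMA listens on, so none is assumed here, and the function name probe is hypothetical.

```python
import socket

def probe(host, port, timeout=5.0):
    """Attempt one TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer actively rejected the connection attempt.
        return "refused"
    except socket.timeout:
        # No answer within the timeout: filtered, unreachable, or overloaded.
        return "timeout"
    except OSError as exc:
        # Anything else, such as name resolution failure or no route to host.
        return f"error: {exc}"
```

A "refused" result only confirms that the attempt was rejected; it does not by itself prove that the listening queue is full, so it should be read together with the audit log entries.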
 

Solution

1. Reboot the KMAs that are reporting "connection refused" errors, one at a time. Make sure that the first KMA is fully functional and providing
    keys before the next KMA is rebooted.

    Refer to <Document 1019656.1>  How to Correctly Shutdown and Reboot a KMA


2. Perform a factory reset of each KMA that is reporting certificate errors:

    a. Delete the KMA entry in the OKM GUI.

    b. Factory reset the KMA using the OKM Console.
        (There is no need to zeroize the KMA.)

    c. Add back the KMA entry in the OKM GUI.

    d. Run QuickStart on this KMA.

    e. Join the KMA back to the cluster.
        Refer to <Document 1950058.1> How to join a KMA to an existing cluster

    Note: Make sure each KMA is fully functional before proceeding with resetting the next KMA.
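
The one-at-a-time ordering in steps 1 and 2 can be expressed as a small control loop. The following is a minimal Python sketch of that sequencing only. The reset and is_functional callables are hypothetical placeholders supplied by the operator (for example, a prompt that waits for a manual reboot or factory reset and a manual confirmation that the KMA is providing keys); no OKM command or API is assumed.

```python
import time

def rolling_reset(kmas, reset, is_functional, poll_seconds=60, max_polls=30):
    """Reset KMAs one at a time, waiting for each to recover before the next.

    reset(kma) and is_functional(kma) are caller-supplied callables
    (hypothetical here); is_functional should return True once the KMA
    is fully functional, for example once it is providing keys again.
    """
    done = []
    for kma in kmas:
        reset(kma)
        for _ in range(max_polls):
            if is_functional(kma):
                break
            time.sleep(poll_seconds)
        else:
            # Stop rather than touch another KMA while one is still down.
            raise RuntimeError(f"{kma} did not recover; completed: {done}")
        done.append(kma)
    return done
```

Stopping at the first KMA that fails to recover keeps the remaining cluster members untouched, which matches the note above about confirming each KMA before moving on.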
 


  Copyright © 2018 Oracle, Inc.  All rights reserved.