Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2128317.1
Update Date:2016-08-19
Keywords:

Solution Type  Problem Resolution Sure

Solution  2128317.1 :   Primary and Secondary CMP (DR) displays Active-Active Condition (Split Brain)  


Related Items
  • Oracle Communications Policy Management
Related Categories
  • PLA-Support>Sun Systems>CommsGBU>Broadband Network Solutions>SN-SND: Tekelec Policy




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-12476581381>

Applies to:

Oracle Communications Policy Management - Version POLICY 12.0.0 and later
Tekelec

Symptoms

The Primary and Secondary (DR) CMP clusters both display an Active-Active condition, and the following message is seen:
"Detected Remote HA Change to Active [10150/MergeSender.cxx:1256]"

Product:
Oracle Communications Policy Management

Version:
12.0.0.4.0_2.1.0

Business Impact:
The MPE and MRA servers in the network cannot be managed because of the Active-Active condition on the Manager (CMP).

Cause

There is no actual split brain. The false detection is caused by a bug that is exposed when a failover occurs between DR-CMP nodes, and it is more likely to be seen during an upgrade. The bug is in the COMCOL inetmerge process, which is responsible for collecting alarms from the DR-CMP and updating them on the Active CMP; this process was in a stuck condition. In this state, the DR-CMP inetmerge paused sending alarm updates to the Primary CMP.

 

Due to this bug, inetmerge connections between CMP and DR-CMP clusters can exhibit several problems.

1. The audit that synchronizes the Log tables between CMPs fails to complete, so the Log tables on the CMPs do not remain synchronized.

2. Problems are exposed when a failover between CMP nodes occurs. The CMP failover triggers defective logic that falsely detects a state mismatch on the CMP servers. This logic then attempts to correct the mismatch, but instead causes inetmerge to behave as if a split brain were taking place. This is indicated by the following alarm:

(31107):-* 08/25/2015 04:50:52.191 230 inetmerge DB Merge From Child Failure 01cmp01a
GN_ACTACT: Active-Active conflict detected with peer
[17044:MergeReceiver.cxx:1783]

Once this alarm is present, inetmerge pauses collection of stateful tables (including the list of Active Alarms) to the Active CMP at the Primary site until the perceived split brain is resolved. Because there is no actual split brain, all stateful tables on the Active CMP remain frozen with respect to the CMP server. In other words, any alarms that happened to be present in the Alarm table on the CMP remain stuck until this false split brain is cleared by manual intervention.
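When reviewing a saved alarm history for this condition, the GN_ACTACT alarm shown above can be picked out with a short script. The following Python sketch is an illustration only: it assumes the alarm output has been saved as plain text lines in exactly the layout shown above, and the names ALARM_HEADER and find_actact_alarms are hypothetical, not part of any Policy Management tool. The "-*" marker and the numeric field after the timestamp are deliberately left uninterpreted.

```python
import re
from datetime import datetime

# Hypothetical pattern for an alarm header line in the layout shown above, e.g.
# "(31107):-* 08/25/2015 04:50:52.191 230 inetmerge DB Merge From Child Failure 01cmp01a"
# Only the alarm number, the timestamp, the alarm text and the trailing host are captured.
ALARM_HEADER = re.compile(
    r"^\((?P<alarm_id>\d+)\):\S*\s+"
    r"(?P<stamp>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"\d+\s+inetmerge\s+(?P<text>.*?)\s+(?P<host>\S+)$"
)


def find_actact_alarms(lines):
    """Return (alarm_id, timestamp, host) for each inetmerge alarm header
    whose following lines mention the GN_ACTACT conflict code."""
    hits = []
    for i, line in enumerate(lines):
        m = ALARM_HEADER.match(line.strip())
        if not m:
            continue
        # The GN_ACTACT detail text is expected on the lines right after the header.
        body = " ".join(lines[i + 1 : i + 3])
        if "GN_ACTACT" in body:
            stamp = datetime.strptime(m.group("stamp"), "%m/%d/%Y %H:%M:%S.%f")
            hits.append((int(m.group("alarm_id")), stamp, m.group("host")))
    return hits


# Usage with the alarm text quoted in this note.
SAMPLE = [
    "(31107):-* 08/25/2015 04:50:52.191 230 inetmerge DB Merge From Child Failure 01cmp01a",
    "GN_ACTACT: Active-Active conflict detected with peer",
    "[17044:MergeReceiver.cxx:1783]",
]
result = find_actact_alarms(SAMPLE)
print(result)
```

Matching on the GN_ACTACT text rather than on alarm number 31107 alone keeps other "DB Merge From Child Failure" causes from being counted as this issue.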

 
Example:

0406:195950.962 TR-V SenderLink[05cmp01a]: Detected Remote HA Change to Active [10150/MergeSender.cxx:1256]
0406:195953.610 TR-V ===[STATE PendingStandby]=== anyPeerAvail=1,peerActive=1, parentAckList=1 [10150/IdbMerge.cxx:964]
0406:195956.012 TR-V SenderLink[05cmp01a]: Detected Remote HA Change to Active [10150/MergeSender.cxx:1256]
0406:195958.628 TR-V ===[STATE PendingStandby]=== anyPeerAvail=1,peerActive=1, parentAckList=1 [10150/IdbMerge.cxx:964]
0406:200001.012 TR-V SenderLink[05cmp01a]: Detected Remote HA Change to Active [10150/MergeSender.cxx:1256]
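In the example above, the same peer reports "Detected Remote HA Change to Active" every few seconds while inetmerge cycles back to PendingStandby. A sketch like the one below can measure how often that happens in a saved trace. It assumes the trace lines are available as plain text, that the leading stamp is MMDD:HHMMSS.mmm (the note does not define this format), and that all events fall on the same day; repeated_ha_changes and its min_events threshold are hypothetical.

```python
import re
from collections import defaultdict

# Matches trace lines such as:
# "0406:195950.962 TR-V SenderLink[05cmp01a]: Detected Remote HA Change to Active [...]"
# ASSUMPTION: the leading stamp is MMDD:HHMMSS.mmm; day rollover is not handled.
HA_CHANGE = re.compile(
    r"^(?P<mmdd>\d{4}):(?P<hh>\d{2})(?P<mi>\d{2})(?P<ss>\d{2})\.(?P<ms>\d{3})\s+\S+\s+"
    r"SenderLink\[(?P<peer>[^\]]+)\]: Detected Remote HA Change to Active"
)


def seconds_of_day(m):
    """Convert the matched HHMMSS.mmm fields to seconds since midnight."""
    return int(m["hh"]) * 3600 + int(m["mi"]) * 60 + int(m["ss"]) + int(m["ms"]) / 1000


def repeated_ha_changes(lines, min_events=3):
    """Return {peer: [gaps between events in seconds]} for every peer that
    logs at least min_events 'Detected Remote HA Change to Active' traces."""
    times = defaultdict(list)
    for line in lines:
        m = HA_CHANGE.match(line)
        if m:
            times[m["peer"]].append(seconds_of_day(m))
    return {
        peer: [round(b - a, 3) for a, b in zip(ts, ts[1:])]
        for peer, ts in times.items()
        if len(ts) >= min_events
    }


# Usage with the trace excerpt quoted in this note.
TRACE = [
    "0406:195950.962 TR-V SenderLink[05cmp01a]: Detected Remote HA Change to Active [10150/MergeSender.cxx:1256]",
    "0406:195953.610 TR-V ===[STATE PendingStandby]=== anyPeerAvail=1,peerActive=1, parentAckList=1 [10150/IdbMerge.cxx:964]",
    "0406:195956.012 TR-V SenderLink[05cmp01a]: Detected Remote HA Change to Active [10150/MergeSender.cxx:1256]",
    "0406:195958.628 TR-V ===[STATE PendingStandby]=== anyPeerAvail=1,peerActive=1, parentAckList=1 [10150/IdbMerge.cxx:964]",
    "0406:200001.012 TR-V SenderLink[05cmp01a]: Detected Remote HA Change to Active [10150/MergeSender.cxx:1256]",
]
result = repeated_ha_changes(TRACE)
print(result)
```

A steady, short gap between events for one peer matches the looping behaviour shown in the example; an isolated event after a genuine failover would not meet the threshold.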

 


Known Bugs:
Bug 21697864 - cmp01b has a stuck mysql sync alarm
Bug 22007430 - cmp01b has a stuck mysql sync alarm
Bug 21899519 - 31106 and 31107 alarms not getting auto cleared
Bug 20686427 - seeing many "older version of inetmerge on DRNO" (vital trace) logs

Other SRs with the same issue:
3-11817513504 - CMP getting "A general error occurred in the application. MYSQL is not ready"
3-11400018991 - 03cmp01b reporting DB merging / replication alarms
3-11446411891 - CMP 01 A no data shown on active CMP, CMP B is in standby

Solution

This issue is fixed in the following releases:

12.1.1.0.0

12.2.0.0.0
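To check a deployed release string (for example 12.0.0.4.0_2.1.0 from this note) against the releases listed above, a small comparison can help. This Python sketch assumes that the part before "_" is the base release and that any release at or above the lowest listed fixed release also carries the fix; that assumption is not stated in this note, so confirm against the release notes. The names FIXED_RELEASES, base_version and likely_fixed are hypothetical.

```python
# Releases listed in this note as containing the fix.
FIXED_RELEASES = ["12.1.1.0.0", "12.2.0.0.0"]


def base_version(version):
    """Drop the build suffix after '_' and return the release as an int tuple,
    e.g. '12.0.0.4.0_2.1.0' -> (12, 0, 0, 4, 0)."""
    return tuple(int(part) for part in version.split("_", 1)[0].split("."))


def likely_fixed(version):
    """ASSUMPTION: every release at or above the lowest listed fixed release
    includes the fix. Tuples compare element by element."""
    lowest = min(base_version(v) for v in FIXED_RELEASES)
    return base_version(version) >= lowest


print(likely_fixed("12.0.0.4.0_2.1.0"))  # False
print(likely_fixed("12.2.0.0.0"))        # True
```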

References

<BUG:22007430> - CMP01B HAS A STUCK MYSQL SYNC ALARM
<BUG:20686427> - [LRGSYS] SEEING MANY "OLDER VERSION OF INETMERGE ON DRNO" LOGS
<BUG:21899519> - 31106 AND 31107 ALARMS NOT GETTING AUTO CLEARED
<BUG:23066369> - ACTIVE-ACTIVE CONDITION BETWEEN CMP'S
<BUG:21697864> - CMP01B HAS A STUCK MYSQL SYNC ALARM

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.