Diameter Signaling Router (DSR) : Disaster Recovery (DR) NOAMs Raise MJ 31225 Alarm "HA Service Start Failure" After Upgrade or Restore

Asset ID:	1-72-2365072.1
Update Date:	2018-03-06
Keywords:

Solution Type Problem Resolution Sure


Applies to:

Oracle Communications Diameter Signaling Router (DSR) - Version DSR 7.2.0 to DSR 8.2.0 [Release DSR 7.0 to DSR 8.0]
Information in this document applies to any platform.

Symptoms

The secondary pair of Network Operation, Administration, and Maintenance (NOAM) servers, generally referred to as the 'Disaster Recovery' (DR) pair, raises MJ Alarm 31225 "HA Service Start Failure" against DSROAM_Proc.

Changes

The DR NOAMs have recently been upgraded to DSR 7.2 or a later release, or the NOAM now designated as secondary (DR) has been restored using the Disaster Recovery procedure.

Cause

DSR 7.2 introduced a change in the alarming criteria, but the change did not take into account whether the DR NOAM needs the DSROAM_Proc process running (it does not). The alarm on the DR NOAM is therefore harmless.

Solution

The solution is to mark DSROAM_Proc as 'optional' for the secondary (DR) NOAM server cluster in the HaClusterResourceCfg table. These instructions are included in the DSR Installation Guide for every release beginning with DSR 7.2 (see the section "Pairing for DR-NOAM site"), but sites upgraded from releases prior to DSR 7.2 also need this adjustment.

Resolution Steps:

1) Log in to the shell of the active primary NOAM server as admusr.
2) Using the hostname of one of the DR NOAM servers, determine the ClusterID of the DR NOAM pair:

Command:
iqt -fClusterID TopologyMapping where "NodeID='<DR_NOAM_Host_Name>'"

Example command:
[admusr@primaryActiveNOAM ~]$ iqt -fClusterID TopologyMapping where "NodeID='dr-noam-a'"

Example output:
Server_ID NodeID ClusterID
0 dr-noam-a A3943
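The lookup in step 2 can be scripted so that the ClusterID is captured in a variable for the later steps. This is a minimal sh/bash sketch, not an Oracle-supplied tool: the helper names `parse_cluster_id` and `get_cluster_id` are hypothetical, and the parsing assumes iqt prints a header row followed by whitespace-separated data rows with ClusterID in the last column, as in the example output above.

```shell
# Hypothetical helpers for step 2; source them into the admusr shell on the
# active primary NOAM. ASSUMPTION: iqt output is a header row followed by
# whitespace-separated data rows, with ClusterID in the last column.

# Read iqt output on stdin and print the ClusterID from the last data row.
# Prints nothing if there are no data rows.
parse_cluster_id() {
    awk 'NR > 1 && NF > 0 { id = $NF } END { if (id != "") print id }'
}

# Run the step 2 query for one DR NOAM hostname and print its ClusterID.
get_cluster_id() {
    local host="$1"
    iqt -fClusterID TopologyMapping where "NodeID='${host}'" | parse_cluster_id
}
```

For example, `DR_CLUSTER=$(get_cluster_id dr-noam-a)` would set `DR_CLUSTER` to `A3943` for the example output above. Taking the last column keeps the parser working whether iqt prints all three columns or only ClusterID.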

3) Using the ClusterID found above, check whether the DSROAM_Proc entry for that cluster is missing from the HaClusterResourceCfg table:

Command:
iqt HaClusterResourceCfg where "cluster = '<ClusterID>' and resource = 'DSROAM_Proc'"

Example command:
[admusr@primaryActiveNOAM ~]$ iqt HaClusterResourceCfg where "cluster = 'A3943' and resource = 'DSROAM_Proc'"

If the command returns no output, the entry does not exist; proceed to the next step.
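Step 3 can also be wrapped as a check that distinguishes "entry missing" from "query failed", so that a failed iqt call is not mistaken for a missing row. This is a minimal sketch: the helper name `dsroam_entry_exists` is hypothetical, and it assumes iqt exits non-zero on error and prints nothing when no row matches (the latter as described above).

```shell
# Hypothetical helper for step 3.
# Exit status: 0 = DSROAM_Proc row present, 1 = row missing, 2 = iqt failed.
# ASSUMPTION: iqt exits non-zero on error and prints nothing when no row matches.
dsroam_entry_exists() {
    local cluster="$1"
    local rows
    # Assign separately from 'local' so the iqt exit status is not masked.
    rows=$(iqt HaClusterResourceCfg where "cluster = '${cluster}' and resource = 'DSROAM_Proc'") || return 2
    # Whitespace-only output counts as "no output".
    if [ -n "$(printf '%s' "$rows" | tr -d '[:space:]')" ]; then
        return 0
    fi
    return 1
}
```

Only an exit status of 1 indicates that step 4 should be run; a status of 2 means the query itself should be investigated first.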

4) Insert the missing entry into the table so that DSROAM_Proc is optional for the DR NOAM cluster:

Command:
echo "<ClusterID>|DSROAM_Proc|Yes" | iload -ha -xun -fcluster -fresource -foptional HaClusterResourceCfg

Example command:
[admusr@primaryActiveNOAM ~]$ echo "A3943|DSROAM_Proc|Yes" | iload -ha -xun -fcluster -fresource -foptional HaClusterResourceCfg
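Because iload reads a pipe-delimited record (cluster|resource|optional), a blank or malformed ClusterID would produce a bad row. The sketch below wraps the step 4 command with a basic input check. It is an illustration only: the helper name `mark_dsroam_optional` is hypothetical, and the rule that a ClusterID contains only letters and digits (as in A3943) is an assumption, not a documented DSR constraint.

```shell
# Hypothetical wrapper around the step 4 iload command.
# ASSUMPTION: valid ClusterIDs contain only letters and digits (e.g. A3943);
# anything else (blank, '|', spaces, quotes) is rejected before iload runs.
mark_dsroam_optional() {
    local cluster="$1"
    case "$cluster" in
        '' | *[![:alnum:]]*)
            echo "invalid ClusterID: '${cluster}'" >&2
            return 2
            ;;
    esac
    # Same record and iload options as the documented command.
    printf '%s|DSROAM_Proc|Yes\n' "$cluster" |
        iload -ha -xun -fcluster -fresource -foptional HaClusterResourceCfg
}
```

For example, `mark_dsroam_optional A3943` pipes exactly `A3943|DSROAM_Proc|Yes` into iload, matching the example command above. Run it only after step 3 shows that the entry is missing.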

After making the above changes, the 31225 alarm should clear from both DR NOAM servers.

ADDITIONAL STEPS IF A FAILOVER TO THE DR NOAM PAIR WAS PERFORMED:

If a failover from the primary active NOAM pair to the Disaster Recovery (DR) NOAM pair has been completed, the steps above are still needed to clear 31225 alarms from the previously active NOAM pair (if it has been restored). However, take care to ensure that DSROAM_Proc is NOT marked 'optional' in HaClusterResourceCfg for the *newly promoted* primary active NOAM pair (the former DR NOAM pair). In DSR 8.x these steps are documented in the "DSR / SDS NOAM Failover" guide, but they may not be readily found in existing DSR 7.x documentation.

After Failover to (Previous) DR NOAM Pair ONLY:

5) Check whether a second A-level cluster entry exists in HaClusterResourceCfg:

iqt HaClusterResourceCfg where "cluster like 'A*' and resource='DSROAM_Proc'"

6) If two entries exist, verify that the unknown <ClusterID> is associated with the previous primary active NOAM pair:

iqt -fClusterID TopologyMapping where "ClusterID = '<ClusterID>'"

7) After verifying that the NodeIDs (hostnames) returned belong to the formerly primary active NOAM pair, remove that entry from HaClusterResourceCfg with the following command:

irem HaClusterResourceCfg where "cluster='<ClusterID>'"

Steps 5-7 above ensure that the newly promoted primary active NOAM pair starts the dsroam process without intervention after a server reboot.
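Steps 5-7 can be sketched as two helpers run on the newly promoted primary active NOAM: one lists the A-level DSROAM_Proc entries, and one removes an entry only after the operator has reviewed the TopologyMapping output and typed `yes`. The helper names are hypothetical, the iqt and irem commands are copied from steps 5-7, and the parsing assumes iqt prints a header row followed by one data row per entry with the cluster name in the first column.

```shell
# Hypothetical helpers for steps 5-7 (after failover only).
# ASSUMPTION: iqt prints a header row, then one row per entry with the
# cluster name in the first column.

# Step 5: print the cluster of every A-level DSROAM_Proc entry, one per line.
list_a_level_dsroam_clusters() {
    iqt HaClusterResourceCfg where "cluster like 'A*' and resource='DSROAM_Proc'" |
        awk 'NR > 1 && NF > 0 { print $1 }'
}

# Steps 6-7: show the TopologyMapping rows for a cluster, then remove its
# HaClusterResourceCfg entry only if the operator answers "yes".
remove_former_primary_entry() {
    local cluster="$1"
    local answer
    if [ -z "$cluster" ]; then
        echo "usage: remove_former_primary_entry <ClusterID>" >&2
        return 2
    fi
    echo "TopologyMapping rows for ${cluster}:"
    iqt -fClusterID TopologyMapping where "ClusterID = '${cluster}'" || return 2
    printf 'Do these hostnames belong to the FORMER primary active NOAM pair? (yes/no) '
    read -r answer
    if [ "$answer" != "yes" ]; then
        echo "Entry for ${cluster} NOT removed."
        return 1
    fi
    irem HaClusterResourceCfg where "cluster='${cluster}'"
}
```

Run `list_a_level_dsroam_clusters` first; only if it prints two clusters, pass the unfamiliar one to `remove_former_primary_entry`. The confirmation prompt is a deliberate design choice: irem deletes rows, so the sketch keeps the human verification from step 6 in the loop rather than automating it.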
