Diameter Signaling Router (DSR) : NOAM Restore Not Working And Causing NOAM Switchover

Asset ID:	1-72-2398226.1
Update Date:	2018-05-14
Keywords:

Solution Type Problem Resolution Sure

Solution 2398226.1 : Diameter Signaling Router (DSR) : NOAM Restore Not Working And Causing NOAM Switchover

Applies to:

Oracle Communications Diameter Signaling Router (DSR) - Version DSR 8.0.0 and later
Tekelec

Symptoms

A NOAM switchover occurs while performing the NOAM Restore procedure as part of the Operational Test Plan (OTP).

Changes

Cause

Oracle Consulting was executing OTP TP009929 (Operational Test Plan for DSR 8.1) on a newly installed DSR system and observed that a switchover happens while executing the NOAM Restore procedure. Replication appears to start immediately after the restore.

The open question was why processes that the application had explicitly stopped are restarted after idb.restore.

How does Restore work?
----------------------
The restore procedure performs the following steps. All of them are executed on the server being restored, i.e. the Active NOAM or SOAM.

1. Cache the value of 'nodeCapability' and 'inhibitRepPlans' for each node in the NodeInfo table

2. Shut down various COMCOL processes (pm.set off inetsync inetrep inetmerge vipmgr cmha cmsoapa) in order to fully stop replication and HA prior to the idb.restore.

3. Restore database files in '$RUN/db/part-to-be-restored' directories and IDB database in shared memory using idb.restore

4. Perform prod.stop -i and prod.start -i -d -V

5. Restore HA and Replication status for all the nodes from the cache

6. Start COMCOL processes (pm.set respawn inetsync inetrep inetmerge vipmgr cmha cmsoapa)

7. Reset database connections after restore. The msqld process has re-started.

8. Update Node Capability of target server (Active NOAM/SOAM) to Active.

The issue was introduced by DCA in DSR 8.0, because the DCA loaders add backup entries for CoreAdmPart.

BUG 27944675 has been opened to fix this issue in the next GA release.

Conclusion: The PmControl IDB table is being backed up, and that leads to the restore issue. The PmControl table contains the state of each process. Before the restore, the replication processes are stopped so that data can be replicated in a controlled fashion. Because the PmControl table is restored along with everything else, it comes back with the old process state, and replication starts again automatically.
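
The effect described in the conclusion can be illustrated with a toy model. This is only a sketch, not COMCOL code: a single shell variable stands in for the PmControl entry of one process (inetrep), the function names restore_with_pmcontrol and restore_without_pmcontrol are hypothetical, and the backup is assumed to have been taken while replication was running.

```shell
#!/bin/sh
# Toy model only: one variable represents the PmControl state of inetrep.
backup_state="respawn"   # assumption: backup was taken while replication was running
live_state="respawn"     # state on the live server before the procedure starts

# Step 2 of the restore procedure: replication is stopped (pm.set off ...).
live_state="off"
echo "before restore: $live_state"

# Hypothetical: the backup contains PmControl, so the restore overwrites the live state.
restore_with_pmcontrol() {
    live_state="$backup_state"
}

# Hypothetical contrast: PmControl is not part of the backup, so the live state survives.
restore_without_pmcontrol() {
    :
}

restore_with_pmcontrol
echo "after restore with PmControl in the backup: $live_state"
```

In this model the state returns to "respawn" as soon as the backup is applied, which mirrors replication starting before step 6 of the procedure is reached.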

Solution

Apply the following workaround until the bug is fixed.

1. Before triggering the restore, perform the following workaround steps:
    a. Log into the Application GUI using the NOAM VIP as a user with admin privileges.
    b. Navigate to [Main Menu > Status & Manage > Database], then find the NOAM server name with an “OAM Max HA Role” of “Standby”.
    c. Use SSH (on UNIX systems, or PuTTY on Windows) to log into the Standby NOAM server:
            i. PROMPT> ssh admusr@<standby NOAM server address>
            ii. password: <enter password>
            iii. Answer yes if you are asked to confirm the identity of the server.
    d. Execute the following command to fully stop replication and HA prior to the restore:

pm.set off inetsync inetrep inetmerge vipmgr cmha cmsoapa

    e. The following alarms are expected as a result of stopping the processes in the previous step:
            i. Alarm(#31114) DB Replication over SOAP has failed
            ii. Alarm(#31107) DB Merge From Child Failure
            iii. Alarm(#31101) DB Replication To Slave Failure
            iv. Alarm(#31106) DB Merge To Parent Failure
            v. Alarm(#31283) Lost Communication with server
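
Because the same process list is used once with "off" before the restore and once with "respawn" afterwards, a small wrapper can help avoid typing mistakes. The following is a minimal sketch: the function names pm_set_cmd and apply_state and the DRY_RUN variable are hypothetical helpers, not part of DSR. Only the pm.set command line itself is taken from this note. By default the script only prints the command.

```shell
#!/bin/sh
# Process list exactly as given in this note.
PROCS="inetsync inetrep inetmerge vipmgr cmha cmsoapa"

# Build the pm.set command line for a given state ("off" or "respawn").
pm_set_cmd() {
    case "$1" in
        off|respawn) echo "pm.set $1 $PROCS" ;;
        *) echo "unsupported state: $1" >&2; return 1 ;;
    esac
}

# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
# Executing only makes sense on the Standby NOAM server, where pm.set exists.
apply_state() {
    cmd=$(pm_set_cmd "$1") || return 1
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "DRY RUN: $cmd"
    else
        $cmd
    fi
}

apply_state off
```

The same helper can be called as "apply_state respawn" when the workaround is reverted after the restore.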

2. Restore the database using the following steps. They are the same as in OTP TP009929, except for step 9.1, which is added to revert the workaround applied before the restore:
1. Log into the Application GUI using the NOAM VIP as a user with admin privileges.

2. Navigate to [Main Menu > Status & Manage > Database]. Disable provisioning by clicking the “Disable Provisioning” button at the bottom left of the GUI form.

3. Click [Ok] on the pop-up window. The system will display Alarm(#10008), indicating that provisioning has been manually disabled.

4. Navigate to [Main Menu > Status & Manage > Database], then select the NOAM with an “OAM Max HA Role” of “Active”.

5. Select “Restore”

6. Select the backup file to be used in the restore

7. Select [Ok]

8. The ‘Database Restore Confirm’ screen will appear. If there are inconsistencies between the current state of the system and the information found in the backup file, a message will appear indicating ‘Incompatible database selected’. If this is the case, check the ‘Force’ checkbox, then select ‘OK’. If not, simply click ‘OK’ to start the restore process.

9. The system will begin restoring the database. After the restore is completed the user will be logged out of the NOAM GUI. Allow up to ten minutes for the restore to complete before the GUI returns to the login prompt.

9.1 Revert the workaround by starting the processes that were stopped before the restore. Execute the following command on the Standby NOAM server, the same server where the processes were stopped in step 1(d):

pm.set respawn inetsync inetrep inetmerge vipmgr cmha cmsoapa

The alarms raised in step 1(e) should clear once these processes have been started.
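
To confirm that the workaround alarms have cleared, the alarm numbers currently shown in the GUI can be compared against the five expected ones from step 1(e). The sketch below assumes the active alarm numbers are passed in by hand as arguments; it does not query the system, and the function name remaining_expected is hypothetical.

```shell
#!/bin/sh
# Alarm numbers expected while the workaround is in place (from step 1(e)).
EXPECTED_ALARMS="31114 31107 31101 31106 31283"

# Print those of the given active alarm numbers that belong to the workaround set.
# Empty output means none of the workaround alarms is still active.
remaining_expected() {
    out=""
    for id in "$@"; do
        case " $EXPECTED_ALARMS " in
            *" $id "*) out="$out $id" ;;
        esac
    done
    echo "${out# }"
}

# Example input: alarm numbers copied manually from the GUI.
remaining_expected 31101 10008 31283
```

With the example input, the function reports 31101 and 31283 as still active, while 10008 (provisioning disabled) is ignored because it is not one of the workaround alarms.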

10. On the NOAM GUI, navigate to [Main Menu > Status & Manage > Database].

11. Enable provisioning by clicking the “Enable Provisioning” button at the bottom left of the GUI form.

12. Click [Ok] on the pop-up window. Provisioning is now enabled.

13. Log into the Application GUI using the NOAM VIP as a user with admin privileges.

14. Allow replication on all Servers in this order:
          o Active NOAM Server
          o Standby NOAM Server
          o Active SOAM Server
          o Standby SOAM Server
          o Active MP Servers
          o Standby MP Servers

15. Navigate to [Main Menu > Status & Manage > HA]. Click the “Edit” button.

16. Select the Standby NOAM and change the Max Allowed HA Role to “Active”.

17. Verify that the proper configuration is now present in the GUI.

References

<BUG:27944675> - NOAM RESTORE NOT WORKING

Attachments

This solution has no attachment