Diameter Signaling Router (DSR) : How To Correct Network Time Protocol (NTP) Alarms on DSR Servers

Asset ID:	1-72-2361375.1
Update Date:	2018-02-21
Keywords:

Solution Type Problem Resolution Sure

Solution 2361375.1 : Diameter Signaling Router (DSR) : How To Correct Network Time Protocol (NTP) Alarms on DSR Servers

Applies to:

Oracle Communications Diameter Signaling Router (DSR) - Version DSR 7.0.1 and later
Information in this document applies to any platform.

Symptoms

If the time on DSR servers drifts significantly out of Network Time Protocol (NTP) synchronization, a number of alarms can be raised, up to and including application database alarms. These may include the following (the list is not exhaustive):

Platform/SW CRITICAL 31235 Untrusted Time After Initialization
TPD/PLAT CRITICAL 32117 NTP Offset Check Failure
TPD/PLAT MAJOR 32342 NTP Offset Check Failure
TPD/PLAT MINOR 32509 Server NTP Daemon Not Synchronized
TPD/PLAT MINOR 32520 NTP Stratum Check Failure
Platform/SW MINOR 31100 DB Replication Fault

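Another symptom is a very large offset and jitter reported when the NTP daemon is queried from the command line. ntpq reports offset and jitter in milliseconds, so the example below shows an offset of about -309 seconds and a jitter of about 398 ms: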

[admusr@dsrServer ~]$ ntpq -p
   remote      refid       st t when poll reach delay offset jitter
==============================================================================
10.20.30.33 172.16.188.201 10 u 793  1024 377  0.647 -308767 398.460
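To spot drift like this without reading the columns by eye, the ntpq output can be filtered with a small shell function. The sketch below is a hypothetical helper (ntp_offset_check is not a DSR or TPD tool). It assumes the standard ntpq -p column order shown above, with offset in column 9 in milliseconds, and uses an arbitrary 1000 ms default threshold rather than the actual thresholds behind events 32117 and 32342.

```shell
#!/bin/bash
# Hypothetical helper: ntp_offset_check [threshold_ms] < ntpq-output
# Prints every peer in `ntpq -p` / `ntpq -pn` output whose absolute offset
# (column 9, reported by ntpq in milliseconds) exceeds threshold_ms.
# Returns 1 if any peer exceeds the threshold, 0 otherwise.
ntp_offset_check() {
    local threshold_ms="${1:-1000}"   # assumed example threshold, not a DSR alarm limit
    awk -v t="$threshold_ms" '
        /^==/ { body = 1; next }              # peer rows follow the ==== ruler
        body && NF >= 10 {
            peer = $1
            # Drop the tally code (*, +, -, x, ...) in front of a numeric address.
            if (peer ~ /^[*#o+x.-][0-9]/) peer = substr(peer, 2)
            off = $9 + 0                      # offset column, milliseconds
            mag = (off < 0) ? -off : off
            if (mag > t) {
                printf "%s offset %.3f ms (%.1f s)\n", peer, off, off / 1000
                bad = 1
            }
        }
        END { if (bad) exit 1; exit 0 }
    '
}
```

For example, `ntpq -pn | ntp_offset_check 1000` would, for the sample output above, report 10.20.30.33 with an offset of about -308.8 s and return a non-zero status.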

Changes

This condition can be triggered by a network disruption, or by a critical server in the NTP topology becoming unavailable for a period of time. It can also follow manual changes to the NTP settings on servers in the NTP path.

Cause

These alarms are usually caused by a combination of the NTP changes described above and a poorly designed NTP topology.

Servers in a topology that does not follow good NTP design practice are more likely to suffer from NTP drift.

Solution

Because a sudden jump in database timestamps can temporarily disrupt the system, the affected server must be taken out of service (OOS) before its clock is corrected. The NTP Sync action in the NOAM GUI is not available while the server is active.

Step 1: Set server to OOS
Warning! This will drop Diameter connections to the server. It should normally be done during a maintenance window.
     1) Log in to the NOAM GUI using the VIP.
     2) Go to Main Menu / Status & Manage / HA.
     3) Select 'Edit'.
     4) Locate the server in the list.
     5) Use the pulldown to set the 'Max Allowed HA Role' to OOS, and select [OK].
Setting the server to OOS raises additional alarms, which may not clear until the procedure is complete.

Step 2: Force the NTP daemon to re-synchronize
1) Log in to the server's shell as admusr using an SSH client (e.g., PuTTY or SecureCRT).
2) Execute the following commands:

[admusr@dsrServer ~]$ ntpq -pn
[admusr@dsrServer ~]$ sudo service ntpd status
[admusr@dsrServer ~]$ sudo service ntpd stop
[admusr@dsrServer ~]$ sudo ntpdate <IP from 'remote' column of ntpq command>
(Run the ntpdate command up to 3 times, until the reported offset is zero or close to zero.)
[admusr@dsrServer ~]$ sudo service ntpd start
[admusr@dsrServer ~]$ sudo service ntpd status
[admusr@dsrServer ~]$ ntpq -pn

3) Log out.
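The command sequence in Step 2 can be collected into one function, for example when several servers must be handled in turn. This is a hypothetical sketch, not an Oracle-supplied script: the name force_ntp_resync, the DRY_RUN switch and the 0.1 s tolerance are assumptions, and the offset parsing assumes ntpdate's usual "offset <seconds> sec" message. The server must already be OOS (Step 1) before it is run.

```shell
#!/bin/bash
# Hypothetical helper mirroring Step 2: force_ntp_resync <ntp-server-ip> [tolerance_s]
# Stops ntpd, runs ntpdate up to three times (stopping early once the reported
# offset is within tolerance_s seconds), then restarts ntpd and shows the peers.
# Set DRY_RUN=1 to print the commands instead of executing them.
force_ntp_resync() {
    local server="${1:-}" tol="${2:-0.1}"   # 0.1 s is an assumed example tolerance
    local run="" out off i
    if [ -z "$server" ]; then
        echo "usage: force_ntp_resync <ntp-server-ip> [tolerance_s]" >&2
        return 2
    fi
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"

    $run ntpq -pn
    $run sudo service ntpd status
    $run sudo service ntpd stop            # ntpdate cannot use UDP port 123 while ntpd holds it
    for i in 1 2 3; do
        # "|| true" so a failed ntpdate never leaves ntpd stopped under "set -e".
        out=$($run sudo ntpdate "$server" 2>&1) || true
        printf '%s\n' "$out"
        # Assumes ntpdate's usual "... offset <seconds> sec" message.
        off=$(printf '%s\n' "$out" | sed -n 's/.* offset \([-0-9.]*\) sec.*/\1/p' | tail -n 1)
        if [ -n "$off" ] && awk -v o="$off" -v t="$tol" \
            'BEGIN { m = (o < 0) ? -o : o; if (m <= t) exit 0; exit 1 }'; then
            break                           # offset is close enough to zero
        fi
    done
    $run sudo service ntpd start
    $run sudo service ntpd status
    $run ntpq -pn
}
```

For example, `DRY_RUN=1 force_ntp_resync 10.20.30.33` only prints the commands it would run, which is a safe way to review the sequence before running it for real.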

Step 3: Force the DSR application to sync NTP
[Perform this step only if the server has the Event ID 31235 CRITICAL alarm (Untrusted Time After Initialization).]
     1) In the NOAM GUI (VIP), go to Main Menu / Status & Manage / Server.
     2) Select (highlight) the server.
     3) Select the [NTP Sync] button.

Step 4: Return the server to Active
     1) In the NOAM GUI (VIP), go to Main Menu / Status & Manage / HA.
     2) Select 'Edit'.
     3) Locate the server in the list.
     4) Use the pulldown to set the 'Max Allowed HA Role' to Active, and select [OK].
The alarms for this server should then clear.

Repeat these steps for each affected server, one server at a time, until all alarms have cleared.

Note 1: If preferred, a reboot may achieve the same result as the steps above. If you choose this method, verify the NTP status afterward by running 'ntpq -pn' on the server.
Note 2: NTP topology design is outside the scope of this article, but a sound design reduces or prevents this problem. Refer to the DSR Installation Guide (NTP section) and MOS Knowledge Article 2016591.1 for NTP topology design recommendations.
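Whether the server was corrected with the steps above or by a reboot, the 'ntpq -pn' check can be reduced to a pass/fail test. In ntpq output the tally code '*' marks the peer the daemon has selected for synchronization, so the hypothetical ntp_sync_verify function below passes only if such a peer exists and its offset is within an assumed 100 ms limit (not a DSR alarm threshold). Right after ntpd is restarted no peer may be selected yet, so allow a few poll intervals before treating a failure as real.

```shell
#!/bin/bash
# Hypothetical helper: ntp_sync_verify [max_offset_ms] < ntpq-output
# Succeeds (returns 0) only if `ntpq -pn` output shows a selected system peer
# (tally code '*') whose absolute offset is within max_offset_ms milliseconds.
ntp_sync_verify() {
    local max_ms="${1:-100}"   # assumed example limit, not a DSR alarm threshold
    awk -v t="$max_ms" '
        /^==/ { body = 1; next }             # peer rows follow the ==== ruler
        body && /^\*/ && NF >= 10 {          # "*" marks the selected system peer
            off = $9 + 0                     # offset column, milliseconds
            if (off < 0) off = -off
            if (off <= t) {
                printf "synchronized to %s (offset %s ms)\n", substr($1, 2), $9
                ok = 1
            }
        }
        END {
            if (ok) exit 0
            print "no acceptable system peer; re-check NTP"
            exit 1
        }
    '
}
```

Used as `ntpq -pn | ntp_sync_verify`, it prints the selected peer and returns 0 when the server is synchronized, and returns 1 otherwise.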

References

<NOTE:2016591.1> - Recommended Changes for NTP Configuration According to Release 5.0 Standard Solution
<NOTE:2242964.1> - Diameter Signaling Router (DSR): Event_Number - 31100 - DB Replication Fault
<NOTE:1683873.1> - How to Change Timezone on Diameter Signaling Router (DSR)?
<NOTE:2125246.1> - Diameter Signaling Router (DSR) - How to Edit the NTP Configuration on a Given Server Running DSR 7.x Release?
