Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2221642.1
Update Date:2017-01-10
Keywords:

Solution Type  Problem Resolution Sure

Solution  2221642.1 :   Diameter Signaling Router (DSR) - Observed 31002 Critical apwSoapServer Alarm Continuously Clearing and Reappearing


Related Items
  • Oracle Communications Diameter Signaling Router (DSR)
Related Categories
  • PLA-Support>Sun Systems>CommsGBU>Global Signaling Solutions>SN-SND: Tekelec DSR

In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-13591115361>

Applies to:

Oracle Communications Diameter Signaling Router (DSR) - Version DSR 5.0 to DSR 7.3.0 [Release DSR 5.0 to DSR 7.0]
Information in this document applies to any platform.

Symptoms

The following critical alarm sets and clears continuously:

Main Menu: Alarms & Events -> View History [Report]

SEQ_NUM: 8850
EVENT_NUMBER: 31002
SEVERITY: CRITICAL
PROCESS: procmgr
TYPE: SW
INSTANCE: apwSoapServer
NAME: Process Watchdog Failure
DESCRIPTION: Process watchdog timed out
ERR_INFO:
GN_EXPIRED/WRN Watchdog expired [procmgr.cxx:857]
^^ threshold:30 seconds, last updated: 11/03/2016 05:11:37.000 [procmgr.cxx:858]
^^ Sending TERM signal (attempting to get abterm) [procmgr.cxx:889]
^^ [12890:procmgr.cxx:890]
SECS: 1478121157
USECS: 433000
CISECS: 1478121157
CIUSECS: 656000
ID: 0
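
To confirm that the alarm is repeating rather than occurring once, the history report can be saved to a text file and the matching records counted. The following is a minimal sketch, assuming the saved report keeps the one "FIELD: value" per line layout shown above; the function name and the report file name are hypothetical and not part of the DSR software.

```shell
#!/bin/sh
# Count Event 31002 records raised against apwSoapServer in a saved
# alarm-history report. Assumption: the report is plain text with one
# "FIELD: value" pair per line, and EVENT_NUMBER precedes INSTANCE
# within each record, as in the excerpt above.
count_watchdog_alarms() {
    report="$1"   # hypothetical path to the saved report
    awk '
        # Remember whether the current record is a 31002 event.
        /^EVENT_NUMBER:/ { is_31002 = ($2 == "31002") }
        # Count the record only if it also names apwSoapServer.
        /^INSTANCE:/ {
            if (is_31002 && $2 == "apwSoapServer") n++
            is_31002 = 0
        }
        END { print n + 0 }
    ' "$report"
}
```

For example, `count_watchdog_alarms alarm_history.txt` prints the number of matching records; a count that keeps growing between exports is consistent with the set-and-clear loop described here.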

Changes

This issue can appear after the DSR processes are restarted, for example during a server reboot.

Cause

On systems with more than 200 routes, the apwSoapServer process can fail at startup of the DSR software. With that many routes, the AW audit cannot finish before the watchdog timer expires, so the COMCOL watchdog kills apwSoapServer and restarts it. The restarted process hits the same timeout, and the cycle repeats in a loop.

Solution

On every server in the topology, starting with the NOs and then the SOs and MPs (including all C-level servers, i.e., SBRs and IPFEs, if present), add the maxtime setting for the route audit to the /usr/TKLC/appworks/etc/apwSoapServer.cfg file. Execute the steps below on each server.

The steps below assume the user is logged in as admusr. If the user is root, "sudo" can be omitted from the commands.
  1. Before making changes to /usr/TKLC/appworks/etc/apwSoapServer.cfg, check it out of RCS:
    # sudo rcstool co /usr/TKLC/appworks/etc/apwSoapServer.cfg
  2. Edit the /usr/TKLC/appworks/etc/apwSoapServer.cfg file and locate the following lines (around line 240, or search for "route_where"):
    ; The route audit will ensure that the route config in the database is properly deployed on the target every 30*2 = 60 s
    route_freq = 30
    route_where = ALL
  3. Add these lines directly below them:
    ; Allow route processing to take up to 5 minutes - needed for large route counts
    route_maxtime = 300
  4. When the edit is complete, check the file back into RCS:
    # sudo rcstool ci /usr/TKLC/appworks/etc/apwSoapServer.cfg "updating route_maxtime"
  5. Restart apwSoapServer by executing the command:
    # sudo pm.kill apwSoapServer

This issue is associated with Bug 25070587 and is fixed in DSR 7.3.1.
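
Because the edit in steps 2 and 3 has to be repeated on every server, it may help to script it so that it cannot be applied twice. The following is a minimal sketch only, not an Oracle-supplied tool: the function name add_route_maxtime is hypothetical, and it assumes the file contains a "route_where" line as described in step 2. It covers only the file edit; the RCS check-out and check-in and the process restart in steps 1, 4 and 5 are still needed around it.

```shell
#!/bin/sh
# Insert "route_maxtime = 300" directly after the first "route_where" line
# of a config file, unless a route_maxtime setting is already present.
# Hypothetical helper for illustration; takes the config file path as $1.
add_route_maxtime() {
    cfg="$1"
    # Do nothing if the setting already exists (keeps the edit idempotent).
    if grep -Eq '^[[:space:]]*route_maxtime[[:space:]]*=' "$cfg"; then
        return 0
    fi
    # Refuse to edit if the anchor line from step 2 is missing.
    if ! grep -Eq '^[[:space:]]*route_where[[:space:]]*=' "$cfg"; then
        echo "route_where not found in $cfg" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Copy every line; after the first route_where line, emit the new lines.
    awk '
        { print }
        /^[ \t]*route_where[ \t]*=/ && !added {
            print "; Allow route processing to take up to 5 minutes - needed for large route counts"
            print "route_maxtime = 300"
            added = 1
        }
    ' "$cfg" > "$tmp" && cat "$tmp" > "$cfg"   # cat keeps the original owner and mode
    status=$?
    rm -f "$tmp"
    return $status
}
```

The function would be run with sufficient privileges to write the file, after the check-out in step 1 and before the check-in in step 4. Afterwards, `grep -n route_maxtime /usr/TKLC/appworks/etc/apwSoapServer.cfg` shows whether the setting is in place.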

References

<BUG:25070587> - EVENTID 31002 PROCESS WATCHDOG FAILURE PROCESS WATCHDOG TIMED OUT AFTER UPGRADE

  Copyright © 2018 Oracle, Inc.  All rights reserved.