Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1910778.1
Update Date:2015-12-10

Solution Type  Problem Resolution Sure

Solution  1910778.1 :   NSP GUI Unreachable With "Failure of server APACHE bridge" Due to the Big Flow of Received Alarms  


Related Items
  • Oracle Communications Performance Intelligence Center (PIC) Software
Related Categories
  • PLA-Support>Sun Systems>CommsGBU>Global Signaling Solutions>SN-SND: Tekelec PIC




In this Document
Symptoms
Cause
Solution
 Alternate method using sql


Created from <SR 3-9248675168>

Applies to:

Oracle Communications Performance Intelligence Center (PIC) Software - Version 4.1 and later
Information in this document applies to any platform.

Symptoms

The NSP GUI is no longer accessible and returns the error "Failure of server APACHE bridge". This occurs when all WebLogic instances become unreachable, which can have many causes.

The status of each instance can be checked through the WebLogic console.

Cause

NSP is receiving a large number of alarms from remote servers. This overloads the JMS queues and, as a consequence, the WebLogic instances hang.
Check the JMS queues of the different WebLogic instances and verify whether they are overloaded, focusing on the "Messages Pending" column. The JMS queue load can be checked from the WebLogic console under the following menu:
Domain Structure -> tekelec -> Services -> Messaging -> JMS Servers -> NSPJMSServerxa -> Monitoring -> Active Destinations

Solution

In the ProAlarm viewer, open the list of all alarms and identify the object sending the largest number of alarms.

Each alarm with a high number of occurrences must be addressed.

Alternate method using sql

  1. Identify the object sending the largest number of alarms. On the NSP Oracle server, as the oracle user:
    1. Connect to the NSP database:
      # sqlplus login/password
    2. Count the raised alarms per object ID, sorted by ascending count:
      SQL> select count(*),MO_ID from COR_ALARM group by MO_ID order by count(*);
      The result should look like this:
      COUNT(*)      MO_ID
      ---------- ----------
            13244     331048
            14497      27909
            16642     376377
            44439     275531
    3. Identify the name of the impacted object (the largest counts are at the bottom of the list):
      SQL> select NAME from COR_MANAGED_OBJECT where MO_ID=xxxx;
      where xxxx is the MO_ID obtained in step 1.2
  2. Identify the alarm causing the overload. Connect to the impacted server identified in the previous step (or to the master server of its subsystem) and display the JMX logs. If the impacted object is a link or linkset, connect to the master server of the xMF subsystem where the link or linkset is defined. As cfguser:
    $ cd $PROC
    $ cf.follow -20 jmx_agent.log
    In the following example, alarms are caused by a Network Interface Board Error:
    0701:093710.587 TR-V alarm 'Ethernet - Network Interface Board Error' activated (devName=eth31; moOid=.1.3.6.1.4.1.4404.20.0.181952565.3.1) [14571/MonMgr.C:549]
    0701:093712.090 TR-V alarm 'Ethernet - Network Interface Board Error' cleared (devName=eth34; moOid=.1.3.6.1.4.1.4404.20.0.181952565.3.4) [14571/MonMgr.C:559]
  3. Take the necessary action to stop the identified alarms from being raised. In the above example, the Network Interface Board Error was caused by frames with an incorrect MTU size.
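
When the JMX log is busy, a tally of activations per alarm and device can make the dominant source in step 2 easier to spot. The following is a minimal sketch and not part of the original procedure: the function name tally_alarms and the file saved_lines.txt are hypothetical, and it assumes that log lines follow the format of the example in step 2 (alarm name in single quotes, followed by "activated (devName=...;").

```shell
# Hypothetical helper: count "activated" events per alarm name and device.
# Reads jmx_agent.log-style lines on stdin; "cleared" lines are ignored.
tally_alarms() {
    awk '
        / activated \(/ {
            # The alarm name sits between the first pair of single quotes.
            n = split($0, q, "\047")
            if (n < 3) next
            name = q[2]
            # The device follows as devName=<device>; (assumed format).
            dev = "unknown"
            if (match($0, /devName=[^;)]*/))
                dev = substr($0, RSTART + 8, RLENGTH - 8)
            count[name " | " dev]++
        }
        END {
            for (k in count) printf "%d\t%s\n", count[k], k
        }
    ' | sort -t "$(printf '\t')" -k1,1nr -k2,2
}

# Example (saved_lines.txt is a hypothetical file holding captured log lines):
#   tally_alarms < saved_lines.txt
```

The output lists one line per alarm and device pair, highest count first, so the entry at the top is the first candidate to investigate in step 3.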


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.