Asset ID: 1-72-1910778.1
Update Date: 2015-12-10
Keywords:
Solution Type: Problem Resolution Sure Solution
1910778.1: NSP GUI Unreachable With "Failure of server APACHE bridge" Due to the Big Flow of Received Alarms
Related Items
- Oracle Communications Performance Intelligence Center (PIC) Software
Related Categories
- PLA-Support>Sun Systems>CommsGBU>Global Signaling Solutions>SN-SND: Tekelec PIC
Created from <SR 3-9248675168>
Applies to:
Oracle Communications Performance Intelligence Center (PIC) Software - Version 4.1 and later
Information in this document applies to any platform.
Symptoms
The NSP GUI is no longer accessible and returns the error "Failure of server APACHE bridge". This happens when all WebLogic instances become unreachable, which can have many causes.
The status of each instance can be checked in the WebLogic console.
Cause
NSP is receiving a large number of alarms from remote servers. This overloads the JMS queues, and as a result the WebLogic instances hang.
Check the JMS queues of the different WebLogic instances and verify that the queues are overloaded, paying particular attention to the "Messages Pending" column. JMS queue load can be checked in the WebLogic console under the following menu:
Domain Structure -> tekelec -> Services -> Messaging -> JMS Servers -> NSPJMSServerxa -> Monitoring -> Active Destinations
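When several instances have to be compared, it can help to note the "Messages Pending" value of each destination and rank them. The following is a minimal sketch only: the instance names, destination names, counts, and threshold are hypothetical values entered by hand, not output from the WebLogic console or any WebLogic API.

```python
def overloaded_destinations(rows, threshold):
    """Return the (instance, destination, pending) rows whose pending
    count exceeds the threshold, largest first."""
    hits = [row for row in rows if row[2] > threshold]
    return sorted(hits, key=lambda row: row[2], reverse=True)


# Hypothetical values copied by hand from the Active Destinations table.
rows = [
    ("instance-1", "QueueA", 12),
    ("instance-2", "QueueB", 48213),
    ("instance-3", "QueueC", 0),
]

# The threshold is an arbitrary illustration; choose one that fits the system.
for instance, destination, pending in overloaded_destinations(rows, threshold=1000):
    print(instance, destination, pending)
```

A destination whose pending count keeps growing between two readings is a stronger sign of overload than a single high value.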
Solution
Identify the object sending the largest number of alarms in the ProAlarm viewer, using the list of all alarms.
Each alarm with a high number of occurrences must be addressed.
Alternate method using SQL
- Identify the object sending the largest number of alarms. On the NSP Oracle server, as the oracle user:
- Connect to the NSP database:
# sqlplus login/password
- Count the raised alarms per object ID, in ascending order:
SQL> select count(*),MO_ID from COR_ALARM group by MO_ID order by count(*);
The result should look similar to the following:
  COUNT(*)      MO_ID
---------- ----------
     13244     331048
     14497      27909
     16642     376377
     44439     275531
- Identify the name of the impacted object (the largest counts are at the bottom of the list):
SQL> select NAME from COR_MANAGED_OBJECT where MO_ID=xxxx;
where xxxx is the MO_ID returned by the previous query.
- Identify the alarm causing the overload. Connect to the impacted server identified in the previous step (or to the master server of its subsystem) and display the JMX logs. If the impacted object is a link or linkset, connect to the master server of the xMF subsystem where the link/linkset is defined. As cfguser:
$ cd $PROC
$ cf.follow -20 jmx_agent.log
In the following example, alarms are caused by a Network Interface Board Error:
0701:093710.587 TR-V alarm 'Ethernet - Network Interface Board Error' activated (devName=eth31; moOid=.1.3.6.1.4.1.4404.20.0.181952565.3.1) [14571/MonMgr.C:549]
0701:093712.090 TR-V alarm 'Ethernet - Network Interface Board Error' cleared (devName=eth34; moOid=.1.3.6.1.4.1.4404.20.0.181952565.3.4) [14571/MonMgr.C:559]
- Take the necessary actions to stop the identified alarms from being raised. In the above example, the Network Interface Board Error was caused by frames with a bad MTU size.
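When the log contains many entries, counting activations per alarm name and device can make the dominant alarm easier to spot. The following is a minimal sketch, not part of the PIC tooling. It assumes that every alarm entry follows the layout of the two sample lines above; the regular expression and the function name are illustrative.

```python
import re
from collections import Counter

# Pattern inferred from the two sample log lines; other entries in
# jmx_agent.log may use a different layout (assumption).
ALARM_RE = re.compile(
    r"alarm '(?P<name>[^']+)' (?P<state>activated|cleared) "
    r"\(devName=(?P<dev>[^;]+);"
)


def count_alarms(lines):
    """Count 'activated' events per (alarm name, device name)."""
    counts = Counter()
    for line in lines:
        match = ALARM_RE.search(line)
        if match and match.group("state") == "activated":
            counts[(match.group("name"), match.group("dev"))] += 1
    return counts


# The two sample lines from this document.
sample = [
    "0701:093710.587 TR-V alarm 'Ethernet - Network Interface Board Error' "
    "activated (devName=eth31; moOid=.1.3.6.1.4.1.4404.20.0.181952565.3.1) "
    "[14571/MonMgr.C:549]",
    "0701:093712.090 TR-V alarm 'Ethernet - Network Interface Board Error' "
    "cleared (devName=eth34; moOid=.1.3.6.1.4.1.4404.20.0.181952565.3.4) "
    "[14571/MonMgr.C:559]",
]

# Most frequent alarm/device pairs first.
for (name, dev), n in count_alarms(sample).most_common():
    print(n, name, dev)
```

The function accepts any iterable of lines, so an open file object for a saved copy of the log can be passed in place of the sample list.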