Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2393667.1
Update Date: 2018-05-02
Keywords:

Solution Type  Problem Resolution Sure

Solution  2393667.1 :   DSR IPFE Sync State Error After  


Related Items
  • BNS Platform Hardware
Related Categories
  • PLA-Support>Sun Systems>CommsGBU>Global Signaling Solutions>SN-SND: Tekelec DSR




In this Document
Symptoms
Changes
Cause
Solution
 Sequence of events:
 Troubleshooting:
 Conclusion:
References


Created from <SR 3-17256724491>

Applies to:

BNS Platform Hardware - Version DSR 7.3.0 and later
Tekelec

Symptoms

IPFE-1 is reporting the following issues:

• IPFE-2 shows a storage failure and is not processing any traffic according to the KPI reports.
• Alarm 5003 (IPFE state sync run error) is observed in the system for IPFE-1, which is the active IPFE. No commands work on IPFE-2; it is completely down.
• path.test from IPFE-1 to IPFE-2 is successful.
• path.test from IPFE-2 to IPFE-1 fails with an input/output error.
• The customer cannot see the server under Status & Manage -> Server.

syscheck is reporting a Platform Health Check failure (Couldn't read, query sensors).

Changes

 

Cause

A hardware issue on the IPFE blade is causing the problem with the application.

The syscheck output confirms the hardware failure on the blade.
 

Solution

Sequence of events:

syscheck shows the following errors on IPFE-2:

syscheck shows a hard disk failure (the exact time can be seen in /var/TKLC/log/syscheck/fail_log on the problematic IPFE):
* hpdisk: FAILURE:: MAJOR::3000000200000000 -- The hpacucliStatus utility needs intervention.
* hpdisk: FAILURE:: Failure message: The HP disk status is stale, and server has been up longer than 600
* smart: FAILURE:: MINOR::5000000000040000 -- Platform Health Check Failure
* smart: FAILURE:: Error: Cannot open lock file: /var/TKLC/log/smartd/lock.
One or more module in class "disk" FAILED
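
When many servers need to be checked, it can help to pull the severity-coded entries out of syscheck output automatically. The following is a minimal Python sketch, not an Oracle-provided tool. It assumes the line layout matches the excerpt above; the function name parse_syscheck_failures and the regular expression are hypothetical and written only for this illustration.

```python
import re

# Sample lines copied from the syscheck output quoted in this note.
SAMPLE = """\
* hpdisk: FAILURE:: MAJOR::3000000200000000 -- The hpacucliStatus utility needs intervention.
* hpdisk: FAILURE:: Failure message: The HP disk status is stale, and server has been up longer than 600
* smart: FAILURE:: MINOR::5000000000040000 -- Platform Health Check Failure
* smart: FAILURE:: Error: Cannot open lock file: /var/TKLC/log/smartd/lock.
One or more module in class "disk" FAILED
"""

# Assumed layout: "* <module>: FAILURE:: <SEVERITY>::<code> -- <text>".
# Lines carrying only a free-text failure message do not match and are skipped.
ALARM_RE = re.compile(
    r"^\*\s*(?P<module>\w+):\s*FAILURE::\s*"
    r"(?P<severity>[A-Z]+)::(?P<code>[0-9A-Fa-f]+)\s*--\s*(?P<text>.*)$"
)


def parse_syscheck_failures(text):
    """Return one dict per severity-coded FAILURE line found in text."""
    failures = []
    for line in text.splitlines():
        match = ALARM_RE.match(line.strip())
        if match:
            failures.append(match.groupdict())
    return failures


failures = parse_syscheck_failures(SAMPLE)
for failure in failures:
    print(failure["module"], failure["severity"], failure["code"], failure["text"])
```

Only the two severity-coded lines (hpdisk MAJOR and smart MINOR) are returned; the accompanying free-text lines are ignored on purpose so that each failure is counted once.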

2018-04-10 07:55:14.221 UTC: 5003 IPFE state sync run error

2018-04-10 07:55:16.175 UTC: 5012 Signaling interface heartbeat timeout

2018-04-10 08:10:29.349 UTC: 31201 Process Not Running (idbsvc)

2018-04-10 08:10:30.348 UTC: 31201 Process Not Running (IPFE)
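
To make the order and spacing of these alarms easier to see, the events can be parsed into timestamps. This is a minimal Python sketch that assumes each event line has the form "<date> <time> UTC: <alarm id> <description>", as in the lines above; the function name parse_events is hypothetical.

```python
from datetime import datetime

# Event lines copied from the sequence of events in this note.
EVENTS = """\
2018-04-10 07:55:14.221 UTC: 5003 IPFE state sync run error
2018-04-10 07:55:16.175 UTC: 5012 Signaling interface heartbeat timeout
2018-04-10 08:10:29.349 UTC: 31201 Process Not Running (idbsvc)
2018-04-10 08:10:30.348 UTC: 31201 Process Not Running (IPFE)
"""


def parse_events(text):
    """Return (timestamp, alarm_id, description) tuples sorted by time."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        stamp, rest = line.split(" UTC: ", 1)
        alarm_id, description = rest.split(" ", 1)
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        events.append((when, int(alarm_id), description))
    return sorted(events)


events = parse_events(EVENTS)
first_sync_error = next(t for t, alarm, _ in events if alarm == 5003)
first_not_running = next(t for t, alarm, _ in events if alarm == 31201)
gap_seconds = (first_not_running - first_sync_error).total_seconds()
print(f"31201 first raised {gap_seconds:.3f} s after 5003")
```

For the events listed here, the first 31201 alarm follows the 5003 alarm by a little over 15 minutes.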

Troubleshooting:

procmgr consistently reports alarm 31201 (Process Not Running) for the ipfe and idbsvc processes on IPFE-2, due to the disk error reported in the syscheck output.
Because the IPFE process on IPFE-2 is not running (or is unable to run, as the events above indicate), its mate, IPFE-1, raises alarms 5003 (IPFE state sync run error) and 5012 (Signaling interface heartbeat timeout).
This shows that the server hardware error caused the failure on IPFE-2. IPFE-1 sensed the issue and took over the role of active IPFE for any TSAs that were being managed by IPFE-2.
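
The reasoning above can be written down as a simple rule. The following Python sketch is illustrative only: the function name diagnose, its parameters, and the returned labels are hypothetical and are not part of any DSR tool. It assumes the alarm IDs, process names, and failed syscheck classes have already been collected by hand.

```python
def diagnose(mate_alarms, local_not_running, syscheck_failed_classes):
    """Apply the troubleshooting reasoning from this note to collected evidence.

    mate_alarms: alarm IDs raised by the mate IPFE (for example 5003, 5012).
    local_not_running: process names reported with alarm 31201 on the suspect IPFE.
    syscheck_failed_classes: syscheck classes reported as FAILED on the suspect IPFE.
    """
    # The mate raises both alarms when it loses state sync and heartbeat.
    mate_lost_peer = {5003, 5012} <= set(mate_alarms)
    # Both application processes are reported as not running.
    app_down = {"ipfe", "idbsvc"} <= {name.lower() for name in local_not_running}
    # syscheck reports the "disk" class as FAILED.
    disk_failed = "disk" in syscheck_failed_classes

    if mate_lost_peer and app_down and disk_failed:
        return "hardware fault on suspect IPFE (disk class failed)"
    if mate_lost_peer and app_down:
        return "application down on suspect IPFE; cause not established"
    if mate_lost_peer:
        return "mate alarms only; inconclusive"
    return "no match"


result = diagnose([5003, 5012], ["idbsvc", "IPFE"], ["disk"])
print(result)
```

With the evidence from this case (alarms 5003 and 5012 on IPFE-1, alarm 31201 for idbsvc and IPFE on IPFE-2, and the "disk" class failed in syscheck), the rule returns the hardware fault label.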

Conclusion:

The hardware error is caused by a firmware issue with the HP Gen8 blade disk array controllers.

According to HP, any HPE Smart Array Px2x controller using firmware version 8.0 can cause the blade to hang in an unknown state. The solution is to upgrade the array controller firmware to 8.32 or higher.

This is explained in detail in the FUP 2.2.12 Release Notes, Appendix A.1, Advisory a00029265 (page 24):

https://docs.oracle.com/cd/E91277_01/docs.75/E88975-01.pdf
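
Once the array controller firmware version of a blade is known, the comparison against the fixed level can be scripted. This Python sketch assumes the version is available as a dotted numeric string such as "8.00" or "8.32"; how the version is obtained from the blade is not covered here, and the function names parse_version and needs_upgrade are hypothetical.

```python
def parse_version(text):
    """Convert a dotted numeric version string such as '8.32' to a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Minimum firmware level named in this note as the fix.
MIN_FIXED = parse_version("8.32")


def needs_upgrade(current):
    """Return True when the given firmware version is older than 8.32."""
    return parse_version(current) < MIN_FIXED


for version in ("8.0", "8.00", "8.32", "8.50"):
    status = "upgrade required" if needs_upgrade(version) else "at or above 8.32"
    print(version, status)
```

Comparing integer tuples rather than raw strings or floats treats "8.0" and "8.00" as the same version and avoids string-ordering mistakes.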

 

References

<NOTE:1902526.2> - Syscheck Errors

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.