Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1575797.1
Update Date:2013-08-19
Keywords:

Solution Type  Technical Instruction Sure

Solution  1575797.1 :   Netra CT900 ShMM: Openhpid[270] ERROR: Session 1 Queue Is Out Of Space Number Of Events Is 2000; Max Is 2000  


Related Items
  • Sun Netra CT900 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Usx/Blade/Netra>SN-SPARC: Netra Cxxxx




In this Document
Goal
Solution
References


Created from <SR 3-7541623711>

Applies to:

Sun Netra CT900 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Goal

How to identify and resolve the ShMM-related issue in which openhpid reports that a session event queue is out of space.

Solution

SYMPTOMS:

This issue shows two symptoms:

1. The IPMB goes up and down repeatedly, which triggers an SNMP event reporting "ShMM is down".

The syslog shows the following messages repeated very frequently (several tens to hundreds per minute):

Jul 11 15:36:36 ATLNGAUSCM1-CCMP-SHMM2 daemon.err shelfman[238]: Fault detection failed, check Backplane/ShM IPMB-A, err=-145
Jul 11 15:36:36 ATLNGAUSCM1-CCMP-SHMM2 daemon.err shelfman[238]: Reattachment failed on IPMB 0, rc=-145
Jul 11 15:36:36 ATLNGAUSCM1-CCMP-SHMM2 daemon.err shelfman[238]: Reattachment failed on IPMB 0, rc=-145
Jul 11 15:36:36 ATLNGAUSCM1-CCMP-SHMM2 daemon.err shelfman[238]: Reattachment failed on IPMB 0, rc=-145
...
Jul 11 15:36:44 ATLNGAUSCM1-CCMP-SHMM2 daemon.info shelfman[238]: Reattachment succeeded on IPMB 0
Jul 11 15:36:44 ATLNGAUSCM1-CCMP-SHMM2 daemon.info shelfman[238]: Status change on IPMB 0: flags = 106 (ONL,ATT,ATT_CH)

(The first part shows IPMB 0 going down [reattachment failed]; the second part shows it coming back up [reattachment succeeded].)

NOTE: Group the messages by process ID, [238] in this case.
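
The grouping described in the NOTE can be sketched in Python. This is a minimal illustration, not a supported tool: it assumes the syslog layout shown in the excerpt above (timestamp, host, facility.level, process[pid], message), and the names LINE_RE and group_by_process are hypothetical.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the excerpt above:
#   "Jul 11 15:36:36 <host> <facility.level> <process>[<pid>]: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+(?P<prio>\S+)\s+"
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]:\s*(?P<msg>.*)$"
)

def group_by_process(lines):
    """Return {(process name, pid): [messages]} for lines that match."""
    groups = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            groups[(m["proc"], int(m["pid"]))].append(m["msg"])
    return dict(groups)
```

Feeding the syslog lines to group_by_process separates the shelfman[238] IPMB messages from the openhpid[270] queue messages, so each symptom can be reviewed on its own.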

 

2. The SNMP event queue seems to be out of space:

Jul 11 15:36:36 ATLNGAUSCM1-CCMP-SHMM2 daemon.err openhpid[270]: ERROR: (session.c, 278, Session 1's queue is out of space; # of events is 2000; Max is 2000)
Jul 11 15:36:36 ATLNGAUSCM1-CCMP-SHMM2 daemon.err openhpid[270]: ERROR: (session.c, 278, Session 1's queue is out of space; # of events is 2000; Max is 2000)
Jul 11 15:36:36 ATLNGAUSCM1-CCMP-SHMM2 daemon.err openhpid[270]: ERROR: (session.c, 278, Session 1's queue is out of space; # of events is 2000; Max is 2000)
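
To confirm symptom 2 across a large log, the openhpid lines can be matched and their numbers extracted. The sketch below assumes the exact message wording shown above; QUEUE_FULL_RE and queue_full_events are hypothetical names used only for illustration.

```python
import re

# Pattern built from the openhpid message text shown above.
QUEUE_FULL_RE = re.compile(
    r"openhpid\[(?P<pid>\d+)\]: ERROR: \(session\.c, \d+, "
    r"Session (?P<session>\d+)'s queue is out of space; "
    r"# of events is (?P<events>\d+); Max is (?P<max>\d+)\)"
)

def queue_full_events(lines):
    """Return one dict (pid, session, events, max) per queue-full line."""
    hits = []
    for line in lines:
        m = QUEUE_FULL_RE.search(line)
        if m:
            hits.append({k: int(v) for k, v in m.groupdict().items()})
    return hits
```

A non-empty result in which events equals max indicates that the session queue has reached its limit.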

 

SOLUTION:

The fault is in the ShMM:

    If IPMB 0 goes up and down, replace shm1 (top ShMM).
    If IPMB 1 goes up and down, replace shm2 (bottom ShMM).

Make sure the customer and the field engineer (FE) know the procedure for restoring the ShMM configuration before the replacement.
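
The IPMB-to-ShMM rule above can be applied to a log automatically. The following sketch counts "Reattachment failed on IPMB N" messages per bus and looks up the module named in this document; the function and variable names are hypothetical, and the output is a hint for the engineer, not a replacement decision.

```python
import re
from collections import Counter

# Mapping stated in this document: IPMB 0 -> shm1 (top), IPMB 1 -> shm2 (bottom).
REPLACE_FOR_IPMB = {0: "shm1 (top ShMM)", 1: "shm2 (bottom ShMM)"}

# Message text as it appears in the shelfman syslog excerpt.
FAILED_RE = re.compile(r"Reattachment failed on IPMB (\d+)")

def suspect_shmm(lines):
    """Return {bus: (failure count, ShMM to replace)} for each flapping bus."""
    failures = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            failures[int(m.group(1))] += 1
    return {
        bus: (count, REPLACE_FOR_IPMB.get(bus, "unknown"))
        for bus, count in failures.items()
    }
```

With the excerpt from the SYMPTOMS section, only bus 0 is reported, which points to shm1.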

 

SIMILAR PROBLEM:

A similar-looking symptom exists, but it is a different condition. The differences are:

1. The messages are much less frequent, only up to 200+ per day (so symptom #2 above does not occur).

2. IPMB 0 and/or IPMB 1 goes up and down, so it looks like the active bus is switching between IPMB 0 and IPMB 1.

 

This symptom results from the R3U3 upgrade of the ShMM firmware. The vendor made logging more verbose, so these IPMB switching messages are now visible.

However, they are much less frequent: only up to 200+ per day.

 

FOR THIS SYMPTOM (the one that resembles the OpenHPI issue):

At R3U3, the vendor turned on a verbose logging feature, so it is now normal to see messages about switching between IPMB 0 and IPMB 1. A frequency of around 200+ such messages per day was evaluated as normal.

These messages are part of normal ShMM operation at and after R3U3.
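
Because the two conditions differ mainly in message rate (tens to hundreds per minute versus about 200+ per day), a rough rate check can help tell them apart. The sketch below is built on assumptions: it counts any shelfman line that mentions IPMB, since the exact wording of the R3U3 switching messages is not given here, and the default threshold of 20 per minute is an illustrative reading of "several tens", not a vendor-defined limit.

```python
import re
from collections import Counter

# Captures the day ("Jul 11") and minute ("15:36") of shelfman lines mentioning IPMB.
IPMB_RE = re.compile(
    r"^(\w{3}\s+\d+)\s+(\d{2}:\d{2}):\d{2}\s.*shelfman\[\d+\]:.*IPMB"
)

def ipmb_message_rates(lines):
    """Return (busiest-minute count, busiest-day count) of IPMB messages."""
    per_minute = Counter()
    per_day = Counter()
    for line in lines:
        m = IPMB_RE.match(line)
        if m:
            day, minute = m.group(1), m.group(2)
            per_day[day] += 1
            per_minute[(day, minute)] += 1
    peak_minute = max(per_minute.values(), default=0)
    peak_day = max(per_day.values(), default=0)
    return peak_minute, peak_day

def looks_like_faulty_shmm(lines, per_minute_threshold=20):
    """True if the busiest minute reaches the (assumed) fault threshold."""
    peak_minute, _ = ipmb_message_rates(lines)
    return peak_minute >= per_minute_threshold
```

A log whose busiest day holds a few hundred messages but whose busiest minute stays far below the threshold fits the normal R3U3 behavior described above; a log with bursts of tens of messages per minute, together with the openhpid queue errors, fits the faulty-ShMM case.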

References

<NOTE:1471349.1> - How to Replace ShMM (Shelf Management Module) card for Netra CT900 chassis
<NOTE:1346016.1> - Netra CT900 ShMM replacement procedure
<NOTE:1499248.1> - CT900 ShMM replacement & firmware upgrade procedure example

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.