Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-2055090.1
Update Date:2017-10-11

Solution Type  Troubleshooting Sure

Solution  2055090.1 :   SPARC M7 Series Servers : A component may remain reported as faulty after replacement  


Related Items
  • SPARC M7-8
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: M7




In this Document
Purpose
Troubleshooting Steps
References


Applies to:

SPARC M7-8
Information in this document applies to any platform.

Purpose

Some M7 server components, such as a CMIOU, contain a tree of sub-components that is several levels deep.

For instance, a DIMM : /SYS/CMIOU0/CM/CMP/BOB00/CH0/DIMM

After a faulted sub-component is replaced, it may still be reported as faulty.

Note that this should not happen for a first-level sub-component.
    Ex : /SYS/CMIOU0/CM
The stale fault may appear, for instance, after a CMIOU is removed to replace a DIMM.
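
To make the notion of "level" concrete, the following minimal Python sketch counts how many levels a target sits below its parent component. The function name subcomponent_level is hypothetical and is not part of ILOM; it only splits the path strings shown in this document.

```python
def subcomponent_level(path: str, parent: str) -> int:
    """Return how many levels below `parent` the target `path` sits.

    Hypothetical helper: it only compares '/'-separated path strings
    and does not query the service processor.
    """
    path_parts = [p for p in path.split("/") if p]
    parent_parts = [p for p in parent.split("/") if p]
    if path_parts[:len(parent_parts)] != parent_parts:
        raise ValueError(f"{path} is not located under {parent}")
    return len(path_parts) - len(parent_parts)


# First-level sub-component of the CMIOU
first_level = subcomponent_level("/SYS/CMIOU0/CM", "/SYS/CMIOU0")
# Deeper sub-component (a DIMM)
dimm_level = subcomponent_level("/SYS/CMIOU0/CM/CMP/BOB00/CH0/DIMM", "/SYS/CMIOU0")
```

With the paths above, /SYS/CMIOU0/CM is at level 1 and the DIMM is at level 5, which is the kind of deeper sub-component that may be affected.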

In such a case, if no new fault has been reported since the component was replaced, the fault should be repaired manually from the Fault Management Shell.

This situation is fixed in SysFW 9.5.2.g.

 

Troubleshooting Steps

1. After replacing a component, check if any remaining fault exists for this component.
This can be done via :

    -> show /System/Open_Problems
     or
    -> start -script /SP/faultmgmt/shell/
    faultmgmtsp> fmadm faulty -r

Check whether the fault_state for the component is still set to “Faulted”.

Ex :

-> show /SYS/CMIOU0/CM/CMP/BOB00/CH0/DIMM fault_state

  /SYS/CMIOU0/CM/CMP/BOB00/CH0/DIMM
    Properties:
        fault_state = Faulted
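
When the output of the show command is captured to a text file or session log, the check can be scripted. The following is a minimal Python sketch, assuming the captured text has the same layout as the example above; parse_fault_state is a hypothetical helper name, not an ILOM or Oracle tool.

```python
import re
from typing import Optional


def parse_fault_state(show_output: str) -> Optional[str]:
    """Extract the fault_state value from captured `show <target> fault_state` text.

    Assumes the 'fault_state = <value>' line layout shown in this document.
    Returns None when no fault_state line is present.
    """
    match = re.search(r"^\s*fault_state\s*=\s*(\S+)\s*$", show_output, re.MULTILINE)
    return match.group(1) if match else None


# Text copied from the example in this document
sample = """\
  /SYS/CMIOU0/CM/CMP/BOB00/CH0/DIMM
    Properties:
        fault_state = Faulted
"""

state = parse_fault_state(sample)
still_faulted = (state == "Faulted")
```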



2. Confirm that no new fault has been reported against the component since it was replaced. This can be done by checking the UUID and timestamp of the fault from the Fault Management Shell :

    -> start -script /SP/faultmgmt/shell/
    faultmgmtsp> fmadm faulty
    faultmgmtsp> fmdump


If a new fault has been reported since the suspect component was replaced, then the wrong component may have been replaced and the diagnosis should be confirmed.
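
The comparison in step 2 can be sketched as follows. This Python example assumes the operator has already noted each fault's UUID and timestamp from the fmadm faulty and fmdump output, and knows when the component was replaced. The UUID strings and dates below are placeholders for illustration only, and faults_after_replacement is a hypothetical helper name.

```python
from datetime import datetime
from typing import Dict, List


def faults_after_replacement(fault_times: Dict[str, datetime],
                             replaced_at: datetime) -> List[str]:
    """Return the UUIDs of faults reported later than the replacement time."""
    return [uuid for uuid, reported in fault_times.items() if reported > replaced_at]


# Placeholder values, not real fault data
replaced_at = datetime(2017, 10, 1, 12, 0, 0)
fault_times = {
    "uuid-before-replacement": datetime(2017, 9, 28, 8, 30, 0),
    "uuid-after-replacement": datetime(2017, 10, 2, 9, 15, 0),
}

new_faults = faults_after_replacement(fault_times, replaced_at)
# An empty list means only the old fault remains and step 3 applies.
# A non-empty list means the diagnosis should be confirmed first.
```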


3. If no new fault has been reported, then the fault can be repaired from the Fault Management Shell :

    -> start -script /SP/faultmgmt/shell/
    faultmgmtsp> fmadm repair <FRU>


Where <FRU> is the name of the component that was replaced and reported in the ‘fmadm faulty’ output.

Ex : /SYS/CMIOU0/CM/CMP/BOB00/CH0/DIMM

Note : Alternatively, it’s possible to use the clear_fault_action property of the component to clear the fault.

Ex :

-> set /SYS/CMIOU0/CM/CMP/BOB00/CH0/DIMM/ clear_fault_action=true
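
When the same repair has to be prepared for several FRUs, the two command forms shown in step 3 can be generated from the FRU path. This is a minimal Python sketch that only builds the command strings given in this document; it does not connect to the service processor, and repair_commands is a hypothetical helper name.

```python
from typing import Dict


def repair_commands(fru: str) -> Dict[str, str]:
    """Build the two command strings from step 3 for a given FRU path.

    Only string formatting is performed; the commands must still be
    entered by the operator at the appropriate prompt.
    """
    fru = fru.rstrip("/")  # normalize an optional trailing slash
    return {
        # Entered at the faultmgmtsp> prompt
        "faultmgmtsp": f"fmadm repair {fru}",
        # Entered at the ILOM -> prompt
        "ilom": f"set {fru} clear_fault_action=true",
    }


commands = repair_commands("/SYS/CMIOU0/CM/CMP/BOB00/CH0/DIMM")
```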



4. Confirm that the component is no longer reported as Faulted :

    -> show /System/Open_Problems
     or
    -> start -script /SP/faultmgmt/shell/
    faultmgmtsp> fmadm faulty -r
     or
    -> show <FRU> fault_state


fault_state should now be reported as “OK”. 

 

References

<BUG:19640979> - ILOM DOES NOT CLEAR THE FAULT AFTER THE FAULTY FRU IS REMOVED FROM SYSTEM

  Copyright © 2018 Oracle, Inc.  All rights reserved.