Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1483194.1
Update Date: 2017-12-22

Solution Type  Predictive Self-Healing Sure

Solution  1483194.1 :   Commands to run to fully clear ILOM/SP, faultmgmt shell, and FMA faults on the T3-x, T4-x, T5-x Servers


Related Items
  • Netra SPARC T3-1B
  • SPARC T3-1B
  • SPARC T3-1
  • SPARC T3-4
  • Netra T3-1BA
  • SPARC T4-2
  • Netra SPARC T4-2 Server
  • Netra SPARC T4-1 Server
  • Netra SPARC T4-1B
  • SPARC T3-2
  • SPARC T4-1
  • SPARC T4-1B
  • SPARC T4-4
  • Netra T3-1
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T4


Quick Reference for CLI commands to run to fully clear ILOM/SP, faultmgmt shell, and FMA faults on the T3-x and T4-x Servers

In this Document
Purpose
Scope
Details
 Goal
References


Applies to:

Netra SPARC T3-1B - Version All Versions and later
Netra T3-1BA - Version Not Applicable and later
SPARC T3-2 - Version All Versions and later
Netra SPARC T4-1 Server - Version All Versions and later
SPARC T4-2 - Version All Versions and later
Information in this document applies to any platform.

Purpose

Quick Reference for CLI commands to run to fully clear ILOM/SP, faultmgmt shell, and FMA faults on T3-x and T4-x Servers (refer to Doc ID 2216293.1 for T5-x, T7-x and S7-x servers)

Scope

- To clear component faults reported in Oracle ILOM, the Admin (a) role must be enabled

- The server SP must have Oracle ILOM firmware 3.0.3 or later installed
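
The firmware prerequisite above can be checked in a script before any faults are cleared. The following is a minimal sketch, not an Oracle-provided tool: the names parse_ilom_version and meets_minimum are hypothetical, and the sketch assumes the ILOM version string has already been read from the SP and begins with dotted numeric fields.

```python
import re

# Minimum ILOM firmware stated in the Scope section.
MIN_ILOM_VERSION = (3, 0, 3)


def parse_ilom_version(version):
    """Return the first three numeric fields of a dotted version string.

    Hypothetical helper: any trailing build suffix (for example ".a")
    is ignored.
    """
    fields = re.findall(r"\d+", version)[:3]
    if not fields:
        raise ValueError(f"no numeric version fields in {version!r}")
    return tuple(int(f) for f in fields)


def meets_minimum(version):
    """Return True if the given ILOM version is 3.0.3 or later."""
    return parse_ilom_version(version) >= MIN_ILOM_VERSION


if __name__ == "__main__":
    for v in ("3.0.2", "3.0.3", "3.2.1"):
        print(v, meets_minimum(v))
```

The comparison relies on Python tuple ordering, so a two-field string such as "3.0" is treated as older than 3.0.3.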

Details

Goal

To clear ILOM/SP, faultmgmt shell, and FMA faults on T3-x and T4-x Servers
 
1.  To view and clear faults in Oracle ILOM/SP:

          a)  Log into the Oracle ILOM/SP

          b)  View faulted components

                       -> show faulty

          c)  For each fault listed, clear the fault on the reported component_path

                      -> set /SYS/<COMPONENT_PATH> clear_fault_action=true

          Example:

                      -> set /SYS/MB clear_fault_action=true


                      Are you sure you want to clear component_path (y/n)? y
                      Set 'clear_fault_action' to 'true'

            You must reset the SP after clearing the fault:

                      -> reset /SP

          d)  For disabled components, enable the component by setting the component_state to Enabled.      

                      -> show -l all /SYS component_state==disabled  (to see what components are disabled)

                      -> set /SYS/<COMPONENT_PATH> component_state=enabled  (for each disabled component)

          e)  If components have been re-enabled in step 1d, a power cycle of the server is required (after completing all of the steps below) for the component_state change to take effect.

                      -> reset /SYS

CAUTION: A 'reset /SYS' does not ensure a graceful shutdown of the system (Solaris).  Be sure to shut down Solaris gracefully before performing a 'reset /SYS', to avoid further issues such as data corruption.
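
When 'show faulty' lists several components, the per-component commands from steps 1c and 1d can be generated rather than typed by hand. The sketch below only builds the command strings; it does not connect to the SP. The function names and the example paths other than /SYS/MB are hypothetical, and the paths are assumed to have been copied from the 'show faulty' output.

```python
def clear_fault_commands(component_paths):
    """Return one 'set <path> clear_fault_action=true' line per faulted path."""
    commands = []
    seen = set()
    for path in component_paths:
        path = path.strip()
        if not path.startswith("/SYS/"):
            raise ValueError(f"expected a path under /SYS, got {path!r}")
        if path in seen:
            continue  # clear each component only once
        seen.add(path)
        commands.append(f"set {path} clear_fault_action=true")
    return commands


def enable_commands(disabled_paths):
    """Return one 'set <path> component_state=enabled' line per disabled path."""
    return [f"set {p.strip()} component_state=enabled" for p in disabled_paths]


if __name__ == "__main__":
    # Illustrative paths only; use the paths reported on the actual system.
    for line in clear_fault_commands(["/SYS/MB", "/SYS/PS0", "/SYS/MB"]):
        print("->", line)
```

Each generated line still prompts for confirmation when entered at the ILOM prompt, as in the example above.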

                   

NOTE:  In ILOM 3.2.1 and above, the component_state property has been split into two new properties, to distinguish between components manually disabled by the user and components disabled by the system fault engine:

     current_config_state
     requested_config_state

For systems with components disabled by the system (e.g. configuration rules, DIMM population chip symmetry rule), the components should be re-enabled by the 'fmadm repair' (or 'fmadm acquit') command described below.  If components remain in a disabled state and require manual intervention, see Doc ID 1643464.1 - OBP reports "One or more resources have been retired, please run 'show faulty' on the SP" on console
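
A script that inspects component state has to account for the property split described in the NOTE above. The following sketch returns the property names to look for on a given firmware level. The function name state_properties is hypothetical, and the sketch makes no claim about which of the two newer properties is writable; it only reflects the names and the 3.2.1 threshold given above.

```python
import re


def state_properties(ilom_version):
    """Return the component state property names for an ILOM version string."""
    fields = tuple(int(f) for f in re.findall(r"\d+", ilom_version)[:3])
    if not fields:
        raise ValueError(f"no numeric version fields in {ilom_version!r}")
    if fields >= (3, 2, 1):
        # ILOM 3.2.1 and above: component_state is split into two properties.
        return ["current_config_state", "requested_config_state"]
    return ["component_state"]


if __name__ == "__main__":
    for v in ("3.0.12", "3.2.1", "3.2.4"):
        print(v, state_properties(v))
```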

 


2.  To view and clear faults in the Fault Management Shell (FMA faults logged to the SP):

          a)  While still logged into the Oracle ILOM/SP, start the Fault Management Shell
       
                      -> start /SP/faultmgmt/shell

          b)  View faulted components
        
                      faultmgmtsp> fmadm faulty

          c)  For each fault listed, run the repair command

                      faultmgmtsp> fmadm repair <UUID>

          d)  Log out of the shell

                      faultmgmtsp> exit
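
If the 'fmadm faulty' output has been captured to a text file, the UUIDs for step 2c can be pulled out with a regular expression. This is a sketch under the assumption that each fault's UUID appears in the output in the standard 8-4-4-4-12 hexadecimal form; the function name repair_commands is hypothetical, and the sample text in the demo is a stand-in, not real 'fmadm faulty' output.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID layout.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"
)


def repair_commands(fmadm_faulty_text):
    """Return one 'fmadm repair <UUID>' line per distinct UUID, in first-seen order."""
    uuids = []
    for match in UUID_RE.finditer(fmadm_faulty_text):
        uuid = match.group(0).lower()
        if uuid not in uuids:
            uuids.append(uuid)
    return [f"fmadm repair {u}" for u in uuids]


if __name__ == "__main__":
    # Stand-in text that only carries a UUID; paste the captured output here.
    captured = "fault af875d87-433e-6bf7-cb53-c3d665e8cd09 reported on /SYS/MB"
    for line in repair_commands(captured):
        print("faultmgmtsp>", line)
```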


3.  To view and clear the FMA faults logged from Solaris:

          a)  Log into Solaris

          b)  View faulted components

                      # fmadm faulty                     

NOTE: Do not use 'fmadm faulty -a' in this step. When you specify the -a option, all resource information cached by the Fault Manager is listed, including faults that have already been corrected or for which no recovery action is needed (see the 'fmadm' man page). The listings also include information for resources that may no longer be present in the system.


          c)  For each fault listed, run the repair command


                      # fmadm repair <UUID>

Please Note:

The fmadm subcommand repair is used to initiate an FMA check for a serial number change, such as when a server component is replaced.  The subcommand repaired was introduced in Solaris 10 U7.
             
The fmadm subcommand repaired is used to notify FMA that a corrective action has been performed, such as a patch install or a component reseat.  This subcommand should only be used if directed by a documented Sun repair procedure, as additional steps may be needed to re-enable a faulted resource.

Using these fmadm subcommands from Solaris clears FMA-generated faults from both Solaris and the SP. However, using these subcommands from the SP does not clear the faults from Solaris.
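
The asymmetry described above (a repair from Solaris also clears the SP copy, but not the reverse) suggests one way to decide where each fault should be repaired. The sketch below is an interpretation of that statement, not a documented procedure: the function name plan_repairs is hypothetical, and the two UUID lists are assumed to have been collected from 'fmadm faulty' on the host and in the Fault Management Shell.

```python
def plan_repairs(host_uuids, sp_uuids):
    """Split fault UUIDs by where 'fmadm repair' should be run.

    Faults known to Solaris are repaired from Solaris, since that also
    clears them on the SP. Faults seen only on the SP are repaired from
    the Fault Management Shell.
    """
    host = set(host_uuids)
    sp = set(sp_uuids)
    return {
        "solaris": sorted(host),
        "faultmgmt_shell": sorted(sp - host),
    }


if __name__ == "__main__":
    # Placeholder identifiers; real values are full fault UUIDs.
    plan = plan_repairs(["uuid-a", "uuid-b"], ["uuid-b", "uuid-c"])
    for location, uuids in plan.items():
        print(location, uuids)
```

After running the planned repairs, 'fmadm faulty' should be checked again in both places to confirm that no faults remain.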


For additional information on fmadm subcommands, please see: http://docs.oracle.com/cd/E19860-01/E21549/z400015e1396982.html#scrolltoc

 

 

NOTE: There is an issue (CR 6983432), which causes previously diagnosed and repaired PSH faults from the host to reappear (to be replayed) in Oracle ILOM when the host reboots. It manifests itself as an incorrect report of a PSH-diagnosed fault represented through the Oracle ILOM CLI, BUI, and fault LED. You can identify this defect by checking to see if the same PSH fault was reported from the host as well.

The affected OS is Solaris 10 U10 (8/11) on all T3-x and T4-x platforms. The issue is fixed in patch 147790-01 (SunOS 5.10: fmd patch). See the workaround below:

 

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Sep 16 08:38:19.5582 af875d87-433e-6bf7-cb53-c3d665e8cd09 SUN4V-8002-6E
Sep 16 08:40:47.8191 af875d87-433e-6bf7-cb53-c3d665e8cd09 FMD-8000-4M Repaired
Sep 16 08:40:47.8446 af875d87-433e-6bf7-cb53-c3d665e8cd09 FMD-8000-6U Resolved
#
# fmadm flush /SYS/MB
fmadm: flushed resource history for /SYS/MB
#
faultmgmtsp> fmadm repair /SYS/MB
faultmgmtsp> fmadm faulty
No faults found
faultmgmtsp>
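
To help spot a replayed fault, the host's fmdump output can be scanned for UUIDs that already carry a Resolved event and compared with the UUIDs still shown on the SP. This is a sketch that assumes the line layout shown in the transcript above (month, day, time, UUID, message ID) and treats FMD-8000-6U as the Resolved marker, as in that transcript; the function names are hypothetical.

```python
RESOLVED_MSG_ID = "FMD-8000-6U"  # shown as "Resolved" in the transcript above


def resolved_uuids(fmdump_text):
    """Return the set of UUIDs that have a Resolved event in fmdump output."""
    resolved = set()
    for line in fmdump_text.splitlines():
        fields = line.split()
        # Expected layout: Mon DD HH:MM:SS.ffff UUID MSG-ID [state]
        if len(fields) >= 5 and fields[4] == RESOLVED_MSG_ID:
            resolved.add(fields[3])
    return resolved


def replay_candidates(sp_fault_uuids, fmdump_text):
    """Return SP fault UUIDs that the host already reports as resolved."""
    done = resolved_uuids(fmdump_text)
    return [u for u in sp_fault_uuids if u in done]


if __name__ == "__main__":
    sample = (
        "Sep 16 08:38:19.5582 af875d87-433e-6bf7-cb53-c3d665e8cd09 SUN4V-8002-6E\n"
        "Sep 16 08:40:47.8191 af875d87-433e-6bf7-cb53-c3d665e8cd09 FMD-8000-4M Repaired\n"
        "Sep 16 08:40:47.8446 af875d87-433e-6bf7-cb53-c3d665e8cd09 FMD-8000-6U Resolved\n"
    )
    print(replay_candidates(["af875d87-433e-6bf7-cb53-c3d665e8cd09"], sample))
```

A match only indicates a candidate for the replay issue; the workaround commands above are still what clears it.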

 

References

<NOTE:2216293.1> - Commands To Clear FMA faults on the T5-x, T7-x, S7-x Servers
<NOTE:1643464.1> - [SPARC T3/T4/T5 and T7] OBP reports "One or more resources have been retired, please run 'show faulty' on the SP" on console

  Copyright © 2018 Oracle, Inc.  All rights reserved.