Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1009467.1
Update Date:2017-02-02
Keywords:

Solution Type  Problem Resolution Sure

Solution  1009467.1 :   How to clear faults in FMA after component replacement on Sun Fire[TM] servers.  


Related Items
  • Sun Fire V880z Visualization Server
  • Sun Fire V880 Server
  • Sun Fire V890 Server
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Fire V480 Server
  • Sun Fire 4810 Server
  • Sun Fire 280R Server
  • Sun Blade 1000 Workstation
  • Sun Fire E6900 Server
  • Sun Fire V1280 Server
  • Sun Fire 4800 Server
  • Sun Fire E2900 Server
  • Sun Blade 2000 Workstation
  • Sun Netra 1280 Server
  • Solaris Operating System
  • Sun Fire V490 Server
  • Sun Fire E4900 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-x8x0/Ex900
  • _Old GCS Categories>Sun Microsystems>Operating Systems>Solaris Operating System
  • _Old GCS Categories>Sun Microsystems>Servers>Midrange Servers
  • _Old GCS Categories>Sun Microsystems>Servers>Entry-Level Servers
  • _Old GCS Categories>Sun Microsystems>Desktops>Workstations
  • _Old GCS Categories>Sun Microsystems>Servers>NEBS-Certified Servers

PreviouslyPublishedAs
213078


Applies to:

Sun Netra 1280 Server - Version Not Applicable and later
Solaris SPARC Operating System - Version 8.0 and later
Sun Fire E6900 Server - Version Not Applicable and later
Sun Fire V1280 Server - Version All Versions and later
Sun Fire V480 Server - Version Not Applicable and later
All Platforms

Symptoms

Solaris[TM] 10 FMD (Fault Management Daemon) reports a failed or suspect component, often referred to as a Field Replaceable Unit (FRU). After the FRU is replaced, it may still be reported as faulty or suspect in fmadm output on Solaris 10, or the system may still print a self-healing message during boot.

Cause

There are three cases in which you have to clear the fault manually:

  1. The component has no fruid/serial number support (e.g. PCI cards).
  2. The fruid/serial number support of the given platform has not been implemented in FMA for this part (e.g. memory on Sun Fire 3800 through Sun Fire[TM] E25K).
  3. A self-healing message is printed during boot even though the fmadm faulty list is empty (caused by CR 6369961, "fmd emits identical diagnosis after repair when case was never closed").

Solution

Procedure:

As the root user on the domain in question, run the following commands:

  • fmadm faulty
    • This displays the components, with their associated resource/UUID, that are categorized as faulty or degraded.
    • The resource or UUID is required in order to clear the fault tags.
  • fmadm repair <resource | UUID>
    • This clears the suspect or fault tags associated with the given resource/UUID from the faulty list.

The following is an example of how to clear the fault tags on a Host Bus Adapter (HBA) in a Sun Fire[TM] 6800 that has been replaced but is still reporting in FMA as degraded:

 

  
# fmadm faulty
STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded dev:////ssm@0,0/pci@19,700000
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ----------------------------------------------------------------------
degraded dev:////ssm@0,0/pci@19,700000/lpfc@1
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ----------------------------------------------------------------------
degraded mod:///mod-name=lpfc/mod-id=54
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ----------------------------------------------------------------------
degraded mod:///mod-name=pcisch/mod-id=25
         47b86ff0-6743-ceff-ba0d-b452d09b0b65
-------- ----------------------------------------------------------------------
 

 

NOTE: Once you see the faulty components, run the fmadm repair command to clear the fault.

 

  
# fmadm repair dev:////ssm@0,0/pci@19,700000
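When the faulty list contains several resources, it can help to generate the repair commands for review before running any of them. The following is a minimal sketch, not part of fmadm: it assumes the two-column STATE / RESOURCE layout shown in the fmadm faulty example above (later fmadm releases print a different, multi-line report that it does not handle), and print_repair_commands is a hypothetical helper name. It only prints commands and repairs nothing.

```shell
#!/bin/sh
# Sketch only: read old-style `fmadm faulty` output on standard input and
# print one "fmadm repair" command per distinct resource.
# Assumption: a resource line has a state word in column 1 and an FMRI
# (scheme followed by ":/", e.g. dev:/, mod:/) in column 2.
# Header, separator and UUID-only lines do not match and are skipped.
# On Solaris 10 a POSIX awk (nawk or /usr/xpg4/bin/awk) may be needed.
print_repair_commands() {
    awk 'NF >= 2 && $2 ~ /^[a-z]+:\// && !seen[$2]++ { print "fmadm repair " $2 }'
}
```

A possible use is `fmadm faulty | print_repair_commands`. Review the printed list and run only the commands for components that have actually been replaced.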
 

NOTE: After you have run the repair command on each component that has been replaced, re-run the fmadm faulty command to ensure that the fault has been cleared.  If there are no faults, you will not see any output other than the column headings:

# fmadm faulty
STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
#
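The check for an empty faulty list can also be scripted. This is a sketch under the same assumption about the old-style output layout shown in this article; fault_list_is_empty is a hypothetical helper name and not an fmadm feature.

```shell
#!/bin/sh
# Sketch only: exit 0 when the `fmadm faulty` output on standard input lists
# no resources (only headings), and exit 1 when at least one is listed.
# Assumption: a resource line has an FMRI (scheme followed by ":/") in column 2.
# On Solaris 10 a POSIX awk (nawk or /usr/xpg4/bin/awk) may be needed.
fault_list_is_empty() {
    awk 'NF >= 2 && $2 ~ /^[a-z]+:\// { found = 1 } END { exit found }'
}
```

For example, `fmadm faulty | fault_list_is_empty && echo "no faults listed"` prints the message only when no resource remains in the list.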

 

 



Product
Solaris 10 Operating System
Solaris 10 01/06 Operating System
Solaris 10 3/05 Hardware 1 Operating System



Internal Comments

For Internal Oracle users only

See Bug ID 6229087 for more information about the missing FMA implementation on Serengeti and Starcat regarding DIMMs (fixed in Nevada).
FMA, fmd, Sun Fire, Solaris 10, fault, management, architecture, fmadm faulty, replaced, component, still failed, failed, faulty, suspect, swapped, offline, disabled, missing

Previously Published As
82357

Also very useful:

FMA Cheat Sheet (Doc ID 1355350.1)

References

<NOTE:1355350.1> - FMA Cheat Sheet

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.