Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure
Solution 1571240.1: Tx000/T5x20/T5x40: How To Clear recurring faults on Solaris FMA and System Controller after hardware replacement
Applies to:
Sun SPARC Enterprise T2000 Server - Version All Versions to All Versions [Release All Releases]
Sun Netra T5440 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise T5140 Server - Version All Versions to All Versions [Release All Releases]
Sun Fire T1000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise T5120 Server - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on SPARC (32-bit)
Oracle Solaris on SPARC (64-bit)

Goal
This document explains what to do if the same fault (different timestamps but the same FMA event ID) occurs again after a hardware component has been replaced. If you simply wish to know how to clear faults within Solaris FMA or on the System Controller, please refer to the documents listed under References.

Solution
Symptoms
As an example, let us assume you have encountered the following fault on the ALOM of your server:
sc> showfaults -v
ID Time FRU Fault

The Host FMA shows:
# fmadm faulty
Host : somehostname
Fault class : fault.memory.dimm-ue-imminent 95%
Description : A pattern of correctable errors has been observed suggesting the
Response : None at this time.
Impact : None at this time. However, the potential uncorrectable error
Action : Schedule a repair procedure to replace the DIMM. Use fmadm faulty
But the replaced DIMM gets faulted again:
sc> showfaults -v
ID Time FRU Fault

# fmadm faulty
#
Solution Steps
Please note that the UUIDs of the event before and the event after the DIMM replacement are the same: f258876c-0c10-edef-a7a5-d2283828fe09. This means that the original fault has not been correctly cleared, either by the host FMA or by the System Controller.
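The UUID comparison described above can also be done mechanically. The following is a minimal sketch, not part of the original procedure. It assumes that the `fmadm faulty` output from before and after the replacement has been saved to two text files, and that it is run where `grep` supports `-E` and `-o` (for example on an administration workstation; the stock Solaris 10 `/usr/bin/grep` may not). The function names `extract_uuids` and `recurring_uuids` are hypothetical helpers, not Solaris commands.

```shell
#!/bin/sh
# Hypothetical helpers for comparing saved fault listings; not part of Solaris FMA.

# Print every UUID-shaped token found on stdin, one per line, sorted and de-duplicated.
extract_uuids() {
    grep -Eo '[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | sort -u
}

# Print the UUIDs that occur in both files.
#   $1 = listing saved before the replacement
#   $2 = listing saved after the replacement
recurring_uuids() {
    before=$(mktemp) && after=$(mktemp) || return 1
    extract_uuids < "$1" > "$before"
    extract_uuids < "$2" > "$after"
    # comm -12 keeps only the lines common to both sorted inputs.
    comm -12 "$before" "$after"
    rm -f "$before" "$after"
}
```

A call such as `recurring_uuids before.txt after.txt` (file names are placeholders) prints nothing when no UUID is shared. Any UUID it does print belongs to an event that was present before the replacement and is still reported afterwards, which is the situation this document addresses.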
# svcadm disable -s svc:/system/fmd:default
Important:
The procedure shown above clears the entire fault history. That history is necessary to analyze problems that might occur in the future. Please follow these steps only if you encounter the problem described in this document. There is no need to clear the FMA caches this way during routine work with Solaris FMA. Instead, use the standard "# fmadm repair <uuid>" command.
See: PSH Procedural Article for Solaris FMA-Based Diagnosis
sc> enablecomponent MB/CMP0/CH0/R1/D1
sc> clearasrdb
sc> setsc sc_servicemode true
sc> clearereports -y
sc> setsc sc_servicemode false
sc> resetsc
sc> setkeyswitch diag
sc> poweroff
sc> poweron
To power on the server and automatically start the host console, execute:
Note: If you are in the ILOM shell, you may need to execute:
-> set /SYS/<COMPONENT_PATH> clear_fault_action=true
and, if there are any disabled components:
-> set /SYS/<COMPONENT_PATH> component_state=enabled
See "Section C.2 Using the ILOM Command Line Interface to Clear the Fault" in the "PSH Procedural Article for ILOM-Based Diagnosis (Doc ID 1155200.1)".
On the Host: # fmadm faulty
On the System Controller: sc> showfaults -v
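The final check on the host can be scripted. The sketch below is an illustration under stated assumptions rather than part of the original procedure: `uuid_absent` is a hypothetical helper that reads command output on stdin and succeeds only when the given UUID does not occur in it. The System Controller prompt is not driven from a host shell here, so for `showfaults -v` the assumption is that you save its output to a file first and that the UUID appears in that output.

```shell
#!/bin/sh
# Hypothetical helper; not a Solaris or ALOM command.
# Succeeds (exit 0) when the UUID given as $1 does not occur in the text on stdin.
uuid_absent() {
    # Refuse an empty pattern, which would match every line.
    [ -n "$1" ] || return 2
    # A UUID contains only hex digits and dashes, so it is safe as a grep pattern.
    if grep "$1" > /dev/null; then
        return 1
    else
        return 0
    fi
}
```

On the host, a possible use is `fmadm faulty | uuid_absent f258876c-0c10-edef-a7a5-d2283828fe09 && echo "fault no longer listed"`. For the System Controller, the same function can be fed a saved copy of the `showfaults -v` output.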
References
<NOTE:1004229.1> - How To Clear FMA faults from Solaris[TM] and SC (System Controller) on T1000/T2000 T5120/T5220/T5140/T5240/T5440, T3-1/T3-2/T3-4, T4-1/T4-2/T4-4
<NOTE:1173733.1> - PSH Procedural Article for Solaris FMA-Based Diagnosis