Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1643464.1
Update Date: 2017-10-05
Keywords:

Solution Type  Problem Resolution Sure

Solution  1643464.1 :   [SPARC T3/T4/T5 and T7] OBP reports "One or more resources have been retired, please run 'show faulty' on the SP" on console  


Related Items
  • SPARC T4-1
  • SPARC T7-2
  • SPARC T5-4
  • SPARC T5-2
  • SPARC T4-4
  • SPARC T7-4
  • SPARC T4-2
  • SPARC T7-1
  • SPARC T5-8

Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T5




In this Document
Symptoms
Cause
Solution
References


Applies to:

SPARC T5-2 - Version All Versions to All Versions [Release All Releases]
SPARC T5-4 - Version All Versions to All Versions [Release All Releases]
SPARC T5-8 - Version All Versions to All Versions [Release All Releases]
SPARC T4-1 - Version All Versions to All Versions [Release All Releases]
SPARC T4-2 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

When a SPARC T3, T4 or T5 system is powered on ( start /SYS ), the following WARNING 
message is logged on the console of the system [LDOM service or guest domain]. It indicates 
that one or more system components have been disabled or degraded. 

Console Message
----------------
WARNING: One or more resources have been retired, please run 'show faulty' on the SP.

Example 1 [ console message ]
 
SPARC T5-8, No Keyboard
Copyright (c) 1998, 2014, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.35.5.a, 1.9987 TB memory available, Serial #103641986.
Ethernet address 0:10:e0:2d:73:82, Host ID: 862d7382.



WARNING: One or more resources have been retired, please run 'show faulty' on the SP.

Boot device: disk  File and args: 
SunOS Release 5.11 Version 11.1 64-bit
Copyright (c) 1983, 2012, Oracle and/or its affiliates. All rights reserved.
/
 
 

Cause

System components may be "degraded" by the ILOM (FDD) or Solaris (FMA) fault engine. A 
system component may also be "disabled" by a user. Once a component has been degraded 
or disabled, it is no longer visible in OBP and Solaris.

To search for disabled or degraded components from ILOM, use the following ILOM CLI command:

show -l all /SYS current_config_state==(disabled,degraded)

Multiple components can be manually disabled from the ILOM CLI. The ILOM CLI command
"show components" lists all the components that can be disabled or degraded on the 
platform. 

The following example indicates that the PCI slot component was manually disabled by the operator.

Example 2 [ list user disabled component ]
-> show -l all /SYS current_config_state==(disabled,degraded)

 /SYS/RCSA/PCIE9
    Targets:
        CAR

    Properties:
        type = Slot
        requested_config_state = Disabled
        current_config_state = Disabled
        disable_reason = By user

    Commands:
        cd
        show

-> 

The following example indicates that the PCI slot component was degraded by the system Fault Engine.

Example 3 [ list degraded component ]

-> show -l all /SYS current_config_state==(disabled,degraded)

 /SYS/RCSA/PCIE9
    Targets:
        CAR

    Properties:
        type = Slot
        requested_config_state = Enabled
        current_config_state = Disabled
        disable_reason = Diagnosed faulty

    Commands:
        cd
        show

-> 
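
The two listings above differ only in their property values, so a captured copy of the output can be sorted by script. The following is a minimal sketch in Python, assuming the output was saved as plain text in the format shown in Examples 2 and 3. The function names and the mapping from disable_reason to a follow-up action are illustrative assumptions, not part of ILOM.

```python
# Sample text in the format of Example 3 above (captured ILOM "show" output).
SAMPLE = """\
-> show -l all /SYS current_config_state==(disabled,degraded)

 /SYS/RCSA/PCIE9
    Targets:
        CAR

    Properties:
        type = Slot
        requested_config_state = Enabled
        current_config_state = Disabled
        disable_reason = Diagnosed faulty

    Commands:
        cd
        show
"""


def parse_show_output(text):
    """Return {target_path: {property: value}} from captured 'show' output."""
    components = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("/SYS"):
            # A line that starts with /SYS opens a new target section.
            current = components.setdefault(stripped, {})
        elif current is not None and " = " in stripped:
            key, value = stripped.split(" = ", 1)
            current[key] = value
    return components


def classify(props):
    """Hypothetical helper: map disable_reason to the procedure that applies."""
    reason = props.get("disable_reason", "")
    if reason == "By user":
        return "re-enable manually"
    if reason == "Diagnosed faulty":
        return "clear fault after repair"
    return "review"


if __name__ == "__main__":
    for path, props in parse_show_output(SAMPLE).items():
        print(path, "->", classify(props))
```

Any disable_reason other than the two values shown in the examples is reported as "review" rather than guessed at.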

   
     

Solution

For components that have been manually disabled, re-enable the component 
from the ILOM CLI. Re-enabling a component requires a system restart:

STEP 1.  set <component label> requested_config_state=enabled
STEP 2.  stop /SYS
STEP 3.  start /SYS

Example 3 [ re-enabling a component ]
-> show -l all /SYS current_config_state==(disabled,degraded)

 /SYS/RCSA/PCIE9
    Targets:
        CAR

    Properties:
        type = Slot
        requested_config_state = Disabled
        current_config_state = Disabled
        disable_reason = By user

    Commands:
        cd
        show

->

-> set /SYS/RCSA/PCIE9 requested_config_state=enabled
Set 'requested_config_state' to 'enabled'

-> show -d properties /SYS/RCSA/PCIE9
  /SYS/RCSA/PCIE9
    Properties:
        type = Slot
        requested_config_state = Enabled
        current_config_state = Disabled
        disable_reason = Configuration Rules


-> 
-> stop -f /SYS
Are you sure you want to immediately stop /SYS (y/n)? y
Stopping /SYS immediately

-> show -d properties /SYS/RCSA/PCIE9
  /SYS/RCSA/PCIE9
    Properties:
        type = Slot
        requested_config_state = Enabled
        current_config_state = Enabled
        disable_reason = None


-> start /SYS
Are you sure you want to start /SYS (y/n)? y
Starting /SYS

-> 
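
When several components were disabled by the user, the three steps above repeat for each one. The following Python sketch only builds the command strings for one component so they can be reviewed before being typed at the ILOM prompt. The function name and the force_stop option are assumptions made for illustration. The command text itself is taken from the steps and example above.

```python
def reenable_commands(component_path, force_stop=False):
    """Build the ILOM CLI command sequence to re-enable one component.

    Hypothetical helper: it only returns strings, it does not contact an SP.
    """
    if not component_path.startswith("/SYS/"):
        raise ValueError("expected a component path under /SYS")
    # "stop -f /SYS" is the immediate stop used in the example above.
    stop = "stop -f /SYS" if force_stop else "stop /SYS"
    return [
        f"set {component_path} requested_config_state=enabled",
        stop,
        "start /SYS",
    ]


if __name__ == "__main__":
    for command in reenable_commands("/SYS/RCSA/PCIE9"):
        print(command)
```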
 
For components that have been degraded by the system fault engine, the suspected faulty component
can be identified by running "show faulty" or by starting the ILOM fault management shell [/SP/faultmgmt/shell].
Once the suspected faulty components have been replaced or verified to be not faulty, carry out the following
procedure.

STEP 1. Shut down the platform

stop /SYS

STEP 2. Enter system fault management shell

start /SP/faultmgmt/shell

STEP 3. List faulty components reported by the system

fmadm faulty

STEP 4. Acquit or repair the faulty events using the UUID

fmadm acquit <uuid>

STEP 5. Verify that there are no degraded components in the
ILOM fault management shell

fmadm faulty -r 

STEP 6. Exit the fault management shell

exit

STEP 7. Re-verify that there are no degraded components in ILOM

show faulty

STEP 8. Start the platform

start /SYS


Example 4 [ The following example was carried out after replacing /SYS/MB/PCIE6 ]
SPARC T5-2, No Keyboard
Copyright (c) 1998, 2013, Oracle and/or its affiliates. All rights reserved.
OpenBoot 4.35.4, 255.0000 GB memory available, Serial #104142XXX.
Ethernet address 0:10:e0:35:XX:XX, Host ID: 8635XXXX.

WARNING: One or more resources have been retired, please run 'show faulty' on the SP. 

Boot device: disk  File and args:



-> stop /System
Are you sure you want to stop /System (y/n)? y
Stopping /System

-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell (y/n)? y

faultmgmtsp>  fmadm faulty
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2014-02-28/00:29:15 56d8bb58-0b42-426b-dcb8-f318462c438c PCIEX-8000-0A  Critical

Problem Status    : solved
Diag Engine       : [unknown]
System
  Manufacturer   : Oracle Corporation
  Name           : SPARC T5-2
  Part_Number    : 31845050+1+1
  Serial_Number  : AK00107XXX

----------------------------------------
Suspect 1 of 1
  Fault class  : fault.io.pciex.device-interr
  Certainty    : 100%
  Affects      : /SYS/MB/PCIE8
  Status       : faulted

  FRU
     Status            : not present
     Location          : /SYS/MB/PCIE8
     Chassis
        Manufacturer   : Oracle Corporation
        Name           : SPARC T5-2
        Part_Number    : 31845050+1+1
        Serial_Number  : AK00107XXX

Description : A fault has been diagnosed by the Host Operating System.

Response    : The service required LED on the chassis and on the affected
             FRU may be illuminated.

Impact      : No SP impact.

Action      : Refer to the associated reference document at
             http://support.oracle.com/msg/PCIEX-8000-0A for the latest
             service procedures and policies regarding this diagnosis. 


------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2014-02-28/00:29:22 90374df4-2819-6e8d-cac7-982b2a90e8ed PCIEX-8000-0A  Critical

Problem Status    : solved
Diag Engine       : [unknown]
System
  Manufacturer   : Oracle Corporation
  Name           : SPARC T5-2
  Part_Number    : 31845050+1+1
  Serial_Number  : AK00107XXX

----------------------------------------
Suspect 1 of 1
  Fault class  : fault.io.pciex.device-interr
  Certainty    : 100%
  Affects      : /SYS/MB/PCIE6
  Status       : faulted

  FRU
     Status            : not present
     Location          : /SYS/MB/PCIE6
     Chassis
        Manufacturer   : Oracle Corporation
        Name           : SPARC T5-2
        Part_Number    : 31845050+1+1
        Serial_Number  : AK00107XXX

Description : A fault has been diagnosed by the Host Operating System.

Response    : The service required LED on the chassis and on the affected
             FRU may be illuminated.

Impact      : No SP impact.

Action      : Refer to the associated reference document at
             http://support.oracle.com/msg/PCIEX-8000-0A for the latest
             service procedures and policies regarding this diagnosis.

faultmgmtsp>
faultmgmtsp>  fmadm repair /SYS/MB
faultmgmtsp>  fmadm acquit /SYS/MB
faultmgmtsp> fmadm acquit 90374df4-2819-6e8d-cac7-982b2a90e8ed
faultmgmtsp> fmadm acquit 56d8bb58-0b42-426b-dcb8-f318462c438c
faultmgmtsp> fmadm faulty -r
No faults found
faultmgmtsp> fmadm rotate errlog
faultmgmtsp>  fmadm rotate fltlog
faultmgmtsp> exit 


->
-> show faulty
Target                                       | Property                                            | Value                                                  
---------------------------------------------+-----------------------------------------------------+---------------------------------------------------------------------------- 

-> start /SYS
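
Each "fmadm acquit" in the example takes a UUID from the "fmadm faulty" listing. The following Python sketch shows one way to pull the UUID and the "Affects" path out of a saved copy of that listing and to print the matching acquit commands. It assumes the row and field layout shown in the example above. The regular expression and function names are illustrative and not part of ILOM.

```python
import re

# One header row per fault: time, UUID, msgid, severity (layout as shown above).
ROW_RE = re.compile(
    r"^(?P<time>\S+)\s+"
    r"(?P<uuid>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(?P<msgid>\S+)\s+(?P<severity>\S+)$"
)

# Abbreviated sample built from lines in the example above.
SAMPLE = """\
2014-02-28/00:29:15 56d8bb58-0b42-426b-dcb8-f318462c438c PCIEX-8000-0A  Critical
  Affects      : /SYS/MB/PCIE8
2014-02-28/00:29:22 90374df4-2819-6e8d-cac7-982b2a90e8ed PCIEX-8000-0A  Critical
  Affects      : /SYS/MB/PCIE6
"""


def parse_faults(text):
    """Return one dict per fault with its uuid, msgid and affected path."""
    faults = []
    for line in text.splitlines():
        stripped = line.strip()
        row = ROW_RE.match(stripped)
        if row:
            faults.append(
                {"uuid": row.group("uuid"), "msgid": row.group("msgid"), "affects": None}
            )
        elif stripped.startswith("Affects") and faults:
            # Attach the affected path to the most recent fault row.
            faults[-1]["affects"] = stripped.split(":", 1)[1].strip()
    return faults


def acquit_commands(faults):
    """Build one 'fmadm acquit <uuid>' line per parsed fault."""
    return [f"fmadm acquit {fault['uuid']}" for fault in faults]


if __name__ == "__main__":
    for fault in parse_faults(SAMPLE):
        print(fault["affects"], fault["msgid"], fault["uuid"])
    for command in acquit_commands(parse_faults(SAMPLE)):
        print(command)
```

The sketch only prints commands. Whether a fault should be acquitted or repaired remains a decision for the operator after the component has been replaced or verified.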

  

In some situations a single DIMM fault disables other DIMMs because of minimum DIMM configuration requirements. In the following example, the console reports messages indicating that the bank could not be configured due to configuration rules.

 

2014-05-04 17:13:56  2:0:0> NOTICE:  SPARC-T5 Revision 1.2  Speed     3600MHz
2014-05-04 17:15:05  0:0:0> NOTICE:  Initializing Memory
2014-05-04 17:16:30  2:0:0> ERROR:       /SYS/PM1/CM0/CMP/BOB4/CH1/D0: DIMM is not populated in order on the BOB. Not configured
2014-05-04 17:16:31  2:0:0> ERROR:       /SYS/PM1/CM0/CMP/BOB0/CH0/D0: DIMM population chip symmetry rule violation. Not configured
2014-05-04 17:16:32  2:0:0> ERROR:       /SYS/PM1/CM0/CMP/BOB0/CH1/D0: DIMM population chip symmetry rule violation. Not configured
2014-05-04 17:16:32  2:0:0> ERROR:       /SYS/PM1/CM0/CMP/BOB2/CH0/D0: DIMM population chip symmetry rule violation. Not configured
2014-05-04 17:16:33  2:0:0> ERROR:       /SYS/PM1/CM0/CMP/BOB2/CH1/D0: DIMM population chip symmetry rule violation. Not configured
2014-05-04 17:16:34  2:0:0> ERROR:       /SYS/PM1/CM0/CMP/BOB6/CH0/D0: DIMM population chip symmetry rule violation. Not configured
2014-05-04 17:16:35  2:0:0> ERROR:       /SYS/PM1/CM0/CMP/BOB6/CH1/D0: DIMM population chip symmetry rule violation. Not configured
2014-05-04 17:17:13  0:0:0> NOTICE:  Initializing MCU 0 Memory Link 0
2014-05-04 17:17:30  0:0:0> NOTICE:  Initializing MCU 0 Memory Link 1

 

If the "fmadm repair" or "fmadm acquit" command does not re-enable the other DIMMs that were disabled due to the "Symmetry Rule", manually clear each DIMM with the following commands:

 set <COMPONENT PATH> clear_fault_action=true
 set <COMPONENT PATH> requested_config_state=enabled
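
Because the two commands must be repeated for every affected DIMM, the list can be derived from a saved console log. The following Python sketch extracts each component path reported as "Not configured" and prints the two set commands for it. It assumes the console line format shown in the excerpt above (spacing may differ). The regular expression and function names are illustrative assumptions.

```python
import re

# Matches console lines of the form:
#   <timestamp> <id>> ERROR: <component path>: <reason>. Not configured
NOT_CONFIGURED_RE = re.compile(r"ERROR:\s+(/SYS/\S+):\s+(.*?)\.\s*Not configured")

# Sample lines taken from the console excerpt above.
SAMPLE = """\
2014-05-04 17:16:30  2:0:0> ERROR:       /SYS/PM1/CM0/CMP/BOB4/CH1/D0: DIMM is not populated in order on the     BOB. Not configured
2014-05-04 17:16:31  2:0:0> ERROR:       /SYS/PM1/CM0/CMP/BOB0/CH0/D0: DIMM population chip symmetry rule     violation. Not configured
2014-05-04 17:17:13  0:0:0> NOTICE:  Initializing MCU 0 Memory     Link 0
"""


def unconfigured_dimms(text):
    """Return (component_path, reason) for every 'Not configured' error line."""
    found = []
    for line in text.splitlines():
        match = NOT_CONFIGURED_RE.search(line)
        if match:
            # Collapse runs of whitespace left over in the captured reason.
            reason = " ".join(match.group(2).split())
            found.append((match.group(1), reason))
    return found


def clear_commands(component_path):
    """Build the two ILOM 'set' commands shown above for one component."""
    return [
        f"set {component_path} clear_fault_action=true",
        f"set {component_path} requested_config_state=enabled",
    ]


if __name__ == "__main__":
    for path, reason in unconfigured_dimms(SAMPLE):
        print(f"# {reason}")
        for command in clear_commands(path):
            print(command)
```

The sketch does not distinguish the DIMM that actually faulted from the DIMMs disabled only by configuration rules. That determination still comes from "show faulty" or "fmadm faulty".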

 


This behavior may be verified from the following source file:

 /nyx-1.3.x/src/hostconfig/common/src/gmd_config.c

References

<NOTE:1614738.1> - [SPARC T4/T5/M5 and M6] FMA I/O retirement : PCI devices can be seen from OBP but disappear when System Boots up into Solaris

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.