Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-2377203.1
Update Date:2018-04-03
Keywords:

Solution Type  Troubleshooting Sure

Solution  2377203.1 :   Troubleshooting SPT-8002-QD error reported on S7-2 SPARC server  


Related Items
  • SPARC S7-2L
  • SPARC S7-2
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: S7




In this Document
Purpose
Troubleshooting Steps
References


Applies to:

SPARC S7-2
SPARC S7-2L
Information in this document applies to any platform.

Purpose

This document provides guidance for troubleshooting SPT-8002-QD (alert.ilom.chassis.config.fan.capacity-insufficient with probability=100).

Impact:

The chassis will power off immediately, and subsequent power-on will be inhibited.

 

Symptoms

 

1. The output of fmadm faulty, from the ILOM snapshot file fma/@usr@local@bin@fmadm_faulty.out:

2017-07-31/07:56:33 3aa28b88-8848-4f03-b7ad-abcdabcd SPT-8000-3R Major
/SYS/MB/FM0/F3

Fan tachometer speed is below its normal operating range.

2017-07-31/07:56:33 dfeb259e-0e76-6bc0-9147-abcdabcd SPT-8000-3R Major
/SYS/MB/FM2/F0

Fan tachometer speed is below its normal operating range.

2017-07-31/07:56:35 f65047a3-8083-4220-9490-abcdabcd SPT-8002-QD Critical
/SYS Part#: 35043973+1+1 SPARC Ser#: XXXXXX

Insufficient cooling capacity due to multiple faulted or missing fans.
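When a snapshot contains many entries, the fault list can also be extracted programmatically. The following is a minimal Python sketch, assuming the summary layout shown above (a header line with timestamp, UUID, message ID and severity, followed by the affected resource); parse_fmadm_faulty and FaultEntry are hypothetical helper names, not part of ILOM or Solaris. On this example it lists the two SPT-8000-3R fan faults that precede the SPT-8002-QD alert.

```python
import re
from dataclasses import dataclass

# Header of one entry: timestamp, UUID, message ID, severity. Only the layout
# of the snapshot excerpt above is assumed; other fmadm formats are not handled.
HEADER = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+(?P<uuid>\S+)\s+"
    r"(?P<msgid>[A-Z0-9]+-[0-9A-Z]{4}-[0-9A-Z]{2})\s+(?P<severity>\w+)$"
)


@dataclass
class FaultEntry:  # hypothetical container, not an Oracle type
    time: str
    uuid: str
    msgid: str
    severity: str
    resource: str = ""


def parse_fmadm_faulty(text: str) -> list:
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = HEADER.match(line)
        if m:
            entries.append(FaultEntry(**m.groupdict()))
        elif entries and not entries[-1].resource:
            # The first line after a header names the resource; keep only the
            # path and drop details such as "Part#: ... Ser#: ...".
            entries[-1].resource = line.split()[0]
    return entries


SAMPLE = """\
2017-07-31/07:56:33 3aa28b88-8848-4f03-b7ad-abcdabcd SPT-8000-3R Major
/SYS/MB/FM0/F3

Fan tachometer speed is below its normal operating range.

2017-07-31/07:56:33 dfeb259e-0e76-6bc0-9147-abcdabcd SPT-8000-3R Major
/SYS/MB/FM2/F0

Fan tachometer speed is below its normal operating range.

2017-07-31/07:56:35 f65047a3-8083-4220-9490-abcdabcd SPT-8002-QD Critical
/SYS Part#: 35043973+1+1 SPARC Ser#: XXXXXX

Insufficient cooling capacity due to multiple faulted or missing fans.
"""

for e in parse_fmadm_faulty(SAMPLE):
    print(e.time, e.msgid, e.severity, e.resource)
```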

 

2. The output of the show faulty command at the ILOM prompt:

-> show faulty
Target | Property | Value
-------------------+-----------------------+-----------------------------------
/SP/faultmgmt/0/ | class | alert.ilom.chassis.config.fan.capacity-insufficient
/SP/faultmgmt/0/ | sunw-msg-id | SPT-8002-QD
/SP/faultmgmt/0/ | component | /SYS
/SP/faultmgmt/0/ | uuid | 2
/SP/faultmgmt/0/ | timestamp | 2018-03-01/19:47:52
/SP/faultmgmt/0/ | detector | /SYS

/SP/faultmgmt/1 | fru | /SYS/MB/FM0
/SP/faultmgmt/1/ | class | fault.chassis.device.fan.fail
/SP/faultmgmt/1/ | sunw-msg-id | SPT-8000-3R
/SP/faultmgmt/1/ | component | /SYS/MB/FM0/F1|
/SP/faultmgmt/1/ | detector | /SYS/MB/FM0/F1|

/SP/faultmgmt/2 | fru | /SYS/MB/FM2
/SP/faultmgmt/2/ | class | fault.chassis.device.fan.fail
/SP/faultmgmt/2/ | sunw-msg-id | SPT-8000-3R
/SP/faultmgmt/2/ | component | /SYS/MB/FM2/F3
/SP/faultmgmt/2/ | detector | /SYS/MB/FM2/F3

Alternatively, display the current system faults from the Fault Management Shell:

-> start /SP/faultmgmt/shell
faultmgmtsp> fmadm faulty
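The show faulty table can be grouped per fault target in a similar way. This Python sketch is a hypothetical helper (parse_show_faulty is not an ILOM tool); it assumes the three-column Target | Property | Value layout above, treats /SP/faultmgmt/1 and /SP/faultmgmt/1/ as the same target, and tolerates the trailing | that appears on some rows.

```python
from collections import defaultdict


def parse_show_faulty(text: str) -> dict:
    """Group 'Target | Property | Value' rows by fault target."""
    groups = defaultdict(dict)
    for raw in text.splitlines():
        fields = [f.strip() for f in raw.split("|")]
        # Skip the prompt, the header row, the dashed rule and blank lines.
        if len(fields) < 3 or not fields[0].startswith("/SP/faultmgmt"):
            continue
        # "/SP/faultmgmt/1" and "/SP/faultmgmt/1/" name the same target; a
        # trailing "|" only adds an empty 4th field, which is ignored.
        target = fields[0].rstrip("/")
        groups[target][fields[1]] = fields[2]
    return dict(groups)


SAMPLE = """\
Target | Property | Value
-------------------+-----------------------+-----------------------------------
/SP/faultmgmt/0/ | class | alert.ilom.chassis.config.fan.capacity-insufficient
/SP/faultmgmt/0/ | sunw-msg-id | SPT-8002-QD
/SP/faultmgmt/0/ | component | /SYS
/SP/faultmgmt/1 | fru | /SYS/MB/FM0
/SP/faultmgmt/1/ | class | fault.chassis.device.fan.fail
/SP/faultmgmt/1/ | sunw-msg-id | SPT-8000-3R
/SP/faultmgmt/1/ | component | /SYS/MB/FM0/F1|
/SP/faultmgmt/2 | fru | /SYS/MB/FM2
/SP/faultmgmt/2/ | class | fault.chassis.device.fan.fail
/SP/faultmgmt/2/ | sunw-msg-id | SPT-8000-3R
/SP/faultmgmt/2/ | component | /SYS/MB/FM2/F3
"""

faults = parse_show_faulty(SAMPLE)
# Fan components named by the individual fan faults (SPT-8000-3R).
fan_faults = [p["component"] for p in faults.values()
              if p.get("class") == "fault.chassis.device.fan.fail"]
print(fan_faults)
```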

 

3. Check the history of events recorded in the ILOM event log:

From the ILOM snapshot file ilom/@usr@local@bin@spshexec_show_-script_@X@logs@event@list.out, or from the ILOM prompt:

-> show /SP/logs/event/list

13 Mon Jul 31 07:56:35 2017 Fault Fault critical Fault detected at time = Mon Jul 31 07:56:35 2017. The suspect component: /SYS has alert.ilom.chassis.config.fan.capacity-insufficient with probability=100. Refer to http://support.oracle.com/msg/SPT-8002-QD for details.

12 Mon Jul 31 07:56:35 2017 Power Off major Power to /SYS has been turned off by: SP, Reason: Fault

11 Mon Jul 31 07:56:33 2017 Fault Fault critical Fault detected at time = Mon Jul 31 07:56:33 2017. The suspect component: /SYS/MB/FM2 has fault.chassis.device.fan.fail with probability=100. Refer to http://support.oracle.com/msg/SPT-8000-3R for details.

10 Mon Jul 31 07:56:33 2017 Fault Fault critical Fault detected at time = Mon Jul 31 07:56:33 2017. The suspect component: /SYS/MB/FM0 has fault.chassis.device.fan.fail with probability=100. Refer to http://support.oracle.com/msg/SPT-8000-3R for details.
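Reading the event log oldest first makes the sequence easier to follow: the two fan faults (events 10 and 11), the power-off by the SP (event 12), then the capacity-insufficient alert (event 13). A minimal Python sketch, assuming the column order ID, date/time, class, type, severity, message seen above, and assuming that event IDs increase over time; parse_event_log is a hypothetical name.

```python
import re
from datetime import datetime

# ID, date/time, class, type, severity, then the free-text message.
EVENT = re.compile(
    r"^(?P<id>\d+)\s+(?P<when>\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+"
    r"(?P<cls>\S+)\s+(?P<type>\S+)\s+(?P<severity>\S+)\s+(?P<message>.*)$"
)
# "The suspect component: <path> has <fault class> with probability=..."
SUSPECT = re.compile(r"suspect component: (\S+) has (\S+) with")


def parse_event_log(text: str) -> list:
    events = []
    for raw in text.splitlines():
        m = EVENT.match(raw.strip())
        if not m:
            continue
        suspect = SUSPECT.search(m["message"])
        events.append({
            "id": int(m["id"]),
            # English month/day names are assumed (default C locale).
            "when": datetime.strptime(m["when"], "%a %b %d %H:%M:%S %Y"),
            "class": m["cls"],
            "type": m["type"],
            "severity": m["severity"],
            "suspect": suspect.groups() if suspect else None,
            "message": m["message"],
        })
    # Assumption: a higher event ID means a later event, so sort oldest first.
    return sorted(events, key=lambda e: e["id"])


SAMPLE = """\
13 Mon Jul 31 07:56:35 2017 Fault Fault critical Fault detected at time = Mon Jul 31 07:56:35 2017. The suspect component: /SYS has alert.ilom.chassis.config.fan.capacity-insufficient with probability=100. Refer to http://support.oracle.com/msg/SPT-8002-QD for details.
12 Mon Jul 31 07:56:35 2017 Power Off major Power to /SYS has been turned off by: SP, Reason: Fault
11 Mon Jul 31 07:56:33 2017 Fault Fault critical Fault detected at time = Mon Jul 31 07:56:33 2017. The suspect component: /SYS/MB/FM2 has fault.chassis.device.fan.fail with probability=100. Refer to http://support.oracle.com/msg/SPT-8000-3R for details.
10 Mon Jul 31 07:56:33 2017 Fault Fault critical Fault detected at time = Mon Jul 31 07:56:33 2017. The suspect component: /SYS/MB/FM0 has fault.chassis.device.fan.fail with probability=100. Refer to http://support.oracle.com/msg/SPT-8000-3R for details.
"""

for e in parse_event_log(SAMPLE):
    print(e["id"], e["when"].time(), e["class"], e["type"], e["suspect"])
```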

 

4. The output of fmdump -v:

From the ILOM snapshot file fma/@usr@local@bin@fmdump_-v.out, or from the Fault Management Shell:

-> start /SP/faultmgmt/shell
faultmgmtsp> fmdump -v

2018-03-01/19:47:37 d043fb9c-22ca-ee37-83df-abcdabcd SPT-8000-3R Fan Speed Below Normal Range
FRU = /SYS/MB/FM0
2018-03-01/19:47:37 0e191c86-903e-6462-9256-abcdabcd SPT-8000-3R Fan Speed Below Normal Range
FRU = /SYS/MB/FM2
2018-03-01/19:47:52 2ff61538-80d3-61e7-f301-abcdabcd SPT-8002-QD
fault = alert.ilom.chassis.config.fan.capacity-insufficient@/SYS
certainty = 100.0 %
FRU = /SYS
ASRU = /SYS
resource = /SYS
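The fmdump -v records can be reduced to message ID, time and FRU, which also shows the delay between the fan faults and the chassis alert (19:47:37 to 19:47:52, that is 15 seconds, in this example). A minimal Python sketch, assuming the header line plus "key = value" layout shown above; parse_fmdump_v is a hypothetical name, and a repeated key (for example several FRU lines for one fault) keeps only its last value.

```python
import re
from datetime import datetime

# Summary line: timestamp, UUID, message ID, optional one-line description.
HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+(\S+)\s+(\S+)\s*(.*)$")


def parse_fmdump_v(text: str) -> list:
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            stamp, uuid, msgid, summary = m.groups()
            records.append({
                "time": datetime.strptime(stamp, "%Y-%m-%d/%H:%M:%S"),
                "uuid": uuid, "msgid": msgid, "summary": summary, "fields": {},
            })
        elif records and "=" in line:
            # Detail lines such as "FRU = /SYS/MB/FM0" belong to the last header.
            key, _, value = line.partition("=")
            records[-1]["fields"][key.strip()] = value.strip()
    return records


SAMPLE = """\
2018-03-01/19:47:37 d043fb9c-22ca-ee37-83df-abcdabcd SPT-8000-3R Fan Speed Below Normal Range
FRU = /SYS/MB/FM0
2018-03-01/19:47:37 0e191c86-903e-6462-9256-abcdabcd SPT-8000-3R Fan Speed Below Normal Range
FRU = /SYS/MB/FM2
2018-03-01/19:47:52 2ff61538-80d3-61e7-f301-abcdabcd SPT-8002-QD
fault = alert.ilom.chassis.config.fan.capacity-insufficient@/SYS
certainty = 100.0 %
FRU = /SYS
ASRU = /SYS
resource = /SYS
"""

records = parse_fmdump_v(SAMPLE)
first_fan = min(r["time"] for r in records if r["msgid"] == "SPT-8000-3R")
alert = next(r for r in records if r["msgid"] == "SPT-8002-QD")
gap = (alert["time"] - first_fan).total_seconds()
print(gap, alert["fields"]["FRU"], alert["fields"]["certainty"])
```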

 

5. The output of fmdump -ev:

From the ILOM snapshot file fma/@usr@local@bin@fmdump_-ev.out, or from the Fault Management Shell:

faultmgmtsp> fmdump -ev

2018-02-14/11:58:27 ereport.chassis.config.fan.toofew-asserted@/SYS
2018-02-14/11:59:27 ereport.chassis.config.fan.toofew-deasserted@/SYS
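The two ereports bracket the period in which the fan "toofew" condition on /SYS was asserted: 11:58:27 to 11:59:27, that is 60 seconds. The Python sketch below pairs each -asserted ereport with the next matching -deasserted one; assertion_windows is a hypothetical name, and the pairing rule (same class prefix and same resource) is an assumption based only on the two lines shown.

```python
from datetime import datetime


def assertion_windows(text: str) -> list:
    """Pair '<name>-asserted' with the next '<name>-deasserted' ereport for
    the same resource; return (name, resource, start, end) tuples."""
    pending = {}
    windows = []
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) != 2:
            continue  # skips the shell prompt and blank lines
        stamp, event = parts
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d/%H:%M:%S")
        except ValueError:
            continue
        name, _, resource = event.partition("@")
        # Check "-deasserted" first so it is never mistaken for "-asserted".
        if name.endswith("-deasserted"):
            key = (name[: -len("-deasserted")], resource)
            start = pending.pop(key, None)
            if start is not None:
                windows.append((key[0], resource, start, when))
        elif name.endswith("-asserted"):
            pending[(name[: -len("-asserted")], resource)] = when
    return windows


SAMPLE = """\
faultmgmtsp> fmdump -ev
2018-02-14/11:58:27 ereport.chassis.config.fan.toofew-asserted@/SYS
2018-02-14/11:59:27 ereport.chassis.config.fan.toofew-deasserted@/SYS
"""

for name, resource, start, end in assertion_windows(SAMPLE):
    print(name, resource, (end - start).total_seconds(), "s")
```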

 

Troubleshooting Steps

1. Gather an ILOM snapshot as described in:

SRDC - SPARC T3-x, T4-x, T5-x, T7-x, S7-x, T8-x servers: Simple instructions to collect ILOM snapshot (Doc ID 2077387.1)

2. Display the environmental status of the host server, including the fan module fault state and the fan tachometer (TACH) readings:

-> show -o table -level all /SYS


Target              | Property                    | Value
---------------------------------------------------------------------
/SYS/MB/FM0         | type                        | Front Fan
/SYS/MB/FM0         | fault_state                 | Faulted
/SYS/MB/FM0         | clear_fault_action          | (none)
/SYS/MB/FM0/F0      | type                        | Fan
/SYS/MB/FM0/F0/TACH | type                        | Fan
/SYS/MB/FM0/F0/TACH | ipmi_name                   | FM0/F0/TACH
/SYS/MB/FM0/F0/TACH | class                       | Threshold Sensor
/SYS/MB/FM0/F0/TACH | value                       | 11500.000 RPM
/SYS/MB/FM0/F0/TACH | upper_nonrecov_threshold    | N/A
/SYS/MB/FM0/F0/TACH | upper_critical_threshold    | N/A
/SYS/MB/FM0/F0/TACH | upper_noncritical_threshold | N/A
/SYS/MB/FM0/F0/TACH | lower_noncritical_threshold | N/A
/SYS/MB/FM0/F0/TACH | lower_critical_threshold    | N/A
/SYS/MB/FM0/F0/TACH | lower_nonrecov_threshold    | 1000.000 RPM
/SYS/MB/FM0/F0/TACH | alarm_status                | cleared
/SYS/MB/FM0/F1      | type                        | Fan
/SYS/MB/FM0/F1/TACH | type                        | Fan
/SYS/MB/FM0/F1/TACH | ipmi_name                   | FM0/F1/TACH
/SYS/MB/FM0/F1/TACH | class                       | Threshold Sensor
/SYS/MB/FM0/F1/TACH | value                       | 0 RPM
/SYS/MB/FM0/F1/TACH | upper_nonrecov_threshold    | N/A
/SYS/MB/FM0/F1/TACH | upper_critical_threshold    | N/A
/SYS/MB/FM0/F1/TACH | upper_noncritical_threshold | N/A
/SYS/MB/FM0/F1/TACH | lower_noncritical_threshold | N/A
/SYS/MB/FM0/F1/TACH | lower_critical_threshold    | N/A
/SYS/MB/FM0/F1/TACH | lower_nonrecov_threshold    | 1000.000 RPM
/SYS/MB/FM0/F1/TACH | alarm_status                | cleared
/SYS/MB/FM0/F2      | type                        | Fan
/SYS/MB/FM0/F2/TACH | type                        | Fan
/SYS/MB/FM0/F2/TACH | ipmi_name                   | FM0/F2/TACH
/SYS/MB/FM0/F2/TACH | class                       | Threshold Sensor
/SYS/MB/FM0/F2/TACH | value                       | 11400.000 RPM
/SYS/MB/FM0/F2/TACH | upper_nonrecov_threshold    | N/A
/SYS/MB/FM0/F2/TACH | upper_critical_threshold    | N/A
/SYS/MB/FM0/F2/TACH | upper_noncritical_threshold | N/A
/SYS/MB/FM0/F2/TACH | lower_noncritical_threshold | N/A
/SYS/MB/FM0/F2/TACH | lower_critical_threshold    | N/A
/SYS/MB/FM0/F2/TACH | lower_nonrecov_threshold    | 1000.000 RPM
/SYS/MB/FM0/F2/TACH | alarm_status                | cleared
/SYS/MB/FM0/F3      | type                        | Fan
/SYS/MB/FM0/F3/TACH | type                        | Fan
/SYS/MB/FM0/F3/TACH | ipmi_name                   | FM0/F3/TACH
/SYS/MB/FM0/F3/TACH | class                       | Threshold Sensor
/SYS/MB/FM0/F3/TACH | value                       | 9900.000 RPM
/SYS/MB/FM0/F3/TACH | upper_nonrecov_threshold    | N/A
/SYS/MB/FM0/F3/TACH | upper_critical_threshold    | N/A
/SYS/MB/FM0/F3/TACH | upper_noncritical_threshold | N/A
/SYS/MB/FM0/F3/TACH | lower_noncritical_threshold | N/A
/SYS/MB/FM0/F3/TACH | lower_critical_threshold    | N/A
/SYS/MB/FM0/F3/TACH | lower_nonrecov_threshold    | 1000.000 RPM
/SYS/MB/FM0/F3/TACH | alarm_status                | cleared
/SYS/MB/FM0/SERVICE | type                        | Indicator
/SYS/MB/FM0/SERVICE | ipmi_name                   | FM0/SERVICE
/SYS/MB/FM0/SERVICE | value                       | Off
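In this output the module /SYS/MB/FM0 is Faulted and /SYS/MB/FM0/F1/TACH reads 0 RPM, below its lower_nonrecov_threshold of 1000 RPM, while the other fans in the module read between 9900 and 11500 RPM. The minimal Python sketch below applies that comparison to every TACH target; parse_sensor_table, rpm, slow_fans and faulted_frus are hypothetical names, and the sample is an excerpt of the table above.

```python
from collections import defaultdict


def parse_sensor_table(text: str) -> dict:
    """Collect 'Target | Property | Value' rows into {target: {property: value}}."""
    table = defaultdict(dict)
    for raw in text.splitlines():
        fields = [f.strip() for f in raw.split("|")]
        if len(fields) >= 3 and fields[0].startswith("/SYS"):
            table[fields[0]][fields[1]] = fields[2]
    return dict(table)


def rpm(value):
    """'11500.000 RPM' -> 11500.0; 'N/A', empty or missing -> None."""
    try:
        return float(value.split()[0])
    except (AttributeError, IndexError, ValueError):
        return None


def slow_fans(table: dict) -> list:
    """TACH sensors whose reading is below lower_nonrecov_threshold."""
    flagged = []
    for target, props in table.items():
        if not target.endswith("/TACH"):
            continue
        value = rpm(props.get("value"))
        floor = rpm(props.get("lower_nonrecov_threshold"))
        if value is not None and floor is not None and value < floor:
            flagged.append((target, value, floor))
    return flagged


def faulted_frus(table: dict) -> list:
    return [t for t, p in table.items() if p.get("fault_state") == "Faulted"]


SAMPLE = """\
Target              | Property                    | Value
---------------------------------------------------------------------
/SYS/MB/FM0         | fault_state                 | Faulted
/SYS/MB/FM0/F0/TACH | value                       | 11500.000 RPM
/SYS/MB/FM0/F0/TACH | lower_nonrecov_threshold    | 1000.000 RPM
/SYS/MB/FM0/F1/TACH | value                       | 0 RPM
/SYS/MB/FM0/F1/TACH | upper_nonrecov_threshold    | N/A
/SYS/MB/FM0/F1/TACH | lower_nonrecov_threshold    | 1000.000 RPM
/SYS/MB/FM0/F3/TACH | value                       | 9900.000 RPM
/SYS/MB/FM0/F3/TACH | lower_nonrecov_threshold    | 1000.000 RPM
"""

table = parse_sensor_table(SAMPLE)
print(faulted_frus(table))
print(slow_fans(table))
```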

 

3. Check the fan sensor readings reported through IPMI.

From the ILOM snapshot file /ipmi/@usr@local@bin@ipmiint_sensor_list.out:

FM0/F0/TACH | 12100.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM0/F1/TACH | 9100.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM0/F2/TACH | 12300.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM0/F3/TACH | 9000.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM0/PRSNT | 0x2 | discrete | 0x0200| na | na | na | na | na | na
FM1/F0/TACH | 11900.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM1/F1/TACH | 9300.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM1/F2/TACH | 12300.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM1/F3/TACH | 9300.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM1/PRSNT | 0x2 | discrete | 0x0200| na | na | na | na | na | na
FM2/F0/TACH | 12500.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM2/F1/TACH | 9300.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM2/F2/TACH | 12400.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM2/F3/TACH | 9400.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM2/PRSNT | 0x2 | discrete | 0x0200| na | na | na | na | na | na
FM3/F0/TACH | 12300.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM3/F1/TACH | 9200.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM3/F2/TACH | 12300.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM3/F3/TACH | 9100.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM3/PRSNT | 0x2 | discrete | 0x0200| na | na | na | na | na | na
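The IPMI sensor list can be checked the same way. This Python sketch assumes the ipmitool-style column order name | reading | units | status | lower non-recoverable | lower critical | ..., which is consistent with the 1000 RPM lower_nonrecov_threshold reported under /SYS above; parse_ipmi_fans and summarize are hypothetical names, and the FMx/PRSNT presence sensors are skipped. In the IPMI output shown here, no TACH sensor reads below its threshold.

```python
def parse_ipmi_fans(text: str) -> dict:
    """Map each */TACH sensor to (reading, lower non-recoverable threshold)."""
    def num(s):
        return None if s == "na" else float(s)

    fans = {}
    for raw in text.splitlines():
        cols = [c.strip() for c in raw.split("|")]
        # Assumed column order: name | reading | units | status | LNR | LCR | ...
        if len(cols) >= 5 and cols[0].endswith("/TACH") and cols[2] == "RPM":
            fans[cols[0]] = (num(cols[1]), num(cols[4]))
    return fans


def summarize(fans: dict):
    """Readings grouped by fan module, plus sensors reading below their LNR."""
    per_module, below = {}, []
    for name, (reading, lnr) in fans.items():
        if reading is None:
            continue
        per_module.setdefault(name.split("/")[0], []).append(reading)
        if lnr is not None and reading < lnr:
            below.append(name)
    return per_module, below


SAMPLE = """\
FM0/F0/TACH | 12100.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM0/F1/TACH | 9100.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM0/F2/TACH | 12300.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM0/F3/TACH | 9000.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM0/PRSNT | 0x2 | discrete | 0x0200| na | na | na | na | na | na
FM2/F0/TACH | 12500.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM2/F1/TACH | 9300.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM2/F2/TACH | 12400.000 | RPM | ok | 1000.000 | na | na | na | na | na
FM2/F3/TACH | 9400.000  | RPM | ok | 1000.000 | na | na | na | na | na
FM2/PRSNT | 0x2 | discrete | 0x0200| na | na | na | na | na | na
"""

fans = parse_ipmi_fans(SAMPLE)
per_module, below = summarize(fans)
for module, readings in per_module.items():
    print(module, "min", min(readings), "max", max(readings))
print("below threshold:", below)
```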

 

4. Check and replace the fans that are reported as faulty and do not report any RPM value (for example, a tachometer reading of 0 RPM).

5. If the fan speeds (in revolutions per minute, RPM) reported by -> show -o table -level all /SYS are normal, proceed as follows:

A. Clear the faults from Solaris FMA and from the service processor (ILOM prompt), as described in:

Commands To Clear FMA faults on the T5-x, T7-x, S7-x Servers (Doc ID 2216293.1)

B. Reset the service processor:

-> reset /SP

C. If the issue persists, open a Service Request with Oracle Support.

6. If all fans are reported as working normally but SPT-8002-QD is still reported, then in rare cases the Left Indicator Assembly (FRU) needs to be replaced.

A faulty Left Indicator Assembly can also cause the server to power down unexpectedly, even though nobody physically pressed the power button.
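The decision flow of the troubleshooting steps above can be summarized as a small function. This is a sketch only: FanReading, Action and recommend are hypothetical names, the 1000 RPM default comes from the lower_nonrecov_threshold in the example output, and escalating when readings are out of range without matching the "faulted and no RPM value" case is an assumption that this note does not cover.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


@dataclass
class FanReading:
    name: str             # for example "/SYS/MB/FM0/F1"
    rpm: Optional[float]  # None when the sensor reports no value
    faulted: bool         # listed by show faulty / fmadm faulty


class Action(Enum):
    REPLACE_FANS = "replace the faulted fans that report no RPM value"
    CLEAR_AND_RESET = "clear FMA faults (Doc ID 2216293.1), then reset /SP"
    OPEN_SR = "open a Service Request (rarely: Left Indicator Assembly)"
    NONE = "no further action"


def recommend(fans: List[FanReading], cleared_and_reset: bool,
              qd_still_reported: bool,
              lower_nonrecov: float = 1000.0) -> Tuple[Action, List[str]]:
    # Faulted fans with no RPM value (missing or 0 RPM) are replaced first.
    dead = [f.name for f in fans if f.faulted and (f.rpm is None or f.rpm == 0)]
    if dead:
        return Action.REPLACE_FANS, dead
    # Clearing and resetting applies only when every reading looks normal.
    speeds_ok = all(f.rpm is not None and f.rpm >= lower_nonrecov for f in fans)
    if not speeds_ok:
        # Not covered by this note (assumption): escalate instead of guessing.
        return Action.OPEN_SR, []
    if not cleared_and_reset:
        return Action.CLEAR_AND_RESET, []
    if qd_still_reported:
        # Fans are fine but SPT-8002-QD persists after clear and reset.
        return Action.OPEN_SR, []
    return Action.NONE, []


fans = [FanReading("/SYS/MB/FM0/F0", 11500.0, False),
        FanReading("/SYS/MB/FM0/F1", 0.0, True)]
print(recommend(fans, cleared_and_reset=False, qd_still_reported=True))
```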

 

 

References

<NOTE:2077387.1> - SRDC - SPARC T3-x, T4-x, T5-x, T7-x, S7-x, T8-x servers: Simple instructions to collect ILOM snapshot
<NOTE:2216293.1> - Commands To Clear FMA faults on the T5-x, T7-x, S7-x Servers
<NOTE:2130436.1> - SPT-8002-QD - there are insufficient operational fans present
<NOTE:1120673.1> - SPT-8000-3R - Fan Speed Below Normal Range

  Copyright © 2018 Oracle, Inc.  All rights reserved.