Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1567000.1
Update Date:2017-10-05

Solution Type  Problem Resolution Sure

Solution  1567000.1 :   Multiple Fan Modules failing at the same time, on Sun SPARC Enterprise M4000 and Sun SPARC Enterprise M5000 servers  


Related Items
  • Sun SPARC Enterprise M4000 Server
  • Sun SPARC Enterprise M5000 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Mx000




Applies to:

Sun SPARC Enterprise M5000 Server - Version All Versions and later
Sun SPARC Enterprise M4000 Server - Version All Versions and later
Information in this document applies to any platform.

Symptoms

If the rotation speed of a Fan Module drops below its predefined threshold, or if the fans in a Fan Module stop completely, FMA reports SCF-8005-YH - Fan rotation speed lower than its predefined threshold or fan stopped completely. If multiple Fan Modules report this error at about the same point in time, it is very unlikely that they are all failing independently. Further analysis is needed to determine whether the Fan Modules themselves are failing or whether the Fan Module Controller is failing.

Multiple Fan Modules failing at about the same point in time can be recognized in several data sources on the XSCF:


XSCF> showstatus
    FANBP_C Status:Normal;
*       FAN_A#0 Status:Faulted;
*       FAN_A#2 Status:Faulted;
XSCF>

XSCF> fmdump -e
TIME                 CLASS
.
.
Jun 19 18:09:47.5747 ereport.chassis.device.fan.tooslow
Jun 19 18:10:46.5616 ereport.chassis.device.fan.tooslow
.
.
XSCF>

XSCF> fmdump -V
TIME                 UUID                                 MSG-ID
.
.
Jun 19 18:09:48.6441 5f362716-ad19-46cd-a6ff-24cae9c87a3c SCF-8005-YH

  TIME                 CLASS                                 ENA
  Jun 19 18:09:47.5747 ereport.chassis.device.fan.tooslow    0x59a612a199400001
.
        location = /FAN_A#2
.
.
.
TIME                 UUID                                 MSG-ID
Jun 19 18:10:48.3930 54fdba45-e8a5-496e-a975-9d50720ac2a1 SCF-8005-YH

  TIME                 CLASS                                 ENA
  Jun 19 18:10:46.5616 ereport.chassis.device.fan.tooslow    0x5a81d06c09200001
.
        location = /FAN_A#0.
.
XSCF>

XSCF> showlogs error
Date: Jun 19 18:09:48 CEST 2013    Code: 80002000-ccff0000-0104340600000000
    Status: Alarm                  Occurred: Jun 19 18:09:47.389 CEST 2013
    FRU: /FAN_A#2
    Msg: Abnormal FAN rotation speed. Insufficient rotation
Date: Jun 19 18:10:48 CEST 2013    Code: 80006000-ccff0000-0104340100000000
    Status: Alarm                  Occurred: Jun 19 18:10:46.550 CEST 2013
    FRU: /FAN_A#0
    Msg: Abnormal FAN rotation speed. Insufficient rotation
XSCF>

XSCF> showlogs monitor
.
.
Jun 19 18:09:53 m5000-sum505-p2 Alarm: /FAN_A#2:SCF:Abnormal FAN rotation speed. Insufficient rotation
Jun 19 18:10:50 m5000-sum505-p2 Alarm: /FAN_A#0:SCF:Abnormal FAN rotation speed. Insufficient rotation
.
.
XSCF>
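
The "about the same point in time" check can be sketched in a few lines of Python. This is a minimal illustration, not an Oracle tool: it parses text laid out like the showlogs error output above, pulls out each fan FRU with its Occurred timestamp, and reports whether two different fans alarmed close together. The function names and the 5-minute window are assumptions made for this example; the article does not define a numeric window. The time zone token is ignored, and month names are assumed to be parsed in an English locale.

```python
import re
from datetime import datetime, timedelta

# Sample text copied from the `showlogs error` output shown in this article.
SAMPLE = """\
Date: Jun 19 18:09:48 CEST 2013    Code: 80002000-ccff0000-0104340600000000
    Status: Alarm                  Occurred: Jun 19 18:09:47.389 CEST 2013
    FRU: /FAN_A#2
    Msg: Abnormal FAN rotation speed. Insufficient rotation
Date: Jun 19 18:10:48 CEST 2013    Code: 80006000-ccff0000-0104340100000000
    Status: Alarm                  Occurred: Jun 19 18:10:46.550 CEST 2013
    FRU: /FAN_A#0
    Msg: Abnormal FAN rotation speed. Insufficient rotation
"""

# "Occurred: Jun 19 18:09:47.389 CEST 2013" -> date/time part and year;
# the time zone token in between is skipped.
OCCURRED = re.compile(r"Occurred:\s+(\w{3} +\d+ [\d:.]+) \w+ (\d{4})")
FRU = re.compile(r"FRU:\s+(/FAN_\S+)")


def fan_alarms(text):
    """Return (fru, occurred) tuples for fan entries, in log order."""
    alarms, occurred = [], None
    for line in text.splitlines():
        m = OCCURRED.search(line)
        if m:
            stamp = f"{m.group(1)} {m.group(2)}"
            occurred = datetime.strptime(stamp, "%b %d %H:%M:%S.%f %Y")
            continue
        m = FRU.search(line)
        if m and occurred is not None:
            alarms.append((m.group(1), occurred))
            occurred = None
    return alarms


def clustered(alarms, window=timedelta(minutes=5)):
    """True if two different fans alarmed within `window` (assumed value)."""
    ordered = sorted(alarms, key=lambda a: a[1])
    for (fru_a, t_a), (fru_b, t_b) in zip(ordered, ordered[1:]):
        if fru_a != fru_b and t_b - t_a <= window:
            return True
    return False


if __name__ == "__main__":
    for fru, when in fan_alarms(SAMPLE):
        print(fru, when.isoformat())
    print("clustered:", clustered(fan_alarms(SAMPLE)))
```

A positive result only indicates that the alarms are close together in time; it does not by itself identify the failing part.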

 

 

Cause

The issue is most likely caused by a failing Fan Module Controller rather than by the individual Fan Modules.

Sun SPARC Enterprise M4000

The FANBP_B contains the Fan Module Controllers for two defined sets of fans. One Fan Module Controller controls FAN_A#0 and FAN_B#0; the other controls FAN_A#1 and FAN_B#1. If the Fan Modules of one or both of these pairs fail at about the same point in time, it is very likely that one or both Fan Module Controllers are failing, and FANBP_B needs to be replaced.

Sun SPARC Enterprise M5000

The FANBP_C contains the Fan Module Controllers for two defined sets of fans. One Fan Module Controller controls FAN_A#0 and FAN_A#2; the other controls FAN_A#1 and FAN_A#3. If the Fan Modules of one or both of these pairs fail at about the same point in time, it is very likely that one or both Fan Module Controllers are failing, and FANBP_C needs to be replaced.
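
The controller-to-fan pairing described above can be written down as a small lookup. The following Python sketch is illustrative only: the table is taken from the two paragraphs above, while the function name and the rule "every fan of at least one pair is faulted" are assumptions chosen for this example. It flags a candidate backplane and does not replace the time-proximity check or confirmation by Oracle Support.

```python
# Fan pairs behind each Fan Module Controller, per server model,
# as described in the Cause section of this article.
CONTROLLER_PAIRS = {
    "M4000": ("FANBP_B", [{"FAN_A#0", "FAN_B#0"}, {"FAN_A#1", "FAN_B#1"}]),
    "M5000": ("FANBP_C", [{"FAN_A#0", "FAN_A#2"}, {"FAN_A#1", "FAN_A#3"}]),
}


def suspect_backplane(model, faulted):
    """Return the backplane name if all fans of at least one controller
    pair are in `faulted`, otherwise None (hypothetical helper)."""
    backplane, pairs = CONTROLLER_PAIRS[model]
    # Accept FRU paths with or without the leading slash.
    names = {fru.lstrip("/") for fru in faulted}
    if any(pair <= names for pair in pairs):
        return backplane
    return None


if __name__ == "__main__":
    # Fans marked Faulted in the showstatus example from this article.
    print(suspect_backplane("M5000", ["/FAN_A#0", "/FAN_A#2"]))
```

With the showstatus example from the Symptoms section (FAN_A#0 and FAN_A#2 faulted on an M5000), both fans sit behind the same controller, so the sketch points at FANBP_C. Two faulted fans behind different controllers, such as FAN_A#0 and FAN_A#1, do not match this pattern.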

 

Solution

To have the appropriate Fan Backplane replaced, customers need to create a Service Request in My Oracle Support. The replacement is urgent and should be scheduled immediately: if the second Fan Module Controller also develops a problem, the platform will go down completely.
So that the Oracle Support engineer can confirm the failing part, collect a full snapshot (snapshot -L F) and attach the data file to the Service Request.
Details on how to collect a snapshot can be found in Gathering diagnostic data for Sun SPARC Enterprise[TM] Mx000 (OPL) Servers.

The expected service action (replace FANBP_B on Sun SPARC Enterprise M4000 servers, replace FANBP_C on Sun SPARC Enterprise M5000 servers) requires a complete platform outage.

Be aware that, because of the failing Fan Backplane, secondary errors can remain in the BDB for the Fan Modules behind the broken Fan Module Controller. Affected Fan Modules will have their state changed from Faulted to Degraded. Make sure the Field Engineer runs clearfault against those Fan Modules. Provide the service password if the customer is running XCP1114 or earlier.

 

References

<NOTE:1021830.1> - SCF-8005-YH - Fan rotation speed lower than its predefined threshold or fan stopped completely.
<NOTE:1008229.1> - Gathering diagnostic data for SPARC Enterprise M3000/M4000/M5000/M8000/M9000 (OPL) Servers

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.