Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1021830.1
Update Date: 2018-01-18
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution  1021830.1 :   SCF-8005-YH - Fan rotation speed lower than its predefined threshold or fan stopped completely.  


Related Items
  • Sun SPARC Enterprise M5000 Server
  • Sun SPARC Enterprise M9000-64 Server
  • Sun SPARC Enterprise M3000 Server
  • Sun SPARC Enterprise M9000-32 Server
  • Sun SPARC Enterprise M4000 Server
  • Sun SPARC Enterprise M8000 Server
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun PSH

PreviouslyPublishedAs
276359


Applies to:

Sun SPARC Enterprise M3000 Server
Sun SPARC Enterprise M4000 Server
Sun SPARC Enterprise M5000 Server
Sun SPARC Enterprise M8000 Server
Sun SPARC Enterprise M9000-32 Server
All Platforms

Purpose

Provide additional information for message ID: SCF-8005-YH

Details

Predictive Self-Healing Article
Fan rotation speed lower than its predefined threshold or fan stopped completely.

Type

Fault
  fault.chassis.device.fan.tooslow

Severity

Major

Description

The rotation speed of a fan is lower than its predefined threshold, or a fan has stopped completely.

Automated Response

If there are sufficient fans to maintain normal operation, fan speeds for fans on the rest of the platform are raised. If there are insufficient fans to maintain normal operation, domains are requested to shut down.

Impact

If there are insufficient fans to maintain normal operation, domains are sent shutdown requests. If the fan is in a PSU, the PSU is deconfigured.

Suggested Action for System Administrator

Schedule a repair action to replace the affected Field Replaceable Unit (FRU), the identity of which can be determined using fmdump -v -u EVENT_ID. Please consult the detail section of the knowledge article for additional information.

Details

The rotation speed of a fan is lower than its predefined threshold, or a fan has stopped completely.


   SPARC Enterprise M3000 platforms:

      If the failure is due to a failure of a FAN_A, and if the other FAN_A in the cooling group is operational,
      then this other fan's speed is raised to full speed and the speed of all other fans on the platform is raised to high speed.

      If the failure is due to a failure of a fan in a PSU, and if the other PSU is operational,
      then the speed of the fan in the other PSU and the speed of the FAN_As are raised to high speed.

      If this is the second fan in a cooling group to become non-operational,
      then the SCF driver is used to send a shutdown request to all domains in the system.

      If the domains ignore the shutdown request, then the system will likely encounter overtemperature events.

      Nothing is deconfigured.
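
      The M3000 rules above can be summarized as a small decision function. This is
      only an illustrative Python sketch, not SCF firmware code: the function name,
      the "FAN_A" / "PSU_FAN" labels, and the action strings are assumptions made for
      this example, and peer_ok=False is taken to mean that this is the second fan in
      the cooling group to become non-operational.

```python
from typing import List


def m3000_fan_response(failed: str, peer_ok: bool) -> List[str]:
    """Return the automated response described for an M3000 fan fault.

    failed  -- "FAN_A" or "PSU_FAN" (hypothetical labels used only in this sketch)
    peer_ok -- True if the other FAN_A (or the other PSU) is still operational
    """
    if failed not in ("FAN_A", "PSU_FAN"):
        raise ValueError(f"unknown fan kind: {failed!r}")
    if not peer_ok:
        # Second fan in the cooling group lost: domains are asked to shut down.
        return ["send shutdown request to all domains"]
    if failed == "FAN_A":
        return [
            "raise the other FAN_A to full speed",
            "raise all other fans to high speed",
        ]
    # A PSU fan failed while the other PSU is still operational.
    return [
        "raise the fan in the other PSU to high speed",
        "raise the FAN_As to high speed",
    ]
```

      For example, m3000_fan_response("FAN_A", True) returns the two speed-raising
      actions, while m3000_fan_response("FAN_A", False) returns only the shutdown request.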


   SPARC Enterprise M4000/M5000 platforms:

      If the failure is due to a failure of a FAN_A, and if the other FAN_A in the cooling group is operational,
      then this other fan's speed is raised to full speed and the speed of all other fans on the platform is raised to high speed.

      If the failure is due to a failure of a fan in a PSU, and if all the other fans in the PSUs of the cooling group are operational,
      then the other fan of this PSU has its speed raised to full speed and the speed of all other fans on the platform is raised to high speed.

      If this is the second fan in a cooling group to become non-operational,
      then the SCF driver is used to send a shutdown request to all domains in the system.
    
      If the domains ignore the shutdown request, then the system will likely encounter overtemperature events.
     
   M4000:
      The FANBP_B contains the controllers for both sets of fans.
      The FANBP_A does not include any controllers and is a pass-through board.

   M5000:
      The FANBP_C contains the controllers for both sets of fans.

   If two fans fail at the same time, refer to "Multiple Fan Modules failing at the same time, on Sun SPARC Enterprise M4000 and Sun SPARC Enterprise M5000 servers (Doc ID 1567000.1)" for more information.
  
   SPARC Enterprise M8000 platforms:

      If the failure is due to a failure in a fan tray and if there are sufficient fan trays for normal operation in the cooling group,
      then the fan speed is raised to high for all the fans on the entire platform.  Nothing is deconfigured.

      If the failure is due to a failure in a fan tray and if there are insufficient fan trays for normal operation in the cooling group,
      then the SCF driver is used to send a shutdown request to all domains in the system.

      If the failure is due to a faulty fan in a PSU, then the PSU is powered off and deconfigured.
      If there are insufficient operational PSUs to power the platform, then the platform is powered down.
 
      If the domains ignore the shutdown request, then the system will likely encounter overtemperature events.

      Otherwise, nothing is deconfigured.


   SPARC Enterprise M9000-32 platforms:

      If the failure is due to a failure in a fan tray and if there are sufficient fan trays for normal operation in the cooling group,
      then the fan speed is raised to high for all the fans on the entire platform.  Nothing is deconfigured.

      If the failure is due to a failure in a fan tray and if there are insufficient fan trays for normal operation in the cooling group,
      then the SCF driver is used to send a shutdown request to all domains in the system.

      If the failure is due to a faulty fan in a PSU, then the PSU is powered off and deconfigured.
      If there are insufficient operational PSUs to power the platform, then the platform is powered down.
 
      If the domains ignore the shutdown request, then the system will likely encounter overtemperature events.

      Otherwise, nothing is deconfigured.


   SPARC Enterprise M9000-64 platforms:

      If the failure is due to a failure in a fan tray and if there are sufficient fan trays for normal operation in the cooling group,
      then the fan speed is raised to high for all the fans in the entire cabinet.

      If the failure is due to a failure in a fan tray and if there are insufficient fan trays for normal operation in the cooling group,
      then the SCF driver is used to send a shutdown request to all domains in the system. 

      If the failure is due to a faulty fan in a PSU, then the PSU is powered off and deconfigured.
      If there are insufficient operational PSUs to power the platform, then the platform is powered down.
 
      If the domains ignore the shutdown request, then the system will likely encounter overtemperature events.

      Otherwise, nothing is deconfigured.
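
      The fan tray rules for the M8000, M9000-32 and M9000-64 follow the same pattern
      and can be sketched as below. This is a hedged illustration only: the function
      name, the "FAN_TRAY" / "PSU_FAN" labels, the parameter names, and the action
      strings are assumptions introduced for the example and do not come from the SCF
      firmware.

```python
from typing import List


def fan_tray_platform_response(
    failed: str,
    sufficient_trays: bool = True,
    sufficient_psus: bool = True,
) -> List[str]:
    """Return the automated response described for M8000/M9000 fan faults.

    failed           -- "FAN_TRAY" or "PSU_FAN" (hypothetical labels)
    sufficient_trays -- enough fan trays remain in the cooling group
    sufficient_psus  -- enough PSUs remain to power the platform
    """
    if failed == "FAN_TRAY":
        if sufficient_trays:
            # Scope is the entire platform on the M8000 and M9000-32,
            # and the entire cabinet on the M9000-64.
            return ["raise all fans to high speed"]
        return ["send shutdown request to all domains"]
    if failed == "PSU_FAN":
        actions = ["power off and deconfigure the PSU"]
        if not sufficient_psus:
            actions.append("power down the platform")
        return actions
    raise ValueError(f"unknown fan kind: {failed!r}")
```

      Unlike the M3000 case, a faulty PSU fan on these platforms always removes the
      PSU from the configuration, so the PSU count has to be checked as well.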


The recommended service action for this event is to schedule the replacement of the affected FRU.


Step 1. Collect the fault message (use one of the following methods):


   Single-line fault message displayed on the XSCF console:

   Mar 20 21:37:49 san-ff2-21-0 fmd: SOURCE: sde, REV: 1.12, CSN: 7860000772  
   EVENT-ID: 14de52b7-c016-445d-a041-6cd166b283c5
   Refer to http://www.sun.com/msg/SCF-8005-YH for detailed information.
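
   If console messages are being collected into a log file, the EVENT-ID can be
   pulled out programmatically before it is passed to fmdump. The following is a
   minimal Python sketch for use on a workstation holding a saved copy of the log;
   the function name and the assumption that the ID is a standard 36-character
   UUID following "EVENT-ID:" are illustrative and based only on the sample above.

```python
import re
from typing import Optional

# Matches "EVENT-ID: <uuid>" as it appears in the sample console message.
EVENT_ID_RE = re.compile(
    r"EVENT-ID:\s*([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})",
    re.IGNORECASE,
)


def extract_event_id(text: str) -> Optional[str]:
    """Return the first EVENT-ID found in text, or None if there is none."""
    match = EVENT_ID_RE.search(text)
    return match.group(1) if match else None


sample = (
    "Mar 20 21:37:49 san-ff2-21-0 fmd: SOURCE: sde, REV: 1.12, CSN: 7860000772\n"
    "EVENT-ID: 14de52b7-c016-445d-a041-6cd166b283c5\n"
)
event_id = extract_event_id(sample)
```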


   Complete fault message using 'fmdump -m' on the XSCF console:

   MSG-ID: SCF-8005-YH, TYPE: Fault, VER: 1, SEVERITY: Major
   EVENT-TIME: Tue Mar 20 21:37:49 UTC 2007
   PLATFORM: SPARC-Enterprise, CSN: 7860000772, HOSTNAME: san-ff2-21-0
   SOURCE: sde, REV: 1.12
   EVENT-ID: 14de52b7-c016-445d-a041-6cd166b283c5
   DESC: The fan rotation speed for a fan is lower is than its predefined threshold or a fan
   has stopped completely.
   Refer to http://www.sun.com/msg/SCF-8005-YH for more information.
   AUTO-RESPONSE: If there are sufficient fans to maintain normal operation, fan speeds for fans on the
   rest of the platform are raised. If there are insufficient fans to maintain normal
   operation, domains are requested to shut down.
   IMPACT: If there are insufficient fans to maintain normal operation, domains are sent shutdown requests.
   If the fan is in a PSU, the PSU is deconfigured.
   REC-ACTION: Schedule a repair action to replace the affected Field Replaceable Unit (FRU),
   the identity of which can be determined using fmdump -v -u EVENT_ID.
   Please consult the detail section of the knowledge article for additional information.
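
   The comma-separated header lines of the complete message (MSG-ID, PLATFORM,
   SOURCE) can be split into fields for triage, for example to confirm the
   severity. This Python sketch is illustrative only: the function name is
   hypothetical, and it assumes the "KEY: value, KEY: value" layout seen in the
   sample. It is not intended for the free-text DESC, AUTO-RESPONSE, IMPACT or
   REC-ACTION lines, which contain commas of their own.

```python
from typing import Dict


def parse_header_line(line: str) -> Dict[str, str]:
    """Split a 'KEY: value, KEY: value' header line into a dictionary."""
    fields: Dict[str, str] = {}
    for part in line.split(", "):
        # Split on the first ": " only, so values may contain colons.
        key, sep, value = part.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


header = parse_header_line(
    "   MSG-ID: SCF-8005-YH, TYPE: Fault, VER: 1, SEVERITY: Major"
)
```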


Step 2. Collect the output from the 'fmdump -v -u EVENT_ID' command

   
   SPARC Enterprise platform example:
 
   xscf> fmdump -v -u 14de52b7-c016-445d-a041-6cd166b283c5

   TIME                 UUID                                 MSG-ID
   Mar 20 21:37:49.0860 14de52b7-c016-445d-a041-6cd166b283c5 SCF-8005-YH
     100%  fault.chassis.device.fan.tooslow

           Problem in: hc:///chassis=0/psu=0/fan=0
              Affects: hc:///chassis=0/psu=0/fan=0
                  FRU: hc://:product-id=SPARC-Enterprise:chassis-id=7860000772:
                       server-id=san-ff2-21-0/component=/PSU#0
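
   In the example above, the FRU to replace is the value after "component=" on
   the FRU line, here /PSU#0. Because the FRU value wraps onto a second line, a
   script reading saved fmdump output has to join the lines first. The sketch
   below shows one way to do that in Python; the function name is hypothetical,
   and it assumes the output contains a single fault record laid out like the
   sample.

```python
import re
from typing import Optional


def extract_fru_component(output: str) -> Optional[str]:
    """Return the component named on the FRU line, e.g. '/PSU#0'."""
    _, found, tail = output.partition("FRU:")
    if not found:
        return None
    # Remove all whitespace so the wrapped FRU value becomes one string.
    joined = "".join(tail.split())
    match = re.search(r"component=([^:]+)", joined)
    return match.group(1) if match else None


sample = (
    "        Problem in: hc:///chassis=0/psu=0/fan=0\n"
    "           Affects: hc:///chassis=0/psu=0/fan=0\n"
    "               FRU: hc://:product-id=SPARC-Enterprise:chassis-id=7860000772:\n"
    "                    server-id=san-ff2-21-0/component=/PSU#0\n"
)
fru = extract_fru_component(sample)
```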



Step 3. Contact your Authorized Service Provider.



If you require additional information, please refer to Document 1002526.1.



Product
Other Server Models 1000-9999

Product_uuid
41751310-63d5-11d7-9179-89d0596b661d


  Copyright © 2018 Oracle, Inc.  All rights reserved.