Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1543162.1
Update Date:2018-05-30

Solution Type  Technical Instruction Sure

Solution  1543162.1 :   How to Identify and Diagnose Fan Failures on Sun SPARC Enterprise T5120/T5140/T5220/T5240/T5440 and Netra T5220/T5440  


Related Items
  • Sun SPARC Enterprise T5220 Server
  • Sun SPARC Enterprise T5240 Server
  • Sun Netra T5220 Server
  • Sun SPARC Enterprise T5140 Server
  • Sun Netra T5440 Server
  • Sun SPARC Enterprise T5120 Server
  • Exadata Database Machine V2
  • Sun SPARC Enterprise T5440 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T5xx0

In this Document
Goal
Solution
References


Applies to:

Sun Netra T5220 Server - Version All Versions and later
Sun Netra T5440 Server - Version All Versions and later
Exadata Database Machine V2 - Version All Versions and later
Sun SPARC Enterprise T5140 Server - Version All Versions and later
Sun SPARC Enterprise T5120 Server - Version All Versions and later
Information in this document applies to any platform.

Goal

This document explains how to identify and diagnose fan failures on Sun SPARC Enterprise T5120 / T5140 / T5220 / T5240 / T5440 and Netra T5220 / T5440 servers.

A fan failure is typically first noticed when the /SYS/FAN_FAULT LED indicator is ON.

Solution

First determine which fan module is faulty or has failed, keeping in mind that a system usually contains multiple fan modules.

Use the server's service processor to run diagnostics and identify the fan module. The ILOM and ALOM commands below produce output similar to the examples that follow.
*Note: The service processors of the T5xx0 family of servers are configured by default to operate under an ILOM shell. You can create an ALOM compatibility shell if you prefer to use commands that resemble ALOM CMT commands to administer your server.

For more information, see How to Configure the ILOM for ALOM Compatibility Shell on T5120/T5220/T5140/T5240/T5440/T6320/T6340/Netra T5220/Netra T5440/Netra T6340 (Doc ID 1543116.1).

Examples:

For ILOM commands:

-> show faulty
Target              | Property               | Value
--------------------+------------------------+-------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/FANBD0/FM1   <---
/SP/faultmgmt/0     | timestamp              | Dec 14 23:01:32
/SP/faultmgmt/0/    | timestamp              | Dec 14 23:01:32
faults/0            |                        |
/SP/faultmgmt/0/    | sp_detected_fault      | TACH at /SYS/FANBD0/FM1/F0 has   <---
faults/0            |                        | exceeded low non-recoverable
                    |                        | threshold.

-> show /SYS/FANBD0/FM1

 /SYS/FANBD0/FM1
    Targets:
        PRSNT
        SERVICE
        F0
        F1

    Properties:
        type = Front Fan
        fault_state = Faulted    <---
        clear_fault_action = (none)

-> show /SYS/FANBD0/FM1/F0/TACH

 /SYS/FANBD0/FM1/F0/TACH
    Targets:

    Properties:
        type = Fan
        class = Threshold Sensor
        value = 0000.000 RPM    <---
        upper_nonrecov_threshold = N/A
        upper_critical_threshold = N/A
        upper_noncritical_threshold = N/A
        lower_noncritical_threshold = 5000.00 RPM
        lower_critical_threshold = N/A
        lower_nonrecov_threshold = 3000.00 RPM
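
The comparison behind this fault can be illustrated with a short Python sketch. It is not ILOM code: the helper names (parse_properties, rpm, classify_tach) are hypothetical, and the only data used is the sample output above. The "<=" comparison follows the "reading 0 <= threshold" wording seen in the event log entries.

```python
import re

# Sample text copied from the "show /SYS/FANBD0/FM1/F0/TACH" output above.
SAMPLE = """\
 /SYS/FANBD0/FM1/F0/TACH
    Targets:

    Properties:
        type = Fan
        class = Threshold Sensor
        value = 0000.000 RPM
        upper_nonrecov_threshold = N/A
        upper_critical_threshold = N/A
        upper_noncritical_threshold = N/A
        lower_noncritical_threshold = 5000.00 RPM
        lower_critical_threshold = N/A
        lower_nonrecov_threshold = 3000.00 RPM
"""


def parse_properties(text):
    """Return {property: value string} for every 'name = value' line."""
    props = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if m:
            props[m.group(1)] = m.group(2)
    return props


def rpm(value):
    """Convert a string such as '5000.00 RPM' to a float; None for 'N/A'."""
    m = re.match(r"([0-9.]+)\s*RPM$", value)
    return float(m.group(1)) if m else None


def classify_tach(props):
    """Hypothetical helper: compare the reading with the lower thresholds."""
    reading = rpm(props["value"])
    nonrecov = rpm(props["lower_nonrecov_threshold"])
    noncrit = rpm(props["lower_noncritical_threshold"])
    if reading is None:
        return "unknown"
    if nonrecov is not None and reading <= nonrecov:
        return "below lower non-recoverable threshold"
    if noncrit is not None and reading <= noncrit:
        return "below lower non-critical threshold"
    return "ok"


print(classify_tach(parse_properties(SAMPLE)))
```

For the sample above, the reading of 0 RPM is at or below the 3000 RPM lower non-recoverable threshold, which matches the fault text reported by show faulty.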

 
For ALOM commands:

sc> showenvironment (ILOM equivalent is -> show -o table -level all /SYS)

~snip~

/SYS/LOCATE                    /SYS/SERVICE                   /SYS/ACT
OFF                            ON   <---                      ON

/SYS/PS_FAULT                  /SYS/TEMP_FAULT                /SYS/FAN_FAULT
OFF                            OFF                            ON   <---

~snip~

Fan Status:
--------------------------------------------------------------------------------
Fans (Speeds Revolution Per Minute):
Sensor                         Status       Speed     Warn      Low
--------------------------------------------------------------------------------
/SYS/FANBD0/FM0/F0/TACH        OK            6300     5000     3000
/SYS/FANBD0/FM0/F1/TACH        OK            6300     4000     2400
/SYS/FANBD0/FM1/F1/TACH        OK            6300     5000     3000
/SYS/FANBD0/FM1/F0/TACH        FAILED           0     5000     3000   <---
/SYS/FANBD1/FM0/F1/TACH        OK            7300     5000     3000
/SYS/FANBD1/FM0/F0/TACH        OK            7000     5000     3000
/SYS/FANBD1/FM1/F1/TACH        OK            6800     5000     3000
/SYS/FANBD1/FM1/F0/TACH        OK            7100     5000     3000
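
When showenvironment output has been captured to a file, the fan table can be scanned with a few lines of Python. The following is a minimal sketch under stated assumptions: the function name suspect_fans is hypothetical, the five-column layout is taken from the sample above, and the "<---" markers are treated as annotations from this document rather than real command output.

```python
# A few rows copied from the "Fan Status" table above.
SHOWENVIRONMENT_FANS = """\
Sensor                         Status       Speed     Warn      Low
--------------------------------------------------------------------------------
/SYS/FANBD0/FM0/F0/TACH        OK            6300     5000     3000
/SYS/FANBD0/FM1/F1/TACH        OK            6300     5000     3000
/SYS/FANBD0/FM1/F0/TACH        FAILED           0     5000     3000
/SYS/FANBD1/FM0/F1/TACH        OK            7300     5000     3000
"""


def suspect_fans(text):
    """Return (sensor, status, speed) for rows that are not OK or at/below Low."""
    suspects = []
    for line in text.splitlines():
        # Drop the "<---" annotation used in this document, if present.
        fields = line.replace("<---", "").split()
        # Data rows have exactly five fields and start with a TACH sensor path.
        if len(fields) != 5 or not fields[0].endswith("/TACH"):
            continue
        sensor, status, speed, warn, low = fields
        if status != "OK" or int(speed) <= int(low):
            suspects.append((sensor, status, int(speed)))
    return suspects


for sensor, status, speed in suspect_fans(SHOWENVIRONMENT_FANS):
    print(f"{sensor}: {status} at {speed} RPM")
```

Checking both the Status column and the Speed against the Low column is a design choice of this sketch, so that a stalled fan is still listed even if the status text differs from the sample.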

 
From ALOM command:

sc> showfru
~snip~
Component     : /SYS/FANBD0
Time Stamp    : Thu, Dec 21 2000 12:17:06 GMT
New_Status    : 0x10 (PROXIED FAULT)
Old_Status    : 0x10 (PROXIED FAULT)
Initiator     : SCAPP
Component     : 50
Message       : TACH at /SYS/FANBD0/FM1/F0 has exceeded low non-recoverable threshold.  <---
~snip~

 

From ALOM command:

sc> showfaults -v (ILOM equivalent is -> show faulty)

Last POST Run: Mon Oct 23 13:57:23 2000

Post Status: Passed all devices
 ID Time                           FRU               Class             Fault
  1 Dec 21 12:16:34                /SYS/FANBD0/FM0                     SP detected fault: TACH at /SYS/FANBD0/FM1/F0 has exceeded low non-recoverable threshold.
 

From ALOM command:

sc> showlogs -v (ILOM equivalent is -> show /SP/logs/event/list)

Log entries since Jul 30 22:17:27
----------------------------------
Jul 30 22:17:27: IPMI |critical: "ID = 57 : 07/30/2011 : 22:17:27 : Fan : /FB0/FM1/F0/TACH : Lower Non-recoverable going low : reading 0 <= threshold 2400 RPM"
Jul 30 22:27:28: IPMI |critical: "ID = 58 : 07/30/2011 : 22:17:28 : Fan : /FB0/FM1/F0/TACH : Lower Non-recoverable going low : reading 0 <= threshold 2400 RPM"

 

At the operating system level, the fan failure may also be reported in the /var/adm/messages file(s), and it can be queried with the prtdiag -v command.

Example from /var/adm/messages

Sep 25 15:15:11 abcd0123 SC Alert: [ID 652077 daemon.notice] IPMI | minor: ID = 179 : 09/25/2012 : 13:49:15 : Fan : /FB0/FM1/F0/TACH : Lower Non-critical going low : reading 0 <= threshold 4000 RPM
Sep 25 15:15:23 abcd0123 SC Alert: [ID 586125 daemon.alert] IPMI | critical: ID = 17a : 09/25/2012 : 13:49:28 : Fan : /FB0/FM1/F0/TACH : Lower Non-recoverable going low : reading 0 <= threshold 2400 RPM
Sep 25 15:15:31 abcd0123 SC Alert: [ID 988124 daemon.alert] Fault | critical: SP detected fault at time Tue Sep 25 13:49:36 2012. TACH at /SYS/FANBD0/FM1/F0 has reached low non-recoverable threshold.
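
To pull the fan threshold events out of a large messages file, a regular expression built from the sample lines above can be used. This is a sketch only: the pattern and the name fan_events are assumptions derived from these three sample lines (and from the showlogs -v sample, which wraps the event text in double quotes), not a documented log format.

```python
import re

# Sample lines copied from the /var/adm/messages example above.
MESSAGES = """\
Sep 25 15:15:11 abcd0123 SC Alert: [ID 652077 daemon.notice] IPMI | minor: ID = 179 : 09/25/2012 : 13:49:15 : Fan : /FB0/FM1/F0/TACH : Lower Non-critical going low : reading 0 <= threshold 4000 RPM
Sep 25 15:15:23 abcd0123 SC Alert: [ID 586125 daemon.alert] IPMI | critical: ID = 17a : 09/25/2012 : 13:49:28 : Fan : /FB0/FM1/F0/TACH : Lower Non-recoverable going low : reading 0 <= threshold 2400 RPM
Sep 25 15:15:31 abcd0123 SC Alert: [ID 988124 daemon.alert] Fault | critical: SP detected fault at time Tue Sep 25 13:49:36 2012. TACH at /SYS/FANBD0/FM1/F0 has reached low non-recoverable threshold.
"""

# Pattern inferred from the samples; the optional quote covers showlogs output.
FAN_EVENT = re.compile(
    r'IPMI \|\s*(?P<severity>\w+): "?ID = (?P<id>\w+) : (?P<date>\S+) : (?P<time>\S+)'
    r" : Fan : (?P<sensor>\S+) : (?P<event>.+?)"
    r" : reading (?P<reading>\d+) <= threshold (?P<threshold>\d+) RPM"
)


def fan_events(text):
    """Return one dict per IPMI fan threshold event found in the text."""
    events = []
    for line in text.splitlines():
        m = FAN_EVENT.search(line)
        if m:
            event = m.groupdict()
            event["reading"] = int(event["reading"])
            event["threshold"] = int(event["threshold"])
            events.append(event)
    return events


for e in fan_events(MESSAGES):
    print(e["severity"], e["sensor"], e["event"], e["reading"], e["threshold"])
```

The third sample line (the "Fault | critical" entry) is deliberately not matched, because it is the service processor's fault summary rather than an IPMI sensor event.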

 
Example of prtdiag -v output

System Configuration:  Oracle Corporation  sun4v SPARC Enterprise T5120
Memory size: 16256 Megabytes
~snip~
============================ Environmental Status ============================
Fan sensors:
----------------------------------------------------------------
Location                           Sensor             Status    
----------------------------------------------------------------
SYS/FANBD0/FM1/F0                  TACH               failed (0rpm )   <---
SYS/FANBD0/FM1/F1                  TACH               ok
SYS/FANBD1/FM0/F0                  TACH               ok
SYS/FANBD1/FM0/F1                  TACH               ok
SYS/FANBD1/FM1/F0                  TACH               ok
SYS/FANBD1/FM1/F1                  TACH               ok
SYS/FANBD1/FM2/F0                  TACH               ok
SYS/FANBD1/FM2/F1                  TACH               ok
~snip~
LEDs:
----------------------------------------------------------------
Location                           LED                State   
----------------------------------------------------------------
SYS                                SERVICE            steady  <---
SYS                                LOCATE             off
SYS                                ACT                steady
SYS                                PS_FAULT           off
SYS                                TEMP_FAULT         off
SYS                                FAN_FAULT          steady  <---
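
The "Fan sensors" section of saved prtdiag -v output can be checked in the same way. The sketch below assumes the layout shown in the sample above (a "Fan sensors:" heading followed by Location, Sensor and Status columns); the function name failed_fan_sensors is hypothetical.

```python
# Excerpt copied from the prtdiag -v example above.
PRTDIAG = """\
============================ Environmental Status ============================
Fan sensors:
----------------------------------------------------------------
Location                           Sensor             Status
----------------------------------------------------------------
SYS/FANBD0/FM1/F0                  TACH               failed (0rpm )
SYS/FANBD0/FM1/F1                  TACH               ok
SYS/FANBD1/FM0/F0                  TACH               ok
LEDs:
----------------------------------------------------------------
Location                           LED                State
----------------------------------------------------------------
SYS                                SERVICE            steady
SYS                                FAN_FAULT          steady
"""


def failed_fan_sensors(text):
    """Return (location, sensor, status) for fan rows whose status is not ok."""
    failed = []
    in_fans = False
    for line in text.splitlines():
        stripped = line.replace("<---", "").strip()
        # A line ending in ':' starts a new section such as "Fan sensors:".
        if stripped.endswith(":"):
            in_fans = stripped == "Fan sensors:"
            continue
        if not in_fans or not stripped.startswith("SYS/"):
            continue
        parts = stripped.split(None, 2)
        if len(parts) < 3:
            continue
        location, sensor, status = parts
        if not status.startswith("ok"):
            failed.append((location, sensor, status))
    return failed


for location, sensor, status in failed_fan_sensors(PRTDIAG):
    print(f"{location} {sensor}: {status}")
```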

 

Conclusion from the example output shown above:

The sample command output shows that the system fan at SYS/FANBD0/FM1/F0 has failed, so the fan module that contains it (/SYS/FANBD0/FM1, the FRU reported by show faulty) needs to be replaced.
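
The step from a failed sensor to the module to replace can be written down as a small Python helper. This is an illustrative sketch: the name fan_module_for is hypothetical, and the path pattern is inferred only from the sample paths in this document, where show faulty reports the FRU /SYS/FANBD0/FM1 for the TACH sensor at /SYS/FANBD0/FM1/F0.

```python
import re

# Pattern inferred from the sample paths: fan board, fan module, then an
# optional fan (F0, F1) and an optional TACH sensor component.
FAN_PATH = re.compile(r"/?(SYS/FANBD\d+/FM\d+)(?:/F\d+)?(?:/TACH)?$")


def fan_module_for(sensor_path):
    """Return the fan module path (for example /SYS/FANBD0/FM1) for a sensor path."""
    m = FAN_PATH.match(sensor_path.strip())
    if not m:
        raise ValueError(f"not a fan sensor path: {sensor_path!r}")
    return "/" + m.group(1)


# Paths as they appear in prtdiag -v and in showenvironment, respectively.
print(fan_module_for("SYS/FANBD0/FM1/F0"))
print(fan_module_for("/SYS/FANBD0/FM1/F0/TACH"))
```

Both calls print /SYS/FANBD0/FM1. The shortened /FB0/... sensor names seen in the IPMI log entries are not handled here, because this document does not state how they map to /SYS paths.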

References

<NOTE:1006084.1> - Advanced Lights Out Manager (ALOM) Commands
<NOTE:1543116.1> - How to Configure the ILOM for ALOM Compatibility Shell on T5120/T5220/T5140/T5240/T5440/T6320/T6340/Netra T5220/Netra T5440/Netra T6340
<NOTE:1443791.1> - Hot Insertion / Hot Removal Messages are Being Reported when Systems are in Normal Operation on T5xx0 Platforms
<NOTE:1155200.1> - PSH Procedural Article for ILOM-Based Diagnosis
<NOTE:1009715.1> - Integrated Lights Out Manager (ILOM) CLI Quick Reference
<NOTE:1307829.1> - How to Replace a T5120/T5220 and T5140/T5240 Fan Power Board [VCAP]

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.