Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1984183.1
Update Date: 2016-06-24
Keywords:

Solution Type  Problem Resolution Sure

Solution  1984183.1 :   IO PRIM_FAN and SEC_FAN Tray Failure on Sun Fire V890/V880 Server  


Related Items
  • Sun Fire V890 Server
  • Sun Fire V880 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Workgroup Servers>SN-SPARC: SF-Vx90

In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-10215273171>

Applies to:

Sun Fire V890 Server - Version All Versions to All Versions [Release All Releases]
Sun Fire V880 Server - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on SPARC (64-bit)

FAN FAILURE BEHAVIOR IN SUN FIRE V890 SERVER
============================================

Only the primary fan trays are running during normal system operation. If a primary fan tray fails, the environmental monitoring subsystem detects the failure and automatically activates the secondary fan tray.

If any primary fan fails, the monitoring subsystem detects the failure and performs the following:

■ Generates an error message and logs it in the /var/adm/messages file
■ Lights the System Fault and Thermal Fault LEDs on the status and control panel
■ Lights the appropriate Fan Fault LED inside the system
■ Automatically activates the appropriate secondary fan tray (if enabled)

Symptoms


The issue was reported as follows:

During a walk-through, it was noted that the ATTENTION light and the HIGH TEMP alarm were on.

The following errors were reported in the messages file:

Feb 3 02:41:14 omzitrs02 picld[389]: [ID 300385 daemon.error] Secondary fan failure, device IO1_SEC_FAN
Feb 3 05:33:56 omzitrs02 picld[389]: [ID 300385 daemon.error] Secondary fan failure, device IO1_SEC_FAN
Feb 3 05:39:24 omzitrs02 picld[389]: [ID 300385 daemon.error] Secondary fan failure, device IO1_SEC_FAN
Feb 3 05:41:13 omzitrs02 picld[389]: [ID 300385 daemon.error] Secondary fan failure, device IO1_SEC_FAN
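
When the messages file is long, it can help to pull out only the picld fan-failure entries and count them per device. The following is a minimal sketch in Python, not part of the original solution. The regular expression is modeled only on the four lines quoted above; the names fan_failures and FAN_FAILURE are hypothetical, and the word before "fan failure" is captured generically because only the "Secondary" wording appears in this document.

```python
import re
from collections import Counter

# Pattern modeled on the picld lines quoted above. The word before
# "fan failure" is captured generically; only "Secondary" is seen here.
FAN_FAILURE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+picld\[\d+\]:.*?"
    r"(?P<kind>\w+) fan failure, device (?P<device>\S+)"
)

def fan_failures(lines):
    """Return (timestamp, host, kind, device) for each picld fan-failure line."""
    events = []
    for line in lines:
        match = FAN_FAILURE.search(line)
        if match:
            events.append(match.group("stamp", "host", "kind", "device"))
    return events

# Sample input: the four lines from this case. On a live system the lines
# would come from /var/adm/messages instead.
sample = """\
Feb 3 02:41:14 omzitrs02 picld[389]: [ID 300385 daemon.error] Secondary fan failure, device IO1_SEC_FAN
Feb 3 05:33:56 omzitrs02 picld[389]: [ID 300385 daemon.error] Secondary fan failure, device IO1_SEC_FAN
Feb 3 05:39:24 omzitrs02 picld[389]: [ID 300385 daemon.error] Secondary fan failure, device IO1_SEC_FAN
Feb 3 05:41:13 omzitrs02 picld[389]: [ID 300385 daemon.error] Secondary fan failure, device IO1_SEC_FAN
""".splitlines()

events = fan_failures(sample)
counts = Counter(device for _, _, _, device in events)
for device, count in counts.items():
    print(f"{device}: {count} failure message(s)")
```

Repeated messages for a single device, as in this case, point at one fan tray rather than a system-wide event.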

On examination, the prtdiag output displayed inconsistent information.

<Truncated prtdiag output>
-----------------------------------------

System LED Status:
                  GEN FAULT                REMOVE
                   [ ON]                    [OFF]

                  DISK FAULT               POWER FAULT
                   [OFF]                    [OFF]

                  LEFT THERMAL FAULT       RIGHT THERMAL FAULT
                   [ ON]                    [OFF]

                  LEFT DOOR                RIGHT DOOR
                   [OFF]                    [OFF]

Fan Bank :
----------

Bank                     Speed    Status        Fan State
                      ( RPMS )
----                  --------    ---------     ---------
CPU0_PRIM_FAN             2654    [ENABLED]     OK
CPU1_PRIM_FAN             2727    [ENABLED]     OK
CPU0_SEC_FAN                 0    [DISABLED]    OK
CPU1_SEC_FAN                 0    [DISABLED]    OK
IO0_PRIM_FAN                 0    [DISABLED]    ERROR
IO1_PRIM_FAN                 0    [DISABLED]    ERROR    <------- IO fans reporting error
IO0_SEC_FAN               4000    [ENABLED]     ERROR
IO1_SEC_FAN                  0    [ENABLED]     ERROR    <------- IO1_SEC_FAN not rotating
IO_BRIDGE_PRIM_FAN        3488    [ENABLED]     OK
IO_BRIDGE_SEC_FAN            0    [DISABLED]    OK

Both the primary and secondary IO fans are marked as ERROR. The secondary fan IO0_SEC_FAN is running at 4000 RPM, while IO1_SEC_FAN is not rotating.
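
The kind of inconsistency described above can be spotted mechanically. The following Python sketch is illustrative only: it parses Fan Bank rows in the layout shown in this document and flags rows whose speed, status, and fan state disagree. The function names and the two checks (an enabled fan at 0 RPM, and a fan marked ERROR while spinning) are assumptions chosen to match this case, not rules taken from picld or prtdiag.

```python
import re

# One Fan Bank row: bank name, speed, [STATUS], fan state. Anything after the
# fan state (such as an annotation arrow) is ignored.
ROW = re.compile(
    r"^(?P<bank>\w+_FAN)\s+(?P<rpm>\d+)\s+\[(?P<status>\w+)\]\s+(?P<state>\w+)"
)

def parse_fan_bank(text):
    """Return one dict per fan row found in prtdiag-style text."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            row["rpm"] = int(row["rpm"])
            rows.append(row)
    return rows

def inconsistencies(rows):
    """Flag rows whose speed, status, and state disagree (hypothetical checks)."""
    notes = []
    for row in rows:
        if row["status"] == "ENABLED" and row["rpm"] == 0:
            notes.append(f'{row["bank"]}: enabled but not rotating')
        if row["state"] == "ERROR" and row["rpm"] > 0:
            notes.append(f'{row["bank"]}: marked ERROR but spinning at {row["rpm"]} RPM')
    return notes

# Fan Bank rows from the first prtdiag output in this case.
sample = """\
CPU0_PRIM_FAN        2654  [ENABLED]   OK
CPU1_PRIM_FAN        2727  [ENABLED]   OK
CPU0_SEC_FAN            0  [DISABLED]  OK
CPU1_SEC_FAN            0  [DISABLED]  OK
IO0_PRIM_FAN            0  [DISABLED]  ERROR
IO1_PRIM_FAN            0  [DISABLED]  ERROR
IO0_SEC_FAN          4000  [ENABLED]   ERROR
IO1_SEC_FAN             0  [ENABLED]   ERROR
IO_BRIDGE_PRIM_FAN   3488  [ENABLED]   OK
IO_BRIDGE_SEC_FAN       0  [DISABLED]  OK
"""

rows = parse_fan_bank(sample)
for note in inconsistencies(rows):
    print(note)
```

For this output the sketch flags IO0_SEC_FAN (marked ERROR while spinning) and IO1_SEC_FAN (enabled but not rotating), the same two rows called out in the text.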

RSC logs were unavailable.

Cause

The messages file reported the IO secondary fan as failed.

On these systems, the secondary fan trays do not run unless the primary fan tray has failed or is not present.

 

In this case:
-------------

Both the primary and secondary IO fans were marked as ERROR. The secondary fan IO0_SEC_FAN was running at 4000 RPM, while IO1_SEC_FAN was not rotating.

To rule out a picld issue as the cause of the prtdiag inconsistency, a picld restart was suggested. After the restart, IO1_PRIM_FAN showed a speed of 0 RPM and the secondary fan had not started, which indicates a failure of both the primary and the secondary fan.

The primary suspect was a noisy I2C bus causing picld to report false environmental FRU failures, or a picld daemon issue, because all the IO fans were marked as failed even though one secondary fan was working.

The secondary suspect was an IO fan hardware failure.

Solution

Action plan:
============

Step 1

Eliminate the known issue: restart picld and check the prtdiag output.

For Solaris 10, use the following commands as root:

# svcadm disable -t picl
# svcadm enable picl
# prtdiag -v

For Solaris 8 or 9, use the following commands as root:

# /etc/init.d/picld stop
# /etc/init.d/picld start
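
Because the restart method depends on the Solaris release, a small helper can select the command sequence from the output of uname -r. This Python sketch is an illustration, not part of the original action plan. It assumes that releases from SunOS 5.10 onward use SMF (svcadm), and that earlier releases use the init script, with its start action as the counterpart of stop. The function name picld_restart_commands is hypothetical, and the sketch only builds the command lists; it does not run them.

```python
def picld_restart_commands(release):
    """Return the picld restart command sequence for a `uname -r` string.

    Assumption: SunOS 5.10 and later manage picld through SMF (svcadm);
    earlier releases use the /etc/init.d/picld script.
    """
    major, minor = (int(part) for part in release.split(".")[:2])
    if (major, minor) >= (5, 10):
        return [
            ["svcadm", "disable", "-t", "picl"],
            ["svcadm", "enable", "picl"],
        ]
    return [
        ["/etc/init.d/picld", "stop"],
        ["/etc/init.d/picld", "start"],
    ]

for release in ("5.9", "5.10"):
    for command in picld_restart_commands(release):
        print(release, " ".join(command))
```

Returning argument lists rather than shell strings keeps the sketch safe to pass to a process launcher without shell quoting, should it be extended to execute the commands as root.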

After picld was restarted, prtdiag reported the following:

<Truncated prtdiag output>
--------------------------------

Fan Bank :
----------

Bank                     Speed    Status        Fan State
                      ( RPMS )
----                  --------    ---------     ---------
CPU0_PRIM_FAN             3333    [ENABLED]     OK
CPU1_PRIM_FAN             3370    [ENABLED]     OK
CPU0_SEC_FAN                 0    [DISABLED]    OK
CPU1_SEC_FAN                 0    [DISABLED]    OK
IO0_PRIM_FAN              3896    [ENABLED]     OK
IO1_PRIM_FAN                 0    [ENABLED]     OK       <------- IO1_PRIM_FAN is not rotating
IO0_SEC_FAN                  0    [DISABLED]    OK
IO1_SEC_FAN                  0    [DISABLED]    OK       <------- Secondary IO fan has not started; the fans are not rotating
IO_BRIDGE_PRIM_FAN        3488    [ENABLED]     OK
IO_BRIDGE_SEC_FAN            0    [DISABLED]    OK

 

IO1_PRIM_FAN showed a speed of 0 RPM and the secondary fan had not started, which indicated a failure of both the primary and the secondary fan.
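
The reasoning above, a primary fan enabled at 0 RPM with no running secondary to cover it, can be expressed as a simple pairing check. The Python sketch below is a hypothetical illustration using the post-restart values from this case. The pairing by name (_PRIM_FAN to _SEC_FAN) and the function name suspect_pairs are assumptions made for the example.

```python
def suspect_pairs(rows):
    """rows: iterable of (bank, rpm, status) tuples.

    Return (primary, secondary, secondary_status) for every primary fan that
    is enabled at 0 RPM while its secondary counterpart is also at 0 RPM.
    """
    by_name = {bank: (rpm, status) for bank, rpm, status in rows}
    suspects = []
    for bank, (rpm, status) in by_name.items():
        if not bank.endswith("_PRIM_FAN"):
            continue
        if status != "ENABLED" or rpm > 0:
            continue  # primary is either switched off or actually spinning
        secondary = bank.replace("_PRIM_FAN", "_SEC_FAN")
        sec_rpm, sec_status = by_name.get(secondary, (0, "MISSING"))
        if sec_rpm == 0:
            suspects.append((bank, secondary, sec_status))
    return suspects

# Fan Bank values reported by prtdiag after the picld restart in this case.
after_restart = [
    ("CPU0_PRIM_FAN", 3333, "ENABLED"),
    ("CPU1_PRIM_FAN", 3370, "ENABLED"),
    ("CPU0_SEC_FAN", 0, "DISABLED"),
    ("CPU1_SEC_FAN", 0, "DISABLED"),
    ("IO0_PRIM_FAN", 3896, "ENABLED"),
    ("IO1_PRIM_FAN", 0, "ENABLED"),
    ("IO0_SEC_FAN", 0, "DISABLED"),
    ("IO1_SEC_FAN", 0, "DISABLED"),
    ("IO_BRIDGE_PRIM_FAN", 3488, "ENABLED"),
    ("IO_BRIDGE_SEC_FAN", 0, "DISABLED"),
]

for primary, secondary, sec_status in suspect_pairs(after_restart):
    print(f"{primary} at 0 RPM and {secondary} not running ({sec_status})")
```

A flagged pair is only a pointer for the physical check; software status alone does not confirm a hardware fault.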

--------------------------------------------------------------------------------

A service activity was created for a Field Engineer (FE) with two fan tray FRUs:

Part location: Primary_IO_FT_SLT3
Quantity: 1
Description: PCI I/O Fan Tray (FRU)

Part location: Secondary_IO_FT_SLT4
Quantity: 1
Description: PCI I/O Fan Tray (FRU)

 

Before any hardware replacement, the Field Engineer (FE) was instructed to physically check and verify the server.

If RSC is configured, check the hardware status using:

rsc> environment

From the ok prompt, the FE can check the hardware status using:

ok .env

The FE should also check whether the appropriate Fan Fault LED inside the system is lit.

Confirm the fan failure and proceed with hardware replacement if needed.
--------------------------------------------------------------------------------

In this instance, the FE found that the fault light was on and that there was no fan rotation. The FE replaced both fans and the issue was resolved.

 

References

<NOTE:1506312.1> - Troubleshooting FAN Failures on Sun Fire 280R/V480/V490/V880/V890 Servers
<NOTE:1000325.1> - FAB: Standard: Reactive: Noisy I2C bus causes picld to report false environmental FRU failures.

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.