Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-73-1001307.1
Update Date:2016-12-08
Keywords:

Solution Type  FAB (standard) Sure

Solution  1001307.1 :   Power Supply Fan failures can occur without notification in Sun Fire 3800, 4800, 4810, and 6800 Systems.  


Related Items
  • Sun Fire 4800 Server
  • Sun Fire 4810 Server
  • Sun Fire 6800 Server
  • Sun Fire 3800 Server
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun FAB

PreviouslyPublishedAs
201768


__________
***Checked for relevance on 04-Nov-2014***

Product
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server

SUNBUG:15322255

Part
  • Part No: 300-1529
  • Part Description: AC-48VDC PS - A145E
Part
  • Part No: 300-1460
  • Part Description: AC-48VDC PS - A153
Part
  • Part No: 300-1459
  • Part Description: AC-48VDC PS - A152
Part
  • Part No: 300-1441
  • Part Description: AC-48VDC PS - A145
Xoption
  • Xoption Number: X4303A
  • Xoption Description: A145, AC-48VDC PS
Xoption
  • Xoption Number: X4301A
  • Xoption Description: A153, AC-48VDC PS
Xoption
  • Xoption Number: X4302A
  • Xoption Description: A152, AC-48VDC PS

Impact

When a fan fails on a power supply (PSU) with firmware prior to 5.20.2, the power supply will not normally increase in temperature enough to reach the threshold for a warning message to be issued. Thus there is no indication that a fan on a power supply has failed.

When fans fail on additional power supplies, the temperatures of the affected power supplies may rise enough to trigger the warning messages, but the appearance of these messages may be only a matter of minutes before the platform shuts down because of the rise in temperature.

As a result, affected platforms will shut down with very little warning.


Contributing Factors

Use the "showboards" command from the SC (as shown in the example below) and reference the column labeled "Component Type" to see if the platform has any of the power supply models listed in the parts affected section of this FAB.

sc0:SC> showboards
Slot     Pwr Component Type                 State      Status     Domain
----     --- --------------                 -----      ------     ------
...
PS0      On  A152 Power Supply              -          OK         -
PS1      On  A152 Power Supply              -          OK         -
PS2      On  A152 Power Supply              -          OK         -
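
When the "showboards" output has been captured as text, the check described above could be scripted. The following Python sketch is an illustration only: the function name affected_power_supplies is hypothetical, and the sketch assumes the model token (for example A152) appears directly before the words "Power Supply" in the "Component Type" column, as in the example output above.

```python
import re

# Power supply models listed in the Part/Xoption section of this FAB.
AFFECTED_MODELS = {"A145", "A145E", "A152", "A153"}

# Assumed line layout, taken from the example output:
#   PS0      On  A152 Power Supply              -          OK         -
PS_LINE = re.compile(r"^\s*(PS\d+)\s+\S+\s+(\S+)\s+Power Supply\b")


def affected_power_supplies(showboards_text):
    """Return {slot: model} for power supplies whose model is listed in this FAB."""
    found = {}
    for line in showboards_text.splitlines():
        match = PS_LINE.match(line)
        if match and match.group(2) in AFFECTED_MODELS:
            found[match.group(1)] = match.group(2)
    return found


if __name__ == "__main__":
    sample = (
        "Slot     Pwr Component Type                 State      Status     Domain\n"
        "PS0      On  A152 Power Supply              -          OK         -\n"
        "PS1      On  A152 Power Supply              -          OK         -\n"
    )
    print(affected_power_supplies(sample))
```

An empty result only means that no listed model was recognized in the captured text; it does not replace reading the "Component Type" column directly.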

 


Symptoms

Once fans have failed on multiple power supplies, error messages similar to the following appear, potentially only minutes before the platform shuts down:

Feb  6 19:32:31 sc0 Platform.SC: WARNING: PS2 temperature is approaching max limit of 78C
Feb  6 19:32:32 sc0 Platform.SC: PS2 48 VDC 0 Temp. 0 value: 68 Degrees C
Feb  6 19:32:32 sc0 Platform.SC: Check for abnormal environmental operating conditions.
Feb  6 19:32:32 sc0 Platform.SC: PS2, sensor status, outside acceptable limits (7,1,0x605020b00030000)
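
If the SC messages are being collected on a log host, a small script could flag the over-temperature warning shown above. This is a minimal sketch, not a Sun-supplied tool: the function name find_limit_warnings is hypothetical, and the pattern is copied from the sample message, so the wording on a given firmware release may differ.

```python
import re

# Pattern based on the sample message in this FAB (an assumption about
# the exact wording emitted by other firmware releases).
LIMIT_WARNING = re.compile(
    r"WARNING: (PS\d+) temperature is approaching max limit of (\d+)C"
)


def find_limit_warnings(log_text):
    """Return a list of (power_supply, limit_in_degrees_C) tuples found in the log."""
    return [(m.group(1), int(m.group(2))) for m in LIMIT_WARNING.finditer(log_text)]


if __name__ == "__main__":
    line = (
        "Feb  6 19:32:31 sc0 Platform.SC: WARNING: PS2 temperature is "
        "approaching max limit of 78C"
    )
    print(find_limit_warnings(line))
```

Because this warning may precede a shutdown by only minutes, a match should be treated as urgent rather than as routine monitoring data.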

To determine whether power supply fan failures have contributed to or caused the shutdown of a platform, visually inspect the power supply fans to see if any have stopped spinning or are spinning at a significantly reduced speed.


Root Cause

The affected power supplies lack a feature, present on unaffected power supplies, that shuts down the power supply when its fan fails.

An early warning message is now provided in firmware 5.20.2 and later. This fix is also expected to be backported to 5.19.x, but it is not currently included in 5.19.6.

Extensive testing has determined that the best indicator of a failed power supply fan is one power supply in the platform reporting a temperature at least 10 degrees C above the others. When this occurs with the new firmware installed, the following warning messages are issued:

Feb  6 19:32:31 sc0 Platform.SC: WARNING: PSx temperature is elevated indicating it may have a failed cooling fan.
Feb  6 19:32:32 sc0 Platform.SC: PSx 48 VDC 0 Temp. 0 value: xx Degrees C
Feb  6 19:32:32 sc0 Platform.SC: Contact Sun Support Services to check for PSU fan failure.
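
The 10 degree C comparison described above can be sketched as follows. This is an illustration of the stated rule only, not the firmware's actual implementation: the function name suspect_failed_fan is hypothetical, and the sketch assumes "above the others" means above the hottest of the remaining power supplies.

```python
def suspect_failed_fan(temps_c, margin_c=10):
    """Return the PSU names whose temperature is at least margin_c
    degrees C above every other PSU in the platform.

    temps_c maps a PSU name (e.g. "PS2") to its reported temperature in C.
    """
    suspects = []
    for name, temp in temps_c.items():
        others = [t for other, t in temps_c.items() if other != name]
        # A comparison needs at least one other power supply.
        if others and temp - max(others) >= margin_c:
            suspects.append(name)
    return suspects


if __name__ == "__main__":
    print(suspect_failed_fan({"PS0": 40, "PS1": 41, "PS2": 55}))
```

As with the firmware warning itself, a result from such a comparison only suggests a failed fan; visual verification is still needed.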

 


Resolution

For platforms that contain power supplies affected by this issue:

First, visually inspect fans in the power supplies and replace any that have either stopped spinning or are spinning at a noticeably reduced speed.  If visual inspection is not possible or you are unsure whether a fan is spinning properly, the following observations can be made:

Hold a piece of paper in front of the vent to determine air flow.

  • 3800 Normal PSU fan - Blows air out
  • 3800 PSU fan failure - Sucks air in or no movement of air
  • 4800/6800 Normal PSU fan - Sucks air in
  • 4800/6800 PSU fan failure - No air movement

Second, upgrade to 5.20.2 firmware (available in patch 114527-03 or later) so that a failed power supply fan will produce warning messages before the power supply reaches the over-temperature threshold.

Keep in mind that the new firmware does not have data to positively prove that the power supply fan has failed when it prints the warning. It has recognized a situation for which a failed power supply fan is by far the most likely cause. Visual verification of the slowly spinning or stopped fan(s) is still required to determine root cause.

Third, the new firmware (5.20.2 and later) will detect failing fans; replace a PSU when its fan failure is detected by the firmware and/or verified visually. There is no need to replace PSUs that are functioning properly and that are not identified by the firmware.
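
For the firmware upgrade step above, a version comparison against 5.20.2 could be sketched as shown here. The function name has_early_fan_warning is hypothetical, and the sketch assumes the firmware version is available as a purely numeric dotted string. It does not account for any future 5.19.x release that might carry the backported fix.

```python
def has_early_fan_warning(version):
    """Return True if a dotted firmware version string is 5.20.2 or later.

    Assumes a purely numeric dotted version such as "5.19.6".
    """
    parts = tuple(int(p) for p in version.split("."))
    return parts >= (5, 20, 2)


if __name__ == "__main__":
    for v in ("5.19.6", "5.20.2", "5.20.10"):
        print(v, has_early_fan_warning(v))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "5.20.10" as older than "5.20.2".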


Related Information
  • Other: SRDB 83819
Modification History:

04-Nov-2014: Maintenance check for relevance/currency, no change in content


Previously Published As: 102577
Contributor/submitter: Roy.Stiles@sun.com
Internal Eng Business Unit Group: SSG ES (Enterprise Systems)
Internal Eng Responsible Engineer: Darrell.May@sun.com
Internal Services Knowledge Engineer: Sean.Hassall@sun.com
Responsible Manager: David.Re@sun.com
Internal Resolution Patches: 114527-03
Internal Kasp FAB Legacy ID: 102577
Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date: 2006-09-05
Avoidance: Firmware
Original Admin Info: [WF 05-Sep-2006, Sean Hassall: Patch is now available - sending to Joe for approval]
[WF 23-Aug-2006, Sean Hassall: sending to extended review]
[WF 22-Aug-2006, Sean Hassall: made some minor grammar changes]
Product_uuid
29d05214-0a18-11d6-92b2-a111614865b5|Sun Fire 3800 Server
29d3a694-0a18-11d6-92da-df959df44cdd|Sun Fire 4800 Server
29d6f808-0a18-11d6-8aa8-943929fbbdd8|Sun Fire 4810 Server
29da7938-0a18-11d6-8a41-9ed1ad6d6779|Sun Fire 6800 Server

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.