Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1006233.1
Update Date:2017-09-05
Keywords:

Solution Type  Problem Resolution Sure

Solution  1006233.1 :   Sun Fire[TM] 12K/15K/20K/25K: Interpreting System Management Services(SMS) Failed Power Supply Messages  


Related Items
  • Sun Fire E25K Server
  • Sun Fire 12K Server
  • Sun Fire 15K Server
  • Sun Fire E20K Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: SF-Exxk
  • _Old GCS Categories>Sun Microsystems>Servers>High-End Servers

PreviouslyPublishedAs
208741


Applies to:

Sun Fire E20K Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire E25K Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire 12K Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire 15K Server - Version Not Applicable to Not Applicable [Release N/A]
All Platforms
***Checked for relevance on 27-Dec-2010***


Symptoms

A failed power supply must be replaced to ensure proper platform operation.
Accurate interpretation of the error message is critical.

Cause

Each member of the high-end platform family (12K/15K/E20K/E25K) contains six AC-to-DC power supplies. When a power supply failure is detected, contact your authorized Sun[TM] Service representative to have any failed supplies replaced.

Solution

The SMS "showenvironment -p powers" command displays the status of the six power supplies. Partial output from this command is shown in the example below:

Note: Command output may vary slightly depending on the type of power supply installed in the platform.

POWER  UNIT  AC0  AC1  DC0  DC1  FAN0  FAN1
-----  ----  ---  ---  ---  ---  ----  ----
PS0    OK    OK   OK   ON   ON   OK    FAIL
PS1    OK    OK   OK   ON   ON   OK    OK
PS2    OK    OK   OK   ON   ON   OK    OK
PS3    OK    OK   OK   ON   ON   OK    OK
PS4    OK    OK   OK   ON   ON   OK    OK
PS5    FAIL  OK   OK   ON   ON   OK    OK

This example shows that PS0 has a failed fan (FAN1) and that PS5 has a failed unit status. Messages are logged in the SMS platform messages file (/var/opt/SUNWSMS/adm/platform/messages) at the time of failure. Example messages for the above failures are shown below:
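Before turning to the log messages, note that the status table can also be scanned with a short script. The following is a minimal sketch only: it assumes the partial output has already been captured as text, that columns are whitespace-separated with the header row first, and that any value other than OK or ON deserves attention. The names find_faults, HEALTHY and SAMPLE are hypothetical and are not part of SMS.

```python
# Sample text taken from the example above; real output may vary by supply type.
SAMPLE = """\
POWER UNIT AC0 AC1 DC0 DC1 FAN0 FAN1
----- ---- --- --- --- --- ---- ----
PS0   OK   OK  OK  ON  ON  OK   FAIL
PS1   OK   OK  OK  ON  ON  OK   OK
PS2   OK   OK  OK  ON  ON  OK   OK
PS3   OK   OK  OK  ON  ON  OK   OK
PS4   OK   OK  OK  ON  ON  OK   OK
PS5   FAIL OK  OK  ON  ON  OK   OK
"""

# Assumption: these are the only "healthy" values; anything else is flagged.
HEALTHY = {"OK", "ON"}


def find_faults(text):
    """Return {supply: [column names whose status is not healthy]}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header = rows[0]
    faults = {}
    for fields in rows[1:]:
        # Skip the dashed separator row and any row that does not fit the header.
        if len(fields) != len(header) or set(fields[0]) == {"-"}:
            continue
        bad = [col for col, val in zip(header[1:], fields[1:]) if val not in HEALTHY]
        if bad:
            faults[fields[0]] = bad
    return faults


for supply, columns in find_faults(SAMPLE).items():
    print(supply, "needs attention:", ", ".join(columns))
```

For the sample above, this reports FAN1 on PS0 and UNIT on PS5, matching the reading of the table given in the text.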

Example message 1:

May 1 23:17:12 2004 sc0 esmd[1363]: [1926 17158283261375019 ERR Equipment.cc 604] A power supply failure has been noted on PS at PS5.  For N+1 redundancy, the system configuration requires 19034.00 watts. The power supplies are providing 20000.00 watts.
May 1 23:17:14 2004 sc0 esmd[1363]: [1929 17158284966232407 NOTICE Patrols.cc 1876] PS at PS5 breaker has been tripped: ecode = 0

In the above message, SMS has detected a unit fail status on PS5 and has tripped its breakers. The remaining power supplies are providing 20,000 watts to a system that requires 19,034 watts for N+1 redundancy, so the system is still within N+1 power redundancy.

Example message 2:

May 24 06:32:36 2004 sc0 esmd[1363]: [1925 19085207450945051 ERR Equipment.cc 531] An internal fan failure has been noted in PS at PS0, which is being shutdown. For N+1 redundancy, the system configuration requires 18311.00 watts. The power supplies are providing 16000.00 watts.
May 24 06:32:37 2004 sc0 esmd[1363]: [1929 19085208483622553 NOTICE Patrols.cc 1916] PS at PS0 breaker has been tripped: ecode = 0

In the above message, SMS has detected a failed fan internal to PS0.

For such a failure, SMS must shut down the power supply to prevent overheating, so it trips the failed supply's breakers. In this failure, the system requires 18,311 watts for N+1 redundancy, but the remaining supplies are providing only 16,000 watts. The system has fallen below the power required for N+1 redundancy. An additional power supply failure would put the system into a brownout condition, with undefined results; a domain crash is likely.
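The N+1 comparison in example messages 1 and 2 is simple arithmetic: subtract the required wattage from the provided wattage. The sketch below illustrates this under stated assumptions: the regular expression is derived only from the two example messages shown here, so other SMS versions may word the message differently, and the names WATTS, n_plus_1_margin and describe are hypothetical.

```python
import re

# Pattern inferred from the two example messages; \s+ tolerates wrapped lines.
WATTS = re.compile(
    r"requires\s+(?P<required>\d+(?:\.\d+)?)\s+watts\."
    r"\s+The power supplies are providing\s+(?P<provided>\d+(?:\.\d+)?)\s+watts"
)

MSG1 = (
    "A power supply failure has been noted on PS at PS5.  For N+1 redundancy, "
    "the system configuration requires 19034.00 watts.\n"
    " The power supplies are providing 20000.00 watts."
)
MSG2 = (
    "An internal fan failure has been noted in PS at PS0, which is being shutdown. "
    "For N+1 redundancy, the system configuration\n"
    "requires 18311.00 watts. The power supplies are providing 16000.00 watts."
)


def n_plus_1_margin(message):
    """Return provided minus required watts, or None if no figures are found."""
    match = WATTS.search(message)
    if match is None:
        return None
    return float(match["provided"]) - float(match["required"])


def describe(message):
    """Summarize whether the message reports N+1 redundancy as kept or lost."""
    margin = n_plus_1_margin(message)
    if margin is None:
        return "no wattage figures found"
    if margin >= 0:
        return f"N+1 redundancy maintained, {margin:.0f} W to spare"
    return f"N+1 redundancy lost, {-margin:.0f} W short"


print(describe(MSG1))
print(describe(MSG2))
```

For message 1 the margin is 20,000 - 19,034 = 966 watts, so redundancy is maintained. For message 2 it is 16,000 - 18,311 = -2,311 watts, so the system is below N+1.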

 

Example message 3:

Oct 4 08:47:31 2010 sc0 showboards[11720]: [6235 70887614090602764 ERR BPSPowerControl.cc 521] 48.0 Voltage value is out of tolerance. value = 0.390000 expected between 40.00 and 56.00 on PS at PS0

SMS has detected that the voltage reported for PS0 (0.39 V) has fallen below the acceptable range (40.00 to 56.00 V), which points to a problem with the AC input to that supply. The datacenter electrician should confirm that the AC power feed is operating properly. Once the feed is confirmed to be supplying power correctly, contact your authorized Oracle Service representative for further diagnosis.

NOTE: Example message 3 is generated only when the showboards command is run, and is accompanied by output similar to the following:

Location    Pwr    Type of Board   Board Status  Test Status   Domain
--------    ---    -------------   ------------  -----------   ------
PS0         Unk    PS              -             -             -
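The out-of-tolerance message in example 3 carries the measured value and the expected range, so the comparison can be checked mechanically. This is a rough sketch that assumes the message wording matches the single example in this article; the names TOLERANCE and check_voltage are hypothetical.

```python
import re

# Pattern inferred from example message 3 only; other wordings will not match.
TOLERANCE = re.compile(
    r"value = (?P<value>-?\d+(?:\.\d+)?) expected between "
    r"(?P<low>-?\d+(?:\.\d+)?) and (?P<high>-?\d+(?:\.\d+)?) "
    r"on PS at (?P<supply>PS\d+)"
)

MSG3 = (
    "48.0 Voltage value is out of tolerance. value = 0.390000 "
    "expected between 40.00 and 56.00 on PS at PS0"
)


def check_voltage(message):
    """Return (supply, measured value, 'below' | 'above' | 'within'), or None."""
    match = TOLERANCE.search(message)
    if match is None:
        return None
    value, low, high = (float(match[key]) for key in ("value", "low", "high"))
    if value < low:
        state = "below"
    elif value > high:
        state = "above"
    else:
        state = "within"
    return match["supply"], value, state


print(check_voltage(MSG3))
```

A reading this far below the lower limit suggests the supply is producing essentially no output, which is consistent with the advice to check the AC feed first.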

 

Resolution

Contact your authorized Oracle Service representative to have any failed power supplies replaced as soon as a failure is detected.

@ N+1, redundancy
 Previously Published As 77254


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.