Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1002259.1
Update Date:2017-10-18
Keywords:

Solution Type  Technical Instruction Sure

Solution  1002259.1 :   Sun Enterprise[TM] xx00 (Sun Fire[TM] Classic): WARNING: AC Power failure detected  


Related Items
  • Sun Enterprise 6500 Server
  • Sun Enterprise 5000 Server
  • Sun Enterprise 3500 Server
  • Sun Enterprise 4000 Server
  • Sun Enterprise 3000 Server
  • Sun Enterprise 4500 Server
  • Sun Enterprise 5500 Server
  • Sun Enterprise 6000 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Exx00
  • _Old GCS Categories>Sun Microsystems>Servers>Midrange Servers

PreviouslyPublishedAs
203183


Applies to:

Sun Enterprise 3000 Server - Version All Versions and later
Sun Enterprise 4000 Server - Version All Versions and later
Sun Enterprise 5000 Server - Version All Versions and later
Sun Enterprise 6000 Server - Version All Versions and later
Sun Enterprise 3500 Server - Version All Versions and later
All Platforms

Goal

Understand the expected behavior of a Sun Enterprise[TM] xx00 (Sun Fire[TM] Classic)
server during an AC power failure.

Fix

Description

This document describes the expected behavior of a Sun Enterprise[TM] xx00 (Sun Fire[TM] Classic) 
server when it encounters an AC power failure. It covers several causes of such a failure and explains how to determine 
which one is the most likely root cause of the outage. The error message that commonly appears in 
such an event is:
sysctrl: [ID 712134 kern.warning] WARNING: AC Power failure detected

Unfortunately, this message alone doesn't tell us much. It tells you what you already know: that a power 
failure occurred. Why or how it occurred is the mystery that needs solving. This document provides 
hints about the common causes of a power failure on this platform. The common causes we investigated were:
1.  Power supply failure: simulated by physically removing the unit.
2.  Removing the power cord: simulates loss of the power feed, as well as accidental or purposeful cord removal.
3.  Flipping off the power rocker switch: simulates loss of the power feed, as well as a purposeful or accidental switch flip.
4.  Changing the keyswitch position to OFF: demonstrates behavior when power is not interrupted.
At times, it is suspected that power was either accidentally or purposely removed from the server 
(pulling a power cord, tripping over the cord, flipping off the switch, etc.), but it is hard to prove this without 
someone admitting it or having seen it happen.  

As this document will show, the behavior observed during an outage can "almost prove" that the cause 
was a manual one.

NOTE:  All observations were made in two different Sun internal labs by two separate teams of Sun personnel,
using two different servers.  The two teams' results were identical, so we assume that all servers of this 
class behave in the same manner. However, we cannot know with 100% certainty that this is true in EVERY case.

This document only applies to situations where a single server has encountered a power failure. 
If multiple servers in the environment have had a loss of power, the cause is most likely the power feed to the environment itself. 
This document describes situations where a single server mysteriously loses power while others nearby remain unaffected.
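
One way to check whether an outage was isolated to a single server is to compare the times at which each server on the same feed logged the AC power failure warning. The Python sketch below is illustrative only. The function name hosts_sharing_outage, the 60-second window, and the hostnames in the usage note are assumptions, not part of any Sun tool, and the comparison assumes the servers' clocks are reasonably synchronized.

```python
from datetime import datetime, timedelta


def hosts_sharing_outage(events, window_seconds=60):
    """Group (hostname, datetime) power-failure events that occur close together.

    Returns a list of sets of hostnames. A set holding more than one host
    hints at a shared power feed problem; a set holding a single host hints
    at an event local to that server.
    """
    window = timedelta(seconds=window_seconds)  # assumed tolerance, adjust as needed
    groups = []
    current = []
    for host, stamp in sorted(events, key=lambda event: event[1]):
        # Start a new group once an event falls outside the window
        # measured from the first event of the current group.
        if current and stamp - current[0][1] > window:
            groups.append({h for h, _ in current})
            current = []
        current.append((host, stamp))
    if current:
        groups.append({h for h, _ in current})
    return groups
```

For example, given events for the hypothetical hosts "serverA" and "serverB" three seconds apart and "serverC" several hours later, the function returns two groups: one containing serverA and serverB, and one containing only serverC.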



Steps to Follow

 Removing a Peripheral Power Supply 
The console reported the loss of the PPS unit and, ultimately, its reinsertion.
NOTE:  /var/adm/messages logged the same information.
# Dec 30 12:15:31 v4u-4500b sysctrl: NOTICE: Core Power Supply 1 Removed
Dec 30 12:15:31 v4u-4500b sysctrl: WARNING: Redundant power lost
Dec 30 12:16:05 v4u-4500b sysctrl: NOTICE: Core Power Supply 1 Installed
Dec 30 12:16:09 v4u-4500b sysctrl: NOTICE: Core Power Supply 1 OK
# Dec 30 12:16:09 v4u-4500b sysctrl: NOTICE: Redundant power available
Dec 30 12:16:32 v4u-4500b sysctrl: NOTICE: Core Power Supply 3 Removed
Dec 30 12:16:32 v4u-4500b sysctrl: WARNING: Redundant power lost
Dec 30 12:16:39 v4u-4500b sysctrl: NOTICE: Core Power Supply 3 Installed
Dec 30 12:16:43 v4u-4500b sysctrl: NOTICE: Core Power Supply 3 OK
Dec 30 12:16:43 v4u-4500b sysctrl: NOTICE: Redundant power available
#
At no point did the system go down or reboot. 
Provided the remaining supplies deliver enough redundant power, the system stays in operation in such a situation.  
These messages (or similar ones) could be expected if a power supply had failed. 
In addition, prtdiag following a reboot should show a PPS unit failure, and messages during bootup would 
indicate problems with the unit.
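
To see how long redundancy was lost for each supply, the Removed and OK notices can be paired up. The following Python sketch assumes the message layout shown in the console output above. The function name supply_gaps, the regular expression, and the default year are assumptions made for illustration (syslog timestamps carry no year).

```python
import re
from datetime import datetime

# Assumed layout, based on the console lines in this document, e.g.
#   Dec 30 12:15:31 v4u-4500b sysctrl: NOTICE: Core Power Supply 1 Removed
# A leading "#" shell prompt and an optional "[ID ...]" block are tolerated.
EVENT_RE = re.compile(
    r"^#?\s*(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ sysctrl: "
    r"(?:\[ID \d+ [\w.]+\] )?NOTICE: "
    r"(?P<unit>.+ Power Supply \d+) (?P<state>Removed|Installed|OK)$"
)


def supply_gaps(lines, year=2000):
    """Return {unit: seconds between its 'Removed' notice and its next 'OK' notice}."""
    removed = {}
    gaps = {}
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if not m:
            continue
        # A year is supplied only so the timestamp can be parsed.
        stamp = datetime.strptime(f"{year} {m['stamp']}", "%Y %b %d %H:%M:%S")
        unit, state = m["unit"], m["state"]
        if state == "Removed":
            removed[unit] = stamp
        elif state == "OK" and unit in removed:
            gaps[unit] = (stamp - removed.pop(unit)).total_seconds()
    return gaps
```

Run against the ten console lines above, this reports 38 seconds for Core Power Supply 1 and 11 seconds for Core Power Supply 3. A supply that logs Removed (or a failure) and never returns to OK would be absent from the result, which is itself a hint that the unit needs attention.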

An example of prtdiag output for a failed power supply unit might be:

Detected System Faults
======================
Key Switch Fan failure
Detected Wed May 26 08:44:45 2004
AC Box Fan failure
Detected Wed May 26 08:44:45 2004
AC Power failure
Detected Wed May 26 08:44:45 2004
System 5.0 Volt Precharge failure
Detected Wed May 26 08:44:45 2004
System 3.3 Volt Precharge failure
Detected Wed May 26 08:44:45 2004
Peripheral 12 Volt Precharge failure
Detected Wed May 26 08:44:45 2004
Peripheral 12 Volt Power failure
Detected Wed May 26 08:44:45 2004
Unit 0 Peripheral Power Supply failure
Detected Wed May 26 08:44:45 2004
PROM detected failure
Detected Wed May 26 08:44:45 2004
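
When many such reports need to be reviewed, the fault section can be turned into a list of fault names and detection times. The Python sketch below assumes the two-line layout shown above (a fault description followed by a "Detected" timestamp line). The function name parse_detected_faults is hypothetical, and the layout may differ between prtdiag versions.

```python
from datetime import datetime


def parse_detected_faults(text):
    """Return [(fault description, detection datetime)] from prtdiag output.

    Only lines after the "Detected System Faults" header are examined.
    """
    faults = []
    pending = None       # fault description waiting for its timestamp line
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "Detected System Faults":
            in_section = True
            continue
        # Skip everything before the header, blank lines, and the "====" rule.
        if not in_section or not line or set(line) == {"="}:
            continue
        if line.startswith("Detected ") and pending is not None:
            stamp = datetime.strptime(line[len("Detected "):], "%a %b %d %H:%M:%S %Y")
            faults.append((pending, stamp))
            pending = None
        else:
            pending = line
    return faults
```

Filtering the result for descriptions that contain "Power Supply failure" gives a quick indication of whether a supply, rather than the AC input, is the likely culprit.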

 Removing a power cord 
As soon as the power cord was pulled from the system, simulating tripping on the cord, 
or simply removing it, the console reported:

# Dec 30 12:17:52 v4u-4500b sysctrl: WARN}Hardware Power ON
# Hardware Power ON

The server immediately rebooted, ran POST (as shown above), and booted back up (this assumes the OpenBoot variable auto-boot? is set to true).  
Once back in Solaris, the /var/adm/messages file contained only one message reflecting the incident:
Dec 30 12:17:52 v4u-4500b sysctrl: [ID 712134 kern.warning] WARNING: AC Power failure detected
So, when power to the server is disrupted instantly, as when someone trips over or removes the power cord, 
the only symptoms should be the single warning message and a system reboot.
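
To locate these warnings in a saved copy of the messages file, a small filter such as the following Python sketch can be used. It assumes the syslog layout of the sample line above. The function name find_ac_power_failures and the regular expression are illustrative assumptions.

```python
import re

# Assumed layout, based on the sample line in this document:
#   Dec 30 12:17:52 v4u-4500b sysctrl: [ID 712134 kern.warning] WARNING: AC Power failure detected
AC_FAIL_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) sysctrl: "
    r"(?:\[ID \d+ [\w.]+\] )?WARNING: AC Power failure detected\s*$"
)


def find_ac_power_failures(lines):
    """Return (timestamp string, hostname) for each AC power failure warning."""
    hits = []
    for line in lines:
        m = AC_FAIL_RE.match(line.strip())
        if m:
            hits.append((m["stamp"], m["host"]))
    return hits
```

The function accepts any iterable of lines, so an open file object for a copy of /var/adm/messages can be passed to it directly.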

 Flipping off the power rocker switch 
Flipping off the rocker switch produced behavior identical to removing the power cord. 
The result was an instant reboot and a single message on the console and in the /var/adm/messages file. 
The console reported:
# Dec 30 12:31:36 v4u-4500b sysctrl: WARN Hardware Power ON
#  Hardware Power ON

The server rebooted, and /var/adm/messages showed:
Dec 30 12:31:36 v4u-4500b sysctrl: [ID 712134 kern.warning] WARNING: AC Power failure detected

 Changing the keyswitch position to OFF 
As might be expected, changing the keyswitch position does not log any messages concerning a power failure.  After all, this is
a standard procedure on this platform, unrelated to power distribution. The console reported the following immediately
after the keyswitch was turned to OFF:
# Hardware Power ON

The server ran through POST once the keyswitch was turned back to ON, and /var/adm/messages logged only the following message
(nothing regarding power):

Dec 30 12:48:24 v4u-4500b sysctrl: [ID 273467 kern.info] sysctrl0: Key switch is not in the secure position

 Summary 
A power supply unit failure should leave evidence beyond a single message indicating
that an AC power failure was detected. A bad supply should remain bad following a reboot, and
until it is replaced. Provided the system has enough redundant power without the defective unit, 
it will continue operating without crashing.

A normal keyswitch operation does not log anything regarding power in /var/adm/messages.  
It merely reports the key switch status.

If the only message reported to the console or the /var/adm/messages file is  
WARNING: AC Power failure detected, it is quite likely that the power feed was instantly disrupted.  
If all systems attached to the same power feed are disrupted at the same instant, the root cause is the power source.  
If only one machine on a given power feed was affected, the instant disruption is most likely the result 
of an accidental or purposeful interruption of power to that machine.
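
The summary can be expressed as a rough rule of thumb in code. The Python sketch below takes the sysctrl message texts logged around a single outage and returns the most likely cause according to the lab observations in this document. The function name classify_outage and the returned labels are assumptions, and the result is a heuristic, not a diagnosis.

```python
def classify_outage(sysctrl_messages):
    """Map sysctrl message texts from one outage to a likely cause.

    The rules mirror the observations in this document and nothing more.
    """
    texts = [m.strip() for m in sysctrl_messages if m.strip()]
    supply_events = [t for t in texts if "Power Supply" in t or "Redundant power" in t]
    ac_failures = [t for t in texts if "AC Power failure detected" in t]
    key_switch = [t for t in texts if "Key switch" in t]

    # Supply notices mean the failure left evidence beyond the single warning.
    if supply_events:
        return "power supply removed or failed"
    # Nothing but the AC warning: power input was cut abruptly.
    if ac_failures and len(ac_failures) == len(texts):
        return "abrupt loss of AC input"
    # Key switch messages with no power warning: a normal keyswitch operation.
    if key_switch and not ac_failures:
        return "keyswitch operation"
    return "undetermined"
```

An "abrupt loss of AC input" result still has to be combined with the multi-server check described above to decide between the power source and an interruption local to one machine, such as a pulled cord or a flipped rocker switch.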



Product
Sun Enterprise 4000 Server
Sun Enterprise 3000 Server

Internal Comments


 Escalation 1-6082530 and Radiance case ID 64404887 were the sources of this document.



Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.