Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1002259.1 : Sun Enterprise[TM] xx00 (Sun Fire[TM] Classic): WARNING: AC Power failure detected
PreviouslyPublishedAs 203183

Applies to:
Sun Enterprise 3000 Server - Version All Versions and later
Sun Enterprise 4000 Server - Version All Versions and later
Sun Enterprise 5000 Server - Version All Versions and later
Sun Enterprise 6000 Server - Version All Versions and later
Sun Enterprise 3500 Server - Version All Versions and later
All Platforms

Goal

Understanding the expected behavior of an AC power failure.

Fix

Description

The purpose of this document is to describe the expected behavior of a Sun Enterprise[TM] xx00 (Sun Fire[TM] Classic) server when it encounters an AC power failure. It details several causes of the failure and how to determine which of them is the most likely root cause of the outage.

The error message that commonly appears in such an event is:

    sysctrl: [ID 712134 kern.warning] WARNING: AC Power failure detected

Unfortunately, this message alone says very little. It tells you what you already know: a power failure occurred. Why or how it occurred is the mystery that needs solving. This document provides hints about the common causes of a power failure on this platform.

The common causes we investigated were:

1. Power supply failure. Simulated by physically removing the unit.
2. Removing the power cord. Simulates loss of the power feed and also accidental or purposeful cord removal.
3. Flipping off the power rocker switch. Simulates power feed loss and a purposeful or accidental switch flip.
4. Changing the keyswitch position to OFF. Demonstrates the behavior of a shutdown that is not a power interruption.

At times it is suspected that power was accidentally or purposely removed from the server (pulling a power cord, tripping over the cord, flipping off the switch, etc.), but this is hard to prove unless someone admits it or was seen doing it. As this document will show, the behavior of a given outage can come close to proving that the cause was manual.
NOTE: All observations were made in two different Sun internal labs by two separate teams of Sun personnel, using two different servers. The two teams' results were identical, so it is assumed that all servers of this class behave in the same manner, although we cannot know with 100% certainty that this is true in every case.

This document applies only to situations where a single server has encountered a power failure. If multiple servers in the environment have lost power, the cause is most likely the power feed to the environment itself. This document describes situations where a single server mysteriously loses power while others nearby remain unaffected.
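The single-server versus multiple-server distinction above can be checked mechanically if the time of the AC power failure is known for each host. The following is only a sketch under stated assumptions: the function name, the host names in the usage note, and the 60-second window are all hypothetical and are not taken from this document.

```python
from datetime import datetime, timedelta

def shared_feed_suspected(failures, window=timedelta(seconds=60)):
    """Return True if at least two hosts lost AC power close together in time.

    failures maps a host name to the datetime of its AC power failure.
    The 60-second default window is an arbitrary assumption, not a
    documented threshold.
    """
    times = sorted(failures.values())
    # Compare each failure time with the next one in chronological order.
    return any(later - earlier <= window
               for earlier, later in zip(times, times[1:]))
```

For example, `shared_feed_suspected({"hostA": t, "hostB": t + timedelta(seconds=5)})` returns True, which points toward the power feed rather than one server. A single entry returns False, which is the case this document addresses.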
Removing a Peripheral Power Supply

The console reported the loss of the PPS unit and, ultimately, its reinsertion.

NOTE: /var/adm/messages logged the same information as well.

    # Dec 30 12:15:31 v4u-4500b sysctrl: NOTICE: Core Power Supply 1 Removed
    Dec 30 12:15:31 v4u-4500b sysctrl: WARNING: Redundant power lost
    Dec 30 12:16:05 v4u-4500b sysctrl: NOTICE: Core Power Supply 1 Installed
    Dec 30 12:16:09 v4u-4500b sysctrl: NOTICE: Core Power Supply 1 OK
    # Dec 30 12:16:09 v4u-4500b sysctrl: NOTICE: Redundant power available
    Dec 30 12:16:32 v4u-4500b sysctrl: NOTICE: Core Power Supply 3 Removed
    Dec 30 12:16:32 v4u-4500b sysctrl: WARNING: Redundant power lost
    Dec 30 12:16:39 v4u-4500b sysctrl: NOTICE: Core Power Supply 3 Installed
    Dec 30 12:16:43 v4u-4500b sysctrl: NOTICE: Core Power Supply 3 OK
    Dec 30 12:16:43 v4u-4500b sysctrl: NOTICE: Redundant power available
    #

At no point did the system go down or reboot. Provided enough redundant power is being supplied, the system remains in operation in such a situation. These or similar messages could be expected if a power supply had failed. In addition, prtdiag output following a reboot should show a PPS unit failure, and messages during boot would indicate problems with the unit.
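The removal and reinsertion messages above can be paired up to see how long each supply was out of service. The sketch below is illustrative only: the regular expression is inferred from the sample lines in this document, the function name is hypothetical, and because these timestamps carry no year, the `year` parameter is an assumption the caller must supply.

```python
import re
from datetime import datetime

# Pattern inferred from the sample log lines in this document (an assumption,
# not an official sysctrl message format).
EVENT = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ sysctrl: "
    r"(?:\[.*?\] )?NOTICE: Core Power Supply (?P<unit>\d+) "
    r"(?P<state>Removed|Installed|OK)$"
)

def supply_outages(lines, year=2000):
    """Return (unit, seconds) pairs from each 'Removed' to the next 'OK'.

    The log timestamps have no year, so `year` is supplied by the caller.
    """
    removed = {}   # unit number -> time it was reported removed
    results = []
    for line in lines:
        # Drop surrounding whitespace and any leading "#" console prompt.
        match = EVENT.match(line.strip().lstrip("# "))
        if not match:
            continue
        ts = datetime.strptime(f"{year} {match['ts']}", "%Y %b %d %H:%M:%S")
        unit = int(match["unit"])
        if match["state"] == "Removed":
            removed[unit] = ts
        elif match["state"] == "OK" and unit in removed:
            seconds = int((ts - removed.pop(unit)).total_seconds())
            results.append((unit, seconds))
    return results
```

Run against the ten log lines above, this returns `[(1, 38), (3, 11)]`: supply 1 was out for 38 seconds and supply 3 for 11 seconds. A real failure would presumably show a "Removed" or fault event with no matching "OK".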
An example of a failed power supply unit as reported by prtdiag may be:

    Detected System Faults
    ======================
    Key Switch Fan failure Detected Wed May 26 08:44:45 2004
    AC Box Fan failure Detected Wed May 26 08:44:45 2004
    AC Power failure Detected Wed May 26 08:44:45 2004
    System 5.0 Volt Precharge failure Detected Wed May 26 08:44:45 2004
    System 3.3 Volt Precharge failure Detected Wed May 26 08:44:45 2004
    Peripheral 12 Volt Precharge failure Detected Wed May 26 08:44:45 2004
    Peripheral 12 Volt Power failure Detected Wed May 26 08:44:45 2004
    Unit 0 Peripheral Power Supply failure Detected Wed May 26 08:44:45 2004
    PROM detected failure Detected Wed May 26 08:44:45 2004

Removing a power cord

As soon as the power cord was pulled from the system (simulating someone tripping over the cord or simply removing it), the console reported:

    # Dec 30 12:17:52 v4u-4500b sysctrl: WARN}Hardware Power ON
    # Hardware Power ON

The server immediately rebooted, ran POST (as shown above), and booted back up (this assumes the OpenBoot variable auto-boot? is set to true). Once back in Solaris, the /var/adm/messages file had only one message reflecting the incident:

    Dec 30 12:17:52 v4u-4500b sysctrl: [ID 712134 kern.warning] WARNING: AC Power failure detected

So when power to the server is instantly disrupted, as when someone trips over or removes the power cord, the only symptoms should be the single warning message and a system reboot.

Flipping off the power rocker switch

Flipping off the rocker switch produced exactly the same behavior as removing the power cord. The result was an instant domain reboot and a single message to both the console and the /var/adm/messages file.

The console reported:

    # Dec 30 12:31:36 v4u-4500b sysctrl: WARN Hardware Power ON
    # Hardware Power ON

The domain reboots and /var/adm/messages shows:

    Dec 30 12:31:36 v4u-4500b sysctrl: [ID 712134 kern.warning] WARNING: AC Power failure detected

Changing the keyswitch position to OFF

As might be expected, changing the keyswitch position does not log any messages concerning a power failure.
After all, this is a standard procedure on this platform, unrelated to power distribution. The console reported the following immediately after the keyswitch position was changed to OFF:

    # Hardware Power ON

The domain ran through POST once the keyswitch was turned back to ON, and /var/adm/messages logged only the following message (nothing regarding power):

    Dec 30 12:48:24 v4u-4500b sysctrl: [ID 273467 kern.info] sysctrl0: Key switch is not in the secure position

Summary

A power supply unit failure should leave evidence of the failure beyond a single message indicating that an AC power failure was detected. A bad supply should remain bad following a reboot, and remain bad until it is replaced. Provided the system has enough redundant power left without the defective unit, it will continue operating without crashing.

A normal keyswitch operation does not log anything regarding power in /var/adm/messages. It merely reports that the key switch status has changed.

If the only message reported to the console or the /var/adm/messages file is

    WARNING: AC Power failure detected

it is quite likely that the power feed was instantly disrupted. If all systems attached to the same power feed were instantly disrupted, the root cause is the power source. If only one machine on a specific power feed was affected, the instant disruption is most likely the result of accidental or purposeful removal of power.
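The decision rules in this summary can be expressed as a small triage helper for the messages one server logged around an outage. This is a minimal sketch, not a diagnostic tool: the function name and the returned labels are hypothetical, and the matched strings are taken only from the log excerpts in this document, so real systems may log variants that this does not recognize.

```python
import re

# Message fragments copied from the excerpts in this document.
AC_FAILURE = "AC Power failure detected"
KEYSWITCH = "Key switch"
SUPPLY_EVENT = re.compile(
    r"Power Supply \d+ (Removed|Installed|OK)|Redundant power (lost|available)"
)

def classify_outage(messages):
    """Return a rough, heuristic label for one server's outage.

    messages is a list of /var/adm/messages lines from around the event.
    """
    has_supply = any(SUPPLY_EVENT.search(m) for m in messages)
    has_ac = any(AC_FAILURE in m for m in messages)
    has_key = any(KEYSWITCH in m for m in messages)
    if has_supply:
        # Supply evidence beyond the single AC warning points at the unit.
        return "power supply event"
    if has_ac:
        # Only the AC warning: power was most likely cut instantly.
        return "instant power-feed disruption"
    if has_key:
        return "keyswitch operation"
    return "unknown"
```

Passing only the single `WARNING: AC Power failure detected` line yields "instant power-feed disruption". Whether that disruption came from the power source or from someone touching the cord or switch still depends on whether other systems on the same feed were affected.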
Escalation 1-6082530 and Radiance case ID 64404887 were the source of this document.