Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1002941.1
Update Date: 2018-03-26

Solution Type  Technical Instruction Sure

Solution  1002941.1 :   How To Check Why the System Powered Off, on Oracle x86 Servers  


Related Items
  • Sun Server X4-2L
  • Oracle Server X6-2L
  • Oracle Server X5-2
  • Oracle Server X6-2
  • Sun Server X4-2
  • Sun Server X3-2
  • Oracle Server X7-2
  • Sun Server X3-2L
  • Oracle Server X5-2L
  • Oracle Server X7-2L
Related Categories
  • PLA-Support>Sun Systems>x86>Server>SN-x86: Oracle Server X7
  • _Old GCS Categories>Sun Microsystems>Servers>x64 Servers

PreviouslyPublishedAs
204044


Description

This document describes what to check when a Sun X64 server is powered off but is expected to be on. It covers checks made with ipmitool, the Service Processor (SP) web GUI, the Service Processor CLI, and local (physical) access to the server.

Symptoms:

The server has powered off and the cause needs to be investigated.



Steps to Follow
This document gives some guidance on how to proceed if you find a Sun X64 server has unexpectedly powered off. The examples offered may be specific to a particular command or firmware version, are provided to illustrate a troubleshooting concept, and may not apply to all Sun X64 servers. Always refer to the support documentation for your particular server product to determine the correct equivalent command or procedure.

Various conditions can trigger a system shutdown, including:

  • Temperature of a component or ambient air is too high.
  • Multiple cooling fan failures.
  • A voltage fluctuation beyond the acceptable threshold.
  • Multiple power supplies have failed or have been removed, causing loss of power redundancy.
  • External (computer room) AC or DC power fails, or falls outside the range required by the server power supplies to safely continue to run the system.
  • A component hot-swap circuit has faulted.

The first thing to note is that if the chassis has no power, the Service Processor (SP) will not function, because it operates from standby (housekeeping) voltage. In that case a physical examination of the server is required, as outlined below in the section "Verifying cause of NO chassis power".

If the SP is accessible, this means external power is being delivered to at least one of the server power supplies, which in turn are supplying standby voltage to the chassis.

Gathering possible reasons for the outage using ipmitool

The ipmitool command can be used to collect information about the possible reasons for the platform state, such as voltage & temperature sensors, fault LEDs & indicators, and the platform System Event Log (SEL).

See <Document: 1009698.1 > for detailed information on the use of ipmitool for collection of data from the platform.
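
As an illustration of the data collection described above, the following minimal sketch builds remote ipmitool command lines for the SEL, the sensor records, and the chassis status. The host name, user name, and the helper name ipmitool_argv are placeholders chosen for this example, and the lanplus interface is an assumption; some older Service Processors may need a different interface, so check the documentation for your platform.

```python
from typing import Dict, List

# Standard ipmitool subcommands for the data mentioned above.
# Which sensors and records exist depends on the platform.
COLLECTION_SUBCOMMANDS: Dict[str, List[str]] = {
    "sel": ["sel", "elist"],            # decoded System Event Log
    "sensors": ["sdr", "elist"],        # sensor records (voltage, temperature, ...)
    "chassis": ["chassis", "status"],   # current chassis power state
}


def ipmitool_argv(host: str, user: str, what: str) -> List[str]:
    """Return the argv list for one remote ipmitool query.

    No password option is included, so ipmitool prompts for it.
    The "lanplus" interface is an assumption for this sketch.
    """
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user] + COLLECTION_SUBCOMMANDS[what]


# "sp-hostname" and "root" are placeholder values.
for name in COLLECTION_SUBCOMMANDS:
    print(" ".join(ipmitool_argv("sp-hostname", "root", name)))
```

Each argv list can be pasted into a shell as a single command or passed to a process launcher; the sketch deliberately does not run anything itself.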

Example - SEL entries showing high temperature events that resulted in automatic system power-off:
281 | 05/12/2009 | 22:23:20 | Temperature dbp.t_amb | Upper Critical going high | Reading 34 > Threshold 33 degrees C
282 | 05/12/2009 | 22:42:07 | Temperature dbp.t_amb | Upper Non-recoverable going high | Reading 44 > Threshold 43 degrees C
283 | 05/12/2009 | 22:42:46 | System ACPI Power State sys.acpi | S5/G2: soft-off | Asserted


Example - SEL entry showing Chassis Intrusion switch was triggered when the chassis cover was removed:
200 | 06/24/2008 | 10:35:36 | Physical Security sys.intsw | General Chassis intrusion | Asserted

Example - SEL entries showing the power button was used to power-off the system:
109 | 11/17/2008 | 19:01:26 | Button | Power Button pressed | Asserted
10a | 11/17/2008 | 19:01:29 | System ACPI Power State ACPI | S5/G2: soft-off | Asserted
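
SEL output in the format shown in the examples above can be scanned automatically for events that precede a soft-off entry. The following is a minimal sketch, not an Oracle tool: the names SelEvent, parse_sel_line and likely_causes are invented for this example, the matching strings are taken only from the sample entries in this document, and record IDs are assumed to be hexadecimal (as suggested by the ID "10a").

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SelEvent:
    record_id: int   # parsed as hexadecimal (assumption, see lead-in)
    date: str
    time: str
    sensor: str
    event: str
    detail: str


def parse_sel_line(line: str) -> Optional[SelEvent]:
    """Split one decoded SEL line into fields; return None if it does not fit."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) < 6:
        return None
    try:
        record_id = int(parts[0], 16)
    except ValueError:
        return None
    return SelEvent(record_id, parts[1], parts[2], parts[3], parts[4],
                    " | ".join(parts[5:]))


def likely_causes(events: List[SelEvent]) -> List[str]:
    """Collect hints from events logged before the first soft-off entry."""
    hints: List[str] = []
    for ev in events:
        if "soft-off" in ev.event:
            break
        hint = None
        if ev.sensor.startswith("Temperature") and "going high" in ev.event:
            hint = "over-temperature"
        elif "Power Button pressed" in ev.event:
            hint = "power button"
        elif "Chassis intrusion" in ev.event:
            hint = "chassis intrusion"
        if hint and hint not in hints:
            hints.append(hint)
    return hints


# The high-temperature example from this document.
SAMPLE = """\
281 | 05/12/2009 | 22:23:20 | Temperature dbp.t_amb | Upper Critical going high | Reading 34 > Threshold 33 degrees C
282 | 05/12/2009 | 22:42:07 | Temperature dbp.t_amb | Upper Non-recoverable going high | Reading 44 > Threshold 43 degrees C
283 | 05/12/2009 | 22:42:46 | System ACPI Power State sys.acpi | S5/G2: soft-off | Asserted
"""

events = [e for e in map(parse_sel_line, SAMPLE.splitlines()) if e is not None]
print(likely_causes(events))  # prints ['over-temperature']
```

The hints are only a starting point for investigation; sensor names and event wording differ between platforms, so the matching strings would need to be adapted.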

Further information about gathering data using ipmitool may also be available in the Servers Diagnostics Guide, or ILOM / ELOM Supplement specific to that server platform.

http://www.oracle.com/technetwork/documentation/oracle-x86-servers-190077.html

Gathering possible reasons for the outage using Service Processor web GUI

Integrated Lights Out Manager (ILOM) and Embedded Lights Out Manager (ELOM) based Service Processors provide an easy-to-use web interface for managing the platform. Point your web browser to the Service Processor IP address or resolving DNS hostname, and enter your login credentials when prompted.

Once logged in, click the System Monitoring tab, which gives access to additional tabs. Click each of these to drill down further:

  • Sensor readings.
  • Event logs.
  • Fault and other Indicator LED states.
  • Power Management & utilization.

Note: tab names may differ slightly between ILOM and ELOM versions.

For further information, refer to the ILOM or ELOM Administration Guide and Supplement Document for your platform and Lights-Out Manager version:

http://www.oracle.com/technetwork/documentation/oracle-x86-servers-190077.html

Gathering possible reasons for the outage using the Service Processor Command Line Interface (CLI)

Log in to the Service Processor using ssh (requires the SP IP address or a resolvable DNS hostname):

# ssh -l <username> <SP host name or IP>

Display System Event Logs, and sensor & fault indicator information:

ILOM:
show /SP/logs/event/list
show -d properties -level all /SYS
show -o table -level all /SP/faultmgmt
(The /SP/faultmgmt command is not available in all ILOM versions.)

ELOM:
show /SP/AgentInfo/SEL
show -d properties -level all /SP/SystemInfo

CMM:
show /CMM/logs/event/list
show -d properties -level all /

V20z & V40z:
sp get events -v
sensor get --verbose
inventory get all -v
sp get tdulog -f stdout
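
The per-platform commands above can be kept in a small lookup table so that the right checklist is printed for a given Service Processor type. This is only a sketch: the table name SP_COMMANDS and the function commands_for are invented for this example, and the command strings are copied unchanged from the lists above.

```python
from typing import Dict, List

# Command strings copied from this document, keyed by Service Processor type.
_V20Z_V40Z = [
    "sp get events -v",
    "sensor get --verbose",
    "inventory get all -v",
    "sp get tdulog -f stdout",
]

SP_COMMANDS: Dict[str, List[str]] = {
    "ILOM": [
        "show /SP/logs/event/list",
        "show -d properties -level all /SYS",
        "show -o table -level all /SP/faultmgmt",  # not in all ILOM versions
    ],
    "ELOM": [
        "show /SP/AgentInfo/SEL",
        "show -d properties -level all /SP/SystemInfo",
    ],
    "CMM": [
        "show /CMM/logs/event/list",
        "show -d properties -level all /",
    ],
    "V20Z": _V20Z_V40Z,
    "V40Z": _V20Z_V40Z,
}


def commands_for(sp_type: str) -> List[str]:
    """Return the CLI commands listed in this document for the given SP type."""
    return SP_COMMANDS[sp_type.strip().upper()]


for number, command in enumerate(commands_for("ilom"), start=1):
    print(f"{number}. {command}")
```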

Gathering possible reasons for the outage if you are local to the server

Check the LEDs and verify that the Power OK LED on the front or rear of the server is either STEADY GREEN (system ON) or SLOW BLINK GREEN (system OFF). Either state shows that AC is applied and power is available. Then:

  • Power on the platform by pressing the power button.
  • Press F2 when prompted to enter the BIOS. Note any events that are reported.
  • Once in the BIOS, use the cursor keys to navigate to the tab labelled Advanced.
  • Navigate down to Event Log Configuration and press Enter.
  • Select View Event Log and examine it for possible reasons for the outage. Use Esc to exit.

Once back at the Advanced tab, navigate to IPMI 2.0 Configuration, select it and press Enter, then select View BMC System Event Log.

NOTE: These events are displayed in raw format. Unless you are familiar with that format, use the ipmitool procedure above, which decodes the events automatically. The log also contains events that are part of the normal power-on process, so the events must be decoded before they can be checked for problems.

The messages can also be decoded manually by accessing the following document:
ftp://download.intel.com/design/servers/ipmi/IPMIv2_0rev 1_0.pdf

It is beyond the scope of this document to discuss this manual process of decoding.

If external power is present and the system still will not power on or stay up:

  • Check that all system covers are firmly in place.
  • Check that the chassis intrusion switch (if present) is aligned correctly with the cover. Refer to the Server Service Manual for the location of the intrusion switch.

Gathering possible reasons for the outage from the Operating System

If the system can be powered on and the OS boots successfully after an unexpected shutdown, check:

  • OS messages and event logs: Was the shutdown graceful? Is there any indication of the power button being pressed, temperature or other event recorded?
  • OS fault manager records (such as Solaris FMA): were any faults recorded?
  • Console log: was anything relevant displayed on the system console at or near the time of the shutdown?
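
When reviewing OS or console logs "at or near the time of the shutdown", it can help to filter timestamped records to a window around the shutdown time taken from the SEL. The sketch below assumes the MM/DD/YYYY and HH:MM:SS layout seen in the SEL examples earlier in this document; the function names and the five-minute default window are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Timestamp layout of the SEL examples in this document (assumed MM/DD/YYYY).
SEL_TIME_FORMAT = "%m/%d/%Y %H:%M:%S"


def parse_time(date: str, time: str) -> datetime:
    """Combine the SEL date and time columns into a datetime."""
    return datetime.strptime(f"{date} {time}", SEL_TIME_FORMAT)


def near_shutdown(records: List[Tuple[datetime, str]],
                  shutdown: datetime,
                  minutes: int = 5) -> List[str]:
    """Return the text of records within +/- `minutes` of the shutdown time."""
    window = timedelta(minutes=minutes)
    return [text for when, text in records if abs(shutdown - when) <= window]


# Timestamps taken from the high-temperature SEL example in this document.
shutdown_time = parse_time("05/12/2009", "22:42:46")
records = [
    (parse_time("05/12/2009", "22:23:20"), "Upper Critical going high"),
    (parse_time("05/12/2009", "22:42:07"), "Upper Non-recoverable going high"),
]
print(near_shutdown(records, shutdown_time))
```

The same filter can be applied to OS log entries once their timestamps are converted to datetime values; note that the SP clock and the OS clock may not be synchronized, so a wider window may be needed.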

Verifying cause of NO chassis power

  • Visually inspect each power supply for the status of the AC Present, Power OK, and Fault LEDs. If the Fault LED is illuminated on any of the PSUs then further troubleshooting will be required.
  • If AC Present is NOT illuminated, ensure the AC power cords are securely plugged into the server and connected to working AC power outlet(s). Test using known good power cables and power source. Engage a qualified electrician to test voltage on the power cords.
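
The power supply LED checks above can be summarized as a small decision function. This is a sketch of the guidance in this document only: the function name psu_next_step is invented for this example, and the case where AC Present is lit but Power OK is not is not covered by this document, so the sketch simply points to the service manual.

```python
def psu_next_step(ac_present: bool, power_ok: bool, fault: bool) -> str:
    """Map the three PSU LED states to the next action described above."""
    if fault:
        return "Fault LED lit: further troubleshooting of this power supply is required."
    if not ac_present:
        return ("AC Present off: check that the power cords are securely plugged in, "
                "test with known good cables and power source, and have a qualified "
                "electrician test the voltage on the power cords.")
    if not power_ok:
        # Not covered by this document; defer to the platform documentation.
        return "Power OK off with AC present: refer to the server service manual."
    return "AC Present and Power OK lit, no fault: this power supply appears healthy."


# Example: a supply with no LEDs lit at all.
print(psu_next_step(ac_present=False, power_ok=False, fault=False))
```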



Product
Sun Fire X4600 Server
Sun Fire X4600 M2 Server
Sun Fire X4200 Server
Sun Fire X4200 M2 Server
Sun Fire X4100 Server
Sun Fire X4100 M2 Server
RoHS Sun Fire X4200 Server
RoHS Sun Fire X4100 Server

Internal Comments
This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap in this document, or before updating this document, please contact the domain engineers who manage it via the "Document Feedback" alias(es) listed below:
tsc-emea-x64@sun.com

x64, normalized, power, AC, failure
Previously Published As
91594
Change History
Date: 2010-05-23
User Name: james.j.carter@oracle.com
Action: Updated
Comment: Currency check & update.

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.