Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1554985.1
Update Date:2014-11-07
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution  1554985.1 :   How to investigate the Auto Service Request "ASR:Probable power supply failure" on X4150, X4250 and X4450 systems  


Related Items
  • Sun SPARC Enterprise T5440 Server
  • Sun Fire X4150 Server
  • Sun Fire X4450 Server
  • Sun Fire X4250 Server
Related Categories
  • PLA-Support>Sun Systems>x86>Server>SN-x64: SERVER 64bit


This article describes the checks a System Administrator should perform to verify whether action is required when an x86 ASR power supply alarm has occurred on an X4x50-based system.

In this Document
Purpose
Scope
Details
 Description of the ASR Event:
 How to verify if there is a genuine Power Supply issue:
 Example alarm:


Applies to:

Sun Fire X4450 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire X4150 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire X4250 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun SPARC Enterprise T5440 Server
x86

Purpose

 This article describes the checks a System Administrator should perform to verify whether action is required on an x86 ASR power supply alarm.

Scope

 This document is intended for system administrators and support personnel.

Details

Auto Service Request (ASR) provides automatic failure detection and SR creation for Oracle X86 systems.  See http://www.oracle.com/us/asr/index.html for more information on ASR. 

Description of the ASR Event:

Power supply events can be either transient or persistent. They can be generated by external changes and actions, most notably by the removal of AC from a power supply.

Additional checks may need to be performed in order to understand the cause of this ASR event. If a persistent failure has occurred, or if a power event or events cannot be explained by changes in the supplied power or by work being carried out on the machine, then further investigation by a support engineer may be required.

If the event has been caused by changes in site power or a similar event, then no action need be taken.

An example ASR alarm is shown at the bottom of this document.

How to verify if there is a genuine Power Supply issue:

Step 1: Identify the system that experienced this ASR Alarm.

The Auto Service Request (ASR) will be logged against the serial number of the machine that generated the alarm.

The information provided by the alarm will contain the hostname of the machine or the machine's Service Processor.

 

Step 2: To verify whether a power supply has a persistent failure, perform one of Step 2a, Step 2b or Step 2c of this document.

The ASR alarm will identify the power supply number that has generated the alarm.

This power supply can be checked by using one of the methods below:

"value = State Asserted" for PWROK indicates the power supply DC output is functioning

"value = State Asserted" for VINOK indicates the AC input to the power supply is present

If the AC input for a particular power supply is "State Deasserted" then the corresponding DC output of the power supply (PWROK) will also be "State Deasserted", and the site infrastructure should be checked to discover the reason for the loss of AC.

Note: on a server that is powered off, PWROK will be "State Deasserted" for all power supplies.

On a system that is powered up, if VINOK is "State Asserted" (On) for a power supply but PWROK is "State Deasserted" (Off) for that same power supply, then the Draft SR should be promoted to a full SR.
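
The rules above can be summarised as a small decision function. This is only an illustrative sketch: the function name, its parameters and the returned strings are hypothetical and are not part of ASR or ILOM. It assumes the VINOK and PWROK states have already been read using one of the methods in Step 2a, 2b or 2c.

```python
def assess_power_supply(vinok_asserted: bool, pwrok_asserted: bool,
                        host_powered_on: bool) -> str:
    """Map the VINOK/PWROK readings of one power supply to a suggested action.

    Hypothetical helper: the names and return strings are illustrative only.
    """
    if not vinok_asserted:
        # No AC input. PWROK will also be deasserted, so check site power first.
        return "check site AC power"
    if not host_powered_on:
        # On a powered-off server PWROK is deasserted for every power supply.
        return "inconclusive: server is powered off"
    if not pwrok_asserted:
        # AC is present and the host is on, but there is no DC output.
        return "promote draft SR to full SR"
    return "no persistent failure indicated"


# Example: AC present, host powered on, DC output missing.
print(assess_power_supply(vinok_asserted=True, pwrok_asserted=False,
                          host_powered_on=True))
```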

Step 2a

If the utility ipmitool is available then the ipmitool command "sensor" can be used to check the incoming AC and DC output of a power supply. It can also be used to check whether a particular supply is indicating a fault.

Example:

 

ipmitool -H <sp_address or name> -U root sensor

<output omitted>

PS0/VINOK | 0x2 | discrete | 0x0200| na | na | na | na | na | na
PS0/PWROK | 0x2 | discrete | 0x0200| na | na | na | na | na | na
PS0/CUR_FAULT | 0x1 | discrete | 0x0100| na | na | na | na | na | na
PS0/VOLT_FAULT | 0x1 | discrete | 0x0100| na | na | na | na | na | na
PS0/FAN_FAULT | 0x1 | discrete | 0x0100| na | na | na | na | na | na
PS0/TEMP_FAULT | 0x1 | discrete | 0x0100| na | na | na | na | na | na
PS0/T_AMB | 34.000 | degrees C | ok | na | na | na | na | na | na
PS0/F0/TACH | 8704.000 | RPM | ok | na | na | na | na | na | na
PS0/V_IN | 242.000 | Volts | ok | 70.000 | 80.000 | na | na | 270.000 | 280.000
PS0/I_IN | 0.750 | Amps | ok | na | na | na | na | na | na

<output omitted>

 

 

Note: "0x2" indicates "Asserted" and "0x1" indicates "Deasserted".
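
When many sensors are listed, the discrete power supply readings can be picked out of saved "sensor" output with a short script. The following is a minimal sketch, assuming the pipe-separated layout shown in the example above; the function name and the sample text are illustrative, and the output layout may differ on other firmware or ipmitool versions.

```python
# Mapping taken from the note above: 0x2 = Asserted, 0x1 = Deasserted.
STATE_BY_READING = {"0x2": "Asserted", "0x1": "Deasserted"}


def parse_discrete_ps_sensors(sensor_output: str) -> dict:
    """Return {sensor name: state} for the discrete PSx sensors in the output.

    Hypothetical helper; assumes the column layout shown in the example above.
    """
    states = {}
    for line in sensor_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Skip lines that are not discrete sensors (e.g. volts, amps, RPM).
        if len(fields) < 3 or fields[2] != "discrete":
            continue
        name, reading = fields[0], fields[1]
        if not name.startswith("PS"):
            continue
        states[name] = STATE_BY_READING.get(reading, "unknown (" + reading + ")")
    return states


# A few lines copied from the example output above.
sample = """\
PS0/VINOK | 0x2 | discrete | 0x0200| na | na | na | na | na | na
PS0/PWROK | 0x2 | discrete | 0x0200| na | na | na | na | na | na
PS0/FAN_FAULT | 0x1 | discrete | 0x0100| na | na | na | na | na | na
PS0/V_IN | 242.000 | Volts | ok | 70.000 | 80.000 | na | na | 270.000 | 280.000
"""
states = parse_discrete_ps_sensors(sample)
for name, state in states.items():
    print(name, state)
```

Here VINOK and PWROK should read "Asserted" on a healthy, powered-on supply, while the *_FAULT sensors should read "Deasserted".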


Step 2b

Log onto the Command Line Interface (CLI) of the server's service processor and use the commands below to check the relevant power supply status under /SYS/PSx, where "x" is the number of the power supply implicated in the ASR alarm.

"value = State Asserted" under PWROK indicates the power supply DC output is functioning

"value = State Asserted" under VINOK indicates the AC input to the power supply is present

Note: the following output will vary slightly according to the version of ILOM.

If the AC input for a particular power supply is "State Deasserted" then the corresponding DC output of the power supply (PWROK) will also be  "State Deasserted" and the site infrastructure should be checked to discover the reason for the loss of AC.

The status of the hardware fault indicators on the power supply can also be checked; see the examples below:

-> show -d properties /SYS/PS0/VINOK
  /SYS/PS0/VINOK
    Properties:
        type = Power Supply
        ipmi_name = PS0/VINOK
        class = Discrete Sensor
        value = State Asserted
        alarm_status = cleared

-> show -d properties /SYS/PS0/PWROK
  /SYS/PS0/PWROK
    Properties:
        type = Power Supply
        ipmi_name = PS0/PWROK
        class = Discrete Sensor
        value = State Asserted
        alarm_status = cleared


Example: to check for a power supply fan fault:

 
-> show /SYS/PS0

 /SYS/PS0
    Targets:
        CUR_FAULT
        FAN_FAULT
        INPUT_POWER
        I_IN
        I_OUT
        OUTPUT_POWER
        PRSNT
        PWROK
        TEMP_FAULT
        VINOK
        VOLT_FAULT
        V_IN
        V_OUT

<output omitted>
 

-> show -d properties /SYS/PS0/FAN_FAULT
  /SYS/PS0/FAN_FAULT
    Properties:
        type = Power Supply
        ipmi_name = PS0/FAN_FAULT
        class = Discrete Sensor
        value = State Deasserted
        alarm_status = cleared
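
If the CLI output is captured to a file for later review, the "key = value" property lines can be extracted with a few lines of script. This is a hedged sketch only: the function name is hypothetical, and it assumes the property layout shown in the examples above, which, as noted, may vary with the ILOM version.

```python
def parse_ilom_properties(show_output: str) -> dict:
    """Collect the 'key = value' lines from saved 'show -d properties' output.

    Hypothetical helper; assumes the layout shown in the examples above.
    """
    properties = {}
    for line in show_output.splitlines():
        key, separator, value = line.partition("=")
        if separator:  # only lines that actually contain '='
            properties[key.strip()] = value.strip()
    return properties


# Output copied from the FAN_FAULT example above.
sample = """\
-> show -d properties /SYS/PS0/FAN_FAULT
  /SYS/PS0/FAN_FAULT
    Properties:
        type = Power Supply
        ipmi_name = PS0/FAN_FAULT
        class = Discrete Sensor
        value = State Deasserted
        alarm_status = cleared
"""
props = parse_ilom_properties(sample)
fault_asserted = props.get("value") == "State Asserted"
print(props["ipmi_name"], "fault asserted:", fault_asserted)
```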



Step 2c

Log onto the Browser User Interface (BUI) of the ILOM.

Navigate to the System Monitoring Tab > Sensor readings

Then select "Type: Power Supply" as the filter and check the relevant /SYS/PSx/PWROK and /SYS/PSx/VINOK entries.

An example is shown below:

"value = State Asserted" for the relevant power supply's PWROK indicates the power supply DC output is functioning

"value = State Asserted" for the relevant power supply's VINOK indicates AC to the power supply is present

Check whether any of the fault indicators are asserted, which would indicate a hardware failure.

 Power supply BUI display

 

 Step 3:

If a failure is not persistent, no further action is required. The SR will close in 14 days.

If the failure has been verified as persistent or is a cause for concern, then engage a support engineer by one of the following methods:

a) Update the SR - A support engineer will be assigned to assist.

b) Phone your local Oracle support number and request the SR be assigned to the next available engineer.

If ILOM is version 3.0 or above, upload a snapshot file to allow further analysis. Otherwise, upload the output of the relevant commands above and, if possible, the output of the ipmitool command "sel elist".

 

Example alarm:

Hostname: SP_Host_name
Product Type: SUN FIRE X4450
Summary:ASR:Probable Power Supply Failure

sunHwTrapSystemIdentifier =
sunHwTrapChassisId = serial_number
sunHwTrapProductName = SUN FIRE X4450
sunHwTrapComponentName = /SYS/PS0/VOLT_FAULT
sunHwTrapAdditionalInfo = State Asserted
sunHwTrapAssocObjectId = .1.3.6.1.2.1.47.1.1.1.1.2.134
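
The power supply and sensor implicated by an alarm can be read from the sunHwTrapComponentName field. The sketch below shows one way to extract them from the alarm text. The function name is hypothetical, and it assumes the component name has the /SYS/PSx/SENSOR form shown in the example alarm.

```python
import re


def implicated_sensor(alarm_text: str):
    """Return (power supply, sensor) from the alarm text, or None if not found.

    Hypothetical helper; assumes a /SYS/PSx/SENSOR component name.
    """
    match = re.search(r"sunHwTrapComponentName\s*=\s*/SYS/(PS\d+)/(\w+)",
                      alarm_text)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Fields copied from the example alarm above.
alarm = """\
sunHwTrapProductName = SUN FIRE X4450
sunHwTrapComponentName = /SYS/PS0/VOLT_FAULT
sunHwTrapAdditionalInfo = State Asserted
"""
result = implicated_sensor(alarm)
print(result)
```

The returned power supply name identifies which /SYS/PSx target to inspect in Step 2.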


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.