Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1942833.1
Update Date:2018-03-19

Solution Type  Predictive Self-Healing Sure

Solution  1942833.1 :   How to investigate the Auto Service Request alarm SPX86-8003-73  


Related Items
  • Exalogic Elastic Cloud X4-2 Full Rack
  • Exadata X4-2 Hardware
  • Exadata X3-8b Hardware
  • Netra Server X5-2
  • Exadata X3-8 Hardware
  • Exadata X3-2 Quarter Rack
  • Sun Server X4-2L
  • Exadata X3-2 Eighth Rack
  • Exadata X4-2 Quarter Rack
  • Exadata X4-2 Full Rack
  • Sun Server X3-2
  • Exadata X3-2 Half Rack
  • Exadata X3-2 Full Rack
  • Sun Server X4-2
  • Exadata X4-8 Hardware
  • Exadata X4-2 Half Rack
  • Exadata X3-2 Hardware
  • Sun Server X3-2L
  • Exadata X6-2 Hardware
  • Exadata X4-2 Eighth Rack
Related Categories
  • PLA-Support>Sun Systems>x86>Server>SN-x64: SERVER 64bit


This article describes the checks a System Administrator should perform to determine whether action is required when the X86 ASR power supply alarm SPX86-8003-73 has occurred on a system based on the Sun Server X3-2 or X4-2 architecture.

In this Document
Purpose
Scope
Details
 Description of the ASR Event:
 How to verify if the power supply alarm is persistent or was transient.
 Example alarm:


Applies to:

Exadata X3-2 Half Rack - Version All Versions to All Versions [Release All Releases]
Exadata X4-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X3-2 Full Rack - Version All Versions to All Versions [Release All Releases]
Exadata X4-8 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X4-2 Quarter Rack - Version All Versions to All Versions [Release All Releases]
x86

Purpose

 This article describes the checks a System Administrator should perform to determine whether action is required for the X86 ASR power supply alarm SPX86-8003-73.

Scope

 This document is intended for system administrators and support personnel.

Details

Auto Service Request (ASR) provides automatic failure detection and SR creation for Oracle X86 systems.  See http://www.oracle.com/us/asr/index.html for more information on ASR. Please find an example ASR alarm at the bottom of this document.

Description of the ASR Event:

Power supply events can be either transient or persistent. Additional checks may be needed to understand the cause of this ASR event.

Only in a small subset of situations does further action need to be taken when an SPX86-8003-73 event has created an Automatic Service Request (ASR). This document outlines how to determine whether that is the case.

The alarm can be generated by external changes and actions, most notably by the removal of AC input from a power supply, either through removal of the AC power cord or through loss of the external AC supply. If the event was caused by changes in site power or a similar event, then no action need be taken. Upon restoration of AC input, the ILOM fault will automatically clear. This would be considered a transient failure.

If actions are taken to isolate and clear the fault, and the fault remains or clears and then faults again, then this would be considered a persistent failure. If a persistent failure has occurred, or if a power event or events cannot be explained by changes in the supplied power or work being carried out on the machine, then Oracle Support should be contacted to investigate and resolve the problem.
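
The transient/persistent rule in the two paragraphs above can be sketched as a small decision function. This is only an illustration of the rule as worded here; the function name, parameter names and the Action values are hypothetical and are not part of any Oracle tool.

```python
from enum import Enum


class Action(Enum):
    NO_ACTION = "no action required"
    CONTACT_SUPPORT = "contact Oracle Support"


def triage(fault_still_open, fault_recurred, explained_by_site_activity):
    """Apply the rule described above (hypothetical helper).

    fault_still_open: the ILOM fault remains after attempts to clear it.
    fault_recurred: the fault cleared and then faulted again.
    explained_by_site_activity: the event matches known site power
        changes or work being carried out on the machine.
    """
    persistent = fault_still_open or fault_recurred
    if persistent or not explained_by_site_activity:
        return Action.CONTACT_SUPPORT
    return Action.NO_ACTION


# A cleared fault that matches a known site power outage is transient.
print(triage(False, False, True).value)
```

The later steps refine the persistent case: where incoming AC is absent, the AC source and cabling are checked first.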

How to verify if the power supply alarm is persistent or was transient.

Step 1: Identify the system or systems that experienced this ASR Alarm.

The Auto Service Request (ASR) will be logged against the serial number of the machine that generated the alarm. The information provided by the alarm will contain the hostname of the machine or the machine's Service Processor.

Step 2: To verify whether a power supply has a persistent failure, perform either Step 2a or Step 2b of this document.

The ASR alarm will identify the power supply number that has generated the alarm. Use ILOM to determine if there are any open problems, indicating this is a persistent failure.  A transient failure will not show any open problems as the ILOM fault will automatically clear when AC power input is restored.

Step 2a

The ILOM web interface can be used to check whether any open problems exist:

 [Image: Open Problems]

 

Step 2b

The ILOM CLI can be used to check whether any open problems exist:

 

-> show /System/Open_Problems

Open Problems (1)
Date/Time                 Subsystems          Component
------------------------  ------------------  ------------
Fri Nov  7 14:14:36 2014  Power               PS0 (Power Supply 0)
        A loss of AC input to a power supply has occurred. (Probability: 100, UU
        ID: 3df33afb-7ed3-c9ee-8da4-b23a0899be76, Part Number: 7047410, Serial N
        umber: 476856F+1302CE01CN, Reference Document: http://www.sun.com/msg/SP
        X86-8003-73)

->
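
Where the CLI output has been captured to text, the check for an open SPX86-8003-73 problem can be scripted. The following is a minimal sketch that assumes the output layout shown above (a dated header line per problem, followed by indented description lines wrapped mid-word). The function name and the regular expressions are illustrative assumptions, not an Oracle-provided parser.

```python
import re

# Header line of one problem, e.g.
# "Fri Nov  7 14:14:36 2014  Power               PS0 (Power Supply 0)"
HEADER = re.compile(
    r"^\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}\s{2,}"
    r"(?P<subsystem>\S+)\s{2,}(?P<component>.+)$"
)
# Message ID in the form used in this article, e.g. SPX86-8003-73.
MSG_ID = re.compile(r"SPX86-\d{4}-[0-9A-Z]{2}")


def open_problems(cli_output):
    """Return a list of (component, message_id) for each open problem.

    message_id is None when no SPX86 ID is found in the description.
    """
    records = []  # each item: [component, joined description text]
    for line in cli_output.splitlines():
        header = HEADER.match(line)
        if header:
            records.append([header.group("component").strip(), ""])
        elif records and line[:1].isspace() and line.strip():
            # Description lines are wrapped mid-word, so join them
            # without a separator to restore split tokens such as the ID.
            records[-1][1] += line.strip()
    result = []
    for component, description in records:
        found = MSG_ID.search(description)
        result.append((component, found.group(0) if found else None))
    return result
```

An empty list corresponds to no open problems, which is what a transient failure is expected to show once AC input has been restored.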

 

Step 3

This step describes how to determine whether a persistent alarm is expected.

If the SPX86-8003-73 alarm is persistent, then the input power for that power supply should be checked. Each power supply has an AC IN indicator that should be green, either on or flashing in standby, when AC input to the power supply is normal. If the AC IN indicator is off or shows a fault (amber), then there is a problem with the AC supplied to the server or with the AC power cord attached to the server.

If the persistent alarm "Loss of AC power" is displayed because the incoming power is not present then the AC input source and cabling should be checked.

If the persistent alarm "Loss of AC power" is displayed despite the presence of incoming power in the affected supply then Oracle Service should be contacted using the ASR opened for the alarm.

How to check using the ILOM web interface (System > Power > Details):

The example below shows the input power is missing and the AC inputs or cabling need to be checked.

 

[Image: No input power]

 

The example below shows that input power is present but there is no output power.

The ASR should be updated with this information, along with a request that an Oracle Service Representative investigate further.

 

[Image: Suspect Power Supply]

 

If the utility ipmitool is available then the ipmitool command "sdr list all" can be used to check the incoming AC and DC output of a power supply. It can also be used to check whether a particular supply is indicating a fault. Note that the actual command line to use ipmitool will depend on the configuration of the system and the host on which the command is run.
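
As the note above says, the exact ipmitool invocation depends on the environment. The sketch below shows one possible way to build it, assuming either local in-band access on the host or remote access to the Service Processor over the IPMI v2.0 "lanplus" interface, with the password supplied through the IPMI_PASSWORD environment variable (ipmitool's -E option). The function names, host name and user name are illustrative only.

```python
import subprocess


def sdr_command(bmc_host=None, user=None):
    """Build the argument list for 'ipmitool sdr list all'.

    With no bmc_host the command runs against the local system.
    With bmc_host, a remote lanplus session is assumed and the
    password is read by ipmitool from IPMI_PASSWORD (-E).
    """
    argv = ["ipmitool"]
    if bmc_host is not None:
        if not user:
            raise ValueError("a user name is required for remote access")
        argv += ["-I", "lanplus", "-H", bmc_host, "-U", user, "-E"]
    return argv + ["sdr", "list", "all"]


def read_sdr(bmc_host=None, user=None):
    """Run the command and return its standard output as text."""
    completed = subprocess.run(
        sdr_command(bmc_host, user),
        capture_output=True,
        text=True,
        check=True,  # raise if ipmitool exits with an error
    )
    return completed.stdout


# Example argument list for a hypothetical Service Processor host name.
print(" ".join(sdr_command("example-ilom", "root")))
```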

 

The example below shows that PS0 is behaving as expected, with both input and output power.

PS1 is showing no input power and no output power, indicating that AC is lost. The AC input to this power supply should be investigated. The power supply itself is unlikely to be faulty.

 

ipmitool  sdr list all

<output omitted>

PS0/P_IN         | 140 Watts         | ok
PS0/P_OUT        | 130 Watts         | ok
PS0/V_IN         | 248 Volts         | ok
PS0/V_12V        | 12 Volts          | ok
PS0/V_12V_STBY   | 11.82 Volts       | ok
PS0/T_OUT        | 23 degrees C      | ok
PS0/STATE        | 0 unspecified     | nc
PS1/P_IN         | 0 Watts           | ok
PS1/P_OUT        | 0 Watts           | ok
PS1/V_IN         | 0 Volts           | ok
PS1/V_12V        | 0 Volts           | ok
PS1/V_12V_STBY   | 0 Volts           | ok
PS1/T_OUT        | 20 degrees C      | ok
PS1/STATE        | 0 unspecified     | nc

<output omitted>
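
Output in the format shown above can be turned into per-supply readings for easier comparison. This is a minimal sketch that assumes the three-column "name | value unit | status" layout and the PSn/SENSOR naming seen in this example; the function name is hypothetical and other platforms may name their sensors differently.

```python
def parse_psu_readings(sdr_output):
    """Return {"PS0": {"V_IN": (248.0, "Volts", "ok"), ...}, ...}.

    Lines that do not look like 'PSn/SENSOR | value unit | status'
    (for example '<output omitted>') are skipped.
    """
    readings = {}
    for line in sdr_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) != 3 or "/" not in fields[0]:
            continue
        supply, sensor = fields[0].split("/", 1)
        if not supply.startswith("PS"):
            continue
        value_text, _, unit = fields[1].partition(" ")
        try:
            value = float(value_text)
        except ValueError:
            continue  # non-numeric reading; ignore in this sketch
        readings.setdefault(supply, {})[sensor] = (value, unit, fields[2])
    return readings
```

Note that in the example above the status column still reads "ok" for PS1 even though every value is 0, so the numeric values, not the status column, are what distinguish the two supplies.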

 

 

 The example below shows a case where Oracle Support should be contacted.

PS1 is functioning correctly, but PS0 has voltage and power entering the power supply with no power or voltage output.

ipmitool sdr list all

<output omitted>

PS0/P_IN         | 90 Watts          | ok
PS0/P_OUT        | 0 Watts           | ok
PS0/V_IN         | 246 Volts         | ok
PS0/V_12V        | 0 Volts           | ok
PS0/V_12V_STBY   | 0 Volts           | ok
PS0/T_OUT        | 37 degrees C      | ok
PS0/STATE        | 0 unspecified     | nc
PS1/P_IN         | 220 Watts         | ok
PS1/P_OUT        | 200 Watts         | ok
PS1/V_IN         | 246 Volts         | ok
PS1/V_12V        | 12 Volts          | ok
PS1/V_12V_STBY   | 11.88 Volts       | ok
PS1/T_OUT        | 40 degrees C      | ok
PS1/STATE        | 0 unspecified     | nc

 

<output omitted>
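
The two ipmitool examples in this step lead to different actions, and that distinction can be sketched as a simple classification. The function below is illustrative only: its name, its return strings and the use of zero as the threshold are assumptions drawn from the sample values in this document, not an Oracle diagnostic rule.

```python
CHECK_AC = "check AC input source and cabling"
CONTACT_SUPPORT = "contact Oracle Support"
OK = "ok"


def classify_psu(v_in, p_out, v_12v, v_12v_stby):
    """Classify one power supply from its sensor readings.

    v_in: input voltage (V_IN), in volts
    p_out: output power (P_OUT), in watts
    v_12v, v_12v_stby: main and standby 12 V outputs, in volts
    """
    if v_in <= 0:
        # No incoming AC: the supply itself is unlikely to be faulty.
        return CHECK_AC
    if p_out <= 0 and v_12v <= 0 and v_12v_stby <= 0:
        # AC is present but the supply produces no output at all.
        return CONTACT_SUPPORT
    return OK


# Values for PS0 from the example immediately above.
print(classify_psu(v_in=246, p_out=0, v_12v=0, v_12v_stby=0))
```

Treating the supply as suspect only when all three outputs read zero is a deliberate choice in this sketch, so that a supply with standby output present is not flagged.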






 

Summary

If the failure was transient and is understood from known site power or service activities, then no further action is required. The SR will close in 14 days.

If the failure has been verified as persistent and the cause is unknown, or there is concern about why the original alarm occurred, then engage Oracle Support by one of the following methods:

a) Update the SR - A support engineer will be assigned to assist.

b) Phone your local Oracle support number and request the SR be assigned to the next available engineer.

An ILOM snapshot file should be uploaded to allow further analysis.

 

Disabling ASR During Maintenance

During planned maintenance activities, ASR can be disabled to prevent these events from being sent to Oracle (available from release 5.4). Instructions on how to do this are in section 4.11 of the ASR Managers Guide (http://docs.oracle.com/cd/E37710_01/install.41/e18475.pdf).

 

Example alarm:

Hostname: example-ilom
Product Type: SUN FIRE X4170 M3
Summary:ASR: Lack of AC input power.

Fault event knowledge article: https://support.oracle.com/msg/SPX86-8003-73

Fault event description: A loss of AC input to a power supply has occurred.
SunHwTrapFaultDiagnosed
Event Time = Thu Nov 6 06:55:46 2014
Fault Message ID = SPX86-8003-73
Fault UUID = zyz09673-999-4431-ed726-de232b392892
Knowledge Article URL = https://support.oracle.com/msg/SPX86-8003-73
Fault Description =
Fault Severity = 0
Product Manufacturer = Oracle Corporation
Product Name = Exalogic X3-2 Upg
Product Serial Number = AKEXAMPLE
Product Part Number = Exalogic X3-2 Upg
Component System Manufacturer = Oracle Corporation
Component System Name = SUN FIRE X4170 M3
Component System Serial Number = 123EXAMPLE
Component System Part Number = 7067084
Chassis Manufacturer = Oracle Corporation
Chassis Name = SUN FIRE X4170 M3
Chassis Serial Number = 1325EXAMPLE
Chassis Part Number = 7067084
DiagEntity = fdd(1)
SystemIdentifier = Oracle Exalogic X2-2 AKEXAMPLE
Hostname = example-ilom


SuspectCount = 1
Event
Suspect 1 Information
SuspectFruFaultCertainty = 100
SuspectFruFaultClass = fault.chassis.power.ext-fail
SuspectFruName =
SuspectFruLocation = /SYS/PS0
SuspectFruChassisId = 1325EXAMPLE
SuspectFruManufacturer =
SuspectFruPn = 07047410
SuspectFruSn = 476856F+1317CE0113
SuspectFruRevision = A256_Power_Supply
SuspectFruStatus = faulted(3)

System serial = AKEXAMPLE
System type = EXALOGIC X3-2 UPG
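
For Step 1, the fields needed to identify the affected system and power supply can be pulled out of the alarm text. This is a minimal sketch that reads only the "Name = value" lines of an alarm laid out like the example above; the function name is hypothetical, and it assumes a single suspect, since repeated keys would overwrite one another.

```python
def parse_alarm_fields(alarm_text):
    """Return a dict of the 'Name = value' fields in an ASR alarm.

    Lines without '=' are ignored. Empty values (for example
    'Fault Description =') are kept as empty strings.
    """
    fields = {}
    for line in alarm_text.splitlines():
        if "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        name, _, value = line.partition("=")
        fields[name.strip()] = value.strip()
    return fields


example = """Fault Message ID = SPX86-8003-73
Fault Description =
Hostname = example-ilom
SuspectFruLocation = /SYS/PS0
"""
fields = parse_alarm_fields(example)
print(fields["Hostname"], fields["SuspectFruLocation"])
```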



  Copyright © 2018 Oracle, Inc.  All rights reserved.