Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1967601.1
Update Date:2016-09-15
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution  1967601.1 :   How to investigate the Auto Service Request alarm SPX86-8003-EL  


Related Items
  • Exadata X3-8 Hardware
  • Exadata X5-2 Eighth Rack
  • Exadata X3-8b Hardware
  • Exadata X5-2 Hardware
  • Exadata X5-2 Quarter Rack
  • Exadata X4-2 Hardware
  • Exadata X5-2 Full Rack
  • Exadata X3-2 Quarter Rack
  • Sun Server X4-2L
  • Sun Server X3-2
  • Exadata X4-2 Full Rack
  • Exadata X4-2 Quarter Rack
  • Exadata X3-2 Eighth Rack
  • Sun Server X4-2
  • Exadata X3-2 Half Rack
  • Exadata X3-2 Full Rack
  • Exadata X4-2 Half Rack
  • Exadata X4-8 Hardware
  • Exadata X5-2 Half Rack
  • Sun Server X3-2L
  • Exadata X3-2 Hardware
  • Exadata X4-2 Eighth Rack
Related Categories
  • PLA-Support>Sun Systems>x86>Server>SN-x64: SERVER 64bit


This article describes the activity required of a System Administrator to verify whether action has to be taken when the x86 ASR power supply alarm SPX86-8003-EL has occurred on a system based on the Sun Server X3-2 or X4-2 architecture.

In this Document
Purpose
Scope
Details
 Description of the ASR Event:
 How to verify if the power supply alarm is persistent or transient.
 Example alarm:
References


Applies to:

Exadata X3-8 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X3-2 Half Rack - Version All Versions to All Versions [Release All Releases]
Exadata X4-2 Quarter Rack - Version All Versions to All Versions [Release All Releases]
Exadata X5-2 Eighth Rack - Version All Versions to All Versions [Release All Releases]
Exadata X5-2 Full Rack - Version All Versions to All Versions [Release All Releases]
x86

Purpose

 This article describes the activity required of a System Administrator to verify whether action has to be taken on the x86 ASR power supply alarm SPX86-8003-EL.

Scope

 This document is intended for system administrators and support personnel.

Details

Auto Service Request (ASR) provides automatic failure detection and SR creation for Oracle X86 systems.  See http://www.oracle.com/us/asr/index.html for more information on ASR. 

Description of the ASR Event:

Power supply events can be either transient or persistent. They can be generated by external changes and actions, most notably by the removal of AC from a power supply.

Additional checks may need to be performed in order to understand the cause of this ASR event. If a persistent failure has occurred, or if a power event or events cannot be explained by changes in the supplied power or work being carried out on the machine, then further investigation by a support engineer may be required.

If the event has been caused by changes in site power or a similar event, then no action needs to be taken.

There is a small subset of situations where further action needs to be taken when an SPX86-8003-EL event has created an Automatic Service Request (ASR), and this document outlines how this can be determined.

An example ASR alarm is shown at the bottom of this document.

How to verify if the power supply alarm is persistent or transient.

 

Step 1: Identify the system or systems that experienced this ASR Alarm.

The Auto Service Request (ASR) will be logged against the serial number of the machine that generated the alarm.

The information provided by the alarm will contain the hostname of the machine or of the machine's Service Processor.

If the SPX86-8003-EL alarm is persistent, then the input power should be checked for that power supply.

If the input power is absent, then the problem is with the AC provided to the server.

If the input power is present and the power supply is experiencing a persistent problem, then Oracle Support should be contacted to investigate and resolve the problem.
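
The decision described in this article can be summarised as a short Python sketch. It is only an illustration of the logic in the text: the function name, its parameters and the returned strings are hypothetical and are not part of any Oracle tool.

```python
def next_action(persistent, input_power_present, explained_by_site_activity):
    """Map the checks described in this article to the recommended follow-up.

    All three arguments are booleans that the administrator determines by hand;
    the names are illustrative assumptions, not fields of the ASR alarm.
    """
    if persistent:
        if not input_power_present:
            # The supply reports loss of AC and no AC is reaching it.
            return "check the AC supply and cabling to the server"
        # AC is present but the supply still reports a problem.
        return "contact Oracle Support"
    if explained_by_site_activity:
        # Transient event explained by site power changes or service work.
        return "no action required"
    # Transient event with no known cause.
    return "further investigation by a support engineer may be required"


if __name__ == "__main__":
    print(next_action(True, False, False))
```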

 

Step 2: To verify that a power supply has a persistent failure, perform either Step 2a or Step 2b of this document.

The ASR alarm will identify the power supply number that has generated the alarm.

Step 2a

The ILOM web interface can be used to check whether any open problems exist.

[Screenshot: EL Example]

 

Step 2b

The ILOM command-line interface (CLI) can also be used.

 

 

Step 3

This step describes how to determine whether the persistent alarm is expected.

If the persistent alarm "Loss of AC power" is displayed because the incoming power is not present, then the AC supply or cabling should be checked.

If the persistent alarm "Loss of AC power" is displayed despite the presence of incoming power in the affected supply, then Oracle Service should be contacted using the ASR opened for the alarm.

 

-> show /System/Open_Problems

Open Problems (1)
Date/Time                 Subsystems          Component
------------------------  ------------------  ------------
Fri Nov  7 14:14:36 2014  Power               PS0 (Power Supply 0)
        A loss of AC input to a power supply has occurred.(Probability: 100,
        UUID: 3df33afb-7ed3-c9ee-8da4-b23a0899be76, Part Number: 7047410,
        Serial Number: 476856F+1302CE01CN, Reference Document: http://www.sun.com/msg/SPX86-8003-EL)

->
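
When several systems have to be checked, the Open Problems text can be captured and scanned for the affected supply. The following Python sketch assumes the output has the same layout as the example above; the function name and the regular expression are illustrative and have only been matched against that one sample.

```python
import re

# Sample text copied from the "show /System/Open_Problems" example above.
SAMPLE = """\
Open Problems (1)
Date/Time                 Subsystems          Component
------------------------  ------------------  ------------
Fri Nov  7 14:14:36 2014  Power               PS0 (Power Supply 0)
        A loss of AC input to a power supply has occurred.(Probability: 100,
        UUID: 3df33afb-7ed3-c9ee-8da4-b23a0899be76, Part Number: 7047410,
        Serial Number: 476856F+1302CE01CN, Reference Document: http://www.sun.com/msg/SPX86-8003-EL)
"""

# Matches the component column of a problem entry, e.g. "PS0 (Power Supply 0)".
COMPONENT = re.compile(r"\b(PS\d+)\s+\(Power Supply")


def supplies_with_open_problem(text, message_id="SPX86-8003-EL"):
    """Return the power supplies (e.g. 'PS0') whose open problem cites message_id."""
    supplies = []
    current = None
    for line in text.splitlines():
        header = COMPONENT.search(line)
        if header:
            # Remember which supply the following detail lines belong to.
            current = header.group(1)
        if current is not None and message_id in line:
            supplies.append(current)
            current = None
    return supplies


if __name__ == "__main__":
    print(supplies_with_open_problem(SAMPLE))
```

An empty list means no open problem citing that message ID was found in the supplied text. It does not by itself show that the original event was transient.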

How to check using the ILOM web interface: System > Power > Details

The example below shows that the input power is missing and that the AC inputs or cabling need to be checked.

[Screenshot: No input power]

The example below shows that the input power is present but there is no power out. The ASR should be updated with this information, together with a request that an Oracle Service Representative investigate further.

[Screenshot: Suspect Power Supply]

 

If the ipmitool utility is available, the ipmitool command "sdr list all" can be used to check the incoming AC and the DC output of a power supply. It can also be used to check whether a particular supply is indicating a fault.

 

Note that the actual command line to use ipmitool will depend on the configuration of the system and the host on which the command is run.
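
As an illustration of how the command line can differ, the Python sketch below builds the argument list either for a local run on the host or for a remote run against a Service Processor. It assumes that the Service Processor accepts IPMI over LAN through the lanplus interface, which may not be enabled on a given system. The function name is hypothetical and the user name shown is only an example value. The -E option makes ipmitool read the password from the IPMI_PASSWORD environment variable.

```python
def sdr_command(host=None, user=None):
    """Build the ipmitool argument list for "sdr list all".

    With no host, the command is meant to run locally on the server.
    With a host, it targets that Service Processor over the network
    (assumption: IPMI over LAN with the lanplus interface is enabled).
    """
    command = ["ipmitool"]
    if host is not None:
        if user is None:
            raise ValueError("a user name is required for a remote query")
        # -E: take the password from the IPMI_PASSWORD environment variable.
        command += ["-I", "lanplus", "-H", host, "-U", user, "-E"]
    return command + ["sdr", "list", "all"]


if __name__ == "__main__":
    print(" ".join(sdr_command()))
    print(" ".join(sdr_command(host="example-ilom", user="root")))
```

The returned list can be passed to subprocess.run with capture_output=True and text=True to collect the sensor listing for further checks.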

 

The example below shows that PS0 is behaving as expected, with input and output power. PS1 is showing no input power and no output power, and is flagging that AC is lost.

AC input to this power supply should be investigated. The power supply itself is unlikely to be faulty.

 

ipmitool sdr list all

<output omitted>

PS0/P_IN         | 140 Watts         | ok
PS0/P_OUT        | 130 Watts         | ok
PS0/V_IN         | 248 Volts         | ok
PS0/V_12V        | 12 Volts          | ok
PS0/V_12V_STBY   | 11.82 Volts       | ok
PS0/T_OUT        | 23 degrees C      | ok
PS0/STATE        | 0 unspecified     | nc
PS1/P_IN         | 0 Watts           | ok
PS1/P_OUT        | 0 Watts           | ok
PS1/V_IN         | 0 Volts           | ok
PS1/V_12V        | 0 Volts           | ok
PS1/V_12V_STBY   | 0 Volts           | ok
PS1/T_OUT        | 20 degrees C      | ok
PS1/STATE        | 0 unspecified     | nc

<output omitted>

 

 

The example below shows a case where Oracle Support should be contacted.

PS1 is functioning correctly, but PS0 has voltage and power entering the power supply and no power or voltage output.

ipmitool sdr list all

<output omitted>

PS0/P_IN         | 90 Watts          | ok
PS0/P_OUT        | 0 Watts           | ok
PS0/V_IN         | 246 Volts         | ok
PS0/V_12V        | 0 Volts           | ok
PS0/V_12V_STBY   | 0 Volts           | ok
PS0/T_OUT        | 37 degrees C      | ok
PS0/STATE        | 0 unspecified     | nc
PS1/P_IN         | 220 Watts         | ok
PS1/P_OUT        | 200 Watts         | ok
PS1/V_IN         | 246 Volts         | ok
PS1/V_12V        | 12 Volts          | ok
PS1/V_12V_STBY   | 11.88 Volts       | ok
PS1/T_OUT        | 40 degrees C      | ok
PS1/STATE        | 0 unspecified     | nc

 

<output omitted>
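
The two cases above can be told apart automatically from the sensor listing. The Python sketch below parses lines in the format shown in the examples and classifies each supply. Treating a reading of exactly 0 as "absent" is an assumption made for illustration, and the function names and result strings are hypothetical.

```python
import re

# First example above: PS1 has lost AC input.
AC_LOST = """\
PS0/P_IN         | 140 Watts         | ok
PS0/P_OUT        | 130 Watts         | ok
PS0/V_IN         | 248 Volts         | ok
PS0/V_12V        | 12 Volts          | ok
PS1/P_IN         | 0 Watts           | ok
PS1/P_OUT        | 0 Watts           | ok
PS1/V_IN         | 0 Volts           | ok
PS1/V_12V        | 0 Volts           | ok
"""

# Second example above: PS0 has input but no output.
NO_OUTPUT = """\
PS0/P_IN         | 90 Watts          | ok
PS0/P_OUT        | 0 Watts           | ok
PS0/V_IN         | 246 Volts         | ok
PS0/V_12V        | 0 Volts           | ok
PS1/P_IN         | 220 Watts         | ok
PS1/P_OUT        | 200 Watts         | ok
PS1/V_IN         | 246 Volts         | ok
PS1/V_12V        | 12 Volts          | ok
"""

# "PS0/V_IN | 248 Volts | ok" -> supply "PS0", sensor "V_IN", value "248".
SDR_LINE = re.compile(r"^(PS\d+)/(\S+)\s*\|\s*(\d+(?:\.\d+)?)\s+[^|]*\|\s*\S+\s*$")

OK = "ok"
NO_AC_INPUT = "no AC input: check site power and cabling"
SUSPECT = "AC input present but no output: update the SR for Oracle Support"


def parse_psu_readings(sdr_text):
    """Collect numeric PSn/<sensor> readings, e.g. {"PS0": {"V_IN": 248.0}}."""
    readings = {}
    for line in sdr_text.splitlines():
        match = SDR_LINE.match(line.strip())
        if match:
            supply, sensor, value = match.groups()
            readings.setdefault(supply, {})[sensor] = float(value)
    return readings


def classify_supply(sensors):
    """Classify one supply; a missing sensor is treated like a reading of 0."""
    if sensors.get("V_IN", 0.0) == 0:
        return NO_AC_INPUT
    if sensors.get("P_OUT", 0.0) == 0 and sensors.get("V_12V", 0.0) == 0:
        return SUSPECT
    return OK


if __name__ == "__main__":
    for name, text in (("AC lost", AC_LOST), ("no output", NO_OUTPUT)):
        for supply, sensors in sorted(parse_psu_readings(text).items()):
            print(name, supply, classify_supply(sensors))
```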






 

Summary

 

If the failure was transient and is understood from known site power or service activities, then no further action is required. The SR will close in 14 days.

 

If the failure has been verified as persistent and the cause is unknown, or if there is concern about the original alarm occurring, then engage Oracle Support by one of the following methods:

a) Update the SR - A support engineer will be assigned to assist. For the ZFS Storage Appliance, also collect a supportbundle (See Doc ID 1019887.1).

b) Phone your local Oracle support number and request the SR be assigned to the next available engineer.

An ILOM snapshot file should be uploaded to allow further analysis.

"Disabling ASR During Maintenance"

During planned maintenance activities you can, if you wish, disable ASR to prevent these events being sent to Oracle (available from release 5.4). Instructions on how to do this are in section 4.11 of the ASR Managers Guide (http://docs.oracle.com/cd/E37710_01/install.41/e18475.pdf).

 

 

Example alarm:

Hostname: example-ilom
Product Type: SUN FIRE X4170 M3
Summary:ASR: Lack of AC input power.

Fault event knowledge article: https://support.oracle.com/msg/SPX86-8003-73

Fault event description: A loss of AC input to a power supply has occurred.
SunHwTrapFaultDiagnosed
Event Time = Thu Nov 6 06:55:46 2014
Fault Message ID = SPX86-8003-EL
Fault UUID = zyz09673-999-4431-ed726-de232b392892
Knowledge Article URL = https://support.oracle.com/msg/SPX86-8003-EL

Fault Description =
Fault Severity = 0
Product Manufacturer = Oracle Corporation
Product Name = Exalogic X3-2 Upg
Product Serial Number = AKEXAMPLE
Product Part Number = Exalogic X3-2 Upg
Component System Manufacturer = Oracle Corporation
Component System Name = SUN FIRE X4170 M3
Component System Serial Number = 123EXAMPLE
Component System Part Number = 7067084
Chassis Manufacturer = Oracle Corporation
Chassis Name = SUN FIRE X4170 M3
Chassis Serial Number = 1325EXAMPLE
Chassis Part Number = 7067084
DiagEntity = fdd(1)
SystemIdentifier = Oracle Exalogic X2-2 AKEXAMPLE
Hostname = example-ilom


SuspectCount = 1
Event
Suspect 1 Information
SuspectFruFaultCertainty = 100
SuspectFruFaultClass = fault.chassis.power.ext-fail
SuspectFruName =
SuspectFruLocation = /SYS/PS0
SuspectFruChassisId = 1325EXAMPLE
SuspectFruManufacturer =
SuspectFruPn = 07047410
SuspectFruSn = 476856F+1317CE0113
SuspectFruRevision = A256_Power_Supply
SuspectFruStatus = faulted(3)

System serial = AKEXAMPLE
System type = EXALOGIC X3-2 UPG
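
The "name = value" lines of an alarm such as the one above can be read into a dictionary to pick out the message ID and the suspect power supply. The Python sketch below uses a shortened copy of the example alarm. The function name is hypothetical, and the approach assumes a single suspect, because repeated field names would overwrite each other.

```python
# Shortened copy of the example alarm above.
SAMPLE_ALARM = """\
Hostname: example-ilom
Event Time = Thu Nov 6 06:55:46 2014
Fault Message ID = SPX86-8003-EL
Knowledge Article URL = https://support.oracle.com/msg/SPX86-8003-EL
SuspectCount = 1
SuspectFruFaultClass = fault.chassis.power.ext-fail
SuspectFruName =
SuspectFruLocation = /SYS/PS0
SuspectFruStatus = faulted(3)
"""


def parse_alarm_fields(alarm_text):
    """Return a dict of the "name = value" lines; other lines are ignored."""
    fields = {}
    for line in alarm_text.splitlines():
        key, separator, value = line.partition("=")
        if separator:
            # Later duplicates overwrite earlier ones (single-suspect assumption).
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    fields = parse_alarm_fields(SAMPLE_ALARM)
    print(fields["Fault Message ID"], fields["SuspectFruLocation"])
```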



Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.