Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Predictive Self-Healing Sure Solution
1967601.1: How to investigate the Auto Service Request alarm SPX86-8003-EL
This article describes the activity required by a System Administrator to verify whether action has to be taken when the X86 ASR power supply alarm SPX86-8003-EL has occurred on a system based on the Sun Server X3-2 or X4-2 architecture.
Applies to:
Exadata X3-8 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X3-2 Half Rack - Version All Versions to All Versions [Release All Releases]
Exadata X4-2 Quarter Rack - Version All Versions to All Versions [Release All Releases]
Exadata X5-2 Eighth Rack - Version All Versions to All Versions [Release All Releases]
Exadata X5-2 Full Rack - Version All Versions to All Versions [Release All Releases]
x86

Purpose
This article describes the activity required by a System Administrator to verify whether action has to be taken on the X86 ASR power supply alarm SPX86-8003-EL.

Scope
This document is intended for system administrators and support personnel.

Details
Auto Service Request (ASR) provides automatic failure detection and SR creation for Oracle X86 systems. See http://www.oracle.com/us/asr/index.html for more information on ASR.

Description of the ASR Event:
Power supply events can be either transient or persistent. They can be generated by external changes and actions, most notably by the removal of AC from a power supply. Additional checks may need to be performed in order to understand the cause of this ASR event.

If a persistent failure has occurred, or if one or more power events cannot be explained by changes in the supplied power or by work being carried out on the machine, then further investigation by a support engineer may be required.

If the event was caused by changes in site power or a similar event, then no action need be taken. There is a small subset of situations where further action is needed after an SPX86-8003-EL event has created an Automatic Service Request (ASR), and this document outlines how to identify them. An example ASR alarm is shown at the bottom of this document.

How to verify if the power supply alarm is persistent or transient.
Step 1: Identify the system or systems that experienced this ASR alarm.
The Auto Service Request (ASR) will be logged against the serial number of the machine that generated the alarm. The information provided by the alarm will contain the hostname of the machine or of the machine's Service Processor.

If the SPX86-8003-EL alarm is persistent, then the input power should be checked for that power supply. If the input power is absent, then the problem is with the AC provided to the server. If the input power is present and the power supply is experiencing a persistent problem, then Oracle Support should be contacted to investigate and resolve the problem.
Step 2: To verify that a power supply has a persistent failure, perform either Step 2a or Step 2b of this document. The ASR alarm will identify the number of the power supply that generated the alarm.

Step 2a: The web interface of the ILOM can be used to check whether any open problems exist.
Step 2b: The ILOM command-line interface (CLI) can also be used.
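The CLI check in Step 2b could be scripted once the output of "show /System/Open_Problems" has been captured as text. The following is a minimal sketch, not an Oracle-provided tool. The sample text is modeled on the output quoted in this document, and the function name and both regular expressions are assumptions about the layout, which may differ between ILOM firmware versions.

```python
import re

# Sample modeled on the ILOM CLI output quoted in this document. The
# description line stops where the quoted output stops.
SAMPLE = """\
Open Problems (1)
Date/Time                 Subsystems          Component
------------------------  ------------------  ------------
Fri Nov  7 14:14:36 2014  Power               PS0 (Power Supply 0)
        A loss of AC input to a power supply has occurred.
"""

# Assumption: the header reports the number of open problems in parentheses.
COUNT = re.compile(r"Open Problems \((\d+)\)")

# Assumption: a power supply row names the "Power" subsystem followed by a
# component written as "PS<n> (Power Supply <n>)".
PS_ROW = re.compile(r"\bPower\s+(PS\d+) \(Power Supply \d+\)")


def open_power_problems(text):
    """Return (reported_count, list of power supplies with an open problem)."""
    match = COUNT.search(text)
    count = int(match.group(1)) if match else 0
    supplies = PS_ROW.findall(text)
    return count, supplies


if __name__ == "__main__":
    print(open_power_problems(SAMPLE))
```

A non-empty list of supplies suggests the alarm is still open (persistent) for those units; an empty list suggests the event was transient.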
Step 3: This step describes how to determine whether the persistent alarm is expected.
If the persistent alarm "Loss of AC power" is displayed because the incoming power is not present, then the AC supply or cabling should be checked.
If the persistent alarm "Loss of AC power" is displayed despite the presence of incoming power at the affected supply, then Oracle Service should be contacted using the ASR opened for the alarm.
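The decision described in Steps 1 to 3 can be summarized as a small function. This is only an illustrative sketch of the logic in this article; the function name, parameter names, and the wording of the returned strings are made up for the example.

```python
def triage(persistent, input_power_present):
    """Map the two observations from this article to a suggested action.

    persistent: True if the alarm still shows as an open problem.
    input_power_present: True if AC input is present at the affected supply.
    """
    if not persistent:
        # Transient events explained by site power or service work need no action.
        return "transient: no action if explained by site power or service work"
    if not input_power_present:
        # Persistent alarm with no incoming power: the problem is upstream.
        return "check AC supply and cabling to the affected power supply"
    # Persistent alarm even though incoming power is present.
    return "contact Oracle Support using the ASR opened for the alarm"


if __name__ == "__main__":
    print(triage(persistent=True, input_power_present=False))
```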
-> show /System/Open_Problems

Open Problems (1)
Date/Time                 Subsystems          Component
------------------------  ------------------  ------------
Fri Nov 7 14:14:36 2014   Power               PS0 (Power Supply 0)
        A loss of AC input to a power supply has occurred.(Probability: 100,
->

How to check using the web interface of the ILOM:
System > Power > Details

The example below shows that the input power is missing and that the AC inputs or cabling need to be checked.
The example below shows that the input power is present but there is no power out. The ASR should be updated with this information and with a request that an Oracle Service Representative investigate further.
If the utility ipmitool is available then the ipmitool command "sdr list all" can be used to check the incoming AC and DC output of a power supply. It can also be used to check whether a particular supply is indicating a fault.
Note that the actual command line to use ipmitool will depend on the configuration of the system and the host on which the command is run.
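As an illustration of how the command line can vary, the sketch below builds the ipmitool argument list for two common cases: running locally on the host, or running remotely against the Service Processor over the network. The function name and the choice of the lanplus interface are assumptions for this example. The -I, -H, -U and -f options are standard ipmitool options, but the right interface and credentials depend on the site.

```python
def ipmitool_sdr_argv(host=None, user=None, password_file=None,
                      interface="lanplus"):
    """Build an argument list for 'ipmitool ... sdr list all'.

    With no host, the command runs against the local system using
    ipmitool's default interface. With a host, it targets that Service
    Processor over the network using the given interface.
    """
    argv = ["ipmitool"]
    if host is not None:
        argv += ["-I", interface, "-H", host]
        if user is not None:
            argv += ["-U", user]
        if password_file is not None:
            # -f reads the password from a file instead of the command line.
            argv += ["-f", password_file]
    argv += ["sdr", "list", "all"]
    return argv


if __name__ == "__main__":
    # "example-ilom" is the placeholder hostname used in this article's
    # example alarm; the user and file path are hypothetical.
    print(ipmitool_sdr_argv("example-ilom", "root", "/root/.ilom_pw"))
```

The resulting list can be passed to subprocess.run(argv, capture_output=True, text=True) to capture the sensor listing as text.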
The example below shows that PS0 is behaving as expected, with both input and output power. PS1 is showing no input power and no output power, and is flagging that AC is lost. The AC input to this power supply should be investigated. The power supply itself is unlikely to be faulty.
ipmitool sdr list all
<output omitted>
PS0/P_IN | 140 Watts | ok
The example below shows a case where Oracle Support should be contacted. PS1 is functioning correctly, but PS0 has voltage and power entering the power supply and no power or voltage output.

ipmitool sdr list all
<output omitted>
PS0/P_IN | 90 Watts | ok
<output omitted>
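The comparison made by eye in the two ipmitool examples above can be sketched in code. This is a hedged illustration only. Of the sensor lines used here, only "PS0/P_IN | 140 Watts | ok" appears in this document; the P_OUT sensor name, the other sample lines, and the function names are assumptions, and real sensor names should be taken from the actual "sdr list all" output of the system.

```python
import re

# Hypothetical sample: only the PS0/P_IN line is taken from this document.
SAMPLE = """\
PS0/P_IN         | 140 Watts         | ok
PS0/P_OUT        | 120 Watts         | ok
PS1/P_IN         | 0 Watts           | ok
PS1/P_OUT        | 0 Watts           | ok
"""

# Assumption: input and output power sensors are named PS<n>/P_IN and
# PS<n>/P_OUT and report a numeric value in Watts.
LINE = re.compile(r"^(PS\d+)/(P_IN|P_OUT)\s*\|\s*(\d+(?:\.\d+)?)\s+Watts\s*\|")


def power_readings(sdr_text):
    """Return {supply: {"P_IN": watts, "P_OUT": watts}} from sdr output."""
    readings = {}
    for line in sdr_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            supply, sensor, watts = match.groups()
            readings.setdefault(supply, {})[sensor] = float(watts)
    return readings


def describe(reading):
    """Classify one supply using the reasoning in this article."""
    p_in = reading.get("P_IN")
    p_out = reading.get("P_OUT")
    if p_in is None or p_out is None:
        return "incomplete data"
    if p_in == 0:
        return "no input power: investigate AC input"
    if p_out == 0:
        return "input present but no output: contact Oracle Support"
    return "input and output power present"


if __name__ == "__main__":
    for supply, reading in sorted(power_readings(SAMPLE).items()):
        print(supply, describe(reading))
```

Lines that do not carry a numeric Watts value are skipped, so a supply with a missing reading is reported as "incomplete data" rather than guessed at.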
Summary
If the failure was transient and is understood from known site power or service activities, then no further action is required. The SR will close in 14 days.
If the failure has been verified as persistent and the cause is unknown, or there is concern about the original alarm occurring, then engage Oracle Support by one of the following methods:
a) Update the SR. A support engineer will be assigned to assist. For the ZFS Storage Appliance, also collect a supportbundle (see Doc ID 1019887.1).
b) Phone your local Oracle support number and request that the SR be assigned to the next available engineer.
An ILOM snapshot file should be uploaded to allow further analysis.

"Disabling ASR During Maintenance"
Did you know that during planned maintenance activities you can, if you wish, disable ASR to prevent these events being sent to Oracle (from release 5.4)? Instructions on how to do this are in the ASR Managers Guide, section 4.11 (http://docs.oracle.com/cd/E37710_01/install.41/e18475.pdf).
Example alarm:
Hostname: example-ilom
Fault Description =
Attachments
This solution has no attachment