Asset ID: 1-72-1582782.1
Update Date: 2015-10-17
Solution Type: Problem Resolution Sure
Solution 1582782.1: FMA and ASR incorrectly reporting L1/L0 CPU cache or internal CPU faults on Exadata CELLs during patching
Related Items
- Exadata Database Machine X2-2 Qtr Rack
- Oracle Exalogic Elastic Cloud X2-2 Qtr Rack
- Oracle Exalogic Elastic Cloud X2-2 One-Eighth Rack
- Exadata Database Machine X2-2 Full Rack
- Oracle Exalogic Elastic Cloud X2-2 Full Rack
- Exadata Database Machine X2-8
- Exadata Database Machine X2-2 Half Rack
- Exadata Database Machine X2-2 Hardware
- Oracle Exalogic Elastic Cloud X2-2 Hardware
- Oracle Exalogic Elastic Cloud X2-2 Half Rack
Related Categories
- PLA-Support>Sun Systems>x86>Engineered Systems HW>SN-x64: EXADATA
Applies to:
Oracle Exalogic Elastic Cloud X2-2 One-Eighth Rack - Version X2 and later
Oracle Exalogic Elastic Cloud X2-2 Hardware - Version X2 and later
Exadata Database Machine X2-2 Qtr Rack - Version All Versions and later
Exadata Database Machine X2-2 Full Rack - Version All Versions and later
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Information in this document applies to any platform.
Symptoms
When applying Exadata patch updates on CELLs, an L0/L1 CPU cache fault or an internal CPU fault is reported on either CPU0 or CPU1 during the first reboot of the host after the patch is staged.
Note: X4800 and X2-8 DB nodes are not affected by this issue; only the X2 CELLs are.
Any of the following FMA faults, among others, may be reported:
SPX86-8000-P5
SPX86-8000-LH
SPX86-8000-QS
SPX86-8000-F4
The following FMA fault is an example of what may be reported on either CPU0 or CPU1:
fma/@usr@local@bin@fmadm_faulty.out
------------------- ------------------------------------ -------------- --------
Time UUID msgid Severity
------------------- ------------------------------------ -------------- --------
2013-03-26/19:08:08 fd8b3d5d-ce70-e4a3-83cf-8e97b84c4ba2 SPX86-8000-P5 Critical
Fault class : fault.cpu.intel.l1cache
FRU : /SYS/MB/P1
(Part Number: 060C)
(Serial Number: unknown)
Description : A level 1 cache fault on a processor has occurred.
Response : If running Solaris, the processor will be off-lined upon
system reboot to prevent future interruptions.
Impact : System will panic and reset and system performance may be
impacted.
Action : The administrator should review the ILOM event log for
additional information pertaining to this diagnosis. Please
refer to the Details section of the Knowledge Article for
additional information.
ilom/@usr@local@bin@spshexec_show_-script_@X@logs@event@list.out
2302 Tue Mar 26 19:08:08 2013 Fault Fault critical
Fault detected at time = Tue Mar 26 20:08:08 2013. The suspect component:
/SYS/MB/P1 has fault.cpu.intel.l1cache with probability=100. Refer to
http://www.sun.com/msg/SPX86-8000-P5 for details.
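As a triage aid, the fault table above can be checked against the message IDs listed in this article. The following is a minimal Python sketch, not an Oracle tool: the function name `parse_fmadm_faulty`, the `listed_in_article` key and the regular expressions are illustrative assumptions based only on the sample output shown here, and real `fmadm faulty` output may be laid out differently.

```python
import re

# Message IDs listed in this article as possible false positives during patching.
PATCHING_MSGIDS = {"SPX86-8000-P5", "SPX86-8000-LH", "SPX86-8000-QS", "SPX86-8000-F4"}

# Assumed row layout, taken from the sample above:
# "2013-03-26/19:08:08 <36-character UUID> SPX86-8000-P5 Critical"
ROW = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+"
    r"(?P<uuid>[0-9a-f-]{36})\s+(?P<msgid>\S+)\s+(?P<severity>\S+)\s*$"
)
FRU = re.compile(r"^\s*FRU\s*:\s*(?P<fru>\S+)")


def parse_fmadm_faulty(text):
    """Return one dict per fault row, with the FRU path if a 'FRU :' line follows."""
    faults = []
    for line in text.splitlines():
        row = ROW.match(line.strip())
        if row:
            fault = row.groupdict()
            fault["fru"] = None
            # True only when the message ID is one of those named in this article.
            fault["listed_in_article"] = fault["msgid"] in PATCHING_MSGIDS
            faults.append(fault)
            continue
        fru = FRU.match(line)
        if fru and faults and faults[-1]["fru"] is None:
            faults[-1]["fru"] = fru.group("fru")
    return faults


SAMPLE = """\
2013-03-26/19:08:08 fd8b3d5d-ce70-e4a3-83cf-8e97b84c4ba2 SPX86-8000-P5 Critical
Fault class : fault.cpu.intel.l1cache
FRU : /SYS/MB/P1
"""

if __name__ == "__main__":
    for fault in parse_fmadm_faulty(SAMPLE):
        print(fault["msgid"], fault["fru"], fault["listed_in_article"])
```

A match only means the message ID is one named in this article. It does not by itself show that the fault is a false positive; the timing (first reboot after patch staging on an X2 CELL) still has to fit.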
The following ASR alerts may be reported for CPU0 or CPU1:
Hostname: xda001cel01-ilom
Product Type: SUN FIRE X4270 M2 SERVER
Summary:ASR:Level 1 Cache Fault on Processor
Note: This is a component of an Exadata Device.
Note: This Service Request is for an Exadata product [http://www.oracle.com/exadata].
Refer to - http://www.sun.com/msg/SPX86-8000-P5
sunHwTrapSystemIdentifier = Exadata Database Machine X2-2 AK00019313
sunHwTrapChassisId = 1119FMM06A
sunHwTrapProductName = SUN FIRE X4270 M2 SERVER
sunHwTrapSuspectComponentName = /SYS/MB/P0
sunHwTrapFaultClass = fault.cpu.intel.l1cache
sunHwTrapFaultCertainty = 100
sunHwTrapFaultMessageID = http://www.sun.com/msg/SPX86-8000-P5
sunHwTrapFaultUUID = 363d10d7-e0ac-eafd-9596-be78fc105f78
sunHwTrapAssocObjectId = .1.3.6.1.2.1.47.1.1.1.1.2.9
sunHwTrapAdditionalInfo =
Alerts received in last 30 days (limit 10)
Date Summary SR
ASR:Internal Processor Fault
Hostname: serverx4270m2
Product Type: SUN FIRE X4270 M2 SERVER
SunHwTrapFaultDiagnosed
Note: This is a component of an Exadata Device.
Note: This Service Request is for an Exadata product [http://www.oracle.com/exadata].
Details: Refer to https://support.oracle.com/msg/SPX86-8000-F4
Event Time = Thu Oct 31 00:58:31 2013
Fault Message ID = SPX86-8000-F4
Fault UUID = b754e4e5-ff44-49cc-f8b1-8e1cede3216a
Knowledge Article URL = https://support.oracle.com/msg/SPX86-8000-F4
Fault Description =
Fault Severity = 0
Product Manufacturer = Oracle Corporation
Product Name = SUN FIRE X4270 M2 SERVER
Product Serial Number = 1234FMM0RX
Product Part Number = 602-4982-02
Chassis Manufacturer = Oracle Corporation
Chassis Name = SUN FIRE X4270 M2 SERVER
Chassis Serial Number = 1234FMM0RX
Chassis Part Number = 602-4982-02
DiagEntity = fdd(1)
SystemIdentifier = Exadata Database Machine X2-2 AK00000465
SuspectCount = 1
Event Suspect 1 Information
SuspectFruFaultCertainty = 100
SuspectFruFaultClass = fault.cpu.intel.internal
SuspectFruName = Intel(R) Xeon(R) CPU L5640 @ 2.27GHz
SuspectFruLocation = /SYS/MB/P0
SuspectFruChassisId = 1234FMM0RX
SuspectFruManufacturer =
SuspectFruPn = 060C
SuspectFruSn =
SuspectFruRevision =
SuspectFruStatus = faulted(3)
Alerts received in last 30 days (limit 10)
Date Summary SR
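The two ASR alert formats above use different field names for the same information. The sketch below, in Python, shows one way to read the `name = value` lines from either format and pull out the fault class and suspect location. The function names `parse_asr_fields` and `summarize` are hypothetical, and the field names are only those visible in the two examples; other ASR versions may use different ones.

```python
# Fault classes that appear in the ASR examples in this article.
ARTICLE_FAULT_CLASSES = {"fault.cpu.intel.l1cache", "fault.cpu.intel.internal"}


def parse_asr_fields(text):
    """Collect 'name = value' lines from an ASR alert body into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '=' at all
            fields[name.strip()] = value.strip()
    return fields


def summarize(fields):
    """Pick out fault class and suspect location under either field naming."""
    fault_class = (fields.get("sunHwTrapFaultClass")
                   or fields.get("SuspectFruFaultClass"))
    location = (fields.get("sunHwTrapSuspectComponentName")
                or fields.get("SuspectFruLocation"))
    return {
        "fault_class": fault_class,
        "location": location,
        "matches_article": fault_class in ARTICLE_FAULT_CLASSES,
    }


if __name__ == "__main__":
    sample = (
        "SuspectFruFaultClass = fault.cpu.intel.internal\n"
        "SuspectFruLocation = /SYS/MB/P0\n"
        "SuspectFruSn =\n"
    )
    print(summarize(parse_asr_fields(sample)))
```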
Changes
The fault appears on the initial reboot after applying Exadata patch updates.
The event is believed to be a false positive; the evidence suggests that nothing is wrong with the CPU.
Cause
It is believed that the cache is left in an unclean state during the image update. This is still under investigation.
Solution
Workaround: Clear the fault in ILOM and continue the patching process. If the system then continues normally, no CPU replacement is needed.
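As an illustration of the workaround, the following Python sketch assembles the ILOM CLI lines an administrator might enter to list faults, clear the fault on the reported CPU, and list faults again. This article does not give the commands, so the `show /SP/faultmgmt` target and the `clear_fault_action` property are assumptions based on the ILOM 3.x CLI, and the helper name `ilom_clear_fault_commands` is hypothetical. Doc ID 1155200.1 remains the authoritative procedure for the installed ILOM version.

```python
import re

# Assumption: CPU FRU paths have the /SYS/MB/P0 or /SYS/MB/P1 form seen in the fault output.
CPU_FRU_PATH = re.compile(r"/SYS/MB/P[01]")


def ilom_clear_fault_commands(fru_path):
    """Return ILOM CLI lines to list faults, clear one CPU fault, and list faults again."""
    if CPU_FRU_PATH.fullmatch(fru_path) is None:
        raise ValueError(f"unexpected FRU path: {fru_path!r}")
    return [
        "show /SP/faultmgmt",                       # assumed ILOM 3.x target listing faulted components
        f"set {fru_path} clear_fault_action=true",  # assumed ILOM 3.x property; confirm in Doc ID 1155200.1
        "show /SP/faultmgmt",                       # the cleared FRU should no longer be listed
    ]


if __name__ == "__main__":
    for line in ilom_clear_fault_commands("/SYS/MB/P1"):
        print(line)
```

The sketch only builds the command text and does not connect to the service processor. The FRU path is validated so that a mistyped or unexpected component name is rejected rather than passed through.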
Reference document for clearing ILOM faults:
PSH Procedural Article for ILOM-Based Diagnosis (Doc ID 1155200.1)
Reference documents for the CPU cache and internal processor faults:
SPX86-8000-P5 - Level 1 Cache Fault on Processor (Doc ID 1021226.1)
SPX86-8000-LH - Level 0 Instruction Cache Fault on Processor (Doc ID 1021244.1)
SPX86-8000-QS - Level 1 Data Cache Fault on Processor (Doc ID 1021227.1)
SPX86-8000-F4 - Internal Processor Fault (Doc ID 1021232.1)