Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1582782.1
Update Date:2015-10-17
Keywords:

Solution Type  Problem Resolution Sure

Solution  1582782.1 :   FMA and ASR incorrectly reporting L1/L0 CPU cache or internal CPU faults on Exadata CELLs during patching  


Related Items
  • Exadata Database Machine X2-2 Qtr Rack
  • Oracle Exalogic Elastic Cloud X2-2 Qtr Rack
  • Oracle Exalogic Elastic Cloud X2-2 One-Eighth Rack
  • Exadata Database Machine X2-2 Full Rack
  • Oracle Exalogic Elastic Cloud X2-2 Full Rack
  • Exadata Database Machine X2-8
  • Exadata Database Machine X2-2 Half Rack
  • Exadata Database Machine X2-2 Hardware
  • Oracle Exalogic Elastic Cloud X2-2 Hardware
  • Oracle Exalogic Elastic Cloud X2-2 Half Rack
Related Categories
  • PLA-Support>Sun Systems>x86>Engineered Systems HW>SN-x64: EXADATA




In this Document
Symptoms
Changes
Cause
Solution
References


Applies to:

Oracle Exalogic Elastic Cloud X2-2 One-Eighth Rack - Version X2 and later
Oracle Exalogic Elastic Cloud X2-2 Hardware - Version X2 and later
Exadata Database Machine X2-2 Qtr Rack - Version All Versions and later
Exadata Database Machine X2-2 Full Rack - Version All Versions and later
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Information in this document applies to any platform.

Symptoms

When Exadata patch updates are applied to CELLs, an L0/L1 CPU cache fault or an internal CPU fault is reported against either CPU0 or CPU1 during the first reboot of the host after the patch is staged.  Note: Only the X2 CELLs are affected by this issue; X4800 and X2-8 DB nodes are not.  Any of the following FMA faults, or others, may be seen:

SPX86-8000-P5

SPX86-8000-LH

SPX86-8000-QS

SPX86-8000-F4

 

The following FMA error is an example of a fault that may be seen on either CPU0 or CPU1:

 

fma/@usr@local@bin@fmadm_faulty.out

------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2013-03-26/19:08:08 fd8b3d5d-ce70-e4a3-83cf-8e97b84c4ba2 SPX86-8000-P5  Critical

Fault class : fault.cpu.intel.l1cache

FRU         : /SYS/MB/P1
             (Part Number: 060C)
             (Serial Number: unknown)

Description : A level 1 cache fault on a processor has occurred.

Response    : If running Solaris, the processor will be off-lined upon
             system reboot to prevent future interruptions.

Impact      : System will panic and reset and system performance may be
             impacted.

Action      : The administrator should review the ILOM event log for
             additional information pertaining to this diagnosis.  Please
             refer to the Details section of the Knowledge Article for
             additional information.

ilom/@usr@local@bin@spshexec_show_-script_@X@logs@event@list.out

2302   Tue Mar 26 19:08:08 2013  Fault     Fault     critical
      Fault detected at time = Tue Mar 26 20:08:08 2013. The suspect component:
       /SYS/MB/P1 has fault.cpu.intel.l1cache with probability=100.
      Refer to http://www.sun.com/msg/SPX86-8000-P5 for details.
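
When many CELLs are patched at once, it can help to pull the message ID, fault class and FRU out of the collected fmadm faulty output before deciding how to proceed. The following Python sketch is illustrative only: it assumes the output has the layout shown in the example above, and the function name parse_fmadm_faulty and the SAMPLE text are hypothetical, not part of any Oracle tool.

```python
import re

# Summary row: date/time, UUID, message ID, severity.
SUMMARY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+([0-9a-f-]{36})\s+(\S+)\s+(\S+)\s*$"
)
# Labelled fields such as "Fault class : fault.cpu.intel.l1cache".
FIELD_RE = re.compile(r"^(Fault class|FRU)\s*:\s*(\S+)")


def parse_fmadm_faulty(text):
    """Return one dict per fault found in fmadm faulty style text."""
    faults = []
    for raw in text.splitlines():
        line = raw.strip()
        m = SUMMARY_RE.match(line)
        if m:
            faults.append({"time": m.group(1), "uuid": m.group(2),
                           "msgid": m.group(3), "severity": m.group(4)})
            continue
        m = FIELD_RE.match(line)
        if m and faults:
            # Attach the field to the most recent summary row.
            key = "fault_class" if m.group(1) == "Fault class" else "fru"
            faults[-1][key] = m.group(2)
    return faults


# Excerpt of the example output in this article (assumed layout).
SAMPLE = """\
2013-03-26/19:08:08 fd8b3d5d-ce70-e4a3-83cf-8e97b84c4ba2 SPX86-8000-P5  Critical

Fault class : fault.cpu.intel.l1cache

FRU         : /SYS/MB/P1
"""

for fault in parse_fmadm_faulty(SAMPLE):
    print(fault["msgid"], fault["fault_class"], fault["fru"])
```

The parser only reads text that has already been collected; it does not query the service processor.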

 

The following ASR errors may be reported against CPU0 or CPU1:

Hostname: xda001cel01-ilom
Product Type: SUN FIRE X4270 M2 SERVER
Summary:ASR:Level 1 Cache Fault on Processor

Note: This is a component of an Exadata Device.
Note: This Service Request is for an Exadata product [http://www.oracle.com/exadata].
Refer to - http://www.sun.com/msg/SPX86-8000-P5
http://www.sun.com/msg/SPX86-8000-P5
sunHwTrapSystemIdentifier = Exadata Database Machine X2-2 AK00019313
sunHwTrapChassisId = 1119FMM06A
sunHwTrapProductName = SUN FIRE X4270 M2 SERVER
sunHwTrapSuspectComponentName = /SYS/MB/P0
sunHwTrapFaultClass = fault.cpu.intel.l1cache
sunHwTrapFaultCertainty = 100
sunHwTrapFaultMessageID = http://www.sun.com/msg/SPX86-8000-P5
sunHwTrapFaultUUID = 363d10d7-e0ac-eafd-9596-be78fc105f78
sunHwTrapAssocObjectId = .1.3.6.1.2.1.47.1.1.1.1.2.9
sunHwTrapAdditionalInfo =

Alerts received in last 30 days (limit 10)
Date Summary SR           

ASR:Internal Processor Fault
Hostname: serverx4270m2
Product Type: SUN FIRE X4270 M2 SERVER

SunHwTrapFaultDiagnosed
Note: This is a component of an Exadata Device.
Note: This Service Request is for an Exadata product [http://www.oracle.com/exadata].
Details: Refer to https://support.oracle.com/msg/SPX86-8000-F4

Event Time = Thu Oct 31 00:58:31 2013
Fault Message ID = SPX86-8000-F4
Fault UUID = b754e4e5-ff44-49cc-f8b1-8e1cede3216a
Knowledge Article URL = https://support.oracle.com/msg/SPX86-8000-F4
Fault Description =
Fault Severity = 0
Product Manufacturer = Oracle Corporation
Product Name = SUN FIRE X4270 M2 SERVER
Product Serial Number = 1234FMM0RX
Product Part Number = 602-4982-02
Chassis Manufacturer = Oracle Corporation
Chassis Name = SUN FIRE X4270 M2 SERVER
Chassis Serial Number = 1234FMM0RX
Chassis Part Number = 602-4982-02
DiagEntity = fdd(1)
SystemIdentifier = Exadata Database Machine X2-2 AK00000465

SuspectCount = 1
Event Suspect 1 Information
SuspectFruFaultCertainty = 100
SuspectFruFaultClass = fault.cpu.intel.internal
SuspectFruName = Intel(R) Xeon(R) CPU L5640 @ 2.27GHz
SuspectFruLocation = /SYS/MB/P0
SuspectFruChassisId = 1234FMM0RX
SuspectFruManufacturer =
SuspectFruPn = 060C
SuspectFruSn =
SuspectFruRevision =
SuspectFruStatus = faulted(3)


Alerts received in last 30 days (limit 10)
Date Summary SR
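
The ASR notifications above are largely made up of "name = value" lines, so the suspect component and message ID can be extracted with a few lines of Python. This is a minimal sketch under that assumption; parse_asr_fields is a hypothetical helper name, and the SAMPLE text is an excerpt of the first notification shown above.

```python
def parse_asr_fields(text):
    """Collect 'name = value' pairs from ASR notification text."""
    fields = {}
    for line in text.splitlines():
        # Split on the first '=' only, so values containing '=' stay intact.
        name, sep, value = line.partition("=")
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields


SAMPLE = """\
sunHwTrapSuspectComponentName = /SYS/MB/P0
sunHwTrapFaultClass = fault.cpu.intel.l1cache
sunHwTrapFaultCertainty = 100
sunHwTrapFaultMessageID = http://www.sun.com/msg/SPX86-8000-P5
"""

fields = parse_asr_fields(SAMPLE)
# The message ID is the last path element of the URL in this example.
msgid = fields["sunHwTrapFaultMessageID"].rsplit("/", 1)[-1]
print(fields["sunHwTrapSuspectComponentName"], msgid)
```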


Changes

Initial reboot after applying Exadata patch updates.

This event is believed to be a false positive; the evidence suggests there is nothing wrong with the CPU.

Cause

 It is believed that the CPU cache is left in an unclean state during the image update.  This is still under investigation.

Solution

Workaround: Clear the fault in ILOM and continue the patching process. If the system then continues normally, no CPU replacement is needed.
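
The conditions described in this article can be written down as a simple check for triage scripts. The Python sketch below is an illustration, not an Oracle-supplied rule: the function name and its parameters are hypothetical, and the message ID list is not exhaustive, since the Symptoms section notes that other faults may also be seen. A match only suggests that the workaround applies; the fault still has to be cleared and the system observed.

```python
# Message IDs listed in the Symptoms section of this article (not exhaustive).
KNOWN_MSGIDS = {"SPX86-8000-P5", "SPX86-8000-LH", "SPX86-8000-QS", "SPX86-8000-F4"}
# The article reports the fault against CPU0 or CPU1.
CPU_FRUS = {"/SYS/MB/P0", "/SYS/MB/P1"}


def matches_patching_false_positive(msgid, fru, first_reboot_after_patch_staging):
    """Return True when a fault fits the pattern described in this article.

    first_reboot_after_patch_staging is supplied by the caller: True when the
    fault appeared during the first host reboot after the patch was staged.
    """
    return (
        msgid in KNOWN_MSGIDS
        and fru in CPU_FRUS
        and bool(first_reboot_after_patch_staging)
    )


print(matches_patching_false_positive("SPX86-8000-P5", "/SYS/MB/P1", True))
```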

 

Reference document for clearing ILOM faults:

PSH Procedural Article for ILOM-Based Diagnosis (Doc ID 1155200.1)


Reference documents for the CPU cache and internal processor faults:

SPX86-8000-P5 - Level 1 Cache Fault on Processor (Doc ID 1021226.1)

SPX86-8000-LH - Level 0 Instruction Cache Fault on Processor (Doc ID 1021244.1)

SPX86-8000-QS - Level 1 Data Cache Fault on Processor (Doc ID 1021227.1)

SPX86-8000-F4 - Internal Processor Fault (Doc ID 1021232.1)
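
For scripted reporting, the reference list above can be held as a small lookup table. This Python sketch copies the titles and Doc IDs from that list; the names FAULT_REFERENCES and describe_fault are hypothetical.

```python
# Message ID -> (title, Doc ID), copied from the reference list above.
FAULT_REFERENCES = {
    "SPX86-8000-P5": ("Level 1 Cache Fault on Processor", "1021226.1"),
    "SPX86-8000-LH": ("Level 0 Instruction Cache Fault on Processor", "1021244.1"),
    "SPX86-8000-QS": ("Level 1 Data Cache Fault on Processor", "1021227.1"),
    "SPX86-8000-F4": ("Internal Processor Fault", "1021232.1"),
}


def describe_fault(msgid):
    """Return a one-line description for a message ID, if it is listed."""
    entry = FAULT_REFERENCES.get(msgid)
    if entry is None:
        return f"{msgid}: not listed in this article"
    title, doc_id = entry
    return f"{msgid} - {title} (Doc ID {doc_id})"


print(describe_fault("SPX86-8000-F4"))
```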


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.