Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1637430.1
Update Date: 2018-01-25

Solution Type  Problem Resolution Sure

Solution  1637430.1 :   Exadata X3-2 : A processor gets an "Intel Thermtrip Failure" after an SP reset


Related Items
  • Exadata X3-2 Hardware
Related Categories
  • PLA-Support>Sun Systems>x86>Engineered Systems HW>SN-x86: Exadata ASR


On Exadata X3-2, a CPU thermal trip fault (SPX86-8003-K5) can be observed after an SP/ILOM reset.
This document provides a workaround and a fix for customers who encounter this issue.

In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-8658214941>

Applies to:

Exadata X3-2 Hardware - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

On an Exadata X3-2, a CPU thermal trip fault (fault.cpu.intel.thermtrip) is reported with ILOM fault SPX86-8003-K5 shortly after an ILOM reset.

This is a known issue that has been seen mostly on /SYS/MB/P1, but it has occasionally been seen on /SYS/MB/P0 and sometimes on both CPUs at around the same time.

Example:

faultmgmtsp> fmadm faulty
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2014-03-06/07:29:15 220d2d69-6878-e897-97e4-b81ca34ae521 SPX86-8003-K5  Critical

Fault class : fault.cpu.intel.thermtrip

ASRU        : /SYS/MB/P1
               faulted

FRU         : /SYS/MB/P1
             (Part Number: 060D)
             (Serial Number: unknown)  100%
               faulty

Description : A thermtrip signal has occurred on a server component.

Response    : The service-required LEDs for the affected component,
             TEMP_FAULT, and chassis will be illuminated.

Impact      : The server will be powered down immediately.

Action      : Please refer to the associated reference document at
             http://www.sun.com/msg/SPX86-8003-K5 for the latest service
             procedures and policies regarding this diagnosis.
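
Where "fmadm faulty" output is captured to a log, the listing above can be checked for this fault mechanically. The following is a minimal Python sketch, not an Oracle-provided tool: the function name find_thermtrip_faults and the embedded SAMPLE text are illustrative assumptions, and the parsing relies only on the layout shown in the example above.

```python
import re

# Excerpt of the "fmadm faulty" listing shown in this document.
SAMPLE = """\
2014-03-06/07:29:15 220d2d69-6878-e897-97e4-b81ca34ae521 SPX86-8003-K5  Critical

Fault class : fault.cpu.intel.thermtrip

ASRU        : /SYS/MB/P1
               faulted
"""

# Summary line: timestamp, UUID, msgid, severity.
SUMMARY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+([0-9a-f-]{36})\s+(\S+)\s+(\S+)"
)
# ASRU line: the component the fault was raised against.
ASRU_RE = re.compile(r"^ASRU\s*:\s*(\S+)")


def find_thermtrip_faults(text):
    """Return one dict per SPX86-8003-K5 entry found in the listing."""
    entries = []
    current = None
    for line in text.splitlines():
        summary = SUMMARY_RE.match(line)
        if summary:
            current = {
                "time": summary.group(1),
                "uuid": summary.group(2),
                "msgid": summary.group(3),
                "severity": summary.group(4),
                "asru": None,
            }
            entries.append(current)
            continue
        asru = ASRU_RE.match(line)
        if asru and current is not None:
            current["asru"] = asru.group(1)
    return [e for e in entries if e["msgid"] == "SPX86-8003-K5"]


if __name__ == "__main__":
    for fault in find_thermtrip_faults(SAMPLE):
        print(fault["time"], fault["asru"], fault["uuid"])
```

The sketch only identifies the fault and the affected CPU. Whether the fault followed an SP reset still has to be confirmed from the ILOM event history.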

 

Changes

The customer performed an SP reset.

Cause

During the ILOM reset, fan control does not work properly, so a CPU can temporarily heat up and trip the thermal threshold.

Solution

Workaround: Clear the fault on the affected CPU with "set clear_fault_action=true", then reboot the server.

-> set /SYS/MB/P1 clear_fault_action=true
Are you sure you want to clear /SYS/MB/P1 (y/n)? y
Set 'clear_fault_action' to 'true'
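
Because the fault can appear on P0, P1, or both, the clear command may need to be issued once per affected CPU. The sketch below only builds the command strings in the form shown above; it does not connect to ILOM or run anything. The helper name clear_fault_commands and the restriction to /SYS/MB/P0 and /SYS/MB/P1 are assumptions made for illustration.

```python
# CPU targets named in this document; anything else is rejected (assumption).
ALLOWED_TARGETS = ("/SYS/MB/P0", "/SYS/MB/P1")


def clear_fault_commands(targets):
    """Build one ILOM CLI 'set ... clear_fault_action=true' line per target."""
    commands = []
    for target in targets:
        if target not in ALLOWED_TARGETS:
            raise ValueError(f"unexpected target: {target!r}")
        commands.append(f"set {target} clear_fault_action=true")
    return commands


if __name__ == "__main__":
    # Print the commands for review before typing them at the ILOM prompt.
    for command in clear_fault_commands(["/SYS/MB/P1"]):
        print(command)
```

Each command still prompts for confirmation at the ILOM CLI, as in the session above.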

REFERENCE
http://docs.oracle.com/cd/E19860-01/E21549/z40013e61440963.html

 

Resolution: Update to the ILOM/BIOS version contained in OS image 11.2.3.3.0.

References

<NOTE:1501450.1> - INTERNAL Exadata Database Machine Hardware Current Product Issues (X3-2, X4-2, X3-8, X4-8 w/X4-2L)
<BUG:16634342> - X3-2: THERMTRIP EVENT ON P1 JUST AFTER SP RESET
<NOTE:1433134.1> - SPX86-8003-K5 - Intel Thermtrip Failure.

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.