Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2321249.1
Update Date:2017-12-11
Keywords:

Solution Type  Problem Resolution Sure

Solution  2321249.1 :   Exalogic X3-2: "fault.cpu.intel.thermtrip" Fault In Compute Node ILOM After SP Reset  


Related Items
  • Exalogic Elastic Cloud X3-2 Hardware
  • Oracle Exalogic Elastic Cloud Software
Related Categories
  • PLA-Support>Eng Systems>Exalogic/OVCA>Oracle Exalogic>MW: Exalogic Core


In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-15990793781>

Applies to:

Exalogic Elastic Cloud X3-2 Hardware - Version X3 to X3 [Release X3]
Oracle Exalogic Elastic Cloud Software - Version 2.0.0.0.1 to 2.0.6.2.0
Linux x86-64
Oracle Virtual Server (64-bit)

Symptoms

On Exalogic X3-2 compute nodes, the CPU thermal trip fault "fault.cpu.intel.thermtrip" is reported in ILOM after an ILOM SP reset.

At almost the same time that the "Intel Thermtrip Failure" error is reported, the system may reboot to prevent overheating.

In fmdump_-v.out from the ILOM snapshot, the 'fault.cpu.intel.thermtrip' fault appears as follows and is shown as Repaired/Resolved several minutes after the system reboots.

2017-10-24/01:07:15 5e17fe16-7506-6b75-b404-a0f838ebb787 SPX86-8003-K5

fault = fault.cpu.intel.thermtrip@/SYS/MB/P1
certainty = 100.0 %
FRU = /SYS/MB/P1
ASRU = /SYS/MB/P1
resource = /SYS/MB/P1
_list_sz = 1
_list_idx = 0
system_component_serial_number = XXXXXXXXXX
system_component_part_number = XXXXXX
system_component_name = SUN FIRE X4170 M3
system_component_manufacturer = Oracle Corporation
chassis_serial_number = XXXXXXXXX
chassis_part_number = XXXXXXX
chassis_name = SUN FIRE X4170 M3
chassis_manufacturer = Oracle Corporation
system_serial_number = AKXXXXXXXX
system_part_number = Exalogic X3-2
system_name = Exalogic X3-2
system_manufacturer = Oracle Corporation
fru_name = Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz
fru_part_number = 060D
[skipped fruid update]

2017-10-24/01:14:33 5e17fe16-7506-6b75-b404-a0f838ebb787 SPX86-8003-K5 Repaired

2017-10-24/01:14:33 5e17fe16-7506-6b75-b404-a0f838ebb787 SPX86-8003-K5 Resolved

This error has been seen mostly on /SYS/MB/P1, but it has also occasionally been seen on P0, and sometimes on both CPUs at around the same time.
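When several snapshots need to be checked, the fmdump excerpt above can be summarized with a small script. The following is a minimal sketch only, assuming the file keeps the layout shown above (a header line with timestamp, UUID, and message ID, optionally followed by a state such as Repaired or Resolved). The names summarize_faults and seconds_until, and the label "Diagnosed" for a header with no state, are hypothetical and are not part of any Oracle tool.

```python
import re
from datetime import datetime

# Header line: "<date>/<time> <uuid> <message-id> [state]"
HEADER = re.compile(
    r"^(\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+([0-9a-f-]{36})\s+(\S+)(?:\s+(\w+))?\s*$"
)
# Detail line: "fault = <fault class>@<resource>"
FAULT = re.compile(r"^fault\s*=\s*(\S+)@(\S+)")


def summarize_faults(text):
    """Group fmdump-style records by UUID, keeping fault class and state changes."""
    events = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = HEADER.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d/%H:%M:%S")
            uuid, msg_id, state = m.group(2), m.group(3), m.group(4)
            current = events.setdefault(
                uuid, {"msg_id": msg_id, "fault": None, "resource": None, "states": []}
            )
            # "Diagnosed" is a label chosen for this sketch (assumption).
            current["states"].append((state or "Diagnosed", ts))
            continue
        m = FAULT.match(line)
        if m and current is not None:
            current["fault"], current["resource"] = m.group(1), m.group(2)
    return events


def seconds_until(event, state):
    """Seconds from the first record of an event to its first record in `state`."""
    first_ts = event["states"][0][1]
    for name, ts in event["states"]:
        if name == state:
            return (ts - first_ts).total_seconds()
    return None


SAMPLE = """\
2017-10-24/01:07:15 5e17fe16-7506-6b75-b404-a0f838ebb787 SPX86-8003-K5

fault = fault.cpu.intel.thermtrip@/SYS/MB/P1
certainty = 100.0 %

2017-10-24/01:14:33 5e17fe16-7506-6b75-b404-a0f838ebb787 SPX86-8003-K5 Repaired

2017-10-24/01:14:33 5e17fe16-7506-6b75-b404-a0f838ebb787 SPX86-8003-K5 Resolved
"""

if __name__ == "__main__":
    for uuid, ev in summarize_faults(SAMPLE).items():
        print(uuid, ev["fault"], ev["resource"], seconds_until(ev, "Repaired"))
```

A thermtrip fault that is marked Repaired within a few minutes, as in this sample, matches the pattern described in this note. A fault that stays open would not match it and should be investigated separately.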

From host_debug_err.log in ILOM Snapshot, ILOM and system reset can be verified as follows:

Tue Oct 24 01:01:21 2017 ID ffff
**** SP Boot ****

Tue Oct 24 01:07:14 2017 ID ffff
**** Host Boot ****

Tue Oct 24 01:14:50 2017 ID ffff P0 Fatals GFERRST 0x00000000,GFFERRST 0x00000000,GFNERRST 0x00000000, 0
Tue Oct 24 01:14:51 2017 ID ffff P0 Correctable:FIRST:DMI:DMI:PCIE_XP:76:PCI link bandwidth changed
Tue Oct 24 01:14:51 2017 ID ffff P0 PCIE_XP:DMI:DMI: XPCORERR:00000001
Tue Oct 24 01:14:51 2017 ID ffff P0 Nonfatal:FIRST:DMI:DMI:PCIE_XP:80:Received PCIE completion with UR
Tue Oct 24 01:14:51 2017 ID ffff P0,DMI:DMI PCIE_XP: : XPUNCERR:00000040
Tue Oct 24 01:14:51 2017 ID ffff P0, GN 0x00100000,GNF 0x00100000,GNN 0x00000000
Tue Oct 24 01:14:52 2017 ID ffff P1 Fatals GFERRST 0x00000000,GFFERRST 0x00000000,GFNERRST 0x00000000, 0
Tue Oct 24 01:14:56 2017 ID ffff
**** Host Boot ****
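To confirm that a thermtrip fault followed an SP reset, the boot markers in host_debug_err.log can be compared with the fault timestamp from fmdump. The following is a minimal sketch, assuming the log layout shown above (a timestamp line followed by a "**** SP Boot ****" or "**** Host Boot ****" marker) and English day and month names. The function names boot_events and seconds_since_sp_boot are hypothetical.

```python
import re
from datetime import datetime

# Timestamp prefix, e.g. "Tue Oct 24 01:01:21 2017 ID ffff"
STAMP = re.compile(r"^(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}) ID ")
# Boot marker, e.g. "**** SP Boot ****"
MARKER = re.compile(r"^\*{4} (SP|Host) Boot \*{4}$")


def boot_events(text):
    """Return (kind, timestamp) pairs for each SP/Host boot marker in the log."""
    events = []
    last_ts = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between timestamp and marker
        m = STAMP.match(line)
        if m:
            # %a and %b are locale dependent; an English/C locale is assumed.
            last_ts = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
            continue
        m = MARKER.match(line)
        if m and last_ts is not None:
            events.append((m.group(1), last_ts))
    return events


def seconds_since_sp_boot(events, when):
    """Seconds between the latest SP boot at or before `when` and `when`."""
    sp_boots = [ts for kind, ts in events if kind == "SP" and ts <= when]
    if not sp_boots:
        return None
    return (when - max(sp_boots)).total_seconds()


SAMPLE_LOG = """\
Tue Oct 24 01:01:21 2017 ID ffff
**** SP Boot ****

Tue Oct 24 01:07:14 2017 ID ffff
**** Host Boot ****

Tue Oct 24 01:14:52 2017 ID ffff P1 Fatals GFERRST 0x00000000,GFFERRST 0x00000000,GFNERRST 0x00000000, 0
Tue Oct 24 01:14:56 2017 ID ffff

**** Host Boot ****
"""

if __name__ == "__main__":
    events = boot_events(SAMPLE_LOG)
    fault_time = datetime(2017, 10, 24, 1, 7, 15)  # from the fmdump excerpt
    print(events)
    print(seconds_since_sp_boot(events, fault_time))
```

In the sample, the fault is logged a few minutes after the SP boot and within a second of the first host boot, which is consistent with the symptom described in this note.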

Changes

The ILOM SP was reset.

Cause

This is a known ILOM issue on Sun Server X3-2:

Bug 16634342 - X3-2: Thermtrip event on P1 just after SP reset

During an ILOM reset, fan control does not work properly, so a CPU can temporarily heat up and trip the thermal threshold.

Solution

ILOM/BIOS version 3.1.2.10.C includes the fix.

In an Exalogic environment, PSU 2.0.6.2.1 and later include the fix for this issue. To resolve the issue, upgrade to PSU 2.0.6.2.1 or later.

For more information about Exalogic PSUs, refer to the following document:

Exalogic Patch Set Updates (PSU) Master Note (Doc ID 1314535.1)
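
When checking many racks, the PSU level can be compared against the fixed version with a simple dotted-version comparison. The following is a minimal sketch, assuming the installed Exalogic software version is already available as a dotted numeric string. How that string is obtained is not covered here, and the function names are hypothetical. The ILOM version string (3.1.2.10.C) contains a letter and is deliberately not handled by this sketch.

```python
def parse_version(version):
    """Turn a dotted numeric version such as "2.0.6.2.1" into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


# First Exalogic PSU that includes the fix, per this note.
FIXED_PSU = parse_version("2.0.6.2.1")


def psu_has_fix(installed):
    """True if the installed PSU version is 2.0.6.2.1 or later."""
    return parse_version(installed) >= FIXED_PSU


if __name__ == "__main__":
    for candidate in ("2.0.0.0.1", "2.0.6.2.0", "2.0.6.2.1", "2.0.6.3.0"):
        print(candidate, psu_has_fix(candidate))
```

Tuples of integers are compared element by element, so 2.0.6.10.0 sorts after 2.0.6.2.1, which a plain string comparison would get wrong.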
 

References

<BUG:16634342> - X3-2: THERMTRIP EVENT ON P1 JUST AFTER SP RESET
<NOTE:1314535.1> - Exalogic Patch Set Updates (PSU) Master Note
<NOTE:1530781.1> - Exalogic Infrastructure Physical and Virtual Releases/PSUs – Software and Firmware Version Information
<NOTE:1268557.1> - Exalogic Elastic Cloud Software Known Issues

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.