Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1588791.1
Update Date: 2015-04-02
Keywords:

Solution Type  Problem Resolution Sure

Solution  1588791.1 :   36p QDR InfiniBand switch running version 1.3.3-2 inaccurately reports temperature alert  


Related Items
  • Sun Datacenter InfiniBand Switch 36
  • Oracle SuperCluster T5-8 Hardware
  • Oracle Exadata Hardware
  • Exalogic Elastic Cloud X3-2 Hardware
  • Exadata X3-2 Half Rack
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST




Applies to:

Sun Datacenter InfiniBand Switch 36 - Version Not Applicable to Not Applicable [Release N/A]
Oracle Exadata Hardware - Version 11.2.0.2 and later
Exalogic Elastic Cloud X3-2 Hardware
Oracle SuperCluster T5-8 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X3-2 Half Rack
Information in this document applies to any platform.

Symptoms

36-port QDR InfiniBand switches running software versions earlier than 2.1.3-4 in Oracle engineered systems, including Exadata, Exalogic, and SuperCluster, may report an inaccurate temperature alert. The alert may appear in any of the following locations:

  • Oracle Enterprise Manager
  • /var/log/messages on the switch
  • ILOM event log of the switch


Example Oracle Enterprise Manager alert

Target type=Oracle Infiniband Switch
Target name=dm01sw-ibb0
Categories=Fault
Message=The aggregate sensor /SYS/TEMP_ATTN has a fault.
Severity=Critical
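Where many alerts arrive from Enterprise Manager, a script can flag this particular one by its fields. The following is a minimal Python sketch, assuming the alert text is available as key=value lines formatted like the example above; parse_em_alert and is_temp_attn_alert are hypothetical helper names, not part of any Enterprise Manager API.

```python
def parse_em_alert(text: str) -> dict[str, str]:
    """Split 'key=value' lines into a dict; lines without '=' are skipped."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")  # split on the first '=' only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def is_temp_attn_alert(fields: dict[str, str]) -> bool:
    """True if the alert is the switch's /SYS/TEMP_ATTN aggregate-sensor fault."""
    return (
        fields.get("Target type") == "Oracle Infiniband Switch"
        and "/SYS/TEMP_ATTN" in fields.get("Message", "")
    )


alert = """Target type=Oracle Infiniband Switch
Target name=dm01sw-ibb0
Categories=Fault
Message=The aggregate sensor /SYS/TEMP_ATTN has a fault.
Severity=Critical"""

fields = parse_em_alert(alert)
print(fields["Target name"], is_temp_attn_alert(fields))
```

Matching on the /SYS/TEMP_ATTN sensor path rather than on Severity keeps the other temperature alerts, which the switch raises when a component sensor is genuinely out of range, from being swept up with this one.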


Example /var/log/messages entry

Aug 12 10:32:06 dm01sw-ibb0 envd[1559]: SP temperature 61 too high
Aug 12 10:32:09 dm01sw-ibb0 PEF: Sensor 'CHASSIS_STATUS' Type 0xC0 Data1 0x01 Entity 7 Instance 0
Aug 12 10:32:09 dm01sw-ibb0 PEF: Sensor 'TEMP_ATTN' Type 0xC0 Data1 0x01 Entity 7 Instance 0
Aug 12 10:32:12 dm01sw-ibb0 envd[1559]: SP temperature 60
Aug 12 10:32:19 dm01sw-ibb0 PEF: Sensor 'CHASSIS_STATUS' Type 0xC0 Data1 0x00 Entity 7 Instance 0
Aug 12 10:32:19 dm01sw-ibb0 PEF: Sensor 'TEMP_ATTN' Type 0xC0 Data1 0x00 Entity 7 Instance 0
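To check whether a TEMP_ATTN alert was transient, its assert and deassert entries can be paired and timed. The following is a minimal Python sketch under two assumptions: that Data1 0x01 marks the sensor as asserted and Data1 0x00 as deasserted (inferred from the ILOM event log example, whose Asserted and Deasserted entries carry the same timestamps), and that all syslog entries, which carry no year, fall in one known year. temp_attn_episodes is a hypothetical helper name.

```python
import re
from datetime import datetime, timedelta

# Matches switch PEF lines such as:
# Aug 12 10:32:09 dm01sw-ibb0 PEF: Sensor 'TEMP_ATTN' Type 0xC0 Data1 0x01 Entity 7 Instance 0
PEF_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ PEF: Sensor '(?P<sensor>\w+)'"
    r".*? Data1 (?P<data1>0x[0-9A-Fa-f]+)"
)


def temp_attn_episodes(lines, year=2013) -> list[tuple[datetime, timedelta]]:
    """Return (start, duration) pairs for each TEMP_ATTN assert/deassert cycle.

    Assumption: Data1 0x01 = asserted, 0x00 = deasserted. Syslog lines carry
    no year, so every timestamp is placed in `year`.
    """
    episodes = []
    start = None
    for line in lines:
        m = PEF_RE.match(line)
        if not m or m.group("sensor") != "TEMP_ATTN":
            continue  # skip envd lines and other sensors such as CHASSIS_STATUS
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        data1 = int(m.group("data1"), 16)
        if data1 == 0x01:
            start = ts  # alert asserted
        elif data1 == 0x00 and start is not None:
            episodes.append((start, ts - start))  # alert deasserted
            start = None
    return episodes


log = """Aug 12 10:32:06 dm01sw-ibb0 envd[1559]: SP temperature 61 too high
Aug 12 10:32:09 dm01sw-ibb0 PEF: Sensor 'CHASSIS_STATUS' Type 0xC0 Data1 0x01 Entity 7 Instance 0
Aug 12 10:32:09 dm01sw-ibb0 PEF: Sensor 'TEMP_ATTN' Type 0xC0 Data1 0x01 Entity 7 Instance 0
Aug 12 10:32:12 dm01sw-ibb0 envd[1559]: SP temperature 60
Aug 12 10:32:19 dm01sw-ibb0 PEF: Sensor 'CHASSIS_STATUS' Type 0xC0 Data1 0x00 Entity 7 Instance 0
Aug 12 10:32:19 dm01sw-ibb0 PEF: Sensor 'TEMP_ATTN' Type 0xC0 Data1 0x00 Entity 7 Instance 0"""

for start, duration in temp_attn_episodes(log.splitlines()):
    print(start, duration)
```

For the sample entries this reports a single episode that starts at 10:32:09 and lasts ten seconds, which fits the sensor variability described under Cause.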


Example ILOM event log entry

1438   Mon Aug 12 10:32:19 2013  IPMI      Log       critical
       ID =  525 : 08/12/2013 : 10:32:19 : OEM sensor : TEMP_ATTN : State Deasserted
1437   Mon Aug 12 10:32:19 2013  IPMI      Log       critical
       ID =  524 : 08/12/2013 : 10:32:19 : OEM sensor : CHASSIS_STATUS : State Deasserted
1436   Mon Aug 12 10:32:09 2013  IPMI      Log       critical
       ID =  523 : 08/12/2013 : 10:32:09 : OEM sensor : TEMP_ATTN : State Asserted
1435   Mon Aug 12 10:32:09 2013  IPMI      Log       critical
       ID =  522 : 08/12/2013 : 10:32:09 : OEM sensor : CHASSIS_STATUS : State Asserted


Cause

In 36-port QDR InfiniBand switches running firmware earlier than 2.1.3-4, the temperature sensor used for the TEMP_ATTN alert is prone to high variability, so a momentary high reading can raise an inaccurate temperature alert.

This issue is tracked for Exadata in unpublished bug 17347360.

A similar issue on Exalogic is tracked in unpublished bug 14041434.

Solution

The TEMP_ATTN sensor alert may be ignored until your Oracle engineered system platform adopts InfiniBand switch firmware 2.1.3-4 or later. If any component temperature sensor is genuinely out of range, the switch raises a different temperature alert. Firmware 2.1.3-4 bases TEMP_ATTN on a different sensor, which more accurately detects temperatures high enough to cause a switch shutdown.
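Whether a TEMP_ATTN alert falls under this note depends on comparing the installed firmware version with 2.1.3-4. The following is a minimal Python sketch, assuming version strings of the form major.minor.patch-build with purely numeric fields, as in 1.3.3-2 and 2.1.3-4; parse_version and temp_attn_alert_may_be_spurious are hypothetical helper names, and reading the version string from the switch is left out.

```python
import re

# First firmware release that, per this note, bases TEMP_ATTN on a more accurate sensor.
FIXED_VERSION = "2.1.3-4"


def parse_version(version: str) -> tuple[int, ...]:
    """Split '1.3.3-2' into (1, 3, 3, 2); assumes every field is numeric."""
    return tuple(int(field) for field in re.split(r"[.-]", version.strip()))


def temp_attn_alert_may_be_spurious(installed: str) -> bool:
    """True if the installed firmware predates 2.1.3-4 (hypothetical helper)."""
    return parse_version(installed) < parse_version(FIXED_VERSION)


for v in ("1.3.3-2", "2.1.3-4", "2.1.3-10"):  # 2.1.3-10 is a made-up example
    print(v, temp_attn_alert_may_be_spurious(v))
```

Comparing integer tuples rather than raw strings matters for multi-digit fields: as strings, the hypothetical 2.1.3-10 would sort before 2.1.3-4, while the tuple comparison correctly treats it as newer.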

With switch firmware 1.3.3-2, use the back and front board temperature sensors as indicators of the switch ambient temperature. The acceptable ambient temperature range is 10 degrees C to 35 degrees C. Obtain the back and front board readings with the showtemps command:

# showtemps
Back temperature 18
Front temperature 18
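The ambient range check can be scripted against saved showtemps output. The following is a minimal Python sketch, assuming the back and front readings appear on lines formatted exactly like the sample above (any other lines are ignored) and are reported in degrees C; board_temps and out_of_range are hypothetical helper names.

```python
import re

AMBIENT_MIN_C = 10  # lower bound of the ambient range stated in this note
AMBIENT_MAX_C = 35  # upper bound of the ambient range stated in this note

# Matches lines such as "Back temperature 18" or "Front temperature 18".
TEMP_RE = re.compile(r"^(Back|Front) temperature (-?\d+(?:\.\d+)?)$")


def board_temps(showtemps_output: str) -> dict[str, float]:
    """Extract the back/front board readings (degrees C) from showtemps output."""
    temps: dict[str, float] = {}
    for line in showtemps_output.splitlines():
        m = TEMP_RE.match(line.strip())
        if m:
            temps[m.group(1)] = float(m.group(2))
    return temps


def out_of_range(temps: dict[str, float]) -> dict[str, float]:
    """Return only the readings outside the 10-35 degrees C ambient range."""
    return {
        name: t for name, t in temps.items()
        if not AMBIENT_MIN_C <= t <= AMBIENT_MAX_C
    }


sample = """Back temperature 18
Front temperature 18"""

temps = board_temps(sample)
print(temps, out_of_range(temps))
```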


References

<BUG:17347360> - MANAGEMENT CPU (SERVICE PROCESSOR) TEMPERATURE READOUT VARIABILITY
<BUG:14041434> - INFINIBAND SWITCH BLOWING HOT AIR IN RACK AND CAUSES TEMPERATURE ISSUE

  Copyright © 2018 Oracle, Inc.  All rights reserved.