Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1588791.1 : 36p QDR InfiniBand switch running version 1.3.3-2 inaccurately reports temperature alert
Applies to:
Sun Datacenter InfiniBand Switch 36 - Version Not Applicable to Not Applicable [Release N/A]
Oracle Exadata Hardware - Version 11.2.0.2 and later
Exalogic Elastic Cloud X3-2 Hardware
Oracle SuperCluster T5-8 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X3-2 Half Rack
Information in this document applies to any platform.

Symptoms
36-port QDR InfiniBand switches running a software version earlier than 2.1.3-4 in Oracle engineered systems, including Exadata, Exalogic, and SuperCluster, may inaccurately report a temperature alert. The temperature alert may be seen in one of the following locations:

Example Oracle Enterprise Manager alert
Target type=Oracle Infiniband Switch
Target name=dm01sw-ibb0
Categories=Fault
Message=The aggregate sensor /SYS/TEMP_ATTN has a fault.
Severity=Critical

Example /var/log/messages entry
Aug 12 10:32:06 dm01sw-ibb0 envd[1559]: SP temperature 61 too high
Aug 12 10:32:09 dm01sw-ibb0 PEF: Sensor 'CHASSIS_STATUS' Type 0xC0 Data1 0x01 Entity 7 Instance 0
Aug 12 10:32:09 dm01sw-ibb0 PEF: Sensor 'TEMP_ATTN' Type 0xC0 Data1 0x01 Entity 7 Instance 0
Aug 12 10:32:12 dm01sw-ibb0 envd[1559]: SP temperature 60
Aug 12 10:32:19 dm01sw-ibb0 PEF: Sensor 'CHASSIS_STATUS' Type 0xC0 Data1 0x00 Entity 7 Instance 0
Aug 12 10:32:19 dm01sw-ibb0 PEF: Sensor 'TEMP_ATTN' Type 0xC0 Data1 0x00 Entity 7 Instance 0

Example ILOM event log entry
1438 Mon Aug 12 10:32:19 2013 IPMI Log critical
     ID = 525 : 08/12/2013 : 10:32:19 : OEM sensor : TEMP_ATTN : State Deasserted
1437 Mon Aug 12 10:32:19 2013 IPMI Log critical
     ID = 524 : 08/12/2013 : 10:32:19 : OEM sensor : CHASSIS_STATUS : State Deasserted
1436 Mon Aug 12 10:32:09 2013 IPMI Log critical
     ID = 523 : 08/12/2013 : 10:32:09 : OEM sensor : TEMP_ATTN : State Asserted
1435 Mon Aug 12 10:32:09 2013 IPMI Log critical
     ID = 522 : 08/12/2013 : 10:32:09 : OEM sensor : CHASSIS_STATUS : State Asserted
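
The pattern in the examples above, a TEMP_ATTN assertion that clears again within seconds, can be picked out of /var/log/messages mechanically. The following is a minimal sketch, not an Oracle tool. The function name `find_temp_attn_events` is hypothetical, and the reading of `Data1 0x01` as "asserted" and `Data1 0x00` as "deasserted" is an assumption inferred from matching the timestamps of the example log lines against the ILOM event log entries. Syslog lines carry no year, so the caller supplies one.

```python
import re
from datetime import datetime

# Matches PEF lines for the TEMP_ATTN sensor, in the format of the example above:
# Aug 12 10:32:09 dm01sw-ibb0 PEF: Sensor 'TEMP_ATTN' Type 0xC0 Data1 0x01 Entity 7 Instance 0
PEF_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ PEF: "
    r"Sensor 'TEMP_ATTN' .*Data1 (?P<data1>0x[0-9A-Fa-f]{2})"
)


def find_temp_attn_events(lines, year=2013):
    """Return (asserted_at, cleared_at, seconds) for each assert/clear pair."""
    events = []
    asserted_at = None
    for line in lines:
        match = PEF_LINE.match(line)
        if not match:
            continue  # not a TEMP_ATTN PEF line
        ts = datetime.strptime(f"{year} {match.group('ts')}", "%Y %b %d %H:%M:%S")
        data1 = match.group("data1").lower()
        if data1 == "0x01":  # assumed meaning: alert asserted
            asserted_at = ts
        elif data1 == "0x00" and asserted_at is not None:  # assumed: alert cleared
            events.append((asserted_at, ts, (ts - asserted_at).total_seconds()))
            asserted_at = None
    return events


# The /var/log/messages lines quoted in this document.
SAMPLE = [
    "Aug 12 10:32:06 dm01sw-ibb0 envd[1559]: SP temperature 61 too high",
    "Aug 12 10:32:09 dm01sw-ibb0 PEF: Sensor 'CHASSIS_STATUS' Type 0xC0 Data1 0x01 Entity 7 Instance 0",
    "Aug 12 10:32:09 dm01sw-ibb0 PEF: Sensor 'TEMP_ATTN' Type 0xC0 Data1 0x01 Entity 7 Instance 0",
    "Aug 12 10:32:12 dm01sw-ibb0 envd[1559]: SP temperature 60",
    "Aug 12 10:32:19 dm01sw-ibb0 PEF: Sensor 'CHASSIS_STATUS' Type 0xC0 Data1 0x00 Entity 7 Instance 0",
    "Aug 12 10:32:19 dm01sw-ibb0 PEF: Sensor 'TEMP_ATTN' Type 0xC0 Data1 0x00 Entity 7 Instance 0",
]

for start, end, seconds in find_temp_attn_events(SAMPLE):
    print(f"TEMP_ATTN asserted {start:%H:%M:%S}, cleared {end:%H:%M:%S} ({seconds:.0f} s)")
```

Run against the sample lines, the sketch finds a single event lasting 10 seconds, which matches the Asserted and Deasserted timestamps in the ILOM event log example.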

Cause
A temperature sensor in 36-port QDR InfiniBand switches running a firmware version earlier than 2.1.3-4 is prone to high variability, which can lead to an inaccurate temperature alert. This is tracked in unpublished bug 17347360 for Exadata. Unpublished bug 14041434 covers a similar issue on Exalogic.

Solution
The TEMP_ATTN sensor alert may be ignored until InfiniBand switch firmware 2.1.3-4 (or later) is adopted by your Oracle engineered system platform. The switch will raise a different temperature alert if any component temperature sensor is genuinely out of range. InfiniBand switch firmware 2.1.3-4 uses a different sensor for TEMP_ATTN to more accurately detect the high temperatures that will cause switch shutdown.

With switch firmware version 1.3.3-2, the back and front board temperature sensors should be used as indicators of switch ambient temperature. The proper ambient temperature range is 10 degrees C to 35 degrees C. Back and front board temperature sensor readings can be obtained with the showtemps command.

# showtemps
Back temperature 18
Front temperature 18
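
The ambient check described above can be scripted. The sketch below assumes that the showtemps output contains readings of the form shown above (`Back temperature 18`, `Front temperature 18`); any other text in the output is ignored. The function names are hypothetical, and treating firmware versions as integers separated by dots and a hyphen is an illustrative assumption, not part of any Oracle tooling.

```python
import re

AMBIENT_MIN_C = 10           # lower bound of the ambient range stated in this document
AMBIENT_MAX_C = 35           # upper bound of the ambient range stated in this document
FIXED_FIRMWARE = "2.1.3-4"   # firmware that uses a different sensor for TEMP_ATTN


def version_key(version):
    """Turn a version such as '1.3.3-2' into a comparable tuple (1, 3, 3, 2)."""
    return tuple(int(part) for part in re.split(r"[.-]", version))


def is_affected(version):
    """True if the firmware version is earlier than 2.1.3-4."""
    return version_key(version) < version_key(FIXED_FIRMWARE)


def parse_showtemps(output):
    """Extract the Back and Front board readings from showtemps output text."""
    readings = {}
    for name, value in re.findall(r"\b(Back|Front) temperature (-?\d+)", output):
        readings[name] = int(value)
    return readings


def out_of_range(readings):
    """Return only the readings outside the 10 to 35 degrees C ambient range."""
    return {name: temp for name, temp in readings.items()
            if not AMBIENT_MIN_C <= temp <= AMBIENT_MAX_C}


readings = parse_showtemps("Back temperature 18\nFront temperature 18\n")
print(readings, out_of_range(readings), is_affected("1.3.3-2"))
```

For the readings shown above (18 and 18), `out_of_range` returns an empty dictionary, and `is_affected("1.3.3-2")` is true, so on that switch the board sensors rather than TEMP_ATTN are the ones to watch.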

References
<BUG:17347360> - MANAGEMENT CPU (SERVICE PROCESSOR) TEMPERATURE READOUT VARIABILITY
<BUG:14041434> - INFINIBAND SWITCH BLOWING HOT AIR IN RACK AND CAUSES TEMPERATURE ISSUE

Attachments
This solution has no attachment