Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1950178.1
Update Date: 2017-05-01
Keywords:

Solution Type  Problem Resolution Sure

Solution  1950178.1 :   T5120/T5220/T5140/T5240 and Netra Systems Shut Down Due to Sensor Errors on /T_TCORE and /T_BCORE


Related Items
  • Sun Netra T5440 Server
  • Sun SPARC Enterprise T5240 Server
  • Sun SPARC Enterprise T5220 Server
  • Sun Netra T5220 Server
  • Sun SPARC Enterprise T5120 Server
  • Sun SPARC Enterprise T5140 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T5xx0




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-9919090501>

Applies to:

Sun Netra T5440 Server - Version Not Applicable to Not Applicable [Release NA]
Sun SPARC Enterprise T5240 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise T5220 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise T5140 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise T5120 Server - Version Not Applicable to Not Applicable [Release NA]
Information in this document applies to any platform.

Symptoms

The system suffers an outage and shuts down. On the service processor (SP), the logs may show one or more of the following problems.


Sensor fluctuations on T_TCORE and T_BCORE (taken from -> show /SP/logs/event/list)

857    Tue Nov 25 08:51:49 2014  IPMI      Log       major          ID =   72 : 11/25/2014 : 08:51:49 : Temperature : /MB/CMP1/T_TCORE : Upper Critical going high : reading 97 >= threshold 96 degrees C
856    Tue Nov 25 08:51:47 2014  IPMI      Log       major          ID =   71 : 11/25/2014 : 08:51:47 : Temperature : /MB/CMP1/T_BCORE : Upper Critical going high : reading 96 >= threshold 96 degrees C
855    Tue Nov 25 08:51:45 2014  IPMI      Log       major          ID =   70 : 11/25/2014 : 08:51:45 : Temperature : /MB/CMP0/T_TCORE : Upper Critical going high : reading 97 >= threshold 96 degrees C
854    Tue Nov 25 08:51:44 2014  IPMI      Log       major          ID =   6f : 11/25/2014 : 08:51:44 : Temperature : /MB/CMP0/T_BCORE : Upper Critical going high : reading 98 >= threshold 96 degrees C
853    Tue Nov 25 08:51:30 2014  IPMI      Log       minor          ID =   6e : 11/25/2014 : 08:51:26 : Temperature : /MB/CMP1/T_TCORE : Upper Non-critical going high : reading 86 >= threshold 86 degrees C
852    Tue Nov 25 08:51:30 2014  IPMI      Log       minor          ID =   6d : 11/25/2014 : 08:51:24 : Temperature : /MB/CMP1/T_BCORE : Upper Non-critical going high : reading 86 >= threshold 86 degrees C
851    Tue Nov 25 08:51:30 2014  IPMI      Log       minor          ID =   6c : 11/25/2014 : 08:51:22 : Temperature : /MB/CMP0/T_TCORE : Upper Non-critical going high : reading 87 >= threshold 86 degrees C
850    Tue Nov 25 08:51:20 2014  IPMI      Log       minor          ID =   6b : 11/25/2014 : 08:51:20 : Temperature : /MB/CMP0/T_BCORE : Upper Non-critical going high : reading 87 >= threshold 86 degrees C

Note: These sensor events are what cause the outages.
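
When a long event log has to be checked for this signature, the Upper Critical temperature entries can be picked out with a small script. The following is only a sketch, assuming the log has been saved as plain text in the format shown above; the function name find_critical_events is hypothetical and is not part of ILOM. The comparison operator between the reading and the threshold is matched loosely because it does not always survive copy and paste.

```python
import re

# Matches the temperature portion of an event-log line such as:
# "... : Temperature : /MB/CMP1/T_TCORE : Upper Critical going high : reading 97 >= threshold 96 degrees C"
EVENT_RE = re.compile(
    r"Temperature : (?P<sensor>/\S+) : (?P<level>Upper [\w-]+) going high"
    r" : reading (?P<reading>\d+) \S+ threshold (?P<threshold>\d+) degrees C"
)


def find_critical_events(log_text):
    """Return (sensor, reading, threshold) for each Upper Critical event.

    Hypothetical helper: Upper Non-critical entries are skipped.
    """
    hits = []
    for line in log_text.splitlines():
        match = EVENT_RE.search(line)
        if match and match.group("level") == "Upper Critical":
            hits.append(
                (
                    match.group("sensor"),
                    int(match.group("reading")),
                    int(match.group("threshold")),
                )
            )
    return hits
```

Running it over the excerpt above would be expected to return one tuple per "major" line and none for the "minor" lines.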

 

 CPU Throttling (taken from -> show /SP/logs/event/list)

861    Tue Nov 25 08:52:05 2014  Chassis   Log       major          CPU1 too hot: throttling to 1/8
860    Tue Nov 25 08:52:05 2014  Chassis   Log       major          CPU0 too hot: throttling to 1/8
859    Tue Nov 25 08:51:54 2014  Chassis   Log       major          CPU1 too hot: throttling to 1/2
858    Tue Nov 25 08:51:54 2014  Chassis   Log       major          CPU0 too hot: throttling to 1/2
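
To summarise how far each CPU was throttled, the Chassis entries above can be reduced to the lowest fraction seen per CPU. This is a minimal sketch that assumes the message text is exactly "CPUn too hot: throttling to a/b" as in the excerpt; worst_throttle is a hypothetical name.

```python
import re
from fractions import Fraction

# Matches e.g. "CPU1 too hot: throttling to 1/8"
THROTTLE_RE = re.compile(r"(CPU\d+) too hot: throttling to (\d+)/(\d+)")


def worst_throttle(log_text):
    """Map each CPU name to the lowest throttle fraction found in the log."""
    worst = {}
    for cpu, num, den in THROTTLE_RE.findall(log_text):
        level = Fraction(int(num), int(den))
        # Keep the smallest fraction, i.e. the most severe throttling.
        if cpu not in worst or level < worst[cpu]:
            worst[cpu] = level
    return worst
```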

 

Some FRUs may also fail to be identified (taken from /ipmi/@usr@local@bin@ipmiint_fru_print.out)

FRU Device Description : /SASBP (LUN 0, ID 88)
Device not present (Requested sensor, data, or record not found)

FRU Device Description : /PDB (LUN 0, ID 26)
Device not present (Requested sensor, data, or record not found)

FRU Device Description : /PS0 (LUN 0, ID 22)
Device not present (Requested sensor, data, or record not found)

FRU Device Description : /PS1 (LUN 0, ID 42)
Device not present (Requested sensor, data, or record not found)

FRU Device Description : /FANBD0 (LUN 0, ID 78)
Device not present (Requested sensor, data, or record not found)

FRU Device Description : /FANBD1 (LUN 0, ID 52)
Device not present (Requested sensor, data, or record not found)
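
The FRUs reported as not present can be listed from a saved copy of that file. The sketch below assumes the two-line pattern shown above (a "FRU Device Description" line followed by a "Device not present" line); missing_frus is a hypothetical helper name.

```python
def missing_frus(fru_text):
    """Return the names of FRUs whose entry reports 'Device not present'."""
    missing = []
    current = None
    for raw in fru_text.splitlines():
        line = raw.strip()
        if line.startswith("FRU Device Description :"):
            # e.g. "FRU Device Description : /PS0 (LUN 0, ID 22)" -> "/PS0"
            current = line.split(":", 1)[1].split("(")[0].strip()
        elif line.startswith("Device not present") and current is not None:
            missing.append(current)
            current = None
    return missing
```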

 

Some sensors may also report na instead of a numeric reading (taken from /ipmi/@usr@local@bin@ipmiint_sensor_list.out in the snapshot)

/MB/CMP0/T_TCORE | na         | degrees C  | na    | -14.000   | -9.000    | -4.000    | 86.000    | 96.000    | 106.000  
/MB/CMP0/T_BCORE | na         | degrees C  | na    | -14.000   | -9.000    | -4.000    | 86.000    | 96.000    | 106.000  
/MB/CMP1/T_TCORE | na         | degrees C  | na    | -14.000   | -9.000    | -4.000    | 86.000    | 96.000    | 106.000  
/MB/CMP1/T_BCORE | na         | degrees C  | na    | -14.000   | -9.000    | -4.000    | 86.000    | 96.000    | 106.000  

/FB0/FM0/F0/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB0/FM0/F1/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB0/FM1/F0/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB0/FM1/F1/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB0/FM2/F0/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB0/FM2/F1/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM0/F0/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM0/F1/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM1/F0/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM1/F1/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM2/F0/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM2/F1/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        

For comparison, this is how the same sensors look on a healthy system:

/MB/CMP0/T_TCORE | 40.000     | degrees C  | ok    | -14.000   | -9.000    | -4.000    | 86.000    | 97.000    | 106.000  
/MB/CMP0/T_BCORE | 40.000     | degrees C  | ok    | -14.000   | -9.000    | -4.000    | 86.000    | 97.000    | 106.000  
/MB/CMP1/T_TCORE | 41.000     | degrees C  | ok    | -14.000   | -9.000    | -4.000    | 86.000    | 97.000    | 106.000  
/MB/CMP1/T_BCORE | 41.000     | degrees C  | ok    | -14.000   | -9.000    | -4.000    | 86.000    | 97.000    | 106.000  

/FB0/FM0/F0/TACH | 8700.000   | RPM        | ok    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB0/FM0/F1/TACH | 8500.000   | RPM        | ok    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB0/FM1/F0/TACH | 8700.000   | RPM        | ok    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB0/FM1/F1/TACH | 8600.000   | RPM        | ok    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB0/FM2/F0/TACH | 8800.000   | RPM        | ok    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB0/FM2/F1/TACH | 9200.000   | RPM        | ok    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM0/F0/TACH | 8800.000   | RPM        | ok    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM0/F1/TACH | 8700.000   | RPM        | ok    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM1/F0/TACH | 9200.000   | RPM        | ok    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM1/F1/TACH | 9000.000   | RPM        | ok    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM2/F0/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na        
/FB1/FM2/F1/TACH | na         | RPM        | na    | 2400.000  | na        | 4000.000  | na        | na        | na    
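
The two listings above differ in the reading and status columns, so a saved sensor list can be screened for sensors that report na in both. This is a sketch only, assuming the pipe-separated layout shown above with the name, reading, unit and status in the first four columns; unreadable_sensors is a hypothetical name. Note that the healthy capture also reports na for the two /FB1/FM2 fans, so a match is a prompt for a closer look rather than proof of a fault.

```python
def unreadable_sensors(sensor_text):
    """Return names of sensors whose reading and status are both 'na'."""
    names = []
    for line in sensor_text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # blank line or not a sensor row
        name, reading, _unit, status = fields[:4]
        if reading == "na" and status == "na":
            names.append(name)
    return names
```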


 

Cause

Some of these issues are caused by incorrectly set sensor tuning parameters, which a firmware update corrects.

If a firmware update does not fix the issue, the cause may be a different problem.

Solution

Start by updating the system firmware if it is below 7.4.x.
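
The "below 7.4.x" check can be expressed as a comparison on the major and minor numbers. The sketch below assumes the firmware version is available as a dotted string such as "7.3.0"; how that string is obtained is outside this example, and needs_firmware_update is a hypothetical name.

```python
def needs_firmware_update(version, minimum=(7, 4)):
    """Return True if the major.minor part of `version` is below `minimum`."""
    parts = version.strip().split(".")
    # Compare only major and minor, since the threshold is given as 7.4.x.
    major_minor = tuple(int(part) for part in parts[:2])
    return major_minor < minimum
```

For example, needs_firmware_update("7.3.0") is True, while needs_firmware_update("7.4.7") is False.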

If that doesn't resolve the issue (or your firmware is already up to date), open a service request with Oracle Support. Once the service request is open, an ILOM snapshot is required to investigate the issue.

References

<BUG:15562398> - SUNBT6840250-TRUNK HURON CPU CORE OVERTEMP, "SC ALERT FOR UPPER CRITICAL GOING H
<BUG:15677825> - SUNBT6995542 CPU0 TOO HOT: THROTTLING TO 1/2 SEEN ON T5140

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.