Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1596395.1
Update Date:2017-12-12
Keywords:

Solution Type  Problem Resolution Sure

Solution  1596395.1 :   Sun Fire[TM] V440 Server Power Supply failure and "TEMP_SENSOR @ ... T_CORE has exceeded high warning threshold." warning messages, possibly followed by system outage  


Related Items
  • Sun Fire V440 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>Workgroup Servers>SN-SPARC: SF-V4x0




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-7992012811>

Applies to:

Sun Fire V440 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

A power supply fails.

"TEMP_SENSOR" high temperature warning messages are logged in the ALOM event log.

The system controller (SC) may automatically power off the host to prevent overheating from damaging internal components.

The Standby Power LED is lit amber.

ALOM reports error messages similar to the following:


If PS0 has failed:

OCT 23 17:19:56 server_name: 0004004f: "Indicator PS0.SERVICE is now ON"
OCT 23 17:19:59 server_name: 00040066: "PSU @ PS0 has FAILED."
OCT 23 17:19:59 server_name: 0004004f: "Indicator SYS.SERVICE is now ON"
OCT 23 17:21:59 server_name: 00040002: "Host System has Reset"
OCT 23 17:26:17 server_name: 00040002: "Host System has Reset"
OCT 23 17:26:52 server_name: 0004006b: "TEMP_SENSOR @ C0.P0.T_CORE has exceeded high warning threshold."
OCT 23 17:28:00 server_name: 0004006f: "SC initiating soft host system shutdown due to fault at C0.P0.T_CORE."
OCT 23 17:28:01 server_name: 00040000: "SC Request to Power Off Host."
OCT 23 17:28:02 server_name: 0004006c: "TEMP_SENSOR @ C0.P0.T_CORE has exceeded high soft shutdown threshold."
OCT 23 17:28:25 server_name: 0004000b: "Host System has read and cleared bootmode."
OCT 23 17:29:11 server_name: 00040029: "Host system has shut down."
OCT 23 17:29:36 server_name: 00040065: "PSU @ PS0 is OK."
OCT 23 17:29:36 server_name: 0004004f: "Indicator PS0.SERVICE is now OFF"
OCT 23 17:29:36 server_name: 0004004f: "Indicator PS1.POK is now OFF"

If PS1 has failed:

OCT 23 11:32:05 server_name: 0004004f: "Indicator PS1.SERVICE is now ON"
OCT 23 11:32:08 server_name: 00040066: "PSU @ PS1 has FAILED."
OCT 23 11:32:08 server_name: 0004004f: "Indicator SYS.SERVICE is now ON"
OCT 23 11:33:43 server_name: 00040002: "Host System has Reset"
OCT 23 11:36:32 server_name: 0004006b: "TEMP_SENSOR @ C3.P0.T_CORE has exceeded high warning threshold."
OCT 23 11:37:12 server_name: 0004006f: "SC initiating soft host system shutdown due to fault at C3.P0.T_CORE."
OCT 23 11:37:13 server_name: 00040000: "SC Request to Power Off Host."
OCT 23 11:37:14 server_name: 0004006c: "TEMP_SENSOR @ C3.P0.T_CORE has exceeded high soft shutdown threshold."
OCT 23 11:38:24 server_name: 00040029: "Host system has shut down."
OCT 23 11:38:48 server_name: 0004004f: "Indicator PS0.POK is now OFF"
OCT 23 11:38:48 server_name: 0004004f: "Indicator PS1.SERVICE is now OFF"
OCT 23 11:38:49 server_name: 00040065: "PSU @ PS1 is OK." 

Note that a PS0 fault may eventually affect the temperature of CPU0 (sensor C0.P0.T_CORE), and a PS1 fault may eventually affect the temperature of CPU3 (sensor C3.P0.T_CORE).
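When reviewing a long ALOM log, the PSU-to-CPU correlation described above can be checked mechanically. The following Python sketch is not an Oracle tool: the function name `correlate` is hypothetical, the regular expressions assume the message text appears exactly as quoted in the examples above, and the PS0 to C0 / PS1 to C3 mapping is taken only from the two cases in this document.

```python
import re

# Message patterns copied from the ALOM examples in this note (assumed verbatim).
PSU_FAILED = re.compile(r'"PSU @ (PS\d) has FAILED\."')
TEMP_WARN = re.compile(
    r'"TEMP_SENSOR @ (C\d)\.P0\.T_CORE has exceeded high warning threshold\."'
)

# Correlation observed in the two cases of this note: PS0 -> C0, PS1 -> C3.
EXPECTED_CPU = {"PS0": "C0", "PS1": "C3"}


def correlate(log_lines):
    """Hypothetical helper: return (failed_psus, hot_cpus, consistent).

    consistent is True only when at least one PSU failed, at least one CPU
    crossed the warning threshold, and every hot CPU is the one expected
    for a failed PSU.
    """
    failed, hot = [], []
    for line in log_lines:
        m = PSU_FAILED.search(line)
        if m and m.group(1) not in failed:
            failed.append(m.group(1))
        m = TEMP_WARN.search(line)
        if m and m.group(1) not in hot:
            hot.append(m.group(1))
    expected = {EXPECTED_CPU.get(ps) for ps in failed}
    consistent = bool(failed) and bool(hot) and all(cpu in expected for cpu in hot)
    return failed, hot, consistent


if __name__ == "__main__":
    log = [
        'OCT 23 11:32:08 server_name: 00040066: "PSU @ PS1 has FAILED."',
        'OCT 23 11:36:32 server_name: 0004006b: "TEMP_SENSOR @ C3.P0.T_CORE '
        'has exceeded high warning threshold."',
    ]
    print(correlate(log))  # (['PS1'], ['C3'], True)
```

A result of `consistent == False` (for example, a hot CPU with no matching PSU failure) would point toward the additional troubleshooting described in the Solution section rather than toward this PSU scenario.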

Cause

A failed PSU may degrade CPU cooling. If the readings from the CPU core (T_CORE) temperature sensors exceed the high warning threshold, the SC logs a warning; if the readings keep rising, the SC initiates a soft shutdown of the host.

Note that in this case the high CPU temperature readings do not indicate a CPU module failure; they are a side effect of degraded cooling capability.
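To see how quickly the situation escalates after the PSU fails, the milestones in the ALOM logs from the Symptoms section can be timed. A minimal sketch, assuming every log line begins with a `MON DD HH:MM:SS` timestamp as in those examples and that all events fall on the same day; the milestone names and the function `escalation` are hypothetical.

```python
from datetime import datetime

# Milestone substrings copied from the example ALOM messages (assumed verbatim).
MILESTONES = [
    ("psu_failed", "has FAILED."),
    ("temp_warning", "has exceeded high warning threshold."),
    ("shutdown_requested", "SC initiating soft host system shutdown"),
    ("host_down", "Host system has shut down."),
]


def escalation(log_lines):
    """Hypothetical helper: seconds from the PSU failure to each milestone.

    Only the first occurrence of each milestone is used. Returns an empty
    dict if no PSU failure is found. Assumes all events are on one day.
    """
    first = {}
    for line in log_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or truncated lines
        clock = datetime.strptime(parts[2], "%H:%M:%S")
        for name, text in MILESTONES:
            if text in line and name not in first:
                first[name] = clock
    if "psu_failed" not in first:
        return {}
    start = first["psu_failed"]
    return {name: int((t - start).total_seconds()) for name, t in first.items()}
```

For the PS0 log above, this reports 413 seconds to the high warning threshold, 481 seconds to the SC shutdown request, and 552 seconds until the host was down.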
 

Solution

Open a Service Request in My Oracle Support and request a diagnosis so that a service repair can be arranged.

Technical Support Engineer:

This document was created from a real case. We verified that the temperature issue migrated from C0 to C3 after the power supplies were swapped.

This scenario indicates that replacing the faulty Power Supply should also eliminate the high CPU temperature symptom. If the CPU temperature problem persists after the faulty Power Supply is replaced, additional troubleshooting is required: check environmental conditions, other component failures, etc., as described in:

"Sun Fire V440 Server Diagnostics and Troubleshooting Guide" -> http://docs.oracle.com/cd/E19088-01/v440.srvr/816-7730-10/816-7730-10.pdf

 

Some early power supplies were subject to early-life failure of their fan. Delta removed Minebea as a fan supplier from the bill of materials (BOM) in early 2006, and this failure has not been seen with the newer Nidec fans. The Minebea fan does not have a shielded bearing to keep dust out of the bearings.

Parts Affected:
The affected systems have one of the following Delta power supplies (Model: DPS-680CB A) fitted with a Minebea fan (Fan P/N 3620907109) and manufactured before November 1, 2005, i.e., with a serial number earlier than 0017527-0545xxxxxx:

300-1501-07
300-1501-08
300-1501-09
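A quick screen of an inventory list against the part numbers and serial-number cutoff above might look like the sketch below. It assumes, as an unverified simplification, that serial numbers in this range share the fixed-width `0017527-NNNN...` layout, so that comparing the first 12 characters as strings orders them correctly. `possibly_affected` is a hypothetical helper; a positive result only flags a unit for review, because the Minebea fan itself is what makes a unit affected.

```python
# Part numbers and S/N cutoff copied from the "Parts Affected" list above.
AFFECTED_PARTS = {"300-1501-07", "300-1501-08", "300-1501-09"}
SN_CUTOFF = "0017527-0545"  # affected units have S/N earlier than 0017527-0545xxxxxx


def possibly_affected(part_number: str, serial: str) -> bool:
    """Hypothetical screen: listed part number AND serial sorting before the cutoff.

    Assumption: serials use the same fixed-width layout as the cutoff, so a
    plain string comparison of the leading characters gives the right order.
    """
    if part_number.strip() not in AFFECTED_PARTS:
        return False
    return serial.strip()[: len(SN_CUTOFF)] < SN_CUTOFF


if __name__ == "__main__":
    print(possibly_affected("300-1501-08", "0017527-0531000123"))  # True
    print(possibly_affected("300-1501-08", "0017527-0602000001"))  # False
```

String comparison is used here instead of parsing the digits so that no meaning is assumed for the serial-number fields beyond their ordering.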

References

<NOTE:1271047.1> - Sun Fire V440 Power Supply Installation and Removal:ATR:782:1 [Video]

  Copyright © 2018 Oracle, Inc.  All rights reserved.