Asset ID: 1-72-1944134.1
Update Date: 2018-03-05
Keywords:
Solution Type: Problem Resolution Sure
Solution 1944134.1: High fan speed in an M5000 server and no error status
Related Items
- Sun SPARC Enterprise M4000 Server
- Sun SPARC Enterprise M5000 Server
Related Categories
- PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Mx000
Created from <SR 3-9858886871>
Applies to:
Sun SPARC Enterprise M5000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M4000 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
Symptoms
All fans in the M5000 platform run at "High speed" or "Full speed", but no faulty component can be found.
Rebooting the XSCF unit or updating the XCP firmware does not change the behavior. The output of the
showenvironment command looks as follows:
XSCF> showenvironment Fan
FAN_A#0:High speed
FAN_A#0: 5502rpm
FAN_A#1:High speed
FAN_A#1: 5246rpm
FAN_A#2:High speed
FAN_A#2: 5640rpm
FAN_A#3:High speed
FAN_A#3: 5246rpm
PSU#0
PSU#0:Full speed
PSU#0: 8231rpm
PSU#0: 8035rpm
PSU#1
PSU#1:Full speed
PSU#1: 8035rpm
PSU#1: 8035rpm
PSU#2
PSU#2:High speed
PSU#2: 5192rpm
PSU#2: 5357rpm
PSU#3
PSU#3:High speed
PSU#3: 5192rpm
PSU#3: 5192rpm
The environmental values that influence fan speed under normal operation are also within the normal range,
so the fan speed is not the result of an abnormal inlet temperature or of the altitude setting.
Examples:
XSCF> showenvironment temp
Temperature:22.50C
MBU_B
CPUM#0-CHIP#0:33.00C
CPUM#0-CHIP#1:32.05C
CPUM#2-CHIP#0:39.17C
CPUM#2-CHIP#1:38.60C
CPUM#3-CHIP#0:37.60C
CPUM#3-CHIP#1:39.17C
IOU#0:27.00C
IOU#1:30.50C
XSCF> showaltitude
100m
Changes
This behavior was observed after a data center power outage in which the affected system lost power abruptly.
No relevant hardware events were found in FMA on the XSCF unit:
XSCF> fmdump
TIME UUID MSG-ID
[ there may be events without significance for the power outage ]
The system status also looks normal: 'showstatus' reports no faults:
XSCF> showstatus
No failures found in System Initialization.
Cause
The behavior was caused by an inconsistency between the temporary status of one or more fan components and
the status recorded in the XSCF database (BDB), a consequence of the unpredictable nature of a power outage.
Document 1019147.1 describes the fan failure behavior within the cooling groups. Based on it, the example
above points to an issue with PSU#0 or PSU#1, because their fans run at full speed
while the fans of all other cooling groups run at high speed.
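The reasoning above can be sketched as a small helper that reads saved "showenvironment Fan" output and lists the units worth testing first. This is only an illustration, not an Oracle tool: the function names are hypothetical, the sample text is abridged from the Symptoms section, and putting "Full speed" units ahead of "High speed" units is an assumption drawn from the paragraph above, not a documented rule.

```python
import re

# Matches status lines such as "PSU#0:Full speed"; rpm lines and bare
# unit headers ("PSU#0") do not match and are skipped.
STATE_LINE = re.compile(r"^([A-Z_]+#\d+):\s*(\w+ speed)$")

def fan_states(output):
    """Map each unit name to its reported speed state."""
    states = {}
    for line in output.splitlines():
        match = STATE_LINE.match(line.strip())
        if match:
            states[match.group(1)] = match.group(2)
    return states

def suggest_test_order(states):
    """Units at Full speed first, then High speed (assumed priority)."""
    rank = {"Full speed": 0, "High speed": 1}
    flagged = [unit for unit in states if states[unit] in rank]
    # sorted() is stable, so listing order is kept within each group.
    return sorted(flagged, key=lambda unit: rank[states[unit]])

# Abridged from the showenvironment output in the Symptoms section.
SAMPLE = """\
FAN_A#0:High speed
FAN_A#0: 5502rpm
FAN_A#1:High speed
PSU#0
PSU#0:Full speed
PSU#0: 8231rpm
PSU#1
PSU#1:Full speed
PSU#2
PSU#2:High speed
"""

if __name__ == "__main__":
    print(suggest_test_order(fan_states(SAMPLE)))
```

For the sample, PSU#0 and PSU#1 come out ahead of the fan trays, which matches the expectation stated above. The order is only a starting point, since any unit may turn out to be the one that clears the condition.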
Solution
In the described scenario it is recommended to verify the overall hardware status of
all fans in the power supplies and fan trays by running a hardware test, which resolves
the inconsistency for these components.
For the power supplies and fans this can be done by issuing the "replacefru" command
without physically pulling any component.
Repeat this for each component, one at a time, until the fan speed changes and the issue is resolved.
This may happen after the first test or only after the last one; it is not predictable.
The following is a complete lab example for the first fan, FAN_A#0, in an M4000:
XSCF> replacefru
----------------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
1. FAN (Fan Unit)
2. PSU (Power Supply Unit)
----------------------------------------------------------------------
Select [1,2|c:cancel] :1
----------------------------------------------------------------------
Maintenance/Replacement Menu
Please select a FAN to be replaced.
No. FRU Status
--- --------------- ------------------
1. FAN_A#0 Normal
2. FAN_A#1 Normal
3. FAN_B#0 Normal
4. FAN_B#1 Normal
----------------------------------------------------------------------
Select [1-4|b:back] :1
You are about to replace FAN_A#0.
Do you want to continue?[r:replace|c:cancel] :r
Please confirm the Check LED is blinking.
If this is the case, please replace FAN_A#0.
After replacement has been completed, please select[f:finish] :f
Diagnostic tests for FAN_A#0 have started.
[This operation may take up to 3 minute(s)]
(progress scale reported in seconds)
0..... 30..done
----------------------------------------------------------------------
Maintenance/Replacement Menu
Status of the replaced FRU.
FRU Status
------------- --------
FAN_A#0 Normal
----------------------------------------------------------------------
The replacement of FAN_A#0 has completed normally.[f:finish] :f
----------------------------------------------------------------------
Maintenance/Replacement Menu
Please select a type of FRU to be replaced.
1. FAN (Fan Unit)
2. PSU (Power Supply Unit)
----------------------------------------------------------------------
Select [1,2|c:cancel] :
The Check LED may be lit for a short period of time while replacefru runs the
hardware test of the chosen component. If the fan speed returns to normal afterwards,
it can be assumed that the problem is resolved and no further components need to be tested.
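To decide whether another replacefru pass is needed, the "showenvironment Fan" output captured after each test can be checked for units that still report an elevated state. The sketch below is a hypothetical helper, not part of XSCF. It treats only "High speed" and "Full speed" as elevated, as in the Symptoms section. The "Low speed" label in the second sample is an assumption about what a healthy system prints.

```python
import re

ELEVATED = ("High speed", "Full speed")

def still_elevated(output):
    """Return the units that still report High speed or Full speed."""
    units = []
    for line in output.splitlines():
        # Status lines look like "FAN_A#0:High speed"; rpm lines are skipped.
        match = re.match(r"^([A-Z_]+#\d+):\s*(\w+ speed)$", line.strip())
        if match and match.group(2) in ELEVATED:
            units.append(match.group(1))
    return units

# Abridged output before the test, taken from the Symptoms section.
BEFORE = "FAN_A#0:High speed\nFAN_A#0: 5502rpm\nPSU#0:Full speed\n"

# Hypothetical output after a successful test; "Low speed" is assumed.
AFTER = "FAN_A#0:Low speed\nPSU#0:Low speed\n"

if __name__ == "__main__":
    for label, text in (("before", BEFORE), ("after", AFTER)):
        remaining = still_elevated(text)
        print(label, "-> test next component" if remaining else "-> resolved", remaining)
```

An empty list means no unit is at high or full speed any more, so the remaining components do not need to be tested.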
References
<NOTE:1019147.1> - Sun SPARC Enterprise(R) M3000/M4000/M5000/M8000/M9000 Servers: Fan/fantray temperature and Over-temperature failure behavior
Attachments
This solution has no attachment