Asset ID: 1-71-1019147.1
Update Date: 2017-05-01
Keywords:
Solution Type: Technical Instruction Sure Solution
1019147.1: Sun SPARC Enterprise(R) M3000/M4000/M5000/M8000/M9000 Servers: Fan/fantray temperature and Over-temperature failure behavior
Related Items
- Sun SPARC Enterprise M4000 Server
- Sun SPARC Enterprise M9000-32 Server
- Sun SPARC Enterprise M9000-64 Server
- Sun SPARC Enterprise M5000 Server
- Sun SPARC Enterprise M8000 Server
- Sun SPARC Enterprise M3000 Server
Related Categories
- PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Mx000
- _Old GCS Categories>Sun Microsystems>Servers>OPL Servers
PreviouslyPublishedAs
235941
Sun SPARC Enterprise(R) M3000/M4000/M5000/M8000/M9000 Servers: Fan/fantray temperature and failure behavior
Applies to:
Sun SPARC Enterprise M3000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M9000-64 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M4000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M5000 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M8000 Server - Version All Versions to All Versions [Release All Releases]
All Platforms
Sun SPARC Enterprise(R) Mx000 Servers: Fan/fantray temperature and failure behavior
Goal
This document describes fan and fan tray redundancy, how the systems behave when a fan or fan tray fails, and how fan speed is controlled according to the inlet temperature.
Solution
This section describes the environment and failure behavior.
The showenvironment command is used to display various temperatures and fan speeds.
XSCF> showenvironment
XSCF> showenvironment temp
XSCF> showenvironment Fan
The showaltitude command is used to display the altitude setting.
XSCF> showaltitude
Note: In the following tables, a threshold temperature is a temperature at which an action is taken. In this context, the sign ">" means greater than or equal to, and the sign "<" means less than or equal to.
M3000
M3000 systems have a single inlet temperature sensor on the Operator Panel (OPNL).
M3000 systems have an exhaust temperature sensor on the Motherboard (MBU). The CPU chip has a temperature sensor.
Fans on M3000 systems have 10 speed levels: level 1 through level 10.
Fan failure behavior
- If a FAN_A fails and the other FAN_A in the cooling group is operational, the speed of that other fan is raised to full speed (level 10) and the speed of all other fans on the platform is raised to high speed (level 9).
- If a fan in a PSU fails and all the other PSU fans in the cooling group are operational, the speed of the other fan of this PSU is raised to full speed and the speed of all other fans on the platform is raised to high speed.
- If a second fan in a cooling group becomes non-operational, XSCF shuts down the domain and powers off the platform.
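The three rules above can be read as a small decision procedure. The Python sketch below is illustrative only: the function name, the fan labels, and the simplification of one fan per PSU are assumptions made for this example, and it does not represent XSCF firmware or any XSCF interface.

```python
# Illustrative model of the M3000 fan failure rules; not XSCF code.
# Assumption: one fan per PSU, labelled "PSU#0 fan" and "PSU#1 fan".
COOLING_GROUPS = {
    "CG#1": ["FAN_A#0", "FAN_A#1"],
    "CG#2": ["PSU#0 fan", "PSU#1 fan"],
}


def respond_to_failures(failed):
    """Return the modelled platform reaction to a collection of failed fans."""
    failed = set(failed)
    if not failed:
        return {"action": "none"}
    # Second failure in the same cooling group: shutdown and power off.
    for fans in COOLING_GROUPS.values():
        if sum(fan in failed for fan in fans) >= 2:
            return {"action": "shut down domain and power off platform"}
    # Single failure: the surviving fan in the affected group goes to full
    # speed, every other operational fan goes to high speed.
    speeds = {}
    for fans in COOLING_GROUPS.values():
        group_hit = any(fan in failed for fan in fans)
        for fan in fans:
            if fan not in failed:
                speeds[fan] = "full" if group_hit else "high"
    return {"action": "raise fan speeds", "speeds": speeds}
```

For example, `respond_to_failures({"FAN_A#0"})` puts FAN_A#1 at full speed and both PSU fans at high speed, while a failure of both FAN_A units returns the shutdown action.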
| M3000 Cooling Groups | CG#1 | CG#2 |
| Fans | FAN_A#0 FAN_A#1 | Fan in PSU#0 Fan in PSU#1 |
| Hardware | MBU DIMMs PCI slots | DVD HDDs PSUs |
| M3000 | Typical fan speed |
| Fan speed | Standby | level1 | level2 | level3 | level4 * | level5 | level6 | level7 | level8 | level9 | level10 |
| RPM FAN_A# | 0 | 3500 | 3550 | 3600 | 3700 | 3800 | 4100 | 4400 | 5000 | 5800 | 6600 |
| RPM Fan PSU | 3400 | 6000 | 6000 | 6000 | 6000 | 6200 | 6350 | 6550 | 6700 | 6900 | 12000 |
* Level 4 is the initial value after power on.
| M3000 | Minimum fan speed below which a fan is declared failed |
| Fan speed | Standby | level1 | level2 | level3 | level4 | level5 | level6 | level7 | level8 | level9 | level10 |
| RPM FAN_A# | N/A | 2130 | 2160 | 2190 | 2220 | 2290 | 2460 | 2670 | 3000 | 3500 | 3950 |
| RPM Fan PSU | N/A | 4800 | 4800 | 4800 | 4800 | 4960 | 5120 | 5280 | 5440 | 5600 | 9440 |
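The minimum-speed table can be applied as a simple lookup. The following is a minimal sketch, assuming the measured RPM and the commanded speed level are already known; the names `MIN_RPM` and `is_fan_failed` are hypothetical and chosen for this example only.

```python
# Minimum RPM per speed level for the M3000, copied from the table above.
# Index 0 is level1 and index 9 is level10; standby has no minimum (N/A).
MIN_RPM = {
    "FAN_A": [2130, 2160, 2190, 2220, 2290, 2460, 2670, 3000, 3500, 3950],
    "PSU": [4800, 4800, 4800, 4800, 4960, 5120, 5280, 5440, 5600, 9440],
}


def is_fan_failed(fan_type, level, measured_rpm):
    """True if measured_rpm is below the minimum for this fan type and level."""
    if not 1 <= level <= 10:
        raise ValueError("level must be between 1 and 10")
    return measured_rpm < MIN_RPM[fan_type][level - 1]
```

With these values, a FAN_A reading of 2000 RPM at level 4 would be treated as failed, while a reading at the typical 3700 RPM would not.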
| M3000 | Fan speed relation to inlet temperature (°C) 500 m or below |
| Fan speed | Standby | level1 | level2 | level3 | level4 | level5 | level6 | level7 | level8 | level9 | level10 |
| | none | low | low | low | low | middle | middle | high | high | high | full |
| threshold temp. | Domain power off | < 19 | > 20 < 21 | > 22 < 23 | > 24 < 25 | > 26 < 27 | > 28 < 29 | > 30 < 31 | > 32 < 33 | > 34 | * |
| Inlet overtemperature set: > 38; inlet overtemperature reset: < 35 |
| M3000 | Fan speed relation to inlet temperature (°C) 501 m to 1000 m |
| Fan speed | Standby | level1 | level2 | level3 | level4 | level5 | level6 | level7 | level8 | level9 | level10 |
| | none | low | low | low | low | middle | middle | high | high | high | full |
| threshold temp. | Domain power off | < 17 | > 18 < 19 | > 20 < 21 | > 22 < 23 | > 24 < 25 | > 26 < 27 | > 28 < 29 | > 30 < 31 | > 32 | * |
| Inlet overtemperature set: > 36; inlet overtemperature reset: < 33 |
| M3000 | Fan speed relation to inlet temperature (°C) 1001 m to 1500 m |
| Fan speed | Standby | level1 | level2 | level3 | level4 | level5 | level6 | level7 | level8 | level9 | level10 |
| | none | low | low | low | low | middle | middle | high | high | high | full |
| threshold temp. | Domain power off | < 15 | > 16 < 17 | > 18 < 19 | > 20 < 21 | > 22 < 23 | > 24 < 25 | > 26 < 27 | > 28 < 29 | > 30 | * |
| Inlet overtemperature set: > 34; inlet overtemperature reset: < 31 |
| M3000 | Fan speed relation to inlet temperature (°C) 1501 m to 3000 m |
| Fan speed | Standby | level1 | level2 | level3 | level4 | level5 | level6 | level7 | level8 | level9 | level10 |
| | none | low | low | low | low | middle | middle | high | high | high | full |
| threshold temp. | Domain power off | < 13 | > 14 < 15 | > 16 < 17 | > 18 < 19 | > 20 < 21 | > 22 < 23 | > 24 < 25 | > 26 < 27 | > 28 | * |
| Inlet overtemperature set: > 32; inlet overtemperature reset: < 29 |
* Full speed is never set based on temperature; it is only used in case of a fan failure in the cooling group.
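The four M3000 tables follow one arithmetic pattern, which can be sketched in a few lines of Python. This is a hypothetical helper, not an XSCF function. It assumes integer inlet readings, that each level above level 1 spans 2 °C as in the 500 m table, and that each higher altitude band lowers every threshold by 2 °C.

```python
# Upper altitude limit (m) of each band in the M3000 tables.
ALTITUDE_BANDS_M = [500, 1000, 1500, 3000]


def m3000_fan_level(inlet_c, altitude_m):
    """Map an integer inlet temperature (degrees C) to a fan level 1..9."""
    band = next((i for i, top in enumerate(ALTITUDE_BANDS_M)
                 if altitude_m <= top), None)
    if band is None:
        raise ValueError("altitude above 3000 m is not covered by the tables")
    level1_max = 19 - 2 * band  # "< 19" in the 500 m table
    if inlet_c <= level1_max:
        return 1
    # Each further level covers two degrees. Level 9 is the ceiling because
    # level 10 (full) is only used after a fan failure.
    return min(9, 2 + (inlet_c - level1_max - 1) // 2)
```

The sketch deliberately ignores the inlet overtemperature set and reset thresholds, which are handled separately from the speed levels.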
| M3000 | Overtemperature behaviour (°C) |
| Overtemperature status | Overtemperature | Overtemperature warning | Overtemperature fail |
| condition inlet temperature | see tables above | - | - |
| condition CPU temperature | set: > 77 reset: < 64 | set: > 82 | set: > 102 |
| condition MBU temperature | set: > 64 reset: < 48 | set: > 69 | set: > 79 |
| Action | Set all fans to high speed | Shutdown domain then power off platform | Emergency power off platform |
| ereport | chassis.env.temp.ot@<location> | chassis.env.temp.otw@<location> | chassis.env.temp.otf@<location> |
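The set and reset values in the overtemperature table describe a hysteresis: the overtemperature status is raised at one temperature and cleared only at a lower one. The sketch below models this for the M3000 CPU sensor. The function name is hypothetical, and the choice not to latch the warning and fail states is an assumption, because the table gives no reset values for them.

```python
def m3000_cpu_status(temp_c, overtemp_active):
    """Classify an M3000 CPU temperature using the table thresholds.

    overtemp_active is True if the overtemperature status was already set
    on the previous reading (needed for the set/reset hysteresis).
    """
    if temp_c >= 102:
        return "fail"             # emergency power off platform
    if temp_c >= 82:
        return "warning"          # shutdown domain, then power off platform
    if temp_c >= 77:
        return "overtemperature"  # all fans to high speed
    # Between reset (64) and set (77): keep the status only if already set.
    if overtemp_active and temp_c > 64:
        return "overtemperature"
    return "normal"
```

A reading of 70 °C therefore gives "normal" on the way up but "overtemperature" on the way down, until the temperature reaches 64 °C or lower.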
M4000 & M5000
M4000 & M5000 systems have a single inlet temperature sensor on the Operator Panel (OPNL) FRU.
There is an exhaust temperature sensor on each IOU.
Each CPU has a temperature sensor.
Fan failure behavior
- If a FAN_A fails and the other FAN_A in the cooling group is operational, the speed of that other fan is raised to full speed and the speed of all other fans on the platform is raised to high speed.
- If a fan in a PSU fails and all the other PSU fans in the cooling group are operational, the speed of the other fan of this PSU is raised to full speed and the speed of all other fans on the platform is raised to high speed.
- If a second fan in a cooling group becomes non-operational, XSCF sends a shutdown request to all domains in the system and powers off the system.
- FAN_B fans are specific to the M4000 platform.
| M4000 Cooling Groups | CG#1 | CG#2 | CG#3 |
| Fans | FAN_A#0 FAN_A#1 | Fan in PSU#0 Fan in PSU#1 | FAN_B#0 FAN_B#1 |
| Hardware | MBU CPUM#0 CPUM#1 MEB#0..4 | PSUs IOU | XSCFU HDD DVDU TAPEU |
| M5000 Cooling Groups | CG#1 | CG#2 | CG#3 | CG#4 |
| Fans | FAN_A#0 FAN_A#1 | FAN_A#2 FAN_A#3 | Fan in PSU#0 Fan in PSU#1 | Fan in PSU#2 Fan in PSU#3 |
| Hardware | 1/2 MBU CPUM#0 CPUM#1 MEB#0..3 HDD#0..3 | 1/2 MBU CPUM#2 CPUM#3 MEB#4..7 TAPEU DVDU | PSU#0 PSU#1 IOU0 XSCFU | PSU#2 PSU#3 IOU1 |
| M4000 M5000 | Typical fan speed |
| Fan speed | Standby | low * | middle | high | full |
| RPM Fan PSU | 2400 | 3600 | 4200 | 5200 | 8400 |
| RPM FAN_A# | 0 | 3200 | 4200 | 5300 | 5900 |
| RPM FAN_B# | 5000 | 10000 | 10000 | 10000 | 12000 |
* Low is the initial value after power on.
| M4000 M5000 | Minimum fan speed below which a fan is declared failed |
| Fan speed | Standby | low | middle | high | full |
| RPM Fan PSU | 1520 | 2680 | 3160 | 4200 | 6400 |
| RPM FAN_A# | N/A | 2560 | 3280 | 4080 | 4880 |
| RPM FAN_B# | 3000 | 7520 | 7520 | 7520 | 9440 |
| M4000 M5000 | Fan speed relation to inlet temperature (°C) 500 m or below (with air filters installed, lower all threshold temperatures by 3 °C) |
| Fan speed | Standby | low | middle | high | full |
| threshold temp. | Domains powered off | < 23 | > 25 < 28 | > 30 | * |
| Inlet overtemperature set: > 38; inlet overtemperature reset: < 35 |
| M4000 M5000 | Fan speed relation to inlet temperature (°C) 501 m to 1000 m (with air filters installed, lower all threshold temperatures by 3 °C) |
| Fan speed | Standby | low | middle | high | full |
| threshold temp. | Domains powered off | < 21 | > 23 < 26 | > 28 | * |
| Inlet overtemperature set: > 36; inlet overtemperature reset: < 33 |
| M4000 M5000 | Fan speed relation to inlet temperature (°C) 1001 m to 1500 m (with air filters installed, lower all threshold temperatures by 3 °C) |
| Fan speed | Standby | low | middle | high | full |
| threshold temp. | Domains powered off | < 19 | > 21 < 24 | > 26 | * |
| Inlet overtemperature set: > 34; inlet overtemperature reset: < 31 |
| M4000 M5000 | Fan speed relation to inlet temperature (°C) 1501 m to 3000 m (with air filters installed, lower all threshold temperatures by 3 °C) |
| Fan speed | Standby | low | middle | high | full |
| threshold temp. | Domains powered off | < 17 | > 19 < 22 | > 24 | * |
| Inlet overtemperature set: > 32; inlet overtemperature reset: < 29 |
* Full speed is never set based on temperature; it is only used in case of a fan failure in the cooling group.
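One way to read the M4000/M5000 threshold tables is as a three-state controller with a hysteresis gap between adjacent speeds, shifted by altitude band and by the 3 °C air filter offset. The Python sketch below encodes that reading. The function and constant names are hypothetical, and the rule for temperatures that fall into a gap (keep the current speed where possible) is an assumption, not something the tables state.

```python
# (upper altitude limit in m, low at or below, middle from, middle up to,
#  high at or above), all temperatures in degrees C, from the tables above.
M4000_M5000_BANDS = [
    (500, 23, 25, 28, 30),
    (1000, 21, 23, 26, 28),
    (1500, 19, 21, 24, 26),
    (3000, 17, 19, 22, 24),
]


def m4000_m5000_next_speed(current, inlet_c, altitude_m, air_filter=False):
    """Return "low", "middle" or "high" for the next control step."""
    for top, low_max, mid_min, mid_max, high_min in M4000_M5000_BANDS:
        if altitude_m <= top:
            break
    else:
        raise ValueError("altitude above 3000 m is not covered by the tables")
    offset = 3 if air_filter else 0  # air filters lower every threshold
    low_max, mid_min, mid_max, high_min = (
        t - offset for t in (low_max, mid_min, mid_max, high_min))
    if inlet_c <= low_max:
        return "low"
    if inlet_c >= high_min:
        return "high"
    if mid_min <= inlet_c <= mid_max:
        return "middle"
    # Assumed hysteresis: inside a gap, keep the current speed if it is one
    # of the two neighbouring speeds, otherwise settle on "middle".
    if inlet_c < mid_min:
        return "low" if current == "low" else "middle"
    return "high" if current == "high" else "middle"
```

Full speed is never returned, in line with the footnote that it is reserved for fan failures.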
| M4000 M5000 | Overtemperature behaviour (°C) |
| Overtemperature status | Overtemperature | Overtemperature warning | Overtemperature fail |
| condition inlet temperature | see tables above | - | - |
| condition CPU temperature | set: > 79 reset: < 71 | set: > 82 | set: > 104 |
| condition IOU temperature | set: > 60 reset: < 49 | set: > 65 | set: > 75 |
| Action | Set all fans to high speed | Shutdown all domains then power off platform | Emergency power off platform |
| ereport | chassis.env.temp.ot@<location> | chassis.env.temp.otw@<location> | chassis.env.temp.otf@<location> |
M8000 & M9000
M8000 & M9000 systems have a single inlet temperature sensor on the Sensor (SNSU) FRU, which is located at the front, bottom left, close to the DVD.
M8000 systems have exhaust temperature sensors for each CMU.
M9000 systems have exhaust temperature sensors for each CMU and each XBU.
Each CPU has a temperature sensor.
| M8000 Cooling Groups | CG#1 | CG#2 |
| Fan Trays | FAN_A#0 FAN_A#1 FAN_A#2 FAN_A#3 FAN_B#0 FAN_B#1 | FAN_B#2 FAN_B#3 FAN_B#4 FAN_B#5 FAN_B#6 FAN_B#7 |
| Hardware | CMU#0 CMU#1 CMU#2 CMU#3 DDC_A#0 DDC_A#1 XSCFU | IOU#0 IOU#1 IOU#2 IOU#3 |
| M9000 Cooling Groups | CG#1 | CG#2 | CG#3 |
| Fan Trays, base cabinet | FAN_A#0 FAN_A#1 FAN_A#2 FAN_A#3 | FAN_A#4 FAN_A#5 FAN_A#6 FAN_A#7 FAN_A#8 FAN_A#9 | FAN_A#10 FAN_A#11 FAN_A#12 FAN_A#13 FAN_A#14 FAN_A#15 |
| Hardware, base cabinet | XBUs CLCKUs XSCFUs IOU#0 IOU#2 IOU#4 IOU#6 | CMU#0 CMU#1 CMU#2 CMU#3 IOU#1 IOU#3 | CMU#4 CMU#5 CMU#6 CMU#7 IOU#5 IOU#7 |
| Fan Trays, expansion cabinet | FAN_A#20 FAN_A#21 FAN_A#22 FAN_A#23 | FAN_A#24 FAN_A#25 FAN_A#26 FAN_A#27 FAN_A#28 FAN_A#29 | FAN_A#30 FAN_A#31 FAN_A#32 FAN_A#33 FAN_A#34 FAN_A#35 |
| Hardware, expansion cabinet | XBUs CLCKUs XSCFUs IOU#8 IOU#10 IOU#12 IOU#14 | CMU#8 CMU#9 CMU#10 CMU#11 IOU#9 IOU#11 | CMU#12 CMU#13 CMU#14 CMU#15 IOU#13 IOU#15 |
Fan failure behavior:
M8000 and M9000
- Each fan tray has 2 fans (FAN_B) or 3 fans (FAN_A). The fans within a fan tray are N+1 redundant, so after the first fan failure the fan tray should be replaced as soon as possible to avoid a domain or platform outage. The fan trays themselves are not redundant.
- The speed of the fans in the PSUs is not controlled by XSCF; it is controlled by a microcontroller internal to the PSU. If a fan in a PSU fails, the PSU is powered off and deconfigured.
- If there are insufficient operational PSUs to power the platform, the platform is powered down.
M8000 specific
- A failure in cooling group #1 or cooling group #2 will affect the entire platform.
- If a second fan in a fan tray becomes non-operational, or if the fan tray itself fails, XSCF will not permit the platform to be powered up. If the failure happens while the platform is powered up, all fan speeds in the platform are raised to high; domains are not shut down and only warning messages are issued. In this situation, the system relies on the overtemperature behavior described later in this document.
M9000 specific
- A failure in cooling group #1 will affect the entire platform. A failure in cooling group #2 or cooling group #3 will affect only the FRUs cooled by that group, and hence the domains that use those FRUs.
- If a second fan in a fan tray becomes non-operational, or if the fan tray itself fails, XSCF will not permit the platform or the affected FRUs to be powered up. If the failure happens while the platform is powered up, all fan speeds in the platform are raised to high; domains are not shut down and only warning messages are issued. In this situation, the system relies on the overtemperature behavior described later in this document.
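The fan tray rules for the M8000 and M9000 can be summarised per tray. The following Python sketch is a simplified model under stated assumptions: the function name and the returned strings are hypothetical, a failed tray is treated like a tray with two or more failed fans, and the distinction between platform-wide and FRU-specific power-up restrictions on the M9000 is not modelled.

```python
# Fans per tray, as stated in the text above.
FANS_PER_TRAY = {"FAN_A": 3, "FAN_B": 2}


def fan_tray_state(tray_type, failed_fans, platform_powered):
    """Summarise the modelled consequence of fan failures in one tray."""
    total = FANS_PER_TRAY[tray_type]
    if not 0 <= failed_fans <= total:
        raise ValueError("failed_fans out of range for this tray type")
    if failed_fans == 0:
        return "ok"
    if failed_fans == 1:
        # N+1 redundancy inside the tray still holds.
        return "degraded: replace the fan tray as soon as possible"
    if platform_powered:
        return "all fans to high speed, warnings only"
    return "power-up not permitted"
```

In the powered-up case, protection against overheating is left to the overtemperature handling rather than to an immediate shutdown.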
| M8000 M9000 | Typical fan speed |
| Fan speed | Normal | High |
| RPM FAN_A# FAN_B# | 3700 | 5500 |
| M8000 M9000 | Minimum fan speed below which a fan is declared failed |
| Fan speed | Normal | High |
| RPM FAN_A# FAN_B# | 3100 | 4200 |
| M8000 M9000 | Fan speed relation to inlet temperature (°C) 1500 m or below |
| Fan speed | Normal | High |
| threshold temp. | < 24 | > 27 |
| Inlet overtemperature set: > 36; inlet overtemperature reset: < 32 |
| M8000 M9000 | Fan speed relation to inlet temperature (°C) 1501 m to 2000 m |
| Fan speed | Normal | High |
| threshold temp. | < 22 | > 25 |
| Inlet overtemperature set: > 34; inlet overtemperature reset: < 30 |
| M8000 M9000 | Fan speed relation to inlet temperature (°C) 2001 m to 2500 m |
| Fan speed | Normal | High |
| threshold temp. | < 20 | > 23 |
| Inlet overtemperature set: > 32; inlet overtemperature reset: < 28 |
| M8000 M9000 | Fan speed relation to inlet temperature (°C) 2501 m to 3000 m |
| Fan speed | Normal | High |
| threshold temp. | < 18 | > 21 |
| Inlet overtemperature set: > 30; inlet overtemperature reset: < 26 |
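The M8000/M9000 tables define only two speeds, with a gap between the Normal and High thresholds. A minimal Python sketch of that two-state behavior follows. The names are hypothetical, and keeping the current speed inside the gap is an assumed interpretation of the tables.

```python
# (upper altitude limit in m, Normal at or below, High at or above), in
# degrees C, taken from the four tables above.
M8000_M9000_BANDS = [
    (1500, 24, 27),
    (2000, 22, 25),
    (2500, 20, 23),
    (3000, 18, 21),
]


def m8000_m9000_next_speed(current, inlet_c, altitude_m):
    """Return "Normal" or "High" for the next control step."""
    for top, normal_max, high_min in M8000_M9000_BANDS:
        if altitude_m <= top:
            if inlet_c >= high_min:
                return "High"
            if inlet_c <= normal_max:
                return "Normal"
            return current  # inside the gap: assumed to keep the speed
    raise ValueError("altitude above 3000 m is not covered by the tables")
```

At 1000 m, a reading of 25 °C leaves the fans at whatever speed they already had; they switch to High at 27 °C and back to Normal at 24 °C.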
| M8000 M9000 | Overtemperature behaviour (°C) |
| Overtemperature status | Overtemperature | Overtemperature warning | Overtemperature warning | Overtemperature fail | Overtemperature fail |
| condition inlet temperature | see tables above | - | - | - | - |
| condition CPU temperature | set: > 79 reset: < 71 | set: > 82 | - | set: > 104 | - |
| condition CMU temperature | set: > 61 reset: < 49 | set: > 66 | - | set: > 76 | - |
| condition XBU temperature | set: > 61 reset: < 49 | - | set: > 66 | - | set: > 76 |
| Action | Set all fans to high speed | Shutdown domains in CG then power off all hardware in CG | Shutdown all domains then power off platform | Emergency power off all hardware in CG | Emergency power off platform |
| ereport | chassis.env.temp.ot@<location> | chassis.env.temp.otw@<location> | chassis.env.temp.otw@<location> | chassis.env.temp.otf@<location> | chassis.env.temp.otf@<location> |
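The table above distinguishes actions scoped to a cooling group (CPU and CMU sensors) from actions scoped to the whole platform (XBU sensors). The sketch below expresses that mapping in Python. The function name and the returned strings are hypothetical, and the reset thresholds are left out for brevity.

```python
# (overtemperature, warning, fail) set thresholds in degrees C, from the table.
THRESHOLDS = {
    "CPU": (79, 82, 104),
    "CMU": (61, 66, 76),
    "XBU": (61, 66, 76),
}


def classify(sensor, temp_c):
    """Return (status, action) for one M8000/M9000 sensor reading."""
    ot, otw, otf = THRESHOLDS[sensor]
    # XBU events act on the whole platform; CPU and CMU events act on the
    # hardware in the affected cooling group (CG).
    scope = "platform" if sensor == "XBU" else "hardware in CG"
    if temp_c >= otf:
        return ("fail", "emergency power off " + scope)
    if temp_c >= otw:
        return ("warning", "shut down domains, then power off " + scope)
    if temp_c >= ot:
        return ("overtemperature", "set all fans to high speed")
    return ("normal", "none")
```

For instance, `classify("XBU", 66)` yields a platform-wide warning, while the same temperature on a CMU yields a warning limited to the cooling group.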
Cooling requirements:
| Server | Rated power (W) | Cooling requirements (BTU/h) | Cooling requirements (kJ/h) | Flow (m³/min) |
| M3000 | 470 | 1603 | 1692 | 1.75 |
| M4000 | 2350 | 8018 | 8460 | 7 |
| M5000 | 4590 | 16036 | 16524 | 14 |
| M8000 | 10500 | 35857 | 37800 | 94 |
| M9000-32 | 21300 | 72740 | 76680 | 102 |
| M9000-64 | 42600 | 145479 | 153360 | 205 |
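The cooling requirement columns are unit conversions of the rated power. As a quick cross-check, the Python sketch below converts watts using the standard factors 1 W = 3.6 kJ/h and 1 W ≈ 3.412142 BTU/h. The function names are hypothetical, and the BTU/h figures in the table appear to use slightly different rounding, so small differences from the table are expected.

```python
KJ_PER_HOUR_PER_WATT = 3.6        # 1 W = 1 J/s = 3600 J/h
BTU_PER_HOUR_PER_WATT = 3.412142  # approximate standard conversion factor


def watts_to_kj_per_hour(watts):
    """Heat load in kJ/h for a given electrical power in W."""
    return watts * KJ_PER_HOUR_PER_WATT


def watts_to_btu_per_hour(watts):
    """Heat load in BTU/h for a given electrical power in W."""
    return watts * BTU_PER_HOUR_PER_WATT
```

For the M3000 (470 W), this gives about 1692 kJ/h and about 1604 BTU/h.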
Required environment:
| Server | Temperature (°C): non-operating range | Temperature (°C): operating range | Temperature (°C): operating best range | Relative humidity (%): non-operating range | Relative humidity (%): operating range | Relative humidity (%): operating best range |
| M3000 | 0 to 50; -20 to 60 (packed) | 0-500 m: 5-35; 501-1000 m: 5-33; 1001-1500 m: 5-31; 1501-3000 m: 5-29 | 21-23 | 0-93 | 20-80 | 45-50 |
| M4000/M5000 | 0 to 50; -20 to 60 (packed) | 0-500 m: 5-35; 501-1000 m: 5-33; 1001-1500 m: 5-31; 1501-3000 m: 5-29 | 21-23 | 0-93 | 20-80 | 45-50 |
| M8000/M9000 | 0 to 50 | 0-1500 m: 5-32; 1501-2000 m: 5-30; 2001-2500 m: 5-28; 2501-3000 m: 5-26 | 21-23 | 8-80 | 20-80 | 45-50 |
Internal section
This document describes the state as of XCP 1090
Additional troubleshooting information:
A. In case of:
- multiple fan failures occurring within a short timeframe
- all fans of the same type rotating at exactly the same speed
consider a fan controller issue before replacing all the affected fan trays.
B. FF platforms (M5000 and M4000) have different fan backplanes:
- M5000 has one Fan Backplane (for 172mm Fans, FAN_A type) that includes the fan controller
- M4000 has two Fan Backplanes: one for 172mm Fans, one for 60mm Fans (FAN_B type) that includes the fan controller
Keep this in mind when dealing with multiple or repeated fan failures (e.g., a repeated FAN_A fault on M4000 may be due to a failed controller, which is not included on the FAN_A backplane). The showhardconf output for both platforms is shown below:
M5000
FANBP_C Status:Normal; Ver:0501h; Serial:NN110224CD;
+ FRU-Part-Number:CF00541-3099 01 /541-3099-01 ;
FAN_A#0 Status:Normal;
FAN_A#1 Status:Normal;
FAN_A#2 Status:Normal;
FAN_A#3 Status:Normal;
M4000
FAN_A#0 Status:Normal;
FAN_A#1 Status:Normal;
FANBP_B Status:Normal; Ver:0201h; Serial:BF0844MV6G ;
+ FRU-Part-Number:CF00541-0909 04 /541-0909-04 ;
FAN_B#0 Status:Normal;
FAN_B#1 Status:Normal;
FAN CR's:
- 6875469 - On OPL M3000 Ikkaku, fan speed is set to level 4 upon domain power up, regardless of conditions.
- 6870490 - On M4000/M5000/M8000/M9000, fan alarm condition chgs while XSCF is down are ignored on XSCF boot
References:
- OPL FF & DC Environment preso (Andre Beusch): Environment TOI
- Troubleshooting a Noisy FAN on OPL Servers: Doc 1339901.1
- Tracking page for cases where customers complain about fans being too noisy in M4000 / M5000 systems. Noisy Fan Tracking Page.
Keywords:
thermal temp, fan, fantray, tray, redundancy, M4000, M5000, M8000, M9000, M9000+ speed
References
<BUG:15584816> - SUNBT6875469 ON OPL M3000 IKKAKU, FAN SPEED IS SET TO LEVEL 4 UPON DOMAIN POWER
<BUG:15581764> - SUNBT6870490 ON M4000/M5000/M8000/M9000, FAN ALARM CONDITION CHGS WHILE XSCF IS
Attachments
This solution has no attachment