Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2208651.1
Update Date:2018-05-22
Keywords:

Solution Type  Problem Resolution Sure

Solution  2208651.1 :   Troubleshooting high fan speed in T-series systems  


Related Items
  • SPARC T3-4
  • SPARC T3-1
  • SPARC S7-2
  • SPARC T7-1
  • SPARC T4-2
  • Netra SPARC T4-2 Server
  • SPARC T5-8
  • SPARC T8-2
  • SPARC T8-1
  • Netra SPARC T4-1 Server
  • Netra SPARC S7-2
  • SPARC T7-2
  • SPARC T7-4
  • SPARC T5-4
  • SPARC T5-2
  • SPARC T3-2
  • SPARC T8-4
  • SPARC T4-1
  • SPARC S7-2L
  • SPARC T4-4
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T5




Created from <SR 3-13547494091>

Applies to:

SPARC T3-1 - Version All Versions and later
SPARC T3-2 - Version All Versions and later
SPARC S7-2 - Version All Versions and later
SPARC T8-4 - Version All Versions and later
SPARC T8-2 - Version All Versions and later
Information in this document applies to any platform.

Symptoms

All fans of a T-series server spin at maximum speed. This can be seen in the following Snapshot file:

ipmi/@usr@local@bin@ipmiint_sensor_list.out
FBD0/FM0/TACH | 10624.000 | RPM | ok | 3968.000 | na | na | na | na | na
FBD0/FM1/TACH | 10624.000 | RPM | ok | 3968.000 | na | na | na | na | na
FBD0/FM2/TACH | 10624.000 | RPM | ok | 3968.000 | na | na | na | na | na
FBD0/FM3/TACH | 10496.000 | RPM | ok | 3968.000 | na | na | na | na | na
FBD0/FM4/TACH | 10496.000 | RPM | ok | 3968.000 | na | na | na | na | na
FBD1/FM0/TACH | 10368.000 | RPM | ok | 3968.000 | na | na | na | na | na
FBD1/FM1/TACH | 10624.000 | RPM | ok | 3968.000 | na | na | na | na | na
FBD1/FM2/TACH | 10496.000 | RPM | ok | 3968.000 | na | na | na | na | na
FBD1/FM3/TACH | 10496.000 | RPM | ok | 3968.000 | na | na | na | na | na
FBD1/FM4/TACH | 10368.000 | RPM | ok | 3968.000 | na | na | na | na | na

On most systems, fans typically spin at about 5,000 RPM, but they can reach about 10,000 RPM when maximum cooling is needed.
Fans in 1U systems (such as the SPARC S7-2) typically spin at about 8,000 RPM, with a maximum near 19,000 RPM.
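To check a Snapshot for this symptom without reading every line, the fan tachometer readings can be extracted from ipmiint_sensor_list.out. The following Python sketch assumes the pipe-separated layout shown above (name | reading | units | status | ...) and uses the approximate 10,000 RPM maximum quoted in this note as a default; pass `max_rpm=ONE_U_MAX_RPM` for 1U systems such as the SPARC S7-2. The function names and the 90% cut-off are hypothetical choices for illustration, not ILOM thresholds.

```python
"""Sketch: flag fans running near maximum speed in ipmiint_sensor_list.out.

Assumes the pipe-separated layout shown in this note:
    NAME | reading | units | status | threshold columns ...
The RPM figures are the approximate values quoted in the note,
not official per-platform limits.
"""

DEFAULT_MAX_RPM = 10000.0  # approximate maximum for most T-series systems
ONE_U_MAX_RPM = 19000.0    # approximate maximum for 1U systems (e.g. SPARC S7-2)


def parse_fan_speeds(text):
    """Return {sensor_name: rpm} for every '.../TACH' sensor reporting RPM."""
    speeds = {}
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3 or not fields[0].endswith("/TACH") or fields[2] != "RPM":
            continue
        try:
            speeds[fields[0]] = float(fields[1])
        except ValueError:
            continue  # skip readings such as "na"
    return speeds


def fans_near_max(speeds, max_rpm=DEFAULT_MAX_RPM, fraction=0.9):
    """Names of fans at or above `fraction` of `max_rpm` (hypothetical cut-off)."""
    return sorted(name for name, rpm in speeds.items() if rpm >= fraction * max_rpm)


SAMPLE = """\
FBD0/FM0/TACH | 10624.000 | RPM | ok | 3968.000 | na | na | na | na | na
FBD0/FM3/TACH | 10496.000 | RPM | ok | 3968.000 | na | na | na | na | na
FBD1/FM4/TACH | 10368.000 | RPM | ok | 3968.000 | na | na | na | na | na
"""

if __name__ == "__main__":
    speeds = parse_fan_speeds(SAMPLE)
    print(fans_near_max(speeds))                         # all three sample fans
    print(fans_near_max(speeds, max_rpm=ONE_U_MAX_RPM))  # none, on a 1U scale
```

The same readings that look "maxed out" on a 2U/4U system would be mid-range on a 1U system, which is why the maximum is a parameter rather than a constant.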

Cause

Ensure that excessive dust is not blocking the CPU heat sinks or other components.

Ensure that the fans are installed properly and have not been reversed by accident.

Ensure that the system's ambient temperature is well below the maximum operating temperature (typically 35C).

The ILOM monitors the internal temperature and current sensors to control fan speed, so you must determine which sensor(s) triggered the fans to spin up. The CPUs are typically also throttled when the fans spin fast.

Solution

If a Flash Accelerator card (such as an Aura 7: 7335943) is installed, the fan speed is raised to 100%.

If the heat sinks are blocked by dust, have them cleaned to restore proper airflow. Dust is a likely cause if VCore temperatures are high while the system is mostly idle (more than 90% idle), as in the vmstat output below.

sysconfig/vmstat_3_3.out
 kthr        memory             page              disk          faults             cpu
r b w   swap       free   re   mf pi po fr de sr s3 s4 s5 s6   in     sy     cs  us sy id
0 0 0 720785672 187163616 476 1584 0  2  2  0  0 14 14  0  0 130590 201183 132616 5  1 93
0 0 0 667312752 135396168 301  622 0  0  0  0  0 42 42  0  0  87797  83865  84365 0  1 99
0 0 0 667312240 135392560 215 1154 0  0  0  0  0 48 48  0  0  96485 125361  95570 1  1 98
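The idle check can be scripted against the Explorer file. This Python sketch assumes the Solaris vmstat layout above, with the idle percentage (`id`) as the last column, and uses the 90% threshold from this note; the function names are hypothetical. Because the first sample line of Solaris vmstat reports averages since boot, the sketch skips it by default.

```python
"""Sketch: extract CPU idle % from Solaris vmstat output (vmstat_3_3.out).

Assumes the layout shown above, where 'id' (idle %) is the last column.
The first sample line of Solaris vmstat reports averages since boot,
so it is skipped by default.
"""

IDLE_THRESHOLD = 90  # "fairly idle" threshold used in this note


def idle_samples(text, skip_first=True):
    """Return the idle % from each all-numeric vmstat sample line."""
    samples = []
    for line in text.splitlines():
        tokens = line.split()
        if tokens and all(t.isdigit() for t in tokens):
            samples.append(int(tokens[-1]))  # 'id' is the last column
    return samples[1:] if skip_first else samples


def mostly_idle(samples, threshold=IDLE_THRESHOLD):
    """True if there is at least one sample and every sample exceeds `threshold`."""
    return bool(samples) and all(s > threshold for s in samples)


SAMPLE = """\
 kthr        memory             page              disk          faults             cpu
r b w   swap       free   re   mf pi po fr de sr s3 s4 s5 s6   in     sy     cs  us sy id
0 0 0 720785672 187163616 476 1584 0  2  2  0  0 14 14  0  0 130590 201183 132616 5  1 93
0 0 0 667312752 135396168 301  622 0  0  0  0  0 42 42  0  0  87797  83865  84365 0  1 99
0 0 0 667312240 135392560 215 1154 0  0  0  0  0 48 48  0  0  96485 125361  95570 1  1 98
"""

if __name__ == "__main__":
    samples = idle_samples(SAMPLE)
    print(samples, mostly_idle(samples))  # [99, 98] True
```

If `mostly_idle()` is true while the VCore temperatures are high, the heat is unlikely to come from workload, which points toward dust or another airflow problem.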

 

In one case, a SPARC T5-2 had a fan mounted in reverse, so that it pushed air against the flow of the other fans. Ensure that all fans direct airflow toward the rear of the chassis.

 

The system ambient temperature can be obtained from a Snapshot:

ipmi/@usr@local@bin@ipmiint_sensor_list.out
/SYS/T_AMB | 18.000 | degrees C | ok | na | na | na | na | 48.000 | 55.000

The ambient temperature should be around 20C; fan speeds may increase if the ambient temperature at the customer site is above 25-28C.
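A minimal Python sketch of this ambient check, assuming the /SYS/T_AMB line format shown above. It compares the reading with the lower end of the 25-28C range quoted in this note; the threshold columns at the end of the sensor line are deliberately not interpreted, and the function names are hypothetical.

```python
"""Sketch: read /SYS/T_AMB from ipmiint_sensor_list.out and compare it with
the rule of thumb in this note (about 20C is normal; fans may speed up
above 25-28C). The threshold columns of the sensor line are not interpreted.
"""

CONCERN_AMBIENT_C = 25.0  # lower end of the 25-28C range quoted in the note


def ambient_temp(text, sensor="/SYS/T_AMB"):
    """Return the ambient reading in degrees C, or None if not found."""
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 3 and fields[0] == sensor and fields[2] == "degrees C":
            try:
                return float(fields[1])
            except ValueError:
                return None  # reading such as "na"
    return None


def assess_ambient(temp_c):
    """Classify an ambient reading as 'unknown', 'high', or 'ok'."""
    if temp_c is None:
        return "unknown"
    if temp_c >= CONCERN_AMBIENT_C:
        return "high"  # may drive fan speeds up
    return "ok"


SAMPLE = "/SYS/T_AMB | 18.000 | degrees C | ok | na | na | na | na | 48.000 | 55.000\n"

if __name__ == "__main__":
    t = ambient_temp(SAMPLE)
    print(t, assess_ambient(t))  # 18.0 ok
```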

 

Output from Dynamic Voltage and Frequency Scaling (DVFS) can identify CPU cores that are over temperature or over current. SR 3-13547494091 contains an example of a system whose fans run at full speed because of excessive CPU current. The following Snapshot data shows whether any cores have been throttled over time because of excessive current:

ilom/statistics/@usr@local@bin@statistics_-p.out
name=hwpcap/HOST/cpu0/total_throttle_count val=3309168 created on=Sun Dec 21 20:07:27 2014 last updated on=Thu Oct 27 17:20:09 2016
name=hwpcap/HOST/cpu0/iwarn_throttle_count val=2908030 created on=Sun Dec 21 20:07:27 2014 last updated on=Thu Oct 27 17:20:10 2016
name=hwpcap/HOST/cpu1/total_throttle_count val=7528362 created on=Sun Dec 21 20:07:27 2014 last updated on=Thu Oct 27 17:20:10 2016
name=hwpcap/HOST/cpu1/iwarn_throttle_count val=6531687 created on=Sun Dec 21 20:07:27 2014 last updated on=Thu Oct 27 17:20:10 2016
name=hwpcap/HOST/cpu2/total_throttle_count val=365273 created on=Sun Dec 21 20:07:27 2014 last updated on=Thu Oct 27 17:20:10 2016
name=hwpcap/HOST/cpu2/iwarn_throttle_count val=355371 created on=Sun Dec 21 20:07:27 2014 last updated on=Thu Oct 27 17:20:10 2016
name=hwpcap/HOST/cpu5/total_throttle_count val=3127171 created on=Sun Dec 21 20:07:27 2014 last updated on=Thu Oct 27 17:20:10 2016
name=hwpcap/HOST/cpu5/iwarn_throttle_count val=2666900 created on=Sun Dec 21 20:07:27 2014 last updated on=Thu Oct 27 17:20:10 2016
name=hwpcap/HOST/cpu6/total_throttle_count val=113971 created on=Sun Dec 21 20:07:27 2014 last updated on=Thu Oct 27 17:20:10 2016
name=hwpcap/HOST/cpu6/iwarn_throttle_count val=113914 created on=Sun Dec 21 20:07:27 2014 last updated on=Thu Oct 27 17:20:10 2016

In this case, CPUs 0, 1, and 5 were regularly throttled due to over-current (iwarn), and CPUs 2 and 6 were throttled occasionally.
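The throttle counters can be summarized to separate CPUs that are throttled regularly from those throttled only occasionally. The Python sketch below assumes the `name=.../val=...` layout shown above (the sample lines are trimmed after `val=`). The 1,000,000-event cut-off is a hypothetical value picked to match this example, not an ILOM limit, and the function names are illustrative. Comparing `iwarn` with `total` counts shows how much of the throttling is attributed to over-current.

```python
"""Sketch: summarize hwpcap throttle counters from statistics_-p.out.

Assumes lines of the form shown in this note:
  name=hwpcap/HOST/cpuN/<kind>_throttle_count val=<count> ...
'iwarn' counts are treated as over-current throttling, as in the note.
The 1,000,000 cut-off is a hypothetical value chosen for illustration.
"""
import re

LINE_RE = re.compile(r"name=hwpcap/HOST/cpu(\d+)/([a-z]+)_throttle_count val=(\d+)")


def parse_throttle_counts(text):
    """Return {cpu_number: {kind: count}}, e.g. {1: {'total': ..., 'iwarn': ...}}."""
    counts = {}
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            cpu, kind, val = int(m.group(1)), m.group(2), int(m.group(3))
            counts.setdefault(cpu, {})[kind] = val
    return counts


def heavily_throttled(counts, min_total=1_000_000):
    """CPU numbers whose total throttle count is at least `min_total`."""
    return sorted(cpu for cpu, c in counts.items() if c.get("total", 0) >= min_total)


def iwarn_share(counts, cpu):
    """Fraction of a CPU's throttle events attributed to over-current (iwarn)."""
    c = counts[cpu]
    return c.get("iwarn", 0) / c["total"] if c.get("total") else 0.0


SAMPLE = """\
name=hwpcap/HOST/cpu0/total_throttle_count val=3309168
name=hwpcap/HOST/cpu0/iwarn_throttle_count val=2908030
name=hwpcap/HOST/cpu1/total_throttle_count val=7528362
name=hwpcap/HOST/cpu1/iwarn_throttle_count val=6531687
name=hwpcap/HOST/cpu2/total_throttle_count val=365273
name=hwpcap/HOST/cpu2/iwarn_throttle_count val=355371
name=hwpcap/HOST/cpu5/total_throttle_count val=3127171
name=hwpcap/HOST/cpu5/iwarn_throttle_count val=2666900
name=hwpcap/HOST/cpu6/total_throttle_count val=113971
name=hwpcap/HOST/cpu6/iwarn_throttle_count val=113914
"""

if __name__ == "__main__":
    counts = parse_throttle_counts(SAMPLE)
    print(heavily_throttled(counts))  # [0, 1, 5]
    for cpu in sorted(counts):
        print(cpu, counts[cpu]["total"], f"{iwarn_share(counts, cpu):.0%}")
```

Because the counters are cumulative since they were created, two Snapshots taken some time apart give a better picture of current behavior than a single reading.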


Gather a Snapshot while the fans are at high speed so that the temperature and current sensors can also be checked. In the case below, P0C0 (CPU0) and P0C1 (CPU1) draw very high current:

ipmi/@usr@local@bin@ipmiint_sensor_list.out
P0C0/I_VCORE | 194.000 | Amps | ok | na | na | na | na | na | na
P0C0/T_CORE_VIRT | 76.000 | degrees C | ok | na | na | na | na | na | na
P0C1/I_VCORE | 200.000 | Amps | ok | na | na | na | na | na | na
P0C1/T_CORE_VIRT | 75.000 | degrees C | ok | na | na | na | na | na | na
P1C0/I_VCORE | 160.000 | Amps | ok | na | na | na | na | na | na
P1C0/T_CORE_VIRT | 65.000 | degrees C | ok | na | na | na | na | na | na
P1C1/I_VCORE | 144.000 | Amps | ok | na | na | na | na | na | na
P1C1/T_CORE_VIRT | 58.000 | degrees C | ok | na | na | na | na | na | na
P2C0/I_VCORE | 162.000 | Amps | ok | na | na | na | na | na | na
P2C0/T_CORE_VIRT | 67.000 | degrees C | ok | na | na | na | na | na | na
P2C1/I_VCORE | 170.000 | Amps | ok | na | na | na | na | na | na
P2C1/T_CORE_VIRT | 68.000 | degrees C | ok | na | na | na | na | na | na
P3C0/I_VCORE | 170.000 | Amps | ok | na | na | na | na | na | na
P3C0/T_CORE_VIRT | 68.000 | degrees C | ok | na | na | na | na | na | na
P3C1/I_VCORE | 166.000 | Amps | ok | na | na | na | na | na | na
P3C1/T_CORE_VIRT | 63.000 | degrees C | ok | na | na | na | na | na | na

In this case, CPU0 and CPU1 draw much higher currents than the others, so they are the likely cause of the fan spin-up.
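Spotting the outliers can be automated by comparing each I_VCORE reading with the median of all of them. This Python sketch assumes the `PnCm/I_VCORE` lines shown above. The mapping from `PnCm` to CPU number 2n+m is extrapolated from this note's single example (P0C1 = CPU1) and may not hold on other platforms; the 10% margin and function names are hypothetical.

```python
"""Sketch: find CPUs drawing unusually high VCore current.

Assumes PnCm/I_VCORE lines from ipmiint_sensor_list.out as shown above.
The PnCm -> CPU 2n+m mapping is extrapolated from this note's single
example (P0C1 = CPU1) and is an assumption for other platforms.
The 10% margin over the median is a hypothetical cut-off.
"""
import re
import statistics

SENSOR_RE = re.compile(r"^(P\d+C\d+)/I_VCORE\s*\|\s*([\d.]+)\s*\|\s*Amps")


def parse_vcore_currents(text):
    """Return {'P0C0': amps, ...} for each I_VCORE sensor."""
    currents = {}
    for line in text.splitlines():
        m = SENSOR_RE.match(line.strip())
        if m:
            currents[m.group(1)] = float(m.group(2))
    return currents


def cpu_number(sensor):
    """Map 'PnCm' to CPU 2n+m (assumed numbering)."""
    m = re.fullmatch(r"P(\d+)C(\d+)", sensor)
    return 2 * int(m.group(1)) + int(m.group(2))


def high_current(currents, margin=0.10):
    """Sensors whose current exceeds the median by more than `margin`."""
    median = statistics.median(list(currents.values()))
    flagged = [s for s, amps in currents.items() if amps > median * (1 + margin)]
    return sorted(flagged, key=cpu_number)


SAMPLE = """\
P0C0/I_VCORE | 194.000 | Amps | ok | na | na | na | na | na | na
P0C0/T_CORE_VIRT | 76.000 | degrees C | ok | na | na | na | na | na | na
P0C1/I_VCORE | 200.000 | Amps | ok | na | na | na | na | na | na
P1C0/I_VCORE | 160.000 | Amps | ok | na | na | na | na | na | na
P1C1/I_VCORE | 144.000 | Amps | ok | na | na | na | na | na | na
P2C0/I_VCORE | 162.000 | Amps | ok | na | na | na | na | na | na
P2C1/I_VCORE | 170.000 | Amps | ok | na | na | na | na | na | na
P3C0/I_VCORE | 170.000 | Amps | ok | na | na | na | na | na | na
P3C1/I_VCORE | 166.000 | Amps | ok | na | na | na | na | na | na
"""

if __name__ == "__main__":
    currents = parse_vcore_currents(SAMPLE)
    print([(s, cpu_number(s)) for s in high_current(currents)])  # [('P0C0', 0), ('P0C1', 1)]
```

The median is used instead of the mean so that the two high readings do not pull the reference value up; here the median is 168 A, and only the 194 A and 200 A sensors exceed it by more than 10%.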

 

If only a small number of CPUs are affected, an Explorer collected from the control domain can be used to determine the LDom configuration. Inspect the following file:

sysconfig/ldm_list_-l.out
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
primary active -n-cv- UART 320 654080M 8.3% 8.3% 47d 25m
  Procs: 0 - 4 40 Cores: 0 to 71
ss-dom2 active -n---- 5002 192 384G 0.3% 0.3% 47d 13m
  Procs: 2 - 7 24 Cores: 32 to 127
ss-dom3 active -n---- 5003 64 128G 0.1% 0.1% 47d 20m
  Proc: 4 8 Cores: 72 to 79
ss-dom4 active -n---- 5004 224 448G 7.2% 7.2% 19d 8h 15m
  Procs: 2 - 5 28 Cores: 36 to 95
ss-dom5 active -n--v- 5005 224 448G 2.5% 2.5% 47d 20m
  Procs: 6 - 7 28 Cores: 96 to 125
ssccn1-dom1 inactive ------ 32 64G

The current problem could be resolved by providing more processor resources to the primary domain, but note that CPUs 2, 5, and 6 have also been throttled in the past, so rebalancing the CPU resources may not be easy in this case.

The customer should optimize LDom memory and cache locality by configuring each LDom on a single processor and by not configuring LDoms across core clusters.
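To see which domains run on the throttled CPUs and which domains span more than one processor, the core ranges in the ldm listing can be mapped to processors. This Python sketch assumes 16 cores per processor (as on SPARC T5) with cores numbered contiguously per processor, so that core // 16 gives the processor number; it takes the (first, last) core pairs from the active domains in the listing above, and treats each pair only as a range (a domain need not use every core inside it). The throttled set {0, 1, 5} is the one identified from the throttle counters; the function names are hypothetical.

```python
"""Sketch: map each LDom's core range to the processors it spans.

Assumptions (not stated by ldm itself):
  * 16 cores per processor, as on SPARC T5, numbered contiguously
    per processor (core // 16 = processor number);
  * (first, last) pairs are the core ranges from the ldm_list_-l.out
    excerpt; a domain need not use every core inside its range.
"""

CORES_PER_PROC = 16


def procs_for_cores(first_core, last_core, cores_per_proc=CORES_PER_PROC):
    """Processor numbers covered by a core range (inclusive)."""
    return list(range(first_core // cores_per_proc, last_core // cores_per_proc + 1))


def review_domains(domains, throttled):
    """Per domain: processors spanned, whether >1, and overlap with throttled CPUs."""
    report = {}
    for name, (first, last) in domains.items():
        procs = procs_for_cores(first, last)
        report[name] = {
            "procs": procs,
            "spans_multiple_procs": len(procs) > 1,
            "throttled_procs": sorted(set(procs) & set(throttled)),
        }
    return report


DOMAINS = {  # active domains from the listing: (first core, last core)
    "primary": (0, 71),
    "ss-dom2": (32, 127),
    "ss-dom3": (72, 79),
    "ss-dom4": (36, 95),
    "ss-dom5": (96, 125),
}
THROTTLED = {0, 1, 5}  # CPUs regularly throttled in this example

if __name__ == "__main__":
    for name, info in review_domains(DOMAINS, THROTTLED).items():
        print(f"{name:8} procs={info['procs']} "
              f"multi={info['spans_multiple_procs']} throttled={info['throttled_procs']}")
```

Under these assumptions the computed processor ranges reproduce the Procs values in the listing (for example, primary spans processors 0 to 4). Only the primary domain reaches CPUs 0 and 1, which is consistent with looking at the primary domain's resources first, and every active domain except ss-dom3 spans more than one processor.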

Also note that certain PCI cards, such as Flash Accelerator cards, automatically cause the fans to spin up when present in the system.

 

Related documents:
  HOWTO Configure OVM (LDom) to Minimize Memory Latency (Doc ID 2030516.1)
  How to Save Power on SPARC T5, SPARC M5, and SPARC M6 Servers (Doc ID 1610270.1)
  Analyzing Performance of Chip Multi-Threading (CMT) Servers (Doc ID 1343999.1)
  Oracle® VM Server for SPARC 3.1 Administration Guide http://docs.oracle.com/cd/E38405_01/html/E38406/pmfeatures.html#indexterm-id-760
  Bug 18811376 : NEED TO CHANGE IFC MIN_FAN_SPEED IF AURA2 CARD IS INSTALLED.


  Copyright © 2018 Oracle, Inc.  All rights reserved.