Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure
Solution 1011650.1: Sun Enterprise[TM] 3X00-6X00 Servers: Board Temperature Information
Previously Published As: 215972

Applies to:

Sun Enterprise 5000 Server - Version All Versions and later
Sun Enterprise 5500 Server - Version All Versions and later
Sun Enterprise 6000 Server - Version All Versions and later
Sun Enterprise 6500 Server - Version All Versions and later
Solaris SPARC Operating System - Version 8.0 and later
All Platforms

Goal

This document provides an optimal temperature specification for Sun Enterprise[TM] classic systems. It also describes how to tune the system environment to eliminate most known transient errors.

Fix
The sampled temperature is used to drive the speed of the cooling fans enclosed in the 300-watt Power Cooling Modules (PCMs).

Note: A memory-only CPU/memory board does not provide any temperature data, because no thermistors are installed to monitor DIMM temperatures. However, memory DIMMs do not generate a significant amount of heat, so system reliability is not adversely affected.

Software control uses a polling mechanism, implemented in the Solaris[TM] Operating System (Solaris OS), that reads the temperature registers every 2 seconds. If the temperature reaches the "Yellow Zone" threshold, the system emits warnings as console messages. If the temperature reaches the "Red Zone" threshold, the system continues to run and repeats the warning. If the affected component stays in the red zone for 20 seconds or longer, the system either powers down the component or powers itself down entirely, depending on the implementation level of the product.

The monitoring software sets the "Yellow Zone" at 60 degrees Celsius for CPU/memory boards, I/O boards and the clock board. The "Red Zone" is set at 68 degrees Celsius on all boards.

Finding Nominal Temperatures of the System Boards:

1) A recent study of transient system errors shows that these errors could be greatly reduced by maintaining an environment that is optimal for the hardware. Temperature and humidity auditing can provide the data needed to reach these optimal temperatures. An intake temperature of 70 Deg F (21.11 Deg C) and a relative humidity (RH) of 45% to 50% should bring a Sun Enterprise classic server into compliance with the optimal numbers.

2) The next step is to find the present nominal temperature. To do so, obtain the output of the prtdiag -v command from the suspect system. The output should be current and should be captured after at least 168 hours (7 days) of uptime.
Add the temperatures of all the CPU/memory boards (the CPU/memory boards contain the most temperature-critical components), then divide the total by the number of CPU/memory boards to get an average. For example:
    System Temperatures (Celsius)
    -----------------------------
    Board  State  Current  Min  Max  Trend
    -----  -----  -------  ---  ---  ------
      0     OK       29    28   32   stable  <- temp should vary less than 5.5 Deg C
      1     OK       39    38   48   stable  <- I/O board not included in calculation
      2     OK       29    30   32   stable
      3     OK       41    40   45   stable  <- I/O board not included in calculation
      4     OK       30    28   32   stable
      5     OK       31    32   35   stable
      6     OK       32    31   36   stable
      7     OK       32    31   33   stable
     CLK    OK       33    30   34   stable  <- clock board not included in calculation

In the example, the CPU/memory boards are boards 0, 2, 4, 5, 6 and 7, so their nominal temperature is 30.5 C:

    (29 + 29 + 30 + 31 + 32 + 32) / 6 = 183 / 6 = 30.5

The ASIC Revisions section of the same prtdiag -v output identifies the board types:

    ASIC Revisions
    Brd  FHC  AC  SBus0  SBus1  PCI0  PCI1  FEPS  Board Type
    ---  ---  --  -----  -----  ----  ----  ----  --------------
     0    1    5                                  CPU
     1    1    5    1      1                 22   Dual-SBus-SOC+  <- I/O board not calculated
     2    1    5                                  CPU
     3    1    5    1      1                 22   Dual-SBus-SOC+  <- I/O board not calculated
     4    1    5                                  CPU
     5    1    5                                  CPU
     6    1    5                                  CPU
     7    1    5                                  CPU

Specifications for Temperature Zones

Solaris OS 2.5.1 with patch 103640-33 or later
Solaris OS 2.6 with sysctrl driver patch 105181-25 or later
Solaris OS 7 with kernel patch 106541-11 or later
Solaris OS 8 with fhc driver patch 108528-04 or later

When these patches are installed, warning messages appear at 60 degrees C, and a power-down sequence for overheated CPU modules begins at the new danger limit of 68 degrees C. Both limits are lower than the standard defaults of 73 degrees C (warning messages) and 83 degrees C (danger limit).

    -----------------------------------------------------------------------
    Board Type   Yellow Temps   Red Temps   Optimal Temps   Optimal/Nominal
    -----------------------------------------------------------------------
    CPU          24C - 60C      68C         28C - 32C       30C
    I/O          24C - 60C      68C         28C - 46C       38C
    CLOCK        24C - 60C      68C         28C - 32C       30C

Note:
1) Avoid severe swings in temperature or relative humidity.
2) A CPU temperature above 40 degrees C might be within *** Operating: 5 C to 35 C (41 F to 95 F) ***
If the room is generally in compliance but the system boards are running above 40 degrees C, the machine may be installed in a "hotspot." Investigate the cooling around the machine and consider managing it to bring the machine into compliance.
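The averaging step can be automated. The following Python sketch is illustrative only: it assumes the prtdiag -v text has the same two-table layout as the example (with the "<-" annotations removed), treats boards whose ASIC board type is "CPU" as CPU/memory boards, and skips boards that have no temperature row. The names parse_prtdiag, nominal_cpu_temperature and MAX_SPREAD_C are hypothetical, and real prtdiag output may differ between platforms and Solaris releases.

```python
# Hypothetical helper: nominal CPU/memory board temperature from prtdiag -v text.
# Assumes the two-table layout of the example; real output may differ.

SAMPLE = """\
System Temperatures (Celsius)
-----------------------------
Board State Current Min Max Trend
--- ------- ------- --- --- ------
0 OK 29 28 32 stable
1 OK 39 38 48 stable
2 OK 29 30 32 stable
3 OK 41 40 45 stable
4 OK 30 28 32 stable
5 OK 31 32 35 stable
6 OK 32 31 36 stable
7 OK 32 31 33 stable
CLK OK 33 30 34 stable

ASIC Revisions
Brd FHC AC SBus0 SBus1 PCI0 PCI1 FEPS Board Type
--- --- -- ----- ----- ---- ---- ---- ----------
0 1 5 CPU
1 1 5 1 1 22 Dual-SBus-SOC+
2 1 5 CPU
3 1 5 1 1 22 Dual-SBus-SOC+
4 1 5 CPU
5 1 5 CPU
6 1 5 CPU
7 1 5 CPU
"""

MAX_SPREAD_C = 5.5  # guideline: Min-to-Max variation should stay below 5.5 C


def parse_prtdiag(text):
    """Return ({board: (current, min, max)}, {board: board_type})."""
    temps, types = {}, {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        tokens = stripped.split()
        if not tokens:
            continue
        if stripped.startswith("System Temperatures"):
            section = "temps"
        elif stripped.startswith("ASIC Revisions"):
            section = "asic"
        elif tokens[0] in ("Board", "Brd") or set(tokens[0]) == {"-"}:
            continue  # column headers and dashed separator lines
        elif section == "temps" and len(tokens) >= 5 and all(t.isdigit() for t in tokens[2:5]):
            current, low, high = (int(t) for t in tokens[2:5])
            temps[tokens[0]] = (current, low, high)
        elif section == "asic":
            types[tokens[0]] = tokens[-1]  # last column is the board type
    return temps, types


def nominal_cpu_temperature(text):
    """Average the current temperature of the CPU/memory boards."""
    temps, types = parse_prtdiag(text)
    # Boards without a temperature row (e.g. memory-only boards) are skipped.
    cpu_boards = [b for b, kind in types.items() if kind == "CPU" and b in temps]
    if not cpu_boards:
        raise ValueError("no CPU/memory board temperatures found")
    average = sum(temps[b][0] for b in cpu_boards) / len(cpu_boards)
    # CPU boards whose Min-to-Max spread is not below the 5.5 C guideline.
    wide_swing = [b for b in cpu_boards if temps[b][2] - temps[b][1] >= MAX_SPREAD_C]
    return average, wide_swing


print(nominal_cpu_temperature(SAMPLE))  # (30.5, [])
```

On the example data this reproduces the 30.5 C result, and no CPU/memory board exceeds the 5.5 C spread; the I/O board in slot 1 has a 10 C spread but is excluded, as in the manual calculation.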
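The polling behavior described at the start of the Fix section can be modeled as a small state machine. This Python sketch is not the Solaris driver code: it assumes the patched limits (warning at 60 C, danger at 68 C), the 2-second poll interval and the 20-second red-zone limit stated in this document, and the names classify and RedZoneMonitor are hypothetical. It only decides which action applies to each reading; it does not read temperature registers or power anything down.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the polling logic; the values come from this document.
YELLOW_C = 60        # warning threshold with the patches installed (default: 73)
RED_C = 68           # danger threshold with the patches installed (default: 83)
POLL_INTERVAL_S = 2  # the temperature registers are read every 2 seconds
RED_LIMIT_S = 20     # time a component may stay in the red zone


def classify(temp_c: int) -> str:
    """Map a board temperature (Celsius) to its zone."""
    if temp_c >= RED_C:
        return "red"
    if temp_c >= YELLOW_C:
        return "yellow"
    return "normal"


@dataclass
class RedZoneMonitor:
    red_since: Optional[float] = None  # time the current red-zone stretch began

    def sample(self, now_s: float, temp_c: int) -> str:
        """Return "ok", "warn" or "power_down" for one reading."""
        zone = classify(temp_c)
        if zone != "red":
            self.red_since = None  # leaving the red zone resets the timer
            return "warn" if zone == "yellow" else "ok"
        if self.red_since is None:
            self.red_since = now_s
        if now_s - self.red_since >= RED_LIMIT_S:
            return "power_down"  # component or whole system, depending on product
        return "warn"  # the warning repeats while in the red zone


# One reading every POLL_INTERVAL_S seconds; the board enters the red zone at t = 4 s.
monitor = RedZoneMonitor()
readings = [55, 62, 69] + [70] * 10
actions = [monitor.sample(i * POLL_INTERVAL_S, t) for i, t in enumerate(readings)]
print(actions[-1])  # power_down (at t = 24 s, 20 s after entering the red zone)
```

Setting YELLOW_C and RED_C to 73 and 83 models the unpatched default limits instead.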
Attachments
This solution has no attachment.