Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1544182.1
Update Date:2015-12-08
Keywords:

Solution Type  Problem Resolution Sure

Solution  1544182.1 :   EXADATA HW: High Temperature (CL_TEMP) Alert From One Of The Cell Nodes  


Related Items
  • Exadata X3-2 Hardware
  • Exadata Database Machine X2-2 Qtr Rack
  • Exadata X3-2 Half Rack
  • Exadata X3-2 Full Rack
  • Exadata Database Machine X2-8
  • Exadata Database Machine X2-2 Full Rack
  • Exadata X3-8 Hardware
  • Exadata Database Machine X2-2 Half Rack
  • Exadata X3-2 Eighth Rack
  • Exadata Database Machine X2-2 Hardware
  • Exadata X3-2 Quarter Rack
  • Exadata Database Machine V2
Related Categories
  • PLA-Support>Sun Systems>x86>Engineered Systems HW>SN-x64: EXADATA




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-7036165931>

Applies to:

Exadata Database Machine X2-8 - Version All Versions and later
Exadata Database Machine V2 - Version All Versions and later
Exadata X3-2 Hardware - Version All Versions and later
Exadata Database Machine X2-2 Qtr Rack - Version All Versions and later
Exadata Database Machine X2-2 Half Rack - Version All Versions and later
Information in this document applies to any platform.

Symptoms

The cell server generates an alert for the server's ambient operating temperature:

Critical: Threshold Alert 7_1
2013-04-10T05:23:53-04:00

The critical threshold for the following metric has been crossed.

Metric Name           : CL_TEMP
Metric Description    : Temperature (Celsius) of the server, provided by the BMC
Object Name           : cell1
Current Value         : 32.4 C
Threshold Value       : 32.0 C

Name                  : cell1
Server Model          : Oracle Corporation SUN FIRE X4270 M2 SERVER SAS
Chassis Serial Number : S0MESERNUM
Release Version       : 11.2.3.2.1
Release Label         : OSS_11.2.3.2.1_LINUX.X64_130109


This will also be recorded in the cell's "alerthistory" log:

cell1: 7_1 2013-04-10T05:23:53-04:00 critical "The critical threshold for the following metric has been crossed.
Metric Name        : CL_TEMP  Metric Description : Temperature (Celsius) of the server, provided by the BMC
Object Name        : cell1  Current Value      : 32.4 C  Threshold Value    : 32.0 C  "
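When many cells report this alert, it can help to pull the numbers out of the alerthistory text automatically. The following is a minimal Python sketch, assuming the entry has been saved as plain text in the format shown above. The function name parse_temp_alert and the dictionary keys are illustrative only and are not part of any Oracle tool.

```python
import re

# Sample entry copied from the alerthistory output shown in this document.
ALERT_TEXT = '''cell1: 7_1 2013-04-10T05:23:53-04:00 critical "The critical threshold for the following metric has been crossed.
Metric Name        : CL_TEMP  Metric Description : Temperature (Celsius) of the server, provided by the BMC
Object Name        : cell1  Current Value      : 32.4 C  Threshold Value    : 32.0 C  "'''


def parse_temp_alert(text):
    """Return metric name, current value and threshold (Celsius), or None.

    Hypothetical helper: it only understands the field labels that appear
    in the sample entry above.
    """
    name = re.search(r"Metric Name\s*:\s*(\S+)", text)
    current = re.search(r"Current Value\s*:\s*([-\d.]+)\s*C", text)
    threshold = re.search(r"Threshold Value\s*:\s*([-\d.]+)\s*C", text)
    if not (name and current and threshold):
        return None
    return {
        "metric": name.group(1),
        "current_c": float(current.group(1)),
        "threshold_c": float(threshold.group(1)),
    }


alert = parse_temp_alert(ALERT_TEXT)
# Amount by which the reading exceeds the threshold, in degrees Celsius.
excess_c = round(alert["current_c"] - alert["threshold_c"], 1)
```

For the sample entry, the reading exceeds the threshold by only 0.4 C, which is typical of a marginal ambient condition rather than a sudden cooling failure.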

Cause

The cell servers contain an ambient temperature sensor that monitors the temperature of the air surrounding the system. When the air becomes too warm, the sensor reading may cross the predetermined threshold of 32.0 C and generate the critical alert.

Solution

The cell server monitors the values that ILOM reads and generates the alert when the reading is above 32.0 C. Although the wording of the alert indicates that the system may shut down, crossing this specific ambient threshold triggers no action other than the alert, either for the Exadata system as a whole or for the individual cell. The alert may not be as immediately critical as its severity suggests. It is intended to warn the data center administrator that the Exadata system is operating in an environment with a potential cooling problem that needs to be investigated. Cooling control for the system is based on internal sensors near the temperature-sensitive components, so immediate shutdown actions would only occur if those components reached their own thresholds, which should not happen until the system gets much warmer than 32 C.

On X3-2L and X4270M2 based Storage Cells (X3-2, X2-2, X3-8, X2-8), the ambient sensor is located on the left front ear, alongside the power button and LEDs. On X4275-based Storage Cells (V2), it is located on the fan board. Normally the sensor reports a value close to the actual ambient temperature of the room. However, in some scenarios the reading can vary slightly from the actual room ambient because of the sensor's location and the transfer of heat to the sensor. If any variation is observed, it should not be more than 3-5 C. Some cells may also be more susceptible to triggering this alert because of their position in the rack relative to the source of the cool air flow used to cool the system.
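The variation described above can be checked against an independent room measurement. The sketch below is an illustration only: the function names are hypothetical, and the 5.0 C limit is an assumption that takes the upper end of the 3-5 C variation stated in this document.

```python
# Assumed limit: upper end of the 3-5 C variation mentioned in this document.
MAX_EXPECTED_DEVIATION_C = 5.0


def sensor_deviation(sensor_c, reference_c):
    """Difference between the cell's ambient sensor and a reference reading."""
    return round(sensor_c - reference_c, 1)


def deviation_is_suspicious(sensor_c, reference_c,
                            limit_c=MAX_EXPECTED_DEVIATION_C):
    """True when the sensor differs from the reference by more than limit_c."""
    return abs(sensor_c - reference_c) > limit_c
```

For example, a cell reading of 32.4 C against an independently measured 29.0 C is a 3.4 C deviation, which falls inside the expected variation, whereas a reading of 38.0 C in a 24.0 C room does not.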

Regardless of any ILOM thresholds programmed on individual servers, Oracle Exadata Database Machine has an operating range of 5 to 32 degrees Celsius (41 to 89.6 degrees Fahrenheit), and this alert is based on that range. The threshold for this alert is not adjustable, and the alert cannot be disabled. Operating Oracle Exadata Database Machine for extended periods at or near the limits of the operating range could significantly reduce the lifetime of the hardware components and increase power utilization. An ambient temperature range of 21 to 23 degrees Celsius (70 to 74 degrees Fahrenheit) is optimal for server reliability and operator comfort. Most computer equipment can operate in a wide temperature range, but a temperature near 22 degrees Celsius (72 degrees Fahrenheit) is desirable because it is easier to maintain safe humidity levels. Operating in this temperature range should keep the ambient sensor reading well below the 32 C threshold, and should never generate such an alert unless there is an external facility failure in the cooling system.
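The ranges quoted above can be expressed as a small check. This is a minimal sketch, assuming readings are available in degrees Celsius. The constant and function names are illustrative, and treating exactly 32.0 C as still within range follows the statement that the alert is generated when the reading is above 32.0 C.

```python
# Ranges taken from this document, in degrees Celsius.
OPERATING_RANGE_C = (5.0, 32.0)
OPTIMAL_RANGE_C = (21.0, 23.0)


def c_to_f(temp_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def classify_ambient(temp_c):
    """Classify an ambient reading against the documented ranges."""
    low, high = OPERATING_RANGE_C
    if temp_c < low or temp_c > high:
        return "out of range"
    opt_low, opt_high = OPTIMAL_RANGE_C
    if opt_low <= temp_c <= opt_high:
        return "optimal"
    return "within range"
```

The conversion reproduces the operating limits exactly (5 C is 41 F and 32 C is 89.6 F). The optimal range converts to 69.8 to 73.4 F, which this document rounds to 70 to 74 F.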

There are three possible actions that can be taken if this alert is being reported:

1. Investigate and adjust the data center conditions so that the Oracle Exadata Database Machine is operating in its specified range. If investigation leads to identifying a specific one-time problem in the data center cooling conditions, this should be rectified by the data center management. 

If the alert is being generated by multiple cells in the Exadata rack, and occurs regularly, then the cause may be non-obvious and may require more rigorous investigation and experimentation. This may include, but is not limited to, increasing air flow into the rack, removing any potential blockage from foreign matter, reducing the possibility of hot-aisle/cold-aisle air mixing creating artificially warm conditions, reducing the normal room ambient on the cold-aisle side, increasing air extraction from the hot aisle, or relocating the rack to a different part of the data center.

Note: Due to the location of the sensor on X4270M2 based Storage Cells used on X2-2 and X2-8 systems, the temperature alert is disabled in OS Image 11.2.3.3.0 and later, to avoid false positives. Upgrading to this version may be the appropriate solution if data center conditions cannot be adjusted further. See Bug 16745871 for details.

2. Monitor the data center conditions independently, and use those readings to control the air conditioning and air flow. If the monitoring indicates that the Exadata system is still operating within an ambient range of 5 to 32 C, then the alert can be safely ignored. This may be the preferred option if only one cell is generating the occasional alert.

3. On X3-2 and X3-8 systems, a manufacturing assembly problem with the ambient sensor cable may generate falsely high readings that do not reflect the actual ambient temperature. If this is suspected, then an SR should be opened for Oracle Support to investigate and correct as needed.
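The three actions above can be summarized as a rough triage helper. This is only a sketch of one possible reading of the guidance, not an Oracle procedure: the function names, the argument list, the system labels used as set members, and the 5.0 C deviation limit are all assumptions made for illustration.

```python
# Image version from which the alert is disabled on X2-2/X2-8 cells,
# as stated in the note above.
ALERT_DISABLED_FROM = (11, 2, 3, 3, 0)
X4270M2_SYSTEMS = {"X2-2", "X2-8"}        # assumed labels for illustration
SENSOR_CABLE_SYSTEMS = {"X3-2", "X3-8"}   # assumed labels for illustration
MAX_EXPECTED_DEVIATION_C = 5.0            # assumed: upper end of 3-5 C


def parse_version(version):
    """Turn a dotted version such as '11.2.3.2.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def suggest_actions(system, image_version, sensor_c, room_c, cells_alerting):
    """Return the actions from this document that appear to apply.

    room_c is an independently measured ambient temperature and
    cells_alerting is the number of cells in the rack raising the alert.
    """
    actions = []
    room_in_range = 5.0 <= room_c <= 32.0
    if not room_in_range or cells_alerting > 1:
        actions.append("1: investigate data center cooling and air flow")
    if (system in X4270M2_SYSTEMS
            and parse_version(image_version) < ALERT_DISABLED_FROM):
        actions.append("1 (note): consider upgrading to image 11.2.3.3.0 or later")
    if room_in_range and cells_alerting == 1:
        actions.append("2: room is within range, the occasional alert can be ignored")
    if (system in SENSOR_CABLE_SYSTEMS
            and abs(sensor_c - room_c) > MAX_EXPECTED_DEVIATION_C):
        actions.append("3: open an SR for a suspected ambient sensor cable problem")
    return actions
```

As a usage example, suggest_actions("X2-2", "11.2.3.2.1", 32.4, 24.0, 1) returns the upgrade note and action 2, which matches the case of a single X2-2 cell on an image earlier than 11.2.3.3.0 raising an occasional alert in a room that is within range.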

References

<BUG:16745871> - DISABLE THE BUILT-IN CELL AMBIENT THRESHOLD

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.