Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type Technical Instruction Sure

Solution 1618516.1 : How to investigate the Auto Service Request "ASR: Thermal over-temperature warning" on External I/O Expansion Unit
Applies to:

Sun SPARC Enterprise M8000 Server - Version All Versions and later
Sun SPARC Enterprise M5000 Server - Version All Versions and later
Sun SPARC Enterprise M9000-32 Server - Version All Versions and later
Sun SPARC Enterprise M9000-64 Server - Version All Versions and later
Sun External I/O Expansion Unit - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Goal

This article describes the steps a System Administrator should take to determine what action is required when an External I/O Expansion Unit encounters an over-temperature condition.

Solution

Auto Service Request (ASR) provides automatic failure detection and SR creation for Oracle SPARC systems. See https://www.oracle.com/asr for more information on ASR.

This particular ASR event has been created in Auto-Close mode, i.e. unless you update the SR, it will automatically close within two weeks.

Description of the ASR Event:

Thermal events on the I/O Expansion Box must be investigated physically at the platform for visible signs of blocked airflow or other environmental issues. Additional checks are needed to understand the cause of this ASR event.

If a persistent failure has occurred, the Service Request needs to be assigned to a Support Engineer for further investigation. Alternatively, if the thermal event or events cannot be explained by changes to airflow around the platform or by work being carried out on the machine, please update the SR with your findings and it will be assigned to a Support Engineer.

If the event was caused by changes in the site environment or a similar event, no action is needed and the SR will automatically close within two weeks.

Please find an example ASR alarm at the bottom of this document.

1) Ensure the current ambient temperature of your system is within the recommended range. Run the ioxadm command, as shown below, to get a full report of the current temperature values.
If the temperature values are back in range, monitor the platform for any further temperature issues. If this warning appears to be an isolated event, the issue can be cleared.

Example:

    XSCF> ioxadm -v env
    Location      Sensor     Min       Min Alarm  Value    Max Alarm  Max      Units
    IOX@X031      ACTIVE     -         -          On       -          -        LED
    IOX@X031      LOCATE     -         -          Off      -          -        LED
    IOX@X031      OVERTEMP   -         -          Off      -          -        LED
    IOX@X031      SERVICE    -         -          Off      -          -        LED
    IOX@X031/PS0  DCOK       -         -          On       -          -        LED
    IOX@X031/PS0  POWER      -         -          On       -          -        LED
    IOX@X031/PS0  RDY2RM     -         -          Off      -          -        LED
    IOX@X031/PS0  SERVICE    -         -          Off      -          -        LED
    IOX@X031/PS0  T_AMBIENT  -128.000  -          26.000   37.000     127.000  C
    IOX@X031/PS0  T_CHIP     -128.000  -          27.000   37.000     127.000  C
    IOX@X031/PS0  T_HOTSPOT  -128.000  -          29.000   90.000     127.000  C
    IOX@X031/PS0  SWITCH     -         -          On       -          -        SWITCH
    ...

2) Physical inspection of the platform and the surrounding area may be needed to determine whether this temperature warning is isolated to the platform or affects the area around it. If the area around the platform does not seem unusually warm, there may be debris or some other obstruction blocking the airflow. Exhaust from neighboring platforms may also directly raise the air temperature around the platform reporting the warning. If other nearby platforms are also experiencing higher than expected temperatures, the data center's airflow needs to be investigated to bring the surrounding temperature under control.

3) Ensure there are no existing Fan faults.

Step 1: Collect the fault message.

Example:

    XSCF> fmdump -m
    MSG-ID: IOXSCF-8000-NH, TYPE: Fault, VER: 1, SEVERITY: Major
    EVENT-TIME: Tue Mar 27 05:59:59 PDT 2007
    PLATFORM: SPARC-Enterprise, CSN: BE80601000, HOSTNAME: server-0
    SOURCE: sde, REV: 1.12
    EVENT-ID: e37f42ad-946d-4e52-8952-3eb3e4c7da21
    DESC: A thermal sensor is above the high warning threshold in an External I/O Expansion Unit FRU
      Refer to http://www.sun.com/msg/IOXSCF-8000-NH for more information.
    AUTO-RESPONSE: Domains using the affected hardware may be shut down.
    IMPACT: Interruption of service to the attached domains if the domain is shut down.
    REC-ACTION: Check ambient temperatures in the environment.

Step 2: Collect the "fmdump" output.

Example:

    XSCF> fmdump -vu e37f42ad-946d-4e52-8952-3eb3e4c7da21
    TIME                 UUID                                 MSG-ID
    Mar 27 05:59:59.1975 e37f42ad-946d-4e52-8952-3eb3e4c7da21 IOXSCF-8000-NH
      100%  fault.chassis.iox.env.temp.over-warn

            Problem in: hc:///iox=983392/ps=0/thermctrl=0/t_ambient=0
               Affects: -
                   FRU: hc://:product-id=SPARC-Enterprise:chassisid=BE80601000:server-id=server-0:serial=T00560:part=3001701:revision=02/component=IOX@X031/PS0

Step 3: If a failure is not persistent, no further action is required. The SR will automatically close within two weeks. If the failure has been verified as persistent or is a cause of concern, please update the SR with your findings and it will be assigned to a Support Engineer.

References

<NOTE:1021477.1> - IOXSCF-8000-NH - Thermal over-temperature warning

Attachments

This solution has no attachment
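
As an illustration of the temperature check in step 1, the following Python sketch scans saved `ioxadm -v env` output and lists any temperature sensor whose Value has reached its Max Alarm threshold. It is not an Oracle tool. The names `Reading`, `parse_temperatures`, `over_threshold` and `SAMPLE` are hypothetical, the eight-column layout is taken from the example output in this article, and treating "at or above Max Alarm" as the trigger is an assumption, since the article does not state the exact comparison the firmware uses.

```python
from typing import List, NamedTuple


class Reading(NamedTuple):
    location: str
    sensor: str
    value: float
    max_alarm: float


# Temperature rows copied from the example output in this article.
SAMPLE = """\
XSCF> ioxadm -v env
Location Sensor Min Min Alarm Value Max Alarm Max Units
IOX@X031 OVERTEMP - - Off - - LED
IOX@X031/PS0 T_AMBIENT -128.000 - 26.000 37.000 127.000 C
IOX@X031/PS0 T_CHIP -128.000 - 27.000 37.000 127.000 C
IOX@X031/PS0 T_HOTSPOT -128.000 - 29.000 90.000 127.000 C
"""


def parse_temperatures(report: str) -> List[Reading]:
    """Extract the temperature rows (Units == "C") from the report text."""
    readings = []
    for line in report.splitlines():
        fields = line.split()
        # Data rows have 8 columns: Location, Sensor, Min, Min Alarm,
        # Value, Max Alarm, Max, Units. The header and prompt lines
        # split into a different number of fields and are skipped.
        if len(fields) != 8 or fields[7] != "C":
            continue
        try:
            value = float(fields[4])
            max_alarm = float(fields[5])
        except ValueError:
            continue  # "-" means no value or no threshold is reported
        readings.append(Reading(fields[0], fields[1], value, max_alarm))
    return readings


def over_threshold(readings: List[Reading]) -> List[Reading]:
    """Return readings at or above Max Alarm (assumed comparison)."""
    return [r for r in readings if r.value >= r.max_alarm]


if __name__ == "__main__":
    for r in over_threshold(parse_temperatures(SAMPLE)):
        print(f"{r.location} {r.sensor}: {r.value} C (alarm at {r.max_alarm} C)")
```

With the sample values from this article, every sensor is below its Max Alarm threshold, so nothing is printed. A result like this only reflects the moment the output was captured; the physical inspection in step 2 and the fault checks in step 3 still apply.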