Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1577237.1
Update Date:2016-01-07
Keywords:

Solution Type  Problem Resolution Sure

Solution  1577237.1 :   Running bdacheckcluster on Oracle Big Data Appliance raises: Hardware errors reported by ILOM : fault.chassis.env.temp.over-fail  


Related Items
  • Big Data Appliance X3-2 Full Rack
  • Big Data Appliance X3-2 In-Rack Expansion
  • Big Data Appliance Hardware
  • Big Data Appliance X3-2 Starter Rack
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Symptoms
Cause
Solution
References


Applies to:

Big Data Appliance X3-2 Starter Rack - Version All Versions and later
Big Data Appliance Hardware - Version All Versions and later
Big Data Appliance X3-2 Full Rack - Version All Versions and later
Big Data Appliance X3-2 In-Rack Expansion - Version All Versions and later
Linux x86-64

Symptoms

Running the bdacheckcluster utility to verify the health of the BDA cluster raises: "WARNING: Hardware errors reported by ILOM : fault.chassis.env.temp.over-fail" for one or more servers in the cluster:

# bdacheckcluster
  
...
nodebda12: WARNING: Hardware errors reported by ILOM : fault.chassis.env.temp.over-fail
nodebda12: INFO: Run 'ipmitool sunoem cli "show faulty"' to see the full error
nodebda12: WARNING: Big Data Appliance warnings during hardware validation checks
...
nodebda10: WARNING: Hardware errors reported by ILOM : fault.chassis.env.temp.over-fail
nodebda10: INFO: Run 'ipmitool sunoem cli "show faulty"' to see the full error
nodebda10: WARNING: Big Data Appliance warnings during hardware validation checks
...
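
On a large cluster it can help to reduce this output to the list of affected servers. The following is a minimal sketch, assuming the line format shown above (node name, colon, then the message); the function name faulted_nodes is hypothetical and is not part of the BDA tooling.

```shell
#!/bin/sh
# Print the unique node names whose bdacheckcluster lines report the
# over-temperature fault. Reads bdacheckcluster output on stdin.
# Assumes each line starts with "<node>: " as in the sample output.
faulted_nodes() {
    grep -F 'Hardware errors reported by ILOM' \
        | grep -F 'fault.chassis.env.temp.over-fail' \
        | cut -d: -f1 \
        | sort -u
}
```

For example, "bdacheckcluster | faulted_nodes" or "faulted_nodes < saved_output.txt" (the file name is illustrative).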


Running the bdacheckhw utility on the server(s) reporting the fault shows the same warning.  For example, running "bdacheckhw" on nodebda12:

# bdacheckhw
... 
WARNING: Hardware errors reported by ILOM : fault.chassis.env.temp.over-fail
INFO: Run 'ipmitool sunoem cli "show faulty"' to see the full error
WARNING: Big Data Appliance warnings during hardware validation checks



Further investigation with ipmitool on the node reporting the fault confirms the fault class:

# ipmitool sunoem cli "show faulty"
  
Connected. Use ^D to exit.
-> show faulty
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0     | fru                    | /SYS
/SP/faultmgmt/0/    | class                  | fault.chassis.env.temp.over-fail
 faults/0           |                        |
...
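
To pull only the fault class out of this table, the output can be filtered by the Property column. This is a sketch that assumes the three-column, pipe-separated layout shown above; fault_classes is a hypothetical helper name.

```shell
#!/bin/sh
# Print the Value column of every row whose Property column is "class".
# Reads the output of: ipmitool sunoem cli "show faulty"
fault_classes() {
    awk -F'|' '
        {
            prop = $2
            val = $3
            # Trim surrounding blanks (and a trailing carriage return, if any).
            gsub(/^[ \t]+|[ \t\r]+$/, "", prop)
            gsub(/^[ \t]+|[ \t\r]+$/, "", val)
            if (prop == "class") print val
        }'
}
```

For example: ipmitool sunoem cli "show faulty" | fault_classes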

  

 

Cause

This hardware issue can result from several conditions:
- The specified hardware is overheating
- There is a faulty sensor
- The firmware reports a false alert
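
One way to get a first indication of which condition applies is to look at the temperature sensors directly with "ipmitool sdr type Temperature". The sketch below assumes the usual five-column ipmitool layout (name, ID, status, entity, reading) and lists sensors whose status is neither "ok" nor "ns" (no reading). The helper name abnormal_temps is hypothetical, and an empty result does not by itself rule out a hardware problem.

```shell
#!/bin/sh
# List temperature sensors whose status column is not "ok" or "ns".
# Reads the output of: ipmitool sdr type Temperature
# Assumed columns: name | sensor id | status | entity | reading
abnormal_temps() {
    awk -F'|' '
        NF >= 5 {
            # Trim blanks around every field before comparing.
            for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t\r]+$/, "", $i)
            if ($3 != "ok" && $3 != "ns") print $1 ": " $3 " (" $5 ")"
        }'
}
```

For example: ipmitool sdr type Temperature | abnormal_temps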

Solution

Try clearing the fault: 1) log in to the ILOM of the server reporting "fault.chassis.env.temp.over-fail" and 2) clear the fault.  If the fault condition returns, raise an SR with Oracle Support.


1. Log in to the Oracle ILOM Web GUI as root.  For details see:

How to Use the Oracle Integrated Lights Out Manager (ILOM) on Oracle Big Data Appliance (Doc ID 1475201.1)


2. The Overview page shows a System Status:  Faulted

[Screenshot: View the Faulted Hardware]

3. Clear the Fault:

a) Select the 'Components' tab

b) Select the component whose "Fault Status" shows a fault

c) Choose 'Clear Faults' from the Actions Dropdown

[Screenshot: Select 'Clear Faults' from 'Components' tab]


d) Confirm that you want to clear the fault by selecting 'OK'


[Screenshot: Confirm]


4. Return to the 'Overview' tab to confirm that the fault is cleared

[Screenshot: Verify Fault is Cleared]
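
The result can also be checked from the server's command line by repeating the "show faulty" query used in the Symptoms section. This is a minimal sketch; the function name fault_still_present and the FAULT_CLASS variable are hypothetical.

```shell
#!/bin/sh
# Succeeds (exit 0) if the "show faulty" output on stdin still lists the
# over-temperature fault class; fails (non-zero) otherwise.
FAULT_CLASS='fault.chassis.env.temp.over-fail'

fault_still_present() {
    grep -qF "$FAULT_CLASS"
}

# Example use on the affected node:
#   if ipmitool sunoem cli "show faulty" | fault_still_present; then
#       echo "Fault still reported; raise an SR if it keeps returning."
#   else
#       echo "Fault no longer reported."
#   fi
```

Rerunning bdacheckhw on the node, or bdacheckcluster for the whole cluster, should then no longer show the warning.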


  Copyright © 2018 Oracle, Inc.  All rights reserved.