Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1621318.1
Update Date:2016-06-01
Keywords:

Solution Type  Problem Resolution Sure

Solution  1621318.1 :   Oracle Big Data Appliance Server Unreachable due to Crashing/Powering Down - ILOM Snapshot Reports: fault.chassis.security.enclosure  


Related Items
  • Big Data Appliance X3-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-8434381840>

Applies to:

Big Data Appliance X3-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms

One or more Oracle Big Data Appliance servers repeatedly crash and power down until:

1. The server can no longer be reached via ssh.

2. From the ILOM, "start /SYS" does not bring up the node:

-> start /SYS


3. From the ILOM, "show faulty" reports fault.chassis.security.enclosure:

-> show faulty
  
Target              | Property               | Value
--------------------+------------------------+---------------------------------
...
/SP/faultmgmt/0/    | class                  | fault.chassis.security.enclosure
...
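
When the "show faulty" output has been saved to a file, the fault class can be picked out programmatically. The following is a minimal sketch, not an Oracle-provided tool: the function names and the SAMPLE text are illustrative assumptions, and the parsing relies only on the pipe-separated layout shown above. A prefix match is used because the table above shows the class without the ".open" suffix that appears in the other snapshot files.

```python
# Hypothetical helper: parse saved "show faulty" output (pipe-separated table).
SAMPLE = """\
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0/    | class                  | fault.chassis.security.enclosure
"""

CHASSIS_FAULT_PREFIX = "fault.chassis.security.enclosure"


def parse_show_faulty(text):
    """Return (target, property, value) tuples for each data row."""
    rows = []
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Rule lines and "..." have no pipes; the header row starts with "Target".
        if len(parts) != 3 or parts[0] == "Target":
            continue
        rows.append(tuple(parts))
    return rows


def has_chassis_intrusion_fault(text):
    """True if any 'class' row starts with the chassis enclosure fault prefix."""
    return any(
        prop == "class" and value.startswith(CHASSIS_FAULT_PREFIX)
        for _target, prop, value in parse_show_faulty(text)
    )


if __name__ == "__main__":
    print(has_chassis_intrusion_fault(SAMPLE))
```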

Note: At this point, applications, Hadoop, and other software may no longer be running on the server.

Many occurrences of "fault.chassis.security.enclosure" are observed in the ILOM Snapshot.  In this state, collect the ILOM Snapshot for the server from the ILOM by following the steps in: How to run an ILOM Snapshot on a Sun/Oracle X86 System (Doc ID 1448069.1)


1. From "ilom/@usr@local@bin@spshexec_show_-script_@X@logs@event@list.out", the output looks like:

597    Fri Jan 24 08:30:31 2014  Power     Log       critical
       Host power-on denied because of chassis intrusion.
596    Thu Jan 23 16:35:28 2014  Power     Log       critical
       Host power-on denied because of chassis intrusion.
...
592    Thu Jan 23 16:27:50 2014  Power     Log       critical
       Host power-on denied because of chassis intrusion.
591    Thu Jan 23 16:19:29 2014  Power     Log       critical
       Host power-on denied because of chassis intrusion.
590    Thu Jan 23 14:50:23 2014  Fault     Fault     critical
       Fault detected at time = Thu Jan 23 14:50:23 2014. The suspect component:
        /SYS has fault.chassis.security.enclosure.open with probability=100. Ref
       er to http://www.sun.com/msg/SPX86-8003-8C for details.
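
To see how often and when power-on was refused, the event list can be scanned for the "chassis intrusion" message. The sketch below is an assumption-laden illustration rather than a supported utility: the regular expression is inferred solely from the excerpt above (ID, timestamp, class, type, severity on one line, message on the following indented lines), and all names are hypothetical.

```python
import re
from datetime import datetime

# Header line layout inferred from the excerpt: ID, timestamp, class, type, severity.
HEADER = re.compile(
    r"^(?P<id>\d+)\s+(?P<ts>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s+"
    r"(?P<cls>\S+)\s+(?P<type>\S+)\s+(?P<sev>\S+)\s*$"
)

SAMPLE = """\
597    Fri Jan 24 08:30:31 2014  Power     Log       critical
       Host power-on denied because of chassis intrusion.
596    Thu Jan 23 16:35:28 2014  Power     Log       critical
       Host power-on denied because of chassis intrusion.
"""


def parse_events(text):
    """Return one dict per event; indented lines are joined into 'message'."""
    events = []
    for line in text.splitlines():
        m = HEADER.match(line)
        if m:
            events.append({
                "id": int(m["id"]),
                "time": datetime.strptime(m["ts"], "%a %b %d %H:%M:%S %Y"),
                "class": m["cls"],
                "severity": m["sev"],
                "message": "",
            })
        elif events and line[:1].isspace() and line.strip():
            # Continuation of the previous event's message text.
            events[-1]["message"] = (events[-1]["message"] + " " + line.strip()).strip()
    return events


def power_on_denials(text):
    """Events whose message reports a power-on denied for chassis intrusion."""
    return [e for e in parse_events(text)
            if "power-on denied because of chassis intrusion" in e["message"]]


if __name__ == "__main__":
    for event in power_on_denials(SAMPLE):
        print(event["id"], event["time"].isoformat())
```

Note that wrapped message lines are joined with a space, so a word that the log splits across two lines will come back with a space inside it.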


2. From "ilom/@usr@local@bin@spshexec_show_faulty.out":

/SP/faultmgmt/0/    | class                  | fault.chassis.security.enclosure


3. From "ilom/@usr@local@bin@spshexec_show_-d_properties_-level_all_@.out":       

/SP/faultmgmt/0/faults/0
    Properties:
        class = fault.chassis.security.enclosure.open
        sunw-msg-id = SPX86-8003-8C



Cause

In the ILOM Snapshot output at "fma/@usr@local@bin@fmadm_faulty.out", the problem appears to be caused by an open chassis:

$ more fma/@usr@local@bin@fmadm_faulty.out
------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2014-01-23/14:50:23 <id>                                 SPX86-8003-8C  Critical

Fault class : fault.chassis.security.enclosure.open

ASRU : /SYS
  faulted

FRU : /SYS
  (Part Number: <part number>)
  (Serial Number: <serial number>) 100%
  faulty

Description : The top cover of server was opened while AC input was still
  applied to the power supplies.

Response : The service-required LED on the chassis will be illuminated.

Impact : The server will be powered down immediately.

Action : Please refer to the associated reference document at
  http://www.sun.com/msg/SPX86-8003-8C for the latest service
  procedures and policies regarding this diagnosis.
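
The labelled fields in this output ("Fault class", "Description", "Impact", and so on) can be collected into a dictionary for reporting. This is a rough sketch under the assumption that every field starts at column 0 as "Label : value" and that wrapped text is indented, as in the excerpt above; the function name and SAMPLE text are illustrative only.

```python
SAMPLE = """\
Fault class : fault.chassis.security.enclosure.open

ASRU : /SYS
  faulted

Description : The top cover of server was opened while AC input was still
  applied to the power supplies.

Impact : The server will be powered down immediately.
"""


def parse_fmadm_fields(text):
    """Map each 'Label : value' field to its value, joining indented continuation lines."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        label, sep, value = line.partition(" : ")
        if sep and not line[0].isspace():
            # A new field starts at column 0.
            current = label.strip()
            fields[current] = value.strip()
        elif current and line[0].isspace():
            # Indented line: continuation of the current field.
            fields[current] += " " + line.strip()
    return fields


if __name__ == "__main__":
    info = parse_fmadm_fields(SAMPLE)
    print(info["Fault class"])
    print(info["Description"])
```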

  

The fault "fault.chassis.security.enclosure.open" indicates that the node refuses to power on because the chassis is open.  As a built-in security measure, the system will not power on with an open chassis.

Verify in the lab whether the chassis of the server that cannot power on is open.  If the chassis appears to be closed, the reported fault may be due to one of the following:

  • a faulty sensor or switch
  • an issue with the enclosure, e.g. it is not completely closed or not tightly shut
  • a cable connection
  • a magnet
  • ...

  

 

Solution

File an SR in My Oracle Support with the BDA team to have an onsite Field Engineer check for any of the above causes, confirm that nothing is loose or faulty, and replace parts as needed.

References

<NOTE:1448069.1> - How to run an ILOM Snapshot on a Sun/Oracle X86 System

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.