Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-1455473.1
Update Date:2015-04-28

Solution Type  Troubleshooting Sure

Solution  1455473.1 :   How to Reboot/Power Cycle ILOM When it's Hung  


Related Items
  • Exadata X3-8 Hardware
  • Oracle Exadata Hardware
  • Oracle Exalogic Elastic Cloud X2-2 Half Rack
  • Exadata X3-8b Hardware
  • Exalytics In-Memory Machine X2-4
  • Exalytics In-Memory Machine X3-4
  • Exalogic Elastic Cloud X3-2 Hardware
  • Exadata X4-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST
  • _Old GCS Categories>ST>Server>Engineered Systems>Exadata>Administration and Configuration




Applies to:

Exalytics In-Memory Machine X3-4 - Version All Versions and later
Oracle Exadata Hardware - Version 11.2.0.1 to 11.2.0.3 [Release 11.2]
Exalytics In-Memory Machine X2-4
Exalogic Elastic Cloud X3-2 Hardware
Oracle Exalogic Elastic Cloud X2-2 Half Rack
Information in this document applies to any platform.

Purpose

This document presents the options available when the ILOM is hung or not responding.

Troubleshooting Steps

Symptoms

  • ipmitool commands hang
  • Example messages reported in the ms-odl.trc file:
[2012-05-03T06:00:16.854-07:00] [ossmgmt] [NOTIFICATION] [CELL-05507] [ms.core.MSCoreImpl] [tid: 15] [ecid: 10.215.44.48:66829:1336006280766:5,0] MS-CELLSRV syncDiskOnce completed.
[2012-05-03T07:00:11.499-07:00] [ossmgmt] [ERROR] [] [ms.core.MSCellMetricTimerTask] [tid: 15] [ecid: 10.215.44.48:66829:1336006280766:5,0] oracle.ossmgmt.common.core.SageException: CELL-00506: Call updateHealth first.[[
    at oracle.ossmgmt.common.hwadapter.HardwareImpl.getSensorOp(HardwareImpl.java:763)
    at oracle.ossmgmt.common.hwadapter.SN1HardwareImpl.getTempValue(SN1HardwareImpl.java:140)
    at oracle.ossmgmt.common.hwadapter.HardwareImpl.getTempReading(HardwareImpl.java:247)
    at oracle.ossmgmt.ms.hwadapter.MSHardwareImpl.getTempReading(MSHardwareImpl.java:582)
    at oracle.ossmgmt.ms.core.MSCellMetricDef.collect(MSCellMetricDef.java:425)
    at oracle.ossmgmt.ms.core.MSCellMetricTimerTask.run(MSCellMetricTimerTask.java:107)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)

]]
[2012-05-03T08:00:13.062-07:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSHwPollTimerTask] [tid: 15] [ecid: 10.215.44.48:66829:1336006280766:5,0] An exception occurred during ILOM mining[[
oracle.ossmgmt.common.core.SageException: CELL-04001: The command "ipmitool sunoem cli start /SP/faultmgmt/shell y fmadm faulty" returned an exit status: 1.
    at oracle.ossmgmt.ms.hwadapter.bmcadp.MSIpmiIlomBmcAdapterImpl.returnCmd(MSIpmiIlomBmcAdapterImpl.java:1620)
    at oracle.ossmgmt.ms.hwadapter.bmcadp.MSIpmiIlomBmcAdapterImpl.mineForILOMFaults(MSIpmiIlomBmcAdapterImpl.java:699)
    at oracle.ossmgmt.ms.hwadapter.bmcadp.MSIpmiIlomBmcAdapterImpl.mineForMissedBMCTraps(MSIpmiIlomBmcAdapterImpl.java:680)
    at oracle.ossmgmt.ms.hwadapter.MSHardwareImpl.mineForMissedAlerts(MSHardwareImpl.java:727)
    at oracle.ossmgmt.ms.core.MSHwPollTimerTask.run(MSHwPollTimerTask.java:195)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)

]]
[2012-05-03T10:46:30.583-07:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSOSSComm] [tid: 16] [ecid: 10.215.44.48:66829:1336067190548:8,0] Got oss_open err 12. Retry once.
[2012-05-03T10:46:30.585-07:00] [ossmgmt] [NOTIFICATION] [] [ms.core.MSOSSComm] [tid: 16] [ecid: 10.215.44.48:66829:1336067190548:8,0] OSS Version compatible. MS-OSS Communication successful.
[2012-05-03T10:46:30.586-07:00] [ossmgmt] [NOTIFICATION] [CELL-05506] [ms.core.MSCoreImpl] [tid: 16] [ecid: 10.215.44.48:66829:1336067190548:8,0] Sync disk lists from CELLSRV.
[2012-05-03T11:00:00.582-07:00] [ossmgmt] [WARNING] [] [ms.core.MSHwPollTimerTask] [tid: 15] [ecid: 10.215.44.48:66829:1336006280766:5,0] Error updating health from BMC.[[
oracle.ossmgmt.common.core.SageException: CELL-04001: The command "ipmitool sunoem led get LOCATE" returned an exit status: 1.
    at oracle.ossmgmt.ms.hwadapter.bmcadp.MSIpmiIlomBmcAdapterImpl.returnCmd(MSIpmiIlomBmcAdapterImpl.java:1620)
    at oracle.ossmgmt.ms.hwadapter.bmcadp.MSIpmiIlomBmcAdapterImpl.getChassisIDLEDState(MSIpmiIlomBmcAdapterImpl.java:1464)
    at oracle.ossmgmt.common.hwadapter.HardwareImpl.getChassisIDLEDState(HardwareImpl.java:415)
    at oracle.ossmgmt.ms.hwadapter.MSHardwareImpl.getChassisIDLEDState(MSHardwareImpl.java:647)
    at oracle.ossmgmt.ms.core.MSHwPollTimerTask.run(MSHwPollTimerTask.java:274)
    at java.util.TimerThread.mainLoop(Timer.java:512)
    at java.util.TimerThread.run(Timer.java:462)
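
The symptoms above can be checked from the host before any reset is attempted. The following is a minimal sketch and is not part of the MS software. It assumes GNU coreutils `timeout` is installed. The function names `ilom_probe` and `count_ipmitool_failures`, the 30-second limit, and the default probe command `ipmitool mc info` are illustrative choices, not values taken from this note.

```shell
# Sketch: classify the ILOM as responsive, failing, or hung.
# Assumption: GNU coreutils `timeout` is available on the host.

# ilom_probe [seconds] [command...]
# Returns 0 if the probe succeeds, 1 if it fails, 2 if it hangs past the limit.
ilom_probe() {
    local limit probe_rc
    limit="${1:-30}"
    if [ "$#" -gt 0 ]; then shift; fi
    # Default probe (an assumption): a read-only management controller query.
    if [ "$#" -eq 0 ]; then set -- ipmitool mc info; fi

    probe_rc=0
    timeout "$limit" "$@" >/dev/null 2>&1 || probe_rc=$?
    case "$probe_rc" in
        0)   return 0 ;;   # probe answered
        124) return 2 ;;   # 124 is timeout's "command timed out" status
        *)   return 1 ;;   # probe returned an error without hanging
    esac
}

# count_ipmitool_failures <trace-file>
# Prints how many CELL-04001 errors in the trace file name an ipmitool command.
count_ipmitool_failures() {
    grep -c 'CELL-04001: The command "ipmitool' "$1"
}
```

For example, a return status of 2 from `ilom_probe 30` suggests that ipmitool is hanging rather than failing outright, and `count_ipmitool_failures` can be pointed at the ms-odl.trc file, whose location depends on the installation.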

 

If MS suspects that the ILOM is hung or not responding, it tries the following commands, in order, to reset the ILOM:

  • First, it tries a cold reset:

# ipmitool mc reset cold

  • If the cold reset fails, it tries a reset through a different path/interface:

# ipmitool sunoem cli 'reset -script /SP'

  • If neither method brings the ILOM back to normal, MS sends a cell alert/ASR message like the one below, asking for the ILOM to be reset manually.

 

Info: "ILOM has stopped responding, and did not reset after issuing reset commands"
Action:
"Manual intervention is necessary to power cycle the ILOM. Use SSH to connect to the ILOM from this cell or another machine.
At the ILOM prompt, enter 'reset /SP'. 
If unable to connect using SSH, then try resetting ilomserver by login to ILOM/Remote console (Go to tab Maintenance -> ResetSP -> and click on 'ResetSP' button).
If that also doesn't help, then unplug the ILOM power supply. This action power cycles the server as well as the ILOM."
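
The escalation order described above (cold reset, then the sunoem CLI path, then manual intervention) can be sketched as a small shell function. This is an illustration of the sequence only, not the MS implementation. The function name `reset_ilom` and the `IPMITOOL` override variable are hypothetical. A zero exit status here means only that ipmitool accepted the command, not that the ILOM has recovered. Because ipmitool itself may hang when the ILOM is hung, a real script would probably also need a time limit around each call.

```shell
# Sketch of the escalation order: cold reset first, then the sunoem CLI
# path, then a request for manual intervention.
# Assumption: IPMITOOL is a hypothetical override that makes the function
# easy to dry-run; it defaults to the real ipmitool binary.
reset_ilom() {
    local ipmi="${IPMITOOL:-ipmitool}"

    # Step 1: cold reset of the management controller.
    if "$ipmi" mc reset cold; then
        echo "ILOM reset requested with: mc reset cold"
        return 0
    fi

    # Step 2: the same goal through the ILOM CLI, a different path/interface.
    if "$ipmi" sunoem cli 'reset -script /SP'; then
        echo "ILOM reset requested with: sunoem cli 'reset -script /SP'"
        return 0
    fi

    # Step 3: neither command was accepted, so a person has to reset the SP.
    echo "ILOM did not accept a reset; connect to the ILOM with SSH and enter 'reset /SP'" >&2
    return 1
}
```

Run as root on the affected server, `reset_ilom` stops at the first command that is accepted and returns 1 only when both commands fail.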
 
  • At this stage, the final option is a hardware reset of the SP:

    The SP has a watchdog, so a reset should be forced automatically if the SP hangs during normal operation.

    If the SP is still hung, there is a physical button that resets the SP.

V2 & X2-2 (both DB nodes and storage cells)

There are three pinholes between the NET0 and NET MGT ports. The one marked with a red circle in the figure resets the SP. The other two are NMI and Host Reset.

[Figure: SunFireX4170_M2]

X3-2 (DB node): the SP reset button is under the NET MGT port.

[Figure: SunFireX4170_M3]

X3-2L (cell node): the SP reset button is under the NET MGT port and the USB ports, in the middle.

[Figure: SunFireX4270_M3]

References

http://docs.oracle.com/cd/E22368_01/pdf/E22359.pdf
http://docs.oracle.com/cd/E19762-01/E27205/z4000c5e1045174.html

  Copyright © 2018 Oracle, Inc.  All rights reserved.