Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 1636570.1 : BDA Server Inaccessible - ILOM Power Cycle Fails with "Power to server is not available due to a malfunctioning component detected by CPLD"
Created from <SR 3-8649479168>

Applies to:
Big Data Appliance X3-2 Hardware - Version All Versions and later
Big Data Appliance X5-2 Starter Rack
Linux x86-64

Symptoms
An X3-2 server on an Oracle Big Data Appliance (BDA) Oracle NoSQL DB cluster goes down. Trying to bring up the server by following "How to Power Cycle Oracle Big Data Appliance Node using ILOM when the node is NOT reachable using Ping/SSH (Doc ID 1550440.1)" fails with the following error on the ILOM console:

    Description: Power to server is not available due to a malfunctioning component detected by CPLD

Other symptoms are:

1. bdacheckcluster reports an ILOM fault.chassis.domain.boot.power- hardware error:

    # bdacheckcluster
    ...
    WARNING: Hardware errors reported by ILOM : fault.chassis.domain.boot.power-
    INFO: Run 'ipmitool sunoem cli "show faulty"' to see the full error
    WARNING: Big Data Appliance warnings during hardware validation checks
    ...

All other components verify as "healthy".

2. /root/BDA_REBOOT_WARNINGS on the server that will not reboot shows the same: fault.chassis.domain.boot.power-

    # ls /root/BDA_REBOOT_WARNINGS
    WARNING: Hardware errors reported by ILOM : fault.chassis.domain.boot.power-
    INFO: Run 'ipmitool sunoem cli "show faulty"' to see the full error
    WARNING: Big Data Appliance warnings during hardware validation checks

3. On the server that cannot reboot, logging into the ILOM and running "show faulty" reports a fault.chassis.domain.boot.power- fault:

    -> show faulty
    Target              | Property               | Value
    --------------------+------------------------+---------------------------------
    /SP/faultmgmt/0     | fru                    | /SYS/MB
    /SP/faultmgmt/0/    | class                  | fault.chassis.domain.boot.power-
     faults/0           |                        | off-unexpected

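All three symptoms point at the same fault class, so they can be confirmed with a quick scripted check. The following is a minimal Python sketch, not an Oracle tool: the function names `ilom_hardware_errors` and `show_faulty_classes` are hypothetical, and the sample text is copied from the output shown above. It assumes the real output keeps the same line and column layout as these samples.

```python
import re

# Sample text copied from the symptoms above.
WARNINGS_SAMPLE = """\
WARNING: Hardware errors reported by ILOM : fault.chassis.domain.boot.power-
INFO: Run 'ipmitool sunoem cli "show faulty"' to see the full error
WARNING: Big Data Appliance warnings during hardware validation checks
"""

SHOW_FAULTY_SAMPLE = """\
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB
/SP/faultmgmt/0/    | class                  | fault.chassis.domain.boot.power-
 faults/0           |                        | off-unexpected
"""


def ilom_hardware_errors(text):
    """Return the fault strings from 'Hardware errors reported by ILOM' lines."""
    pattern = re.compile(r"Hardware errors reported by ILOM\s*:\s*(\S+)")
    return pattern.findall(text)


def show_faulty_classes(text):
    """Return fault classes from 'show faulty' output, rejoining wrapped values."""
    classes = []
    last_property = None
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) != 3 or cells[1] == "Property":
            continue  # skip the header, the separator and unrelated lines
        _target, prop, value = cells
        if prop == "class":
            classes.append(value)
        elif prop == "" and last_property == "class" and classes:
            classes[-1] += value  # continuation row of a wrapped class value
        if prop:
            last_property = prop
    return classes


if __name__ == "__main__":
    print(ilom_hardware_errors(WARNINGS_SAMPLE))
    print(show_faulty_classes(SHOW_FAULTY_SAMPLE))
```

The ILOM table wraps long values onto a second row with an empty Property cell, so the sketch appends that row to the preceding class value instead of treating it as a separate entry.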
Cause
The error message on the ILOM console:

    Description: Power to server is not available due to a malfunctioning component detected by CPLD

and other symptoms can be indicative of a hardware issue.

Solution
1. Clear the fault on the SP, and attempt a power cycle of the node to see if the issue repeats. Follow the steps in:

2. If clearing the alerts allows the node to reboot, then this is likely not a hardware issue and more likely a transient ILOM problem. In this case no other action is required. In the case of a hardware issue, the server likely would not restart and the fault would be raised again.

3. If the node will not restart after clearing the alerts, open an SR with Oracle Support for further investigation. It is likely in this case that a "Full" ILOM snapshot will be required. You can collect a "Full" ILOM snapshot by following "How to run an ILOM Snapshot on a Sun/Oracle X86 System (Doc ID 1448069.1)", replacing Data Set "NORMAL" with Data Set "FULL", and upload it to the SR.

To recap from 1448069.1, from the ILOM in a browser (<Node>-ilom):

Selecting "OK" to collect the "Full" snapshot might or might not cause the host to reboot. In the case of a reboot, when a non-critical server (node 5-18 on a full rack) is taken down, the cluster will not report healthy due to the missing DataNode and TaskTracker but will be fully functional. If rebooting Name Node servers, see 1573109.1 - "Steps to Reboot Oracle Big Data Appliance High Availability Name Nodes Simultaneously" for detailed steps.

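The branching in steps 1 to 3 can be summarized in a short Python sketch. The function names `next_action` and `is_non_critical_node` and their inputs are hypothetical; nothing here talks to the ILOM, it only encodes the decisions described above.

```python
def next_action(node_rebooted_after_clear):
    """Map the outcome of step 1 (clear fault, power cycle) to the follow-up."""
    if node_rebooted_after_clear:
        # Step 2: likely an ILOM problem rather than hardware.
        return "no further action required"
    # Step 3: likely hardware; the fault is expected to be raised again.
    return 'open an SR; a "Full" ILOM snapshot will likely be required'


def is_non_critical_node(node_number):
    """True for nodes 5-18, the non-critical range on a full rack per the text."""
    return 5 <= node_number <= 18


if __name__ == "__main__":
    print(next_action(False))
    print(is_non_critical_node(7))
```

The node range applies to a full rack only; other rack sizes are not covered by the text, so the sketch makes no claim about them.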
When investigating the "Full" ILOM snapshot, note that the symptoms below may be present:

a) fma/@usr@local@bin@fmdump_-ev.out reports one voltage rail error like:

    <timestamp> ereport.chassis.device.cpld.voltage-rail-error@/SYS/MB/P0

HWdiag - Version 5.21.74388 (Built Jun 19 2012 at 15:39:12)
CPLD Version - 2.3
CPU 0 - Present
CPU 1 - Present
Normal operation, Host powered on

Voltage Rail            Address:Value   Status   Condition
------------------------------------------------------------------------
1.5v Standby            (0x50):0x0b     ON       OK
1.8v Standby            (0x51):0x0b     ON       OK
1.26v Standby           (0x52):0x0b     ON       OK
5v Standby              (0x53):0x0b     ON       OK
4DBP 5v Power           (0x55):0x0b     ON       OK
SAS Expander Power      (0x56):0x0b     ON       OK
Rear IO Power           (0x57):0x0b     ON       OK
NICPWR_0_0 Standby      (0x58):0x0b     ON       OK
NICPWR_0_1 Standby      (0x59):0x0b     ON       OK
NICPWR_1_0 Standby      (0x5a):0x0b     ON       OK
NICPWR_1_1 Standby      (0x5b):0x0b     ON       OK
NICPWR_2_0 Standby      (0x5c):0x0b     ON       OK
NICPWR_2_1 Standby      (0x5d):0x0b     ON       OK
NICPWR_3_0 Standby      (0x5e):0x0b     ON       OK
NICPWR_3_1 Standby      (0x5f):0x0b     ON       OK
PSU0                    (0x80):0x8f     ON       OK
PSU1                    (0x81):0x8f     ON       OK
3.3v HOST               (0x83):0x0b     ON       OK
1.1v HOST               (0x84):0x0b     ON       OK
5.0v HOST               (0x85):0x0b     ON       OK
1.5v HOST               (0x86):0x0b     ON       OK
3.3v PCI                (0x87):0x0b     ON       OK
VDDIO_0 HOST            (0x88):0x0b     ON       OK
VDDIO_1 HOST            (0x89):0x0b     ON       OK
VDDIO_2 HOST            (0x8a):0x0b     ON       OK
VDDIO_3 HOST            (0x8b):0x0b     ON       OK
VTT_0 HOST              (0x8c):0x0b     ON       OK
VTT_1 HOST              (0x8d):0x0b     ON       OK
VTT_2 HOST              (0x8e):0x0b     ON       OK
VTT_3 HOST              (0x8f):0x0b     ON       OK
VSA_0 HOST              (0xb0):0x0b     ON       OK
VSA_1 HOST              (0xb1):0x0b     ON       OK
VCCPLL_0 HOST           (0xb2):0x0b     ON       OK
VCCPLL_1 HOST           (0xb3):0x0b     ON       OK
VCORE_0 HOST            (0xb4):0x0b     ON       OK
VCORE_1 HOST            (0xb5):0x0b     ON       OK
CPUVTT_0 HOST           (0xb6):0x0b     ON       OK
CPUVTT_1 HOST           (0xb7):0x0b     ON       OK

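When reviewing a snapshot, the fmdump report and the HWdiag voltage rail listing can be scanned with a short script. This is a minimal Python sketch under stated assumptions: the function names are hypothetical, the sample rows are copied from the output above, each rail is assumed to sit on its own line, and treating any Status other than "ON" or any Condition other than "OK" as unhealthy is an assumption, since the source shows healthy rows only.

```python
import re

# A few rows copied from the HWdiag output above; a real listing has more.
HWDIAG_SAMPLE = """\
Voltage Rail            Address:Value   Status   Condition
------------------------------------------------------------------------
1.5v Standby            (0x50):0x0b     ON       OK
SAS Expander Power      (0x56):0x0b     ON       OK
PSU0                    (0x80):0x8f     ON       OK
VCORE_1 HOST            (0xb5):0x0b     ON       OK
"""

FMDUMP_SAMPLE = "<timestamp> ereport.chassis.device.cpld.voltage-rail-error@/SYS/MB/P0\n"

# Row layout: rail name, (address):value, status, condition.
ROW = re.compile(
    r"^\s*(?P<rail>.+?)\s+\((?P<addr>0x[0-9a-fA-F]+)\):(?P<value>0x[0-9a-fA-F]+)"
    r"\s+(?P<status>\S+)\s+(?P<condition>\S+)\s*$"
)
EREPORT = re.compile(r"ereport\.chassis\.device\.cpld\.voltage-rail-error@(\S+)")


def parse_rails(text):
    """Return one dict per voltage rail row; header and separator are skipped."""
    rails = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rails.append(match.groupdict())
    return rails


def unhealthy_rails(rails):
    """Assumption: a healthy rail reports Status ON and Condition OK."""
    return [r["rail"] for r in rails if r["status"] != "ON" or r["condition"] != "OK"]


def voltage_rail_ereports(text):
    """Return the component paths named in voltage-rail-error ereports."""
    return EREPORT.findall(text)


if __name__ == "__main__":
    print(unhealthy_rails(parse_rails(HWDIAG_SAMPLE)))
    print(voltage_rail_ereports(FMDUMP_SAMPLE))
```

The rail name is matched lazily up to the parenthesized address because names such as "SAS Expander Power" contain spaces, so splitting on whitespace alone would misalign the columns.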
Attachments
This solution has no attachment