Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1569754.1
Update Date:2016-10-11
Keywords:

Solution Type  Problem Resolution Sure

Solution  1569754.1 :   Oracle Big Data Appliance Node NOT reachable due to both OS Disks in Unconfigured Firmware State  


Related Items
  • Big Data Appliance Hardware
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-7462452741>

Applies to:

Big Data Appliance Hardware - Version All Versions and later
Linux x86-64

Symptoms

Cloudera Manager (CM) shows that one of the Oracle Big Data Appliance (BDA) nodes is in a "BAD" state. The node cannot be reached by ssh or ping from any working BDA node or from any client machine, on any of its three IP addresses (public, private, and admin).

Further investigation of the "BAD" node from the Remote Console in the ILOM GUI shows that the /dev/sda disk is bad and is being restored from the mirror. The Remote Console also shows that the node rebooted automatically.

Once the node is rebooted:

1. Logging in as root (with the user-set password) or as an existing OS user does not work.

2. Logging in to the "bad" node through the ILOM Remote Console as root/welcome1 does work, however.

Cause

The system is being booted from a rescue partition. The commands below are executed in the ILOM Remote Console.
 
a) The 'ls /' command shows that the boot is from a USB disk ('THIS_IS_THE_USB_DISK').

Boot from USB
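
The same check can be scripted. The sketch below assumes that the name shown by 'ls /' is an entry directly under the root directory; the function name booted_from_usb and its optional root argument are hypothetical and exist only for illustration.

```shell
# Hypothetical check: succeed if the marker seen in the 'ls /' output
# exists under the given root directory (default "/").
booted_from_usb() {
  root=${1:-/}
  [ -e "${root%/}/THIS_IS_THE_USB_DISK" ]
}

if booted_from_usb; then
  echo "rescue (USB) boot"
else
  echo "normal boot"
fi
```

On a node in the state described here, the first branch would be expected; on a normally booted node, the second.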


b) Both OS disks are in a "bad" state, i.e. each has a Foreign State of Foreign and a Firmware state of Unconfigured.

> MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Slot Number: 0
Firmware state: Unconfigured(bad)
Foreign State: Foreign
Slot Number: 1
Firmware state: Unconfigured(good) , Spun Up
Foreign State: Foreign
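
To make the three-line-per-disk output easier to scan, it can be condensed to one line per slot. This is a minimal sketch, not part of the original procedure: summarize_pdlist is a hypothetical helper, and it assumes the Slot Number, Firmware state, and Foreign State lines appear in that order for each disk, as they do in the output above.

```shell
# Hypothetical helper: condense pdlist output to one line per slot.
# Reads the text of "MegaCli64 pdlist a0" on stdin.
summarize_pdlist() {
  awk -F': *' '
    /^Slot Number/    { slot = $2 }
    /^Firmware state/ { fw = $2 }
    /^Foreign State/  { printf "slot=%s firmware=\"%s\" foreign=%s\n", slot, fw, $2 }
  '
}

# Sample input copied from the output above. On a real node, run:
#   MegaCli64 pdlist a0 | summarize_pdlist
summarize_pdlist <<'EOF'
Slot Number: 0
Firmware state: Unconfigured(bad)
Foreign State: Foreign
Slot Number: 1
Firmware state: Unconfigured(good) , Spun Up
Foreign State: Foreign
EOF
```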

Solution

1) Determine the enclosure ID.

> MegaCli64 pdlist a0 | grep "^Encl" 


For example, assuming the enclosure id is 20, the output looks like:

Enclosure Device ID: 20
Enclosure position: 0
...
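
Because the enclosure ID is reused in every later command, it can be convenient to capture it in a variable. The sketch below is illustrative only: enclosure_id and ENCL are hypothetical names, and the sample input is the example output above rather than output from a real controller.

```shell
# Hypothetical helper: print the first "Enclosure Device ID" value
# found in pdlist output supplied on stdin.
enclosure_id() {
  awk -F': *' '/^Enclosure Device ID/ { print $2; exit }'
}

# Sample input matching the example above. On a real node:
#   ENCL=$(MegaCli64 pdlist a0 | enclosure_id)
ENCL=$(printf 'Enclosure Device ID: 20\nEnclosure position: 0\n' | enclosure_id)

# Show how the value slots into the physdrv[enclosure:slot] notation.
echo "physdrv[${ENCL}:0]"
```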

 

Follow the steps below to configure the disks in order, from the lowest to the highest slot number.

2) For Disk 0, follow the steps below to configure the disk.

a) Because Disk 0 has a Firmware state of Unconfigured(bad), execute the command below to change the state to good.

> MegaCli64 pdmakegood physdrv[20:0] a0

After this command is run, the Firmware state becomes Unconfigured(good).

If the command fails, the disk in slot 0 needs to be replaced. First, however, configure the disk in slot 1 (follow step b to clear the Foreign state, then jump to step 3), reboot, and see whether the system comes back up.

b) Clear the Foreign State of Foreign.

This command clears the Foreign state of all disks that are Unconfigured(good), so in this case the Foreign State is cleared for both disk 0 and disk 1.

> MegaCli64 CfgForeign clear a0

c) Check the physical drive information and confirm that the disks have a Firmware state of Unconfigured(good) and a Foreign State of None.

> MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Slot Number: 0
Firmware state: Unconfigured(good)
Foreign State: None
Slot Number: 1
Firmware state: Unconfigured(good) , Spun Up
Foreign State: None
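
The check in step c can also be expressed as a pass/fail test before moving on to CfgLdAdd. This is a sketch under the assumption that the filtered pdlist output has the layout shown above; ready_for_cfgldadd is a hypothetical name, not a MegaCli64 feature.

```shell
# Hypothetical check: succeed only if every slot reports a Firmware state
# starting with Unconfigured(good) and a Foreign State of None.
# Reads the text of "MegaCli64 pdlist a0" on stdin.
ready_for_cfgldadd() {
  awk -F': *' '
    /^Slot Number/ { slot = $2; seen++ }
    /^Firmware state/ && $2 !~ /^Unconfigured\(good\)/ {
      print "slot " slot ": firmware state " $2; bad = 1
    }
    /^Foreign State/ && $2 !~ /^None/ {
      print "slot " slot ": foreign state " $2; bad = 1
    }
    END { exit (bad || seen == 0) }   # fail on any problem or on empty input
  '
}

# Sample input copied from the step c output above.
if ready_for_cfgldadd <<'EOF'
Slot Number: 0
Firmware state: Unconfigured(good)
Foreign State: None
Slot Number: 1
Firmware state: Unconfigured(good) , Spun Up
Foreign State: None
EOF
then
  echo "all disks ready for CfgLdAdd"
else
  echo "at least one disk is not ready"
fi
```

Any slot that fails the test is listed with the offending state, which indicates whether step a or step b needs to be repeated.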

d) Add disk 0.

MegaCli64 CfgLdAdd r0[20:0] a0

 

Note:

If the CfgLdAdd command fails because cached data is present, use MegaCli64 -DiscardPreservedCache -Ln a0 to clear the cache for the logical drive, where n is the number of the slot used. For the above example it would be:

       MegaCli64 -DiscardPreservedCache -L0 a0

Do NOT use CfgEachDskRaid0, because this command reverses the slot and virtual disk numbers.

 

3) Add disk 1. For Disk 1, execute the command below to add the disk.

MegaCli64 CfgLdAdd r0[20:1] a0
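
Since the disks must be added from the lowest to the highest slot number, a dry run that only prints the commands in the right order can serve as a checklist. The function print_cfgldadd_commands below is a hypothetical illustration; it prints text and does not call MegaCli64.

```shell
# Hypothetical dry run: print (do not execute) one CfgLdAdd command per
# slot, lowest slot number first.
# Usage: print_cfgldadd_commands ENCLOSURE_ID SLOT [SLOT...]
print_cfgldadd_commands() {
  encl=$1
  shift
  for slot in $(printf '%s\n' "$@" | sort -n); do
    printf 'MegaCli64 CfgLdAdd r0[%s:%s] a0\n' "$encl" "$slot"
  done
}

# Slots given out of order on purpose; the output is sorted.
print_cfgldadd_commands 20 1 0
```

For enclosure 20 this prints the slot 0 command followed by the slot 1 command, matching steps 2d and 3.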

 
4) Check if the disks are ONLINE.

# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None

 

5) If at least one of the OS disks is Online then reboot the node. After the reboot the node should be accessible using ssh.
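
The condition in step 5 can be evaluated by counting the disks whose Firmware state begins with Online. This is a minimal sketch: count_online is a hypothetical helper, and the sample text is the step 4 output rather than live controller output.

```shell
# Hypothetical helper: count disks whose Firmware state starts with "Online".
# Reads the text of "MegaCli64 pdlist a0" on stdin.
count_online() {
  awk -F': *' '/^Firmware state/ && $2 ~ /^Online/ { n++ } END { print n + 0 }'
}

# Sample input copied from the step 4 output. On a real node:
#   n=$(MegaCli64 pdlist a0 | count_online)
sample='Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None'
n=$(printf '%s\n' "$sample" | count_online)

if [ "$n" -ge 1 ]; then
  echo "$n OS disk(s) Online: the node can be rebooted"
else
  echo "no OS disk Online: do not reboot yet"
fi
```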

6) If Disk 0 was determined to be bad in step 2a, replace the disk and follow the steps in 1514231.1 and 1532128.1 to configure the new disk.

ID 1514231.1 Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2

ID 1532128.1 How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u01 and /dev/sda on Oracle Big Data Appliance V2.x

 


 

References

<NOTE:1514231.1> - Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.0.1/2.1.0/2.1.1/2.1.2
<NOTE:1569762.1> - After OS Disk Replacement on Oracle Big Data Appliance bdachecksw/bdacheckhw Commands Fail with 'Wrong slot mapping to HBA target' Error
<NOTE:1515041.1> - How to Configure a Server Disk After Disk Replacement as an Operating System Disk /u02 and /dev/sdb on Oracle Big Data Appliance V2.0.1/2.1.0/2.1.1/2.1.2
<NOTE:1532128.1> - How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u01 and /dev/sda on Oracle Big Data Appliance V2.0.1/2.1.0/2.1.1/2.1.2

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.