Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2227128.1
Update Date:2017-01-27

Solution Type  Problem Resolution Sure

Solution  2227128.1 :   After HDFS Disk Replacement with Multiple Disk Failures bdacheckhw Raises "Wrong slot mapping to HBA target" Errors for Slots After the Replaced Disk and "ERROR: Wrong number of virtual disks : 13"  


Related Items
  • Big Data Appliance X4-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Symptoms
Cause
Solution
 An overview of the steps for hdfs disks
 Detailed steps for hdfs disks
 Unmount the disks which are mapped incorrectly
 Address the incorrect slot mappings by removing the incorrectly mapped virtual drives in the order highest to lowest
 Re-add the removed virtual drives in the order lowest to highest
 Reboot when all the disks are online spun-up
References


Created from <SR 3-14075044781>

Applies to:

Big Data Appliance X4-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms

The general scenario leading to this condition is that more than one hdfs disk reports a non-healthy state and the corresponding non-healthy disks have been replaced.

The example scenario used in this note is that the disk in slot 2 was in a "bad"/"failing" state and the disk in slot 3 was "Unconfigured(good)", as per the "MegaCli64 pdlist a0" output below:

Slot Number: 2
...
Firmware state: Unconfigured(bad)
...
Foreign State: Foreign

and

Slot Number: 3
...
Firmware state: Unconfigured(good), Spun Up
...
Foreign State: Foreign

In the example here, after replacing the disk in "Slot Number: 2" the disks were configured in the order "lowest" to "highest" i.e. "Slot Number: 2" then "Slot Number: 3" as per: Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581331.1).  The final steps of the process are to run bdacheckhw and bdachecksw.  However in the case here, both fail. 

bdacheckhw fails with two main issues:

1. 13 Virtual Disks are reported.
and
2. Disks in slots 3-11, the slots after the one where the disk was replaced, are not mapped correctly.

bdacheckhw output looks like:

SUCCESS: Correct disk 0 status : Online, Spun Up No alert
SUCCESS: Correct disk 1 status : Online, Spun Up No alert
SUCCESS: Correct disk 2 status : Online, Spun Up No alert
SUCCESS: Correct disk 3 status : Online, Spun Up No alert
SUCCESS: Correct disk 4 status : Online, Spun Up No alert
SUCCESS: Correct disk 5 status : Online, Spun Up No alert
SUCCESS: Correct disk 6 status : Online, Spun Up No alert
SUCCESS: Correct disk 7 status : Online, Spun Up No alert
SUCCESS: Correct disk 8 status : Online, Spun Up No alert
SUCCESS: Correct disk 9 status : Online, Spun Up No alert
SUCCESS: Correct disk 10 status : Online, Spun Up No alert
SUCCESS: Correct disk 11 status : Online, Spun Up No alert
...
ERROR: Wrong number of virtual disks : 13
INFO: Expected number of virtual disks : 12
SUCCESS: Correct slot 0 mapping to HBA target : 0
SUCCESS: Correct slot 1 mapping to HBA target : 1
SUCCESS: Correct slot 2 mapping to HBA target : 2
ERROR: Wrong slot 4 mapping to HBA target : 3
INFO: Expected slot 4 mapping to HBA target : 4
ERROR: Wrong slot 5 mapping to HBA target : 4
INFO: Expected slot 5 mapping to HBA target : 5
ERROR: Wrong slot 6 mapping to HBA target : 5
INFO: Expected slot 6 mapping to HBA target : 6
ERROR: Wrong slot 7 mapping to HBA target : 6
INFO: Expected slot 7 mapping to HBA target : 7
ERROR: Wrong slot 8 mapping to HBA target : 7
INFO: Expected slot 8 mapping to HBA target : 8
ERROR: Wrong slot 9 mapping to HBA target : 8
INFO: Expected slot 9 mapping to HBA target : 9
ERROR: Wrong slot 10 mapping to HBA target : 9
INFO: Expected slot 10 mapping to HBA target : 10
ERROR: Wrong slot 11 mapping to HBA target : 10
INFO: Expected slot 11 mapping to HBA target : 11
ERROR: Wrong slot 3 mapping to HBA target : 11
INFO: Expected slot 3 mapping to HBA target : 3
SUCCESS: Correct Host Channel Adapter model : Mellanox Technologies MT27500 Family [ConnectX-3]
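
The mis-mapped slots can be pulled out of a long bdacheckhw report with a small filter. The following is a minimal sketch only: the function name wrong_slot_mappings is hypothetical, and it assumes the report is available on standard input with ERROR/INFO line pairs in exactly the format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the BDA tooling): read bdacheckhw output
# on stdin and print one summary line per mis-mapped slot.
# Assumes each "ERROR: Wrong slot N ..." line is followed by the matching
# "INFO: Expected slot N ..." line, as in the output shown in this note.
wrong_slot_mappings() {
  awk '
    /^ERROR: Wrong slot [0-9]+ mapping to HBA target/ {
      slot = $4; actual = $NF; next
    }
    /^INFO: Expected slot [0-9]+ mapping to HBA target/ {
      if ($4 == slot)
        printf "slot %s: actual target %s, expected %s\n", slot, actual, $NF
    }
  '
}
```

For example, a saved copy of the report could be passed in with "wrong_slot_mappings < saved_output.txt", where the file name is a placeholder.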

Other symptoms include:

1. From lsscsi: 

# lsscsi
  
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:10:0] disk LSI MR9261-8i 2.13 /dev/sdj
[0:2:11:0] disk LSI MR9261-8i 2.13 /dev/sdk
[0:2:12:0] disk LSI MR9261-8i 2.13 /dev/sdn
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

2. From "mount -l": 

# mount -l
  
/dev/md2 on / type ext4 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md0 on /boot type ext4 (rw)
/dev/sda4 on /u01 type ext4 (rw,nodev,noatime) [/u01]
/dev/sdb4 on /u02 type ext4 (rw,nodev,noatime) [/u02]
/dev/sdd1 on /u05 type ext4 (rw,nodev,noatime) [/u05]
/dev/sde1 on /u06 type ext4 (rw,nodev,noatime) [/u06]
/dev/sdf1 on /u07 type ext4 (rw,nodev,noatime) [/u07]
/dev/sdg1 on /u08 type ext4 (rw,nodev,noatime) [/u08]
/dev/sdh1 on /u09 type ext4 (rw,nodev,noatime) [/u09]
/dev/sdi1 on /u10 type ext4 (rw,nodev,noatime) [/u10]
/dev/sdj1 on /u11 type ext4 (rw,nodev,noatime) [/u11]
/dev/sdk1 on /u12 type ext4 (rw,nodev,noatime) [/u12]
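
In the "mount -l" output above, /u03 and /u04 are absent. A quick way to spot such gaps is sketched below. This is only an illustration: the function name missing_data_mounts is hypothetical, and it assumes the node is expected to have twelve data mount points named /u01 through /u12, as in this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "mount -l" output on stdin and print every
# mount point in /u01../u12 that does not appear in it.
# Assumption: the node should have exactly these twelve data mount points.
missing_data_mounts() {
  local mounts i mp
  mounts="$(cat)"
  for ((i = 1; i <= 12; i++)); do
    mp="$(printf '/u%02d' "$i")"
    # Match the " on <mountpoint> type " portion of a mount line literally.
    if ! grep -qF " on ${mp} type " <<<"$mounts"; then
      echo "$mp"
    fi
  done
}
```

Typical use would be "mount -l | missing_data_mounts".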

  

Cause

The cause is not fully known. There may have been a problem in the disk configuration steps taken. In the case here, a kernel panic and reboot occurred while the disk in slot 2 was being replaced, which might have led to the drives after slot 2 being reordered.

Solution


The resolution is based on After OS Disk Replacement on Oracle Big Data Appliance bdachecksw/bdacheckhw Commands Fail with 'Wrong slot mapping to HBA target' Error (Doc ID 1569762.1).

Perform all steps on the node with the "bad"/"failing" disks as 'root' user unless otherwise specified.

Note: This MOS document is only to be used for hdfs disks.  If OS disks, i.e. either of the first two disks (in slot 0 or slot 1), are affected by being out of order, it is necessary to follow the steps in: After OS Disk Replacement on Oracle Big Data Appliance bdachecksw/bdacheckhw Commands Fail with 'Wrong slot mapping to HBA target' Error (Doc ID 1569762.1).  For OS disks it is necessary to reboot into the rescue image before proceeding with the steps presented in this document, since the OS disks (the first 2 disks, in slot 0 and slot 1) cannot be unmounted.

An overview of the steps for hdfs disks

1. Unmount the disks which are mapped incorrectly.
2. Address the 13 virtual drives i.e. "ERROR: Wrong number of virtual disks : 13" by removing virtual drive 12.
3. Address the incorrect slot mappings by removing the incorrectly mapped virtual drives in the order highest to lowest.
4. Re-add the removed virtual drives in the order lowest to highest.
5. Reboot when all the disks are online spun-up.
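
The ordering in steps 3 and 4 can be sketched as a dry run that only prints the commands, so the sequence can be reviewed before anything is executed. This is a minimal sketch under stated assumptions: the function name print_vd_plan is hypothetical, and adapter a0 and enclosure ID 20 are taken from the example in this note and may differ on other systems. It does not cover step 2 (virtual drive 12 is deleted but not re-added) and it never adds "-force".

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: print (do not execute) the MegaCli64 commands
# for virtual drives LOW..HIGH, deleting highest to lowest and then
# re-adding lowest to highest.
# Assumptions: adapter a0; enclosure ID defaults to 20 as in this note.
print_vd_plan() {
  local low="$1" high="$2" enc="${3:-20}" n
  for ((n = high; n >= low; n--)); do
    echo "MegaCli64 CfgLdDel L${n} a0"
  done
  for ((n = low; n <= high; n++)); do
    echo "MegaCli64 CfgLdAdd r0[${enc}:${n}] a0"
  done
}
```

For the case in this note the mis-mapped range would be printed with "print_vd_plan 2 11". Each command should still be run and verified one at a time, as described in the detailed steps.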

Detailed steps for hdfs disks

Unmount the disks which are mapped incorrectly

In the case here:

# umount /u04
# umount /u05
# umount /u06
# umount /u07
# umount /u08
# umount /u09
# umount /u10
# umount /u11
# umount /u12
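
Some of these mount points may already be unmounted (in the earlier "mount -l" output, /u04 is not listed), in which case umount reports an error for them. As a hedged sketch, the list of umount commands can be derived from the current mounts instead. The function name umount_cmds is hypothetical, and the function only prints commands; it does not run them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "mount -l" output on stdin and print a umount
# command for each mount point /uFIRST../uLAST that is currently mounted.
# It only prints the commands; nothing is unmounted.
umount_cmds() {
  local first="$1" last="$2" mounts i mp
  mounts="$(cat)"
  for ((i = first; i <= last; i++)); do
    mp="$(printf '/u%02d' "$i")"
    if grep -qF " on ${mp} type " <<<"$mounts"; then
      echo "umount ${mp}"
    fi
  done
}
```

For the case here this would be invoked as "mount -l | umount_cmds 4 12".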

  

Note: umount may not work if any processes or Cloudera Manager (CM) roles are accessing the data directories.

To stop the processes or Cloudera Manager roles accessing the data directories you can do one of the following:

1. You can remove the mounts from the DataNode Data Directory dfs.data.dir, dfs.datanode.data.dir field in Cloudera Manager.  See: Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581331.1), Step "7. Complete these steps in Cloudera Manager:" in the Prerequisites for Replacing a Working / Failing Disk section. After this umount should work.

or

2. In Cloudera Manager, stop the roles running on the node with the bad disks, such as the DataNode and NodeManager roles. After this umount should work.

For example:

a) Find the roles on the node in CM:

Navigate: Hosts > <host> > Processes, then check which roles are running, for example:

RegionServer
Impala Daemon
NodeManager
DataNode

b) Stop the roles by navigating:

hbase > Instances > <RegionServer for the host> > Actions for Selected > Stop
impala > Instances > <Impala Daemon for the host> > Actions for Selected > Stop
yarn > Instances > <Node Manager for the host> > Actions for Selected > Stop
hdfs > Instances > <DataNode for the host> > Actions for Selected > Stop

Once one of the above options is completed, try to umount the disks again.  This time umount should be successful.

Address the incorrect slot mappings by removing the incorrectly mapped virtual drives in the order highest to lowest

This step also includes the step to address the 13 virtual drives i.e. "ERROR: Wrong number of virtual disks : 13" by removing virtual drive 12.

1. There are a few things to note:

a) It should not be necessary to take the disks offline.  Taking a disk offline changes the state of the physical disk, whereas the problem here is the ordering of the logical (virtual) disks.  Hence the CfgLdDel / CfgLdAdd commands should be the only ones needed to fix the ordering, and PDOffline commands should not be needed.  If any PDOffline commands are issued, the corresponding PDOnline commands must be run before doing the remounts.

b) After removing the incorrectly mapped virtual drive verify with:

i. "lsscsi" to confirm the virtual drive is removed.

ii.  "MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"" to verify that the "Firmware state" changes from "Online, Spun Up" to "Unconfigured(good), Spun Up" as each virtual drive is removed.

c) It may be necessary to use the "-force" option when removing a virtual drive. For details see: Removing a Virtual Drive for a Non-OS Drive with "MegaCli64 CfgLdDel Ln a0" Raises "Virtual Disk n is an OS drive - cannot be deleted without force option." (Doc ID 2227244.1).  If removing any hdfs virtual drive raises "Virtual Disk <x> is an OS drive", consult that note and verify that the virtual drive being deleted is not an OS drive. Once it is fully confirmed that the virtual drive is not an OS drive, use "-force" as per the referenced document.

d) In the steps below first try removing the virtual drive without "-force".  If an error is raised, refer to (Doc ID 2227244.1).

2. First remove virtual drive 12, to remove the extra virtual drive listed in bdacheckhw.

# MegaCli64 CfgLdDel L12 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output when -force is required:

# MegaCli64 CfgLdDel L12 -force a0
Adapter 0: Deleted Virtual Drive-12(target id-12)

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:10:0] disk LSI MR9261-8i 2.13 /dev/sdj
[0:2:11:0] disk LSI MR9261-8i 2.13 /dev/sdk
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl  

3.  Remove virtual drive 11 to address the slot mapping error:

# MegaCli64 CfgLdDel L11 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output when -force is required:

# MegaCli64 CfgLdDel L11 -force a0
Adapter 0: Deleted Virtual Drive-11(target id-11)

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:10:0] disk LSI MR9261-8i 2.13 /dev/sdj
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

4. Remove virtual drive 10 to address the slot mapping error:

# MegaCli64 CfgLdDel L10 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output when -force is required:

# MegaCli64 CfgLdDel L10 -force a0
Adapter 0: Deleted Virtual Drive-10(target id-10)

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdi
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

5. Remove virtual drive 9 to address the slot mapping error:

# MegaCli64 CfgLdDel L9 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output when -force is required:

# MegaCli64 CfgLdDel L9 -force a0
Adapter 0: Deleted Virtual Drive-9(target id-9)

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdh
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl  

6. Remove virtual drive 8 to address the slot mapping error:

# MegaCli64 CfgLdDel L8 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output when -force is required:

# MegaCli64 CfgLdDel L8 -force a0
Adapter 0: Deleted Virtual Drive-8(target id-8)

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

7. Remove virtual drive 7 to address the slot mapping error:

# MegaCli64 CfgLdDel L7 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output when -force is required:

# MegaCli64 CfgLdDel L7 -force a0
Adapter 0: Deleted Virtual Drive-7(target id-7)

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

8. Remove virtual drive 6 to address the slot mapping error:

# MegaCli64 CfgLdDel L6 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output when -force is required:

# MegaCli64 CfgLdDel L6 -force a0
Adapter 0: Deleted Virtual Drive-6(target id-6)

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

9. Remove virtual drive 5 to address the slot mapping error:

# MegaCli64 CfgLdDel L5 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output when -force is required:

# MegaCli64 CfgLdDel L5 -force a0
Adapter 0: Deleted Virtual Drive-5(target id-5)

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

10.  Remove virtual drive 4 to address the slot mapping error:

# MegaCli64 CfgLdDel L4 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output when -force is required:

# MegaCli64 CfgLdDel L4 -force a0
Adapter 0: Deleted Virtual Drive-4(target id-4)

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

11. Based on the above lsscsi output there is no need to remove virtual drive 3, as it is not present.

12. In the case here, it is necessary to remove virtual drive 2 as it is incorrectly mapped above.

# MegaCli64 CfgLdDel L2 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output when -force is required:

# MegaCli64 CfgLdDel L2 -force a0
Adapter 0: Deleted Virtual Drive-2(target id-2)

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl  

13. When finished, verify that the OS disks in slots 0 and 1 are "Online, Spun Up" and that the hdfs disks in slots 2-11 are "Unconfigured(good), Spun Up".

# MegaCli64 pdlist a0 | egrep "^Firm|^Foreign|^Slot"
  
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 3
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 4
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 5
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 6
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 7
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 8
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 9
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 10
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 11
Firmware state: Unconfigured(good), Spun Up
Foreign State: None

  

Note that prior to this point, "MegaCli64 pdlist a0 | egrep "^Firm|^Foreign|^Slot"" will still show some disks as "Online, Spun Up". As each virtual drive is removed, its "Firmware state" changes to "Unconfigured(good), Spun Up".
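
The lsscsi and pdlist checks used throughout the removal steps can be condensed into a few filters. The following is a minimal sketch, not part of the BDA tooling: the function names vd_present, slot_states and ready_for_readd are hypothetical, and the filters assume the output formats shown in this note (virtual drives listed by lsscsi as [0:2:<target>:0], OS disks in slots 0 and 1, no trailing whitespace on the pdlist lines).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; all three read command output on stdin.

# Succeed if lsscsi output still lists virtual drive TARGET as [0:2:TARGET:0].
vd_present() {
  local target="$1"
  grep -qF "[0:2:${target}:0]"
}

# Condense filtered "MegaCli64 pdlist a0" output to "<slot>: <firmware state>".
slot_states() {
  awk -F': ' '
    /^Slot Number:/    { slot = $2 }
    /^Firmware state:/ { print slot ": " $2 }
  '
}

# Succeed only if slots 0-1 are "Online, Spun Up" and every other slot is
# "Unconfigured(good), Spun Up"; print any slot that is not as expected.
ready_for_readd() {
  slot_states | awk -F': ' '
    $1 < 2  && $2 != "Online, Spun Up"             { bad = 1; print "unexpected: " $0 }
    $1 >= 2 && $2 != "Unconfigured(good), Spun Up" { bad = 1; print "unexpected: " $0 }
    END { exit bad + 0 }
  '
}
```

For example, "lsscsi | vd_present 11" would be expected to fail once virtual drive 11 has been removed, and "MegaCli64 pdlist a0 | ready_for_readd" would be expected to succeed once the state shown in step 13 has been reached.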

Re-add the removed virtual drives in the order lowest to highest

Confirm with "MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"" that the "Firmware state" changes from "Unconfigured(good), Spun Up" to "Online, Spun Up" as the virtual drives are re-added.

1. Recreate drive 2:

# MegaCli64 CfgLdAdd r0[20:2] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output:

# MegaCli64 CfgLdAdd r0[20:2] a0
Adapter 0: Created VD 2

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

2. Recreate drive 3:

# MegaCli64 CfgLdAdd r0[20:3] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output:

# MegaCli64 CfgLdAdd r0[20:3] a0
Adapter 0: Created VD 3

Adapter 0: Configured the Adapter!!

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

3. Recreate drive 4:

# MegaCli64 CfgLdAdd r0[20:4] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output:

# MegaCli64 CfgLdAdd r0[20:4] a0
Adapter 0: Created VD 4

Adapter 0: Configured the Adapter!!

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl  

4. Recreate drive 5:

# MegaCli64 CfgLdAdd r0[20:5] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output:

# MegaCli64 CfgLdAdd r0[20:5] a0
Adapter 0: Created VD 5

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

5. Recreate drive 6:

# MegaCli64 CfgLdAdd r0[20:6] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output:

# MegaCli64 CfgLdAdd r0[20:6] a0
Adapter 0: Created VD 6

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

6. Recreate drive 7:

# MegaCli64 CfgLdAdd r0[20:7] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output:

# MegaCli64 CfgLdAdd r0[20:7] a0
Adapter 0: Created VD 7

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdh
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

7. Recreate drive 8:

# MegaCli64 CfgLdAdd r0[20:8] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output:

# MegaCli64 CfgLdAdd r0[20:8] a0
Adapter 0: Created VD 8

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdi
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

8. Recreate drive 9:

# MegaCli64 CfgLdAdd r0[20:9] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output:

# MegaCli64 CfgLdAdd r0[20:9] a0
Adapter 0: Created VD 9

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdj
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

9. Recreate drive 10:

# MegaCli64 CfgLdAdd r0[20:10] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output:

# MegaCli64 CfgLdAdd r0[20:10] a0
Adapter 0: Created VD 10

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdj
[0:2:10:0] disk LSI MR9261-8i 2.13 /dev/sdk
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl  

10. Recreate drive 11:

# MegaCli64 CfgLdAdd r0[20:11] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" 

Output:

# MegaCli64 CfgLdAdd r0[20:11] a0
Adapter 0: Created VD 11

Adapter 0: Configured the Adapter!!

Exit Code: 0x00

# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdj
[0:2:10:0] disk LSI MR9261-8i 2.13 /dev/sdk
[0:2:11:0] disk LSI MR9261-8i 2.13 /dev/sdm
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl  

11. Verify that all disks now have a "Firmware state" of "Online, Spun Up":

# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
  
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 4
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 5
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 6
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 7
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 8
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 9
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 10
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 11
Firmware state: Online, Spun Up
Foreign State: None 

Reboot when all the disks are online spun-up

After reboot all mounts should be in place.

1. bdacheckhw should be fully healthy and report:

SUCCESS: No hardware errors reported by ILOM   

2. bdachecksw should be fully healthy and report:

SUCCESS: Correct bda-monitor status : bda monitor is running
SUCCESS: Big Data Appliance software validation checks succeeded 

3. Restart the roles, depending on how you stopped them.

a) In CM, re-add any mount points that were removed from the DataNode Data Directory dfs.data.dir, dfs.datanode.data.dir field.

or

b) Restart any roles in CM that were stopped:

i. Find the roles:

Hosts > <host> > Processes, then check the roles, for example:

RegionServer
Impala Daemon
NodeManager
DataNode

ii. Start the roles:

hdfs > Instances > <DataNode for the host> > Actions for Selected > Start
yarn > Instances > <Node Manager for the host> > Actions for Selected > Start
impala > Instances > <Impala Daemon for the host> > Actions for Selected > Start
hbase > Instances > <RegionServer for the host> > Actions for Selected > Start

4. Resume configuring the drive that was replaced as per: How to Configure a Server Disk After Replacement as an HDFS Disk or Oracle NoSQL Database Disk on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581583.1).


  Copyright © 2018 Oracle, Inc.  All rights reserved.