Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

2227128.1: After HDFS Disk Replacement with Multiple Disk Failures bdacheckhw Raises "Wrong slot mapping to HBA target" Errors for Slots After the Replaced Disk and "ERROR: Wrong number of virtual disks : 13"
Created from <SR 3-14075044781>

Applies to:
Big Data Appliance X4-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms
The general scenario leading to this condition is that more than one HDFS disk reports a non-healthy state and the corresponding non-healthy disks have been replaced. In the example used in this note, the disk in slot 2 was in a "bad"/"failing" state and the disk in slot 3 was "Unconfigured(good)", as per the "MegaCli64 pdlist a0" output below:

Slot Number: 2
...
Firmware state: Unconfigured(bad)
...
Foreign State: Foreign

Slot Number: 3
...
Firmware state: Unconfigured(good), Spun Up
...
Foreign State: Foreign

In the example here, after replacing the disk in "Slot Number: 2", the disks were configured in order from lowest to highest, i.e. "Slot Number: 2" and then "Slot Number: 3", as per: Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581331.1).

The final steps of that process are to run bdacheckhw and bdachecksw. In the case here, however, both fail. bdacheckhw fails with two main issues:
1. 13 virtual disks are reported instead of the expected 12.
2. The slots after the replaced disk are reported as mapped to the wrong HBA targets.

The bdacheckhw output looks like:
SUCCESS: Correct disk 0 status : Online, Spun Up No alert
SUCCESS: Correct disk 1 status : Online, Spun Up No alert
SUCCESS: Correct disk 2 status : Online, Spun Up No alert
SUCCESS: Correct disk 3 status : Online, Spun Up No alert
SUCCESS: Correct disk 4 status : Online, Spun Up No alert
SUCCESS: Correct disk 5 status : Online, Spun Up No alert
SUCCESS: Correct disk 6 status : Online, Spun Up No alert
SUCCESS: Correct disk 7 status : Online, Spun Up No alert
SUCCESS: Correct disk 8 status : Online, Spun Up No alert
SUCCESS: Correct disk 9 status : Online, Spun Up No alert
SUCCESS: Correct disk 10 status : Online, Spun Up No alert
SUCCESS: Correct disk 11 status : Online, Spun Up No alert
...
ERROR: Wrong number of virtual disks : 13
INFO: Expected number of virtual disks : 12
SUCCESS: Correct slot 0 mapping to HBA target : 0
SUCCESS: Correct slot 1 mapping to HBA target : 1
SUCCESS: Correct slot 2 mapping to HBA target : 2
ERROR: Wrong slot 4 mapping to HBA target : 3
INFO: Expected slot 4 mapping to HBA target : 4
ERROR: Wrong slot 5 mapping to HBA target : 4
INFO: Expected slot 5 mapping to HBA target : 5
ERROR: Wrong slot 6 mapping to HBA target : 5
INFO: Expected slot 6 mapping to HBA target : 6
ERROR: Wrong slot 7 mapping to HBA target : 6
INFO: Expected slot 7 mapping to HBA target : 7
ERROR: Wrong slot 8 mapping to HBA target : 7
INFO: Expected slot 8 mapping to HBA target : 8
ERROR: Wrong slot 9 mapping to HBA target : 8
INFO: Expected slot 9 mapping to HBA target : 9
ERROR: Wrong slot 10 mapping to HBA target : 9
INFO: Expected slot 10 mapping to HBA target : 10
ERROR: Wrong slot 11 mapping to HBA target : 10
INFO: Expected slot 11 mapping to HBA target : 11
ERROR: Wrong slot 3 mapping to HBA target : 11
INFO: Expected slot 3 mapping to HBA target : 3
SUCCESS: Correct Host Channel Adapter model : Mellanox Technologies MT27500 Family [ConnectX-3]

Other symptoms include:
1. From lsscsi:
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:10:0] disk LSI MR9261-8i 2.13 /dev/sdj
[0:2:11:0] disk LSI MR9261-8i 2.13 /dev/sdk
[0:2:12:0] disk LSI MR9261-8i 2.13 /dev/sdn
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

2. From "mount -l":
# mount -l
/dev/md2 on / type ext4 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
/dev/md0 on /boot type ext4 (rw)
/dev/sda4 on /u01 type ext4 (rw,nodev,noatime) [/u01]
/dev/sdb4 on /u02 type ext4 (rw,nodev,noatime) [/u02]
/dev/sdd1 on /u05 type ext4 (rw,nodev,noatime) [/u05]
/dev/sde1 on /u06 type ext4 (rw,nodev,noatime) [/u06]
/dev/sdf1 on /u07 type ext4 (rw,nodev,noatime) [/u07]
/dev/sdg1 on /u08 type ext4 (rw,nodev,noatime) [/u08]
/dev/sdh1 on /u09 type ext4 (rw,nodev,noatime) [/u09]
/dev/sdi1 on /u10 type ext4 (rw,nodev,noatime) [/u10]
/dev/sdj1 on /u11 type ext4 (rw,nodev,noatime) [/u11]
/dev/sdk1 on /u12 type ext4 (rw,nodev,noatime) [/u12]
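When several slots are affected, the mapping errors can be pulled out of the saved command output mechanically instead of by eye. The following is a minimal Bash sketch, assuming the message formats shown above ("ERROR: Wrong slot N mapping to HBA target : T" followed by the matching "INFO: Expected ..." line), the lsscsi line format above, and the /uNN mount point naming of this example. The function names bda_mapping_errors, lsscsi_targets and missing_data_mounts are hypothetical helpers, not part of the Big Data Appliance tooling; they only read text on stdin and change nothing.

```shell
#!/usr/bin/env bash
# Sketch only: summarise the symptoms from saved command output.
# All three helpers are hypothetical and read text on stdin; none of them
# calls bdacheckhw, lsscsi or mount, and none changes anything.

# bdacheckhw output -> "virtual_disks=<n> expected=<m>" and
# "slot=<s> actual=<t> expected=<t>" for each wrong slot mapping.
bda_mapping_errors() {
  awk '
    /^ERROR: Wrong number of virtual disks/   { vds = $NF }
    /^INFO: Expected number of virtual disks/ { print "virtual_disks=" vds " expected=" $NF }
    /^ERROR: Wrong slot [0-9]+ mapping/       { slot = $4; actual = $NF }
    /^INFO: Expected slot [0-9]+ mapping/ {
      # Only pair an INFO line with the ERROR line for the same slot.
      if ($4 == slot) print "slot=" slot " actual=" actual " expected=" $NF
    }
  '
}

# lsscsi output -> "<HBA target> <device>" for each LSI virtual drive.
lsscsi_targets() {
  awk '$2 == "disk" && $3 == "LSI" {
    split($1, hctl, ":")   # "[0:2:12:0]" -> "[0" "2" "12" "0]"
    print hctl[3], $NF
  }'
}

# "mount -l" output -> each of /u01../u<last> that is not mounted.
missing_data_mounts() {
  awk -v last="$1" '
    $2 == "on" && $3 ~ /^\/u[0-9][0-9]$/ { mounted[$3] = 1 }
    END {
      for (i = 1; i <= last; i++) {
        mp = sprintf("/u%02d", i)
        if (!(mp in mounted)) print mp
      }
    }'
}
```

For example, saving the bdacheckhw report and piping it through bda_mapping_errors would list each affected slot with its actual and expected HBA target, lsscsi_targets would show target 12 as the extra /dev/sdn, and "mount -l | missing_data_mounts 12" would report /u03 and /u04 for the output above.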
Cause
The cause is not fully known. There may have been a problem in the disk configuration steps taken. In the case here, a kernel panic and reboot occurred while the disk in slot 2 was being replaced, which might have led to the drives after slot 2 being reordered.

Solution
Perform all steps as the 'root' user on the node with the "bad"/"failing" disks, unless otherwise specified.

Note: This MOS document is only to be used for HDFS disks. If OS disks, i.e. either of the first two disks (in slot 0 or slot 1), are affected by being out of order, follow the steps in: After OS Disk Replacement on Oracle Big Data Appliance bdachecksw/bdacheckhw Commands Fail with 'Wrong slot mapping to HBA target' Error (Doc ID 1569762.1). For OS disks, it is necessary to reboot into the rescue image before proceeding with the steps presented in this document, since the OS disks (the first two disks, in slot 0 and slot 1) cannot be unmounted.

An overview of the steps for HDFS disks
1. Unmount the disks which are mapped incorrectly.
2. Address the incorrect slot mappings by removing the incorrectly mapped virtual drives in the order highest to lowest.
3. Re-add the removed virtual drives in the order lowest to highest.
4. Reboot when all the disks are "Online, Spun Up".

Detailed steps for HDFS disks

Unmount the disks which are mapped incorrectly
In the case here:
# umount /u04
# umount /u05
# umount /u06
# umount /u07
# umount /u08
# umount /u09
# umount /u10
# umount /u11
# umount /u12
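The individual umount commands above can also be issued in a loop. This is a minimal Bash sketch, assuming the /uNN mount point naming and the /u04 to /u12 range from this example; unmount_data_dirs and the DRY_RUN variable are hypothetical. By default it only prints what it would unmount, it skips paths that are not currently mount points (checked with mountpoint from util-linux), and on a real run it lists the processes still holding a busy mount with "fuser -vm" (from psmisc).

```shell
#!/usr/bin/env bash
# Sketch: unmount /u<first>../u<last>, skipping anything not mounted.
# DRY_RUN=1 (the default) only prints the umount commands; set DRY_RUN=0
# and run as root to actually unmount. Names here are hypothetical.
DRY_RUN=${DRY_RUN:-1}

unmount_data_dirs() {
  local first=$1 last=$2 n mp
  for ((n = first; n <= last; n++)); do
    mp=$(printf '/u%02d' "$n")          # 4 -> /u04, 12 -> /u12
    if ! mountpoint -q "$mp" 2>/dev/null; then
      echo "skip $mp (not mounted)"
      continue
    fi
    if [ "$DRY_RUN" = 1 ]; then
      echo "umount $mp"
    elif ! umount "$mp"; then
      # Still busy: list the processes holding the mount.
      fuser -vm "$mp"
    fi
  done
}
```

"unmount_data_dirs 4 12" prints the plan; "DRY_RUN=0 unmount_data_dirs 4 12" run as root performs it. With the "mount -l" output shown earlier, /u04 would be reported as not mounted and skipped.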
Note: umount may not work if any processes or Cloudera Manager (CM) roles are accessing the data directories.
To stop the processes or Cloudera Manager roles accessing the data directories, do one of the following:
1. Remove the mounts from the DataNode Data Directory (dfs.data.dir, dfs.datanode.data.dir) field in Cloudera Manager. See: Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581331.1), step "7. Complete these steps in Cloudera Manager:" in the "Prerequisites for Replacing a Working / Failing Disk" section. After this, umount should work.
or
2. In Cloudera Manager, stop the roles on the node with the bad disks, such as the DataNode and NodeManager roles; umount should then work. For example:
a) Find the roles on the node in CM by navigating: Hosts > <host> > processes > check the process, e.g. RegionServer.
b) Stop the roles by navigating: hbase > Instances > <RegionServer for the host> > Actions for Selected > Stop.

Once one of the above is completed, try to unmount the disks again. This time umount should succeed.

Address the incorrect slot mappings by removing the incorrectly mapped virtual drives in the order highest to lowest
This step also addresses the 13 virtual drives, i.e. "ERROR: Wrong number of virtual disks : 13", by removing virtual drive 12.
1. There are a few things to note:
a) It should not be necessary to take the disks offline. Taking a disk offline changes the state of the physical disk, whereas it is the logical disks (virtual drives) that are in the wrong order. Hence, when fixing the ordering, the CfgLdDel / CfgLdAdd commands should be the only ones necessary, and PDOffline commands should not be needed. If PDOffline commands are issued, the corresponding PDOnline commands must be run before remounting.
b) After removing each incorrectly mapped virtual drive, verify with:
i. "lsscsi" to confirm the virtual drive is removed.
ii. "MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"" to verify that the "Firmware state" changes from "Online, Spun Up" to "Unconfigured(good), Spun Up" as each virtual drive is removed.
c) It may be necessary to use the "-force" option when removing a virtual drive. For details see: Removing a Virtual Drive for a Non-OS Drive with "MegaCli64 CfgLdDel Ln a0" Raises "Virtual Disk n is an OS drive - cannot be deleted without force option." (Doc ID 2227244.1). If removing any HDFS virtual drive raises "Virtual Disk <x> is an OS drive", consult Doc ID 2227244.1: as that note explains, you must verify that the virtual drive being deleted is not an OS drive. Only once it is fully confirmed that the virtual drive is not an OS drive, use "-force" as per the referenced document.
d) In the steps below, first try removing the virtual drive without "-force". If an error is raised, refer to Doc ID 2227244.1.

2. First remove virtual drive 12, the extra virtual drive reported by bdacheckhw:
# MegaCli64 CfgLdDel L12 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output when -force is required:
# MegaCli64 CfgLdDel L12 -force a0
Adapter 0: Deleted Virtual Drive-12(target id-12)
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:10:0] disk LSI MR9261-8i 2.13 /dev/sdj
[0:2:11:0] disk LSI MR9261-8i 2.13 /dev/sdk
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

3. Remove virtual drive 11 to address the slot mapping error:
# MegaCli64 CfgLdDel L11 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output when -force is required:
# MegaCli64 CfgLdDel L11 -force a0
Adapter 0: Deleted Virtual Drive-11(target id-11)
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:10:0] disk LSI MR9261-8i 2.13 /dev/sdj
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

4. Remove virtual drive 10 to address the slot mapping error:
# MegaCli64 CfgLdDel L10 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output when -force is required:
# MegaCli64 CfgLdDel L10 -force a0
Adapter 0: Deleted Virtual Drive-10(target id-10)
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdi
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

5. Remove virtual drive 9 to address the slot mapping error:
# MegaCli64 CfgLdDel L9 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output when -force is required:
# MegaCli64 CfgLdDel L9 -force a0
Adapter 0: Deleted Virtual Drive-9(target id-9)
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdh
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

6. Remove virtual drive 8 to address the slot mapping error:
# MegaCli64 CfgLdDel L8 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output when -force is required:
# MegaCli64 CfgLdDel L8 -force a0
Adapter 0: Deleted Virtual Drive-8(target id-8)
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdg
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

7. Remove virtual drive 7 to address the slot mapping error:
# MegaCli64 CfgLdDel L7 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output when -force is required:
# MegaCli64 CfgLdDel L7 -force a0
Adapter 0: Deleted Virtual Drive-7(target id-7)
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdf
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

8. Remove virtual drive 6 to address the slot mapping error:
# MegaCli64 CfgLdDel L6 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output when -force is required:
# MegaCli64 CfgLdDel L6 -force a0
Adapter 0: Deleted Virtual Drive-6(target id-6)
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sde
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

9. Remove virtual drive 5 to address the slot mapping error:
# MegaCli64 CfgLdDel L5 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output when -force is required:
# MegaCli64 CfgLdDel L5 -force a0
Adapter 0: Deleted Virtual Drive-5(target id-5)
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sdd
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

10. Remove virtual drive 4 to address the slot mapping error:
# MegaCli64 CfgLdDel L4 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output when -force is required:
# MegaCli64 CfgLdDel L4 -force a0
Adapter 0: Deleted Virtual Drive-4(target id-4)
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdm
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

11. Based on the above lsscsi output, there is no need to remove virtual drive 3, as it is not present.

12. In the case here, it is necessary to remove virtual drive 2, as it is incorrectly mapped above:
# MegaCli64 CfgLdDel L2 a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output when -force is required:
# MegaCli64 CfgLdDel L2 -force a0
Adapter 0: Deleted Virtual Drive-2(target id-2)
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

13. When finished, verify that the OS disks in slots 0 and 1 are "Online, Spun Up" and that the HDFS disks in slots 2-11 are "Unconfigured(good), Spun Up":
# MegaCli64 pdlist a0 | egrep "^Firm|^Foreign|^Slot"
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 3
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 4
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 5
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 6
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 7
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 8
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 9
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 10
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
Slot Number: 11
Firmware state: Unconfigured(good), Spun Up
Foreign State: None
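The removal phase above follows a fixed pattern: delete from the highest virtual drive downwards, checking lsscsi and pdlist after each deletion. The Bash sketch below only prints that sequence (it never calls MegaCli64) and turns the egrep-filtered pdlist output into "<slot> <state>" pairs, so the step 13 check becomes a single command. It assumes the pdlist format shown above; plan_vd_removal, pd_states and slots_in_state are hypothetical names. The printed CfgLdDel commands deliberately omit "-force", as point d) requires; add it by hand only after confirming via Doc ID 2227244.1 that the drive is not an OS drive.

```shell
#!/usr/bin/env bash
# Sketch: print (never run) the removal sequence, highest to lowest, and
# parse "MegaCli64 pdlist a0 | egrep ..." output. Hypothetical helper names.

# Prints the commands for virtual drives <high> down to <low>. -force is
# deliberately left out (see point d) above).
plan_vd_removal() {
  local high=$1 low=$2 vd
  for ((vd = high; vd >= low; vd--)); do
    echo "MegaCli64 CfgLdDel L$vd a0"
    echo "lsscsi"
    echo 'MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"'
  done
}

# pdlist lines on stdin -> one "<slot> <firmware state>" line per disk.
pd_states() {
  awk -F': ' '
    /^Slot Number:/    { slot = $2 }
    /^Firmware state:/ { print slot, $2 }
  '
}

# Succeeds only if every slot from <low> to <high> is listed and reports
# exactly <state>, e.g. "Unconfigured(good), Spun Up".
slots_in_state() {
  local low=$1 high=$2 want=$3
  pd_states | awk -v lo="$low" -v hi="$high" -v want="$want" '
    $1 >= lo && $1 <= hi {
      seen++
      if (substr($0, length($1) + 2) != want) bad++
    }
    END { exit (bad > 0 || seen != hi - lo + 1) }
  '
}
```

For example, "MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign" | slots_in_state 2 11 "Unconfigured(good), Spun Up" && echo ready" would confirm the state expected at step 13. Review the printed plan against lsscsi at every step rather than running it blindly; in this example virtual drive 3 was no longer present when it was reached and was skipped.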
Note that before this point, "MegaCli64 pdlist a0 | egrep "^Firm|^Foreign|^Slot"" will still show the disks whose virtual drives have not yet been removed as "Online, Spun Up". As each virtual drive is removed, its "Firmware state" changes to "Unconfigured(good), Spun Up".

Re-add the removed virtual drives in the order lowest to highest
Confirm with "MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"" that the "Firmware state" changes from "Unconfigured(good), Spun Up" to "Online, Spun Up" as the virtual drives are re-added.

1. Recreate drive 2:
# MegaCli64 CfgLdAdd r0[20:2] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output:
# MegaCli64 CfgLdAdd r0[20:2] a0
Adapter 0: Created VD 2
Adapter 0: Configured the Adapter!!
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

2. Recreate drive 3:
# MegaCli64 CfgLdAdd r0[20:3] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output:
# MegaCli64 CfgLdAdd r0[20:3] a0
Adapter 0: Created VD 3
Adapter 0: Configured the Adapter!!
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

3. Recreate drive 4:
# MegaCli64 CfgLdAdd r0[20:4] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output:
# MegaCli64 CfgLdAdd r0[20:4] a0
Adapter 0: Created VD 4
Adapter 0: Configured the Adapter!!
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

4. Recreate drive 5:
# MegaCli64 CfgLdAdd r0[20:5] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output:
# MegaCli64 CfgLdAdd r0[20:5] a0
Adapter 0: Created VD 5
Adapter 0: Configured the Adapter!!
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

5. Recreate drive 6:
# MegaCli64 CfgLdAdd r0[20:6] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output:
# MegaCli64 CfgLdAdd r0[20:6] a0
Adapter 0: Created VD 6
Adapter 0: Configured the Adapter!!
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

6. Recreate drive 7:
# MegaCli64 CfgLdAdd r0[20:7] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output:
# MegaCli64 CfgLdAdd r0[20:7] a0
Adapter 0: Created VD 7
Adapter 0: Configured the Adapter!!
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdh
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

7. Recreate drive 8:
# MegaCli64 CfgLdAdd r0[20:8] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output:
# MegaCli64 CfgLdAdd r0[20:8] a0
Adapter 0: Created VD 8
Adapter 0: Configured the Adapter!!
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdi
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

8. Recreate drive 9:
# MegaCli64 CfgLdAdd r0[20:9] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output:
# MegaCli64 CfgLdAdd r0[20:9] a0
Adapter 0: Created VD 9
Adapter 0: Configured the Adapter!!
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdj
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

9. Recreate drive 10:
# MegaCli64 CfgLdAdd r0[20:10] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output:
# MegaCli64 CfgLdAdd r0[20:10] a0
Adapter 0: Created VD 10
Adapter 0: Configured the Adapter!!
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdj
[0:2:10:0] disk LSI MR9261-8i 2.13 /dev/sdk
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

10. Recreate drive 11:
# MegaCli64 CfgLdAdd r0[20:11] a0
# lsscsi
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Output:
# MegaCli64 CfgLdAdd r0[20:11] a0
Adapter 0: Created VD 11
Adapter 0: Configured the Adapter!!
Exit Code: 0x00
# lsscsi
[0:0:20:0] enclosu ORACLE CONCORD14 0d03 -
[0:2:0:0] disk LSI MR9261-8i 2.13 /dev/sda
[0:2:1:0] disk LSI MR9261-8i 2.13 /dev/sdb
[0:2:2:0] disk LSI MR9261-8i 2.13 /dev/sdc
[0:2:3:0] disk LSI MR9261-8i 2.13 /dev/sdd
[0:2:4:0] disk LSI MR9261-8i 2.13 /dev/sde
[0:2:5:0] disk LSI MR9261-8i 2.13 /dev/sdf
[0:2:6:0] disk LSI MR9261-8i 2.13 /dev/sdg
[0:2:7:0] disk LSI MR9261-8i 2.13 /dev/sdh
[0:2:8:0] disk LSI MR9261-8i 2.13 /dev/sdi
[0:2:9:0] disk LSI MR9261-8i 2.13 /dev/sdj
[0:2:10:0] disk LSI MR9261-8i 2.13 /dev/sdk
[0:2:11:0] disk LSI MR9261-8i 2.13 /dev/sdm
[7:0:0:0] disk ORACLE UNIGEN-UFD PMAP /dev/sdl

11. Verify all disks now have "Firmware state" "Online, Spun Up":
# MegaCli64 pdlist a0 | egrep "^Firm|^Slot|^Foreign"
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 4
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 5
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 6
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 7
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 8
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 9
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 10
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 11
Firmware state: Online, Spun Up
Foreign State: None

Reboot when all the disks are "Online, Spun Up"
After the reboot, all mounts should be in place.
1. bdacheckhw should be fully healthy and report:
SUCCESS: No hardware errors reported by ILOM
2. bdachecksw should be fully healthy and report:
SUCCESS: Correct bda-monitor status : bda monitor is running
SUCCESS: Big Data Appliance software validation checks succeeded
3. Restart the roles, depending on how you stopped them:
a) In CM, re-add any mount points that were removed from the DataNode Data Directory (dfs.data.dir, dfs.datanode.data.dir) field.
or
b) Restart any roles in CM that were stopped:
i. Find the roles: Hosts > <host> > processes > check the process, e.g. RegionServer.
ii. Start the roles: hdfs > Instances > <DataNode for the host> > Actions for Selected > Start.
4. Resume configuring the replaced drive as per: How to Configure a Server Disk After Replacement as an HDFS Disk or Oracle NoSQL Database
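The re-add phase and the final checks can be sketched the same way. This Bash sketch assumes enclosure ID 20 and slots 2 to 11 as in this example, with one RAID 0 virtual drive per slot (the r0[20:N] form used above); plan_vd_add, all_online and no_errors are hypothetical helpers. plan_vd_add only prints the CfgLdAdd commands in lowest-to-highest order and never runs them; all_online reads pdlist output and no_errors reads bdacheckhw or bdachecksw output on stdin.

```shell
#!/usr/bin/env bash
# Sketch: print (never run) the re-add sequence, lowest to highest, and check
# the results. Assumes enclosure ID 20 and one RAID 0 drive per slot, as in
# this note's example. Helper names are hypothetical.

plan_vd_add() {
  local low=$1 high=$2 slot
  for ((slot = low; slot <= high; slot++)); do
    echo "MegaCli64 CfgLdAdd r0[20:$slot] a0"
  done
}

# pdlist lines on stdin; succeeds only when at least one disk is listed and
# every "Firmware state" is "Online, Spun Up".
all_online() {
  awk '/^Firmware state:/ {
         n++
         if ($0 != "Firmware state: Online, Spun Up") bad++
       }
       END { exit (n == 0 || bad > 0) }'
}

# bdacheckhw/bdachecksw output on stdin; prints any ERROR lines and succeeds
# only when there are none (grep exit status 1 means "no match").
no_errors() {
  local rc=0
  grep '^ERROR:' || rc=$?
  [ "$rc" -eq 1 ]
}
```

"plan_vd_add 2 11" prints the ten CfgLdAdd commands used above. When typing them by hand, quoting the array argument (for example 'r0[20:2]') stops the shell from treating the brackets as a filename pattern if a matching file happens to exist in the current directory.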