Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Predictive Self-Healing Sure Solution

1581331.1 : Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x
Applies to:
Big Data Appliance X5-2 Full Rack - Version All Versions and later
Big Data Appliance X3-2 In-Rack Expansion - Version All Versions and later
Big Data Appliance X4-2 Starter Rack - Version All Versions and later
Big Data Appliance X5-2 Hardware - Version All Versions and later
Big Data Appliance X3-2 Starter Rack - Version All Versions and later
Linux x86-64

Purpose
This document describes the steps for replacing a disk drive and identifying its function so that it can be configured for use on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x.

Scope
This document is to be used by anyone who is configuring the disk. If you attempt the steps and need further assistance, please log a service request to contact Oracle Support for help.

Details

About Disks, HBA, Slot Numbers, and Device Names
The Oracle Big Data Appliance includes a disk enclosure cage that holds 12 disk drives and is controlled by the HBA (Host Bus Adapter). The drives in this enclosure are identified by slot numbers 0 to 11 and can serve different purposes; for example, the drives in slots 0 and 1 hold the RAID 1 operating system and boot partitions. The drives can be dedicated to specific functions, as shown in Table 1.
Overview of Disk Replacement
Disk replacement involves configuring the new disk with the disk controller MegaCli64 tool and formatting it with Linux commands. All replaced disks must be configured on the LSI RAID controller using the MegaCli64 command-line utility, invoked from the boot screen or the Linux command line. This is necessary for the disks to be recognized by the operating system. The disks are configured as RAID 0 logical drives, with logical drive 0 recognized as disk /dev/disk/by-hba-slot/s0 at the OS level. The physical disk position (or slot number) does not need to correspond to the logical disk number. Use MegaCli64 LdPdInfo a0 to verify the mapping of logical to physical drive numbers. While it is necessary to keep the same physical-to-logical disk mapping, it is not necessary to keep the same disk-to-device mapping for the kernel (/dev/sd?), because on the BDA the symbolic device links in /dev/disk/by-hba-slot are used in the configuration and recovery procedures.

After configuring the disk, it must be formatted and returned to the pool of disks serving the original disk's purpose. The steps required depend on the original disk usage. In general, a disk on the BDA can be used for one of three purposes: OS, HDFS, or Oracle NoSQL Database. The first two disks, s0 and s1, are the OS disks. The remaining disks, s2 through s11, are always configured as HDFS disks or Oracle NoSQL Database disks. Note that a service not configured by Mammoth (Kudu, for example) may need to be manually configured after disk replacement.

In no case is the failure of a disk catastrophic, nor should it cause the loss of user data. User data for HDFS is protected because HDFS data blocks are replicated. In rare cases this feature may be turned off, but then the user has made a deliberate choice that the data is non-critical; for example, the TeraSort benchmark (actually TeraGen) turns off replication because the data is generated. Metadata is stored on the root partition, which is mirrored across two disks for redundancy. Oracle NoSQL Database data is also replicated by means of replica servers. Note that Oracle NoSQL Database is an optional feature and does not need to be active on the BDA; NOSQLDB disks are found only on an Oracle NoSQL Database cluster. However, non-HDFS user data on the replaced disk will be lost. If the disk has not failed completely, you can try to back up non-HDFS user-specific data.
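For example, one quick way to verify the logical-to-physical mapping is to filter the LdPdInfo output for the fields shown later in this note (the grep filter is only a convenience and not part of the official procedure):

# MegaCli64 LdPdInfo a0 | grep -E "Virtual Drive:|Slot Number:|Firmware state:"

Each "Virtual Drive:" line is followed by the "Slot Number:" and "Firmware state:" of the physical disk backing it; on a correctly mapped system, virtual drive 4 is followed by Slot Number: 4, as in the example output later in this document.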
The steps for replacing a disk drive and identifying the dedicated function of the disk as either an HDFS disk, an operating system disk, or an Oracle NoSQL Database disk are provided in this document. Once you have replaced the disk drive and identified the function of the drive, follow the steps in the appropriate document listed below to configure the disk drive:

How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u01 and /dev/sda on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581338.1)
How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u02 and /dev/sdb on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581373.1)

How to Configure a Server Disk After Replacement as an HDFS Disk or Oracle NoSQL Database Disk on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581583.1)

Disk Drive Identifiers
The following table (Table 1) shows the mappings between the RAID logical drives, the probable initial kernel device names, and the dedicated function of each drive in an Oracle Big Data Appliance server. The server with the failed drive is part of either a CDH cluster (HDFS) or an Oracle NoSQL Database cluster. This information will be used in a later procedure when partitioning the disk for its appropriate function, so please note which mapping is applicable for the disk drive that is being replaced.

Table 1 - Disk Drive Identifiers

  Symbolic Link to Physical Slot    Probable Initial Device Name    Dedicated Function
  /dev/disk/by-hba-slot/s0          /dev/sda                        Operating system
  /dev/disk/by-hba-slot/s1          /dev/sdb                        Operating system
  /dev/disk/by-hba-slot/s2          /dev/sdc                        HDFS data or Oracle NoSQL Database
  /dev/disk/by-hba-slot/s3          /dev/sdd                        HDFS data or Oracle NoSQL Database
  /dev/disk/by-hba-slot/s4          /dev/sde                        HDFS data or Oracle NoSQL Database
  /dev/disk/by-hba-slot/s5          /dev/sdf                        HDFS data or Oracle NoSQL Database
  /dev/disk/by-hba-slot/s6          /dev/sdg                        HDFS data or Oracle NoSQL Database
  /dev/disk/by-hba-slot/s7          /dev/sdh                        HDFS data or Oracle NoSQL Database
  /dev/disk/by-hba-slot/s8          /dev/sdi                        HDFS data or Oracle NoSQL Database
  /dev/disk/by-hba-slot/s9          /dev/sdj                        HDFS data or Oracle NoSQL Database
  /dev/disk/by-hba-slot/s10         /dev/sdk                        HDFS data or Oracle NoSQL Database
  /dev/disk/by-hba-slot/s11         /dev/sdl                        HDFS data or Oracle NoSQL Database
Standard Mount Points
The following table (Table 2) shows the mappings between HDFS partitions and mount points. This information will be used in a later procedure, so please note which mapping is applicable for the disk drive that is being replaced.

Table 2 - Mount Points

  Symbolic Link to Physical Slot and Partition    Probable Initial Device Name    Mount Point
  /dev/disk/by-hba-slot/s0p4                      /dev/sda4                       /u01
  /dev/disk/by-hba-slot/s1p4                      /dev/sdb4                       /u02
  /dev/disk/by-hba-slot/s2p1                      /dev/sdc1                       /u03
  /dev/disk/by-hba-slot/s3p1                      /dev/sdd1                       /u04
  /dev/disk/by-hba-slot/s4p1                      /dev/sde1                       /u05
  /dev/disk/by-hba-slot/s5p1                      /dev/sdf1                       /u06
  /dev/disk/by-hba-slot/s6p1                      /dev/sdg1                       /u07
  /dev/disk/by-hba-slot/s7p1                      /dev/sdh1                       /u08
  /dev/disk/by-hba-slot/s8p1                      /dev/sdi1                       /u09
  /dev/disk/by-hba-slot/s9p1                      /dev/sdj1                       /u10
  /dev/disk/by-hba-slot/s10p1                     /dev/sdk1                       /u11
  /dev/disk/by-hba-slot/s11p1                     /dev/sdl1                       /u12
Note: MegaCli64, mount, umount and many of the other commands require root login, so the recommendation is to run the entire procedure as root.
Note: The code examples provided here are based on replacing /dev/disk/by-hba-slot/s4 == /dev/sde == /dev/disk/by-hba-slot/s4p1 == /dev/sde1 == /u05. Writing down this chain of four mappings is an easy way to collect the information that will be needed throughout the procedure, so it is best to work out the mapping for the disk being replaced and keep it at hand. As a rule of thumb, the slot number is one less than the mount point number (slot 4 corresponds to /u05). All disk replacements will vary, so replace the examples with the information that applies to the disk replacement being done.

Helpful tips: You can re-confirm the relationship among the disk slot number, the current kernel device name, and the mount point as follows:
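For example, with the slot 4 / /dev/sde / /u05 mapping used throughout this note (the grep filters are only a convenience; the output shown is illustrative and matches the examples later in this document):

# ls -l /dev/disk/by-hba-slot/s4
lrwxrwxrwx 1 root root 9 ... /dev/disk/by-hba-slot/s4 -> ../../sde
# lsscsi | grep sde
[0:2:4:0]    disk    LSI      MR9261-8i        2.12  /dev/sde
# mount -l | grep /u05
/dev/sde1 on /u05 type ext4 (rw,nodev,noatime) [/u05]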
Obtaining the Physical Slot Number of a Disk Drive
Log in as the root user and issue the following MegaCli64 command to verify the mapping of virtual drive numbers to physical slot numbers.

# MegaCli64 LdPdInfo a0 | more
Here is an excerpt of the MegaCli64 LdPdInfo output showing the mapping of virtual drive number 4 to physical slot number 4 in enclosure ID 20:

Virtual Drive: 4 (Target Id: 4)
Name :
RAID Level : Primary-0, Secondary-0, RAID Level Qualifier-0
Size : 1.817 TB
Parity Size : 0
State : Optimal
Strip Size : 64 KB
Number Of Drives : 1
Span Depth : 1
Default Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Cached, No Write Cache if Bad BBU
Access Policy : Read/Write
Disk Cache Policy : Disk's Default
Encryption Type : None
Number of Spans: 1
Span: 0 - Number of PDs: 1
PD: 0 Information
Enclosure Device ID: 20
Slot Number: 4
Drive's postion: DiskGroup: 11, Span: 0, Arm: 0
Enclosure position: 0
Device Id: 15
WWN: 5000C50040A91930
Sequence Number: 5
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.817 TB [0xe8b6d000 Sectors]
Firmware state: Online, Spun Up
Is Commissioned Spare : NO
Device Firmware Level: 061A
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c50040a91931
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST32000SSSUN2.0T061A1141L7Y3G3
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive Temperature :25C (77.00 F)
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
Drive's write cache : Disabled
Port-0 :
Port status: Active
Port's Linkspeed: 6.0Gb/s
Port-1 :
Port status: Active
Port's Linkspeed: Unknown
Drive has flagged a S.M.A.R.T alert : No
...

Replacing a Disk Drive
Note: On BDA V3.0 through BDA V4.1.0, due to the issue reported in Replacing DataNode Disks or Manually Changing the Storage IDs of Volumes in a Cluster May Result in Data Loss on BDA V3.0-V4.1.0 (Doc ID 1997896.1), please file an SR with Oracle Support prior to replacing a disk to determine the best way to avoid the problem.
Note: It is OK to replace a "bad"/"failed" disk on the BDA even if the blue light is not on the drive. The blue light does not turn on for BDA (as it does for other Engineered Systems), and it does not need to be turned on for an otherwise verified "bad"/"failing" disk to be replaced.
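Before pulling a suspect drive, one quick way to review its health counters is to filter the MegaCli64 pdlist output for the fields shown later in this note (this grep is only a convenience and not part of the official steps):

# MegaCli64 pdlist a0 | grep -E "Slot Number:|Media Error Count:|Other Error Count:|Predictive Failure Count:|Firmware state:"

A disk that is predicted to fail typically shows a non-zero Predictive Failure Count or growing error counts for its slot.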
Complete this procedure for all failed disk drives. If multiple disks are unconfigured, then configure them in order from the lowest to the highest slot number.

If the server is part of an Oracle NoSQL Database cluster, stop the Oracle NoSQL Database service:

# service nsdbservice stop
Note: This will take the host offline until the disk is replaced. If the following error is seen when this step is run, it means that the server is not part of an Oracle NoSQL Database cluster:

# service nsdbservice status
OR
# service nsdbservice stop

File not found: /opt/oracle/kv-ce/bin/kvctl

3. Physically replace the failed disk drive. At this point there is no service light support.

In some cases after a disk replacement, when the BDA server is rebooted the server won't come back up and appears to be stuck; checking in ILOM, an "LSI MegaRaid SAS-MFI BIOS" screen may be seen. During the boot process, monitor the graphics console through the ILOM Java console. When loading its BIOS ROM, the new RAID controller will detect the RAID configuration on the disks and may complain that it has a foreign configuration.
This is expected. See the following note on how to resolve this: "MegaRaid BIOS Shows "Foreign configuration(s) found on adapter" On Reboot After BDA node Disk Replacement (Doc ID 1551275.1)."

6. Issue MegaCli64 pdlist a0 and store the physical drive information in a file:

# MegaCli64 pdlist a0 > pdinfo.tmp
Note: This command redirects the output to a file so that you can perform several searches with a text editor. If you prefer, you can pipe the output through the more or grep commands. Look for Unconfigured disks or Foreign disks; a foreign disk is one the controller has seen before, such as a reinserted disk. A Foreign State of "Foreign" and Firmware states of "Unconfigured(bad)" or "Unconfigured(good)" need to be resolved, as detailed below. This is why the pdinfo.tmp output is examined and why the Foreign and Unconfigured states need to be watched for: the procedures that follow fix these states.
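As one quick way to scan the saved file (a convenience only, not part of the official steps), the relevant fields can be pulled out with grep:

# grep -E "Slot Number:|Firmware state:|Foreign State:" pdinfo.tmp

This prints the slot number, firmware state, and foreign state for each physical disk, so disks in the Foreign, Unconfigured(bad), or Unconfigured(good) states are easy to spot.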
The utility returns the following information for each slot. This example shows a Firmware State of Unconfigured(good), Spun Up.

Enclosure Device ID: 20
Slot Number: 4
Enclosure position: 0
Device Id: 18
WWN: 5000C500348B72B4
Sequence Number: 4
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 1.819 TB [0xe8e088b0 Sectors]
Non Coerced Size: 1.818 TB [0xe8d088b0 Sectors]
Coerced Size: 1.817 TB [0xe8b6d000 Sectors]
Firmware state: Unconfigured(good), Spun Up
Is Emergency Spare : NO
Device Firmware Level: 061A
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c500348b72b5
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST32000SSSUN2.0T061A1127L6LX53
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
7. Open the file you created in the last step in a text editor and search for the following: disks with a Foreign State of Foreign, and disks with a Firmware state of Unconfigured(bad) or Unconfigured(good).
The following example shows a Firmware state of Unconfigured(bad) and a Foreign State of Foreign: ...
Firmware state: Unconfigured(bad)
Device Firmware Level: 061A
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x5000c50040a4a8b9
SAS Address(1): 0x0
Connected Port Number: 0(path0)
Inquiry Data: SEAGATE ST32000SSSUN2.0T061A1140L7VSMZ
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: Foreign
...

8. For disks that have a Foreign State of Foreign, clear that status:

# MegaCli64 CfgForeign clear a0
Example output which shows the foreign state is cleared:

# MegaCli64 CfgForeign clear a0
Foreign configuration 0 is cleared on controller 0.
Exit Code: 0x00

A foreign disk is one that the controller saw previously, such as a reinserted disk.

9. For disks that have a Firmware State of Unconfigured(bad), complete these steps. If multiple disks are unconfigured, then configure them in order from the lowest to the highest slot number:

a. Note the enclosure device ID and slot number of the disk from the pdinfo.tmp file. For example:
Enclosure Device ID: 20
Slot Number: 4

b. Enter a command in this format, replacing the enclosure and slot numbers with the values found in the pdinfo.tmp file created by running "MegaCli64 pdlist a0 > pdinfo.tmp" in step 6 above:

# MegaCli64 pdmakegood physdrv[enclosure:slot] a0
For example, [20:4] repairs the disk identified by enclosure 20 in slot 4.

# MegaCli64 pdmakegood physdrv[20:4] a0
c. Check the current status of Foreign State again: # MegaCli64 pdlist a0 | grep Foreign
d. If the Foreign State is still Foreign, then repeat the clear command: # MegaCli64 CfgForeign clear a0
10. For disks that have a Firmware State of Unconfigured(good), use the following command. If multiple disks are unconfigured, then configure them in order from the lowest to the highest slot number as below. Use this step whether the state is "Unconfigured(good), Spun up" or "Unconfigured(good), Spun down".

# MegaCli64 CfgLdAdd r0[enclosure:slot] a0
For example, [20:4] configures the disk at enclosure ID 20, slot number 4, assuming virtual disk number 4 (which would map to /dev/sde).

# MegaCli64 CfgLdAdd r0[20:4] a0
Adapter 0: Created VD 4
Adapter 0: Configured the Adapter!!
Exit Code: 0x00

The following example shows configuring multiple unconfigured disks in slots 4 and 5, from lowest to highest:

# MegaCli64 CfgLdAdd r0[20:4] a0
Adapter 0: Created VD 4
Adapter 0: Configured the Adapter!!
Exit Code: 0x00

# MegaCli64 CfgLdAdd r0[20:5] a0
Adapter 0: Created VD 5
Adapter 0: Configured the Adapter!!
Exit Code: 0x00

11. If the CfgLdAdd command fails because cached data is present, use MegaCli64 -DiscardPreservedCache -Ln a0 to clear the cache for the logical drive, where n is the number of the slot used:

# MegaCli64 -DiscardPreservedCache -Ln a0
For the above example it would be: # MegaCli64 -DiscardPreservedCache -L4 a0
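If you want to confirm whether preserved cache exists before discarding it, many MegaCli64 builds also provide a list option (verify that your version supports it; this is offered only as an optional cross-check, not part of the official steps):

# MegaCli64 -GetPreservedCacheList -a0

This lists the virtual drives that still hold preserved cache on adapter 0.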
Note: Do NOT use CfgEachDskRaid0, this command reverses the slot and virtual disk numbers, so slot11 becomes virtual disk 0.
12. Verify that the operating system recognizes the new disk drive:

# lsscsi
The disk may appear with its original device name (such as /dev/sdc) or under a new device name (such as /dev/sdn). If the operating system does not recognize the disk, then the disk is missing from the list generated by the lsscsi command.

This example output shows a disk with a new device name, /dev/sdn, in slot 4:

# lsscsi
[0:0:20:0]   enclosu SUN      HYDE12           0341  -
[0:2:0:0]    disk    LSI      MR9261-8i        2.12  /dev/sda
[0:2:1:0]    disk    LSI      MR9261-8i        2.12  /dev/sdb
[0:2:2:0]    disk    LSI      MR9261-8i        2.12  /dev/sdc
[0:2:3:0]    disk    LSI      MR9261-8i        2.12  /dev/sdd
[0:2:4:0]    disk    LSI      MR9261-8i        2.12  /dev/sdn
[0:2:5:0]    disk    LSI      MR9261-8i        2.12  /dev/sdf
[0:2:6:0]    disk    LSI      MR9261-8i        2.12  /dev/sdg
[0:2:7:0]    disk    LSI      MR9261-8i        2.12  /dev/sdh
[0:2:8:0]    disk    LSI      MR9261-8i        2.12  /dev/sdi
[0:2:9:0]    disk    LSI      MR9261-8i        2.12  /dev/sdj
[0:2:10:0]   disk    LSI      MR9261-8i        2.12  /dev/sdk
[0:2:11:0]   disk    LSI      MR9261-8i        2.12  /dev/sdl
[7:0:0:0]    disk    Unigen   PSA4000          1100  /dev/sdm

This example output shows a disk with a new device name, /dev/sdm, in slot 7:

# lsscsi
[0:0:20:0]   enclosu SUN      HYDE12           0341  -
[0:2:0:0]    disk    LSI      MR9261-8i        2.12  /dev/sda
[0:2:1:0]    disk    LSI      MR9261-8i        2.12  /dev/sdb
[0:2:2:0]    disk    LSI      MR9261-8i        2.12  /dev/sdc
[0:2:3:0]    disk    LSI      MR9261-8i        2.12  /dev/sdd
[0:2:4:0]    disk    LSI      MR9261-8i        2.12  /dev/sde
[0:2:5:0]    disk    LSI      MR9261-8i        2.12  /dev/sdf
[0:2:6:0]    disk    LSI      MR9261-8i        2.12  /dev/sdg
[0:2:7:0]    disk    LSI      MR9261-8i        2.12  /dev/sdm
[0:2:8:0]    disk    LSI      MR9261-8i        2.12  /dev/sdh
[0:2:9:0]    disk    LSI      MR9261-8i        2.12  /dev/sdi
[0:2:10:0]   disk    LSI      MR9261-8i        2.12  /dev/sdj
[0:2:11:0]   disk    LSI      MR9261-8i        2.12  /dev/sdk
[7:0:0:0]    disk    Unigen   PSA4000          1100  /dev/sdl

13. Check the hardware profile of the server as root, and correct any errors:

# bdacheckhw
Example excerpt from running bdacheckhw:

# bdacheckhw

If either the wrong number of virtual disks is reported or there are multiple incorrect slot mappings, see: After hdfs Disk Replacement bdacheckhw Raises "ERROR: Wrong number of virtual disks : 13" and Reports "Wrong slot mapping to HBA target" Errors (Doc ID 2227128.1).

14. Check the software profile of the server as root, and correct any errors:

# bdachecksw
Mount errors stating that the replaced disk partition is missing can be ignored at this point. For example, if disk /u08 in slot 7 has been replaced and bdachecksw reports the error below, it can be ignored:

ERROR: Wrong mounted partitions : /dev/md0 /boot ext3 /dev/md2 / ext3 /dev/sda4 /u01 ext4 /dev/sdb4 /u02 ext4 /dev/sdc1 /u03 ext4 /dev/sdd1 /u04 ext4 /dev/sde1 /u05 ext4 /dev/sdf1 /u06 ext4 /dev/sdg1 /u07 ext4 /dev/sdi1 /u09 ext4 /dev/sdj1 /u10 ext4 /dev/sdk1 /u11 ext4 /dev/sdl1 /u12 ext4
INFO: Expected mounted partitions : 12 data partitions, /boot and /

But if bdachecksw reports duplicate mount points or the slots are switched, then refer to "Correcting a Mounted Partitions Error."

In BDA V4.3 and higher, errors like "Wrong software Root/Boot RAID info" can be ignored if issued for one of the first two disks. This is a check to confirm the health of the RAID partitions and is expected when one of the first two disks goes bad.

ERROR: Wrong software Root RAID info : removed active sync
INFO: Expected software Root RAID info : active sync active sync
ERROR: Wrong software Boot RAID info : removed active sync
INFO: Expected software Boot RAID info : active sync active sync
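To see the underlying software RAID state that these messages refer to, you can query the Boot and Root arrays directly (shown elsewhere in this note as /dev/md0 for /boot and /dev/md2 for /); this is only an optional cross-check:

# cat /proc/mdstat
# mdadm -Q --detail /dev/md0
# mdadm -Q --detail /dev/md2

A degraded array shows a "removed" member, matching the bdachecksw output above.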
15. Identify the function of the drive, so you configure it properly. See "Identifying the Function of a Disk Drive."

Correcting a Mounted Partitions Error
When the bdachecksw utility finds a problem, it typically concerns the mounted partitions.

1. Dismount the affected mount point, where n is the mount point number (repeat the command if the mount point is listed more than once):

# umount /u0n
This example dismounts two instances of /u03: # umount /u03
# umount /u03

2. Remount the mount point:

# mount /u0n
This example remounts /u03: # mount /u03
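To confirm whether a mount point is currently mounted more than once, or not mounted at all, you can filter the mount listing (a convenience check only):

# mount -l | grep /u03

A duplicate mount shows up as two lines for the same mount point; no output means the mount point is not currently mounted.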
If a disk is in the wrong slot (that is, the virtual drive number), then you can switch two drives.

3. Remove both drives from the controller configuration, where n is the virtual drive number of each disk:

# MegaCli64 CfgLdDel Ln a0
# MegaCli64 CfgLdDel Ln a0

This example removes the drives from slots 4 and 10:

# MegaCli64 CfgLdDel L4 a0
# MegaCli64 CfgLdDel L10 a0

4. Add the drives back to the correct slots, where n is the slot number:

# MegaCli64 CfgLdAdd r0[20:n] a0
# MegaCli64 CfgLdAdd r0[20:n] a0

Example:

# MegaCli64 CfgLdAdd r0[20:4] a0
# MegaCli64 CfgLdAdd r0[20:5] a0

Identifying the Function of a Disk Drive
The server with the failed disk is configured to support either HDFS or Oracle NoSQL Database, and most disks are dedicated to that purpose. However, two disks are dedicated to the operating system. Before configuring the new disk, find out how the failed disk was configured. Refer to Table 1 for details on the purpose of the disk.

Checking for Use by the Operating System
Oracle Big Data Appliance is configured with the operating system on the first two disks. To confirm that a failed disk supported the operating system:

1. Check whether the replacement disk corresponds to slot 0 ( [0:2:0:0] or /dev/sda ) or slot 1 ( [0:2:1:0] or /dev/sdb ), which are the operating system disks.

# lsscsi
See the lsscsi output shown in step 12 of "Replacing a Disk Drive."

2. Verify that /dev/sda and /dev/sdb are the operating system mirrored partitioned disks:

# mdadm -Q --detail /dev/md2
Example output: # mdadm -Q --detail /dev/md2
/dev/md2:
Version : 0.90
Creation Time : Tue Nov 6 13:54:28 2012
Raid Level : raid1
Array Size : 174079936 (166.02 GiB 178.26 GB)
Used Dev Size : 174079936 (166.02 GiB 178.26 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Mon Dec 10 10:36:00 2012
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : 650d17ed:bf5b8031:82afe139:8036a906
Events : 0.1290
Number Major Minor RaidDevice State
0 8 2 0 active sync /dev/sda2
1 0 0 1 removed

3. If the previous steps indicate that the failed disk is an operating system disk, then follow the installation steps in "How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u01 and /dev/sda on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581338.1)" or "How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u02 and /dev/sdb on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581373.1)." To determine which note to follow, refer to the previous steps and Table 1 / Table 2 and use the document appropriate for the disk that has been replaced.

4. If the failed disk did not support the operating system, then follow the installation steps in "How to Configure a Server Disk After Replacement as an HDFS Disk or Oracle NoSQL Database Disk on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581583.1)."

Prerequisites for Replacing a Working / Failing Disk

Note: A "working" disk does not mean a "healthy" disk. In this case a "working" disk is still functional, but it is unhealthy because it is predicted to fail shortly.
If you plan to replace a working (predicted to fail) or failing HDFS disk or an operating system disk, then you should first dismount the HDFS partitions; if the disk is not already dismounted, dismount it now. You must also turn off swapping before replacing an operating system disk.

Caution: Only dismount HDFS partitions. For an operating system disk, ensure that you do not dismount operating system partitions. Only partition 4 (sda4 or sdb4) of an operating system disk is used for HDFS.
To dismount HDFS partitions:

Important: disks 0 and 1 hold swap partitions. Bringing either of these disks down without deactivating swap (with bdaswapoff) will trigger a reboot; removing a disk with active swapping crashes the kernel. Do not run this step (bdaswapoff) if you are not replacing disk 0 or disk 1.
# bdaswapoff
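To confirm that swap is inactive before the disk is pulled (an optional check, not part of the official steps), list the active swap devices; after bdaswapoff the list should be empty:

# swapon -s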
3. List the mounted HDFS partitions: # mount -l
Sample output: # mount -l
/dev/md2 on / type ext3 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda4 on /u01 type ext4 (rw,nodev,noatime) [/u01]
/dev/sdb4 on /u02 type ext4 (rw,nodev,noatime) [/u02]
/dev/sdc1 on /u03 type ext4 (rw,nodev,noatime) [/u03]
/dev/sdd1 on /u04 type ext4 (rw,nodev,noatime) [/u04]
/dev/sde1 on /u05 type ext4 (rw,nodev,noatime) [/u05]
/dev/sdf1 on /u06 type ext4 (rw,nodev,noatime) [/u06]
/dev/sdg1 on /u07 type ext4 (rw,nodev,noatime) [/u07]
/dev/sdh1 on /u08 type ext4 (rw,nodev,noatime) [/u08]
/dev/sdi1 on /u09 type ext4 (rw,nodev,noatime) [/u09]
/dev/sdj1 on /u10 type ext4 (rw,nodev,noatime) [/u10]
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/mapper/lvg1-lv1 on /lv1 type ext4 (rw,nodev)

4. Check the list of mounted partitions for the failed disk. If the disk has no partitions listed, then proceed to the section titled "Replacing a Disk Drive." Otherwise, continue to the next step.

Caution: Only dismount HDFS partitions. For an operating system disk, ensure that you do not dismount operating system partitions. Only partition 4 (sda4 or sdb4) of an operating system disk is used for HDFS.
If dismounting /dev/sda4 on /u01 use: # umount /u01
If dismounting /dev/sdb4 on /u02 use: # umount /u02

5. Dismount the HDFS mount point for the failed disk as the root user. Replace mountpoint below with the mount point obtained earlier, as shown in the Standard Mount Points table (Table 2) above:

# umount mountpoint
Example of dismounting /u05:

# umount /u05
If the umount commands succeed, then verify the partition is no longer listed by listing the mounted HDFS partitions:

# mount -l
Sample output shows that /u05 has been dismounted: # mount -l
/dev/md2 on / type ext3 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sda4 on /u01 type ext4 (rw,nodev,noatime) [/u01]
/dev/sdb4 on /u02 type ext4 (rw,nodev,noatime) [/u02]
/dev/sdc1 on /u03 type ext4 (rw,nodev,noatime) [/u03]
/dev/sdd1 on /u04 type ext4 (rw,nodev,noatime) [/u04]
/dev/sdf1 on /u06 type ext4 (rw,nodev,noatime) [/u06]
/dev/sdg1 on /u07 type ext4 (rw,nodev,noatime) [/u07]
/dev/sdh1 on /u08 type ext4 (rw,nodev,noatime) [/u08]
/dev/sdi1 on /u09 type ext4 (rw,nodev,noatime) [/u09]
/dev/sdj1 on /u10 type ext4 (rw,nodev,noatime) [/u10]
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
/dev/mapper/lvg1-lv1 on /lv1 type ext4 (rw,nodev)

Proceed to "Replacing a Disk Drive."

If a umount command fails with a device busy message, then the partition is still in use. Continue to the next step. Example:

# umount /u05
umount: /u05: device is busy
umount: /u05: device is busy

6. Open a browser window to Cloudera Manager. For example:

http://bda1node02.example.com:7180
7. Complete these steps in Cloudera Manager to remove the failed disk's mount point from the DataNode configuration:

Note: If you remove mount points in Cloudera Manager as described in the following steps, then you must restore these mount points in Cloudera Manager after finishing all other configuration procedures.

Remove the mount point of the failed disk from the configuration of the DataNode running on the host with the failed disk, and save the changes. In this example, /u05/hadoop/dfs has been removed.

Then restart the DataNode: click the button that says "Restart this DataNode."
On BDA V3.* and higher follow steps 8 and 9. On BDA versions less than V3.* proceed to step 10 and omit steps 8 and 9.
8. For BDA V3.* and higher, in Cloudera Manager remove the mount point from NodeManager Local Directories:
a. On the Services page, click Yarn.
b. In the Status Summary, click NodeManager.
c. From the list, click to select the NodeManager that is on the host with the failed disk.
d. Click the Configuration subtab.
e. Remove the mount point from the NodeManager.
f. Click Save Changes.
g. From the Actions list, choose Restart this NodeManager.

9. For BDA V3.* and higher, if you have added any other roles that store data on the same HDFS mount point (such as HBase Region Server), then remove and restore the mount points for these roles in the same way.

10. Return to your session on the server with the failed drive. (Perform step 10 and onward on all BDA versions.)

11. Try again to dismount the HDFS mount point:

# umount mountpoint
Example of dismounting /u05, showing that it succeeds:

# umount /u05
If the umount still fails:

a) Try a lazy umount, i.e. "umount -l". This detaches the filesystem from the filesystem hierarchy and cleans up all references to the filesystem as soon as it is no longer busy. For example:

# umount -l /u05
b) Run lsof to list open files under the HDFS mount point and the processes that opened them. This may help you to identify the process that is preventing the unmount. For example: # lsof | grep /u05
c) Run fuser to find which processes are using the filesystem. For example:

# fuser -m -u /u05
12. Use MegaCli64 to bring the disk offline. Replace enclosure and slot with the enclosure and slot numbers on your system:

# MegaCli64 PDoffline "physdrv[enclosure:slot]" a0
For example, "physdrv[20:10]" identifies the disk located in slot 10 of enclosure 20.

# MegaCli64 PDoffline "physdrv[20:10]" a0
If the PD is already offline and/or the virtual disk has already been deleted, the commands might fail as follows. You can ignore these errors and continue with the next steps.

# MegaCli64 PDoffline "physdrv[20:10]" a0
Adapter 0: Device at Enclosure - 20, Slot - 10 is not found.
Exit Code: 0x01

# MegaCli64 CfgLDDel L10 a0
Adapter 0: Virtual Drive 10 Does not Exist.
13. Delete the disk from the controller configuration table: # MegaCli64 CfgLDDel Lslot a0
For example, L10 identifies slot 10. # MegaCli64 CfgLDDel L10 a0
14. Complete the steps in the section titled "Replacing a Disk Drive."

What If a Server Fails to Restart?
The server may restart during the disk replacement procedures, either because you issued a reboot command or made an error in a MegaCli64 command. In most cases, the server restarts successfully, and you can continue working. However, in other cases, an error occurs so that you cannot reconnect using ssh. In this case, you must complete the reboot using Oracle ILOM.

1. Open a browser and connect to Oracle ILOM for the server. For example:

http://bda1node12-c.example.com
Note: Your browser must have a JDK plug-in installed. If you do not see the Java coffee cup on the log-in page, then you must install the plug-in before continuing.
2. Log in using your Oracle ILOM credentials. See the following documentation for more information: Oracle Integrated Lights Out Manager (ILOM) 3.0 documentation at http://docs.oracle.com/cd/E19860-01/
References

NOTE:1551275.1 - MegaRaid BIOS Shows "Foreign configuration(s) found on adapter" On Reboot After BDA node Disk Replacement
NOTE:1581338.1 - How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u01 and /dev/sda on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x
NOTE:1581373.1 - How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u02 and /dev/sdb on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x
NOTE:1581583.1 - How to Configure a Server Disk After Replacement as an HDFS Disk or Oracle NoSQL Database Disk on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x

Attachments
This solution has no attachment