Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1581338.1
Update Date: 2018-04-04

Solution Type: Predictive Self-Healing

Solution  1581338.1 :   How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u01 and /dev/sda on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x  


Related Items
  • Big Data Appliance X3-2 In-Rack Expansion
  • Big Data Appliance X6-2 Hardware
  • Big Data Appliance X5-2 Starter Rack
  • Big Data Appliance X4-2 Starter Rack
  • Big Data Appliance X5-2 Hardware
  • Big Data Appliance Hardware
  • Big Data Appliance X5-2 In-Rack Expansion
  • Big Data Appliance X3-2 Full Rack
  • Big Data Appliance X4-2 In-Rack Expansion
  • Big Data Appliance X4-2 Full Rack
  • Big Data Appliance X4-2 Hardware
  • Big Data Appliance X3-2 Hardware
  • Big Data Appliance X5-2 Full Rack
  • Big Data Appliance X3-2 Starter Rack
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Purpose
Scope
Details
 Overview
 About Disk Drive Identifiers
 Disk Drive Identifiers
 Standard Mount Points
 Configuring an Operating System Disk
 Partitioning the Operating System Disk
 Repairing the RAID Arrays
 Formatting the HDFS Partition of an Operating System Disk
 Restoring the Swap Partition
 Restoring the GRUB Master Boot Records and HBA Boot Order
 Verifying the Disk Configuration
 What If Firmware Warnings or Errors occur?
 What If a Server Fails to Restart?
References


Applies to:

Big Data Appliance X5-2 Full Rack - Version All Versions to All Versions [Release All Releases]
Big Data Appliance X3-2 Starter Rack - Version All Versions to All Versions [Release All Releases]
Big Data Appliance X3-2 In-Rack Expansion - Version All Versions to All Versions [Release All Releases]
Big Data Appliance X4-2 In-Rack Expansion - Version All Versions to All Versions [Release All Releases]
Big Data Appliance X4-2 Full Rack - Version All Versions to All Versions [Release All Releases]
Linux x86-64

Purpose

This document describes the steps for configuring a server's disk drive as an Operating System Disk for the /u01 and /dev/sda disk in Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x.

Scope

This document is to be used by anyone who is configuring the disk. If you attempt these steps and need further assistance, please log a service request to contact support for help.

Details

Overview

Failure of a disk is never catastrophic on Oracle Big Data Appliance. No user data should be lost. Data stored in HDFS or Oracle NoSQL Database is automatically replicated.

The following are the basic steps for replacing a server disk drive and configuring it as an Operating System Disk:

1. Replace the failed disk drive.

2. Perform the basic configuration steps for the new disk. If multiple disks are unconfigured, then configure them in order from the lowest to the highest slot number. Finish all the steps for one disk and then start with all the steps for the next.

3. Identify the dedicated function of the failed disk, either as an HDFS disk, an operating system disk, or an Oracle NoSQL Database disk.

The steps for 1, 2, and 3 are listed in "Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581331.1)."

4. Configure the disk for the operating system.

5. Verify that the configuration is correct. 

The steps for 4 and 5 are listed here in this document.
See the following tables to identify the function of the drive, the slot number, and the mount point which will be used later in the procedure.

 

About Disk Drive Identifiers

The Oracle Big Data Appliance includes a disk enclosure cage that holds 12 disk drives and is controlled by the HBA (Host Bus Adapter). The drives in this enclosure are identified by slot numbers (0..11) and can have different purposes; for example, the drives in slots 0 and 1 hold RAID 1 operating system and boot partitions. The drives can be dedicated to specific functions, as shown in Table 1.

Version 2 of the image introduces new device symbolic links in /dev/disk/by-hba-slot/. The links refer to the physical location, or slot number, of a disk inside the disk enclosure cage. The links are of the form s<n>p<m>, where n is the slot number and m is the partition number. For example, in an unaltered system /dev/disk/by-hba-slot/s0p1 corresponds to /dev/sda1, ..s0p4 to ..sda4, ..s1p1 to ..sdb1, etc., and disk /dev/sda itself corresponds to /dev/disk/by-hba-slot/s0, ..sdb to ..s1, etc.

When a disk is hot swapped, the operating system cannot reuse the kernel device name. Instead, it allocates a new device name. For example, if /dev/sda was hot swapped, then the disk corresponding to /dev/disk/by-hba-slot/s0 may link to /dev/sdn instead of /dev/sda. The links in /dev/disk/by-hba-slot/ are automatically updated (by udev rules) when devices are added or removed. Hence the symbolic device links in /dev/disk/by-hba-slot are preferred in configuration and recovery procedures.

  • Slot number device names in /dev/disk/by-hba-slot/ are based on virtual disk numbers as exposed by the HBA. The physical slot position is specified when creating the virtual drive using the HBA CLI MegaCli64. The base image creates virtual disks with corresponding physical slot positions.
  • Although symbolic device names can be used in OS commands, these device names are resolved to kernel sysfs device names, and command output may list them as kernel device names. Thus /dev/disk/by-hba-slot/s0 can be listed as /sys/block/sda.
       

In the rest of this document, disks and partitions are referred to by their symbolic names in /dev/disk/by-hba-slot/, for example s0 or s0p4.

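The link resolution described above can be demonstrated with readlink. The sketch below simulates /dev/disk/by-hba-slot with a temporary directory, so it runs without BDA hardware; on a server you would run readlink -f directly against the real links:

```shell
#!/bin/sh
# Sketch: resolve by-hba-slot style links to kernel device names.
# SLOTDIR stands in for /dev/disk/by-hba-slot (illustrative only).
set -e
SLOTDIR=$(mktemp -d)
# Simulate udev-managed links: slot 0 -> sda, slot 4 -> sdn (hot-swapped).
touch "$SLOTDIR/sda" "$SLOTDIR/sdn"
ln -s "$SLOTDIR/sda" "$SLOTDIR/s0"
ln -s "$SLOTDIR/sdn" "$SLOTDIR/s4"

# Resolve each slot link to its current kernel device name.
s0_dev=$(basename "$(readlink -f "$SLOTDIR/s0")")
s4_dev=$(basename "$(readlink -f "$SLOTDIR/s4")")
echo "s0 -> $s0_dev"
echo "s4 -> $s4_dev"
rm -rf "$SLOTDIR"
```

On a live server, `ls -l /dev/disk/by-hba-slot/` shows all twelve mappings at once.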

Disk Drive Identifiers

The following table (Table 1) shows the mappings between the RAID logical drives, the probable initial kernel device names, and the dedicated function of each drive in an Oracle Big Data Appliance server.
This information is used in a later procedure when partitioning the disk for its appropriate function, so note which mapping applies to the disk drive being replaced.

Table 1 - Disk Drive Identifiers

Physical Slot  Symbolic Link to Physical Slot  Initial Operating System Location  Dedicated Function
0 /dev/disk/by-hba-slot/s0 /dev/sda Operating system
1 /dev/disk/by-hba-slot/s1 /dev/sdb Operating system
2 /dev/disk/by-hba-slot/s2 /dev/sdc HDFS or Oracle NoSQL Database
3 /dev/disk/by-hba-slot/s3 /dev/sdd HDFS or Oracle NoSQL Database
4 /dev/disk/by-hba-slot/s4 /dev/sde HDFS or Oracle NoSQL Database
5 /dev/disk/by-hba-slot/s5 /dev/sdf HDFS or Oracle NoSQL Database
6 /dev/disk/by-hba-slot/s6 /dev/sdg HDFS or Oracle NoSQL Database
7 /dev/disk/by-hba-slot/s7 /dev/sdh HDFS or Oracle NoSQL Database
8 /dev/disk/by-hba-slot/s8 /dev/sdi HDFS or Oracle NoSQL Database
9 /dev/disk/by-hba-slot/s9 /dev/sdj HDFS or Oracle NoSQL Database
10 /dev/disk/by-hba-slot/s10 /dev/sdk HDFS or Oracle NoSQL Database
11 /dev/disk/by-hba-slot/s11 /dev/sdl HDFS or Oracle NoSQL Database

 

Standard Mount Points

The following table (Table 2) shows the mappings between HDFS partitions and mount points. This information is used in a later procedure, so note which mapping applies to the disk drive being replaced.

Table 2 - Mount Points

Physical Slot  Symbolic Link to Physical Slot and Partition  HDFS Partition  Mount Point
0 /dev/disk/by-hba-slot/s0p4 /dev/sda4 /u01
1 /dev/disk/by-hba-slot/s1p4 /dev/sdb4 /u02
2 /dev/disk/by-hba-slot/s2p1 /dev/sdc1 /u03
3 /dev/disk/by-hba-slot/s3p1 /dev/sdd1 /u04
4 /dev/disk/by-hba-slot/s4p1 /dev/sde1 /u05
5 /dev/disk/by-hba-slot/s5p1 /dev/sdf1 /u06
6 /dev/disk/by-hba-slot/s6p1 /dev/sdg1 /u07
7 /dev/disk/by-hba-slot/s7p1 /dev/sdh1 /u08
8 /dev/disk/by-hba-slot/s8p1 /dev/sdi1 /u09
9 /dev/disk/by-hba-slot/s9p1 /dev/sdj1 /u10
10 /dev/disk/by-hba-slot/s10p1 /dev/sdk1 /u11
11 /dev/disk/by-hba-slot/s11p1 /dev/sdl1 /u12

 

Note: mount, umount, reboot, and many of the other commands require root privileges, so the recommendation is to run the entire procedure as root.

 

Note: The code examples provided here are based on replacing /dev/disk/by-hba-slot/s0 (= /dev/sda), whose HDFS partition is /dev/disk/by-hba-slot/s0p4 (= /dev/sda4), mounted at /u01. These four mappings provide the information that will be needed throughout the procedure, so it is best to work out the equivalent mappings for the disk being replaced and write them down before starting. As a rule of thumb, the slot number is one less than the mount point number. All disk replacements vary, so substitute the correct values for the disk replacement being done.
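As a quick sanity check, the slot, partition, and mount-point arithmetic can be captured in a few helper functions. This is an illustrative sketch (the helper names are not BDA utilities), assuming the standard mappings shown in Tables 1 and 2:

```shell
#!/bin/sh
# Illustrative helpers for the standard BDA slot mappings.

# Mount point for a slot: slot n maps to /u0(n+1), zero-padded.
slot_to_mount() { printf '/u%02d\n' $(( $1 + 1 )); }

# Symbolic link for a given slot and partition number.
slot_to_link() { printf '/dev/disk/by-hba-slot/s%dp%d\n' "$1" "$2"; }

# HDFS partition number: the OS disks (slots 0 and 1) use partition 4,
# all other disks use partition 1.
hdfs_part() { if [ "$1" -le 1 ]; then echo 4; else echo 1; fi; }

slot=0
echo "slot $slot: $(slot_to_link "$slot" "$(hdfs_part "$slot")") -> $(slot_to_mount "$slot")"
```

For slot 0 this prints the /dev/disk/by-hba-slot/s0p4 to /u01 mapping used in the examples below.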

Helpful Tips: You can re-confirm the relationship among the disk slot number, the current kernel device name and mount point as follows:

1. Re-confirm the relationship between slot number and the current kernel device name using "lsscsi" command.
The lsscsi command shows slot number X as "[0:2:X:0]". For example, [0:2:0:0] means slot number 0, and [0:2:11:0] means slot number 11.

# lsscsi
[0:0:20:0]   enclosu ORACLE   CONCORD14        0d03  -       
[0:2:0:0]    disk    LSI      MR9261-8i        2.13  /dev/sda
[0:2:1:0]    disk    LSI      MR9261-8i        2.13  /dev/sdb
[0:2:2:0]    disk    LSI      MR9261-8i        2.13  /dev/sdc
[0:2:3:0]    disk    LSI      MR9261-8i        2.13  /dev/sdd
[0:2:4:0]    disk    LSI      MR9261-8i        2.13  /dev/sdn

(note: slot 4 now shows as /dev/sdn because that disk was hot-swapped)
[0:2:5:0]    disk    LSI      MR9261-8i        2.13  /dev/sdf
[0:2:6:0]    disk    LSI      MR9261-8i        2.13  /dev/sdg
[0:2:7:0]    disk    LSI      MR9261-8i        2.13  /dev/sdh
[0:2:8:0]    disk    LSI      MR9261-8i        2.13  /dev/sdi
[0:2:9:0]    disk    LSI      MR9261-8i        2.13  /dev/sdj
[0:2:10:0]   disk    LSI      MR9261-8i        2.13  /dev/sdk
[0:2:11:0]   disk    LSI      MR9261-8i        2.13  /dev/sdl
[7:0:0:0]    disk    ORACLE   UNIGEN-UFD       PMAP  /dev/sdm  

In this case, you can re-confirm the following:

    slot 0:   /dev/sda
    slot 1:   /dev/sdb
    slot 2:   /dev/sdc
    slot 3:   /dev/sdd
    slot 4:   /dev/sdn
    slot 5:   /dev/sdf
    slot 6:   /dev/sdg
    slot 7:   /dev/sdh
    slot 8:   /dev/sdi
    slot 9:   /dev/sdj
    slot 10: /dev/sdk
    slot 11: /dev/sdl


2. Re-confirm the relationship between the current kernel device name and mount point using "mount" command.

# mount -l | grep /u
/dev/sda4 on /u01 type ext4 (rw,nodev,noatime) [/u01]
/dev/sdb4 on /u02 type ext4 (rw,nodev,noatime) [/u02]
/dev/sdc1 on /u03 type ext4 (rw,nodev,noatime) [/u03]
/dev/sdd1 on /u04 type ext4 (rw,nodev,noatime) [/u04]
/dev/sdn1 on /u05 type ext4 (rw,nodev,noatime) [/u05]

(note: /u05 is now mounted from /dev/sdn1, the hot-swapped disk)
/dev/sdf1 on /u06 type ext4 (rw,nodev,noatime) [/u06]
/dev/sdg1 on /u07 type ext4 (rw,nodev,noatime) [/u07]
/dev/sdh1 on /u08 type ext4 (rw,nodev,noatime) [/u08]
/dev/sdi1 on /u09 type ext4 (rw,nodev,noatime) [/u09]
/dev/sdj1 on /u10 type ext4 (rw,nodev,noatime) [/u10]
/dev/sdk1 on /u11 type ext4 (rw,nodev,noatime) [/u11]
/dev/sdl1 on /u12 type ext4 (rw,nodev,noatime) [/u12]

In this case, you can re-confirm the following:

    /dev/sda:   /u01
    /dev/sdb:   /u02
    /dev/sdc:    /u03
    /dev/sdd:   /u04
    /dev/sdn:   /u05
    /dev/sdf:    /u06
    /dev/sdg:   /u07
    /dev/sdh:   /u08
    /dev/sdi:    /u09
    /dev/sdj:    /u10
    /dev/sdk:   /u11
    /dev/sdl:    /u12

3. From the outputs above, you can re-confirm the relationship among them as follows:

    slot 0:  /dev/sda: /u01
    slot 1:  /dev/sdb: /u02
    slot 2:  /dev/sdc: /u03
    slot 3:  /dev/sdd: /u04
    slot 4:  /dev/sdn: /u05  (the hot-swapped disk)
    slot 5:  /dev/sdf: /u06
    slot 6:  /dev/sdg: /u07
    slot 7:  /dev/sdh: /u08
    slot 8:  /dev/sdi: /u09
    slot 9:  /dev/sdj: /u10
    slot 10: /dev/sdk: /u11
    slot 11: /dev/sdl: /u12
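The cross-check above can be scripted. The sketch below parses sample lsscsi and mount output (embedded as shell variables so the parsing runs anywhere) and prints the combined slot/device/mount-point mapping; on a live server you would feed it the real command output instead:

```shell
#!/bin/sh
# Sketch: combine lsscsi and mount output into slot -> device -> mount.
# Sample data stands in for the real commands (illustrative only).
lsscsi_out='[0:2:0:0]    disk    LSI      MR9261-8i        2.13  /dev/sda
[0:2:4:0]    disk    LSI      MR9261-8i        2.13  /dev/sdn'
mount_out='/dev/sda4 on /u01 type ext4 (rw,nodev,noatime) [/u01]
/dev/sdn1 on /u05 type ext4 (rw,nodev,noatime) [/u05]'

result=$(printf '%s\n' "$lsscsi_out" "$mount_out" | awk '
  # lsscsi line: slot is the third colon-separated field of "[0:2:X:0]".
  $1 ~ /^\[0:2:/ { split($1, a, ":"); dev_of[a[3]] = $NF; next }
  # mount line: strip the partition digits to get the base device.
  $2 == "on"     { d = $1; sub(/[0-9]+$/, "", d); mnt[d] = $3 }
  END {
    for (s = 0; s <= 11; s++)
      if (s in dev_of)
        printf "slot %s: %s: %s\n", s, dev_of[s], mnt[dev_of[s]]
  }')
echo "$result"
```

On a server, replace the sample variables with `lsscsi_out=$(lsscsi)` and `mount_out=$(mount -l | grep /u)`.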

Configuring an Operating System Disk

The first two disks support the Linux operating system. These disks store a copy of the mirrored operating system, a swap partition, a mirrored boot partition, and an HDFS data partition. To configure an operating system disk, you must copy the partition table from the surviving disk, create an HDFS partition (ext4 file system), and add the software RAID partitions and boot partitions for the operating system.

Complete these procedures after replacing logical drive 0 (/dev/sda). For logical drive 1 (/dev/sdb), follow the procedure in "How to Configure a Server Disk After Disk Replacement as an Operating System Disk /u02 and /dev/sdb on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581373.1)."

If multiple disks are unconfigured, then configure them in order from the lowest to the highest slot number. Finish all the steps for one disk and then start with all the steps for the next.

1. Partitioning the Operating System Disk

2. Repairing the RAID Arrays

3. Formatting the HDFS Partition of an Operating System Disk

4. Restoring the Swap Partition

5. Restoring the GRUB Master Boot Records

Partitioning the Operating System Disk


To partition a logical drive:

1.   Complete the steps in the following note prior to running these steps - "Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581331.1)."

2.   Confirm that there is no existing partition table on the new disk for /dev/disk/by-hba-slot/s0:   

# parted /dev/disk/by-hba-slot/s0 -s print

You should see a message about a missing partition table.

OL6 example output where a partition table already exists (no missing-table message), so the table must be cleared:

# parted /dev/disk/by-hba-slot/s0 -s print
Model: LSI MR9261-8i (scsi)
Disk /dev/sda: 1999GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number Start     End      Size     File system             Name    Flags
 1      1049kB  200MB   199MB   ext4            primary  boot
 2      200MB   500GB   500GB                   primary  raid
 3      500GB   540GB   39.5GB  linux-swap(v1)  primary
 4      540GB   1999GB  1459GB  ext4            primary

  

OL5 example output where a partition table already exists (no missing-table message), so the table must be cleared:

# parted /dev/disk/by-hba-slot/s0 -s print

Model: LSI MR9261-8i (scsi)
Disk /dev/sda: 1999GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  210MB   210MB   ext3                     raid
 2      210MB   178GB   178GB   ext3                     raid
 3      178GB   191GB   12.6GB  linux-swap
 4      191GB   1999GB  1808GB  ext3       primary


3.  The partition table must be cleared in any of the following cases:

a) If the output displays an existing partition table, then clear it.

or

b) If the output shows "Error: msdos labels do not support devices that have more than 4294967295 sectors.", then clear it. For example:  

# parted /dev/disk/by-hba-slot/s0 -s print

Error: msdos labels do not support devices that have more than 4294967295 sectors. 

or

c) If the output shows "Error: Both the primary and backup GPT tables are corrupt.", then clear it. 

# parted /dev/disk/by-hba-slot/s0 -s print
  
Error: Both the primary and backup GPT tables are corrupt.
Try making a fresh table, and using Parted's rescue feature to recover partitions. 

In any of the above, clear the partition table:

# dd if=/dev/zero of=/dev/disk/by-hba-slot/s0 bs=1M count=100

Example output clearing the partition table on /dev/disk/by-hba-slot/s0:

# dd if=/dev/zero of=/dev/disk/by-hba-slot/s0 bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.0511396 seconds, 2.1 GB/s

 

Note: You can use the "dd if=/dev/zero of=/dev/disk/by-hba-slot/s0 bs=1M count=100" command to restart an operating system disk configuration, if you make a mistake.

Example output reissuing the command from step 2 after clearing the partition table:

# parted /dev/disk/by-hba-slot/s0 -s print
Error: Unable to open /dev/sda - unrecognised disk label.

 

Also note that in the case of the error "Error: Both the primary and backup GPT tables are corrupt.", GPT writes backup tables to the end of the disk. In these rare cases the end of the disk needs to be zeroed out as well.

If needed try:

# zero out the last 1 MB (2048 sectors of 512 bytes) of the disk
dd bs=512 if=/dev/zero of=/dev/sda count=2048 seek=$(( $(blockdev --getsz /dev/sda) - 2048 ))

Which comes from: http://unix.stackexchange.com/questions/13848/wipe-last-1mb-of-a-hard-drive
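The seek arithmetic requires command substitution around blockdev (single quotes will not execute the command): seek must be the total sector count minus 2048. The sketch below demonstrates the same arithmetic safely on a scratch image file instead of a real device; note that conv=notrunc is needed for a regular file, whereas a block device is never truncated:

```shell
#!/bin/sh
# Sketch: zero the last 1 MiB of a "disk", using a 4 MiB image file in
# place of a real device. For a device, get the sector count with
# "blockdev --getsz /dev/diskname" instead of dividing the file size.
set -e
img=$(mktemp)
# Build a 4 MiB image filled with non-zero (random) bytes.
dd if=/dev/urandom of="$img" bs=1M count=4 2>/dev/null
sectors=$(( $(wc -c < "$img") / 512 ))   # 8192 sectors of 512 bytes
# Zero the last 2048 sectors (1 MiB); conv=notrunc keeps the file size.
dd bs=512 if=/dev/zero of="$img" count=2048 \
   seek=$(( sectors - 2048 )) conv=notrunc 2>/dev/null
# Verify: the final 1 MiB contains only zero bytes, the start does not.
tail_nonzero=$(tail -c 1048576 "$img" | tr -d '\0' | wc -c)
head_nonzero=$(head -c 1048576 "$img" | tr -d '\0' | wc -c)
echo "non-zero bytes in last 1 MiB: $tail_nonzero"
rm -f "$img"
```

The same seek expression, with blockdev supplying the sector count, applies to the real device.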


4.  Create the partition table on s0 using parted /dev/disk/by-hba-slot/s0 -s mklabel gpt print:

# parted /dev/disk/by-hba-slot/s0 -s mklabel gpt print

Example:

# parted /dev/disk/by-hba-slot/s0 -s mklabel gpt print

Model: LSI MR9261-8i (scsi)
Disk /dev/sda: 1999GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End  Size  File system  Name  Flags


5.  To partition the OS disk, the new partitions must match the corresponding partitions on the surviving disk of the RAID arrays. Do this by using CHS units. Print the CHS partition information of the surviving disk /dev/disk/by-hba-slot/s1.

List the Cylinder, Head, Sector (CHS) partition information of the surviving disk. Thus, if you are partitioning /dev/disk/by-hba-slot/s0, then enter /dev/disk/by-hba-slot/s1 in the following command:

# parted /dev/disk/by-hba-slot/s1 -s unit chs print

OL6 Example output using /dev/disk/by-hba-slot/s1 since the surviving disk is /dev/disk/by-hba-slot/s1: 

# parted /dev/disk/by-hba-slot/s1 -s unit chs print
  
Model: LSI MR9261-8i (scsi)
Disk /dev/sdb: 486305,152,54
Sector size (logical/physical): 512B/4096B
BIOS cylinder,head,sector geometry: 486305,255,63.  Each cylinder is 8225kB.
Partition Table: gpt

Number  Start           End                 File system          Name     Flags
 1      0,32,32           24,89,0             ext4                  primary   boot
 2      24,89,1           60812,135,58                             primary   raid
 3      60812,135,59  68082,213,34    linux-swap(v1)    primary
 4      68082,213,35  486305,120,22  ext4                   primary

  

OL5 Example output using /dev/disk/by-hba-slot/s1 since the surviving disk is /dev/disk/by-hba-slot/s1:

# parted /dev/disk/by-hba-slot/s1 -s unit chs print
 
Model: LSI MR9261-8i (scsi)
Disk /dev/sdb: 243031,30,6
Sector size (logical/physical): 512B/512B
BIOS cylinder,head,sector geometry: 243031,255,63.  Each cylinder is 8225kB.
Partition Table: gpt

Number  Start         End           File system  Name     Flags
 1      0,0,34             25,127,7          ext3                  raid
 2      25,127,8         21697,116,20    ext3                  raid
 3      21697,116,21  23227,61,35     linux-swap
 4      23227,61,36    243031,29,36    ext3                 primary


6. Create partitions 1 to 3 on /dev/disk/by-hba-slot/s0 by duplicating the partitions of the surviving disk /dev/sdb. Issue three commands in this format for the disk you have replaced, substituting the correct file_system, start, and end values:

On OL6:

# parted /dev/disk/by-hba-slot/s0 -s mkpart file_system start end

Use the file_system, start, and end addresses that you obtained in Step 5 instead of the addresses shown in the following example:   

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext4 0,32,32 24,89,0
# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext4 24,89,1 60812,135,58
# parted /dev/disk/by-hba-slot/s0 -s mkpart primary linux-swap 60812,135,59 68082,213,34

 
Example output for /dev/disk/by-hba-slot/s0, with the file_system, start, and end values taken from the surviving disk /dev/disk/by-hba-slot/s1:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext4 0,32,32 24,89,0
# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext4 24,89,1 60812,135,58
# parted /dev/disk/by-hba-slot/s0 -s mkpart primary linux-swap 60812,135,59 68082,213,34

On OL5:

# parted /dev/disk/by-hba-slot/s0 -s mkpart file_system start end

Use the file_system, start, and end addresses that you obtained in Step 5 instead of the addresses shown in the following example:   

# parted /dev/disk/by-hba-slot/s0 -s mkpart ext3 0,0,34 25,127,7
# parted /dev/disk/by-hba-slot/s0 -s mkpart ext3 25,127,8 21697,116,20
# parted /dev/disk/by-hba-slot/s0 -s mkpart linux-swap 21697,116,21 23227,61,35

 
Example output for /dev/disk/by-hba-slot/s0, with the file_system, start, and end values taken from the surviving disk /dev/disk/by-hba-slot/s1:

# parted /dev/disk/by-hba-slot/s0 -s mkpart  ext3 0,0,34 25,127,7
# parted /dev/disk/by-hba-slot/s0 -s mkpart  ext3 25,127,8 21697,116,20
# parted /dev/disk/by-hba-slot/s0 -s mkpart  linux-swap 21697,116,21 23227,61,35
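Assuming the surviving disk's layout matches the OL6 example in Step 5, the mkpart commands can be generated mechanically from the parted output. This dry-run sketch only prints the commands; the file system type per partition number is hardcoded here (ext4 for partitions 1-2, linux-swap for 3), and the parted output is embedded as sample data. Always review the generated commands against the real parted output before running anything:

```shell
#!/bin/sh
# Sketch: derive "mkpart" commands for partitions 1-3 from the CHS
# listing of the surviving disk. Sample data mirrors the OL6 example;
# on a server, pipe real "parted ... -s unit chs print" output instead.
parted_out='Number  Start           End             File system     Name     Flags
 1      0,32,32         24,89,0         ext4            primary  boot
 2      24,89,1         60812,135,58                    primary  raid
 3      60812,135,59    68082,213,34    linux-swap(v1)  primary'

cmds=$(printf '%s\n' "$parted_out" | awk '
  $1 ~ /^[123]$/ {
    # Partition 3 is the swap partition; 1 and 2 are ext4 on OL6.
    fs = ($1 == 3) ? "linux-swap" : "ext4"
    printf "parted /dev/disk/by-hba-slot/s0 -s mkpart primary %s %s %s\n", fs, $2, $3
  }')
printf '%s\n' "$cmds"
```

The printed commands correspond to the three Step 6 commands; nothing is executed by this sketch.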

7.  Create primary partition 4 using the start address obtained in Step 5 and an end address of 100% for /dev/disk/by-hba-slot/s0 for the disk you have replaced:

On OL6:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext4 68082,213,35 100%

Example creating partition on /dev/disk/by-hba-slot/s0:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext4 68082,213,35 100%

On OL5:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext3 23227,61,36 100%

Example creating partition on /dev/disk/by-hba-slot/s0:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext3 23227,61,36 100%

Partition 4 is an HDFS data partition, so make it as large as possible. For the other partitions, use the exact CHS information from the surviving disk. The BDA hardware check (bdacheckhw) checks partition names and flags, so the names must also be cleared and the RAID flags set. Setting a name to empty can only be done in parted's interactive mode.

Note: in the case of output like the following, where there is a gap between the third and fourth partitions (here, between partition 3 ending at 23227,61,35 and partition 4 starting at 25531,9,31):

# parted /dev/disk/by-hba-slot/s1 -s unit chs print 

Model: LSI MR9261-8i (scsi)
Disk /dev/sda: 1999GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start         End            File system  Name     Flags
 1      0,0,34        25,127,7       ext3                  raid
 2      25,127,8      21697,116,20   ext3                  raid
 3      21697,116,21  23227,61,35    linux-swap
 4      25531,9,31    364729,25,62   ext3         primary

Create the primary partition 4 using the output above:

On OL6:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext4 25531,9,31 100% 

On OL5:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext3 25531,9,31 100% 

It is best to use the information from the other OS disk, in this case a start of 25531,9,31. This may be a more optimal CHS boundary chosen by parted when the drive was originally partitioned using MB sizes/offsets.

8.  Set the RAID flags for /dev/disk/by-hba-slot/s0 for the disk you have replaced:

# parted -s /dev/disk/by-hba-slot/s0 set 1 raid
# parted -s /dev/disk/by-hba-slot/s0 set 2 raid

Example setting the RAID flags on /dev/disk/by-hba-slot/s0:

# parted -s /dev/disk/by-hba-slot/s0 set 1 raid
# parted -s /dev/disk/by-hba-slot/s0 set 2 raid

 

9. For OL6 ONLY (not OL5), set the boot flag on the first partition:

# parted -s /dev/disk/by-hba-slot/s0 set 1 boot

 
10. For OL5 ONLY (not OL6), clear the partition names using parted in interactive mode for /dev/disk/by-hba-slot/s0, the disk you have replaced:

# parted /dev/disk/by-hba-slot/s0

Example output clearing the names on /dev/disk/by-hba-slot/s0:

# parted /dev/disk/by-hba-slot/s0
GNU Parted 1.8.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted)

Once in the shell, type the following to clear the names ((parted) is the prompt; do not type it):

name 1 " "
name 2 " "
name 3 " "
quit

Example showing the (parted) prompt:

(parted) name 1 " "
(parted) name 2 " "
(parted) name 3 " "
(parted) quit
Information: Don't forget to update /etc/fstab, if necessary.


11. Check that the partitions are set up correctly.

# parted /dev/disk/by-hba-slot/s0 -s unit chs print 

OL6 example output for /dev/disk/by-hba-slot/s0. Note that on OL6 you must verify that the boot flag is set on partition 1.

# parted /dev/disk/by-hba-slot/s0 -s unit chs print
  
Model: LSI MR9261-8i (scsi)
Disk /dev/sda: 486305,152,54
Sector size (logical/physical): 512B/4096B
BIOS cylinder,head,sector geometry: 486305,255,63.  Each cylinder is 8225kB.
Partition Table: gpt

Number  Start             End                 File system         Name     Flags
 1      0,32,32           24,89,0            ext4                  primary    boot
 2      24,89,1           60812,135,58                            primary    raid
 3      60812,135,59  68082,213,34   linux-swap(v1)    primary
 4      68082,213,35  486305,120,22  ext4                  primary

OL5 Example output for /dev/disk/by-hba-slot/s0

# parted /dev/disk/by-hba-slot/s0 -s unit chs print
  

Model: LSI MR9261-8i (scsi)
Disk /dev/sda: 243031,30,6
Sector size (logical/physical): 512B/512B
BIOS cylinder,head,sector geometry: 243031,255,63.  Each cylinder is 8225kB.
Partition Table: gpt

Number  Start             End             File system    Name     Flags
 1      0,0,34            25,127,7          ext3                          raid
 2      25,127,8         21697,116,20  ext3                          raid
 3      21697,116,21  23227,61,35    linux-swap
 4      23227,61,36   243031,29,36   ext3           primary

12. Complete the steps in the next section titled "Repairing the RAID Arrays."

Repairing the RAID Arrays

After partitioning the disks, repair the two logical RAID arrays. There are two md arrays: /dev/md0 and /dev/md2.

    /dev/md0 is made up of /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s1p1 and is mounted as /boot.

    /dev/md2 is made up of /dev/disk/by-hba-slot/s0p2 and /dev/disk/by-hba-slot/s1p2 and is mounted as / (root).

Caution: Do not dismount the /dev/md devices as this will bring the system down.

To repair the RAID arrays, issue a series of mdadm commands in pairs, as there are two arrays (md0 and md2). For each partition, first mark the partition as failed, remove it, and then add it back in.

Note: mdadm options frequently start with two dashes. All the mdadm command options listed in this section start with two dashes ("--").
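Because the same three operations apply to both arrays, the command pairs can be generated in a loop. This dry-run sketch only prints the mdadm commands described in this section (fail, remove, and add for md0/s0p1 and md2/s0p2); it does not execute anything, and on a server you would run the printed commands as root only after confirming the mappings:

```shell
#!/bin/sh
# Dry-run sketch: generate the paired mdadm commands for both arrays.
# Nothing is executed; the commands are only printed for review.
plan=$(for pair in "md0:s0p1" "md2:s0p2"; do
    md=${pair%%:*}
    part=/dev/disk/by-hba-slot/${pair##*:}
    printf 'mdadm --fail /dev/%s %s\n'   "$md" "$part"
    printf 'mdadm --remove /dev/%s %s\n' "$md" "$part"
    printf 'mdadm --add /dev/%s %s\n'    "$md" "$part"
done)
echo "$plan"
```

The fail/remove lines correspond to steps a) and b) (only needed when the disk has not yet been replaced); the add lines correspond to Step 4.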

 

1.  Assuming the faulty drives have been replaced issue the following commands to remove the partitions from the RAID arrays: 

# mdadm /dev/md0 -r detached
# mdadm /dev/md2 -r detached 

Example:

# mdadm /dev/md0 -r detached
# mdadm /dev/md2 -r detached

 

Note: Only if the faulty disk has NOT been replaced yet (but will be shortly), follow steps a) and b) below to fail and remove the disk drive. Otherwise, if the disk has already been replaced, continue to Step 2 below and do NOT follow steps a) and b).

 a) Mark the partitions as failed for /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s0p2: 

# mdadm --fail /dev/md0 /dev/disk/by-hba-slot/s0p1
# mdadm --fail /dev/md2 /dev/disk/by-hba-slot/s0p2

     Example output marking the partitions as failed on /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s0p2:

# mdadm --fail /dev/md0 /dev/disk/by-hba-slot/s0p1
mdadm: set device faulty failed for /dev/disk/by-hba-slot/s0p1:  No such device
# mdadm --fail /dev/md2 /dev/disk/by-hba-slot/s0p2
mdadm: set device faulty failed for /dev/disk/by-hba-slot/s0p2:  No such device

     You can ignore "No such device" messages in the mdadm commands.

 
b)  Remove the partitions from the RAID arrays for /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s0p2: 

# mdadm --remove /dev/md0 /dev/disk/by-hba-slot/s0p1
# mdadm --remove /dev/md2 /dev/disk/by-hba-slot/s0p2

 Example output when removing the partitions on  /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s0p2:

# mdadm --remove /dev/md0 /dev/disk/by-hba-slot/s0p1
mdadm: hot remove failed for /dev/disk/by-hba-slot/s0p1: No such device or address
# mdadm --remove /dev/md2 /dev/disk/by-hba-slot/s0p2
mdadm: hot remove failed for /dev/disk/by-hba-slot/s0p2: No such device or address

 
2.  Verify that the RAID arrays are degraded:  

# mdadm -Q --detail /dev/md0
# mdadm -Q --detail /dev/md2

Example output:

# mdadm -Q --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Wed Feb  6 14:53:33 2013
     Raid Level : raid1
     Array Size : 204736 (199.97 MiB 209.65 MB)
  Used Dev Size : 204736 (199.97 MiB 209.65 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Mon Feb 25 13:37:43 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 9f524e09:c75bfe13:4803c1e9:70ea81fd
         Events : 0.156

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1
# mdadm -Q --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Wed Feb  6 14:52:34 2013
     Raid Level : raid1
     Array Size : 174079936 (166.02 GiB 178.26 GB)
  Used Dev Size : 174079936 (166.02 GiB 178.26 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent

    Update Time : Mon Feb 25 13:55:12 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : 7ae6e86e:69391270:9cdd6430:f7625f21
         Events : 0.1092

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2

 
3.  Verify that the degraded file for each array is set to 1:  

# cat /sys/block/md0/md/degraded

 Example output: 

# cat /sys/block/md0/md/degraded
1

 

# cat /sys/block/md2/md/degraded

Example output:

# cat /sys/block/md2/md/degraded
1


4. Restore the partitions to the RAID arrays for /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s0p2:   

# mdadm --add /dev/md0 /dev/disk/by-hba-slot/s0p1
# mdadm --add /dev/md2 /dev/disk/by-hba-slot/s0p2

Example output restoring the partitions on /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s0p2:

# mdadm --add /dev/md0 /dev/disk/by-hba-slot/s0p1
mdadm: re-added /dev/disk/by-hba-slot/s0p1
# mdadm --add /dev/md2 /dev/disk/by-hba-slot/s0p2
mdadm: re-added /dev/disk/by-hba-slot/s0p2

5.  Check that resynchronization has started, so that /dev/md[02] is in a state of recovery and not idle (although you may see 'idle' if recovery completes very quickly).

# cat /sys/block/md0/md/sync_action

 And

# cat /sys/block/md2/md/sync_action

 Example output:

# cat /sys/block/md2/md/sync_action
recover


6.  To verify that resynchronization is proceeding, you can monitor the mdstat file. A counter identifies the percentage complete.

# cat /proc/mdstat

 Example output which shows the percentage complete at 66.1%:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      204736 blocks [2/2] [UU]

md2 : active raid1 sda2[2] sdb2[1]
      174079936 blocks [2/1] [_U]
      [=============>.......]  recovery = 66.1% (115206144/174079936) finish=11.1min speed=87842K/sec

unused devices: <none>

The following output shows that synchronization is complete:  

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      204736 blocks [2/2] [UU]

md2 : active raid1 sda2[0] sdb2[1]
      174079936 blocks [2/2] [UU]

unused devices: <none>
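To watch the rebuild without eyeballing the full mdstat output, the recovery percentage can be extracted with sed. The sample below embeds an mdstat fragment so the parsing runs anywhere; on a server you would read /proc/mdstat directly (for example, in a watch loop):

```shell
#!/bin/sh
# Sketch: extract the recovery percentage from /proc/mdstat-style text.
# Sample data stands in for /proc/mdstat (illustrative only).
mdstat='md2 : active raid1 sda2[2] sdb2[1]
      174079936 blocks [2/1] [_U]
      [=============>.......]  recovery = 66.1% (115206144/174079936) finish=11.1min speed=87842K/sec'

# Pull the number before the % sign on the "recovery =" line.
pct=$(printf '%s\n' "$mdstat" | sed -n 's/.*recovery = \([0-9.]*\)%.*/\1/p')
echo "recovery: ${pct:-complete (no recovery line)}"
```

When synchronization is complete, the recovery line disappears and pct comes back empty.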

7.  View the contents of /etc/mdadm.conf:  

# cat /etc/mdadm.conf

Example output:

# cat /etc/mdadm.conf

# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md2 level=raid1 num-devices=2 uuid=7ae6e86e:69391270:9cdd6430:f7625f21
ARRAY /dev/md0 level=raid1 num-devices=2 uuid=9f524e09:c75bfe13:4803c1e9:70ea81fd

8. Compare the output of the following command with the content of /etc/mdadm.conf from Step 7:  

# mdadm --examine --brief --scan --config=partitions

 In this example output the content of the mdadm command output and /etc/mdadm.conf are the same. No changes to /etc/mdadm.conf are required:

# mdadm --examine --brief --scan --config=partitions
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=9f524e09:c75bfe13:4803c1e9:70ea81fd
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=7ae6e86e:69391270:9cdd6430:f7625f21

9.  If the UUIDs in the lines for /dev/md0 and /dev/md2 in the file are different from the output of the mdadm command, then use a text editor to replace them with the output of the above mdadm command.

     a. Open /etc/mdadm.conf in a text editor.
 
     b. Select from ARRAY to the end of the file, and delete the selected lines.
 
     c. Copy the output of the command into the file where you deleted the old lines.
 
     d. Save the modified file and exit.
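The comparison in Steps 7 through 9 can be sketched as a small script. arrays_match below is a hypothetical helper that lowercases and sorts the ARRAY lines before comparing, because the two sources differ in line ordering and in uuid=/UUID= capitalization:

```shell
#!/bin/sh
# Hypothetical helper: compare the ARRAY lines from
# 'mdadm --examine --brief --scan --config=partitions' with those in
# /etc/mdadm.conf. Normalizes case and ordering before comparing.
arrays_match() {
    # $1 = mdadm scan output, $2 = mdadm.conf content.
    # Prints "same" or "different".
    a=$(printf '%s\n' "$1" | grep '^ARRAY' | tr 'A-Z' 'a-z' | sort)
    b=$(printf '%s\n' "$2" | grep '^ARRAY' | tr 'A-Z' 'a-z' | sort)
    if [ "$a" = "$b" ]; then echo same; else echo different; fi
}
```

If it prints "different", edit /etc/mdadm.conf as described in Step 9.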

10.  Complete the steps in the next section titled "Formatting the HDFS Partition of an Operating System Disk."

Formatting the HDFS Partition of an Operating System Disk

Partition 4 (sda4 or sdb4) on an operating system disk is used for HDFS. After you format the partition and set the correct label, HDFS rebalances the job load to use the partition if the disk space is needed.

To format the HDFS partition:

1.  Format the HDFS partition as an ext4 file system using /dev/disk/by-hba-slot/s0p4 for the disk that was replaced:

# mkfs -t ext4 /dev/disk/by-hba-slot/s0p4

Example output formatting the HDFS partition on /dev/disk/by-hba-slot/s0p4 as an ext4 file system:

# mkfs -t ext4 /dev/disk/by-hba-slot/s0p4
mkfs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
110354432 inodes, 441393655 blocks
22069682 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
13471 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 23 mounts or
180 days, whichever comes first.  Use tune4fs -c or -i to override.

 

Note: If this command fails because the device is mounted, dismount the drive as shown below and skip Step 3 later.

Example showing the command failing:

# mkfs -t ext4 /dev/disk/by-hba-slot/s0p4
mkfs 1.41.12 (17-May-2010)
/dev/disk/by-hba-slot/s0p4 is mounted; will not make a filesystem here!

If the command fails because the device is mounted, dismount the device and repeat the command:

# umount /u01
# mkfs -t ext4 /dev/disk/by-hba-slot/s0p4

Example output from formatting the HDFS Partition on /dev/disk/by-hba-slot/s0p4 after dismounting /u01:

# umount /u01
# mkfs -t ext4 /dev/disk/by-hba-slot/s0p4
mkfs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
110354432 inodes, 441393655 blocks
22069682 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
13471 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first.  Use tune4fs -c or -i to override.
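The mounted-device failure above can be avoided with a pre-check. is_mounted below is a hypothetical helper (not a BDA utility) that looks a device or mount point up in a mounts table; MOUNTS_FILE is parameterized for testing and defaults to /proc/mounts. Note that the /dev/disk/by-hba-slot symlink will not literally match the kernel device name recorded in /proc/mounts, so checking by mount point (/u01) is the reliable key.

```shell
#!/bin/sh
# Hypothetical pre-check: returns success if the given device or mount
# point appears in the mounts table, mirroring the failure shown above.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

is_mounted() {
    # $1 = device path or mount point
    awk -v d="$1" '$1 == d || $2 == d { found = 1 } END { exit !found }' \
        "$MOUNTS_FILE"
}
```

Typical use before formatting: `if is_mounted /u01; then umount /u01; fi`.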


2. Verify that the partition label /u01 for s0p4 is missing:

# ls -l /dev/disk/by-label

For reference, example output from a system on which no label is missing; after formatting, the u01 entry will be absent until you reset the label in Step 4:

#  ls -l /dev/disk/by-label
total 0

lrwxrwxrwx 1 root root 10 Sep 9 18:57 BDAUSB -> ../../sdm1
lrwxrwxrwx 1 root root 10 Sep 9 19:08 SWAP-sda3 -> ../../sda3
lrwxrwxrwx 1 root root 10 Sep 9 18:57 SWAP-sdb3 -> ../../sdb3
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u01 -> ../../sda4
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u02 -> ../../sdb4
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u03 -> ../../sdc1
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u04 -> ../../sdd1
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u05 -> ../../sde1
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u06 -> ../../sdf1
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u07 -> ../../sdg1
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u08 -> ../../sdh1
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u09 -> ../../sdi1
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u10 -> ../../sdj1
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u11 -> ../../sdk1
lrwxrwxrwx 1 root root 10 Sep 9 18:57 u12 -> ../../sdl1
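The label check can also be scripted. label_present below is a hypothetical helper with the by-label directory parameterized via LABEL_DIR so it can be exercised against a scratch directory:

```shell
#!/bin/sh
# Hypothetical helper: report whether a partition label (e.g. u01)
# exists under the by-label directory. Labels appear as symlinks, so
# test with -L as well as -e.
LABEL_DIR="${LABEL_DIR:-/dev/disk/by-label}"

label_present() {
    if [ -L "$LABEL_DIR/$1" ] || [ -e "$LABEL_DIR/$1" ]; then
        echo present
    else
        echo missing
    fi
}
```

For example, `label_present u01` prints "missing" on a freshly formatted partition until the label is reset.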

3. If you did not already dismount the drive in Step 1, dismount the appropriate HDFS partition, /u01 for /dev/sda:

# umount /u01

Example:

# umount /u01

If you dismounted the drive before formatting the HDFS partition in Step 1, do not dismount it again; the device is already dismounted. Use mount -l to check whether the device is mounted:

# mount -l

Example showing /u01 is not mounted:

# mount -l
/dev/md2 on / type ext3 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)                                  <<<< /u01 umounted
/dev/sdb4 on /u02 type ext4 (rw,nodev,noatime) [/u02]
/dev/sdc1 on /u03 type ext4 (rw,nodev,noatime) [/u03]
/dev/sdd1 on /u04 type ext4 (rw,nodev,noatime) [/u04]
/dev/sde1 on /u05 type ext4 (rw,nodev,noatime) [/u05]
/dev/sdf1 on /u06 type ext4 (rw,nodev,noatime) [/u06]
/dev/sdg1 on /u07 type ext4 (rw,nodev,noatime) [/u07]
/dev/sdh1 on /u08 type ext4 (rw,nodev,noatime) [/u08]
/dev/sdi1 on /u09 type ext4 (rw,nodev,noatime) [/u09]
/dev/sdj1 on /u10 type ext4 (rw,nodev,noatime) [/u10]
/dev/sdk1 on /u11 type ext4 (rw,nodev,noatime) [/u11]
/dev/sdl1 on /u12 type ext4 (rw,nodev,noatime) [/u12]
fuse_dfs on /mnt/hdfs-nnmount type fuse.fuse_dfs (rw,nosuid,nodev,allow_other,allow_other,default_permissions)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)

 
4. Reset the partition label for /u01 and /dev/disk/by-hba-slot/s0p4:

(For OL6 use tune2fs.  For OL5 use tune4fs.)

On OL6:

# tune2fs -c -1 -i 0 -m 0.2 -L /u01 /dev/disk/by-hba-slot/s0p4

On OL5:

# tune4fs -c -1 -i 0 -m 0.2 -L /u01 /dev/disk/by-hba-slot/s0p4

Example on OL5 resetting the partition label on /u01 and /dev/disk/by-hba-slot/s0p4:

# tune4fs -c -1 -i 0 -m 0.2 -L /u01 /dev/disk/by-hba-slot/s0p4
tune4fs 1.41.12 (17-May-2010)
Setting maximal mount count to -1
Setting interval between checks to 0 seconds
Setting reserved blocks percentage to 0.2% (882787 blocks)
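Choosing between tune2fs and tune4fs can be automated by inspecting the release string. tune_cmd below is a hypothetical sketch keyed on the major release number (OL5 ships tune4fs for ext4; OL6 uses tune2fs):

```shell
#!/bin/sh
# Hypothetical helper: pick the label-setting tool by Oracle Linux
# major release, as described above.
tune_cmd() {
    # $1 = contents of /etc/oracle-release or /etc/redhat-release
    case "$1" in
        *"release 5"*) echo tune4fs ;;
        *)             echo tune2fs ;;
    esac
}
```

Typical use: `$(tune_cmd "$(cat /etc/redhat-release)") -c -1 -i 0 -m 0.2 -L /u01 /dev/disk/by-hba-slot/s0p4`.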

 
5.  Mount the HDFS partition for /u01 for the device you have partitioned:

# mount /u01

Example mounting /u01:

# mount /u01 

You can check to see if the device is mounted:

# mount -l

The following shows /u01 is mounted:

# mount -l
/dev/md2 on / type ext3 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sdb4 on /u02 type ext4 (rw,nodev,noatime) [/u02]
/dev/sdc1 on /u03 type ext4 (rw,nodev,noatime) [/u03]
/dev/sdd1 on /u04 type ext4 (rw,nodev,noatime) [/u04]
/dev/sde1 on /u05 type ext4 (rw,nodev,noatime) [/u05]
/dev/sdf1 on /u06 type ext4 (rw,nodev,noatime) [/u06]
/dev/sdg1 on /u07 type ext4 (rw,nodev,noatime) [/u07]
/dev/sdh1 on /u08 type ext4 (rw,nodev,noatime) [/u08]
/dev/sdi1 on /u09 type ext4 (rw,nodev,noatime) [/u09]
/dev/sdj1 on /u10 type ext4 (rw,nodev,noatime) [/u10]
/dev/sdk1 on /u11 type ext4 (rw,nodev,noatime) [/u11]
/dev/sdl1 on /u12 type ext4 (rw,nodev,noatime) [/u12]
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/sda4 on /u01 type ext4 (rw,nodev,noatime) [/u01]

 
6.  Complete the steps in the next section titled "Restoring the Swap Partition."

Restoring the Swap Partition

After formatting the HDFS partition, you can restore the swap partition.

To restore the swap partition:

1. Set the swap label SWAP-sda3 on /dev/disk/by-hba-slot/s0p3:

# mkswap -L SWAP-sda3 /dev/disk/by-hba-slot/s0p3

Example OL6 output setting the swap label on sda3 and /dev/disk/by-hba-slot/s0p3:

# mkswap -L SWAP-sda3 /dev/disk/by-hba-slot/s0p3
Setting up swapspace version 1, size = 38602748 KiB
LABEL=SWAP-sda3, UUID=88075e3f-ac3a-41e7-bb90-7d0ff9076eb9

  

Example OL5 output setting the swap label on sda3 and /dev/disk/by-hba-slot/s0p3:

# mkswap -L SWAP-sda3 /dev/disk/by-hba-slot/s0p3
Setting up swapspace version 1, size = 12582907 kB
LABEL=SWAP-sda3, no uuid

 
2.  Verify that the swap partition is restored:

# bdaswapon; bdaswapoff

Example output verifying the swap partition is restored:  

# bdaswapon; bdaswapoff
Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       12287992        0      1
/dev/sdb3                               partition       12287992        0      1
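The bdaswapon output can be checked programmatically. swap_count below is a hypothetical helper that counts the sda3/sdb3 partition lines in output of the format shown above; a healthy node shows 2:

```shell
#!/bin/sh
# Hypothetical helper: count the OS-disk swap partitions (sda3, sdb3)
# listed in 'bdaswapon; bdaswapoff' output read from stdin.
swap_count() {
    grep -c '^/dev/sd[ab]3[[:space:]].*partition'
}
```

For example, `bdaswapon; bdaswapoff | swap_count` printing anything other than 2 would indicate a swap partition is still missing.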

 
3. Verify that the replaced disk is listed in the output of 'ls -l /dev/disk/by-label'. If it is listed, skip to the section titled "Restoring the GRUB Master Boot Records and HBA Boot Order." If it is NOT listed, continue with Step 4.

# ls -l /dev/disk/by-label

4. Trigger kernel device uevents to replay events missed at system coldplug.

a) On Oracle Linux 5, run:

udevtrigger

b) On Oracle Linux 6, run:

udevadm trigger

Note: With either command, the --verbose option can be used to check which events are triggered.

5. Verify that the replaced disk is now listed in the 'ls -l /dev/disk/by-label' output.

6. Complete the steps in the next section titled "Restoring the GRUB Master Boot Records and HBA Boot Order."

Restoring the GRUB Master Boot Records and HBA Boot Order

After restoring the swap partition, you can restore the Grand Unified Bootloader (GRUB) master boot record and the HBA Boot Order.

To restore the GRUB boot record:

1. The device.map file maps the BIOS drives to operating system devices. The following is an example of a default device map file:

# more /boot/grub/device.map
# this device map was generated by anaconda
(hd0)     /dev/sda
(hd1)     /dev/sdb 

Unfortunately, the GRUB device map does not support symbolic links, so in the map file (hd0) is mapped to /dev/sda and (hd1) to /dev/sdb rather than to /dev/disk/by-hba-slot names.

For this reason you may need to edit /boot/grub/device.map so that hd0 and hd1 resolve to the correct kernel device names.

a) Run the following command to check which kernel device name the drive in slot 0 is using:

# ls -ld /dev/disk/by-hba-slot/s0

Sample output when the device name is sda:

lrwxrwxrwx 1 root root 9 Apr 24 14:05 /dev/disk/by-hba-slot/s0 -> ../../sda

If the device name is ../../sda, skip to Step 2 (Open GRUB). If the device is mapped to another name, for example ../../sdn, follow these steps to point hd0 at the new device name:

b) Make a copy of the /boot/grub/device.map file:

[root@bdanode01 ~]#  cd /boot/grub
[root@bdanode01 grub]# cp device.map mydevice.map
[root@bdanode01 grub]# ls -l *device* 

Sample output:

-rw-r--r-- 1 root root 85 Apr 22 14:50 device.map
-rw-r--r-- 1 root root 85 Apr 24 09:24 mydevice.map 


c) Edit the mydevice.map file so that hd0 points to the new device name.

In this example, /dev/sdn is the new device name in slot 0:

# more /boot/grub/mydevice.map
# this device map was generated by bda install
(hd0)     /dev/sdn
(hd1)     /dev/sdb
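Deriving the (hd0) line for the device map from the slot 0 symlink can be sketched as follows. hd0_line is a hypothetical helper, not part of the BDA install; it resolves the symlink target exactly as the manual 'ls -ld' check above does:

```shell
#!/bin/sh
# Hypothetical helper: build the (hd0) line of a GRUB device map from a
# /dev/disk/by-hba-slot/s0 symlink, since device.map cannot contain
# symlinks itself.
hd0_line() {
    # $1 = path of the slot-0 symlink, e.g. /dev/disk/by-hba-slot/s0
    target=$(readlink "$1")           # e.g. ../../sdn
    echo "(hd0)     /dev/$(basename "$target")"
}
```

For example, `hd0_line /dev/disk/by-hba-slot/s0` would print the line to place in mydevice.map.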


2. Open GRUB:

a) If the device name in slot 0 is /dev/sda, use the default device.map file:

# grub --device-map=/boot/grub/device.map

OR

b) If the device name in slot 0 is not /dev/sda, use the newly created mydevice.map file:

# grub --device-map=/boot/grub/mydevice.map

Example output when using device.map file:

# grub --device-map=/boot/grub/device.map


    GNU GRUB  version 0.97  (640K lower / 3072K upper memory)

 [ Minimal BASH-like line editing is supported.  For the first word, TAB
   lists possible command completions.  Anywhere else TAB lists the possible
   completions of a device/filename.]
grub>

 

3. Set the root device, entering hd0 for the disk in slot 0:

grub> root (hd0,0)

Example setting the root device to hd0 for /dev/sda:  

grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83

 
4. Install GRUB, entering hd0 for the disk in slot 0:

grub> setup (hd0)

 Example installing grub on hd0 (/dev/sda):

grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"... failed (this is not fatal)
 Running "embed /grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "install /grub/stage1 (hd0) /grub/stage2 p /grub/grub.conf "... succeeded
Done.

 
5. Close the GRUB command-line interface:  

grub> quit

 Example output:

grub> quit
quit

6. Verify that the boot drive in the HBA is set correctly. If it is set correctly, skip Step 7 and go on to Step 8. If it is not set correctly, perform Step 7.

On BDA V4.3 and higher:

a) Verify that BootDrive VD:0 is set as the boot drive in the HBA:

# MegaCli64 /c0 show bootdrive

Example output when BootDrive VD:0 is set as the boot drive in the HBA. In this case skip Step 7 and go on to Step 8.

# MegaCli64 /c0 show bootdrive
Controller = 0
Status = Success
Description = None

Controller Properties :
=====================

----------------
Ctrl_Prop Value
----------------
BootDrive VD:0
----------------

 

b) Example output when BootDrive VD:0 is NOT set as the boot drive in the HBA. In this case perform Step 7.

i. In this example no boot drive is set (continue to Step 7):

# MegaCli64 /c0 show bootdrive
Controller = 0
Status = Success
Description = None

Controller Properties :
=====================

------------------------
Ctrl_Prop Value
------------------------
BootDrive No Boot Drive
------------------------

 

ii. You may also encounter the case where the boot drive is set to VD:1. In this case also continue to Step 7.

The standard default boot drive in the HBA is VD:0, so if the boot drive is set to VD:1, set it back to VD:0 for consistency in Step 7.

On BDA V4.2 and lower:

Verify that logical drive L0 (letter L zero) is set as the boot drive in the HBA.

# MegaCli64 -AdpBootDrive -get a0

Example output when logical drive L0 is set as the boot drive in the HBA. In this case skip Step 7 and go on to Step 8.

# MegaCli64 -AdpBootDrive -get a0

Adapter 0: Boot Virtual Drive - #0 (target id - 0).

Exit Code: 0x00

For any other output, continue to Step 7. This includes the case where no boot drive is set and the case where the boot drive is set to VD:1. The standard default boot drive in the HBA is VD:0, so if it is set to VD:1, set it back to VD:0 for consistency in Step 7.
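For BDA V4.3 and higher, the decision in Step 6 can be scripted. bootdrive_ok below is a hypothetical helper that scans 'MegaCli64 /c0 show bootdrive' output of the form shown above for the BootDrive VD:0 line:

```shell
#!/bin/sh
# Hypothetical helper: reads MegaCli64 boot-drive output on stdin and
# prints "ok" when BootDrive VD:0 is set, "fix" otherwise (no boot
# drive, or boot drive set to VD:1).
bootdrive_ok() {
    if grep -q 'BootDrive VD:0' ; then echo ok; else echo fix; fi
}
```

Typical use: `MegaCli64 /c0 show bootdrive | bootdrive_ok`; "fix" means Step 7 is required.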

7. Ensure that the Boot Drive is set correctly.  You only need to perform this step if the boot drive in the HBA is NOT set correctly as per the previous step.

On BDA V4.3 and higher:

If the 'MegaCli64 /c0 show bootdrive' command does not report BootDrive VD:0 (that is, no boot drive is set, or the boot drive is VD:1), issue the following command:

# MegaCli64 /c0/v0 set bootdrive=on

Example Output:

# MegaCli64 /c0/v0 set bootdrive=on
Controller = 0
Status = Success
Description = None

Detailed Status :
===============

-----------------------------------------
VD Property Value Status ErrCd ErrMsg
-----------------------------------------
0 Boot Drive On Success 0 -
-----------------------------------------

 Verify:

# MegaCli64 /c0 show bootdrive

  

On BDA V4.2 and lower:

If the 'MegaCli64 -AdpBootDrive -get a0' command does not report logical drive L0, that is, "Boot Virtual Drive - #0 (target id - 0)", issue the following command:

# MegaCli64 AdpBootDrive set L0 a0

 Example output:

# MegaCli64 AdpBootDrive set L0 a0

Boot Virtual Drive is set to #0 (target id #0) on Adapter 0

Exit Code: 0x00

Verify:

# MegaCli64 -AdpBootDrive -get a0

  

8. Ensure the auto select boot drive feature is enabled.

# MegaCli64 adpBIOS EnblAutoSelectBootLd a0

Example output showing that the Auto Select Boot is already enabled:

# MegaCli64 adpBIOS EnblAutoSelectBootLd a0

Auto select Boot is already Enabled on Adapter 0.

Exit Code: 0x00

9. Check the configuration. See the section titled "Verifying the Disk Configuration."

Verifying the Disk Configuration

To verify the disk configuration:

1. Check the software configuration as root user:  

# bdachecksw

Example successful output from running bdachecksw:

# bdachecksw
SUCCESS: Correct OS disk s0 partition info : 1 ext3 raid 2 ext3 raid 3 linux-swap 4 ext3 primary
SUCCESS: Correct OS disk s1 partition info : 1 ext3 raid 2 ext3 raid 3 linux-swap 4 ext3 primary
SUCCESS: Correct data disk s2 partition info : 1 ext3 primary
SUCCESS: Correct data disk s3 partition info : 1 ext3 primary
SUCCESS: Correct data disk s4 partition info : 1 ext3 primary
SUCCESS: Correct data disk s5 partition info : 1 ext3 primary
SUCCESS: Correct data disk s6 partition info : 1 ext3 primary
SUCCESS: Correct data disk s7 partition info : 1 ext3 primary
SUCCESS: Correct data disk s8 partition info : 1 ext3 primary
SUCCESS: Correct data disk s9 partition info : 1 ext3 primary
SUCCESS: Correct data disk s10 partition info : 1 ext3 primary
SUCCESS: Correct data disk s11 partition info : 1 ext3 primary
SUCCESS: Correct software RAID info : /dev/md2 level=raid1 num-devices=2 /dev/md0 level=raid1 num-devices=2
SUCCESS: Correct mounted partitions : /dev/md0 /boot ext3 /dev/md2 / ext3 /dev/sd4 /u01 ext4 /dev/sd4 /u02 ext4 /dev/sd1 /u03 ext4 /dev/sd1 /u04 ext4 /dev/sd1 /u05 ext4 /dev/sd1 /u06 ext4 /dev/sd1 /u07 ext4 /dev/sd1 /u08 ext4 /dev/sd1 /u09 ext4 /dev/sd1 /u10 ext4 /dev/sd1 /u11 ext4 /dev/sd1 /u12 ext4
SUCCESS: Correct matching label and slot  : symbolic link to `../../sda4'
SUCCESS: Correct matching label and slot  : symbolic link to `../../sdb4'
SUCCESS: Correct matching label and slot  : symbolic link to `../../sdc1'
SUCCESS: Correct matching label and slot  : symbolic link to `../../sdd1'
SUCCESS: Correct matching label and slot  : symbolic link to `../../sde1'
SUCCESS: Correct matching label and slot  : symbolic link to `../../sdf1'
SUCCESS: Correct matching label and slot  : symbolic link to `../../sdg1'
SUCCESS: Correct matching label and slot  : symbolic link to `../../sdh1'
SUCCESS: Correct matching label and slot  : symbolic link to `../../sdi1'
SUCCESS: Correct matching label and slot  : symbolic link to `../../sdj1'
SUCCESS: Correct matching label and slot  : symbolic link to `../../sdk1'
SUCCESS: Correct matching label and slot  : symbolic link to `../../sdl1'
SUCCESS: Correct swap partition on /dev/disk/by-hba-slot/s0p3 : SWAP
SUCCESS: Correct swap partition on /dev/disk/by-hba-slot/s1p3 : SWAP
SUCCESS: Correct internal USB device (sdm) : 1
SUCCESS: Correct internal USB partitions : 1 primary ext3
SUCCESS: Correct internal USB ext3 partition check : clean
SUCCESS: Correct Linux kernel version : Linux 2.6.32-200.21.1.el5uek
SUCCESS: Correct Java Virtual Machine version : HotSpot(TM) 64-Bit Server 1.6.0_51
SUCCESS: Correct puppet version : 2.6.11
SUCCESS: Correct MySQL version : 5.5.17
SUCCESS: All required programs are accessible in $PATH
SUCCESS: All required RPMs are installed and valid
SUCCESS: Correct bda-monitor status : bda monitor is running
SUCCESS: Big Data Appliance software validation checks succeeded

2. If there are errors, then redo the configuration steps as necessary to correct the problem.

If an error like the one below occurs, that is, the replaced disk partition is listed at the end and all partitions are recognized, the error can be ignored; it is caused by Bug 17899101 in the bdachecksw script.

ERROR: Wrong mounted partitions : /dev/md0 /boot ext3 /dev/md2 / ext3 /dev/sd4 /u01 ext4 /dev/sd1 /u03 ext4 /dev/sd1 /u04 ext4 /dev/sd1 /u05 ext4 /dev/sd1 /u06 ext4 /dev/sd1 /u07 ext4 /dev/sd1 /u08 ext4 /dev/sd1 /u09 ext4 /dev/sd1 /u10 ext4 /dev/sd1 /u11 ext4 /dev/sd1 /u12 ext4  /dev/sd4 /u02 ext4
INFO: Expected mounted partitions : 12 data partitions, /boot and /

Bug 17899101 is fixed in the V2.4 release of BDA.

Patch 17924936 contains a one-off patch for Bug 17899101 for the V2.3.1 release of BDA.

Patch 17924887 contains a one-off patch for Bug 17899101 for the V2.2.1 release of BDA.

Refer to the Readme file for instructions on how to apply the patch. The Readme file also contains uninstall instructions as needed.

What If Firmware Warnings or Errors occur?

If the bdacheckhw utility reports errors or warnings regarding the HDD (Hard Disk Drive) firmware information, indicating that the HDD firmware needs to be updated, follow the instructions in "Firmware Usage and Upgrade Information for BDA Software Managed Components on Oracle Big Data Appliance V2 [ID 1542871.1]".

What If a Server Fails to Restart?

The server may restart during the disk replacement procedures, either because you issued a reboot command or made an error in a MegaCli64 command. In most cases, the server restarts successfully, and you can continue working. However, in other cases, an error occurs so that you cannot reconnect using ssh. In this case, you must complete the reboot using Oracle ILOM.

To restart a server using Oracle ILOM:

1.  Use your browser to open a connection to the server using Oracle ILOM. For example:

    http://bda1node12-c.example.com
   

Note: Your browser must have a JDK plug-in installed. If you do not see the Java coffee cup on the log-in page, then you must install the plug-in before continuing.

2.  Log in using your Oracle ILOM credentials.

3.  Select the Remote Control tab.

4.  Click the Launch Remote Console button.

5.  Enter Ctrl+d to continue rebooting.

6.  If the reboot fails, then enter the server root password at the prompt and attempt to fix the problem.

7.  After the server restarts successfully, open the Redirection menu and choose Quit to close the console window.

See the following documentation for more information: Oracle Integrated Lights Out Manager (ILOM) 3.0 documentation at http://docs.oracle.com/cd/E19860-01/

 

References

<NOTE:1542871.1> - Firmware Usage and Upgrade Information for BDA Software Managed Components on Oracle Big Data Appliance
<NOTE:1581331.1> - Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x
<NOTE:1581373.1> - How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u02 and /dev/sdb on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.