Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Predictive Self-Healing Sure

Solution 1581338.1: How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u01 and /dev/sda on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x
In this Document
Applies to:
Big Data Appliance X5-2 Full Rack - Version All Versions to All Versions [Release All Releases]
Big Data Appliance X3-2 Starter Rack - Version All Versions to All Versions [Release All Releases]
Big Data Appliance X3-2 In-Rack Expansion - Version All Versions to All Versions [Release All Releases]
Big Data Appliance X4-2 In-Rack Expansion - Version All Versions to All Versions [Release All Releases]
Big Data Appliance X4-2 Full Rack - Version All Versions to All Versions [Release All Releases]
Linux x86-64

Purpose
This document describes the steps for configuring a server disk drive as an operating system disk for /u01 and /dev/sda on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x.

Scope
This document is intended for anyone configuring the disk. If you attempt these steps and need further assistance, log a service request to contact Oracle Support.

Details

Overview
Failure of a disk is never catastrophic on Oracle Big Data Appliance, and no user data should be lost: data stored in HDFS or Oracle NoSQL Database is automatically replicated.

The following are the basic steps for replacing a server disk drive and configuring it as an operating system disk:

1. Replace the failed disk drive.
2. Perform the basic configuration steps for the new disk. If multiple disks are unconfigured, configure them in order from the lowest to the highest slot number; finish all the steps for one disk before starting the next.
3. Identify the dedicated function of the failed disk: an HDFS disk, an operating system disk, or an Oracle NoSQL Database disk.

The steps for 1, 2, and 3 are listed in "Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x (Doc ID 1581331.1)."

4. Configure the disk for the operating system.
5. Verify that the configuration is correct.

The steps for 4 and 5 are listed here in this document.
See the following tables to identify the function of the drive, the slot number, and the mount point which will be used later in the procedure.
About Disk Drive Identifiers
The Oracle Big Data Appliance includes a disk enclosure cage that holds 12 disk drives and is controlled by the host bus adapter (HBA). The drives in this enclosure are identified by slot numbers (0-11) and can have different purposes; for example, the drives in slots 0 and 1 hold RAID 1 mirrored operating system and boot partitions. The drives can be dedicated to specific functions, as shown in Table 1.
In the rest of this document, a disk or partition is referred to by its symbolic name under /dev/disk/by-hba-slot/, for example s0 or s0p4. Note that command output may list kernel device names instead of symbolic link names; thus /dev/disk/by-hba-slot/s0 might be identified as /sys/block/sda in the output of a command.
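A quick way to map a symbolic name back to its kernel device name is to resolve the symlink with readlink. The sketch below demonstrates the idea on a temporary stand-in symlink tree, since /dev/disk/by-hba-slot exists only on the appliance itself; the temporary paths are purely illustrative:

```shell
# On a real BDA node you would simply run:
#   readlink -f /dev/disk/by-hba-slot/s0     # e.g. prints /dev/sda
# Here we build a stand-in symlink tree so the sketch is self-contained.
tmp=$(mktemp -d)
mkdir -p "$tmp/dev/disk/by-hba-slot"
touch "$tmp/dev/sda"
ln -s ../../sda "$tmp/dev/disk/by-hba-slot/s0"

resolved=$(readlink -f "$tmp/dev/disk/by-hba-slot/s0")
echo "s0 resolves to: $resolved"

rm -rf "$tmp"
```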
Disk Drive Identifiers
The following table (Table 1) shows the mappings between the RAID logical drives, the probable initial kernel device names, and the dedicated function of each drive in an Oracle Big Data Appliance server.

Table 1 - Disk Drive Identifiers
Standard Mount Points
The following table (Table 2) shows the mappings between HDFS partitions and mount points. This information is used later in the procedure, so note which mapping applies to the disk drive being replaced.

Table 2 - Mount Points
Note: mount, umount, reboot, and many of the other commands in this procedure require root privileges, so the recommendation is to run the entire procedure as root.
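The slot number is one less than the /uNN mount point number (s0 corresponds to /u01, s11 to /u12, and so on). A small sketch of that arithmetic, useful when writing down the mappings before starting; the helper name is illustrative, not part of the appliance software:

```shell
# Hypothetical helper: compute the expected /uNN mount point for a given
# HBA slot number (slot n -> /u(n+1), e.g. slot 0 -> /u01, slot 11 -> /u12).
mount_for_slot() {
  printf '/u%02d\n' $(( $1 + 1 ))
}

mount_for_slot 0    # prints /u01
mount_for_slot 11   # prints /u12
```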
Note: The code examples provided here are based on replacing /dev/disk/by-hba-slot/s0 == /dev/sda, whose HDFS partition is /dev/disk/by-hba-slot/s0p4 == /dev/sda4, mounted at /u01. These mappings are an easy way to set up the information that is needed throughout the procedure. It is best to work out the mapping for the disk being replaced and write it down before starting; for example, the slot number is one less than the mount point number. All disk replacements vary, so substitute the proper information for the disk replacement being done wherever the examples use these names.

Helpful tip: Re-confirm the relationship among the disk slot number, the current kernel device name, and the mount point before proceeding.

Configuring an Operating System Disk
The first two disks support the Linux operating system. These disks store a copy of the mirrored operating system, a swap partition, a mirrored boot partition, and an HDFS data partition.

To configure an operating system disk, you must copy the partition table from the surviving disk, create an HDFS partition (ext4 file system), and add the software RAID partitions and boot partitions for the operating system. If multiple disks are unconfigured, configure them in order from the lowest to the highest slot number, finishing all the steps for one disk before starting the next. The procedure is described in the following sections:

1. Partitioning the Operating System Disk
2. Repairing the RAID Arrays
3. Formatting the HDFS Partition of an Operating System Disk
4. Restoring the Swap Partition

Partitioning the Operating System Disk

1. Replace the failed disk drive as described in "Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance (Doc ID 1581331.1)."
2. Verify that the replacement disk does not already have a partition table:

# parted /dev/disk/by-hba-slot/s0 -s print
You should see a message about a missing partition table.

OL6 example in which a partition table is present (that is, not missing):

# parted /dev/disk/by-hba-slot/s0 -s print

Number Start End Size File system Name Flags

OL5 example in which a partition table is present:

# parted /dev/disk/by-hba-slot/s0 -s print
Model: LSI MR9261-8i (scsi)
Disk /dev/sda: 1999GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  210MB   210MB   ext3         raid
 2      210MB   178GB   178GB   ext3         raid
 3      178GB   191GB   12.6GB  linux-swap
 4      191GB   1999GB  1808GB  ext3         primary
3. Clear the partition table if any of the following applies:

a) The output displays a partition table.

b) The output shows "Error: msdos labels do not support devices that have more than 4294967295 sectors." For example:

# parted /dev/disk/by-hba-slot/s0 -s print
Error: msdos labels do not support devices that have more than 4294967295 sectors.

c) The output shows "Error: Both the primary and backup GPT tables are corrupt." For example:

# parted /dev/disk/by-hba-slot/s0 -s print
Error: Both the primary and backup GPT tables are corrupt.  Try making a fresh table, and using Parted's rescue feature to recover partitions.

In any of the above cases, clear the partition table:

# dd if=/dev/zero of=/dev/disk/by-hba-slot/s0 bs=1M count=100
Example output clearing the partition table on /dev/disk/by-hba-slot/s0:

# dd if=/dev/zero of=/dev/disk/by-hba-slot/s0 bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.0511396 seconds, 2.1 GB/s
Note: You can use the "dd if=/dev/zero of=/dev/disk/by-hba-slot/s0 bs=1M count=100" command to restart an operating system disk configuration if you make a mistake.

Example output reissuing the command from step 2 after clearing the partition table:

# parted /dev/disk/by-hba-slot/s0 -s print
Error: Unable to open /dev/sda - unrecognised disk label.
Also note that in the case of the error "Error: Both the primary and backup GPT tables are corrupt.", GPT may write backup tables to the end of the disk. In these rare cases the end of the disk needs to be zeroed out as well (zero out the last 100 MB of the disk). The approach comes from: http://unix.stackexchange.com/questions/13848/wipe-last-1mb-of-a-hard-drive
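Zeroing the end of the disk with dd needs a 'seek' offset just short of the device's total size. A rough sketch of the arithmetic, assuming the device size in bytes is already known (for example from 'blockdev --getsize64 /dev/disk/by-hba-slot/s0'); the function name is illustrative:

```shell
# Hypothetical helper: given a device size in bytes, compute the 'seek' value
# (in 1 MiB blocks) at which dd should start writing to zero the last 100 MiB:
#   dd if=/dev/zero of=<device> bs=1M seek=<SEEK>
seek_for_last_100mib() {
  size_bytes=$1
  echo $(( size_bytes / 1048576 - 100 ))   # whole MiB blocks minus the last 100
}

# Example for a hypothetical 1999000000000-byte disk, as in the parted output above:
seek_for_last_100mib 1999000000000
```

Double-check the computed offset against the real device size before running any dd command against a disk.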
4. Create a new GPT partition table:

# parted /dev/disk/by-hba-slot/s0 -s mklabel gpt print

Example:

# parted /dev/disk/by-hba-slot/s0 -s mklabel gpt print
Model: LSI MR9261-8i (scsi)
Disk /dev/sda: 1999GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start  End  Size  File system  Name  Flags
5. List the Cylinder, Head, Sector (CHS) partition information of the surviving disk. Thus, if you are partitioning /dev/disk/by-hba-slot/s0, enter /dev/disk/by-hba-slot/s1 in the following command:

# parted /dev/disk/by-hba-slot/s1 -s unit chs print
OL6 example output using /dev/disk/by-hba-slot/s1, since the surviving disk is /dev/disk/by-hba-slot/s1:

# parted /dev/disk/by-hba-slot/s1 -s unit chs print
Model: LSI MR9261-8i (scsi)
Disk /dev/sdb: 486305,152,54
Sector size (logical/physical): 512B/4096B
BIOS cylinder,head,sector geometry: 486305,255,63.  Each cylinder is 8225kB.
Partition Table: gpt

Number  Start         End            File system     Name     Flags
 1      0,32,32       24,89,0        ext4            primary  boot
 2      24,89,1       60812,135,58                   primary  raid
 3      60812,135,59  68082,213,34   linux-swap(v1)  primary
 4      68082,213,35  486305,120,22  ext4            primary

OL5 example output using /dev/disk/by-hba-slot/s1, since the surviving disk is /dev/disk/by-hba-slot/s1:

# parted /dev/disk/by-hba-slot/s1 -s unit chs print
Model: LSI MR9261-8i (scsi)
Disk /dev/sdb: 243031,30,6
Sector size (logical/physical): 512B/512B
BIOS cylinder,head,sector geometry: 243031,255,63.  Each cylinder is 8225kB.
Partition Table: gpt

Number  Start         End           File system  Name     Flags
 1      0,0,34        25,127,7      ext3                  raid
 2      25,127,8      21697,116,20  ext3                  raid
 3      21697,116,21  23227,61,35   linux-swap
 4      23227,61,36   243031,29,36  ext3         primary
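The transcription of the surviving disk's first three partitions into mkpart commands in the next step can be scripted instead of typed by hand. A rough awk sketch run against the OL6-style rows from the example output above (embedded as a heredoc here); treat it as illustrative only, and review the generated commands before running them:

```shell
# Sketch: turn partition rows 1-3 from the surviving disk's
# 'parted -s unit chs print' output into mkpart commands for the
# replacement disk. Only prints the commands; nothing is executed.
emit_mkpart_commands() {
  awk '$1 ~ /^[123]$/ {
    fs = $4
    sub(/\(v1\)/, "", fs)             # linux-swap(v1) -> mkpart keyword linux-swap
    if (fs == "primary") fs = "ext4"  # partition 2 prints no fs column on OL6
    printf "parted /dev/disk/by-hba-slot/s0 -s mkpart primary %s %s %s\n", fs, $2, $3
  }'
}

# Demo against the OL6 rows from the example output above:
emit_mkpart_commands <<'EOF'
 1 0,32,32 24,89,0 ext4 primary boot
 2 24,89,1 60812,135,58 primary raid
 3 60812,135,59 68082,213,34 linux-swap(v1) primary
EOF
```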
6. Create partitions 1 through 3 on /dev/disk/by-hba-slot/s0 by duplicating the partition table of the surviving disk. The general form of the command is:

# parted /dev/disk/by-hba-slot/s0 -s mkpart file_system start end

Use the file_system, start, and end addresses that you obtained in Step 5 instead of the addresses shown in the following examples.

On OL6:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext4 0,32,32 24,89,0
# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext4 24,89,1 60812,135,58
# parted /dev/disk/by-hba-slot/s0 -s mkpart primary linux-swap 60812,135,59 68082,213,34

On OL5:

# parted /dev/disk/by-hba-slot/s0 -s mkpart ext3 0,0,34 25,127,7
# parted /dev/disk/by-hba-slot/s0 -s mkpart ext3 25,127,8 21697,116,20
# parted /dev/disk/by-hba-slot/s0 -s mkpart linux-swap 21697,116,21 23227,61,35

7. Create primary partition 4 on /dev/disk/by-hba-slot/s0 using the start address obtained in Step 5 and an end address of 100%:

On OL6:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext4 68082,213,35 100%

On OL5:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext3 23227,61,36 100%
Partition 4 is an HDFS data partition, so make it as big as possible; for the other partitions, use the exact CHS information from the surviving disk. The BDA hardware check (bdacheckhw) checks partition names and flags, so the names must also be cleared and the RAID flags set; setting a name to empty can only be done in parted's interactive mode.

Note: In some cases the surviving disk shows a gap between the third and fourth partitions, for example a gap between partition 3 ending at 23227,61,35 and partition 4 starting at 25531,9,31:

# parted /dev/disk/by-hba-slot/s1 -s unit chs print
Model: LSI MR9261-8i (scsi)

Number  Start         End           File system  Name     Flags
 2      25,127,8      21697,116,20  ext3                  raid
 3      21697,116,21  23227,61,35   linux-swap
 4      25531,9,31    364729,25,62  ext3         primary

In that case, create primary partition 4 using the start address from the output above:

On OL6:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext4 25531,9,31 100%

On OL5:

# parted /dev/disk/by-hba-slot/s0 -s mkpart primary ext3 25531,9,31 100%

It is best to use the information from the other OS disk, in this case 25531,9,31. This may be a more optimal CHS boundary chosen by parted when the drive was originally partitioned using MB sizes/offsets.

8. Set the RAID flags on partitions 1 and 2 of /dev/disk/by-hba-slot/s0:

# parted -s /dev/disk/by-hba-slot/s0 set 1 raid
# parted -s /dev/disk/by-hba-slot/s0 set 2 raid
9. For OL6 ONLY set the boot flag on the first partition: (Not for OL5) # parted -s /dev/disk/by-hba-slot/s0 set 1 boot
10. Clear the partition names using parted's interactive mode:

# parted /dev/disk/by-hba-slot/s0
GNU Parted 1.8.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted)

Once in the shell, type the following to clear the names. "(parted)" is the parted prompt; do not type it:

(parted) name 1 " "
(parted) name 2 " "
(parted) name 3 " "
(parted) quit
Information: Don't forget to update /etc/fstab, if necessary.
11. Verify that the partitions are correct:

# parted /dev/disk/by-hba-slot/s0 -s unit chs print
OL6 example output (note that you must verify that the boot flag is set on OL6):

# parted /dev/disk/by-hba-slot/s0 -s unit chs print
Model: LSI MR9261-8i (scsi)
Disk /dev/sda: 486305,152,54
Sector size (logical/physical): 512B/4096B
BIOS cylinder,head,sector geometry: 486305,255,63.  Each cylinder is 8225kB.
Partition Table: gpt

Number  Start         End            File system     Name     Flags
 1      0,32,32       24,89,0        ext4            primary  boot
 2      24,89,1       60812,135,58                   primary  raid
 3      60812,135,59  68082,213,34   linux-swap(v1)  primary
 4      68082,213,35  486305,120,22  ext4            primary

OL5 example output for /dev/disk/by-hba-slot/s0:

# parted /dev/disk/by-hba-slot/s0 -s unit chs print
Model: LSI MR9261-8i (scsi)
Disk /dev/sda: 243031,30,6
Sector size (logical/physical): 512B/512B
BIOS cylinder,head,sector geometry: 243031,255,63.  Each cylinder is 8225kB.
Partition Table: gpt

Number  Start         End           File system  Name     Flags
 1      0,0,34        25,127,7      ext3                  raid
 2      25,127,8      21697,116,20  ext3                  raid
 3      21697,116,21  23227,61,35   linux-swap
 4      23227,61,36   243031,29,36  ext3         primary

12. Complete the steps in the next section, "Repairing the RAID Arrays."

Repairing the RAID Arrays
After partitioning the disks, repair the two logical RAID arrays. There are two md arrays: /dev/md0 and /dev/md2.

Caution: Do not dismount the /dev/md devices, as doing so will bring the system down.
To repair the RAID arrays, issue a series of mdadm commands in pairs, since there are two arrays (md0 and md2). For each partition, first mark the partition as failed, then remove it, and then add it back in.

Note: All of the mdadm command options listed in this section start with two dashes ("--").

1. Remove detached devices from the arrays:
# mdadm /dev/md0 -r detached
# mdadm /dev/md2 -r detached
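Throughout this section, partition 1 pairs with /dev/md0 and partition 2 with /dev/md2. The pairing used by the fail/remove/add steps below can be sketched as a loop that only prints the commands for review; treat it as a memory aid for the pairing, not a replacement for the step-by-step procedure (which interleaves verification between the remove and add steps):

```shell
# Print (do not execute) the mdadm command pairs for the replaced disk.
# Pairing: partition 1 <-> /dev/md0, partition 2 <-> /dev/md2.
slot=s0   # symbolic name of the replaced disk
for pair in "md0:p1" "md2:p2"; do
  md=${pair%%:*}    # array name before the colon
  part=${pair##*:}  # partition suffix after the colon
  for op in --fail --remove --add; do
    echo "mdadm $op /dev/$md /dev/disk/by-hba-slot/${slot}${part}"
  done
done
```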
Note: Only if the faulty disk has NOT yet been replaced (but will be shortly), follow a) and b) below to fail and remove the disk drive. If the disk has already been replaced, do NOT follow a) and b); continue directly to step 2 below.
a) Mark the partitions as failed for /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s0p2: # mdadm --fail /dev/md0 /dev/disk/by-hba-slot/s0p1
# mdadm --fail /dev/md2 /dev/disk/by-hba-slot/s0p2

Example output marking the partitions as failed on /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s0p2:

# mdadm --fail /dev/md0 /dev/disk/by-hba-slot/s0p1
mdadm: set device faulty failed for /dev/disk/by-hba-slot/s0p1: No such device
# mdadm --fail /dev/md2 /dev/disk/by-hba-slot/s0p2
mdadm: set device faulty failed for /dev/disk/by-hba-slot/s0p2: No such device

You can ignore "No such device" messages from the mdadm commands.

b) Remove the partitions from the RAID arrays:

# mdadm --remove /dev/md0 /dev/disk/by-hba-slot/s0p1
# mdadm --remove /dev/md2 /dev/disk/by-hba-slot/s0p2

Example output removing the partitions on /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s0p2:

# mdadm --remove /dev/md0 /dev/disk/by-hba-slot/s0p1
mdadm: hot remove failed for /dev/disk/by-hba-slot/s0p1: No such device or address
# mdadm --remove /dev/md2 /dev/disk/by-hba-slot/s0p2
mdadm: hot remove failed for /dev/disk/by-hba-slot/s0p2: No such device or address

2. Verify that the RAID arrays are degraded:

# mdadm -Q --detail /dev/md0
# mdadm -Q --detail /dev/md2

Example output:

# mdadm -Q --detail /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Wed Feb 6 14:53:33 2013
     Raid Level : raid1
     Array Size : 204736 (199.97 MiB 209.65 MB)
  Used Dev Size : 204736 (199.97 MiB 209.65 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent
    Update Time : Mon Feb 25 13:37:43 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
           UUID : 9f524e09:c75bfe13:4803c1e9:70ea81fd
         Events : 0.156

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       17        1      active sync   /dev/sdb1

# mdadm -Q --detail /dev/md2
/dev/md2:
        Version : 0.90
  Creation Time : Wed Feb 6 14:52:34 2013
     Raid Level : raid1
     Array Size : 174079936 (166.02 GiB 178.26 GB)
  Used Dev Size : 174079936 (166.02 GiB 178.26 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent
    Update Time : Mon Feb 25 13:55:12 2013
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
           UUID : 7ae6e86e:69391270:9cdd6430:f7625f21
         Events : 0.1092

    Number   Major   Minor   RaidDevice State
       0       0        0        0      removed
       1       8       18        1      active sync   /dev/sdb2

3. Check the degraded status of the arrays; a value of 1 indicates the array is degraded:

# cat /sys/block/md0/md/degraded
Example output: # cat /sys/block/md0/md/degraded
1
# cat /sys/block/md2/md/degraded
Example output: # cat /sys/block/md2/md/degraded
1
4. Re-add the partitions to the RAID arrays:

# mdadm --add /dev/md0 /dev/disk/by-hba-slot/s0p1
# mdadm --add /dev/md2 /dev/disk/by-hba-slot/s0p2

Example output restoring the partitions on /dev/disk/by-hba-slot/s0p1 and /dev/disk/by-hba-slot/s0p2:

# mdadm --add /dev/md0 /dev/disk/by-hba-slot/s0p1
mdadm: re-added /dev/disk/by-hba-slot/s0p1
# mdadm --add /dev/md2 /dev/disk/by-hba-slot/s0p2
mdadm: re-added /dev/disk/by-hba-slot/s0p2

5. Check that resynchronization has started, so that /dev/md0 and /dev/md2 are in a state of recovery and not idle (although you may see "idle" if recovery completes very quickly):

# cat /sys/block/md0/md/sync_action
and:

# cat /sys/block/md2/md/sync_action
Example output: # cat /sys/block/md2/md/sync_action
recover
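To check the recovery state from a script rather than by eye, the recovery percentage can be pulled out of /proc/mdstat with sed. A rough sketch run against a sample captured from a recovering array (on a real node you would read /proc/mdstat directly):

```shell
# Extract the recovery percentage for md2 from /proc/mdstat-style text.
# Sample captured output; on the appliance, use: cat /proc/mdstat
sample='md2 : active raid1 sda2[2] sdb2[1]
      174079936 blocks [2/1] [_U]
      [=============>.......]  recovery = 66.1% (115206144/174079936) finish=11.1min speed=87842K/sec'

pct=$(printf '%s\n' "$sample" | sed -n 's/.*recovery = \([0-9.]*\)%.*/\1/p')
echo "recovery = ${pct}%"   # prints: recovery = 66.1%
```

An empty result means no recovery line is present, i.e. the array is idle or resynchronization is complete.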
6. Watch the resynchronization progress:

# cat /proc/mdstat
Example output which shows the percentage complete at 66.1%: # cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      204736 blocks [2/2] [UU]

md2 : active raid1 sda2[2] sdb2[1]
      174079936 blocks [2/1] [_U]
      [=============>.......]  recovery = 66.1% (115206144/174079936) finish=11.1min speed=87842K/sec

unused devices: <none>

The following output shows that synchronization is complete:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
      204736 blocks [2/2] [UU]

md2 : active raid1 sda2[0] sdb2[1]
      174079936 blocks [2/2] [UU]

unused devices: <none>

7. View the contents of /etc/mdadm.conf:

# cat /etc/mdadm.conf
Example output: # cat /etc/mdadm.conf
# mdadm.conf written out by anaconda
DEVICE partitions
MAILADDR root
ARRAY /dev/md2 level=raid1 num-devices=2 uuid=7ae6e86e:69391270:9cdd6430:f7625f21
ARRAY /dev/md0 level=raid1 num-devices=2 uuid=9f524e09:c75bfe13:4803c1e9:70ea81fd

8. Compare the output of the following command with the content of /etc/mdadm.conf from Step 7:

# mdadm --examine --brief --scan --config=partitions

In this example the output of the mdadm command and the content of /etc/mdadm.conf are the same, so no changes to /etc/mdadm.conf are required:

# mdadm --examine --brief --scan --config=partitions
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=9f524e09:c75bfe13:4803c1e9:70ea81fd
ARRAY /dev/md2 level=raid1 num-devices=2 UUID=7ae6e86e:69391270:9cdd6430:f7625f21

9. If the UUIDs in the lines for /dev/md0 and /dev/md2 in the file are different from the output of the mdadm command, then use a text editor to replace them with the output of the above mdadm command:

a. Open /etc/mdadm.conf in a text editor.

10. Complete the steps in the next section, "Formatting the HDFS Partition of an Operating System Disk."

Formatting the HDFS Partition of an Operating System Disk
Partition 4 (sda4 or sdb4) on an operating system disk is used for HDFS. After you format the partition and set the correct label, HDFS rebalances the job load to use the partition if the disk space is needed.

To format the HDFS partition:

1. Format the partition as an ext4 file system:

# mkfs -t ext4 /dev/disk/by-hba-slot/s0p4
Example output formatting the HDFS partition on /dev/disk/by-hba-slot/s0p4 as an ext4 file system:

# mkfs -t ext4 /dev/disk/by-hba-slot/s0p4
mkfs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
110354432 inodes, 441393655 blocks
22069682 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
13471 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 23 mounts or 180 days, whichever comes first.  Use tune4fs -c or -i to override.
Note: If this command fails because the device is mounted, then dismount the drive now as shown below and skip step 3 later.
Example showing the command failing:

# mkfs -t ext4 /dev/disk/by-hba-slot/s0p4
mkfs 1.41.12 (17-May-2010)
/dev/disk/by-hba-slot/s0p4 is mounted; will not make a filesystem here!

If the "mkfs -t ext4 /dev/disk/by-hba-slot/s0p4" command fails in this way, dismount the device and repeat the command:

# umount /u01
# mkfs -t ext4 /dev/disk/by-hba-slot/s0p4

Example output from formatting the HDFS partition on /dev/disk/by-hba-slot/s0p4 after dismounting /u01:

# umount /u01
# mkfs -t ext4 /dev/disk/by-hba-slot/s0p4
mkfs 1.41.12 (17-May-2010)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
110354432 inodes, 441393655 blocks
22069682 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
13471 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000, 214990848

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 25 mounts or 180 days, whichever comes first.  Use tune4fs -c or -i to override.
2. Check the disk labels:

# ls -l /dev/disk/by-label
Example output when nothing is missing: # ls -l /dev/disk/by-label
total 0

3. Dismount the appropriate HDFS partition (only if you did not already do so in step 1 above); for /dev/sda this is /u01:

# umount /u01

Example:

# umount /u01

Once the HDFS partition has been formatted as an ext4 file system after dismounting, do not dismount the drive again, because the device is already dismounted. Use mount -l to check whether the device is mounted:

# mount -l

Example showing /u01 is not mounted:

# mount -l
/dev/md2 on / type ext3 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)          <<<< /u01 umounted
/dev/sdb4 on /u02 type ext4 (rw,nodev,noatime) [/u02]
/dev/sdc1 on /u03 type ext4 (rw,nodev,noatime) [/u03]
/dev/sdd1 on /u04 type ext4 (rw,nodev,noatime) [/u04]
/dev/sde1 on /u05 type ext4 (rw,nodev,noatime) [/u05]
/dev/sdf1 on /u06 type ext4 (rw,nodev,noatime) [/u06]
/dev/sdg1 on /u07 type ext4 (rw,nodev,noatime) [/u07]
/dev/sdh1 on /u08 type ext4 (rw,nodev,noatime) [/u08]
/dev/sdi1 on /u09 type ext4 (rw,nodev,noatime) [/u09]
/dev/sdj1 on /u10 type ext4 (rw,nodev,noatime) [/u10]
/dev/sdk1 on /u11 type ext4 (rw,nodev,noatime) [/u11]
/dev/sdl1 on /u12 type ext4 (rw,nodev,noatime) [/u12]
fuse_dfs on /mnt/hdfs-nnmount type fuse.fuse_dfs (rw,nosuid,nodev,allow_other,allow_other,default_permissions)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)

4. Reset the partition label. For OL6 use tune2fs; for OL5 use tune4fs.

On OL6:

# tune2fs -c -1 -i 0 -m 0.2 -L /u01 /dev/disk/by-hba-slot/s0p4
On OL5: # tune4fs -c -1 -i 0 -m 0.2 -L /u01 /dev/disk/by-hba-slot/s0p4
Example on OL5 resetting the partition label on /u01 and /dev/disk/by-hba-slot/s0p4: # tune4fs -c -1 -i 0 -m 0.2 -L /u01 /dev/disk/by-hba-slot/s0p4
tune4fs 1.41.12 (17-May-2010)
Setting maximal mount count to -1
Setting interval between checks to 0 seconds
Setting reserved blocks percentage to 0.2% (882787 blocks)

5. Mount the HDFS partition:

# mount /u01
Example mounting /u01: # mount /u01
You can check to see if the device is mounted: # mount -l
The following shows /u01 is mounted: # mount -l
/dev/md2 on / type ext3 (rw,noatime)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/md0 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
/dev/sdb4 on /u02 type ext4 (rw,nodev,noatime) [/u02]
/dev/sdc1 on /u03 type ext4 (rw,nodev,noatime) [/u03]
/dev/sdd1 on /u04 type ext4 (rw,nodev,noatime) [/u04]
/dev/sde1 on /u05 type ext4 (rw,nodev,noatime) [/u05]
/dev/sdf1 on /u06 type ext4 (rw,nodev,noatime) [/u06]
/dev/sdg1 on /u07 type ext4 (rw,nodev,noatime) [/u07]
/dev/sdh1 on /u08 type ext4 (rw,nodev,noatime) [/u08]
/dev/sdi1 on /u09 type ext4 (rw,nodev,noatime) [/u09]
/dev/sdj1 on /u10 type ext4 (rw,nodev,noatime) [/u10]
/dev/sdk1 on /u11 type ext4 (rw,nodev,noatime) [/u11]
/dev/sdl1 on /u12 type ext4 (rw,nodev,noatime) [/u12]
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
nfsd on /proc/fs/nfsd type nfsd (rw)
/dev/sda4 on /u01 type ext4 (rw,nodev,noatime) [/u01]

Restoring the Swap Partition
After formatting the HDFS partition, you can restore the swap partition.

1. Set the swap label:

# mkswap -L SWAP-sda3 /dev/disk/by-hba-slot/s0p3
Example OL6 output setting the swap label on sda3 and /dev/disk/by-hba-slot/s0p3: # mkswap -L SWAP-sda3 /dev/disk/by-hba-slot/s0p3
Setting up swapspace version 1, size = 38602748 KiB
LABEL=SWAP-sda3, UUID=88075e3f-ac3a-41e7-bb90-7d0ff9076eb9

Example OL5 output setting the swap label on sda3 and /dev/disk/by-hba-slot/s0p3:

# mkswap -L SWAP-sda3 /dev/disk/by-hba-slot/s0p3
Setting up swapspace version 1, size = 12582907 kB
LABEL=SWAP-sda3, no uuid

2. Verify that the swap partition is restored:

# bdaswapon; bdaswapoff
Example output verifying the swap partition is restored: # bdaswapon; bdaswapoff
Filename        Type       Size      Used  Priority
/dev/sda3       partition  12287992  0     1
/dev/sdb3       partition  12287992  0     1

3. Check the disk labels:

# ls -l /dev/disk/by-label
4. Trigger kernel device uevents to replay events missed at system cold plug.

a) On Oracle Linux 5, execute:

# udevtrigger

b) On Oracle Linux 6, execute:

# udevadm trigger
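Steps like 4a/4b depend on the Oracle Linux major release. A small dispatch sketch; because detecting the release is environment-specific, the function takes the major version as a parameter and only prints the command it would run:

```shell
# Choose the udev trigger command by Oracle Linux major version.
# This sketch prints the command instead of executing it.
udev_trigger_cmd() {
  case "$1" in
    5) echo "udevtrigger" ;;
    6) echo "udevadm trigger" ;;
    *) echo "unsupported release: $1" >&2; return 1 ;;
  esac
}

udev_trigger_cmd 5   # prints: udevtrigger
udev_trigger_cmd 6   # prints: udevadm trigger
```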
Note: With both commands, the --verbose option can be used to check which events are triggered.

5. Verify that the replaced disk is listed in the output of 'ls -l /dev/disk/by-label'.

6. Complete the steps in the next section, "Restoring the GRUB Master Boot Records and HBA Boot Order."

Restoring the GRUB Master Boot Records and HBA Boot Order
After restoring the swap partition, you can restore the Grand Unified Bootloader (GRUB) master boot record and the HBA boot order.

1. The device.map file maps the BIOS drives to operating system devices. The following is an example of a default device map file:

# more /boot/grub/device.map
# this device map was generated by anaconda
(hd0)     /dev/sda
(hd1)     /dev/sdb

Unfortunately, the GRUB device map does not support symbolic links; in the map file, (hd0) is mapped to /dev/sda and (hd1) to /dev/sdb, not to /dev/disk/by-hba-slot names. Check which kernel device the replaced slot currently maps to:

# ls -ld /dev/disk/by-hba-slot/s0
Sample output when the device name is sda:

lrwxrwxrwx 1 root root 9 Apr 24 14:05 /dev/disk/by-hba-slot/s0 -> ../../sda

If the device name is ../../sda, then jump to step 2 (Open GRUB). If the device is mapped to some other name, say ../../sdn, then set hd0 to point to the new device name by following these steps:

[root@bdanode01 ~]# cd /boot/grub
[root@bdanode01 grub]# cp device.map mydevice.map
[root@bdanode01 grub]# ls -l *device*

Sample output:

-rw-r--r-- 1 root root 85 Apr 22 14:50 device.map
-rw-r--r-- 1 root root 85 Apr 24 09:24 mydevice.map
Edit mydevice.map so that (hd0) points to the new device name, then verify:

# more /boot/grub/mydevice.map
# this device map was generated by bda install
(hd0)     /dev/sdn
(hd1)     /dev/sdb
2. Open GRUB.

a) If the device name in slot 0 is /dev/sda, use the default device.map file:

# grub --device-map=/boot/grub/device.map

OR, if you created mydevice.map above, use it instead:

# grub --device-map=/boot/grub/mydevice.map
Example output when using device.map file: # grub --device-map=/boot/grub/device.map
GNU GRUB version 0.97 (640K lower / 3072K upper memory)

[ Minimal BASH-like line editing is supported.  For the first word, TAB
  lists possible command completions.  Anywhere else TAB lists the possible
  completions of a device/filename.]

grub>
3. Set the root device, entering hd0 for the disk in slot 0:

grub> root (hd0,0)
Example setting the root device to hd0 for /dev/sda: grub> root (hd0,0)
root (hd0,0)
 Filesystem type is ext2fs, partition type 0x83

4. Install GRUB:

grub> setup (hd0)
Example installing grub on hd0 (/dev/sda): grub> setup (hd0)
setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"... failed (this is not fatal)
 Running "embed /grub/e2fs_stage1_5 (hd0,0)"... failed (this is not fatal)
 Running "install /grub/stage1 (hd0) /grub/stage2 p /grub/grub.conf "... succeeded
Done.

5. Quit GRUB:

grub> quit
Example output: grub> quit
quit

6. Verify that the boot drive in the HBA is set correctly. If it is set correctly, skip Step 7 and go on to Step 8; if it is not set correctly, perform Step 7.

On BDA V4.3 and higher:

a) Verify that BootDrive VD:0 is set as the boot drive in the HBA:

# MegaCli64 /c0 show bootdrive
Example output when BootDrive VD:0 is set as the boot drive in the HBA (in this case skip Step 7 and go on to Step 8):

# MegaCli64 /c0 show bootdrive
Controller Properties :
----------------
b) Example output when BootDrive VD:0 is NOT set as the boot drive in the HBA (in this case perform Step 7):

i. In this example the BootDrive VD:0 is not set (continue to Step 7):

# MegaCli64 /c0 show bootdrive
Controller Properties :
------------------------
ii. You may also encounter the case where the BootDrive is set to VD:1; in this case also continue to Step 7. The standard default value for the boot drive in the HBA is VD:0, so if the boot drive is set to VD:1, set it to VD:0 for consistency in Step 7.

On BDA V4.2 and lower:

Verify that logical drive L0 (letter L, zero) is set as the boot drive in the HBA:

# MegaCli64 -AdpBootDrive -get a0
Example output when the logical drive L0 is set as the boot drive in the HBA. In this case skip the next step, Step7 and go on to Step 8. # MegaCli64 -AdpBootDrive -get a0
Adapter 0: Boot Virtual Drive - #0 (target id - 0).
Exit Code: 0x00

For any other output, continue to Step 7. This includes the case where the BootDrive is not set and the case where the BootDrive is set to VD:1. The standard default value for the boot drive in the HBA is VD:0; if the boot drive is set to VD:1, also set it to VD:0 for consistency in Step 7.

7. Ensure that the boot drive is set correctly. You only need to perform this step if the boot drive in the HBA is NOT set correctly per the previous step.

On BDA V4.3 and higher:

If the 'MegaCli64 /c0 show bootdrive' command does not report BootDrive VD:0, or reports BootDrive VD:1, then issue the following command:

# MegaCli64 /c0/v0 set bootdrive=on
Example output:

# MegaCli64 /c0/v0 set bootdrive=on
Detailed Status :
-----------------------------------------

Verify:

# MegaCli64 /c0 show bootdrive
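For BDA V4.3 and higher, the Step 6 decision above can be scripted. The following is a minimal sketch, not an official tool: it assumes MegaCli64 is available and that the string 'BootDrive VD:0' appears in the 'show bootdrive' output exactly as in the example above. The simulated outputs below are illustrative placeholders.

```shell
#!/bin/sh
# Hedged sketch of the Step 6 decision (BDA V4.3 and higher).
# needs_step7 inspects captured 'MegaCli64 /c0 show bootdrive' output and
# reports whether Step 7 (setting VD:0 as the boot drive) is required.
needs_step7() {
    # $1 is the captured output of: MegaCli64 /c0 show bootdrive
    if printf '%s\n' "$1" | grep -q 'BootDrive VD:0'; then
        echo "no"    # VD:0 is already the boot drive: skip Step 7, go to Step 8
    else
        echo "yes"   # not set, or set to VD:1: perform Step 7
    fi
}

# In production the output would come from the real command, e.g.:
#   out=$(MegaCli64 /c0 show bootdrive)
# Simulated fragments are used here for illustration only.
out_ok="BootDrive VD:0"
out_bad="BootDrive VD:1"
echo "$(needs_step7 "$out_ok") $(needs_step7 "$out_bad")"   # prints "no yes"
```

If the check prints "yes", run 'MegaCli64 /c0/v0 set bootdrive=on' as shown in Step 7 and re-verify.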
On BDA V4.2 and lower:

If the 'MegaCli64 -AdpBootDrive -get a0' command does not report logical drive L0, i.e. "Boot Virtual Drive - #0 (target id - 0)", then issue the following command:

# MegaCli64 AdpBootDrive set L0 a0
Example output:

# MegaCli64 AdpBootDrive set L0 a0
Boot Virtual Drive is set to #0 (target id #0) on Adapter 0
Exit Code: 0x00

Verify:

# MegaCli64 -AdpBootDrive -get a0
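The equivalent check for BDA V4.2 and lower can be sketched the same way. This is an illustrative sketch only; it assumes the acceptable output contains the exact string "Boot Virtual Drive - #0 (target id - 0)" shown in the example above, and the sample strings below are simulated.

```shell
#!/bin/sh
# Hedged sketch of the Step 6 decision (BDA V4.2 and lower).
# boot_drive_ok inspects captured 'MegaCli64 -AdpBootDrive -get a0' output;
# per the document, any output other than the L0 line means Step 7 is needed.
boot_drive_ok() {
    # $1 is the captured output of: MegaCli64 -AdpBootDrive -get a0
    case "$1" in
        *'Boot Virtual Drive - #0 (target id - 0)'*) echo "ok" ;;
        *) echo "run-step-7" ;;
    esac
}

# Simulated outputs for illustration; in production capture the real command:
#   out=$(MegaCli64 -AdpBootDrive -get a0)
good="Adapter 0: Boot Virtual Drive - #0 (target id - 0)."
bad="Adapter 0: Boot Virtual Drive - #1 (target id - 1)."
echo "$(boot_drive_ok "$good") $(boot_drive_ok "$bad")"   # prints "ok run-step-7"
```

If the check prints "run-step-7", issue 'MegaCli64 AdpBootDrive set L0 a0' as shown in Step 7 and re-verify with the -get command.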
8. Ensure the auto select boot drive feature is enabled.

# MegaCli64 adpBIOS EnblAutoSelectBootLd a0
Example output showing that Auto Select Boot is already enabled:

# MegaCli64 adpBIOS EnblAutoSelectBootLd a0
Auto select Boot is already Enabled on Adapter 0.
Exit Code: 0x00

9. Check the configuration. See the section titled "Verifying the Disk Configuration."

Verifying the Disk Configuration

To verify the disk configuration:

1. Run bdachecksw:

# bdachecksw
Example successful output from running bdachecksw:

# bdachecksw
SUCCESS: Correct OS disk s0 partition info : 1 ext3 raid 2 ext3 raid 3 linux-swap 4 ext3 primary
SUCCESS: Correct OS disk s1 partition info : 1 ext3 raid 2 ext3 raid 3 linux-swap 4 ext3 primary
SUCCESS: Correct data disk s2 partition info : 1 ext3 primary
SUCCESS: Correct data disk s3 partition info : 1 ext3 primary
SUCCESS: Correct data disk s4 partition info : 1 ext3 primary
SUCCESS: Correct data disk s5 partition info : 1 ext3 primary
SUCCESS: Correct data disk s6 partition info : 1 ext3 primary
SUCCESS: Correct data disk s7 partition info : 1 ext3 primary
SUCCESS: Correct data disk s8 partition info : 1 ext3 primary
SUCCESS: Correct data disk s9 partition info : 1 ext3 primary
SUCCESS: Correct data disk s10 partition info : 1 ext3 primary
SUCCESS: Correct data disk s11 partition info : 1 ext3 primary
SUCCESS: Correct software RAID info : /dev/md2 level=raid1 num-devices=2 /dev/md0 level=raid1 num-devices=2
SUCCESS: Correct mounted partitions : /dev/md0 /boot ext3 /dev/md2 / ext3 /dev/sd4 /u01 ext4 /dev/sd4 /u02 ext4 /dev/sd1 /u03 ext4 /dev/sd1 /u04 ext4 /dev/sd1 /u05 ext4 /dev/sd1 /u06 ext4 /dev/sd1 /u07 ext4 /dev/sd1 /u08 ext4 /dev/sd1 /u09 ext4 /dev/sd1 /u10 ext4 /dev/sd1 /u11 ext4 /dev/sd1 /u12 ext4
SUCCESS: Correct matching label and slot : symbolic link to `../../sda4'
SUCCESS: Correct matching label and slot : symbolic link to `../../sdb4'
SUCCESS: Correct matching label and slot : symbolic link to `../../sdc1'
SUCCESS: Correct matching label and slot : symbolic link to `../../sdd1'
SUCCESS: Correct matching label and slot : symbolic link to `../../sde1'
SUCCESS: Correct matching label and slot : symbolic link to `../../sdf1'
SUCCESS: Correct matching label and slot : symbolic link to `../../sdg1'
SUCCESS: Correct matching label and slot : symbolic link to `../../sdh1'
SUCCESS: Correct matching label and slot : symbolic link to `../../sdi1'
SUCCESS: Correct matching label and slot : symbolic link to `../../sdj1'
SUCCESS: Correct matching label and slot : symbolic link to `../../sdk1'
SUCCESS: Correct matching label and slot : symbolic link to `../../sdl1'
SUCCESS: Correct swap partition on /dev/disk/by-hba-slot/s0p3 : SWAP
SUCCESS: Correct swap partition on /dev/disk/by-hba-slot/s1p3 : SWAP
SUCCESS: Correct internal USB device (sdm) : 1
SUCCESS: Correct internal USB partitions : 1 primary ext3
SUCCESS: Correct internal USB ext3 partition check : clean
SUCCESS: Correct Linux kernel version : Linux 2.6.32-200.21.1.el5uek
SUCCESS: Correct Java Virtual Machine version : HotSpot(TM) 64-Bit Server 1.6.0_51
SUCCESS: Correct puppet version : 2.6.11
SUCCESS: Correct MySQL version : 5.5.17
SUCCESS: All required programs are accessible in $PATH
SUCCESS: All required RPMs are installed and valid
SUCCESS: Correct bda-monitor status : bda monitor is running
SUCCESS: Big Data Appliance software validation checks succeeded

2. If there are errors, then redo the configuration steps as necessary to correct the problem.

If an error like the one below occurs, that is, the replaced disk partition is listed at the end and all partitions are recognized, then the error can be ignored; it is caused by Bug 17899101 in the bdachecksw script.

ERROR: Wrong mounted partitions : /dev/md0 /boot ext3 /dev/md2 / ext3 /dev/sd4 /u01 ext4 /dev/sd1 /u03 ext4 /dev/sd1 /u04 ext4 /dev/sd1 /u05 ext4 /dev/sd1 /u06 ext4 /dev/sd1 /u07 ext4 /dev/sd1 /u08 ext4 /dev/sd1 /u09 ext4 /dev/sd1 /u10 ext4 /dev/sd1 /u11 ext4 /dev/sd1 /u12 ext4 /dev/sd4 /u02 ext4
INFO: Expected mounted partitions : 12 data partitions, /boot and /

Bug 17899101 is fixed in the V2.4 release of BDA. Patch 17924936 contains a one-off patch for Bug 17899101 for the V2.3.1 release of BDA, and Patch 17924887 contains a one-off patch for Bug 17899101 for the V2.2.1 release of BDA. Refer to the patch Readme file for instructions on how to apply the patch; the Readme file also contains uninstall instructions as needed.

What If Firmware Warnings or Errors Occur?

If the bdacheckhw utility reports errors or warnings regarding the HDD (Hard Disk Drive) Firmware Information, indicating that the HDD firmware needs to be updated, follow the instructions in "Firmware Usage and Upgrade Information for BDA Software Managed Components on Oracle Big Data Appliance V2 (Doc ID 1542871.1)".

What If a Server Fails to Restart?

The server may restart during the disk replacement procedures, either because you issued a reboot command or because an error occurred in a MegaCli64 command. In most cases, the server restarts successfully and you can continue working. In other cases, however, an error prevents you from reconnecting using ssh. In this case, you must complete the reboot using Oracle ILOM.

Note: Your browser must have a JDK plug-in installed. If you do not see the Java coffee cup on the log-in page, then you must install the plug-in before continuing.
2. Log in using your Oracle ILOM credentials.

For more information, see the Oracle Integrated Lights Out Manager (ILOM) 3.0 documentation at http://docs.oracle.com/cd/E19860-01/
References

NOTE:1542871.1 - Firmware Usage and Upgrade Information for BDA Software Managed Components on Oracle Big Data Appliance
NOTE:1581331.1 - Steps for Replacing a Disk Drive and Determining its Function on the Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x
NOTE:1581373.1 - How to Configure a Server Disk After Disk Replacement as an Operating System Disk for /u02 and /dev/sdb on Oracle Big Data Appliance V2.2.*/V2.3.1/V2.4.0/V2.5.0/V3.x/V4.x

Attachments

This solution has no attachment.