Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction (Sure Solution)

Solution 1479736.1: How to Replace an Exadata Compute (Database) node hard disk drive (Predictive or Hard Failure) (X4-2 and earlier)
Canned Action Plan procedure to replace an Exadata Compute (Database) node hard disk drive (Predictive or Hard Failure). This covers Exadata disk alerts HALRT-02007 and HALRT-02008.

Applies to:
Exadata Database Machine V2 - Version All Versions and later
Exadata Database Machine X2-8 - Version All Versions and later
Exadata X3-2 Hardware - Version All Versions and later
Exadata X3-8b Hardware - Version All Versions and later
Exadata X4-2 Hardware - Version All Versions and later
Linux x86-64
Oracle Solaris on x86-64 (64-bit)
Information in this document applies to any platform.

Goal

Identify and replace a failed hard disk drive in an Exadata Compute (Database) node for hard or predictive failures (X4-2 and earlier).

Solution

DISPATCH INSTRUCTIONS:

The customer may choose to do the replacement themselves. In this case, the disk should be sent out using a parts-only dispatch.
WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?:

Linux MegaRAID familiarity

TIME ESTIMATE: 60 minutes. Total time may depend on disk re-sync time.

TASK COMPLEXITY: 0 (CRU-optional); default is FRU with Task Complexity: 2

FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS:

A critical hard disk failure may be marked "critical" or "failed" depending on the release version, and is referred to as a critical hard failure in this document.

For a critical hard failure, the failed hard disk should have the blue "OK to Remove" LED illuminated or flashing and the amber "Service Action Required" LED illuminated or flashing. This may trigger alarm HALRT-02007 - refer to Note 1113034.1.

For a predictive failure, the failed hard disk should have the amber "Service Action Required" LED illuminated or flashing. On certain image revisions, a predictively failed disk may not yet have been removed from the volume and may not have a fault LED lit. This may trigger alarm HALRT-02008 - refer to Note 1113014.1.

The normal DB node volume arrangement depends on the OS installed and the current active image version. Use "/opt/oracle.cellos/imageinfo" to determine the current active image version, and "uname -s" to determine the OS type (an illustrative check follows the list below). The volumes expected are as follows:

V2/X2-2/X3-2/X4-2 Linux only, if dual-boot Solaris image partition has been reclaimed or was not present:
X2-2/X3-2/X4-2 Linux and Solaris dual-boot, if other OS image partitions have not been reclaimed:
X2-2/X3-2/X4-2 Solaris only, if dual-boot Linux image partition has been reclaimed:
X2-8 Linux only, if dual-boot Solaris image partition has been reclaimed:
X2-8 Linux and Solaris dual-boot, if other OS image partitions have not been reclaimed:
X3-8 Linux only:
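To determine which of the above arrangements applies, check the active image version and the OS type with the two commands named above. The output shown here is illustrative only and will differ per system:

# /opt/oracle.cellos/imageinfo | grep -i "image version"
Image version: 11.2.3.2.1.130109
# uname -s
Linux
#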
WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE?:

1. Back up the volume and be familiar with the bare metal restore procedure before replacing the disk. See Note 1084360.1 for details.

If the DB node was running 11.2.2.1.1 or 11.2.2.2.x images and was in write-through caching mode at some stage (the default is write-back), there is a possibility that the Linux file system is corrupt due to a disk controller firmware bug. When this is encountered, the file system may have been operating normally, but it will go read-only when the controller attempts to rebuild the corrupted blocks onto the hotspare disk. This may be unavoidable, as the rebuild copy-back from hotspare to replacement occurs automatically. A bare metal restore is required to correct it.

To check the current cache policy, use the following command. The current policy should be WriteBack, not WriteThrough. The Linux example below shows the undesired state; the Solaris example shows the desired state.

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep -i "cache policy"
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy : Disabled
#

Solaris:

# /opt/MegaRAID/MegaCli -ldpdinfo -a0 | grep -i "cache policy"
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy : Disabled
#
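Where this check must be repeated across many nodes, a minimal scripted form such as the following can flag the undesired state. This is a sketch only, assuming the Linux MegaCli64 path (substitute /opt/MegaRAID/MegaCli on Solaris):

# Sketch: warn if any logical drive is currently running in WriteThrough mode.
if /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 \
   | grep "^Current Cache Policy" | grep -q "WriteThrough"; then
    echo "WARNING: WriteThrough cache policy detected; see step 1 above" >&2
fi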
2. Identify the disk using the amber fault and blue OK-to-Remove LED states. The DB node's position within the rack can usually be determined from the hostname and the default Exadata server numbering scheme, which counts server numbers up from 1 at the lowest DB node in the rack. The server's white Locate LED may be flashing as well.

a. Obtain the enclosure ID for the MegaRAID card:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a0 | grep ID
Device ID : 252
#

Solaris:

# /opt/MegaRAID/MegaCli -encinfo -a0 | grep ID
Device ID : 252
#

b. Identify the physical disk slot that has failed:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|firmware"
Slot Number: 0
Firmware state: Unconfigured(bad)
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Rebuild
#

Solaris:

# /opt/MegaRAID/MegaCli -pdlist -a0 | egrep -i "slot|firmware"
"Unconfigured(bad)" is the expected state for the faulted disk. In this example, it is located in physical slot 0, and it can be seen that the Hotspare in slot 3 has started rebuilding the volume. If all disks show as Online or Hotspare, then the disk may be in predictive failure state but not yet gone offline. The failed disk can be identified using this additional information: Linux: # /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|predictive|firmware"
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Predictive Failure Count: 12
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Slot Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Slot Number: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Hotspare, Spun down
#

Solaris:

# /opt/MegaRAID/MegaCli -pdlist -a0 | egrep -i "slot|predictive|firmware"
In this example, the disk in slot 1 has reported itself as predictive failed several times but is still online. This disk should be considered the bad one. For more details refer to Note 1452325.1.

c. Use the locate function, which turns on the flashing amber "Service Action Required" LED:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[E#:S#] -a0
Solaris:

# /opt/MegaRAID/MegaCli -PdLocate -start -physdrv[E#:S#] -a0
where E# is the enclosure ID number identified in step a, and S# is the slot number of the disk identified in step b. In the example above, the command would be:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[252:0] -a0
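Where scripting is preferred, the enclosure ID and the failed slot can be derived and the locate LED started in one pass. This is a sketch only, assuming the Linux MegaCli64 path and exactly one disk in Unconfigured(bad) state:

# Sketch: derive enclosure ID and the first Unconfigured(bad) slot, then start locate.
ENC=$(/opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a0 | awk '/Device ID/ {print $NF; exit}')
SLOT=$(/opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 \
       | awk '/Slot Number/ {s=$NF} /Unconfigured\(bad\)/ {print s; exit}')
/opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start "-physdrv[${ENC}:${SLOT}]" -a0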
3. Before hot-swap removing the failed disk, verify that the RAID state is Optimal or Rebuilding if there is a hotspare, or Degraded if there is not, with the good disk(s) online. If the failed disk was the global hotspare, skip this step.

Linux (RAID5 example):

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Degraded
Slot Number: 3
Firmware state: Rebuild
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
#

Linux (RAID1 example):

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Degraded
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Unconfigured(bad)
Foreign State: None
Virtual Drive: 1 (Target Id: 1)
State : Optimal
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 2 (Target Id: 2)
State : Optimal
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
#
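As a quick cross-check on Linux, the virtual drive states alone can be summarized. This is a sketch only, assuming the Linux MegaCli64 path; the filtered lines mirror the examples above:

# Sketch: list only the virtual drives and their states.
/opt/MegaRAID/MegaCli/MegaCli64 -LdInfo -Lall -a0 | grep -E "Virtual Drive|^State"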
Solaris:

The volume type on Solaris is RAID0, so the failure may cause the virtual drive to no longer be visible. In that case, check that the expected number of good drives are present and online (3 of the 4 in X2-2, or 6 of the 8 in X2-8; the hotspare does not show in this command), and verify that the zpool status is degraded with one side of the mirror online:
# /opt/MegaRAID/MegaCli -LdPdInfo -a0 | egrep -i "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Optimal
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 1 (Target Id: 1)
State : Optimal
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
#

# zpool status
  pool: rpool
 state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 9.87G in 0h1m with 0 errors on Tue Jul 10 16:35:50 2012
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c3t1d0s0  ONLINE       0     0     0
            c3t2d0s0  REMOVED      0     0     0

errors: No known data errors
#

4. On the drive you plan to remove, push the storage drive release button to open the latch.

5. Grasp the latch and pull the drive out of the drive slot. (Caution: The latch is not an ejector. Do not bend it too far to the right; doing so can damage it. Also, whenever you remove a storage drive, replace it with another storage drive or a filler panel; otherwise the server might overheat due to improper airflow.)

6. Wait three minutes for the system to acknowledge that the disk has been removed.

7. Slide the new drive into the drive slot until it is fully seated.

8. Close the latch to lock the drive in place.

9. Verify that the green "OK/Activity" LED begins to flicker as the system recognizes the new drive. The other two LEDs for the drive should no longer be illuminated. The server's Locate LED and the disk's blinking service LED should turn off automatically. If they do not, they can be turned off manually for the device using:

Linux:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[E#:S#] -a0
Solaris:

# /opt/MegaRAID/MegaCli -PdLocate -stop -physdrv[E#:S#] -a0
where E# is the enclosure ID number identified in step 2a, and S# is the slot number of the disk identified in step 2b. In the example above, the command would be:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[252:0] -a0
OBTAIN CUSTOMER ACCEPTANCE

1. Check the state of the replaced disk.

If the OS is Linux, depending on the volume arrangement and image version, the disk may automatically become the new hotspare disk, or it may stay in an Unconfigured(good) state until the hotspare rebuild has completed. If it stays Unconfigured, the hotspare will copy back to rebuild onto the new disk after the rebuild has completed. If it is a RAID1 volume, the disk should automatically come into the volume and start rebuilding. If the OS is Solaris, the volume is a Solaris RAID0 volume, and the disk may not come into a volume automatically; it will remain in state Unconfigured(good) until it is in a volume.

# /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -physdrv[E#:S#] -a0
where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk replaced. In the example above, the command and output would be:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -physdrv[252:0] -a0
Adapter #0
Enclosure Device ID: 252
Slot Number: 0
Device Id: 10
Sequence Number: 7
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 136.727 GB [0x11174b81 Sectors]
Non Coerced Size: 136.227 GB [0x11074b81 Sectors]
Coerced Size: 136.218 GB [0x11070000 Sectors]
Firmware state: Unconfigured(good), Spun Up
SAS Address(0): 0x5000cca00a1b817d
SAS Address(1): 0x0
Connected Port Number: 2(path0)
Inquiry Data: HITACHI H103014SCSUN146GA1600934FH3Y8E
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
#
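On later re-checks, the same command can be filtered to show just the state field. This is illustrative only, using the example enclosure and slot from above:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -physdrv[252:0] -a0 | grep "Firmware state"
Firmware state: Unconfigured(good), Spun Up
#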
2. Verify the replacement disk has been added to the expected RAID volume.

If the OS is Linux and the failed disk was originally the global hotspare, the replacement should have become the hotspare automatically, as identified in step 1, and this step should be skipped. If that did not occur automatically, the new disk can be assigned as the hotspare with the following command:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdHsp -set -EnclAffinity -PhysDrv[E#:S#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk replaced.

If the OS is Linux and the failed disk was part of a RAID volume, use the following MegaRAID command to verify the status of the RAID:

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
If the copyback has already completed when checked, the disk may already be in "Online" state. If it is in Rebuild or Copyback state, you can use the following to verify progress to completion:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv[E#:S#] -a0
where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk in Rebuild state. This is typically the original hotspare disk slot.

# /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv[252:3] -a0
Rebuild Progress on Device at Enclosure 252, Slot 3 Completed 9% in 3 Minutes.

Exit Code: 0x00
#

or

# /opt/MegaRAID/MegaCli/MegaCli64 -pdcpybk -showprog -physdrv[E#:S#] -a0
where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk in Copyback state. This is typically the replaced disk slot.

# /opt/MegaRAID/MegaCli/MegaCli64 -pdcpybk -showprog -physdrv[252:0] -a0
Copyback Progress on Device at Enclosure 252, Slot 0 Completed 79% in 29 Minutes.

Exit Code: 0x00
#
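If an unattended wait for the copyback is preferred, a minimal polling sketch such as the following can be used. This is a sketch only, assuming the Linux path and the example device above; the progress message format can vary between MegaCli versions, so treat the grep pattern as an assumption:

# Sketch: poll copyback progress once a minute until no progress line is reported.
while /opt/MegaRAID/MegaCli/MegaCli64 -pdcpybk -showprog "-physdrv[252:0]" -a0 \
      | grep -q "Completed"; do
    sleep 60
done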
If the OS is Solaris, the RAID0 MegaRAID volume may need to be recreated if that was not done automatically. In this example, the rpool mirror disk in slot 3 had failed:

# /opt/MegaRAID/MegaCli -cfgldadd -r0[252:3] wb nora direct nocachedbadbbu -strpsz1024 -a0

Adapter 0: Created VD 2
Adapter 0: Configured the Adapter!!

Exit Code: 0x00
#

Use format to partition the disk with a full-disk Solaris label, a single-cylinder boot block on slice 8, and the rest of the disk as the root partition on slice 0.

# format -e
Searching for disks...done

c3t0d0: configured with capacity of 275.53GB

AVAILABLE DISK SELECTIONS:
       0. c3t0d0 <LSI-MR9261-8i-2.12 cyl 281 alt 2 hd 255 sec 8064>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@0,0
       1. c3t1d0 <LSI-MR9261-8i-2.12 cyl 36348 alt 2 hd 255 sec 63> ai-disk
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@1,0
       2. c3t2d0 <LSI-MR9261-8i-2.12 cyl 36348 alt 2 hd 255 sec 63>
          /pci@0,0/pci8086,340a@3/pci1000,9263@0/sd@2,0
Specify disk (enter its number): 2
selecting c3t2d0
[disk formatted]
No Solaris fdisk partition found.

FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        fdisk      - run the fdisk program
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show disk ID
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> fdisk
No fdisk table exists. The default partition for the disk is:

  a 100% "SOLARIS System" partition

Type "y" to accept the default partition, otherwise type "n" to edit the
partition table.
Y
format> ver
Warning: Primary label on disk appears to be different from current label.
Warning: Check the current partitioning and 'label' the disk or use the
'backup' command.

Primary label contents:

Volume name = <        >
ascii name  = <DEFAULT cyl 36348 alt 2 hd 255 sec 63>
pcyl        = 36350
ncyl        = 36348
acyl        =     2
bcyl        =     0
nhead       =   255
nsect       =    63
Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       1 - 36347      278.43GB    (36347/0/0) 583914555
  1 unassigned    wu       0                0         (0/0/0)             0
  2     backup    wu       0 - 36349      278.46GB    (36350/0/0) 583962750
  3 unassigned    wu       0                0         (0/0/0)             0
  4 unassigned    wu       0                0         (0/0/0)             0
  5 unassigned    wu       0                0         (0/0/0)             0
  6 unassigned    wu       0                0         (0/0/0)             0
  7 unassigned    wu       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wu       0                0         (0/0/0)             0

format> label
Ready to label disk, continue? y
format> q
#

Re-attach the new disk to the zpool. Use the -f option if this is a mounted root pool:

# zpool attach -f rpool c3t1d0s0 c3t2d0s0
Make sure to wait until the resilver is done before rebooting.

If the attach fails on X2-2 nodes, it may be because part 540-7869 was substituted with 542-0388 or vice versa. These parts are not compatible and have different cylinder counts. Note 1370699.1 provides a workaround for that.

If this was one of the two boot disks in the root pool, then re-enable booting:

# installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c3t2d0s0
stage2 written to partition 0, 282 sectors starting at 50 (abs 16115)

Verify the status of the zpool rebuilding:

# zpool status
  pool: rpool
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Tue Jul 17 17:25:18 2012
    32.9G scanned out of 35.9G at 128M/s, 0h0m to go
    32.9G resilvered, 91.74% done
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c3t1d0s0  ONLINE       0     0     0
            c3t2d0s0  ONLINE       0     0     0  (resilvering)

errors: No known data errors
#
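If an unattended wait is preferred, a minimal polling sketch like the following blocks until ZFS stops reporting a resilver in progress. This is a sketch only; the status message text can vary between Solaris releases, so treat the grep pattern as an assumption:

# Sketch: block until no resilver is reported on rpool, checking once a minute.
while zpool status rpool | grep -q "resilver in progress"; do
    sleep 60
done
zpool status rpool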
PARTS NOTE:

Refer to the Exadata Database Machine Owner's Guide, Appendix C, for part information. To identify which Exadata disk FRU part number to order, based on image, vendor, and mixed disk support status, see Note 1416303.1.

References:

<NOTE:1113034.1> - HALRT-02007: Database node hard disk failure
<NOTE:1113014.1> - HALRT-02008: Database node hard disk predictive failure
<NOTE:1071220.1> - Oracle Sun Database Machine V2 Diagnosability and Troubleshooting Best Practices
<NOTE:1416303.1> - How to identify which Exadata disk FRU part number to order based on disk model and server model
<NOTE:1360360.1> - INTERNAL Exadata Database Machine Hardware Troubleshooting
<NOTE:1501450.1> - INTERNAL Exadata Database Machine Hardware Current Product Issues (X3-2, X4-2, X3-8, X4-8 w/X4-2L)
<NOTE:1452325.1> - Determining when Disks should be replaced on Oracle Exadata Database Machine
<NOTE:1274324.1> - Oracle Sun Database Machine X2-2/X2-8, X3-2/X3-8 and X4-2 Diagnosability and Troubleshooting Best Practices
<NOTE:1360343.1> - INTERNAL Exadata Database Machine Hardware Current Product Issues (V2, X2-2, X2-8)
<NOTE:1084360.1> - Bare Metal Restore Procedure for Compute Nodes on an Exadata Environment
<NOTE:1370699.1> - Exadata zfs boot disk replacement fails
<NOTE:1967510.1> - How to Replace an Exadata X5-2/X4-8 or later Compute (Database) node hard disk drive (Predictive or Hard Failure)

Attachments

This solution has no attachment.