Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction (Sure Solution)

Solution 1967510.1: How to Replace an Exadata X5-2, X4-8, or later Compute (Database) Node HDD (Predictive or Hard Failure)
In this Document
Applies to:
Exadata X5-2 Half Rack - Version All Versions and later
Exadata X5-2 Quarter Rack - Version All Versions and later
Exadata X5-2 Eighth Rack - Version All Versions and later
Zero Data Loss Recovery Appliance X5 Hardware - Version All Versions and later
Exadata X5-2 Full Rack - Version All Versions and later
Information in this document applies to any platform.

Goal

How to Replace an Exadata X5-2, X4-8, or later Compute (Database) Node HDD (Predictive or Hard Failure).

Solution

DISPATCH INSTRUCTIONS

The following information will be required prior to dispatch of a replacement:

WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED?:
Linux megaraid familiarity
Total time may depend on disk re-sync time.
CRU-optional; default is FRU with Task Complexity: 2
Compute node RAID configurations by model:

- X5-2: 4 disk RAID5
- X6-2: 4 disk RAID5, with option for 8 disk RAID5
- X7-2: 4 disk RAID5, with option for 8 disk RAID5
- X4-8: 7 disk RAID5
- X5-8/X6-8: 8 disk RAID5
1. Back up the volume and be familiar with the bare metal restore procedure before replacing the disk. See Note 1084360.1 for details.
Check the current controller cache policy:

# /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep -i "cache policy"
Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy : Disabled
#
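As a quick sanity check, the "Current Cache Policy" line can be parsed in a script to flag a WriteThrough fallback, which often indicates a BBU/supercap problem. This is a minimal sketch, not part of the official procedure; it reuses the sample output above in place of a live MegaCli64 call:

```shell
# Hedged sketch: parse the "Current Cache Policy" line to flag a
# WriteThrough fallback. The sample text stands in for live output of:
#   /opt/MegaRAID/MegaCli/MegaCli64 -ldpdinfo -a0 | grep -i "cache policy"
sample='Default Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
Disk Cache Policy : Disabled'

current=$(printf '%s\n' "$sample" | awk -F': ' '/^Current Cache Policy/ {print $2; exit}')
case "$current" in
  WriteThrough*) msg="WARNING: controller cache has dropped to WriteThrough" ;;
  WriteBack*)    msg="Cache policy OK (WriteBack)" ;;
  *)             msg="Unexpected cache policy: $current" ;;
esac
echo "$msg"
```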
2. Identify the failing disk.

a. Identify the enclosure device ID:

# /opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a0 | grep ID
Device ID : 252
#
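For scripted use, the enclosure ID can be captured into a variable for the later -physdrv[E#:S#] commands. A minimal sketch, using the sample line above in place of a live MegaCli64 call:

```shell
# Hedged sketch: capture the enclosure ID (E#) for later
# -physdrv[E#:S#] commands. The sample line stands in for live
# "MegaCli64 -encinfo -a0 | grep ID" output.
sample='Device ID : 252'
ENC=$(printf '%s\n' "$sample" | awk -F: '/Device ID/ {gsub(/ /,"",$2); print $2; exit}')
echo "Enclosure ID: $ENC"
```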
b. Identify the slot number of the failing disk:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|firmware"

or, for a predictive failure, include the predictive failure counters:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|predictive|firmware"
Slot Number: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 1
Predictive Failure Count: 290
Last Predictive Failure Event Seq Number: 121022
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 2
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 3
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B70

In this example, the disk in Slot 1 is reporting predictive failures.
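To make the failing disk stand out in a long pdlist, the output can be filtered to only the slots with a non-zero Predictive Failure Count. A minimal sketch, using the sample output above in place of a live MegaCli64 call:

```shell
# Hedged sketch: print only slot numbers whose Predictive Failure
# Count is non-zero. The sample text stands in for live output of:
#   /opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|predictive"
sample='Slot Number: 0
Predictive Failure Count: 0
Slot Number: 1
Predictive Failure Count: 290
Slot Number: 2
Predictive Failure Count: 0
Slot Number: 3
Predictive Failure Count: 0'

failing=$(printf '%s\n' "$sample" | awk '
  /^Slot Number/                           { slot = $NF }
  /^Predictive Failure Count/ && $NF > 0   { print slot }')
echo "Slots with predictive failures: $failing"
```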
c. Turn on the locate LED of the disk to be replaced:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[E#:S#] -a0

where E# is the enclosure ID number identified in step a, and S# is the slot number of the disk identified in step b. In the example above, the command would be:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -start -physdrv[252:1] -a0
3. Verify the state of the RAID volume is Optimal or Degraded, with the good disk(s) Online, before hot-swap removing the failed disk:

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
Virtual Drive: 0 (Target Id: 0)
State : Degraded
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 1
Firmware state: Failed
Foreign State: None
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
#
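Before pulling the disk, it can help to summarize that output as "one failed member, everything else Online". A minimal sketch, using the sample LdPdInfo output above in place of a live MegaCli64 call:

```shell
# Hedged sketch: summarize virtual drive state and count Online vs
# Failed members. The sample text stands in for live output of:
#   /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
sample='Virtual Drive: 0 (Target Id: 0)
State : Degraded
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Failed
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up'

vd_state=$(printf '%s\n' "$sample" | awk -F': ' '/^State/ {print $2; exit}')
online=$(printf '%s\n' "$sample" | grep -c '^Firmware state: Online')
failed=$(printf '%s\n' "$sample" | grep -c '^Firmware state: Failed')
echo "Virtual drive state: $vd_state (online=$online, failed=$failed)"
```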
4. On the drive you plan to remove, push the storage drive release button to open the latch.

5. Grasp the latch and pull the drive out of the drive slot. (Caution: The latch is not an ejector. Do not bend it too far to the right. Doing so can damage the latch.)

6. Wait three minutes for the system to acknowledge the disk has been removed.

7. Slide the new drive into the drive slot until it is fully seated.

8. Close the latch to lock the drive in place.

9. Verify the "OK/Activity" green LED begins to flicker as the system recognizes the new drive. The other two LEDs for the drive should no longer be illuminated.

Turn off the locate LED:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[E#:S#] -a0

In the example above:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdLocate -stop -physdrv[252:1] -a0
WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

1. Verify the disk is brought online into a volume by LSI MegaRAID. Until the disk is added into a volume, the OS will not be able to use the disk.
# /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -physdrv[E#:S#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk replaced. In the example above:

# /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -physdrv[252:1] -a0
Adapter #0
Enclosure Device ID: 252
Slot Number: 1
Device Id: 10
Sequence Number: 7
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SAS
Raw Size: 136.727 GB [0x11174b81 Sectors]
Non Coerced Size: 136.227 GB [0x11074b81 Sectors]
Coerced Size: 136.218 GB [0x11070000 Sectors]
Firmware state: Online, Spun Up
SAS Address(0): 0x5000cca00a1b817d
SAS Address(1): 0x0
Connected Port Number: 2(path0)
Inquiry Data: HITACHI H103014SCSUN146GA1600934FH3Y8E
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified

2. Verify the replacement disk has been added to the expected RAID volume:

# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -iE "target|state|slot"
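The key health fields in that PdInfo output can also be checked in a script: a healthy replacement should report a Firmware state of "Online, Spun Up" and zero error counters. A minimal sketch, using an abbreviated version of the sample output above in place of a live MegaCli64 call:

```shell
# Hedged sketch: confirm the replaced disk reports a healthy state and
# zero error counters. The sample text stands in for live output of:
#   /opt/MegaRAID/MegaCli/MegaCli64 -PdInfo -physdrv[252:1] -a0
sample='Enclosure Device ID: 252
Slot Number: 1
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Firmware state: Online, Spun Up'

state=$(printf '%s\n' "$sample" | awk -F': ' '/^Firmware state/ {print $2; exit}')
errors=$(printf '%s\n' "$sample" | awk -F': ' '/Error Count|Predictive Failure Count/ {sum += $2} END {print sum+0}')
echo "Firmware state: $state; total error count: $errors"
```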
If it has already completed the copyback when checked, then it may already be in "Online" state. If it is in Rebuild or Copyback state, you can use the following to monitor progress:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv [E#:S#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk in Rebuild state. In the example above:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv [252:1] -a0
Rebuild Progress on Device at Enclosure 252, Slot 1 Completed 9% in 3 Minutes.
Exit Code: 0x00
#

or

# /opt/MegaRAID/MegaCli/MegaCli64 -pdcpybk -showprog -physdrv [E#:S#] -a0

where E# is the enclosure ID number identified in step 2a of the replacement steps, and S# is the slot number of the disk in Copyback state. In the example above:

# /opt/MegaRAID/MegaCli/MegaCli64 -pdcpybk -showprog -physdrv [252:0] -a0
Copyback Progress on Device at Enclosure 252, Slot 0 Completed 79% in 29 Minutes.
Exit Code: 0x00
#

3. Optionally update the disk firmware as needed, following the procedure in Note 2088888.1.

PARTS NOTE:
Refer to the Exadata Database Maintenance Guide Appendix B or the Oracle System Handbook (https://mosemp.us.oracle.com/handbook_internal/index.html) for part information.
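The -showprog progress lines shown earlier can be parsed so a script could poll until the rebuild or copyback finishes. A minimal sketch, using a sample progress line from this note in place of a live MegaCli64 call:

```shell
# Hedged sketch: extract the completion percentage from a -showprog
# line. The sample line stands in for live output of:
#   /opt/MegaRAID/MegaCli/MegaCli64 -pdrbld -showprog -physdrv [252:1] -a0
line='Rebuild Progress on Device at Enclosure 252, Slot 1 Completed 9% in 3 Minutes.'
pct=$(printf '%s\n' "$line" | sed -n 's/.*Completed \([0-9][0-9]*\)%.*/\1/p')
echo "Rebuild ${pct}% complete"
# A real poll (not run here) would re-run the MegaCli command and sleep
# in a loop until pct reaches 100 or the drive state flips to Online.
```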
References

<NOTE:1113014.1> - HALRT-02008: Database node hard disk predictive failure
<NOTE:2088888.1> - How to Update Disk Drive Firmware on Exadata and Recovery Appliance Compute Nodes
<NOTE:2010838.1> - INTERNAL Exadata Database Machine Hardware Current Product Issues - DB Nodes (X5 and Later)
<NOTE:1479736.1> - How to Replace an Exadata Compute (Database) node hard disk drive (Predictive or Hard Failure) (X4-2 and earlier)
<NOTE:1360360.1> - INTERNAL Exadata Database Machine Hardware Troubleshooting
<NOTE:1452325.1> - Determining when Disks should be replaced on Oracle Exadata Database Machine
<NOTE:1084360.1> - Bare Metal Restore Procedure for Compute Nodes on an Exadata Environment
<NOTE:1113034.1> - HALRT-02007: Database node hard disk failure

Attachments

This solution has no attachment