Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction (Sure Solution)

Document 1524329.1: Root Volume of Predictive Failure Boot Hard Drive in an Exadata Storage Server Remains in State of 'active'
Created from <SR 3-6341701201>

Applies to:
Exadata X3-8 Hardware - Version All Versions and later
Exadata X3-2 Full Rack - Version All Versions and later
Exadata X3-2 Half Rack - Version All Versions and later
Exadata Database Machine X2-2 Hardware - Version All Versions and later
SPARC SuperCluster T4-4 Full Rack - Version All Versions and later
Information in this document applies to any platform.

Goal

This document describes the steps to take when the root volume of a Predictive Failure boot hard drive in an Exadata storage (cell) server remains in a state of 'active'. It assumes that <Document 1390836.1> How to Replace a Hard Drive in an Exadata Storage Server (Predictive Failure) has been followed through step 5b.

Solution

<Document 1390836.1> How to Replace a Hard Drive in an Exadata Storage Server (Predictive Failure) explains that before pulling an OS disk that is in a state of Predictive Failure, the field engineer should verify that the root volume of the disk is in a 'clean' state. If the volume is 'active' and the disk is hot removed, the OS may crash, making recovery more difficult.

Normally, the state changes to 'clean' immediately after (or shortly after) the disk changes to Predictive Failure, and the disk replacement can proceed using the steps in <Document 1390836.1>. If this is not the case, and it is possible to do so, reboot the cell node using <Document 1188080.1> Steps to shut down or reboot an Exadata storage cell without affecting ASM, in an attempt to get the root volume status to change to 'clean'.

If a reboot does not change the root volume status to 'clean', follow the steps below before removing the physical device per <Document 1390836.1>.

The following command assumes the root volume is /dev/md5; its output (not reproduced here) shows that the status is still 'active':

[root@edx2cel03 ~]# mdadm -Q --detail /dev/md5

Use 'mdadm' to set the faulty disk's root volume to a faulty state and to remove the volume from the configuration.

IMPORTANT: Before faulting and removing the root volume of the failed disk, confirm again the slot of the failed boot disk using CellCLI, as was done in step #1 of <Document 1390836.1> How to Replace a Hard Drive in an Exadata Storage Server (Predictive Failure). Failure to do so may result in the wrong root volume being faulted, causing the running OS to crash. A boot disk in slot 0 will have a logical device name of '/dev/sda', while a boot disk in slot 1 will have a logical device name of '/dev/sdb'.
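For illustration only, the following is a hedged sketch of that confirmation: a representative (not verbatim) excerpt of 'mdadm -Q --detail' output for a root volume stuck in 'active', followed by an assumed CellCLI query for the failed disk's slot. The device names, slot numbers, and attribute values shown are examples, not output captured from the system above:

[root@edx2cel03 ~]# mdadm -Q --detail /dev/md5
/dev/md5:
     Raid Level : raid1
   Raid Devices : 2
          State : active                 <-- must be 'clean' before hot removal
 Active Devices : 2
 Failed Devices : 0

    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       1       8       21        1      active sync   /dev/sdb5

[root@edx2cel03 ~]# cellcli -e "list physicaldisk where status='predictive failure' attributes name, slotNumber, status"
         20:1    1      predictive failure

In this hypothetical example, the slotNumber of 1 indicates the boot disk in slot 1, i.e. /dev/sdb, so its root volume member would be /dev/sdb5.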
In the following example, we fault and remove the root volume for the disk in slot 1:
[root@edx2cel03 ~]# mdadm --set-faulty /dev/md5 /dev/sdb5
mdadm: set /dev/sdb5 faulty in /dev/md5
[root@edx2cel03 ~]# mdadm --remove /dev/md5 /dev/sdb5
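As a hedged illustration (representative output, not captured from a live system), the member table in the detail listing should then report the removed slot along the following lines:

[root@edx2cel03 ~]# mdadm -Q --detail /dev/md5
...
    Number   Major   Minor   RaidDevice State
       0       8        5        0      active sync   /dev/sda5
       1       0        0        1      removed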
After running the above commands, 'mdadm --detail /dev/md5' should show sdb5 in a state of 'removed'. The disk in slot 1 can then be removed and replaced per the steps for physical disk replacement in <Document 1390836.1> How to Replace a Hard Drive in an Exadata Storage Server (Predictive Failure).

After replacement, the mirror re-attach and resync should happen automatically, although it can take several minutes to start. Run 'mdadm --detail /dev/md5' again to confirm; a sketch for watching the resync follows below. Also note that the logical device name of the replaced disk may change (for example, from sdb to sdad). This is normal.
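A minimal sketch for watching the automatic re-attach and resync, assuming the standard md tools on the cell (the 2-second refresh interval is an arbitrary choice for this example):

# Poll the kernel's md status; a rebuilding mirror shows a progress line
# such as "[==>..........]  recovery = 12.4% (...) finish=3.1min".
[root@edx2cel03 ~]# watch -n 2 cat /proc/mdstat

# Once recovery completes, confirm the array state and the new member name,
# which may differ from the old one (e.g. sdad5 instead of sdb5).
[root@edx2cel03 ~]# mdadm -Q --detail /dev/md5 | grep -E 'State :|sync'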
References

<NOTE:1390836.1> - How to Replace a Hard Drive in an Exadata Storage Server (Predictive Failure)