Asset ID: 1-71-1985948.1
Update Date: 2017-07-19
Solution Type: Technical Instruction Sure Solution
1985948.1: How to Replace an Exalytics Storage Drive
Related Items
- Exalytics In-Memory Machine X2-4
- Exalytics In-Memory Machine X3-4
- Exalytics In-Memory Machine X4-4
- Exalytics In-Memory Machine X5-4
- Exalytics In-Memory Machine X6-4
Related Categories
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: internal support doc
Applies to:
Exalytics In-Memory Machine X2-4 - Version All Versions to All Versions [Release All Releases]
Exalytics In-Memory Machine X3-4 - Version All Versions to All Versions [Release All Releases]
Exalytics In-Memory Machine X4-4 - Version All Versions to All Versions [Release All Releases]
Exalytics In-Memory Machine X5-4 - Version All Versions to All Versions [Release All Releases]
Exalytics In-Memory Machine X6-4 - Version All Versions to All Versions [Release All Releases]
x86_64
Goal
How to Replace an Exalytics Storage Drive
Solution
CAP PROBLEM OVERVIEW: Storage Drive Replacement
DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE FIELD ENGINEER/ADMINISTRATOR NEED:
N/A
TIME ESTIMATE: 60 minutes
TASK COMPLEXITY: 1
FIELD ENGINEER/ADMINISTRATOR INSTRUCTIONS
WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY? :
The Exalytics server supports hot-plugging of the storage drives, so a complete power down of the server is not required for normal disk replacement procedures.
WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE:
1. Confirm the Storage Drive failure and its location.
- Confirm which Storage Drive is to be replaced. There are six disk drives in an Exalytics system, numbered 0 to 5 from bottom to top.
- For most hard drive failures the amber fault LED on the drive will be lit, allowing the drive to be identified.
- If the drive location is not known, check the drive status using the following MegaCli64 command, which lists the current state of every drive. In this example a failed drive is identified in slot 0:
[root@exalytics0 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep "Slot \| state"
Slot Number: 0
Firmware state: Unconfigured(bad)
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 5
Firmware state: Online, Spun Up
- If the drive fault LED is lit and its blue "OK to remove" LED is also lit, the drive may be hot-swapped; proceed to step 3.
- If the drive to be replaced has not failed and this is a proactive replacement (or the drive is being replaced for any other reason while it has not been marked as failed), proceed to step 2 to manually mark the drive for replacement.
- If the drive has failed but the blue "OK to remove" LED is not lit, use the following MegaCli64 command to prepare the drive for removal. In this example 252 is the enclosure number and 5 is the slot of the drive being marked for removal. For an Exalytics internal disk drive the enclosure number should be 252; adjust the slot number in the command to match the drive being removed.
[root@exalytics0 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDPrpRmv -PhysDrv [252:5] -a0
Prepare for removal Success
Exit Code: 0x00
- After the drive has been prepared for removal, go to step 3 to perform the physical replacement.
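When all six drives need to be checked at once, the PDList output from step 1 can be filtered so that only drives that are not in the "Online, Spun Up" state are reported. The following is a minimal sketch, not part of the official procedure. The function name non_online_slots is hypothetical, and the sketch assumes the output format matches the listing shown in step 1 exactly; other MegaCli64 versions may format these lines differently.

```shell
#!/bin/bash
# Hypothetical helper: reads "MegaCli64 -PDList -aALL" output on stdin and
# prints "<slot>: <state>" for every drive that is not "Online, Spun Up".
# Assumes lines of the form "Slot Number: N" and "Firmware state: ...".
non_online_slots() {
    awk -F': ' '
        /^Slot Number:/    { slot = $2 }
        /^Firmware state:/ { if ($2 != "Online, Spun Up") print slot ": " $2 }
    '
}

# Sample input taken from the step 1 listing (first two drives only).
sample='Slot Number: 0
Firmware state: Unconfigured(bad)
Slot Number: 1
Firmware state: Online, Spun Up'

printf '%s\n' "$sample" | non_online_slots
```

On a live system the function would be fed directly from the command, for example: /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | non_online_slots. A drive that is rebuilding is also reported, since its state is "Rebuild" rather than "Online, Spun Up".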
2. For disks not marked as failed, manually mark the drive for replacement.
- If the drive to be replaced is not in a failed state and is still in use by the system, it must be manually marked for replacement using MegaCli64. (For drives that have already failed and are ready for replacement, skip to step 3.)
- After confirming the slot location of the disk to be replaced, change the drive to offline status. The following command examples replace the disk in slot 1; adjust the commands for the proper drive slot.
[root@exalytics0 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -PhysDrv [252:1] -a0
Adapter: 0: EnclId-252 SlotId-1 state changed to OffLine.
Exit Code: 0x00
- Then mark the drive as missing:
[root@exalytics0 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDMarkMissing -PhysDrv [252:1] -a0
EnclId-252 SlotId-1 is marked Missing.
Exit Code: 0x00
- Finally, prepare the drive for removal:
[root@exalytics0 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDPrpRmv -PhysDrv [252:1] -a0
Prepare for removal Success
Exit Code: 0x00
- The disk is now ready for physical replacement.
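Because the three commands in step 2 differ only in the operation name, they can be generated for any slot and reviewed before being run by hand. The following is a minimal sketch under stated assumptions: the function name removal_commands is hypothetical, the slot range 0 to 5 comes from step 1, and the adapter defaults to 0 as in the examples above. The function only prints the commands; it does not execute them.

```shell
#!/bin/bash
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

# Hypothetical helper: print (do not run) the offline, mark-missing and
# prepare-for-removal commands for one drive, in the order used in step 2.
# $1 = enclosure number, $2 = slot number, $3 = adapter (defaults to 0).
removal_commands() {
    local encl=$1 slot=$2 adapter=${3:-0}
    local op
    # An Exalytics system has six drives, numbered 0 to 5.
    case $slot in
        [0-5]) ;;
        *) echo "slot must be between 0 and 5" >&2; return 1 ;;
    esac
    for op in -PDOffline -PDMarkMissing -PDPrpRmv; do
        # The bracketed argument is quoted so the shell never treats it
        # as a filename pattern.
        printf "%s %s -PhysDrv '[%s:%s]' -a%s\n" \
            "$MEGACLI" "$op" "$encl" "$slot" "$adapter"
    done
}

removal_commands 252 1
```

Printing rather than executing is a deliberate choice here: taking the wrong drive offline affects a live array, so each command should be checked against the confirmed slot number before it is run.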
3. Remove the Storage Drive
- On the drive you plan to remove, push the latch release button to open the latch. (Caution - The latch is not an ejector. Do not bend the latch too far to the right. Doing so can damage the latch.)
- Grasp the latch and pull the drive out of the drive slot.
- Place the drive aside on an antistatic mat.
4. Install the replacement Storage Drive
- Remove the replacement drive from its packaging and place the drive on an antistatic mat.
- Align the replacement drive to the drive slot. The drive is physically addressed according to the slot in which it is installed. It is important to install a replacement drive in the same slot as the drive that was removed.
- Slide the drive into the bay until the drive is fully seated.
- Close the drive latch to lock the drive in place.
5. Check the drive and confirm that the system is rebuilding the RAID volume. If it is not, use the manual commands to bring the drive back into use.
- For most replacements the SAS controller will automatically attach the disk to the array that contained the failed drive and start to rebuild the array. After physically installing the drive, wait at least 2 minutes and observe the drive status LEDs. If the controller automatically attaches the drive to the array and starts a rebuild, the fault LED turns off and the activity light on the drive starts to blink.
- After waiting two minutes, check the drive status to see whether it is being rebuilt automatically, using the same command as before. In this example the drive in slot 1 is in the Rebuild state:
[root@exalytics0 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | grep "Slot \| state"
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Rebuild
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 5
Firmware state: Online, Spun Up
- To check the progress of the rebuild and see how long it has taken so far, use the following command:
[root@exalytics0 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv [252:1] -aALL
Rebuild Progress on Device at Enclosure 252, Slot 1 Completed 11% in 7 Minutes.
Exit Code: 0x00
- If the drive does not automatically reattach and start its rebuild, this can be done manually. Start by getting the array information for the missing drive:
[root@exalytics0 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -pdgetmissing -a0
Adapter 0 - Missing Physical drives
No. Array Row Size Expected
0 0 1 571250 MB
Exit Code: 0x00
- Here we see that the disk in slot 1 was part of Array 0, Row 1. Use this information to replace the missing disk:
[root@exalytics0 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PdReplaceMissing -PhysDrv [252:1] -Array0 -row1 -a0
Adapter: 0: Missing PD at Array 0, Row 1 is replaced.
Exit Code: 0x00
- After the drive has been replaced, start the rebuild process:
[root@exalytics0 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -Start -PhysDrv [252:1] -a0
Started rebuild progress on device(Encl-252 Slot-1)
Exit Code: 0x00
- Now check the progress of the rebuild and see how long it has taken so far using the following command:
[root@exalytics0 ~]# /opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv [252:1] -aALL
Rebuild Progress on Device at Enclosure 252, Slot 1 Completed 11% in 7 Minutes.
Exit Code: 0x00
- After the rebuild process finishes the replacement is complete. The time for the rebuild process will vary based on the drive sizes and the usage of the drives during the rebuild process. High usage during the replacement will extend the time needed for the rebuilding to finish.
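Two pieces of command output in step 5 lend themselves to simple parsing: the Array and Row columns reported by -pdgetmissing, and the percentage and elapsed minutes reported by the rebuild progress command. The following is a minimal sketch. The function names missing_rows and rebuild_eta are hypothetical, and both functions assume the output formats shown in the examples above. The remaining-time figure is a straight-line extrapolation and is only a rough guide, since rebuild speed varies with drive usage.

```shell
#!/bin/bash
# Hypothetical helper: reads "-pdgetmissing" output on stdin and prints the
# "-ArrayN -rowN" arguments for each missing drive. A data row is taken to be
# any line whose first three fields (No., Array, Row) are all numeric.
missing_rows() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ {
        print "-Array" $2 " -row" $3
    }'
}

# Hypothetical helper: reads the rebuild progress line on stdin and prints
# the percentage done plus a linear estimate of the minutes remaining.
rebuild_eta() {
    sed -n 's/.*Completed \([0-9][0-9]*\)% in \([0-9][0-9]*\) Minutes.*/\1 \2/p' |
    while read -r pct mins; do
        if [ "$pct" -gt 0 ]; then
            # elapsed minutes scaled by (work left / work done), rounded down
            echo "$pct% done, about $(( mins * (100 - pct) / pct )) minutes left"
        else
            echo "$pct% done, no estimate yet"
        fi
    done
}

# Sample inputs copied from the examples in step 5.
printf 'No. Array Row Size Expected\n0 0 1 571250 MB\n' | missing_rows
echo 'Rebuild Progress on Device at Enclosure 252, Slot 1 Completed 11% in 7 Minutes.' | rebuild_eta
```

For the sample inputs this prints "-Array0 -row1", matching the arguments used in the -PdReplaceMissing example, followed by "11% done, about 56 minutes left" (7 minutes x 89 / 11, rounded down).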
OBTAIN CUSTOMER ACCEPTANCE
WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
Perform the system administration tasks required to utilize the replacement drive as needed depending upon the configuration in use.
REFERENCE INFORMATION:
Oracle Exalytics In-Memory Machine Documentation Library
https://docs.oracle.com/cd/E56045_01/index.htm
Sun Server X2-4 Documentation
http://docs.oracle.com/cd/E20781_01/index.html
Sun Server X4-4 Documentation
http://docs.oracle.com/cd/E38212_01/index.html
Sun Server X5-4 Documentation
http://docs.oracle.com/cd/E56388_01/index.html