Asset ID: 1-71-1599510.1
Update Date: 2018-04-30
Keywords:
Solution Type: Technical Instruction
Sure Solution 1599510.1:
How to Replace a Flash Accelerator PCIe Card in an Oracle Exalytics X2-4 or X3-4 system
Related Items:
- Exalytics In-Memory Machine X2-4
- Exalytics In-Memory Machine X3-4
Related Categories:
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP
Oracle Confidential INTERNAL - Do not distribute to customer (OracleConfidential).
Reason: Internal support doc. All Exalytics components are FRU replaced.
Applies to:
Exalytics In-Memory Machine X3-4 - Version All Versions to All Versions [Release All Releases]
Exalytics In-Memory Machine X2-4 - Version Not Applicable to Not Applicable [Release N/A]
x86_64
Goal
How to Replace a Flash Accelerator PCIe Card in an Oracle Exalytics X2-4 or X3-4 system
Solution
CAP PROBLEM OVERVIEW: F40/F80 Flash PCIe Card replacement
DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE ENGINEER NEED: Oracle Exalytics Server Training
TIME ESTIMATE: 60 minutes
TASK COMPLEXITY: 3-FRU
FIELD ENGINEER INSTRUCTIONS
WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?
If the system is still up and functioning, the customer should be ready to perform an orderly and graceful shutdown of applications and OS. Access to the system's OS root login may be needed if the flash card failure needs to be confirmed.
A data backup is not a prerequisite but is a wise precaution.
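For a typical bare-metal Linux install, the final OS shutdown can be as simple as the following (a minimal sketch; the procedure for stopping the Exalytics applications beforehand varies by deployment and should be agreed with the customer):
[root@exalytics0 ~]# sync
[root@exalytics0 ~]# shutdown -h now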
WHAT ACTION DOES THE ENGINEER NEED TO TAKE:
1. Check the Flash card status and confirm/identify the failed card.
- Check the status of the flash cards using the exalytics_CheckFlash.sh script. The following example shows a failure of one of the devices on Flash Card 1 (output for the other cards has been cut for brevity).
[root@exalytics0 ~]# /opt/exalytics/bin/exalytics_CheckFlash.sh
Checking Exalytics Flash Drive Status
Fetching some info on installed flash drives ....
Driver version : 01.250.41.04 (2012.06.04)
Supported number of flash drives detected (6)
Flash card 1 :
Overall health status : ERROR. Use --detail for more info
Size (in MB) : 286101
Capacity (in bytes) : 300000000000
Firmware Version : 109.05.26.00
Devices: /dev/sde /dev/sdc /dev/sdf
:
---cut---
:
Raid Array Info (/dev/md3):
/dev/md3: 1117.59GiB raid1 2 devices, 0 spares. Use mdadm --detail for more detail.
/dev/md3: No md super block found, not an md component.
Summary:
Healthy flash drives : 5
Broken flash drives : 1
Fail : Flash card health check failed. See above for more details.
- The script will report an "ERROR" on the health status line for the card that has experienced a failure. In the above example we see that device /dev/sdd has failed and is no longer seen by Flash card 1. Make note of the devices assigned to this card so that they can be checked against the devices assigned to the replacement card later.
- A failed Flash card should have its status LED lit amber or red. The status LED is the middle LED on the rear of the card (the top LED is the Life LED and the bottom is the Activity LED). Check the rear of the system to confirm that the failed card can be identified by an amber/red status LED. For a normal/non-failed card this LED should be solid green. If the failed card to be replaced can be identified by its status LED, make note of its location and proceed to the next step to perform the physical replacement of the card.
- If the card has failed in such a way that the status LED is not showing a fault, then the card to be replaced will need to be identified manually. In an Exalytics X2-4 or X3-4 system the Flash cards populate PCIe slots 1, 2, 3, 5, 7 and 8. In step 1 above we should have identified the flash card that has failed. We can use this information to identify the card to be replaced by matching the ID number to the list of flash cards in the "ddcli -listall" command output to get the PCI Address.
[root@exalytics0 ~]# /opt/exalytics/flashUtil/ddcli -listall
****************************************************************************
LSI Corporation WarpDrive Management Utility
Version 107.00.00.04 (2012.06.05)
Copyright (c) 2011 LSI Corporation. All Rights Reserved.
****************************************************************************
ID WarpDrive Package Version PCI Address
-- --------- --------------- -----------
1 ELP-4x100-4d-n 08.05.01.00 00:11:00:00
2 ELP-4x100-4d-n 08.05.01.00 00:21:00:00
3 ELP-4x100-4d-n 08.05.01.00 00:31:00:00
4 ELP-4x100-4d-n 08.05.01.00 00:a1:00:00
5 ELP-4x100-4d-n 08.05.01.00 00:c1:00:00
6 ELP-4x100-4d-n 08.05.01.00 00:d1:00:00
LSI WarpDrive Management Utility: Execution completed successfully.
Use this output list to confirm the physical PCIe slot to be replaced by matching the PCI Address for the Flash card identified above to the list below.
ID PCI Address Physical Slot
-- ----------- -------------
1 00:11:00:00 slot 1
2 00:21:00:00 slot 2
3 00:31:00:00 slot 3
4 00:a1:00:00 slot 5
5 00:c1:00:00 slot 7
6 00:d1:00:00 slot 8
- You can also use the "locate" sub-command of ddcli to identify the card. Using this command will cause the status LED to blink for a couple of minutes so that the card may be identified physically. The following example turns on the locate feature for Flash card 5, which is listed as "5 ELP-4x100-4d-n 08.05.01.00 00:c1:00:00" and is physically located in PCIe slot 7.
[root@exalytics0 ~]# ddcli -c 5 -locate on
****************************************************************************
LSI Corporation WarpDrive Management Utility
Version 107.00.00.04 (2012.06.05)
Copyright (c) 2011 LSI Corporation. All Rights Reserved.
****************************************************************************
LSI WarpDrive Management Utility: Execution completed successfully.
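- Once the card has been physically located, the locate indication can be turned back off (an assumption: the sub-command accepts "off" as the counterpart of the "on" example above):
[root@exalytics0 ~]# ddcli -c 5 -locate off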
- Once the physical location of the PCIe card to be replaced has been identified, proceed to the replacement steps.
2. Prepare the server for service.
- Power off the server and disconnect the power cords from the power supplies.
- Extend the server to the maintenance position in the rack.
- Attach an anti-static wrist strap.
- Remove the top cover.
3. Locate and Remove the PCIe card.
- The server has ten PCIe slots. They are numbered 0 through 9 from left to right when you view the server from the rear (the onboard ports/connectors are located between slots 4 and 5).
- Identify the location of the PCIe slot that contains the failed Flash card using the previous steps.
- Disengage the PCIe slot crossbar from its locked position and rotate it into its upright position.
- Carefully remove the Flash PCIe card from the PCIe card slot by lifting it straight up from its connector.
- Place the PCIe card on an antistatic mat.
4. Install the replacement Flash PCIe card.
- Remove the replacement Flash PCIe card from its anti-static bag and place it on an anti-static mat.
- Make sure to re-install the card into the same location from which the previous card was removed.
- Insert the PCIe card into the correct slot.
- Return the PCIe card slot crossbar to its closed and locked position to secure the PCIe cards in place.
5. Return the Server to operation
- Replace the top cover
- Remove any anti-static measures that were used.
- Return the server to its normal operating position within the rack.
- Re-install the AC power cords and any data cables that were removed.
- Power on the server (an example of powering on remotely via the ILOM follows this list). Verify that the Power/OK indicator LED is lit steady on.
- Allow the system to boot into the OS.
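- If the server is powered on remotely rather than with the front-panel Power button, the ILOM command line can be used (a minimal sketch, assuming standard Oracle ILOM CLI syntax for this server family; log in to the service processor first):
-> start /SYS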
6. Confirm replacement card is healthy and identify the raid configuration type.
- After the system boots into the OS with the replacement Flash card installed, we should observe that the status LED for the new card is now lit green. Physically check to make sure the status LED of the new card is solid green.
- Execute the /opt/exalytics/bin/exalytics_CheckFlash.sh script to check on the status of the Flash cards and to see what devices are mapped to the newly replaced card. Confirm that the system now reports all 6 cards as GOOD/Healthy.
[root@exalytics0 ~]# /opt/exalytics/bin/exalytics_CheckFlash.sh
Checking Exalytics Flash Drive Status
Fetching some info on installed flash drives ....
Driver version : 01.250.41.04 (2012.06.04)
Supported number of flash drives detected (6)
Flash card 1 :
Overall health status : GOOD
Size (in MB) : 381468
Capacity (in bytes) : 400000000000
Firmware Version : 108.05.00.00
Devices: /dev/sdf /dev/sde /dev/sdd /dev/sdc
Flash card 2 :
Overall health status : GOOD
Size (in MB) : 381468
Capacity (in bytes) : 400000000000
Firmware Version : 108.05.00.00
Devices: /dev/sdj /dev/sdi /dev/sdh /dev/sdg
Flash card 3 :
Overall health status : GOOD
Size (in MB) : 381468
Capacity (in bytes) : 400000000000
Firmware Version : 108.05.00.00
Devices: /dev/sdn /dev/sdm /dev/sdl /dev/sdk
Flash card 4 :
Overall health status : GOOD
Size (in MB) : 381468
Capacity (in bytes) : 400000000000
Firmware Version : 108.05.00.00
Devices: /dev/sdr /dev/sdq /dev/sdp /dev/sdo
Flash card 5 :
Overall health status : GOOD
Size (in MB) : 381468
Capacity (in bytes) : 400000000000
Firmware Version : 108.05.00.00
Devices: /dev/sdv /dev/sdu /dev/sdt /dev/sds
Flash card 6 :
Overall health status : GOOD
Size (in MB) : 381468
Capacity (in bytes) : 400000000000
Firmware Version : 108.05.00.00
Devices: /dev/sdx /dev/sdz /dev/sdy /dev/sdw
Raid Array Info (/dev/md3):
/dev/md3: 1117.59GiB raid1 2 devices, 0 spares. Use mdadm --detail for more detail.
/dev/md3: No md super block found, not an md component.
Summary:
Healthy flash drives : 6
Broken flash drives : 0
Pass : Flash card health check passed
- At this point the card has been replaced and confirmed to be working properly at the hardware level. If the system uses a "bare-metal" install and the flash is configured as a Raid01, Raid10 or Raid05 array created with the config_flash.sh script (as done during a normal EIS install), then the following steps can be followed to bring the new card back into use by the SW raid array. If the system is virtualized or does not use a standard raid configuration, the following steps do not apply and should not be followed; the HW replacement is now complete, and the system administrator will need to take care of putting the new card back into use for virtualized and non-standard raid configurations.
- With Exalytics image versions up to 1.0.0.6 the flash creation script created a Raid01 array for the flash devices. Starting with image 1.0.0.7 the flash can be configured as either a Raid10 or a Raid05. Each type requires a different process to add the newly replaced flash devices to the raid configuration. To identify the Raid configuration type, check the "Raid Array Info" section near the bottom of the exalytics_CheckFlash.sh output (a quick command-line check is also sketched after this list). A Raid01 configuration will show /dev/md3 as a Raid1 with 2 devices, a Raid10 configuration will show /dev/md0 as a Raid0 with 12 devices, and a Raid05 configuration will show /dev/md0 as a Raid5 with 6 devices.
- For a Raid01 configuration follow the steps in section 7A
- For a Raid10 configuration follow the steps in section 7B
- For a Raid05 configuration follow the steps in section 7C
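- A quick way to list the md arrays and their raid levels for this check (a simple sketch; compare its output with the descriptions above):
[root@exalytics0 ~]# grep ^md /proc/mdstat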
7A. Raid01 restoration steps.
- Looking at the output of the exalytics_CheckFlash.sh script we can see that the system has assigned the same devices to the four flash drives on the replaced card as were mapped to the original card (/dev/sdc /dev/sdd /dev/sde /dev/sdf). This is normal and expected but be aware that the Operating System may map new/different devices to the flash card. If this happens you will need to recreate the RAID using the new devices as listed. Compare to the original output from step 1 to confirm if the devices are the same or are now different.
- Check the /proc/mdstat file to see the SW Raid status.
[root@exalytics0 ~]# cat /proc/mdstat
Personalities : [raid0] [raid1]
md3 : active raid1 md1[0]
1171873472 blocks [2/1] [U_]
md2 : inactive sdh[4](S) sdk[11](S) sdm[10](S) sdl[9](S) sdn[8](S) sdj[7](S) sdg[6](S) sdi[5](S)
781249024 blocks
md1 : active raid0 sdq[0] sdz[11] sdy[10] sdw[9] sdx[8] sds[7] sdu[6] sdt[5] sdv[4] sdp[3] sdo[2] sdr[1]
1171873536 blocks 64k chunks
unused devices: <none>
- Since the replaced Flash card contains four flash devices, the SW Raid will now show that the affected Raid0 device (md2 in our example) is missing four sd devices (one for each flash module on the replaced card). This device will also show as inactive, so it will no longer be attached to the mirror array /dev/md3. The mirror /dev/md3 will be active, but its listing should end with something similar to [2/1] [U_] showing that only one device of the mirror is attached.
- The inactive Raid0 md device will need to be recreated (if your system is missing a different device then adjust your commands to use the md device that is missing). First, stop the md device that will be recreated.
[root@exalytics0 ~]# mdadm --stop /dev/md2
mdadm: stopped /dev/md2
- Then check the /etc/mdadm.conf file to see which devices made up the md device.
[root@exalytics0 ~]# cat /etc/mdadm.conf
ARRAY /dev/md1 level=raid0 num-devices=12 metadata=0.90 UUID=9ed80169:d18856c9:0aa0e629:d3f42c85
devices=/dev/sdq,/dev/sdr,/dev/sdo,/dev/sdp,/dev/sdv,/dev/sdt,/dev/sdu,/dev/sds,/dev/sdx,/dev/sdw,/dev/sdy,/dev/sdz
ARRAY /dev/md2 level=raid0 num-devices=12 metadata=0.90 UUID=c9fb8de5:7361d6e9:a7bb483d:d62b01e2
devices=/dev/sdf,/dev/sde,/dev/sdd,/dev/sdc,/dev/sdh,/dev/sdi,/dev/sdg,/dev/sdj,/dev/sdn,/dev/sdl,/dev/sdm,/dev/sdk
ARRAY /dev/md3 level=raid1 num-devices=2 metadata=0.90 UUID=3601f54c:a8714701:92076bd2:aee83fb2
devices=/dev/md1,/dev/md2
- Using the devices listed in the output we will re-create the raid0 md device using mdadm --create (again, note that if the output of the exalytics_CheckFlash.sh script showed that the sd device names changed, make sure to use the new devices listed). You will likely see a warning that some of the devices were previously part of another raid device; if you see this you will need to reply 'y' when asked if you want to continue creating the array.
[root@exalytics0 ~]# mdadm /dev/md2 --create --raid-devices=12 --level=0 /dev/sdf /dev/sde /dev/sdd /dev/sdc /dev/sdh /dev/sdi /dev/sdg /dev/sdj /dev/sdn /dev/sdl /dev/sdm /dev/sdk
mdadm: /dev/sdh appears to be part of a raid array:
level=raid0 devices=12 ctime=Thu Mar 5 16:25:05 2015
mdadm: /dev/sdi appears to be part of a raid array:
level=raid0 devices=12 ctime=Thu Mar 5 16:25:05 2015
mdadm: /dev/sdg appears to be part of a raid array:
level=raid0 devices=12 ctime=Thu Mar 5 16:25:05 2015
mdadm: /dev/sdj appears to be part of a raid array:
level=raid0 devices=12 ctime=Thu Mar 5 16:25:05 2015
mdadm: /dev/sdn appears to be part of a raid array:
level=raid0 devices=12 ctime=Thu Mar 5 16:25:05 2015
mdadm: /dev/sdl appears to be part of a raid array:
level=raid0 devices=12 ctime=Thu Mar 5 16:25:05 2015
mdadm: /dev/sdm appears to be part of a raid array:
level=raid0 devices=12 ctime=Thu Mar 5 16:25:05 2015
mdadm: /dev/sdk appears to be part of a raid array:
level=raid0 devices=12 ctime=Thu Mar 5 16:25:05 2015
Continue creating array? y
mdadm: array /dev/md2 started.
- After the Raid0 device has been created it then needs to be added to the Raid1 device. The Raid1 device should be /dev/md3 and from our example /dev/md2 is the new device to be added (adjust the command as needed for your configuration).
[root@exalytics0 ~]# mdadm /dev/md3 --add /dev/md2
mdadm: re-added /dev/md2
- After adding the device we can check the /proc/mdstat file to confirm that it was added and /dev/md3 is now being rebuilt.
[root@exalytics0 ~]# cat /proc/mdstat
Personalities : [raid0] [raid1]
md2 : active raid0 sdk[11] sdm[10] sdl[9] sdn[8] sdj[7] sdg[6] sdi[5] sdh[4] sdc[3] sdd[2] sde[1] sdf[0]
1171873536 blocks 64k chunks
md3 : active raid1 md2[2] md1[0]
1171873472 blocks [2/1] [U_]
[>....................] recovery = 0.1% (2200064/1171873472) finish=88.6min speed=220006K/sec
md1 : active raid0 sdq[0] sdz[11] sdy[10] sdw[9] sdx[8] sds[7] sdu[6] sdt[5] sdv[4] sdp[3] sdo[2] sdr[1]
1171873536 blocks 64k chunks
unused devices: <none>
- Since the md2 device was newly created, it will now have a different UUID from what was previously used by the system, so the /etc/mdadm.conf file will need to be re-created. Use mdadm --detail --scan --verbose to recreate the file and then cat the file to check that it was properly created:
[root@exalytics0 ~]# mdadm --detail --scan --verbose > /etc/mdadm.conf
[root@exalytics0 ~]# cat /etc/mdadm.conf
ARRAY /dev/md1 level=raid0 num-devices=12 metadata=0.90 UUID=9ed80169:d18856c9:0aa0e629:d3f42c85
devices=/dev/sdq,/dev/sdr,/dev/sdo,/dev/sdp,/dev/sdv,/dev/sdt,/dev/sdu,/dev/sds,/dev/sdx,/dev/sdw,/dev/sdy,/dev/sdz
ARRAY /dev/md2 level=raid0 num-devices=12 metadata=0.90 UUID=b1c9a2d9:f3f67cff:378f99d5:8b66747f
devices=/dev/sdf,/dev/sde,/dev/sdd,/dev/sdc,/dev/sdh,/dev/sdi,/dev/sdg,/dev/sdj,/dev/sdn,/dev/sdl,/dev/sdm,/dev/sdk
ARRAY /dev/md3 level=raid1 num-devices=2 metadata=0.90 spares=1 UUID=3601f54c:a8714701:92076bd2:aee83fb2
devices=/dev/md1,/dev/md2
- The rebuild time will vary depending on the device sizes and system activity. If the system is actively using the flash then the rebuild time will be extended (a monitoring example follows). After confirming the recovery finished successfully, the Raid restoration is complete.
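- The rebuild progress can be watched or waited on from the command line (a sketch; mdadm's --wait option blocks until any resync or recovery on the named array has finished):
[root@exalytics0 ~]# mdadm --wait /dev/md3
[root@exalytics0 ~]# cat /proc/mdstat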
7B. Raid10 restoration steps.
- Looking at the output of the exalytics_CheckFlash.sh script we can see that the system has assigned the same devices to the four flash drives on the replaced card as were mapped to the original card (/dev/sdc /dev/sdd /dev/sde /dev/sdf). This is normal and expected but be aware that the Operating System may map new/different devices to the flash card. If this happens you will need to recreate the RAID using the new devices as listed. Compare to the original output from step 1 to confirm if the devices are the same or are now different.
- Since the replaced Flash card contains four flash devices, the SW Raid will now show four Raid1 devices that are each missing one member (one missing member for each flash module on the replaced card). Check the /proc/mdstat file to see the SW Raid status for the arrays made up by the flash devices. Each of the md devices will be listed with the devices that they include. We should see four md devices that only have a single sd device listed, and the second line for these devices will end with something similar to [2/1] [_U] showing that only one device of the mirror is attached.
[root@exalytics0 ~]# cat /proc/mdstat
Personalities : [raid1] [raid0]
md12 : active raid1 sdn[0] sdw[1]
97656128 blocks [2/2] [UU]
md11 : active raid1 sdk[0] sdz[1]
97656128 blocks [2/2] [UU]
md10 : active raid1 sdm[0] sdy[1]
97656128 blocks [2/2] [UU]
md9 : active raid1 sdl[0] sdx[1]
97656128 blocks [2/2] [UU]
md8 : active raid1 sdi[0] sdt[1]
97656128 blocks [2/2] [UU]
md7 : active raid1 sdh[0] sds[1]
97656128 blocks [2/2] [UU]
md6 : active raid1 sdg[0] sdu[1]
97656128 blocks [2/2] [UU]
md5 : active raid1 sdj[0] sdv[1]
97656128 blocks [2/2] [UU]
md4 : active raid1 sdr[1]
97656128 blocks [2/1] [_U]
md3 : active raid1 sdo[1]
97656128 blocks [2/1] [_U]
md2 : active raid1 sdq[1]
97656128 blocks [2/1] [_U]
md1 : active raid1 sdp[1]
97656128 blocks [2/1] [_U]
md0 : active raid0 md1[0] md12[11] md11[10] md10[9] md9[8] md8[7] md7[6] md6[5] md5[4] md4[3] md3[2] md2[1]
1171872768 blocks 64k chunks
unused devices: <none>
- For each of the md devices missing a drive we need to add the drive back to the mirror device. In this example devices md1, md2, md3, md4 need to be fixed. Check the /etc/mdadm.conf file to see what the correct configuration should be.
[root@exalytics0 ~]# cat /etc/mdadm.conf
ARRAY /dev/md1 level=raid1 num-devices=2 metadata=0.90 UUID=89faee91:6e526f55:26db5046:da08a63e
devices=/dev/sdc,/dev/sdp
ARRAY /dev/md2 level=raid1 num-devices=2 metadata=0.90 UUID=a0281a64:a994b40f:5a4ea80d:91e32d2e
devices=/dev/sdd,/dev/sdq
ARRAY /dev/md3 level=raid1 num-devices=2 metadata=0.90 UUID=63f1ff4e:9a61866e:806e498c:5a831c6f
devices=/dev/sdf,/dev/sdo
ARRAY /dev/md4 level=raid1 num-devices=2 metadata=0.90 UUID=a661d73a:9e931391:b27c46ff:d688bf32
devices=/dev/sde,/dev/sdr
ARRAY /dev/md5 level=raid1 num-devices=2 metadata=0.90 UUID=4b322f2e:2bff469b:21598d07:c5615470
devices=/dev/sdj,/dev/sdv
ARRAY /dev/md6 level=raid1 num-devices=2 metadata=0.90 UUID=c520a9d7:e999463b:5e80ef1b:f264f0be
devices=/dev/sdg,/dev/sdu
ARRAY /dev/md7 level=raid1 num-devices=2 metadata=0.90 UUID=e76868ea:05739f1e:e8cf3d6f:04458164
devices=/dev/sdh,/dev/sds
ARRAY /dev/md8 level=raid1 num-devices=2 metadata=0.90 UUID=addff802:9e1224a8:ac162f13:fdbfb9ea
devices=/dev/sdi,/dev/sdt
ARRAY /dev/md9 level=raid1 num-devices=2 metadata=0.90 UUID=122a324a:db47c2cc:7ed15282:8852295f
devices=/dev/sdl,/dev/sdx
ARRAY /dev/md10 level=raid1 num-devices=2 metadata=0.90 UUID=a4fcb5ad:eaeaee79:e915db55:f7b8fad5
devices=/dev/sdm,/dev/sdy
ARRAY /dev/md11 level=raid1 num-devices=2 metadata=0.90 UUID=9bd49315:193992a3:b9f33e0d:b8ce8dfc
devices=/dev/sdk,/dev/sdz
ARRAY /dev/md12 level=raid1 num-devices=2 metadata=0.90 UUID=8ffd1fdd:e74dea56:7262b350:208b376c
devices=/dev/sdn,/dev/sdw
ARRAY /dev/md0 level=raid0 num-devices=12 metadata=0.90 UUID=7dcd48a9:d3caab2b:9b18df73:a229b9a4
devices=/dev/md1,/dev/md2,/dev/md3,/dev/md4,/dev/md5,/dev/md6,/dev/md7,/dev/md8,/dev/md9,/dev/md10,/dev/md11,/dev/md12
- In our example we match the four md devices to the sd devices they should be made up of so that we can add the correct sd device to the correct md. Here we can see that we need to add sdc to md1, sdd to md2, sdf to md3 and sde to md4, because these are the devices missing their second disk. This is done by comparing the mdstat output to the mdadm.conf file and seeing what each md device should contain vs. what it currently contains. We need our four md devices to contain their proper sd devices:
/dev/md1 - /dev/sdc, /dev/sdp
/dev/md2 - /dev/sdd, /dev/sdq
/dev/md3 - /dev/sdf, /dev/sdo
/dev/md4 - /dev/sde, /dev/sdr
- Use the mdadm --add command to add the replaced device to each of the four md devices missing their drives:
[root@exalytics0 ~]# mdadm /dev/md1 --add /dev/sdc
mdadm: added /dev/sdc
[root@exalytics0 ~]# mdadm /dev/md2 --add /dev/sdd
mdadm: added /dev/sdd
[root@exalytics0 ~]# mdadm /dev/md3 --add /dev/sdf
mdadm: added /dev/sdf
[root@exalytics0 ~]# mdadm /dev/md4 --add /dev/sde
mdadm: added /dev/sde
- After adding the devices we can check the /proc/mdstat file to confirm that they were added and are now being rebuilt.
[root@exalytics0 ~]# cat /proc/mdstat
Personalities : [raid1] [raid0]
md12 : active raid1 sdn[0] sdw[1]
97656128 blocks [2/2] [UU]
md11 : active raid1 sdk[0] sdz[1]
97656128 blocks [2/2] [UU]
md10 : active raid1 sdm[0] sdy[1]
97656128 blocks [2/2] [UU]
md9 : active raid1 sdl[0] sdx[1]
97656128 blocks [2/2] [UU]
md8 : active raid1 sdi[0] sdt[1]
97656128 blocks [2/2] [UU]
md7 : active raid1 sdh[0] sds[1]
97656128 blocks [2/2] [UU]
md6 : active raid1 sdg[0] sdu[1]
97656128 blocks [2/2] [UU]
md5 : active raid1 sdj[0] sdv[1]
97656128 blocks [2/2] [UU]
md4 : active raid1 sde[2] sdr[1]
97656128 blocks [2/1] [_U]
[===>.................] recovery = 17.6% (17241472/97656128) finish=6.4min speed=206060K/sec
md3 : active raid1 sdf[2] sdo[1]
97656128 blocks [2/1] [_U]
[====>................] recovery = 20.0% (19615872/97656128) finish=6.3min speed=204086K/sec
md2 : active raid1 sdd[2] sdq[1]
97656128 blocks [2/1] [_U]
[====>................] recovery = 22.8% (22334848/97656128) finish=6.0min speed=206060K/sec
md1 : active raid1 sdc[2] sdp[1]
97656128 blocks [2/1] [_U]
[=====>...............] recovery = 28.2% (27618432/97656128) finish=5.6min speed=206250K/sec
md0 : active raid0 md1[0] md12[11] md11[10] md10[9] md9[8] md8[7] md7[6] md6[5] md5[4] md4[3] md3[2] md2[1]
1171872768 blocks 64k chunks
unused devices: <none>
- The rebuild time will vary depending on the device sizes and system activity. If the system is actively using the flash then the rebuild time will be extended. After confirming that the recovery for each device finished successfully, the Raid restoration is complete (a quick verification sketch follows).
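- To quickly confirm that all twelve mirrors are fully populated again, the healthy [UU] pairs in /proc/mdstat can be counted (a simple sketch; the count should be 12 once every rebuild has completed):
[root@exalytics0 ~]# grep -c '\[UU\]' /proc/mdstat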
7C. Raid05 restoration steps.
- Looking at the output of the exalytics_CheckFlash.sh script we can see that the system has assigned the same devices to the four flash drives on the replaced card as were mapped to the original card (/dev/sdc /dev/sdd /dev/sde /dev/sdf). This is normal and expected but be aware that the Operating System may map new/different devices to the flash card. If this happens you will need to recreate the RAID using the new devices as listed. Compare to the original output from step 1 to confirm if the devices are the same or are now different.
- Check the /proc/mdstat file to see the SW Raid status.
[root@exalytics0 ~]# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md6 : active raid0 sdy[0] sdz[3] sdx[2] sdw[1]
390624512 blocks 64k chunks
md5 : active raid0 sdu[0] sdv[3] sds[2] sdt[1]
390624512 blocks 64k chunks
md4 : active raid0 sdp[0] sdo[3] sdq[2] sdr[1]
390624512 blocks 64k chunks
md3 : active raid0 sdm[0] sdk[3] sdn[2] sdl[1]
390624512 blocks 64k chunks
md2 : active raid0 sdg[0] sdj[3] sdh[2] sdi[1]
390624512 blocks 64k chunks
md0 : active raid5 md2[1] md6[5] md5[4] md4[3] md3[2]
1953122240 blocks level 5, 64k chunk, algorithm 2 [6/5] [_UUUUU]
unused devices: <none>
- Since the replaced Flash card contained all four flash disks that made up one of the Raid0 devices, we will see that our Raid5 device (/dev/md0) shows that one of its six devices is now missing. /dev/md0 should show something similar to [6/5] [_UUUUU] at the end of its output, showing that only 5 of the 6 devices are attached. md0 should be made up of md1, md2, md3, md4, md5 and md6, but in our example we are missing md1, so md1 will need to be re-created (if your system is missing a different device then adjust your commands to use the md device that is missing). Using the devices listed in the exalytics_CheckFlash.sh output as discussed above, we will re-create the raid0 md device using mdadm --create. (If the flash card being used for replacement was previously set up in a raid configuration then you may see a warning that the device was previously part of another raid device; if you see this you will need to reply 'y' when asked if you want to continue creating the array.)
[root@exalytics0 ~]# mdadm /dev/md1 --create --raid-devices=4 --level=0 /dev/sdc /dev/sdd /dev/sde /dev/sdf
mdadm: array /dev/md1 started.
- After the Raid0 device has been created it then needs to be added to the Raid5 device. The Raid5 device should be /dev/md0 and from our example /dev/md1 is the new device to be added (adjust the command as needed for your configuration).
[root@exalytics0 ~]# mdadm /dev/md0 --add /dev/md1
mdadm: added /dev/md1
- After adding the device we can check the /proc/mdstat file to confirm that it was added and /dev/md0 is now being rebuilt.
[root@exalytics0 ~]# cat /proc/mdstat
Personalities : [raid0] [raid6] [raid5] [raid4]
md1 : active raid0 sdf[3] sde[2] sdd[1] sdc[0]
390624512 blocks 64k chunks
md6 : active raid0 sdy[0] sdz[3] sdx[2] sdw[1]
390624512 blocks 64k chunks
md5 : active raid0 sdu[0] sdv[3] sds[2] sdt[1]
390624512 blocks 64k chunks
md4 : active raid0 sdp[0] sdo[3] sdq[2] sdr[1]
390624512 blocks 64k chunks
md3 : active raid0 sdm[0] sdk[3] sdn[2] sdl[1]
390624512 blocks 64k chunks
md2 : active raid0 sdg[0] sdj[3] sdh[2] sdi[1]
390624512 blocks 64k chunks
md0 : active raid5 md1[6] md2[1] md6[5] md5[4] md4[3] md3[2]
1953122240 blocks level 5, 64k chunk, algorithm 2 [6/5] [_UUUUU]
[>....................] recovery = 0.2% (982644/390624448) finish=46.2min speed=140377K/sec
unused devices: <none>
- Since the md1 device was newly created, it will now have a different UUID from what was previously used by the system, so the /etc/mdadm.conf file will need to be re-created. Use mdadm --detail --scan --verbose to recreate the file and then cat the file to check that it was properly created:
[root@exalytics0 ~]# mdadm --detail --scan --verbose > /etc/mdadm.conf
[root@exalytics0 ~]# cat /etc/mdadm.conf
ARRAY /dev/md2 level=raid0 num-devices=4 metadata=0.90 UUID=438c339d:73a3b0b9:1327795f:7d646e9a
devices=/dev/sdg,/dev/sdi,/dev/sdh,/dev/sdj
ARRAY /dev/md3 level=raid0 num-devices=4 metadata=0.90 UUID=b283e2c7:6ebb785c:2491a388:7d818402
devices=/dev/sdm,/dev/sdl,/dev/sdn,/dev/sdk
ARRAY /dev/md4 level=raid0 num-devices=4 metadata=0.90 UUID=1df3cb6d:9e704f02:5c46d821:1d7a18ce
devices=/dev/sdp,/dev/sdr,/dev/sdq,/dev/sdo
ARRAY /dev/md5 level=raid0 num-devices=4 metadata=0.90 UUID=83146227:c25f757c:6a524238:a131fa8a
devices=/dev/sdu,/dev/sdt,/dev/sds,/dev/sdv
ARRAY /dev/md6 level=raid0 num-devices=4 metadata=0.90 UUID=372b91b2:e646c239:a5de6d0b:1a536762
devices=/dev/sdy,/dev/sdw,/dev/sdx,/dev/sdz
ARRAY /dev/md1 level=raid0 num-devices=4 metadata=0.90 UUID=c0c59206:4d0e8e4a:b90a8500:a1fdacac
devices=/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf
ARRAY /dev/md0 level=raid5 num-devices=6 metadata=0.90 spares=1 UUID=8779e6d9:6d027e33:62be3a70:25f1cf5c
devices=/dev/md1,/dev/md2,/dev/md3,/dev/md4,/dev/md5,/dev/md6
- The rebuild time will vary depending on the device sizes and system activity. If the system is actively using the flash then the rebuild time will be extended. After confirming the recovery finished successfully, the Raid restoration is complete (a quick verification sketch follows).
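- To confirm the Raid5 array is clean after the rebuild, the array state can be queried directly (a sketch using mdadm --detail; the State line should report clean once recovery has finished):
[root@exalytics0 ~]# mdadm --detail /dev/md0 | grep 'State :'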
OBTAIN CUSTOMER ACCEPTANCE
WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
Boot up system and verify full functionality
REFERENCE INFORMATION:
Oracle Exalytics In-Memory Machine Documentation Library
http://docs.oracle.com/cd/E41246_01/index.htm
Sun Server X2-4 Documentation
http://docs.oracle.com/cd/E20781_01/index.html
Attachments
This solution has no attachment