Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1599510.1
Update Date: 2018-04-30

Solution Type: Technical Instruction

Solution 1599510.1: How to Replace a Flash Accelerator PCIe Card in an Oracle Exalytics X2-4 or X3-4 system


Related Items
  • Exalytics In-Memory Machine X2-4
  • Exalytics In-Memory Machine X3-4
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP




Oracle Confidential INTERNAL - Do not distribute to customer (OracleConfidential).
Reason: internal support doc. All Exalytics components are FRU replaced.

Applies to:

Exalytics In-Memory Machine X3-4 - Version All Versions to All Versions [Release All Releases]
Exalytics In-Memory Machine X2-4 - Version Not Applicable to Not Applicable [Release N/A]
x86_64

Goal

How to Replace a Flash Accelerator PCIe Card in an Oracle Exalytics X2-4 or X3-4 system

Solution

CAP PROBLEM OVERVIEW: F40/F80 Flash PCIe Card replacement

DISPATCH INSTRUCTIONS

WHAT SKILLS DOES THE ENGINEER NEED:
Oracle Exalytics Server Training

TIME ESTIMATE: 60 minutes

TASK COMPLEXITY: 3-FRU

FIELD ENGINEER INSTRUCTIONS

WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:

If the system is still up and functioning, the customer should be ready to perform an orderly and graceful shutdown of applications and OS. Access to the system's OS root login may be needed if the flash card failure needs to be confirmed.

A data backup is not a prerequisite but is a wise precaution.

WHAT ACTION DOES THE ENGINEER NEED TO TAKE:

1. Check the Flash card status and confirm/identify the failed card.

  1. Check the status of the flash cards using the exalytics_CheckFlash.sh script. The following example shows a failure of one of the devices on Flash card 1 (extra output from the other cards has been cut for brevity):
    [root@exalytics0 ~]# /opt/exalytics/bin/exalytics_CheckFlash.sh
    Checking Exalytics Flash Drive Status
    
    Fetching some info on installed flash drives ....
    Driver version :   01.250.41.04 (2012.06.04) 
    
    Supported number of flash drives detected (6)
    
    Flash card 1 :
    Overall health status : ERROR. Use --detail for more info
    Size (in MB) : 286101
    Capacity (in bytes) : 300000000000
    Firmware Version : 109.05.26.00
    Devices:  /dev/sde /dev/sdc /dev/sdf
    
    
    :
    ---cut---
    :
    Raid Array Info (/dev/md3):
    /dev/md3: 1117.59GiB raid1 2 devices, 0 spares. Use mdadm --detail for more detail.
    /dev/md3: No md super block found, not an md component.
    
    
    Summary:
    Healthy flash drives : 5
    Broken flash drives  : 1
    Fail : Flash card health check failed. See above for more details.
     
  2. The script will report an "ERROR" on the health status line for the card that has experienced a failure. In the above example we see that device /dev/sdd has failed and is no longer listed under Flash card 1 (only three of its four devices appear). Make note of the devices assigned to this card so that they can be checked against the devices assigned to the replacement card later.
  3. A failed Flash card should have its status LED lit amber or red. The status LED is the middle LED on the rear of the card (the top LED is the Life LED and the bottom is the Activity LED). Check the rear of the system to confirm that the failed card can be identified by an amber/red status LED; on a normal/non-failed card this LED should be solid green. If the failed card to be replaced can be identified by its status LED, make note of its location and proceed to the next step to perform the physical replacement of the card.
  4. If the card has failed in such a way that the status LED is not showing a fault, the card to be replaced will need to be identified manually. In an Exalytics X2-4 or X3-4 system the Flash cards populate PCIe slots 1, 2, 3, 5, 7 and 8. In step 1a above we identified the flash card that has failed. Use that card's ID number to find its PCI Address in the list of flash cards in the "ddcli -listall" command output.
    [root@exalytics0 ~]# /opt/exalytics/flashUtil/ddcli -listall
    
    ****************************************************************************
       LSI Corporation WarpDrive Management Utility
       Version  107.00.00.04 (2012.06.05) 
       Copyright (c) 2011 LSI Corporation. All Rights Reserved.
    ****************************************************************************
    
    ID    WarpDrive     Package Version    PCI Address    
    --    ---------     ---------------    -----------    
    1     ELP-4x100-4d-n    08.05.01.00        00:11:00:00
    2     ELP-4x100-4d-n    08.05.01.00        00:21:00:00
    3     ELP-4x100-4d-n    08.05.01.00        00:31:00:00
    4     ELP-4x100-4d-n    08.05.01.00        00:a1:00:00
    5     ELP-4x100-4d-n    08.05.01.00        00:c1:00:00
    6     ELP-4x100-4d-n    08.05.01.00        00:d1:00:00
    
    LSI WarpDrive Management Utility: Execution completed successfully.
    Use this output list to confirm the physical PCIe slot to be replaced by matching the PCI Address for the Flash card identified above to the list below.
    ID  PCI Address    Physical Slot
    --  -----------    -------------
    1   00:11:00:00    slot 1
    2   00:21:00:00    slot 2
    3   00:31:00:00    slot 3
    4   00:a1:00:00    slot 5
    5   00:c1:00:00    slot 7
    6   00:d1:00:00    slot 8  
  5. You can also use the "locate" sub-command of ddcli to identify the card. This causes the card's status LED to blink for a couple of minutes so that the card can be identified physically; running the same sub-command with "off" should stop the blinking. The following example turns on the locate feature for Flash card 5, which is listed as "5     ELP-4x100-4d-n    08.05.01.00        00:c1:00:00" and is physically located in PCIe slot 7:
    [root@exalytics0 ~]# ddcli -c 5 -locate on
    
    ****************************************************************************
       LSI Corporation WarpDrive Management Utility
       Version  107.00.00.04 (2012.06.05) 
       Copyright (c) 2011 LSI Corporation. All Rights Reserved.
    ****************************************************************************
    
    LSI WarpDrive Management Utility: Execution completed successfully.
     
  6. Once the physical location of the PCIe card to be replaced has been identified, proceed to the replacement steps.
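The PCI Address-to-slot table above can be captured in a small helper. This is a sketch only; pci_to_slot is a hypothetical function, not part of the Exalytics tooling:

```shell
# Sketch only: pci_to_slot is a hypothetical helper (not part of the
# Exalytics tooling) that encodes the PCI Address to physical PCIe slot
# mapping from the table above.
pci_to_slot() {
  case "$1" in
    00:11:00:00) echo "slot 1" ;;
    00:21:00:00) echo "slot 2" ;;
    00:31:00:00) echo "slot 3" ;;
    00:a1:00:00) echo "slot 5" ;;
    00:c1:00:00) echo "slot 7" ;;
    00:d1:00:00) echo "slot 8" ;;
    *)           echo "unknown" ;;
  esac
}

# Example: the failed card's PCI Address comes from "ddcli -listall".
pci_to_slot 00:c1:00:00   # prints "slot 7"
```

Feeding in the PCI Address reported for the failed card's ID returns the physical slot to open.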

2. Prepare the server for service.

  1. Power off the server and disconnect the power cords from the power supplies.
  2. Extend the server to the maintenance position in the rack.
  3. Attach an anti-static wrist strap.
  4. Remove the top cover.

3. Locate and Remove the PCIe card.

  1. The server has ten PCIe slots. They are numbered 0 through 9 from left to right when you view the server from the rear (the onboard ports/connectors are located between slots 4 and 5).
  2. Identify the location of the PCIe slot that contains the failed Flash card using the previous steps.
  3. Disengage the PCIe slot crossbar from its locked position and rotate it into its upright position.
  4. Carefully remove the Flash PCIe card from the PCIe card slot by lifting it straight up from its connector.
  5. Place the PCIe card on an antistatic mat.

4. Install the replacement Flash PCIe card.

  1. Remove the replacement Flash PCIe card from its anti-static bag and place it on an anti-static mat.
  2. Make sure to re-install the card into the same location from which the previous card was removed.
  3. Insert the PCIe card into the correct slot.
  4. Return the PCIe card slot crossbar to its closed and locked position to secure the PCIe cards in place.

5. Return the Server to operation

  1. Replace the top cover.
  2. Remove any anti-static measures that were used.
  3. Return the server to its normal operating position within the rack.
  4. Re-install the AC power cords and any data cables that were removed.
  5. Power on the server and verify that the Power/OK indicator LED lights steady on.
  6. Allow the system to boot into the OS.

6. Confirm replacement card is healthy and identify the raid configuration type.

  1. After the system boots into the OS with the replacement Flash card installed, the status LED for the new card should be lit green. Physically check that the status LED of the new card is solid green.
  2. Execute the /opt/exalytics/bin/exalytics_CheckFlash.sh script to check on the status of the Flash cards and to see what devices are mapped to the newly replaced card. Confirm that the system now reports all 6 cards as GOOD/Healthy.
    [root@exalytics0 ~]# /opt/exalytics/bin/exalytics_CheckFlash.sh
    Checking Exalytics Flash Drive Status
    
    Fetching some info on installed flash drives ....
    Driver version :   01.250.41.04 (2012.06.04) 
    
    Supported number of flash drives detected (6)
    
    Flash card 1 :
    Overall health status : GOOD
    Size (in MB) : 381468
    Capacity (in bytes) : 400000000000
    Firmware Version : 108.05.00.00
    Devices:  /dev/sdf /dev/sde /dev/sdd /dev/sdc
    
    Flash card 2 :
    Overall health status : GOOD
    Size (in MB) : 381468
    Capacity (in bytes) : 400000000000
    Firmware Version : 108.05.00.00
    Devices:  /dev/sdj /dev/sdi /dev/sdh /dev/sdg
    
    Flash card 3 :
    Overall health status : GOOD
    Size (in MB) : 381468
    Capacity (in bytes) : 400000000000
    Firmware Version : 108.05.00.00
    Devices:  /dev/sdn /dev/sdm /dev/sdl /dev/sdk
    
    Flash card 4 :
    Overall health status : GOOD
    Size (in MB) : 381468
    Capacity (in bytes) : 400000000000
    Firmware Version : 108.05.00.00
    Devices:  /dev/sdr /dev/sdq /dev/sdp /dev/sdo
    
    Flash card 5 :
    Overall health status : GOOD
    Size (in MB) : 381468
    Capacity (in bytes) : 400000000000
    Firmware Version : 108.05.00.00
    Devices:  /dev/sdv /dev/sdu /dev/sdt /dev/sds
    
    Flash card 6 :
    Overall health status : GOOD
    Size (in MB) : 381468
    Capacity (in bytes) : 400000000000
    Firmware Version : 108.05.00.00
    Devices:  /dev/sdx /dev/sdz /dev/sdy /dev/sdw
    
    Raid Array Info (/dev/md3):
    /dev/md3: 1117.59GiB raid1 2 devices, 0 spares. Use mdadm --detail for more detail.
    /dev/md3: No md super block found, not an md component.
    
    
    Summary:
    Healthy flash drives : 6
    Broken flash drives  : 0
    Pass : Flash card health check passed
  3. At this point the card has been replaced and confirmed to be working properly at the hardware level. If the system is a "bare-metal" install and the flash is configured as a Raid01, Raid10 or Raid05 created by the config_flash.sh script (as done during a normal EIS install), follow the steps below to bring the new card back into use by the SW raid array. If the system is Virtualized or does not use a standard raid configuration, the following steps do not apply and should not be followed; the HW replacement is now complete, and the system administrator will need to take care of putting the new card back into use.
  4. With Exalytics image versions up to 1.0.0.6 the flash creation script created a Raid01 array for the flash devices. Starting with image 1.0.0.7 the flash can be configured as either Raid10 or Raid05. Each type requires a different process to add the newly replaced flash devices to the raid configuration. To identify the Raid configuration type, check the "Raid Array Info" section near the bottom of the exalytics_CheckFlash.sh output. A Raid01 configuration will show /dev/md3 as a Raid1 with 2 devices; a Raid10 configuration will show /dev/md0 as a Raid0 with 12 devices; and a Raid05 configuration will show /dev/md0 as a Raid5 with 6 devices.
    • For a Raid01 configuration follow the steps in section 7A
    • For a Raid10 configuration follow the steps in section 7B
    • For a Raid05 configuration follow the steps in section 7C
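The decision rules above can be sketched as a small shell helper. flash_raid_type is hypothetical (not part of the Exalytics tooling) and applies those rules to a saved copy of /proc/mdstat:

```shell
# Sketch only: flash_raid_type is a hypothetical helper that classifies the
# flash RAID layout using the rules above, given a (saved copy of the)
# /proc/mdstat file as its argument.
flash_raid_type() {
  mdstat="$1"
  if grep -q '^md3 : active raid1 ' "$mdstat"; then
    echo "Raid01"    # /dev/md3 is a Raid1 of 2 devices -> section 7A
  elif grep -q '^md0 : active raid5 ' "$mdstat"; then
    echo "Raid05"    # /dev/md0 is a Raid5 of 6 devices -> section 7C
  elif grep -q '^md0 : active raid0 ' "$mdstat"; then
    echo "Raid10"    # /dev/md0 is a Raid0 of 12 mirrors -> section 7B
  else
    echo "unknown"
  fi
}
```

On a live system it would be called as `flash_raid_type /proc/mdstat`; the Raid01 check must come first because a Raid01 layout also contains Raid0 members.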

7A. Raid01 restoration steps.

  1. Looking at the output of the exalytics_CheckFlash.sh script we can see that the system has assigned the same devices to the four flash drives on the replaced card as were mapped to the original card (/dev/sdc /dev/sdd /dev/sde /dev/sdf). This is normal and expected, but be aware that the Operating System may map new/different devices to the flash card. If this happens, you will need to recreate the RAID using the new devices as listed. Compare to the original output from step 1 to confirm whether the devices are the same or have changed.
  2. Check the /proc/mdstat file to see the SW Raid status.
    [root@exalytics0 ~]# cat /proc/mdstat
    Personalities : [raid0] [raid1] 
    md3 : active raid1 md1[0]
          1171873472 blocks [2/1] [U_]
          
    md2 : inactive sdh[4](S) sdk[11](S) sdm[10](S) sdl[9](S) sdn[8](S) sdj[7](S) sdg[6](S) sdi[5](S)
          781249024 blocks
           
    md1 : active raid0 sdq[0] sdz[11] sdy[10] sdw[9] sdx[8] sds[7] sdu[6] sdt[5] sdv[4] sdp[3] sdo[2] sdr[1]
          1171873536 blocks 64k chunks
          
    unused devices: <none>
  3. Since the replaced Flash card contains four flash devices, the SW Raid will now show that the affected Raid0 device (md2 in our example) is missing four sd devices (one for each flash module on the replaced card). This device will also show as inactive, so it is no longer attached to the mirror array /dev/md3. The mirror /dev/md3 will be active, but its listing should end with something similar to [2/1] [U_], showing that only one device of the mirror is attached.
  4. The inactive Raid0 md device will need to be recreated (if your system is missing a different device, adjust your commands to use the md device that is missing). First, stop the md device that will be recreated.
    [root@exalytics0 ~]# mdadm --stop /dev/md2
    mdadm: stopped /dev/md2
  5. Then check the /etc/mdadm.conf file to see which devices made up the md device.    
    [root@exalytics0 ~]# cat /etc/mdadm.conf 
    ARRAY /dev/md1 level=raid0 num-devices=12 metadata=0.90 UUID=9ed80169:d18856c9:0aa0e629:d3f42c85
       devices=/dev/sdq,/dev/sdr,/dev/sdo,/dev/sdp,/dev/sdv,/dev/sdt,/dev/sdu,/dev/sds,/dev/sdx,/dev/sdw,/dev/sdy,/dev/sdz
    ARRAY /dev/md2 level=raid0 num-devices=12 metadata=0.90 UUID=c9fb8de5:7361d6e9:a7bb483d:d62b01e2
       devices=/dev/sdf,/dev/sde,/dev/sdd,/dev/sdc,/dev/sdh,/dev/sdi,/dev/sdg,/dev/sdj,/dev/sdn,/dev/sdl,/dev/sdm,/dev/sdk
    ARRAY /dev/md3 level=raid1 num-devices=2 metadata=0.90 UUID=3601f54c:a8714701:92076bd2:aee83fb2
       devices=/dev/md1,/dev/md2
  6. Using the devices listed in the output, we will re-create the raid0 md device using mdadm --create (again, if the output of the exalytics_CheckFlash.sh script showed that the sd device names changed, make sure to use the new devices listed). You will likely see a warning that some of the devices were previously part of another raid device; if so, reply 'y' when asked if you want to continue creating the array.
    [root@exalytics0 ~]# mdadm /dev/md2 --create --raid-devices=12 --level=0 /dev/sdf /dev/sde /dev/sdd /dev/sdc /dev/sdh /dev/sdi /dev/sdg /dev/sdj /dev/sdn /dev/sdl /dev/sdm /dev/sdk
    mdadm: /dev/sdh appears to be part of a raid array:
        level=raid0 devices=12 ctime=Thu Mar  5 16:25:05 2015
    mdadm: /dev/sdi appears to be part of a raid array:
        level=raid0 devices=12 ctime=Thu Mar  5 16:25:05 2015
    mdadm: /dev/sdg appears to be part of a raid array:
        level=raid0 devices=12 ctime=Thu Mar  5 16:25:05 2015
    mdadm: /dev/sdj appears to be part of a raid array:
        level=raid0 devices=12 ctime=Thu Mar  5 16:25:05 2015
    mdadm: /dev/sdn appears to be part of a raid array:
        level=raid0 devices=12 ctime=Thu Mar  5 16:25:05 2015
    mdadm: /dev/sdl appears to be part of a raid array:
        level=raid0 devices=12 ctime=Thu Mar  5 16:25:05 2015
    mdadm: /dev/sdm appears to be part of a raid array:
        level=raid0 devices=12 ctime=Thu Mar  5 16:25:05 2015
    mdadm: /dev/sdk appears to be part of a raid array:
        level=raid0 devices=12 ctime=Thu Mar  5 16:25:05 2015
    Continue creating array? y
    mdadm: array /dev/md2 started.
  7. After the Raid0 device has been created, it needs to be added to the Raid1 device. The Raid1 device should be /dev/md3 and, in our example, /dev/md2 is the new device to be added (adjust the command as needed for your configuration).
    [root@exalytics0 ~]# mdadm /dev/md3 --add /dev/md2
    mdadm: re-added /dev/md2
  8. After adding the device we can check the /proc/mdstat file to confirm that it was added and /dev/md3 is now being rebuilt.     
    [root@exalytics0 ~]# cat /proc/mdstat
    Personalities : [raid0] [raid1] 
    md2 : active raid0 sdk[11] sdm[10] sdl[9] sdn[8] sdj[7] sdg[6] sdi[5] sdh[4] sdc[3] sdd[2] sde[1] sdf[0]
          1171873536 blocks 64k chunks
          
    md3 : active raid1 md2[2] md1[0]
          1171873472 blocks [2/1] [U_]
          [>....................]  recovery =  0.1% (2200064/1171873472) finish=88.6min speed=220006K/sec
          
    md1 : active raid0 sdq[0] sdz[11] sdy[10] sdw[9] sdx[8] sds[7] sdu[6] sdt[5] sdv[4] sdp[3] sdo[2] sdr[1]
          1171873536 blocks 64k chunks
          
    unused devices: <none>
  9. Since the md2 device was newly created it will now have a different UUID from what was previously used by the system so the /etc/mdadm.conf file will need to be re-created. Use mdadm --detail --scan --verbose to recreate the file and then cat the file to check that it was properly created:    
    [root@exalytics0 ~]# mdadm --detail --scan --verbose > /etc/mdadm.conf
    [root@exalytics0 ~]# cat /etc/mdadm.conf 
    ARRAY /dev/md1 level=raid0 num-devices=12 metadata=0.90 UUID=9ed80169:d18856c9:0aa0e629:d3f42c85
       devices=/dev/sdq,/dev/sdr,/dev/sdo,/dev/sdp,/dev/sdv,/dev/sdt,/dev/sdu,/dev/sds,/dev/sdx,/dev/sdw,/dev/sdy,/dev/sdz
    ARRAY /dev/md2 level=raid0 num-devices=12 metadata=0.90 UUID=b1c9a2d9:f3f67cff:378f99d5:8b66747f
       devices=/dev/sdf,/dev/sde,/dev/sdd,/dev/sdc,/dev/sdh,/dev/sdi,/dev/sdg,/dev/sdj,/dev/sdn,/dev/sdl,/dev/sdm,/dev/sdk
    ARRAY /dev/md3 level=raid1 num-devices=2 metadata=0.90 spares=1 UUID=3601f54c:a8714701:92076bd2:aee83fb2
       devices=/dev/md1,/dev/md2
  10. The rebuild time will vary depending on the device sizes and system activity. If the system is actively using the flash then the rebuild time will be extended. After confirming the recovery finished successfully the Raid restoration is complete.
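The remaining rebuild can be followed in /proc/mdstat. As a sketch, the recovery_progress helper below is hypothetical (not part of the Exalytics tooling) and simply pulls the percentage out of a saved copy of the file; the commented commands show how the live rebuild can be watched and waited on:

```shell
# Sketch only: recovery_progress is a hypothetical helper that extracts the
# rebuild percentage from a (saved copy of the) /proc/mdstat file.
recovery_progress() {
  grep -o 'recovery = *[0-9.]*%' "$1"
}

# On the live system the rebuild can be monitored and waited on directly:
#   watch -n 30 cat /proc/mdstat   # shows the recovery percentage and ETA
#   mdadm --wait /dev/md3          # blocks until the resync/recovery is done
```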

7B. Raid10 restoration steps.

  1. Looking at the output of the exalytics_CheckFlash.sh script we can see that the system has assigned the same devices to the four flash drives on the replaced card as were mapped to the original card (/dev/sdc /dev/sdd /dev/sde /dev/sdf). This is normal and expected, but be aware that the Operating System may map new/different devices to the flash card. If this happens, you will need to recreate the RAID using the new devices as listed. Compare to the original output from step 1 to confirm whether the devices are the same or have changed.
  2. Since the replaced Flash card contains four flash devices, the SW Raid will now show four Raid1 mirrors each missing a device (one for each flash module on the replaced card). Check the /proc/mdstat file to see the SW Raid status for the arrays made up by the flash devices. Each md device is listed with the devices it includes. We should see four md devices that have only a single sd device listed; the second line for these devices will end with something similar to [2/1] [_U], showing that only one device of the mirror is attached.
    [root@exalytics0 ~]# cat /proc/mdstat
    Personalities : [raid1] [raid0] 
    md12 : active raid1 sdn[0] sdw[1]
          97656128 blocks [2/2] [UU]
          
    md11 : active raid1 sdk[0] sdz[1]
          97656128 blocks [2/2] [UU]
          
    md10 : active raid1 sdm[0] sdy[1]
          97656128 blocks [2/2] [UU]
          
    md9 : active raid1 sdl[0] sdx[1]
          97656128 blocks [2/2] [UU]
          
    md8 : active raid1 sdi[0] sdt[1]
          97656128 blocks [2/2] [UU]
          
    md7 : active raid1 sdh[0] sds[1]
          97656128 blocks [2/2] [UU]
          
    md6 : active raid1 sdg[0] sdu[1]
          97656128 blocks [2/2] [UU]
          
    md5 : active raid1 sdj[0] sdv[1]
          97656128 blocks [2/2] [UU]
          
    md4 : active raid1 sdr[1]
          97656128 blocks [2/1] [_U]
          
    md3 : active raid1 sdo[1]
          97656128 blocks [2/1] [_U]
          
    md2 : active raid1 sdq[1]
          97656128 blocks [2/1] [_U]
          
    md1 : active raid1 sdp[1]
          97656128 blocks [2/1] [_U]
          
    md0 : active raid0 md1[0] md12[11] md11[10] md10[9] md9[8] md8[7] md7[6] md6[5] md5[4] md4[3] md3[2] md2[1]
          1171872768 blocks 64k chunks
          
    unused devices: <none>
    
  3. For each of the md devices missing a drive we need to add the drive back to the mirror device. In this example devices md1, md2, md3, md4 need to be fixed. Check the /etc/mdadm.conf file to see what the correct configuration should be.  
    [root@exalytics0 ~]# cat /etc/mdadm.conf
    ARRAY /dev/md1 level=raid1 num-devices=2 metadata=0.90 UUID=89faee91:6e526f55:26db5046:da08a63e
       devices=/dev/sdc,/dev/sdp
    ARRAY /dev/md2 level=raid1 num-devices=2 metadata=0.90 UUID=a0281a64:a994b40f:5a4ea80d:91e32d2e
       devices=/dev/sdd,/dev/sdq
    ARRAY /dev/md3 level=raid1 num-devices=2 metadata=0.90 UUID=63f1ff4e:9a61866e:806e498c:5a831c6f
       devices=/dev/sdf,/dev/sdo
    ARRAY /dev/md4 level=raid1 num-devices=2 metadata=0.90 UUID=a661d73a:9e931391:b27c46ff:d688bf32
       devices=/dev/sde,/dev/sdr
    ARRAY /dev/md5 level=raid1 num-devices=2 metadata=0.90 UUID=4b322f2e:2bff469b:21598d07:c5615470
       devices=/dev/sdj,/dev/sdv
    ARRAY /dev/md6 level=raid1 num-devices=2 metadata=0.90 UUID=c520a9d7:e999463b:5e80ef1b:f264f0be
       devices=/dev/sdg,/dev/sdu
    ARRAY /dev/md7 level=raid1 num-devices=2 metadata=0.90 UUID=e76868ea:05739f1e:e8cf3d6f:04458164
       devices=/dev/sdh,/dev/sds
    ARRAY /dev/md8 level=raid1 num-devices=2 metadata=0.90 UUID=addff802:9e1224a8:ac162f13:fdbfb9ea
       devices=/dev/sdi,/dev/sdt
    ARRAY /dev/md9 level=raid1 num-devices=2 metadata=0.90 UUID=122a324a:db47c2cc:7ed15282:8852295f
       devices=/dev/sdl,/dev/sdx
    ARRAY /dev/md10 level=raid1 num-devices=2 metadata=0.90 UUID=a4fcb5ad:eaeaee79:e915db55:f7b8fad5
       devices=/dev/sdm,/dev/sdy
    ARRAY /dev/md11 level=raid1 num-devices=2 metadata=0.90 UUID=9bd49315:193992a3:b9f33e0d:b8ce8dfc
       devices=/dev/sdk,/dev/sdz
    ARRAY /dev/md12 level=raid1 num-devices=2 metadata=0.90 UUID=8ffd1fdd:e74dea56:7262b350:208b376c
       devices=/dev/sdn,/dev/sdw
    ARRAY /dev/md0 level=raid0 num-devices=12 metadata=0.90 UUID=7dcd48a9:d3caab2b:9b18df73:a229b9a4
       devices=/dev/md1,/dev/md2,/dev/md3,/dev/md4,/dev/md5,/dev/md6,/dev/md7,/dev/md8,/dev/md9,/dev/md10,/dev/md11,/dev/md12
    
  4. In our example we match the four degraded md devices to the sd devices they should contain, by comparing the mdstat output to the mdadm.conf file (what each md device should have vs. what it currently has). Here we can see that we need to add sdc to md1, sdd to md2, sdf to md3 and sde to md4, because these are the devices missing their second disk. We need our four md devices to contain their proper sd devices:

    /dev/md1 - /dev/sdc, /dev/sdp
    /dev/md2 - /dev/sdd, /dev/sdq
    /dev/md3 - /dev/sdf, /dev/sdo
    /dev/md4 - /dev/sde, /dev/sdr
  5. Use the mdadm --add command to add the replaced device to each of the four md devices missing their drives:  
    [root@exalytics0 ~]# mdadm /dev/md1 --add /dev/sdc
    mdadm: added /dev/sdc
    [root@exalytics0 ~]# mdadm /dev/md2 --add /dev/sdd
    mdadm: added /dev/sdd
    [root@exalytics0 ~]# mdadm /dev/md3 --add /dev/sdf
    mdadm: added /dev/sdf
    [root@exalytics0 ~]# mdadm /dev/md4 --add /dev/sde
    mdadm: added /dev/sde
    
  6. After adding the devices we can check the /proc/mdstat file to confirm that they were added and are now being rebuilt.  
    [root@exalytics0 ~]# cat /proc/mdstat
    Personalities : [raid1] [raid0]
    md12 : active raid1 sdn[0] sdw[1]
          97656128 blocks [2/2] [UU]
          
    md11 : active raid1 sdk[0] sdz[1]
          97656128 blocks [2/2] [UU]
          
    md10 : active raid1 sdm[0] sdy[1]
          97656128 blocks [2/2] [UU]
          
    md9 : active raid1 sdl[0] sdx[1]
          97656128 blocks [2/2] [UU]
          
    md8 : active raid1 sdi[0] sdt[1]
          97656128 blocks [2/2] [UU]
          
    md7 : active raid1 sdh[0] sds[1]
          97656128 blocks [2/2] [UU]
          
    md6 : active raid1 sdg[0] sdu[1]
          97656128 blocks [2/2] [UU]
          
    md5 : active raid1 sdj[0] sdv[1]
          97656128 blocks [2/2] [UU]
          
    md4 : active raid1 sde[2] sdr[1]
          97656128 blocks [2/1] [_U]
          [===>.................]  recovery = 17.6% (17241472/97656128) finish=6.4min speed=206060K/sec
          
    md3 : active raid1 sdf[2] sdo[1]
          97656128 blocks [2/1] [_U]
          [====>................]  recovery = 20.0% (19615872/97656128) finish=6.3min speed=204086K/sec
          
    md2 : active raid1 sdd[2] sdq[1]
          97656128 blocks [2/1] [_U]
          [====>................]  recovery = 22.8% (22334848/97656128) finish=6.0min speed=206060K/sec
          
    md1 : active raid1 sdc[2] sdp[1]
          97656128 blocks [2/1] [_U]
          [=====>...............]  recovery = 28.2% (27618432/97656128) finish=5.6min speed=206250K/sec
          
    md0 : active raid0 md1[0] md12[11] md11[10] md10[9] md9[8] md8[7] md7[6] md6[5] md5[4] md4[3] md3[2] md2[1]
          1171872768 blocks 64k chunks
          
    unused devices: <none>
  7. The rebuild time will vary depending on the device sizes and system activity. If the system is actively using the flash then the rebuild time will be extended. After confirming that the recovery for each device finished successfully the Raid restoration is complete.
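The "which device is missing from which mirror" bookkeeping in the steps above can be sketched as a tiny helper. missing_mirror_member is hypothetical, not part of the Exalytics tooling:

```shell
# Sketch only: missing_mirror_member is a hypothetical helper. Given the
# expected member list for a mirror (the devices= line from /etc/mdadm.conf)
# and the device still attached (from /proc/mdstat), it prints the member
# that needs to be re-added with "mdadm --add".
missing_mirror_member() {
  expected="$1"   # e.g. "/dev/sdc,/dev/sdp" from /etc/mdadm.conf
  attached="$2"   # e.g. "/dev/sdp" from /proc/mdstat
  echo "$expected" | tr ',' '\n' | grep -Fvx "$attached"
}

# Example for /dev/md1 from the output above:
missing_mirror_member "/dev/sdc,/dev/sdp" "/dev/sdp"   # prints "/dev/sdc"
# mdadm /dev/md1 --add /dev/sdc
```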

7C. Raid05 restoration steps.

  1. Looking at the output of the exalytics_CheckFlash.sh script we can see that the system has assigned the same devices to the four flash drives on the replaced card as were mapped to the original card (/dev/sdc /dev/sdd /dev/sde /dev/sdf). This is normal and expected, but be aware that the Operating System may map new/different devices to the flash card. If this happens, you will need to recreate the RAID using the new devices as listed. Compare to the original output from step 1 to confirm whether the devices are the same or have changed.
  2. Check the /proc/mdstat file to see the SW Raid status.  
    [root@exalytics0 ~]# cat /proc/mdstat 
    Personalities : [raid0] [raid6] [raid5] [raid4] 
    md6 : active raid0 sdy[0] sdz[3] sdx[2] sdw[1]
          390624512 blocks 64k chunks
          
    md5 : active raid0 sdu[0] sdv[3] sds[2] sdt[1]
          390624512 blocks 64k chunks
          
    md4 : active raid0 sdp[0] sdo[3] sdq[2] sdr[1]
          390624512 blocks 64k chunks
          
    md3 : active raid0 sdm[0] sdk[3] sdn[2] sdl[1]
          390624512 blocks 64k chunks
          
    md2 : active raid0 sdg[0] sdj[3] sdh[2] sdi[1]
          390624512 blocks 64k chunks
          
    md0 : active raid5 md2[1] md6[5] md5[4] md4[3] md3[2]
          1953122240 blocks level 5, 64k chunk, algorithm 2 [6/5] [_UUUUU]
          
    unused devices: <none>
    
     
  3. Since the replaced Flash card contained all four flash disks that made up one of the Raid0 devices, the Raid5 device (/dev/md0) will show that one of its six devices is now missing. /dev/md0 should show something similar to [6/5] [_UUUUU] at the end of its output, indicating that only 5 of the 6 devices are attached. md0 should be made up of md1, md2, md3, md4, md5 and md6, but in our example we are missing md1, so md1 will need to be re-created (if your system is missing a different device, adjust your commands to use the md device that is missing). Using the devices listed in the exalytics_CheckFlash.sh output as discussed in step 1 above, we will re-create the raid0 md device using mdadm --create. (If the flash card being used for replacement was previously set up in a raid configuration you may see a warning that the device was part of another raid device; if so, reply 'y' when asked if you want to continue creating the array.)
    [root@exalytics0 ~]# mdadm /dev/md1 --create --raid-devices=4 --level=0 /dev/sdc /dev/sdd /dev/sde /dev/sdf
    mdadm: array /dev/md1 started.
    
  4. After the Raid0 device has been created, it needs to be added to the Raid5 device. The Raid5 device should be /dev/md0 and, in our example, /dev/md1 is the new device to be added (adjust the command as needed for your configuration).
    [root@exalytics0 ~]# mdadm /dev/md0 --add /dev/md1
    mdadm: added /dev/md1
     
  5. After adding the device we can check the /proc/mdstat file to confirm that it was added and /dev/md0 is now being rebuilt.    
    [root@exalytics0 ~]# cat /proc/mdstat
    Personalities : [raid0] [raid6] [raid5] [raid4] 
    md1 : active raid0 sdf[3] sde[2] sdd[1] sdc[0]
          390624512 blocks 64k chunks
          
    md6 : active raid0 sdy[0] sdz[3] sdx[2] sdw[1]
          390624512 blocks 64k chunks
          
    md5 : active raid0 sdu[0] sdv[3] sds[2] sdt[1]
          390624512 blocks 64k chunks
          
    md4 : active raid0 sdp[0] sdo[3] sdq[2] sdr[1]
          390624512 blocks 64k chunks
          
    md3 : active raid0 sdm[0] sdk[3] sdn[2] sdl[1]
          390624512 blocks 64k chunks
          
    md2 : active raid0 sdg[0] sdj[3] sdh[2] sdi[1]
          390624512 blocks 64k chunks
          
    md0 : active raid5 md1[6] md2[1] md6[5] md5[4] md4[3] md3[2]
          1953122240 blocks level 5, 64k chunk, algorithm 2 [6/5] [_UUUUU]
          [>....................]  recovery =  0.2% (982644/390624448) finish=46.2min speed=140377K/sec
          
    unused devices: <none>
    
     
  6. Since the md1 device was newly created it will now have a different UUID from what was previously used by the system so the /etc/mdadm.conf file will need to be re-created. Use mdadm --detail --scan --verbose to recreate the file and then cat the file to check that it was properly created:  
    [root@exalytics0 ~]# mdadm --detail --scan --verbose > /etc/mdadm.conf
    [root@exalytics0 ~]# cat /etc/mdadm.conf
    ARRAY /dev/md2 level=raid0 num-devices=4 metadata=0.90 UUID=438c339d:73a3b0b9:1327795f:7d646e9a
       devices=/dev/sdg,/dev/sdi,/dev/sdh,/dev/sdj
    ARRAY /dev/md3 level=raid0 num-devices=4 metadata=0.90 UUID=b283e2c7:6ebb785c:2491a388:7d818402
       devices=/dev/sdm,/dev/sdl,/dev/sdn,/dev/sdk
    ARRAY /dev/md4 level=raid0 num-devices=4 metadata=0.90 UUID=1df3cb6d:9e704f02:5c46d821:1d7a18ce
       devices=/dev/sdp,/dev/sdr,/dev/sdq,/dev/sdo
    ARRAY /dev/md5 level=raid0 num-devices=4 metadata=0.90 UUID=83146227:c25f757c:6a524238:a131fa8a
       devices=/dev/sdu,/dev/sdt,/dev/sds,/dev/sdv
    ARRAY /dev/md6 level=raid0 num-devices=4 metadata=0.90 UUID=372b91b2:e646c239:a5de6d0b:1a536762
       devices=/dev/sdy,/dev/sdw,/dev/sdx,/dev/sdz
    ARRAY /dev/md1 level=raid0 num-devices=4 metadata=0.90 UUID=c0c59206:4d0e8e4a:b90a8500:a1fdacac
       devices=/dev/sdc,/dev/sdd,/dev/sde,/dev/sdf
    ARRAY /dev/md0 level=raid5 num-devices=6 metadata=0.90 spares=1 UUID=8779e6d9:6d027e33:62be3a70:25f1cf5c
       devices=/dev/md1,/dev/md2,/dev/md3,/dev/md4,/dev/md5,/dev/md6
    
      
  7. The rebuild time will vary depending on the device sizes and system activity. If the system is actively using the flash then the rebuild time will be extended. After confirming the recovery for each device finished successfully the Raid restoration is complete. 
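As a final sanity check on the regenerated /etc/mdadm.conf, the number of ARRAY entries can be counted. count_arrays is a hypothetical helper, not part of the Exalytics tooling; for the Raid05 layout above the file should list seven arrays (six Raid0 sets plus the Raid5 /dev/md0):

```shell
# Sketch only: count_arrays is a hypothetical sanity check that counts the
# ARRAY lines in a (saved copy of the) /etc/mdadm.conf file. For the Raid05
# layout above there should be 7 entries.
count_arrays() {
  grep -c '^ARRAY ' "$1"
}
```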

 

OBTAIN CUSTOMER ACCEPTANCE

WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:

Boot up system and verify full functionality

REFERENCE INFORMATION:

Oracle Exalytics In-Memory Machine Documentation Library

http://docs.oracle.com/cd/E41246_01/index.htm

Sun Server X2-4 Documentation

http://docs.oracle.com/cd/E20781_01/index.html


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.