Asset ID: 1-71-2103181.1
Update Date: 2017-05-01
Keywords:
Solution Type: Technical Instruction
Solution: 2103181.1
How to remove and replace the Disk Cage on Exadata X3-2, X4-2, and X5-2 Storage Server to support 8TB disk upgrades.
Related Items:
- Exadata X3-2 Hardware
- Exadata X4-2 Hardware
- Exadata X5-8 Hardware
- Exadata X5-2 Full Rack
- Exadata X5-2 Eighth Rack
- Exadata X4-2 Quarter Rack
- Exadata X5-2 Hardware
- Exadata X3-2 Half Rack
- Exadata X3-2 Full Rack
- Exadata X5-2 Quarter Rack
- Exadata X4-8 Hardware
- Exadata X4-2 Half Rack
- Exadata X3-8 Hardware
- Exadata X5-2 Half Rack
- Exadata X3-2 Eighth Rack
- Exadata X4-2 Full Rack
- Exadata X4-2 Eighth Rack
- Exadata X3-8b Hardware
- Exadata X3-2 Quarter Rack
Related Categories:
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP
Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Disk cage replacement is a FRU procedure.
Applies to:
Exadata X5-2 Half Rack - Version All Versions and later
Exadata X4-2 Hardware - Version All Versions and later
Exadata X3-2 Quarter Rack - Version All Versions and later
Exadata X3-2 Eighth Rack - Version All Versions and later
Exadata X4-2 Full Rack - Version All Versions and later
Information in this document applies to any platform.
Goal
How to remove and replace the Disk Cage on Exadata X3-2, X4-2, and X5-2 Storage Server to support 8TB disk upgrades.
Solution
CAP PROBLEM OVERVIEW: Disk cage upgrade procedure.
DISPATCH INSTRUCTIONS
WHAT SKILLS DOES THE ENGINEER NEED:
Oracle Exadata X3-2, X4-2, and X5-2 Storage Server Training.
TIME ESTIMATE: 60 minutes; up to 90 minutes for X3/X4.
TASK COMPLEXITY: 2-FRU
FIELD ENGINEER INSTRUCTIONS:
PROBLEM OVERVIEW:
To support an 8TB disk drive upgrade, an Exadata X3-2, X4-2, or X5-2 Storage CELL must have its disk cage replaced to accommodate the physically larger 8TB drives. Any X5-2 Storage CELL shipped with 8TB drives from the factory already has the correct disk cage present. Only X5-2 Storage CELLs shipped with 4TB or smaller disk drives require the upgrade; all X3-2 and X4-2 Storage CELLs require the upgraded disk cage.
NOTE: Without the disk cage upgrade, inserting an 8TB disk drive will cause physical damage to the drives in slots 8, 9, 10, and 11, voiding the disk vendor HW warranty.
NOTE: If the 8TB disk upgrade and disk cage upgrade are being done on a rolling basis, ACS requires that the Storage Cells already have image 12.1.2.1.2 or later installed, to support the new disks after installation. The customer is responsible for patching; details of image patches are available in MOS Note 888828.1. Alternatively, the customer can purchase the ACS patching service separately, in advance of the disk and cage upgrade.
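For reference, the image version currently installed on a cell can be checked as root with the imageinfo utility (the version string shown below is illustrative):
# imageinfo | grep 'Active image version'
Active image version: 12.1.2.1.2.150617.1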
Ensure the customer has the correct Upgrade Kit for the type of Storage CELL:
High Capacity Upgrade Kit - Twelve 8TB HDDs plus disk cage for X5 racks (Marketing #7113574)
High Capacity Upgrade Kit - Twelve 8TB HDDs plus disk cage for X3 and X4 racks (Marketing #7113575)
WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:
The Storage Cell containing the disk cage must be powered off prior to cage replacement.
It is expected that the customer's DBA has completed these steps before the engineer arrives to replace the cage. The following commands are provided as guidance in case the customer needs assistance checking the status of the system prior to replacement. If the customer or the FSE requires more assistance before the physical replacement of the cage, EEST/TSC should be contacted.
1. Locate the server being serviced in the rack. The cell server within the rack can usually be determined from the hostname and the known default Exadata server numbering scheme. Exadata Storage Servers are identified by a number 1 through 18, where 1 is the lowest Storage Server in the rack, installed in RU2, counting up to the top of the rack.
Turn on the locate indicator light for easier identification of the server being repaired. If the server has been identified, the Locate button on the front panel may be pressed. To turn the indicator on remotely, use any of the following methods:
From a login to the CellCli on Exadata Storage Servers:
CellCli> alter cell led on
From a login to the server’s ILOM:
-> set /SYS/LOCATE value=Fast_Blink
Set 'value' to 'Fast_Blink'
From a login to the server’s ‘root’ account:
# ipmitool sunoem cli 'set /SYS/LOCATE value=Fast_Blink'
Connected. Use ^D to exit.
-> set /SYS/LOCATE value=Fast_Blink
Set 'value' to 'Fast_Blink'
-> Session closed
Disconnected
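When the repair is complete, the locate indicator can be turned back off in the same ways; for example:
CellCli> alter cell led off
or, from the ILOM:
-> set /SYS/LOCATE value=Off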
2. Shut down the node for which the disk cage requires replacement.
a) For extended information on this section, see MOS Note 1188080.1: Steps to shut down or reboot an Exadata storage cell without affecting ASM.
This is also documented in the Exadata Maintenance Guide section titled "Maintaining Exadata Storage Servers" subsection "Shutting Down Exadata Storage Server" available on the customer's cell server image in the /opt/oracle/cell/doc directory or at https://docs.oracle.com/cd/E50790_01/doc/doc.121/e51951/storage.htm#DBMMN21129.
In the following examples, the SQL commands should be run by the customer's DBA prior to the hardware replacement. The field engineer should run them only if the customer directs them to, or is unable to run them. The cellcli commands will need to be run as root.
Note the following when powering off Exadata Storage Servers:
- Verify there are no other storage servers with disk faults. Shutting down a storage server while a disk in another server is failed may cause running database processes and Oracle ASM to crash, if both disks in a partner pair are lost when this server's disks go offline.
- Powering off one Exadata Storage Server with no disk faults in the rest of the rack will not affect running database processes or Oracle ASM.
b) ASM drops a disk shortly after it is taken offline. Powering off or restarting Exadata Storage Servers can impact database performance if the storage server is offline for longer than the ASM disk repair timer. The default DISK_REPAIR_TIME attribute value of 3.6 hours may not be adequate if a replacement component breaks; we recommend changing it to 24 hours, to allow time to order and receive replacement parts. To check this parameter, have the customer log into ASM and perform the following query:
SQL> select dg.name,a.value from v$asm_attribute a, v$asm_diskgroup dg where a.name = 'disk_repair_time' and a.group_number = dg.group_number;
As long as the value is large enough to comfortably replace the components being replaced, then there is no need to change it.
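Should the customer's DBA choose to raise the timer, the attribute is set per disk group; for example (the disk group name DATA here is illustrative):
SQL> ALTER DISKGROUP DATA SET ATTRIBUTE 'disk_repair_time' = '24h';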
c) Check if ASM will be OK if the grid disks go OFFLINE.
# cellcli -e list griddisk attributes name,asmmodestatus,asmdeactivationoutcome
...sample ...
DBFS_DG_FD_06_exdx5_tvp_a_cel3 ONLINE Yes
DBFS_DG_FD_07_exdx5_tvp_a_cel3 ONLINE Yes
RECOC1_FD_00_exdx5_tvp_a_cel3 ONLINE Yes
RECOC1_FD_01_exdx5_tvp_a_cel3 ONLINE Yes
RECOC1_FD_02_exdx5_tvp_a_cel3 ONLINE Yes
RECOC1_FD_03_exdx5_tvp_a_cel3 ONLINE Yes
...repeated for all griddisks....
If one or more disks return asmdeactivationoutcome='No', then wait for some time and repeat this command. Once all disks return asmdeactivationoutcome='Yes', proceed to the next step.
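If preferred, the check can be repeated automatically until no grid disk reports 'No' (an illustrative shell loop, run as root on the cell):
# while cellcli -e "list griddisk attributes name,asmdeactivationoutcome" | grep -qw No; do sleep 60; done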
d) Run the cellcli command below to inactivate all grid disks on the cell that needs to be powered down for maintenance. (This could take 10 minutes or longer.)
# cellcli
CellCLI> ALTER GRIDDISK ALL INACTIVE
...sample ...
GridDisk DBFS_DG_FD_06_exdx5_tvp_a_cel3 successfully altered
GridDisk DBFS_DG_FD_07_exdx5_tvp_a_cel3 successfully altered
GridDisk RECOC1_FD_00_exdx5_tvp_a_cel3 successfully altered
GridDisk RECOC1_FD_01_exdx5_tvp_a_cel3 successfully altered
GridDisk RECOC1_FD_02_exdx5_tvp_a_cel3 successfully altered
GridDisk RECOC1_FD_03_exdx5_tvp_a_cel3 successfully altered
...repeated for all griddisks...
e) Execute the command below; once the disks are offline and inactive in ASM, the output should show asmmodestatus='UNUSED' or 'OFFLINE' and asmdeactivationoutcome='Yes' for all grid disks.
CellCLI> list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
...sample...
DBFS_DG_FD_06_exdx5_tvp_a_cel3 inactive OFFLINE Yes
DBFS_DG_FD_07_exdx5_tvp_a_cel3 inactive OFFLINE Yes
RECOC1_FD_00_exdx5_tvp_a_cel3 inactive OFFLINE Yes
RECOC1_FD_01_exdx5_tvp_a_cel3 inactive OFFLINE Yes
RECOC1_FD_02_exdx5_tvp_a_cel3 inactive OFFLINE Yes
RECOC1_FD_03_exdx5_tvp_a_cel3 inactive OFFLINE Yes
...repeated for all griddisks...
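A quick way to spot any grid disk that is not yet offline (illustrative; an empty result means all grid disks are ready):
# cellcli -e "list griddisk attributes name,asmmodestatus" | grep -vwE 'OFFLINE|UNUSED'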
f) Once all disks are offline and inactive, the customer may shut down the cell using the following command:
# shutdown -hP now
When powering off Exadata Storage Servers, all storage services are automatically stopped.
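If desired, the power state can then be confirmed remotely through the ILOM using standard IPMI (the hostname below is a placeholder):
# ipmitool -I lanplus -H <cell-ilom-hostname> -U root chassis power status
Chassis Power is off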
WHAT ACTION DOES THE ENGINEER NEED TO TAKE:
Remove the existing Disk Cage:
1. Prepare the server for service.
- Power off the server and disconnect the power cords from the power supplies.
- Extend the server into the maintenance position.
- Attach an antistatic wrist strap to your wrist, and then to a metal area on the chassis.
- Remove the server top cover.
2. Remove all the disk drives from the front of the server.
NOTE: If the actual disks are not being upgraded at the same time as the disk cage, ensure that you mark the disk drive slot locations when removing the drives from the disk cage, as the current disk drives must be reinserted into the correct slots after disk cage replacement.
3. Remove the disk cage mounting screws to loosen the cage, making it easier to remove the cables. Depending on your server type, do one of the following:
- If your server is X5-2L, perform the following procedures:
- Remove the server fan assembly door.
- Remove the two screws on each side of the chassis and the three screws on top of the chassis.
- Slide the fan assembly door toward the rear of the server, and lift and remove the door from the chassis.
- Remove the four remaining No. 2 Phillips screws (two on each side of the chassis) that secure the disk cage assembly to the server chassis.
- If your server is X4-2L or X3-2L, perform the following procedures:
- Open the top cover fan assembly door.
- Remove the six No. 2 Phillips screws (three on each side of the chassis) that secure the disk cage assembly to the server chassis.
4. Remove the fan modules from the server.
5. Disconnect all cables from the front disk backplane.
- If your server is X4-2L or X3-2L, perform the following procedures:
- Remove the mid-wall from the chassis.
- Using a No. 2 Phillips screwdriver, remove the two screws on each side of the chassis that secure the mid-wall to the chassis.
- Using a No. 2 Phillips screwdriver, loosen the four green captive screws that secure the mid-wall to the bottom of the chassis.
- Lift the mid-wall out of the chassis.
- This allows for easier access to the cables.
- Use caution when disconnecting the signal cable between the disk backplane and motherboard; the cable is quite fragile. If upgrading more than 3 CELLs, you may want to have a spare cable on hand when performing the procedure.
6. Disconnect the left LED indicator module cable and the right LED/USB indicator module cable from the motherboard.
7. Slide the disk cage assembly forward, and then gently lift the disk cage assembly from the chassis. Place the disk cage on an antistatic mat.
8. Remove the disk backplane from the disk cage. Depending on your server type, do one of the following:
- If your server is X5-2L, perform the following procedures:
- Using a No. 2 Phillips screwdriver, loosen the right-side spring-mounted screw that secures the storage drive backplane to the disk cage.
- Lift the storage drive backplane up to release it from the standoff hooks and off of the disk cage.
- Place the storage drive backplane on an antistatic mat until it's ready to be installed in the new disk cage.
- If your server is X4-2L or X3-2L, perform the following procedures:
- Using a No. 2 Phillips screwdriver, loosen the two screws that secure the storage drive backplane to the disk cage.
- Slide the backplane toward the front of the cage to release it from the three mushroom-shaped standoffs, and lift it off of the disk cage.
- Place the storage drive backplane on an antistatic mat until it's ready to be installed in the new disk cage.
- Repeat steps 1 through 3 to remove the 2nd and 3rd drive backplanes. For the 2nd and 3rd disk backplanes you may need to use a wrench (or multi-tool) to loosen the two screws that secure the storage drive backplane to the disk cage, as the mounting bracket of the 1st disk backplane makes access to the screws of the 2nd disk backplane difficult.
Install the new disk cage:
1. Before installing the new cage, you must transfer the serial number identity from the old cage to the new cage. Carefully peel the serial number labels from the RFID tag and from the top left front corner of the old disk cage face; the top label can be seen when looking down over the top. Once the labels are removed, carefully re-affix them to the same locations on the new disk cage.
2. Install the disk backplane from the old disk cage into the new disk cage. Depending on your server type, do one of the following:
- If your server is X5-2L, perform the following procedures:
- Lower the storage drive backplane onto the disk cage, and position it to engage the standoff hooks.
- Using a No. 2 Phillips screwdriver, install and tighten the right-side spring-mounted screw to secure the storage drive backplane to the disk cage.
- If your server is X4-2L or X3-2L, perform the following procedures:
- Lower the storage drive backplane onto the disk cage, and position it to engage the three mushroom standoffs.
- Using a No. 2 Phillips screwdriver, install and tighten the two screws to secure the storage drive backplane to the disk cage.
- Repeat steps 1 and 2 to install the 2nd and 3rd drive backplanes. For the 2nd and 3rd disk backplanes you may need to use a wrench (or multi-tool) to tighten the two screws that secure the storage drive backplane to the disk cage, as the mounting bracket of the 1st disk backplane makes access to the screws of the 2nd disk backplane difficult.
3. Gently lift the disk cage assembly and set it into the server chassis.
Carefully push the disk cage assembly into the server chassis, ensuring that the disk cage screw holes align correctly with the server chassis and that the mushroom rivets and the square teeth on the bottom front edge of the chassis interlock with the disk cage. Be careful not to pinch the left and right LED cables when inserting the cage. Do not secure the cage screws until step 6, to make it easier to connect the cables.
4. Reconnect the left LED indicator module cable and the right LED/USB indicator module cable to the motherboard.
5. Reconnect all cables to the front disk backplane.
- If your server is X4-2L or X3-2L, perform the following procedures:
- Install the mid-wall into the chassis.
- Lift and place the mid-wall into the chassis.
- Using a No. 2 Phillips screwdriver, tighten the four green captive screws that secure the mid-wall to the bottom of the chassis.
- Using a No. 2 Phillips screwdriver, insert and tighten the two screws on each side of the chassis that secure the mid-wall to the chassis.
- Use caution when reconnecting the signal cable between the disk backplane and motherboard; the cable is quite fragile. If upgrading more than 3 CELLs, you may want to have a spare on hand when performing the procedure.
6. Secure the cage to the chassis. Depending on your server type, do one of the following:
- If your server is X5-2L, perform the following procedures:
- Install the four No. 2 Phillips screws (two on each side of the chassis) that secure the disk cage assembly to the server chassis.
- Install the server fan assembly door.
- Place the fan assembly door on the chassis and slightly over the fan assembly.
- Slide the fan assembly door forward and under the lip of the forward top cover until it latches into place.
- Use a No. 2 Phillips screwdriver to install and secure the fan assembly door.
- Install and tighten the two screws on each side of the chassis and the three screws on top of the chassis.
- If your server is X4-2L or X3-2L, perform the following procedures:
- Install the six No. 2 Phillips screws (three on each side of the chassis) that secure the disk cage assembly to the server chassis.
7. Install the fan modules in the server, and close the top cover fan assembly door.
8. Either re-install all the original HDDs back into the server, or upgrade to the new 8TB HDDs.
9. Return the server to operation:
- Install the server top cover.
- Reconnect the power cords to the power supplies, and power on the server.
- Once the ILOM has booted, the server's green OK LED will blink slowly. Power on the server by pressing the power button on the front of the unit.
Server Services Startup Validation:
NOTE: If you installed the new 8TB disks now, then your involvement ends here. ACS should take over to finish validation, to configure the servers and bring the new disks online. See Oracle Exadata Database Machine Disk Swap Service Process (Doc ID 1544637.1), for further details.
If the customer plans to install the 8TB disks later rather than at the same time as the disk cage, and the original disks were re-installed into the server in the correct slots, then continue with the following validation steps.
Optional step for new 8TB drives ONLY.
There is no requirement to run the disktests (located at /opt/oracle.cellos/validations/init.d/disktests) on the new 8TB drives during this procedure, due to the length of time the tests take to run.
If time allows, the disktests can be run on the new 8TB drives before they have been configured with celldisks/griddisks, but the cells would require an operating system installed on them for the tests to run; this step would be performed by an ACS engineer.
If the tests are started they can be stopped as follows:
1. # killall orion
2. Kill the disktest shell process.
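For example (illustrative; the PID is a placeholder, taken from the ps output):
# ps -ef | grep -i [d]isktest
# kill <PID>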
Disktests can take a minimum of 25 hours to complete on 8TB drives. Be aware that disktest is destructive: it overwrites the data areas on the disk (though not the OS image area), so it should only be run on newly installed disks, and never on already installed disks that may contain customer data. Drives that have celldisks and griddisks configured would lose data.
Therefore DO NOT run disktests on the original customer drives.
1. As the system boots, the hardware/firmware profile will be checked, and either a green "Passed" will be displayed, or a red "Warning" if the firmware on any component differs from what the image expects. If the check passes, the firmware is correct and the boot will continue to the OS login prompt. If the check fails, the firmware will be updated automatically and a subsequent reboot will occur. You should not see this unless components replaced previously were not updated.
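After the login prompt appears, the image status can also be confirmed as root (output shown is illustrative):
# imageinfo | grep 'Active image status'
Active image status: success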
2. After the OS is up, login as root and verify all the expected disk devices are present:
The following command should show 12 disks:
# lsscsi | grep -i LSI
[0:2:0:0] disk LSI MR9361-8i 4.23 /dev/sda
[0:2:1:0] disk LSI MR9361-8i 4.23 /dev/sdb
[0:2:2:0] disk LSI MR9361-8i 4.23 /dev/sdc
[0:2:3:0] disk LSI MR9361-8i 4.23 /dev/sdd
[0:2:4:0] disk LSI MR9361-8i 4.23 /dev/sde
[0:2:5:0] disk LSI MR9361-8i 4.23 /dev/sdf
[0:2:6:0] disk LSI MR9361-8i 4.23 /dev/sdg
[0:2:7:0] disk LSI MR9361-8i 4.23 /dev/sdh
[0:2:8:0] disk LSI MR9361-8i 4.23 /dev/sdi
[0:2:9:0] disk LSI MR9361-8i 4.23 /dev/sdj
[0:2:10:0] disk LSI MR9361-8i 4.23 /dev/sdk
[0:2:11:0] disk LSI MR9361-8i 4.23 /dev/sdl
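A quick count of the devices (illustrative; expect 12):
# lsscsi | grep -ic LSI
12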
If the device count is not correct, also check that the LSI controller has the correct Virtual Drives configured and in Optimal state, physically Online and spun up, with no Foreign configuration. There should be Virtual Drives 0 to 11, and each of the physical slots 0 to 11 should be allocated to exactly one Virtual Drive (not necessarily the same 0:0, 1:1, etc. mapping).
# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep "Virtual Drive\|State\|Slot\|Firmware state"
Virtual Drive: 0 (Target Id: 0)
State : Optimal
Slot Number: 0
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 1 (Target Id: 1)
State : Optimal
Slot Number: 1
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 2 (Target Id: 2)
State : Optimal
Slot Number: 2
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 3 (Target Id: 3)
State : Optimal
Slot Number: 3
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 4 (Target Id: 4)
State : Optimal
Slot Number: 4
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 5 (Target Id: 5)
State : Optimal
Slot Number: 5
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 6 (Target Id: 6)
State : Optimal
Slot Number: 6
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 7 (Target Id: 7)
State : Optimal
Slot Number: 7
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 8 (Target Id: 8)
State : Optimal
Slot Number: 8
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 9 (Target Id: 9)
State : Optimal
Slot Number: 9
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 10 (Target Id: 10)
State : Optimal
Slot Number: 10
Firmware state: Online, Spun Up
Foreign State: None
Virtual Drive: 11 (Target Id: 11)
State : Optimal
Slot Number: 11
Firmware state: Online, Spun Up
Foreign State: None
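A quick count of the configured Virtual Drives (illustrative; expect 12):
# /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -a0 | grep -c "Virtual Drive"
12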
If this is not correct, then there is a problem with the disk volumes that may need additional assistance to correct. The server should be re-opened and the device connections and boards checked to be sure they are secure and well seated BEFORE the following CellCLI commands are issued.
3. Once the hardware is verified as up and running, the Customer's DBA will need to activate the grid disks:
# cellcli
CellCLI> alter griddisk all active
GridDisk CATALOG_CD_09_zdlx5_tvp_a_cel3 successfully altered
GridDisk CATALOG_CD_10_zdlx5_tvp_a_cel3 successfully altered
GridDisk CATALOG_CD_11_zdlx5_tvp_a_cel3 successfully altered
GridDisk DELTA_CD_00_zdlx5_tvp_a_cel3 successfully altered
GridDisk DELTA_CD_01_zdlx5_tvp_a_cel3 successfully altered
GridDisk DELTA_CD_02_zdlx5_tvp_a_cel3 successfully altered
...repeated for all griddisks...
Issue the command below and all disks should show 'active':
CellCLI> list griddisk
CATALOG_CD_09_zdlx5_tvp_a_cel3 active
CATALOG_CD_10_zdlx5_tvp_a_cel3 active
CATALOG_CD_11_zdlx5_tvp_a_cel3 active
DELTA_CD_00_zdlx5_tvp_a_cel3 active
DELTA_CD_01_zdlx5_tvp_a_cel3 active
DELTA_CD_02_zdlx5_tvp_a_cel3 active
...repeated for all griddisks...
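Any grid disk that is not yet active can be spotted quickly (illustrative; an empty result means all grid disks are active):
# cellcli -e list griddisk | grep -vw active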
4. Verify all grid disks have been successfully put online using the following command. Wait until asmmodestatus is ONLINE for all grid disks and no longer SYNCING. The following is an example of the output early in the activation process.
CellCLI> list griddisk attributes name,status,asmmodestatus,asmdeactivationoutcome
CATALOG_CD_09_zdlx5_tvp_a_cel3 active SYNCING Yes
CATALOG_CD_10_zdlx5_tvp_a_cel3 active SYNCING Yes
CATALOG_CD_11_zdlx5_tvp_a_cel3 active SYNCING Yes
DELTA_CD_00_zdlx5_tvp_a_cel3 active SYNCING Yes
DELTA_CD_01_zdlx5_tvp_a_cel3 active SYNCING Yes
DELTA_CD_02_zdlx5_tvp_a_cel3 active SYNCING Yes
...repeated for all griddisks...
Notice in the above example that the grid disks are still SYNCING. Oracle ASM synchronization is complete only when ALL grid disks show asmmodestatus=ONLINE. This process can take some time, depending on how busy the machine is, and how busy it was while this individual server was down for repair.
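Synchronization can be monitored until complete with a simple loop (illustrative, run as root on the cell):
# while cellcli -e "list griddisk attributes name,asmmodestatus" | grep -qw SYNCING; do sleep 60; done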
OBTAIN CUSTOMER ACCEPTANCE
- WHAT ACTION DOES THE FIELD ENGINEER/ADMINISTRATOR NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
- Verify that HW Components and SW Components are returned to a properly functioning state, with the server up and all ASM disks online on the Storage Servers.
REFERENCE INFORMATION:
Oracle Server X5-2L Documentation http://docs.oracle.com/cd/E41033_01/html/E48325/index.html
Sun Server X4-2L Documentation http://docs.oracle.com/cd/E36974_01/index.html
Sun Server X3-2L (formerly X4270M3) Documentation http://docs.oracle.com/cd/E23393_01/index.html