Asset ID: 1-71-2003727.1
Update Date: 2018-04-05
Keywords:
Solution Type: Technical Instruction
Solution 2003727.1:
How to Replace an Exadata X5-2/X6-2 Storage Server NVMe drive
Related Items
- Oracle SuperCluster T5-8 Full Rack
- Oracle SuperCluster M7 Hardware
- Exadata SL6 Hardware
- Exadata X6-2 Hardware
- Exadata X6-8 Hardware
- Oracle SuperCluster T5-8 Half Rack
- Exadata X5-2 Hardware
- Exadata X5-2 Eighth Rack
- Exadata X5-2 Full Rack
- Exadata X5-2 Quarter Rack
- Exadata X4-8 Hardware
- Exadata X5-2 Half Rack
- Oracle SuperCluster T5-8 Hardware
- Oracle SuperCluster M6-32 Hardware
Related Categories
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: x64-CAP VCAP
Applies to:
Oracle SuperCluster T5-8 Hardware - Version All Versions and later
Exadata X6-2 Hardware - Version All Versions and later
Exadata SL6 Hardware - Version All Versions and later
Exadata X6-8 Hardware - Version All Versions and later
Oracle SuperCluster M7 Hardware - Version All Versions and later
Information in this document applies to any platform.
Goal
Procedure for replacing an NVMe drive in an Exadata Storage Cell without loss of data or interruption of Exadata service
Solution
The following information will be required prior to dispatch of a replacement:
Name/location of storage cell
Slot number of failed drive
Special Instructions for Dispatch are required for this part.
For Attention of Dispatcher:
The parts required in this action plan may be available as spares owned by the customer, which they received with the Engineered System. (These are sometimes referred to as ride-along spares.)
If parts are not available to meet the customer's preferred delivery time/planned end date, request that the TAM or field manager contact the customer and ask whether the customer has parts available and would be prepared to use them.
If customer spare parts are used, inform the customer that Oracle will replenish the customer part stock as soon as we can. More details on this process can be found in GDMR procedure "Handling Where No Parts Available" step 2: https://ptp.oraclecorp.com/pls/apex/f?p=151:138:38504529393::::DN,BRNID,DP,P138_DLID:2,86687,4,9082,
WHAT SKILLS DOES THE ENGINEER NEED:
Familiarity with Exadata Storage Servers and hard drive replacement.
TIME ESTIMATE: 60 minutes
The complete process may take longer depending on the re-balance time required.
TASK COMPLEXITY: 2
FIELD ENGINEER INSTRUCTIONS:
PROBLEM OVERVIEW:
Failed NVMe drive in Exadata Extreme Flash Storage Server.
NVMe drives combine the controller and the storage device in a single unit and have very different failure modes compared to SAS devices. The controller can report a healthy status and can also report a failure code. If the controller believes the internal state of the drive metadata could allow the drive to return incorrect data to the host, the drive will go into Disable Logical mode. This mode shuts down the drive's storage device, but the controller remains visible to the NVMe driver. This is also known as ASSERT or BAD_CONTEXT mode.
WHAT STATE SHOULD THE SYSTEM BE IN TO BE READY TO PERFORM THE RESOLUTION ACTIVITY?:
It is expected that the Exadata Machine is up and running and the storage cell containing the failed drive is booted and available.
If there are multiple drives to be replaced within an Exadata machine (or between an Exadata interconnected with another Exadata or Expansion Cabinet), it is critical that only ONE DRIVE BE REPLACED AT A TIME to avoid the risk of data loss. Before replacing another disk in Exadata, ensure the re-balance operation has completed from the first replacement.
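If needed, the customer's DBA can confirm that no re-balance is currently running by querying gv$asm_operation from any ASM instance (provided as guidance only; the same query is used again in the post-replacement checks below):
SQL> select * from gv$asm_operation;
A result of "no rows selected" indicates that no re-balance is active and it is safe to proceed with the next drive.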
Before proceeding, confirm the part number of the part in hand (either from logistics or an on-site spare) matches the part number dispatched for replacement.
It is expected that the customer's DBA has completed these steps before the engineer arrives to replace the disk. The following commands are provided as guidance in case the customer needs assistance checking the system prior to replacement. If the customer or FSE requires more assistance prior to the physical replacement of the device, EEST/TSC should be contacted.
1. Confirm the drive needing replacement based on the output provided. The example below shows nvmecli output from a drive that is in Disable Logical state (assert):
[root@exdx5-tvp-a-cel3 ~]# nvmecli --identify --device=/dev/nvme7
================== Controller Information =====================
Serial Number : CVMD437300AX1P6LGN
Model Number : INTEL SSDPE2ME016T4S
Firmware Version : 8DV1RA12
Number of Namespaces : 1
Health Indicator : *ASSERT_40351938 80
Internal Device Error: The command was not completed successfully due to an internal
device error.
Alternatively, check that the PCIe device is present using lspci | grep 0953 on X5 servers or lspci | grep 172X on X6 servers. Each NVMe device should appear once; there should be either 8 or 12 NVMe devices present depending on the customer's configuration:
[root@cel3 ~]# lspci | grep 0953
05:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
07:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
25:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
27:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
86:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
88:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
96:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
98:00.0 Non-Volatile memory controller: Intel Corporation Device 0953 (rev 01)
Note: on X6 servers the NVMe drives were changed, so instead of "Intel Corporation Device 0953" you should expect to see "Samsung Electronics Co Ltd NVMe SSD Controller 172X" in the lspci output above.
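As an optional quick check (guidance only; for X6 servers substitute 172X for 0953), the number of NVMe controllers visible on the PCIe bus can be counted directly:
[root@cel3 ~]# lspci | grep -c 0953
8
The count should match the number of NVMe devices expected for the configuration (8 or 12).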
2. The Oracle ASM disks associated with the grid disks on the physical disk will be automatically dropped with FORCE option, and an ASM re-balance will start immediately to restore the data redundancy.
Validate the failed NVMe drive is no longer part of the ASM diskgroups:
a) Login to a database node with the username for the owner of Oracle Grid Infrastructure home. Typically this is the 'oracle' user.
edx2db01 login: oracle
Password:
Last login: Thu Jul 12 14:43:10 on ttyS0
[oracle@edx2db01 ~]$
b) Select the ASM instance for this DB node and connect to SQL Plus:
[oracle@edx2db01 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
The Oracle base has been set to /u01/app/oracle
[oracle@edx2db01 ~]$ sqlplus ' / as sysasm'
SQL*Plus: Release 11.2.0.2.0 Production on Thu Jul 12 14:45:20 2012
Copyright (c) 1982, 2010, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL>
In the above output, the "1" of "+ASM1" refers to the DB node number. For example, on DB node #3 the value would be +ASM3.
c) From the DB node, run the following query using the celldisk name associated with this physical disk, which is given in the cell alert. An example is below:
SQL> select group_number,path,header_status,mount_status,mode_status,name from V$ASM_DISK where path like '%_05_exdx5_tvp_a_cel3';
no rows selected.
SQL>
This query should return no rows indicating the disk is no longer in the ASM diskgroup configuration. If this returns any other value, then contact the SR owner for further guidance.
Note: If you are not sure what the celldisk name is, or do not have the alert output available, from the CellCLI interface run "list alerthistory"
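For example (guidance only; the exact alert text varies by system):
CellCLI> list alerthistory detail
The celldisk and lun names referenced by the alert appear in the alert message body.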
3. The Cell Management Server daemon monitors and takes action on replacement disks to automatically bring the new disk into the configuration.
a) Login to the cell server and enter the CellCLI interface
# cellcli
CellCLI: Release 12.1.2.1.0 - Production on Thu Apr 16 07:05:44 BST 2015
Copyright (c) 2007, 2013, Oracle. All rights reserved.
Cell Efficiency Ratio: 504
CellCLI>
b) Verify that msStatus is 'running' before replacing the disk:
CellCLI> list cell attributes cellsrvStatus,msStatus,rsStatus detail
cellsrvStatus: running
msStatus: running
rsStatus: running
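The same check can be run non-interactively from the cell OS shell (guidance only):
# cellcli -e list cell attributes cellsrvStatus,msStatus,rsStatus detail
All three services should report "running" before continuing.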
4. If the failed NVMe drive is in slot 0 or slot 1, then the disk is a system disk which contains the running OS. Verify the root volume is in 'clean' state before hot replacing a system disk. If it is 'active' and the disk is hot removed, the OS may crash making the recovery more difficult.
a. Login as 'root' on the Storage Cell, and use 'df' to determine the md device name for "/" volume:
[root@cel2 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md5 10321144 6340492 3456368 65% /
tmpfs 32791712 4 32791708 1% /dev/shm
/dev/md7 3096272 1655564 1283428 57% /opt/oracle
/dev/md4 483886 27532 431359 6% /boot
/dev/md11 5157312 210004 4685328 5% /var/log/oracle
b. Use 'mdadm' to determine the volume status:
[root@cel2 ~]# mdadm -Q --detail /dev/md5
/dev/md5:
Version : 0.90
Creation Time : Thu Dec 25 12:59:29 2014
Raid Level : raid1
Array Size : 10485696 (10.00 GiB 10.74 GB)
Used Dev Size : 10485696 (10.00 GiB 10.74 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 5
Persistence : Superblock is persistent
Update Time : Thu Apr 16 07:14:10 2015
State : clean <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : 711f6845:90d4551d:04894333:532a878b
Events : 0.241
Number Major Minor RaidDevice State
0 259 5 0 active sync /dev/nvme0n1p5
1 259 17 1 active sync /dev/nvme1n1p5
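As an optional shortcut (guidance only; confirm the md device for "/" with 'df' first, as it may differ), the array state line can be extracted directly:
[root@cel2 ~]# mdadm -Q --detail /dev/md5 | grep 'State :'
State : clean
Do not hot remove a system disk unless the state is 'clean'.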
WHAT ACTION DOES THE ENGINEER NEED TO TAKE:
Confirm the drive needing replacement based on the output provided ("name" or "slotNumber" value) and the LED status of the drive. To remove the NVMe drive, the PCIe hot-plug procedure MUST be followed. If the drive is removed without performing the hot-removal operation, the system will crash and reset with a PCIe Surprise Link Down against the drive. There is a clear visual indication (blue LED) when the drive is safe to remove. If the blue LED is not lit, do not remove the drive.
Drives have both a physical slot location and an instance in /dev, and these may not match numerically. For example, physical slot 10 may be /dev/nvme7, depending on how many drives were populated at boot time. When preparing to gather data and logs from a drive, always check the physical-to-logical mapping with the command below:
cellcli -e list physicaldisk detail
name: NVME_7
deviceName: /dev/nvme5n1
The "name" field is the physical slot; "deviceName" is the /dev/nvme entry.
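To see the slot-to-device mapping for all drives in a single view, the attribute form of the same command can be used (guidance only):
cellcli -e list physicaldisk attributes name,deviceName,slotNumber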
Slot locations for Extreme Flash Exadata Storage Cell:
View from the front:
8 Drive Configuration:
[Filler] [NVMe0] [Filler] [NVMe1] [Filler] [Filler] [Filler] [NVMe3] [Filler] [NVMe4] [Filler] [Filler] [Filler] [NVMe6] [Filler] [NVMe7] [Filler] [Filler] [Filler] [NVMe9] [Filler] [NVMe10] [Filler] [Filler]
1. To prepare an NVMe drive for removal, the following command MUST be run:
cellcli -e alter physicaldisk NVME_# drop for replacement
where # is the slot ID.
eg:
CellCLI> alter physicaldisk NVME_7 DROP FOR REPLACEMENT
Physical disk NVME_7 was dropped for replacement.
CellCLI> list physicaldisk
NVME_0 CVMD4470007K1P6LGN normal
NVME_1 CVMD437300J61P6LGN normal
NVME_3 CVMD447100791P6LGN normal
NVME_4 CVMD439000611P6LGN normal
NVME_6 CVMD4471001E1P6LGN normal
NVME_7 CVMD4471006X1P6LGN normal - dropped for replacement
NVME_9 CVMD4415003D1P6LGN normal
NVME_10 CVMD437300AX1P6LGN normal
CellCLI> list physicaldisk NVME_7 detail
name: NVME_7
deviceName: /dev/nvme5n1
diskType: FlashDisk
luns: 0_7
makeModel: "Oracle NVMe SSD"
notPresentSince: 2015-04-23T08:45:01+01:00
physicalFirmware: 8DV1RA10
physicalInsertTime: 2015-03-31T19:29:29+01:00
physicalSerial: CVMD4471006X1P6LGN
physicalSize: 1.4554837569594383T
slotNumber: 7
status: normal - dropped for replacement
NOTE: The blue OK to Remove status indicator LED on the drive will light once a PCIe hot-remove operation has completed. Do not remove the drive until this LED indicator is lit, otherwise a system reset could occur.
2. On the drive you plan to remove, push the latch release button to open the drive latch.
3. Grasp the latch and pull the drive out of the drive slot. (Caution: whenever you remove a storage drive, replace it with another storage drive or a filler panel; otherwise the server might overheat due to improper airflow.)
4. Wait three minutes for the MS daemon to recognize the removal of the old drive.
5. Slide the replacement drive into the slot until the drive is fully seated.
6. Close the drive latch to lock the drive in place.
7. The drive should automatically power on when inserted. /var/log/messages will report that a drive is present and identify the slot ID.
8. Wait three minutes for the MS daemon to start rebuilding the virtual drives before proceeding.
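To watch for the messages referenced in step 7, the system log can be followed from the cell OS while the drive is inserted (guidance only; the exact message text varies by image version):
# tail -f /var/log/messages | grep -i nvme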
OBTAIN CUSTOMER ACCEPTANCE
- WHAT ACTION DOES THE CUSTOMER NEED TO TAKE TO RETURN THE SYSTEM TO AN OPERATIONAL STATE:
It is expected that the engineer stay on-site until the customer has given the approval to depart. The customer should check the status of the drive after replacement. The following commands are provided as guidance in case the customer needs assistance checking the status of the system following replacement. If the customer or the FSE requires more assistance following the physical replacement of the device, EEST/TSC should be contacted.
After replacing the NVMe drive on the Exadata Storage Server, wait for three minutes before running any commands to query the device from the server. CellCLI (examples below) should be the principal tool used to query the drives.
1. Re-enable the NVMe drive after inserting it:
CellCLI> alter physicaldisk NVME_7 reenable
CellCLI> list physicaldisk NVME_7 detail
name: NVME_7
deviceName: /dev/nvme5n1
diskType: FlashDisk
luns: 0_7
makeModel: "Oracle NVMe SSD"
physicalFirmware: 8DV1RA10
physicalInsertTime: 2015-04-23T08:57:44+01:00
physicalSerial: CVMD4471006X1P6LGN
physicalSize: 1.4554837569594383T
slotNumber: 7
status: normal
The "status" field should report "normal". Note also that the physicalInsertTime should be current date and time, and not an earlier time. If they are not, then the old disk entries may still be present and the disk replacement did not complete successfully. If this is the case, refer to the SR owner for further assistance.
2. The firmware of the drive will be automatically upgraded to match the other disks in the system when the new drive is inserted, if it is below the supported version of the current image. If it is above the minimum supported version then no action will be taken, and the newer firmware will remain. This can be validated by the following command:
CellCLI> alter cell validate configuration
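The firmware level reported by each drive can also be compared across the cell (guidance only):
CellCLI> list physicaldisk attributes name,physicalFirmware
All drives of the same type should report the same firmware version once any automatic upgrade has completed.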
3. After the drive is replaced, a lun should be automatically created, and the grid disks and cell disks that existed on the previous disk in that slot are automatically re-created on the new physical disk. If those grid disks were part of an Oracle ASM group, then they will be added back to the disk group and the data will be re-balanced on them, based on the disk group redundancy and asm_power_limit parameter values.
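If the customer wishes to check the current re-balance power before monitoring progress, the parameter can be displayed from the ASM instance (guidance only; changing it is a customer/DBA decision):
SQL> show parameter asm_power_limit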
Grid disks and cell disks can be verified with the following CellCLI commands, where the lun name is reported in the physicaldisk output from step 1 above ("0_7" in this example):
CellCLI> list lun 0_7 detail
name: 0_7
cellDisk: FD_05_exdx5_tvp_a_cel3
deviceName: /dev/nvme5n1
diskType: FlashDisk
id: 0_7
isSystemLun: FALSE
lunSize: 1.4554837569594383T
physicalDrives: NVME_7
status: normal
CellCLI> list celldisk where lun=0_7 detail
name: FD_05_exdx5_tvp_a_cel3
comment:
creationTime: 2015-03-31T11:56:22+01:00
deviceName: /dev/nvme5n1
devicePartition: /dev/nvme5n1
diskType: FlashDisk
errorCount: 0
freeSpace: 0
id: 86f50408-9216-43d9-8e28-275d2a19df6f
interleaving: none
lun: 0_7
physicalDisk: CVMD4471006X1P6LGN
raidLevel:
size: 1.455474853515625T
status: normal
CellCLI> list griddisk where celldisk=FD_05_exdx5_tvp_a_cel3 detail
name: BACKUP_FD_05_exdx5_tvp_a_cel3
asmDiskGroupName: BACKUP_DG
asmDiskName: BACKUP_FD_05_EXDX5_TVP_A_CEL3
asmFailGroupName: EXDX5_TVP_A_CEL3
availableTo:
cachedBy: FD_05_exdx5_tvp_a_cel3
cachingPolicy: default
cellDisk: FD_05_exdx5_tvp_a_cel3
comment: "Cluster exdx5-clu1 diskgroup BACKUP"
creationTime: 2015-11-20T22:29:34+00:00
diskType: FlashDisk
errorCount: 0
id: 5fcf1ec5-b05b-4f2e-8f73-2723de18ccfa
offset: 390.625G
size: 293G
status: active
name: DATAC1_FD_05_exdx5_tvp_a_cel3
asmDiskGroupName: DATAC1
asmDiskName: DATAC1_FD_05_EXDX5_TVP_A_CEL3
asmFailGroupName: EXDX5_TVP_A_CEL3
availableTo:
cachedBy: FD_05_exdx5_tvp_a_cel3
cachingPolicy: default
cellDisk: FD_05_exdx5_tvp_a_cel3
comment: "Cluster RacA diskgroup DATAC1"
creationTime: 2015-09-28T16:38:00+01:00
diskType: FlashDisk
errorCount: 0
id: 8ccd73e7-0fdf-4680-8e17-8c0fe0718b6e
offset: 91.625G
size: 257G
status: active
name: FS_DG1_FD_05_exdx5_tvp_a_cel3
asmDiskGroupName: FS_DG1
asmDiskName: FS_DG1_FD_05_EXDX5_TVP_A_CEL3
asmFailGroupName: EXDX5_TVP_A_CEL3
availableTo:
cachedBy: FD_05_exdx5_tvp_a_cel3
cachingPolicy: default
cellDisk: FD_05_exdx5_tvp_a_cel3
comment: "Cluster RacA diskgroup FS_DG1"
creationTime: 2015-09-28T16:37:59+01:00
diskType: FlashDisk
errorCount: 0
id: 3d93b6f2-4677-44cb-bf1e-2a7ce5bc5a70
offset: 74.625G
size: 17G
status: active
name: RECOC1_FD_05_exdx5_tvp_a_cel3
asmDiskGroupName: RECOC1
asmDiskName: RECOC1_FD_05_EXDX5_TVP_A_CEL3
asmFailGroupName: EXDX5_TVP_A_CEL3
availableTo:
cachedBy: FD_05_exdx5_tvp_a_cel3
cachingPolicy: default
cellDisk: FD_05_exdx5_tvp_a_cel3
comment: "Cluster RacA diskgroup RECOC1"
creationTime: 2015-09-28T16:38:01+01:00
diskType: FlashDisk
errorCount: 0
id: b11670a7-6bed-4433-8de9-91666cb33a33
offset: 348.625G
size: 42G
status: active
Status should be normal for the cell disks and active for the grid disks. All of the creation times should also match the insertion time of the replacement disk. If they are not, then the old disk entries may still be present and the disk replacement did not complete successfully. If this is the case, refer to the SR owner for further assistance.
Note: The lun name attribute will also be shown in the original alert generated by the storage cell.
4. To confirm the status of the re-balance, connect to the ASM instance on a database node and validate that the disks were added back to the ASM diskgroups and that a re-balance is running:
SQL> set linesize 132
SQL> col path format a50
SQL> select group_number,path,header_status,mount_status,name from V$ASM_DISK where path like '%_05_exdx5_tvp_a_cel3';
GROUP_NUMBER PATH HEADER_STATU MOUNT_S NAME
------------ -------------------------------------------------- ------------ ------- ------------------------------
4 o/192.168.10.28;192.168.10.29/BACKUP_FD_05_exdx5_t MEMBER CACHED BACKUP_FD_05_EXDX5_TVP_A_CEL3
vp_a_cel3
2 o/192.168.10.28;192.168.10.29/FS_DG1_FD_05_exdx5_t MEMBER CACHED FS_DG1_FD_05_EXDX5_TVP_A_CEL3
vp_a_cel3
1 o/192.168.10.28;192.168.10.29/DATAC1_FD_05_exdx5_t MEMBER CACHED DATAC1_FD_05_EXDX5_TVP_A_CEL3
vp_a_cel3
3 o/192.168.10.28;192.168.10.29/RECOC1_FD_05_exdx5_t MEMBER CACHED RECOC1_FD_05_EXDX5_TVP_A_CEL3
vp_a_cel3
GROUP_NUMBER PATH HEADER_STATU MOUNT_S NAME
------------ -------------------------------------------------- ------------ ------- ------------------------------
SQL> select * from gv$asm_operation;
INST_ID GROUP_NUMBER OPERA STAT POWER ACTUAL SOFAR EST_WORK EST_RATE
---------- ------------ ----- ---- ---------- ---------- ---------- ---------- ----------
EST_MINUTES ERROR_CODE
----------- --------------------------------------------
2 3 REBAL WAIT 10
1 3 REBAL RUN 10 10 1541 2422
7298 0
An active re-balance operation can be identified by STATE=RUN. The group_number and inst_id columns provide the number of the diskgroup being re-balanced and the instance number where the operation is running. The re-balance operation is complete when the above query returns "no rows selected".
Validate the expected number of griddisks per failgroup and diskgroup.
SQL> select group_number,failgroup,mode_status,count(*) from v$asm_disk group by group_number,failgroup,mode_status;
The re-balance operation has completed when there are no "group_number" values of 0 and each disk group reports the same number of disks in each failgroup.
5. If the disk replaced was a system disk in slot 0 or 1, then the status of the OS volume should also be checked. Login as 'root' on the Storage cell and check the status using the same 'df' and 'mdadm' commands listed above:
[root@dbm1cel1 ~]# mdadm -Q --detail /dev/md5
/dev/md5:
Version : 0.90
Creation Time : Tue Mar 31 12:14:45 2015
Raid Level : raid1
Array Size : 10485696 (10.00 GiB 10.74 GB)
Used Dev Size : 10485696 (10.00 GiB 10.74 GB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 5
Persistence : Superblock is persistent
Update Time : Mon Apr 27 02:53:48 2015
State : active, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 1
Spare Devices : 1
UUID : e75c1b6a:64cce9e4:924527db:b6e45d21
Events : 0.215
Number Major Minor RaidDevice State
3 65 213 0 spare rebuilding /dev/nvme0n1p5
1 8 21 1 active sync /dev/nvme1n1p5
2 8 5 - faulty spare
[root@dbm1cel1 ~]#
While the system disk is rebuilding, the state will show as "active, degraded" or "active, degraded, recovering", with one device listed as rebuilding and a third listed as 'faulty'. After the rebuild has started, re-running this command will include a "Rebuild Status: X% complete" line in the output. When the system disk sync is complete, the state should return to "clean" with only 2 devices.
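Rebuild progress can be polled periodically (guidance only; adjust the md device name if the root volume differs):
# watch -n 60 "mdadm -Q --detail /dev/md5 | egrep 'State :|Rebuild Status'"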
If the status of any of the above checks (firmware, grid disk / cell disk creation, re-balance) is not successful, re-engage Oracle Support to get the correct action plan to manually complete the required steps.
PARTS NOTE:
Refer to the Exadata Database Machine Owner's Guide Appendix D for part information.
REFERENCE INFORMATION:
Exadata Database Machine Documentation:
Exadata Database Machine Owner's Guide is available on the Storage Server OS image in /opt/oracle/cell/doc/welcome.html
http://amomv0115.us.oracle.com/archive/cd_ns/E13877_01/welcome.html
Oracle Exadata Storage Server X5-2 Extreme Flash Service Manual
Mirror partitions not resynced after replacing failed system drive (lun 0 or 1) (Doc ID 1316829.1)
Internal Only References:
- Replacing a physicaldisk on a storage cell , cellcli list physicaldisk reports two entries on same slot but LUN is not created (Doc ID 1352938.1)
- Exadata Documentation - http://amomv0115.us.oracle.com/archive/cd_ns/E50790_01/doc/doc.121/e51951/storage.htm#DBMMN21046
Attachments
This solution has no attachment