Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1996075.1 : On Exadata during disk replacement physical disk shows normal but cell disk shows proactive failure status
Created from <SR 3-10507604991>

Applies to:
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Information in this document applies to any platform.

Goal
This document describes a situation where, after a disk replacement, the physical disk status is "normal" but the cell disk status is "proactive failure". This happens because:
1) The cell disk still points to the old disk's serial number and was not dropped automatically, so it keeps the status "proactive failure".
2) Because of the stale pointer, the auto-create of the new cell disk and grid disks failed.
3) Even manually dropping the cell disk fails with error CELL-04519: Cannot complete the drop of cell disk.

Solution
Actual machine, disk, cell disk, and grid disk names have been replaced with generic names for security purposes.

Steps to manually drop the cell disk and create the cell and grid disks on Exadata to fix the above situation:

1. In this example, the issue is with disk 11, which was recently replaced. Check the alert history and note that the old cell and grid disks were NOT dropped automatically and show the status "proactive failure".

alerthistory.out:
==================
55_3 2015-03-25T23:39:18+01:00 critical "Data hard disk failed.
    Status        : NOT PRESENT
    Manufacturer  : HITACHI
    Model Number  : H723*******3.0T
    Size          : 3.0TB
    Serial Number : 121*****ZD
    Firmware      : A690
    Slot Number   : 11
    Cell Disk     : CD_11_cellnode03a
    Grid Disk     : DATA_CD_11_cellnode03a, DBFS_DG_CD_11_cellnode03a, RECO_CD_11_cellnode03a"

celldisk-detail.out:
===============
    name:             CD_11_cellnode03a
    comment:
    creationTime:     2013-06-24T10:58:13+02:00
    deviceName:       /dev/sdad
    devicePartition:  /dev/sdl
    diskType:         HardDisk
    errorCount:       5
    freeSpace:        0
    id:               d5c3ef97***********14956469694
    interleaving:     none
    lun:              0_11
    physicalDisk:     R5S8ZD    <<<<<==== cell disk pointing to old physical disk which is removed
    raidLevel:        0
    size:             2793.953125G
    status:           proactive failure    <<<<<<=======

griddisk-detail.out:
===============
    name:             DATA_CD_11_cellnode03a
    asmDiskGroupName: DATA
    asmDiskName:      DATA_CD_11_cellnode03A
    asmFailGroupName: cellnode03A
    availableTo:
    cachingPolicy:    default
    cellDisk:         CD_11_cellnode03a
    comment:
    creationTime:     2013-06-24T11:00:53+02:00
    diskType:         HardDisk
    errorCount:       1
    id:               1ccd50e**************15ecf
    offset:           32M
    size:             2208G
    status:           proactive failure

    name:             DBFS_DG_CD_11_cellnode03a
    asmDiskGroupName: DBFS_DG
    asmDiskName:      DBFS_DG_CD_11_cellnode03A
    asmFailGroupName: cellnode03A
    availableTo:
    cachingPolicy:    default
    cellDisk:         CD_11_cellnode03a
    comment:
    creationTime:     2013-06-24T11:00:41+02:00
    diskType:         HardDisk
    errorCount:       1
    id:               a39f54be*************674070d621
    offset:           2760.15625G
    size:             33.796875G
    status:           proactive failure

    name:             RECO_CD_11_cellnode03a
    asmDiskGroupName: RECO
    asmDiskName:      RECO_CD_11_cellnode03A
    asmFailGroupName: cellnode03A
    availableTo:
    cachingPolicy:    none
    cellDisk:         CD_11_cellnode03a
    comment:
    creationTime:     2013-06-24T11:00:58+02:00
    diskType:         HardDisk
    errorCount:       1
    id:               aa8f912a*************f6528a5c3f
    offset:           2208.046875G
    size:             552.109375G
    status:           proactive failure

2. Because the cell and grid disks pointing to the old physical disk were not dropped automatically, the auto-create of the new cell and grid disks also fails.
alerthistory.out:
==================
55_4 2015-03-31T10:26:00+02:00 warning "Oracle Exadata Storage Server failed to auto-create cell disk and grid disks on the newly inserted physical disk.
    Physical Disk : 20:11
    Status        : NORMAL
    Manufacturer  : HITACHI
    Model Number  : H723********.0T
    Size          : 3.0TB
    Serial Number : 14******GK
    Firmware      : A690
    Slot Number   : 11"

3. Next, confirm that the physical disk and LUN are fine.
physicaldisk-detail.out:
========================
    name:               20:11
    deviceId:           21
    diskType:           HardDisk
    enclosureDeviceId:  20
    errMediaCount:      0
    errOtherCount:      0
    luns:               0_11
    makeModel:          "HITACHI H7*********.0T"
    physicalFirmware:   A690
    physicalInsertTime: 2015-03-31T13:42:17+02:00
    physicalInterface:  sas
    physicalSerial:     RJ52GK    <<<<==== Correct physical disk serial for new disk
    physicalSize:       2794.5199813842773G
    slotNumber:         11
    status:             normal

lun-detail.out:
===============
    name:               0_11
    cellDisk:           CD_11_cellnode03a
    deviceName:         /dev/sdad
    diskType:           HardDisk
    id:                 0_11
    isSystemLun:        FALSE
    lunSize:            2793.966796875G
    lunUID:             0_11
    physicalDrives:     20:11
    raidLevel:          0
    lunWriteCacheMode:  "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
    status:             normal

4. Manually drop the cell disk that points to the wrong physical disk, using the force option.

cellcli> drop celldisk CD_11_cellnode03a force;
The above command might fail with the error below.

CELL-04519: Cannot complete the drop of cell disk: CD_11_cellnode03a. Received error: CELL-02583: The operation is not permitted on this cell disk.
Cell disks not dropped: CD_11_cellnode03a

5. If the drop of the cell disk fails, which happens in most cases, reboot the affected cell node and retry the drop, which should now succeed.

shutdown -r now
cellcli> drop celldisk CD_11_cellnode03a force;
Dropping the cell disk also drops its grid disks.

6. Validate the drop by running the command below. It should return nothing.

cellcli> list celldisk CD_11_cellnode03a detail

7. Create the cell disk manually and assign it the same LUN, 0_11.

cellcli> create celldisk CD_11_cellnode03a lun=0_11

8. Create the new grid disks in the order BASED ON THE OFFSET, using the same sizes they had earlier. You can take the sizes from the sundiag logs, or query another cell disk's grid disks as a reference. Please review this carefully.

Run the query below on cell disk 10 to get the reference size and offset values:
CellCLI> list griddisk where celldisk=CD_10_cellnode03a attributes name,size,offset

Create the new grid disks in ascending offset order, using the sizes shown by the previous command:
CellCLI> create griddisk DATA_CD_11_cellnode03a celldisk=CD_11_cellnode03a,size=2208G
CellCLI> create griddisk RECO_CD_11_cellnode03a celldisk=CD_11_cellnode03a,size=552.109375G
CellCLI> create griddisk DBFS_DG_CD_11_cellnode03a celldisk=CD_11_cellnode03a,size=33.796875G

Run the query below on the new grid disks and make sure all the offsets (in the third column) match the reference values:
CellCLI> list griddisk where celldisk=CD_11_cellnode03a attributes name,size,offset

9. At the ASM level, the old disks were dropped from the disk groups when the cell disk was dropped. Log in to the +ASM1 instance and add the new grid disks to the ASM disk groups. Set a higher rebalance power (11) so that the rebalance completes faster.
Add each griddisk to the diskgroup by running:
sql> alter diskgroup DATA add disk '<path for data>' rebalance power 11;
sql> alter diskgroup RECO add disk '<path for reco>' rebalance power 11;
sql> alter diskgroup DBFS_DG add disk '<path for dbfs_dg>' rebalance power 11;

10. Run sundiag again for this cell node to verify the "normal" status of the cell and grid disks.

Attachments
This solution has no attachment
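
As an illustration of step 8, the offset ordering can be sketched in Python. This is a minimal sketch and not part of the Oracle procedure. It assumes the `list griddisk ... attributes name,size,offset` output has been saved as text with three whitespace-separated columns, that sizes use binary M/G/T suffixes, and that each grid disk name ends with its cell disk name. The helper names `to_mb`, `parse_listing`, and `build_create_commands` are hypothetical. The sample listing uses the sizes and offsets shown in griddisk-detail.out above.

```python
import re

# Multipliers to convert size strings to megabytes (binary units are an assumption).
_UNIT_TO_MB = {"M": 1, "G": 1024, "T": 1024 * 1024}


def to_mb(value):
    """Convert a string such as '32M' or '2208.046875G' to megabytes."""
    match = re.fullmatch(r"([0-9]+(?:\.[0-9]+)?)([MGT])", value.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    return float(match.group(1)) * _UNIT_TO_MB[match.group(2)]


def parse_listing(text):
    """Parse 'name size offset' rows into (name, size, offset) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a 3-column row
        rows.append((fields[0], fields[1], fields[2]))
    return rows


def build_create_commands(text, ref_celldisk, new_celldisk):
    """Return 'create griddisk' commands ordered by ascending offset."""
    rows = sorted(parse_listing(text), key=lambda row: to_mb(row[2]))
    commands = []
    for name, size, _offset in rows:
        # Assumption: the grid disk name embeds the cell disk name.
        new_name = name.replace(ref_celldisk, new_celldisk)
        commands.append(
            f"create griddisk {new_name} celldisk={new_celldisk},size={size}"
        )
    return commands


# Values taken from griddisk-detail.out above, in the order they are listed there.
listing = """
DATA_CD_11_cellnode03a     2208G         32M
DBFS_DG_CD_11_cellnode03a  33.796875G    2760.15625G
RECO_CD_11_cellnode03a     552.109375G   2208.046875G
"""

for command in build_create_commands(listing, "CD_11_cellnode03a", "CD_11_cellnode03a"):
    print(command)
```

The listing is ordered by name, which is not the offset order, so sorting on the offset column is what yields the DATA, RECO, DBFS_DG sequence used in step 8. To work from the cell disk 10 reference query instead, pass "CD_10_cellnode03a" as `ref_celldisk`. Review the generated commands before running them in CellCLI.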