HALRT-02002: System hard disk predictive failure

Asset ID:	1-72-1113003.1
Update Date:	2016-11-08
Keywords:

Solution Type Problem Resolution Sure

Solution 1113003.1 : HALRT-02002: System hard disk predictive failure

Applies to:

Oracle Exadata Storage Server Software - Version 11.2.1.3.1 and later
Exadata Database Machine V2
Exadata Database Machine X2-2 Full Rack
Exadata Database Machine X2-2 Half Rack
Exadata Database Machine X2-2 Hardware
Information in this document applies to any platform.
***Checked for relevance on 30-Jul-2012***

Symptoms

An Oracle Exadata alert (HALRT-02002) is generated when a system hard disk enters predictive failure status.

Cause

The physical drive is in predictive failure status.

Impact:

A physical disk in predictive failure status needs to be replaced. This status indicates that the disk is likely to fail soon, so replace it at the earliest opportunity. The Oracle ASM disks associated with the grid disks on the physical drive are dropped automatically, and an Oracle ASM rebalance relocates the data from the predictively failed disk to other disks.

If the drop does not complete before the physical drive fails, refer to Document 1112994.1, HALRT-02001: System hard disk failure.

Solution

To replace a disk that is in predictive failure status, perform the following procedure:

1. To identify the physical disk to be replaced, use the following command:

CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status="predictive failure" DETAIL

    name:                 28:1
    deviceId:             19
    diskType:             HardDisk
    enclosureDeviceId:    28
    errMediaCount:        0
    errOtherCount:        0
    foreignState:         false
    luns:                 0_1
    makeModel:            "SEAGATE ST360057SSUN600G"
    physicalFirmware:     0705
    physicalInterface:    sas
    physicalSerial:       E07L8E
    physicalSize:         558.9109999993816G
    slotNumber:           1
    status:               predictive failure

Use the disk slot number shown in the slotNumber attribute to locate the affected disk.
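As a rough illustration of step 1, the DETAIL output can be parsed mechanically to pick out disks in predictive failure and their slot numbers. This Python sketch assumes the `key: value` layout shown above and that each disk record begins with a `name:` line; the function name and sample text are hypothetical and not part of CellCLI.

```python
def parse_physicaldisk_detail(text: str) -> list[dict[str, str]]:
    """Parse LIST PHYSICALDISK ... DETAIL output into one dict per disk.

    Assumption: each attribute is printed as 'key: value', and every disk
    record starts with a 'name:' attribute.
    """
    disks: list[dict[str, str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue  # skip blank lines, prompts, and other non-attribute lines
        # Split on the first colon only, so values such as "28:1" survive.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip().strip('"')
        if key == "name":
            disks.append({})  # a new disk record begins
        if disks:
            disks[-1][key] = value
    return disks


# Hypothetical sample, trimmed from the output shown above.
SAMPLE = """
name:               28:1
diskType:           HardDisk
makeModel:          "SEAGATE ST360057SSUN600G"
slotNumber:         1
status:             predictive failure
"""

for disk in parse_physicaldisk_detail(SAMPLE):
    if disk.get("status") == "predictive failure":
        print(f"replace disk {disk['name']} in slot {disk['slotNumber']}")
```

Running it prints `replace disk 28:1 in slot 1`.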

2. Wait until the Oracle ASM disks associated with the grid disks on this physical drive have been dropped. To check, query the HEADER_STATUS column of V$ASM_DISK on the Oracle ASM instance; the HEADER_STATUS of a dropped disk is FORMER.

Caution:
The disks in the first two slots are system disks, which store the operating system and the Oracle Exadata Storage Server Software. At least one system disk must be in working condition to keep the cell running.

Before replacing the other system disk, wait until ALTER CELL VALIDATE CONFIGURATION shows no mdadm errors; this indicates that the system disk resync has completed.
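The preconditions in step 2 can be combined into a single go/no-go check before pulling the disk. This is a hypothetical Python helper, not an Oracle tool. It assumes the caller has already collected the HEADER_STATUS values from V$ASM_DISK for the grid disks on this drive, knows whether the other system disk is healthy, and has reduced the ALTER CELL VALIDATE CONFIGURATION result to a yes/no for mdadm errors. It also assumes the "first two slots" are slot numbers 0 and 1.

```python
SYSTEM_DISK_SLOTS = {0, 1}  # assumption: "the first two slots" are 0 and 1


def ready_to_replace(slot: int,
                     header_statuses: list[str],
                     other_system_disk_healthy: bool,
                     mdadm_errors_reported: bool) -> tuple[bool, str]:
    """Return (ok, reason) for pulling the physical disk in `slot`.

    header_statuses: HEADER_STATUS values from V$ASM_DISK for the ASM disks
    on this drive's grid disks. An empty list means there is nothing to wait for.
    """
    # Step 2: every associated ASM disk must have been dropped (FORMER).
    pending = [s for s in header_statuses if s.strip().upper() != "FORMER"]
    if pending:
        return False, f"{len(pending)} ASM disk(s) not yet dropped"
    # Caution: keep at least one working, fully resynced system disk.
    if slot in SYSTEM_DISK_SLOTS:
        if not other_system_disk_healthy:
            return False, "other system disk is not in working condition"
        if mdadm_errors_reported:
            return False, "system disk resync not complete (mdadm errors)"
    return True, "ok"


print(ready_to_replace(1, ["FORMER", "MEMBER"], True, False))
```

Here the example reports that one ASM disk on the drive has not yet been dropped, so the disk should not be pulled yet.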

3. Replace the physical disk. The disk is hot-pluggable and can be replaced while the power is on.

When you remove the disk, an alert is generated. When the replacement disk is inserted, the grid disks and cell disks that existed on the previous disk in that slot are re-created on the new physical disk. If those grid disks were part of an Oracle ASM disk group, they are added back to the disk group, and the data is rebalanced based on the disk group redundancy and the asm_power_limit parameter.

Note:
When you replace a physical disk, the RAID controller must acknowledge the new disk before it can be used. This does not take long, but use the LIST PHYSICALDISK command to confirm that the status is NORMAL.
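Confirming that the new disk reaches NORMAL status amounts to a polling loop. In this sketch, `get_status` is a hypothetical callable supplied by the caller (for example, a wrapper that runs LIST PHYSICALDISK for the replaced disk and returns its status attribute); the timeout and poll interval are arbitrary illustrative defaults, not Oracle recommendations.

```python
import time
from typing import Callable


def wait_for_normal(get_status: Callable[[], str],
                    timeout_s: float = 600.0,
                    poll_s: float = 10.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the replacement disk reports status 'normal'.

    Returns True once the status is normal, or False after roughly timeout_s.
    get_status is a hypothetical caller-supplied function.
    """
    if poll_s <= 0:
        raise ValueError("poll_s must be positive")
    waited = 0.0
    while True:
        if get_status().strip().lower() == "normal":
            return True
        if waited >= timeout_s:
            return False
        sleep(poll_s)
        waited += poll_s


# Example with a simulated status sequence instead of a real cell.
fake = iter(["not present", "not present", "normal"])
print(wait_for_normal(lambda: next(fake), poll_s=1, sleep=lambda s: None))
```

The `sleep` parameter is injectable so the loop can be exercised without waiting; the example prints `True` on the third poll.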

An Oracle ASM rebalance occurs whenever a disk is dropped or added. To check the status of the rebalance, consider the following cases:

The rebalance operation may have completed successfully. Check the Oracle ASM alert log to confirm.

The rebalance operation may still be running. Query the GV$ASM_OPERATION view to determine whether it is still in progress.

The rebalance operation may have failed. Check the ERROR_CODE column of the GV$ASM_OPERATION view to determine whether the rebalance operation has failed.
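The three rebalance outcomes above can be told apart from GV$ASM_OPERATION, with the Oracle ASM alert log as the final confirmation when no rebalance rows remain. This Python sketch assumes the query results arrive as dicts keyed by column name (OPERATION, STATE, ERROR_CODE); treating a non-empty ERROR_CODE or a STATE of ERRS as a failure is an assumption to verify against the Oracle Database Reference for your release.

```python
def rebalance_state(rows: list[dict]) -> str:
    """Classify rebalance status from GV$ASM_OPERATION rows.

    Assumption: each row is a dict keyed by column name (OPERATION, STATE,
    ERROR_CODE). Returns 'failed', 'running', or 'not running'.
    """
    rebal = [r for r in rows if r.get("OPERATION") == "REBAL"]
    if not rebal:
        # No rebalance rows: it finished (or never started); confirm in the ASM alert log.
        return "not running"
    for r in rebal:
        if (r.get("ERROR_CODE") or "").strip() or r.get("STATE") == "ERRS":
            return "failed"
    return "running"


print(rebalance_state([{"INST_ID": 1, "OPERATION": "REBAL",
                        "STATE": "RUN", "ERROR_CODE": None}]))
```

The example prints `running`. Because GV$ views return rows from every instance in the cluster, a single failed row is enough to report a failure.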

If the physical disk being replaced contains Oracle ASM disks from multiple disk groups, the rebalance operations for those disk groups can run on different Oracle ASM instances in the same cluster. Each Oracle ASM instance runs one rebalance operation at a time; if all Oracle ASM instances are busy, further rebalance operations are queued.
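The queuing rule described above can be pictured with a toy model in which each Oracle ASM instance takes at most one rebalance and any remaining disk groups wait. This illustrates the stated rule only and is not Oracle's actual scheduler; the disk group and instance names are hypothetical.

```python
from collections import deque


def schedule_rebalances(disk_groups: list[str],
                        asm_instances: list[str]) -> tuple[dict[str, str], list[str]]:
    """Toy model of the rule: one rebalance per ASM instance at a time.

    Returns (running, queued): which disk group each instance is rebalancing,
    and the disk groups left waiting because every instance is busy.
    """
    waiting = deque(disk_groups)
    running: dict[str, str] = {}
    for inst in asm_instances:
        if not waiting:
            break
        running[inst] = waiting.popleft()
    return running, list(waiting)


running, queued = schedule_rebalances(["DATA", "RECO", "DBFS_DG"], ["+ASM1", "+ASM2"])
print(running)  # {'+ASM1': 'DATA', '+ASM2': 'RECO'}
print(queued)   # ['DBFS_DG']
```

With three affected disk groups and two instances, the third disk group's rebalance waits until an instance becomes free.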

References:

Sun Fire™ X4170, X4270, and X4275 Servers Service Manual
