HLRR on T1200: syscheck error - meta: FAILURE:: md status check failed.

Asset ID:	1-71-2339843.1
Update Date:	2018-01-04
Keywords:

Solution Type:	Technical Instruction Sure

Solution 2339843.1 : HLRR on T1200: syscheck error - meta: FAILURE:: md status check failed.

Applies to:

BNS Platform Hardware - Version HLRR 4.0 and later
Information in this document applies to any platform.

Goal

To understand, and attempt to avoid, a Disaster Recovery (DR) of a T1200 HLRR 4.1 server because of the following syscheck disk failure:

* meta: FAILURE:: MAJOR::3000000000000002 -- Server Internal Disk Error
* meta: FAILURE:: md status check failed.

Solution

*****Update: Procedural Resolution Found. Please see updated procedure (DiskDrive_Replacement_on_a_T1200_EXHR_server_v3.pdf)*****

If 'syscheck' and 'alarmMgr' return the following errors, open an SR and ask the Oracle DSR TAC engineer to open an internal case with Platform Engineering/TPD.

$ sudo syscheck -v disk all
Running modules in class disk...
* meta: FAILURE:: MAJOR::3000000000000002 -- Server Internal Disk Error
* meta: FAILURE:: md status check failed.
One or more module in class "disk" FAILED
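
Before the SR is opened, it can help to record which software RAID arrays are degraded. The following is a minimal sketch, assuming the syscheck "md status check" reflects the Linux md state reported in /proc/mdstat (this note does not confirm that). The function name degraded_md_arrays is hypothetical. It only reads the file and changes nothing on the server.

```shell
#!/bin/bash
# Print the name of every md array whose member-status string
# (for example [U_]) contains an underscore, i.e. a missing member.
# Assumption: /proc/mdstat is the relevant source of md status on this
# platform; the path can be overridden with the first argument.
degraded_md_arrays() {
    local mdstat="${1:-/proc/mdstat}"
    awk '
        /^md[0-9]+ :/        { name = $1 }      # remember the current array name
        /\[[U_]*_[U_]*\]/    { print name }     # status string with a missing member
    ' "$mdstat"
}
```

Running degraded_md_arrays with no argument reads /proc/mdstat. Each name it prints is an array running with at least one missing member, which is useful detail to attach to the SR.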

Unexpected behavior and a failure have been observed while replacing a hard drive, specifically when running the 'cpDiskCfg' command. This failure has forced a Disaster Recovery (DR) of the T1200 server.

# /usr/TKLC/plat/sbin/cpDiskCfg
/proc/scsi/scsi shows a device on 0:0:0:0
/proc/scsi/scsi shows a device on 2:0:1:0
/proc/scsi/scsi shows a device on 2:0:2:0
/proc/scsi/scsi shows a device on 2:0:3:0
probing for 'sdc' on SCSI 6:0:2:0
ERROR: could not open /sys/class/scsi_host/host6/scan for writing, No such file or directory
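
In the output above, /proc/scsi/scsi lists devices on SCSI hosts 0 and 2, while the probe targets host 6, whose sysfs scan file does not exist. The following read-only sketch lists the SCSI hosts actually present and checks whether a given host has a scan file. It assumes the standard Linux sysfs layout under /sys/class/scsi_host. The function names are hypothetical and are not part of cpDiskCfg.

```shell
#!/bin/bash
# Return 0 if <root>/host<N>/scan exists, 1 otherwise.
# The root defaults to /sys/class/scsi_host and can be overridden.
scsi_host_has_scan() {
    local host_num="$1"
    local root="${2:-/sys/class/scsi_host}"
    [ -e "${root}/host${host_num}/scan" ]
}

# Print the number of every SCSI host present under the sysfs root.
list_scsi_hosts() {
    local root="${1:-/sys/class/scsi_host}"
    local d
    for d in "${root}"/host*; do
        if [ -d "$d" ]; then
            echo "${d##*/host}"     # strip everything up to and including "host"
        fi
    done
}
```

For example, `scsi_host_has_scan 6 || echo "host6 has no scan file"` reports the condition behind the error above without writing to sysfs. Include the output of list_scsi_hosts in the information given to TPD engineering.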

Hold a troubleshooting session with TPD engineering before taking any action or replacing the hard drive. If possible, find an alternative to a DR.

Collect the savelogs_plat:

$ sudo /usr/TKLC/plat/sbin/savelogs_plat --workdir=/var/TKLC/db/filemgmt

or, if the T1200 server reporting the alarm is a NOAM server, use:

$ sudo /usr/TKLC/plat/sbin/savelogs_plat --disable-test=resolv --workdir=/var/TKLC/db/filemgmt
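
The two variants above differ only in the --disable-test=resolv option. As a small sketch, the choice can be wrapped in a helper that prints the command line for review before it is run. The function name savelogs_cmd and its "noam" argument are hypothetical. The path and options are the ones given in this note.

```shell
#!/bin/bash
# Print the savelogs_plat command line for this server.
# Pass "noam" as the first argument for a NOAM server; any other
# value (or no argument) selects the standard command.
savelogs_cmd() {
    local role="${1:-}"
    local cmd="sudo /usr/TKLC/plat/sbin/savelogs_plat"
    if [ "$role" = "noam" ]; then
        cmd="$cmd --disable-test=resolv"
    fi
    echo "$cmd --workdir=/var/TKLC/db/filemgmt"
}
```

The helper only prints the command, so it can be checked and then copied into the shell.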

Reference CPE-940

References

<NOTE:2244802.1> - Manage Disk Replacement in T1200 Server
https://myjira.us.oracle.com/browse/CPE-940

Attachments

This solution has no attachment