Sun Storage 3000 Arrays: Troubleshooting Disk Failures

Asset ID:	1-75-1008190.1
Update Date:	2017-10-05
Solution Type:	Troubleshooting Sure

Solution 1008190.1 : Sun Storage 3000 Arrays: Troubleshooting Disk Failures

Applies to:

Sun Storage 3511 SATA Array - Version Not Applicable and later
Sun Storage 3320 SCSI Array - Version Not Applicable and later
Sun Storage 3510 FC Array - Version Not Applicable and later
Sun Storage 3310 Array - Version Not Applicable and later
All Platforms

Purpose

This document addresses troubleshooting disk devices in the following Sun Storage Arrays running 4.x firmware:

Sun Storage 3310 Array
Sun Storage 3320 Array
Sun Storage 3510 Array
Sun Storage 3511 Array

Symptoms may include:

sccli> show disks shows a failed or missing drive.
sccli> show logical-drive indicates a degraded or dead logical drive.
sccli> show events reports drive failure, rebuild, bad block, or recoverable error messages.
Drive replacement failures.
An amber drive LED indicating failure.
Multiple disk drive failures.

Troubleshooting Steps

Before replacing a failed drive, save the NVRAM configuration settings as described in <Document 1012254.1> Sun Storage 3000 Arrays: Saving and Restoring NVRAM and Logical Drive Configuration.

Step 1: Verify the physical disk status is ONLINE.

Issue the sccli> show disks command, or use the firmware interface steps described in "Viewing the Status of a Physical Drive" in the Sun StorEdge 3000 Family RAID Firmware 4.2x User's Guide.

Refer to the Physical Drive Status Table for a list of all possible drive statuses.

A drive reported as BAD, ABSENT, or MISSING, or a drive that is physically installed in the array but not listed, indicates that a drive issue has occurred.

The state NONE USED can be seen on a 3510 or 3511 array if there has been a loop issue and the drive was temporarily taken off the loop and then scanned back in by the controller. In this circumstance, the drive does not require replacing.
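
The status rules in Step 1 can be sketched as a small Python check. This is an illustration only and is not part of sccli: the function name classify_drive_status, the model strings, and the return labels are assumptions made for this example, and the status text is assumed to have been copied from the show disks output by hand or by your own tooling.

```python
# Statuses that Step 1 names as indicating a drive issue.
PROBLEM_STATUSES = {"BAD", "ABSENT", "MISSING"}

# Array models on which NONE USED can follow a temporary loop issue (Step 1).
LOOP_MODELS = {"3510", "3511"}


def classify_drive_status(status, model):
    """Classify one drive status string according to Step 1.

    `status` is the status text for a single drive; `model` is the array
    model as a string such as "3510". Both are hypothetical inputs.
    """
    # Normalize case and internal whitespace before comparing.
    normalized = " ".join(status.upper().split())
    if normalized in PROBLEM_STATUSES:
        return "problem"
    if normalized == "ONLINE":
        return "ok"
    if normalized == "NONE USED" and model in LOOP_MODELS:
        # Drive was taken off the loop and scanned back in by the controller.
        return "no replacement needed"
    # Any other status: consult the Physical Drive Status Table.
    return "see status table"
```

For example, classify_drive_status("None Used", "3510") returns "no replacement needed", while classify_drive_status("BAD", "3310") returns "problem". A drive that is installed but not listed at all has no status string to classify and is handled in Step 6.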

Step 2: Verify the logical drive status is GOOD, INITING, or REBUILDING.

Issue a sccli> show logical-drive command, or use the firmware interface steps described in "Viewing the Logical Drive Status Table" in the Sun StorEdge 3000 Family RAID Firmware 4.2x User's Guide, which also includes a list of possible logical drive states.
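
The check in Step 2 can be sketched the same way. This is a minimal sketch, assuming the logical drive statuses have already been collected into a dictionary; the function name and the logical drive IDs used below are hypothetical.

```python
# Logical drive statuses that Step 2 treats as acceptable.
ACCEPTABLE_LD_STATUSES = {"GOOD", "INITING", "REBUILDING"}


def logical_drives_needing_attention(ld_statuses):
    """Return the IDs of logical drives whose status is not acceptable.

    `ld_statuses` maps a logical drive ID (any string you choose) to the
    status text taken from the show logical-drive output.
    """
    return sorted(
        ld_id
        for ld_id, status in ld_statuses.items()
        if status.strip().upper() not in ACCEPTABLE_LD_STATUSES
    )
```

For example, logical_drives_needing_attention({"ld0": "GOOD", "ld1": "DRV FAIL"}) returns ["ld1"].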

Step 3: View the event log to identify any problem events involving disk drives.

Use sccli> show events to display events since the last reboot of the array.

Use sccli> show persistent to display events, including those logged before the last reboot of the array. This command works only with 4.x array firmware and sccli 2.x, and it requires network (out-of-band) access; it cannot be run through in-band sccli.

Step 4: If more than one drive is in a MISSING or BAD state, or a logical drive is in a FATAL FAIL state, confirm that the cause is not a redundant loop failure as described in <Document 1006856.1> Sun Storage 3510 and 3511 Arrays: Troubleshooting Redundant Loop Failures.

For 3510 arrays only, issue the sccli> show disks command to determine whether the drives are Fujitsu drives, and check whether all of the following criteria are met:

Media scan is enabled and running.
A controller was recently reseated, replaced, or failed.
The drive firmware revision is older than 1701 for MAP3147FC, 0901 for MAS3735FC, or 0901 for MAS3367FC.

If all of the above conditions are met, upgrade the drive firmware to the latest revision available from My Oracle Support (MOS).
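
The Fujitsu firmware criteria above can be expressed as a short check. This is a sketch under stated assumptions, not an Oracle tool: it assumes each drive has already been matched to one of the model names listed in Step 4, that firmware revisions are four-digit numeric strings that can be compared as integers, and that the caller supplies the media scan and controller conditions as booleans. All function and parameter names are hypothetical.

```python
# Firmware thresholds from Step 4, keyed by Fujitsu drive model.
MIN_REVISION = {"MAP3147FC": "1701", "MAS3735FC": "0901", "MAS3367FC": "0901"}


def drive_firmware_outdated(model, revision):
    """Return True if a listed Fujitsu model runs firmware older than its threshold.

    Assumption: revisions are numeric strings such as "0901".
    Models not listed in Step 4 are never reported as outdated.
    """
    minimum = MIN_REVISION.get(model.strip().upper())
    if minimum is None:
        return False
    return int(revision) < int(minimum)


def fujitsu_upgrade_indicated(media_scan_running, controller_event, drives):
    """Return True only when all three Step 4 criteria hold.

    `drives` is an iterable of (model, revision) pairs.
    `controller_event` means a controller was recently reseated,
    replaced, or failed.
    """
    any_outdated = any(drive_firmware_outdated(m, r) for m, r in drives)
    return bool(media_scan_running and controller_event and any_outdated)
```

For example, fujitsu_upgrade_indicated(True, True, [("MAP3147FC", "1601")]) returns True, while the same call with revision "1701" returns False because that revision is not older than the threshold.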

Step 5: If more than one drive is in a MISSING or BAD state, or the logical drive is in a FATAL FAIL state, follow the steps described in "Recovering From Fatal Drive Failure" in the Sun StorEdge 3000 Family Installation, Operation, and Service Manual. A separate version of this manual exists for each array model; you can find it on the main array documentation page of the Oracle Disk Storage Systems Documentation.

Step 6: Determine whether one or more drives are not listed in the sccli> show disks output but are physically present in the array.

Check the persistent event log for controller reboots. If a drive has issues during a controller reboot, the drive can be ignored by the controller and therefore not listed in the show disks output. Try reseating the problem drive. The following two documents will help you identify the correct drive.

For the 3310 and 3320, refer to <Document 1012313.1> Sun Storage 3310 and 3320 Arrays: How to Identify Disk IDs and Correct Backend Cabling.

For the 3510 and 3511, refer to <Document 1007692.1> Sun Storage 3510 and 3511 Arrays: How to Identify Switch IDs, Disk IDs and Correct Backend Cabling.

Check the event log to see whether the controller has scanned in the drive automatically. If so, check the state of the drive in the show disks output. If the drive is not listed, manually scan the drive ID to see whether the controllers detect it. Refer to "Scanning the New Drive and Related Procedures for RAID Arrays" in the Sun StorEdge 3000 Family FRU Installation Guide.

Step 7: If multiple drives show an amber LED, or if a logical drive is in an INCOMPLETE or DRV ABSENT state and more than two drives are missing or failed, power cycle the array following the procedures in "Checking and Performing the Correct Power-up Sequence" in the Sun StorEdge 3000 Family FRU Installation Guide.

Step 8: If the logical drive is in a degraded (DRV FAIL) state and you have one failed drive (BAD or ABSENT), replace the drive following the procedures in "Replacing a Disk Drive" in the Sun StorEdge 3000 Family FRU Installation Guide.
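
The conditions in Steps 4, 5, 7, and 8 overlap, so it can help to see which ones a given set of statuses satisfies. The sketch below is an illustration under stated assumptions: the function name matching_steps is hypothetical, "missing or failed" in Step 7 is read as MISSING, BAD, or ABSENT, and the amber LED condition in Step 7 is left out because it cannot be derived from status text.

```python
def matching_steps(drive_statuses, ld_status):
    """Return the step numbers (in document order) whose conditions hold.

    `drive_statuses` is a list of status strings, one per drive;
    `ld_status` is the status string of the affected logical drive.
    """
    states = [" ".join(s.upper().split()) for s in drive_statuses]
    ld = " ".join(ld_status.upper().split())

    missing_or_bad = sum(s in {"MISSING", "BAD"} for s in states)
    bad_or_absent = sum(s in {"BAD", "ABSENT"} for s in states)
    # Assumption: Step 7's "missing or failed" covers all three states.
    missing_or_failed = sum(s in {"MISSING", "BAD", "ABSENT"} for s in states)

    steps = []
    if missing_or_bad > 1 or ld == "FATAL FAIL":
        steps += [4, 5]  # rule out a loop failure, then fatal failure recovery
    if ld in {"INCOMPLETE", "DRV ABSENT"} and missing_or_failed > 2:
        steps.append(7)  # power cycle the array
    if ld == "DRV FAIL" and bad_or_absent == 1:
        steps.append(8)  # replace the single failed drive
    return steps
```

For example, matching_steps(["ONLINE", "BAD", "ONLINE"], "DRV FAIL") returns [8], the single-drive replacement case. An empty list means none of these conditions apply and the remaining steps should be followed in order.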

Step 9: Repeat Step 1 to verify that the state of the new disk is FRMT, NEW, USED, or GOOD.

Step 10: Repeat Step 2 to verify that the logical drive status is either GOOD or REBUILDING.

If the target logical drive status is GOOD, the spare disk has successfully protected the logical drive and is now integrated into it, and the replacement disk drive is available to be assigned as a global spare. See "Assigning a Disk Drive as a Spare" in the Sun StorEdge 3000 Family FRU Installation Guide.

If the target logical drive status is DEGRADED, follow the steps in "Assigning a Disk Drive as a Spare" and then initiate a rebuild operation.
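
The post-replacement checks in Steps 9 and 10 can be summarized in a small decision function. This is a sketch only: the function name and the returned action text are hypothetical, DRV FAIL is treated as equivalent to DEGRADED following the wording of Step 8, and the handling of REBUILDING (check again later) is an assumption rather than something this document states.

```python
# New-disk states that Step 9 accepts after a replacement.
EXPECTED_NEW_DISK_STATES = {"FRMT", "NEW", "USED", "GOOD"}


def post_replacement_action(new_disk_state, ld_status):
    """Suggest the next action after a drive replacement (Steps 9 and 10)."""
    disk = new_disk_state.strip().upper()
    ld = " ".join(ld_status.upper().split())

    if disk not in EXPECTED_NEW_DISK_STATES:
        return "recheck the replacement drive"
    if ld == "GOOD":
        # The spare has taken over; the new drive can become a global spare.
        return "assign the replacement drive as a global spare"
    if ld in {"DEGRADED", "DRV FAIL"}:
        return "assign the replacement drive as a spare, then start a rebuild"
    if ld == "REBUILDING":
        # Assumption: a rebuild in progress only needs to be rechecked later.
        return "rebuild in progress, check again later"
    return "continue troubleshooting"
```

For example, post_replacement_action("NEW", "GOOD") returns "assign the replacement drive as a global spare".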

Step 11: Run the sccli> show events command to determine whether there are "Drive Recovered Error Reported" messages.

If there are, refer to <Document 1008255.1> Sun StorEdge[TM] 351x Arrays: How to Handle "Drive Recovered Error Reported" and Other Disk Drive Messages.

Step 12: If disk problems persist, refer to <Document 1011431.1> Troubleshooting Sun Storage 3000 Array Series Hardware.

Previously Published As
89045

References

<NOTE:1012313.1> - Sun Storage 3310 and 3320 Arrays: How to Identify Disk IDs and Correct Backend Cabling
<NOTE:1008255.1> - Sun Storage 35xx Arrays: How to Handle "Drive Recovered Error Reported" and Other Disk Drive Messages
<NOTE:1006856.1> - Sun Storage 3510 and 3511 Arrays: Troubleshooting Redundant Loop Failures
<NOTE:1011431.1> - Troubleshooting Sun Storage 3000 Array Series Hardware
<NOTE:1012254.1> - Sun Storage 3000 Arrays: Saving and Restoring NVRAM and Logical Drive Configuration
