Troubleshooting Sun StorEdge[TM] T3 and 6120 Disk Failures

Asset ID:	1-75-1009745.1
Update Date:	2017-07-19
Keywords:

Solution Type Troubleshooting Sure

Solution 1009745.1 : Troubleshooting Sun StorEdge[TM] T3 and 6120 Disk Failures

Applies to:

Sun Storage 6120 Array - Version All Versions to All Versions [Release All Releases]
Sun Storage T3+ Array - Version All Versions to All Versions [Release All Releases]
Sun Storage T3 Array - Version All Versions to All Versions [Release All Releases]
All Platforms

Purpose

This document describes how to identify a failed or failing disk drive in the array, starting from the symptoms listed below.

Symptoms:

Performance degraded
Disk Fault LED lit/on
Global Fault LED lit/on

Please validate that each troubleshooting step below is true for your environment. The steps will provide instructions or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.

Troubleshooting Steps

1. Validate that you can telnet into your 6120/T3.

If you cannot log in to your array via telnet, see <Document: 1012660.1> Sun Storage 3000 and T3 Arrays: Troubleshooting Serial and Network Management Port Connections.
Otherwise continue to Step 2.

2. Validate disk drive status against a detailed FRU status report by executing the command fru stat.

example:

CTLR    STATUS   STATE       ROLE        PARTNER    TEMP
------  -------  ----------  ----------  -------    ----
u1ctr   ready    enabled     master      u2ctr      41.0
u2ctr   ready    enabled     alt master  u1ctr      37.0

DISK    STATUS   STATE       ROLE        PORT1      PORT2      TEMP  VOLUME
------  -------  ----------  ----------  ---------  ---------  ----  ------
u1d1    ready    enabled     data disk   ready      ready      26    v0
u1d2    ready    enabled     data disk   ready      ready      35    v0
u1d3    ready    enabled     data disk   ready      ready      42    v0

If status is ready-enabled, go to Step 5.
If status is substituted, go to Step 7.
If status is ready-disabled, go to Step 3.
If status is fault-disabled, go to Step 4.
If more than one drive is in a state other than ready-enabled, go to Step 5.
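
The decision rules above can be sketched as a small script that reads saved fru stat output. This is only an illustration, not a supported tool: the function names parse_disk_states and next_step are hypothetical, the parser assumes disk rows look like the example above (disk ID, then status, then state), and it assumes that a "substituted" drive shows the word substituted in the STATE column.

```python
import re

# Assumption: disk rows start with an ID such as u1d1, followed by the
# STATUS and STATE columns, as in the example output above.
DISK_ROW = re.compile(r"^(u\d+d\d+)\s+(\S+)\s+(\S+)")


def parse_disk_states(fru_stat_output):
    """Return {disk_id: (status, state)} for each disk row in fru stat output."""
    disks = {}
    for line in fru_stat_output.splitlines():
        match = DISK_ROW.match(line.strip())
        if match:
            disks[match.group(1)] = (match.group(2), match.group(3))
    return disks


def next_step(disks):
    """Map the drive states to the next step number, following the rules in Step 2."""
    abnormal = {d: s for d, s in disks.items() if s != ("ready", "enabled")}
    # All drives ready-enabled, or more than one drive in another state: Step 5.
    if len(abnormal) != 1:
        return 5
    status, state = next(iter(abnormal.values()))
    if state == "substituted":
        return 7
    if (status, state) == ("ready", "disabled"):
        return 3
    if (status, state) == ("fault", "disabled"):
        return 4
    return None  # combination not covered by the rules in this document


SAMPLE = """\
u1d1    ready    enabled     data disk   ready      ready      26    v0
u1d2    ready    disabled    data disk   ready      ready      35    v0
u1d3    ready    enabled     data disk   ready      ready      42    v0
"""

if __name__ == "__main__":
    print("Go to Step", next_step(parse_disk_states(SAMPLE)))
```

A return value of None means the drive state is not covered by the rules in this step and should be evaluated manually.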

3. Validate local and/or global hot spare presence and state.

Verify the presence and status of a hot spare by:
a) executing the command vol list to confirm that a local hot spare exists under the column "standby"
b) executing the command global_standby list to confirm that a global hot spare exists

The command global_standby list is not available on arrays running firmware earlier than version 3.x. Use the command ver to check the firmware version.

If a hot spare is present, go to Step 4 to verify whether a reconstruction is in progress.
If there are no hot spares configured, go to Step 8.

4. Verify the presence of an ongoing reconstruction by:

Executing the command proc list to confirm the existence of a process vol recon

Example:
myarray:/:<1>proc list
VOLUME          CMD_REF PERCENT    TIME COMMAND
tray0_pool1             21568      74 53928:47 vol verify
tray1_pool2             25666      27 178:04 vol recon <--- reconstruction process.

If there is no reconstruction process, AND the drive is ready-disabled, go to Step 8.
If there is no reconstruction process, AND the drive is fault-substituted, go to Step 7.
If a reconstruction is ongoing, allow it to complete before proceeding, and then re-evaluate the drive status in Step 2.
If there is no ongoing reconstruction and the drive isn't in a fault-substituted state, go to Step 8.
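
As a rough sketch, the check for a running reconstruction can be automated against saved proc list output. The function names volumes_reconstructing and step4_next are hypothetical, and the parser assumes the column order shown in the example above (VOLUME, CMD_REF, PERCENT, TIME, COMMAND).

```python
def volumes_reconstructing(proc_list_output):
    """Return the volumes that have a "vol recon" process in proc list output."""
    volumes = []
    for line in proc_list_output.splitlines():
        fields = line.split()
        # Assumed columns: VOLUME CMD_REF PERCENT TIME COMMAND (two words).
        if len(fields) >= 6 and fields[4:6] == ["vol", "recon"]:
            volumes.append(fields[0])
    return volumes


def step4_next(recon_volumes, status, state):
    """Apply the Step 4 rules; returns the number of the step to go to next."""
    if recon_volumes:
        return 2  # wait for the reconstruction to finish, then redo Step 2
    if (status, state) == ("fault", "substituted"):
        return 7
    return 8  # no reconstruction and the drive is not fault-substituted


SAMPLE = """\
VOLUME          CMD_REF PERCENT    TIME COMMAND
tray0_pool1             21568      74 53928:47 vol verify
tray1_pool2             25666      27 178:04 vol recon
"""

if __name__ == "__main__":
    recon = volumes_reconstructing(SAMPLE)
    print("Reconstructing:", recon)
    print("Go to Step", step4_next(recon, "fault", "disabled"))
```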

5. Check the status of the volume associated with the drive by executing the command vol stat.

myarray:/:<3>vol stat

v0 u1d1 u1d2 u1d3 u1d4 u1d5 u1d6 u1d7 u1d8 u1d9
mounted 0 0 0 0 0 0 0 0 0
myarray:/:<4>

If the volume is "mounted", but more than one drive has a non-zero status, go to Step 8.
If the volume is "unmounted", you have sustained a drive failure beyond the capabilities of the RAID level of the volume.
Otherwise continue to Step 6.
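
The vol stat check can be sketched in the same way. The names parse_vol_stat and step5_next are hypothetical, and the parser assumes the two-line layout shown above: a header line with the volume name followed by its drive IDs, then a line with the mount state followed by one status code per drive.

```python
import re

DISK_ID = re.compile(r"u\d+d\d+")


def parse_vol_stat(output):
    """Return {volume: (mount_state, {disk_id: status_code})} from vol stat output."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    volumes = {}
    for header, values in zip(rows, rows[1:]):
        # A header row is a volume name followed only by drive IDs.
        is_header = len(header) > 1 and all(DISK_ID.fullmatch(t) for t in header[1:])
        if is_header and len(values) == len(header):
            volumes[header[0]] = (values[0], dict(zip(header[1:], values[1:])))
    return volumes


def step5_next(mount_state, drive_codes):
    """Apply the Step 5 rules to one volume."""
    if mount_state == "unmounted":
        return "failure beyond RAID level"
    nonzero = [d for d, code in drive_codes.items() if code != "0"]
    if mount_state == "mounted" and len(nonzero) > 1:
        return "step 8"
    return "step 6"


SAMPLE = """\
v0 u1d1 u1d2 u1d3 u1d4 u1d5 u1d6 u1d7 u1d8 u1d9
mounted 0 0 0 0 0 0 0 0 0
"""

if __name__ == "__main__":
    for name, (state, codes) in parse_vol_stat(SAMPLE).items():
        print(name, "->", step5_next(state, codes))
```

The status codes are compared as strings, so the sketch only distinguishes zero from non-zero and does not interpret individual codes.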

6. Check the LEDs of the disk drive that is in the ready-enabled state.

If an amber fault LED or any other LED is lit for the disk drive, go to Step 8.
If no LEDs are lit, you have verified that the disk drive is healthy.

7. You have validated that a drive has failed in the array, and requires replacement.

Collect the following information and contact Oracle Support for a drive replacement:

The output of:
fru stat
vol stat
proc list
fru list

OR

Collect the array data from a Solaris host by running: /opt/SUNWexplo/explorer -w !default,t3extended

8. At this point, if you have validated that each troubleshooting step above is true for your environment, and the issue still exists, further troubleshooting is required.

Please open a Service Request with Oracle Support.

Please include:

A statement of the symptoms you see that pertain to the disk drive
The array data collected from a Solaris host by running: /opt/SUNWexplo/explorer -w !default,t3extended

T3, T3+, normalized, failed hard drive, vol verify, multiple disk failure, Audited
Previously Published As
86534

Change History
Date: 2007-11-13
User Name: 7058
Action: Approved
Comment: Internal link referenced in external section.
Fixed.
Version: 7
Date: 2007-11-13
User Name: 7058
Action: Update Started
Comment: Fix link
Version: 0
Date: 2007-07-16
User Name: 7058
Action: Approved
Comment: Notes for Normalizaton:
Subset of: N/A
Subset Root path: N/A
References: 86540, 52569
Project: Minnow Normalization
