
Asset ID: 1-72-1601641.1
Update Date: 2017-09-19

Solution Type: Problem Resolution

Solution 1601641.1: Sun Storage 3000 Arrays: How to Resolve the Degraded Status of a Redundant (RAID-5 or RAID-1) Logical Drive


Related Items
  • Sun Storage 3511 SATA Array
  • Sun Storage 3310 SCSI Array
  • Sun Storage 3510 FC Array
  • Sun Storage 3320 SCSI Array

Related Categories
  • PLA-Support>Sun Systems>DISK>Arrays>SN-DK: SE31xx_33xx_35xx




In this Document
Symptoms
Changes
Cause
Solution
References


Applies to:

Sun Storage 3511 SATA Array - Version Not Applicable and later
Sun Storage 3320 SCSI Array - Version Not Applicable and later
Sun Storage 3310 SCSI Array - Version Not Applicable and later
Sun Storage 3510 FC Array - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms

This document provides the procedure to return a DEGRADED logical drive on a Sun Storage 3310, 3320, 3510, or 3511 array to a GOOD status. The information in this document is primarily for arrays running 3.x firmware, but it may be used on an array running 4.x firmware provided the initial conditions are identical. Otherwise, for arrays running 4.x firmware, follow the procedure in "Rebuilding a Logical Drive" in the Sun StorEdge™ 3000 Family RAID Firmware 4.2x User's Guide.

CAUTION: Use this procedure only when a logical drive is in the DEGRADED status and no spare drive is available for the rebuild. Do not use it when the logical drive is in the DEAD status: applying it to a dead logical drive may result in data loss. Verify the status with show logical-drives (shown below) before proceeding.

The problem reveals itself in the data collected from the sccli interface. In this example, there are two degraded RAID-5 logical drives. show logical-drives reports one failed disk in each logical drive, and show disks shows which disks are still ONLINE. However, nothing in the output explicitly identifies the failed disks; the only clue is the two disks reported as USED with no logical drive assignment (LD column NONE).
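To collect this data, open an sccli session against the array. Both invocations below are sketches with a hypothetical device path and IP address; substitute your own:

   # sccli /dev/rdsk/c1t0d0s2      (in-band, through a host data path)
   # sccli 192.168.0.100           (out-of-band, through the management LAN)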


sccli> show disks
Ch     Id      Size   Speed  LD     Status     IDs                      Rev
----------------------------------------------------------------------------
 2(3)   0  136.73GB   200MB  NONE    USED     SEAGATE ST314680FSUN146G 0207
                                                   S/N 3HY065RD00007308
                                                  WWNN 20000004CF564B0F
 2(3)   1  136.73GB   200MB  ld0    ONLINE     SEAGATE ST314680FSUN146G 0207
                                                   S/N 3HY0CGR100007328
                                                  WWNN 20000004CFD8E546
 2(3)   2  136.73GB   200MB  ld0    ONLINE     SEAGATE ST314680FSUN146G 0307
                                                   S/N 3HY0DY4100007328
                                                  WWNN 20000004CFD8E21C
 2(3)   3  136.73GB   200MB  NONE    USED      SEAGATE ST314680FSUN146G 0207
                                                   S/N 3HY0DYZT00007327
                                                  WWNN 20000004CFD8DF4A
 2(3)   4  136.73GB   200MB  ld1    ONLINE     SEAGATE ST314680FSUN146G 0307
                                                   S/N 3HY0DYNF00007327
                                                  WWNN 20000004CFD8DF88
 2(3)   5  136.73GB   200MB  ld1    ONLINE     SEAGATE ST314680FSUN146G 0307
                                                   S/N 3HY0DX3N00007328
                                                  WWNN 20000004CFD8E010

sccli> show logical-drives
LD    LD-ID        Size  Assigned  Type   Disks Spare  Failed Status
------------------------------------------------------------------------
ld0   66EAFEB0   8.00GB  Primary   RAID5  2     0      1    Degraded
                         Write-Policy Default          StripeSize 128KB
ld1   39FCC1C4  15.65GB  Primary   RAID5  2     0      1    Degraded
                         Write-Policy Default          StripeSize 128KB


 

Changes

There have been no changes per se. The problem manifests itself through a series of events on the array. The event logs may show disk failures, drive channel failures, and loop initializations (LIPs), usually followed by a controller reset. The exact timing and combination of these events varies. However, when the array comes back online, the result is a degraded logical drive with no associated failed disks.
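The sequence of events can be reviewed in the controller event log. From sccli the log is displayed with show events (a command not shown elsewhere in this document, so verify it against your CLI version); the firmware application presents the same log under "view and edit Event logs":

   sccli> show events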

Cause

The theory behind this problem is that the controllers temporarily lost access to the disks. It may be an intermittent disk problem, or even a data path problem. If show disks output had been captured in real time (while the problem was occurring), it would probably have reported the drives as BAD or ABSENT.

sccli> show disks
Ch     Id      Size   Speed  LD     Status     IDs                      Rev
----------------------------------------------------------------------------
 2      0       N/A   N/A    ld0    BAD        SEAGATE ST314680FSUN146G 0207
                                                   S/N 3HY065RD00007308
 2      3       N/A   N/A    ld1    BAD        SEAGATE ST314680FSUN146G 0207
                                                   S/N 3HY0DYZT00007327

If the event logs are still reporting disk errors and failures even after a controller reset, see Document 1008190.1, Sun Storage 3000 Arrays: Troubleshooting Disk Failures.

When the controller resets, the BAD drives are re-discovered. The metadata on each drive identifies it as a member of the logical drive, but the controller knows the data may be stale. Instead of incorporating the disk back into the logical drive and risking data corruption, it marks the disk as USED or FRMT.

sccli> show disks
Ch     Id      Size   Speed  LD     Status     IDs                      Rev
----------------------------------------------------------------------------
 2(3)   0  136.73GB   200MB  NONE    USED     SEAGATE ST314680FSUN146G 0207
                                                   S/N 3HY065RD00007308
                                                  WWNN 20000004CF564B0F
 2(3)   3  136.73GB   200MB  NONE    USED      SEAGATE ST314680FSUN146G 0207
                                                   S/N 3HY0DYZT00007327
                                                  WWNN 20000004CFD8DF4A

 

Solution

You can repair the logical drive with either the sccli interface or the tip/telnet interface to the array. Examples of both are provided.

Here, the sccli interface is used to repair logical drive ld0. Disk 2.0, the USED disk that was formerly a member of ld0, is configured as a local spare, followed by the command to rebuild. You can then monitor the progress; a host-side polling sketch follows the output below.

sccli> configure local-spare 2.0 ld0
sccli> rebuild ld0
sccli> show logical-drives rebuilding
       LD      LD-ID     Status
       ------------------------
       ld0     66EAFEB0  7% complete

sccli> show logical-drives
       LD    LD-ID        Size  Assigned  Type   Disks Spare  Failed Status
       ------------------------------------------------------------------------
       ld0   66EAFEB0   8.00GB  Primary   RAID5  2     0      1    Rebuilding
                                Write-Policy Default          StripeSize 128KB
       ld1   39FCC1C4  15.65GB  Primary   RAID5  2     0      1    Degraded
                                Write-Policy Default          StripeSize 128KB

sccli> show disks
       Ch     Id      Size   Speed  LD     Status     IDs                      Rev
       ----------------------------------------------------------------------------
       2(3)   0  136.73GB   200MB  ld0    REBUILD    SEAGATE ST314680FSUN146G 0207
                                                         S/N 3HY065RD00007308
                                                        WWNN 20000004CF564B0F
       2(3)   1  136.73GB   200MB  ld0    ONLINE     SEAGATE ST314680FSUN146G 0207
                                                         S/N 3HY0CGR100007328
                                                        WWNN 20000004CFD8E546
       2(3)   2  136.73GB   200MB  ld0    ONLINE     SEAGATE ST314680FSUN146G 0307
                                                         S/N 3HY0DY4100007328
                                                        WWNN 20000004CFD8E21C
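
A rebuild can take several hours, depending on drive size and host I/O load. The loop below is a host-side sketch for polling the progress; it assumes sccli accepts the array address and a single command non-interactively, and 192.168.0.100 is a hypothetical address:

   # while true
   > do
   >     sccli 192.168.0.100 show logical-drives rebuilding
   >     sleep 300
   > done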


Similarly, you can rebuild logical drive ld1 with the firmware application, by creating a local spare and starting a rebuild.
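To reach the firmware application, connect to the array's serial port with tip, or telnet to the array's management address (192.168.0.100 below is a hypothetical example):

   # telnet 192.168.0.100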

 Create a Local Spare
   view and edit Drives->
   Select Drive 2.3->    Chl   ID Size(MB) Speed  LG_DRV  Status   Vendor  Product ID
                         2(3)  3  140009   200MB   NONE   USED DRV SEAGATE ST314680FSUN146G
   add Local spare drive ->
   Select P1 ->
   Add Local Spare Drive ? Yes

 Start your Rebuild
   view and edit Logical drives->
   Select P1 -> LG   ID       LV  RAID   Size(MB)  Status      1 2 3 O C #LN #SB #FL NAME
                P1   39FCC1C4 NA  RAID5  16024     DRV FAILED  i     7 T 2   0   1
   Rebuild logical drive ->
   Rebuild Logical Drive ? Yes

 Monitor your Rebuild
   view and edit Logical drives->
   Select P1 -> LG   ID       LV  RAID   Size(MB)  Status      1 2 3 O C #LN #SB #FL NAME
                P1   39FCC1C4 NA  RAID5  16024     DRV FAILED  i     7 T 2   0   1
   Rebuild progress  ->
   Rebuilding 28% Completed.
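
Once the rebuild reaches 100%, verify the repair with the same commands used to diagnose it; the logical drive should return to a Good status and the rebuilt disk to ONLINE:

   sccli> show logical-drives
   sccli> show disks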

 

