Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-1431648.1
Update Date:2017-11-28
Keywords:

Solution Type  Sun Alert Sure

Solution  1431648.1 :   Failed Disk in a ZFS Storage Pool May Erroneously Consume Multiple Spare Disks Which Fail to Detach after Resilvering  


Related Items
  • Sun Storage 7410 Unified Storage System
  • Sun ZFS Storage 7320
  • Sun Storage 7210 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Sun ZFS Storage 7420
  • Sun Software - Generic
  • Sun Storage 7110 Unified Storage System
  • Sun ZFS Storage 7120
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert
  • _Old GCS Categories>Sun Microsystems>Sun Alert>Release Phase>Resolved




In this Document
Description
Occurrence
Symptoms
Workaround
History
References


Applies to:

Sun ZFS Storage 7320 - Version Not Applicable and later
Sun ZFS Storage 7420 - Version Not Applicable and later
Sun Software - Generic - Version Not Applicable and later
Sun Microsystems > Storage - Disk > Unified Storage
Sun Microsystems > Storage Software
Information in this document applies to any platform.
_____________________________________



Date of Resolved Release: 07-Mar-2012

***Checked for relevance 09-Aug-2013***
_____________________________________

Description

Systems with a failed disk in a ZFS storage pool may erroneously consume multiple spare disks due to an interaction between the fault management infrastructure and the resilvering operation. These spares then fail to be detached after the resilvering process has completed, and will not be available for subsequent failures. The resulting ZFS storage pool structure then needs to be manually cleaned up to make the spare disks available again.

Occurrence

This issue can occur in the following releases:

  • Sun ZFS Storage Appliance Software 2010.Q3.1.0 to 2010.Q3.3.1

Notes:

1. This issue applies to all Sun ZFS Storage Appliance platforms:

  • Sun ZFS Storage Appliance 7110, 7120, 7210, 7310, 7410, 7320, 7420

2. To determine the software release on an appliance, use the Browser User Interface (BUI) as follows:

a) Navigate to: Maintenance -> System
b) Click the "i" (info) icon next to the "Current System Software" entry in the table of available releases.

A pop-up will show the release. For example: "2010.Q3.3.1"
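
The affected range above can be checked mechanically once the release name is known. The following is a minimal sketch, not an Oracle tool: the function names are hypothetical, and it assumes release names follow the four-part "YYYY.Qn.x.y" pattern shown in the pop-up example (the "Q" is treated as optional).

```python
import re

# Affected range as stated in this alert (inclusive at both ends).
FIRST_AFFECTED = (2010, 3, 1, 0)   # 2010.Q3.1.0
LAST_AFFECTED = (2010, 3, 3, 1)    # 2010.Q3.3.1


def parse_release(name):
    """Turn a release name such as '2010.Q3.3.1' into a comparable tuple."""
    # Assumption: four numeric parts, the second optionally prefixed with 'Q'.
    m = re.fullmatch(r"(\d{4})\.Q?(\d+)\.(\d+)\.(\d+)", name.strip())
    if m is None:
        raise ValueError(f"unrecognized release name: {name!r}")
    return tuple(int(part) for part in m.groups())


def is_affected(name):
    """True if the release lies within the affected range given in this alert."""
    return FIRST_AFFECTED <= parse_release(name) <= LAST_AFFECTED


if __name__ == "__main__":
    for release in ("2010.Q3.3.1", "2010.Q3.4.0"):
        print(release, "affected" if is_affected(release) else "not affected")
```

Comparing tuples of integers avoids the pitfalls of comparing version strings character by character.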

Symptoms

The following symptoms and contributing conditions may be associated with this issue:

- Extremely long resilver times for the replaced disk
- Incorrect maintenance procedures such as physically removing drives without the prescribed administrative commands
- Failing components that cause multiple errors and faults

The following is example "zpool status" output that shows the problem. In this example, the disk referenced as "c4t5000C5001A654F3Cd0" has failed and been physically replaced. The spare disk referenced as "c4t5000C5001A48520Dd0" was first used to replace the failed disk; subsequently, the spare disk referenced as "c4t5000C5001A402290d0" was also erroneously added to the ZFS storage pool configuration.

Note: Many forms of the invalid pool structure are possible.

In the following example, note that resilvering of the pool was still in progress at the time that the second spare was created:

pool: Pool_1
state: DEGRADED
scan: resilvered 42.7G in 2h18m with 0 errors on Fri Nov 12 12:03:57 2010
config:
NAME                                           STATE     READ WRITE CKSUM
   Pool_1                                    DEGRADED     0     0     0
raidz2-0                                     DEGRADED     0     0     0
  c4t5000C5001A6FBE6Ed0                      ONLINE       0     0     0
  c4t5000C5001A63F66Ad0                      ONLINE       0     0     0
  c4t5000C5001A69DECFd0                      ONLINE       0     0     0
  c4t5000C5001A69FE19d0                      ONLINE       0     0     0
  c4t5000C5001A71FB32d0                      ONLINE       0     0     0
  c4t5000C5001A533E93d0                      ONLINE       0     0     0
  c4t5000C50026518A89d0                      ONLINE       0     0     0
  c4t5000C5001A534BB6d0                      ONLINE       0     0     0
  c4t5000C5001A651A9Fd0                      ONLINE       0     0     0
  spare-9                                    DEGRADED     0     0     0
    replacing-0                              DEGRADED     0     0     0
      spare-0                                DEGRADED     0     0     0
        c4t5000C5001A654F3Cd0                UNAVAIL      0     0     0  cannot open
        c4t5000C5001A402290d0                ONLINE       0     0     0
      c4t5000C5001A0FE201d0                  ONLINE       0     0     0
    c4t5000C5001A48520Dd0                    ONLINE       0     0     0
raidz2-1                                     ONLINE       0     0     0
  c4t5000C5001A655CEAd0                      ONLINE       0     0     0
  c4t5000C5001A730BA4d0                      ONLINE       0     0     0
  c4t5000C5001A5347F1d0                      ONLINE       0     0     0
  c4t5000C5001A5352F4d0                      ONLINE       0     0     0
  c4t5000C5001A5354B6d0                      ONLINE       0     0     0
  c4t5000C5001A5365E6d0                      ONLINE       0     0     0
  c4t5000C50026520201d0                      ONLINE       0     0     0
  c4t5000C5001A6980EDd0                      ONLINE       0     0     0
  c4t5000C5001A7311D5d0                      ONLINE       0     0     0
  c4t5000C5001A7386D7d0                      ONLINE       0     0     0
logs
mirror-2                                     ONLINE       0     0     0
  c4tATASTECZEUSIOPS018GBYTESSTM0000C00D8d0  ONLINE       0     0     0
  c4tATASTECZEUSIOPS018GBYTESSTM0000E5994d0  ONLINE       0     0     0
cache
c0t0d0                                       ONLINE       0     0     0
c0t1d0                                       ONLINE       0     0     0

spares
c4t5000C5001A48520Dd0                        INUSE     currently in use
c4t5000C5001A402290d0                        INUSE     currently in use
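
The pattern in the output above (a "spare-N" vdev nested inside another "spare-N" vdev, with more than one spare marked INUSE) can be looked for in captured "zpool status" text. The following is a minimal sketch under stated assumptions: the function names are hypothetical, nesting is inferred purely from line indentation, and the sample is an excerpt of the output shown in this alert. As noted above, many forms of the invalid pool structure are possible, so this heuristic will not catch all of them.

```python
import re

SPARE_VDEV = re.compile(r"^spare-\d+$")

# Excerpt of the "zpool status" output shown in this alert.
SAMPLE = """\
  spare-9                                    DEGRADED     0     0     0
    replacing-0                              DEGRADED     0     0     0
      spare-0                                DEGRADED     0     0     0
        c4t5000C5001A654F3Cd0                UNAVAIL      0     0     0  cannot open
        c4t5000C5001A402290d0                ONLINE       0     0     0
      c4t5000C5001A0FE201d0                  ONLINE       0     0     0
    c4t5000C5001A48520Dd0                    ONLINE       0     0     0
spares
c4t5000C5001A48520Dd0                        INUSE     currently in use
c4t5000C5001A402290d0                        INUSE     currently in use
"""


def spares_in_use(status_text):
    """Return device names listed under 'spares' whose state is INUSE."""
    in_spares = False
    found = []
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) == 1:
            # Single-word lines are section headers such as 'logs' or 'spares'.
            in_spares = fields[0] == "spares"
            continue
        if in_spares and fields[1] == "INUSE":
            found.append(fields[0])
    return found


def nested_spare_vdevs(status_text):
    """Return (outer, inner) pairs where one spare-N vdev sits inside another."""
    stack = []   # (indent, name) of the spare-N vdevs enclosing the current line
    pairs = []
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        indent = len(line) - len(line.lstrip())
        # Leave any spare vdev that this line is no longer indented beneath.
        while stack and indent <= stack[-1][0]:
            stack.pop()
        if SPARE_VDEV.match(fields[0]):
            if stack:
                pairs.append((stack[-1][1], fields[0]))
            stack.append((indent, fields[0]))
    return pairs


if __name__ == "__main__":
    print("spares in use:", spares_in_use(SAMPLE))
    print("nested spare vdevs:", nested_spare_vdevs(SAMPLE))
```

A non-empty result from nested_spare_vdevs, together with multiple INUSE spares, is only a hint that the pool should be examined; more than one spare can legitimately be in use when several disks have failed.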

Workaround

There is no workaround for this issue.

This issue is addressed in the following release:

For all Sun Storage 7000 Series Unified Storage Systems

  • Sun ZFS Storage Appliance Software 2010.Q3.4.0 and later

For a listing of ZFS Storage Appliance Software Releases and version information, please see <Document:2021771.1>

History

07-Mar-2012: Date of Resolved Release
09-Aug-2013: Checked for currency/relevance; no change in content


See also CR 6981518 (integrated into ak-2010Q3.2.1):

    - The fix for 6981518 addressed the greatest part of the failure window.

    - The fix for 6999699 fixes other code paths which could cause the same result.

On some systems (e.g., ZFSSA), further steps may be required.

In the following example, "c0t5000C500104FF387d0" would be replaced with the disk ID as reported by zpool status, and the subsequent values returned by mdb will differ from system to system.

An example of how to find the GUID of the drive:

  $>mdb -k
  > ::spa -v ! grep c0t5000C500104FF387d0
    ffffff84dc1992c0 CANT_OPEN OPEN_FAILED        /dev/dsk/c0t5000C500104FF387d0s0
  > ffffff84dc1992c0::print vdev_t vdev_guid | =E
               782195572428017554

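The zpool command to detach the device, using the GUID returned by mdb (the pool in this example is named "pool01a"), is: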

zpool detach pool01a 782195572428017554
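
Because the GUID is copied by hand from mdb output into the zpool command, a small sanity check can help avoid transcription mistakes. The following is a minimal sketch, not part of any Oracle tooling: the function name is hypothetical, it only builds the argument list and does not run anything, and it assumes the GUID is the unsigned 64-bit decimal value printed by "=E".

```python
def build_detach_command(pool, guid):
    """Return the argument list for 'zpool detach <pool> <guid>' without running it."""
    guid_text = str(guid).strip()
    # Assumption: the GUID is the unsigned 64-bit decimal value printed by mdb.
    if not (guid_text.isascii() and guid_text.isdigit()):
        raise ValueError(f"GUID must be a decimal number: {guid!r}")
    if not 0 < int(guid_text) < 2**64:
        raise ValueError(f"GUID is outside the unsigned 64-bit range: {guid!r}")
    if not pool or any(ch.isspace() for ch in pool):
        raise ValueError(f"unexpected pool name: {pool!r}")
    return ["zpool", "detach", pool, guid_text]


if __name__ == "__main__":
    # Values taken from the example in this alert.
    print(" ".join(build_detach_command("pool01a", "782195572428017554")))
```

Returning an argument list rather than a single string keeps the pool name and GUID as separate arguments should the command later be passed to a process launcher.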

So far this has not been reported on Solaris.

Solaris 11: The resilver and disk replacement code was first putback into build snv_143, and this issue was fixed in build snv_152.

Please send technical questions to:
sunalertpublication_us_grp@oracle.com
and copy the Responsible Engineer/Contributor listed

Internal Eng Business Unit Group: Systems RPE
Oracle Knowledge Analyst: david.mariotto@oracle.com
Internal Contributor/Submitter: William.D.Johnston@oracle.com
Internal Eng Responsible Engineer: William.D.Johnston@oracle.com
Internal Services Knowledge Engineer: david.mariotto@oracle.com
Internal Pending Patches: N/A
Internal Resolution Patches:N/A
Internal Escalation IDs:
3-2595143342, 3-2741145531, 3-3042257831, 3-3203745781, 3-3216724052,
3-3225248231, 3-3259602063, 3-3287004951, 3-3314126521, 3-3332566033,
3-3332573361, 3-3337778356, 3-3348689479, 3-3354554281, 3-3379144221,
3-3396092301, 3-3409014410, 3-3409089446, 3-3420341331, 3-3424489243,
3-3424569914, 3-3425870723, 3-3428019116, 3-3433841401

References

<NOTE:2021771.1> - Oracle ZFS Storage Appliance: Software Updates

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.