ZFS Storage Appliance (ZFSSA) may not Bring in a Hot Spare or Create an Alert when a Disk Goes Into UNAVAILABLE State

Asset ID:	1-72-2291468.1
Update Date:	2017-07-28
Keywords:

Solution Type Problem Resolution Sure

Solution 2291468.1 : ZFS Storage Appliance (ZFSSA) may not Bring in a Hot Spare or Create an Alert when a Disk Goes Into UNAVAILABLE State

Applies to:

Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS4-4 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS5-4 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

ZFS marks the disk as "Unavailable" but does not create a fault and does not invoke a hot spare, so the disk is left in the "Unavailable" state.

On the ZFS Storage Appliance, this condition will be seen as a degraded pool without corresponding problems or faults.

You can use the attached workflow to determine if any of your filesystems are affected by this issue. Download the file, navigate to Maintenance/Workflows in the BUI, and then click the "+" symbol to upload the workflow to the ZFSSA. Once it is uploaded, double-click the workflow in the BUI to check your system.

Changes

This problem is known to occur on software versions 8.6 (Version string ak-2013.06.05.6.x) and below.

Cause

Due to a race condition in the affected versions of the software (Bug #22067574), ZFS may mark a disk as "Unavailable" if it is unable to open or probe that disk. Once the disk is in this state, the system does not create a fault or invoke a spare.

Solution

Upgrade to the latest version of the ZFS appliance software. This was fixed in software version 8.7.0 (version string 2013.06.05.7.x).
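As a rough illustration of the version boundary described in this note, the following Python sketch classifies a version string as affected or fixed. The parsing rules are an assumption: it takes the first four dot-separated numeric fields, ignores an optional "ak-" prefix and anything after a comma, and compares against 2013.06.05.7. The function names are hypothetical and are not part of the appliance software.

```python
# Minimal sketch: classify an appliance version string against the fix
# boundary given in this note (fixed in 2013.06.05.7.x, i.e. 8.7.0).
# The string layout handled here is an assumption, not an official format.

FIXED_IN = (2013, 6, 5, 7)


def parse_ak_version(version):
    """Return the first four numeric fields of a version string as a tuple."""
    core = version.strip()
    if core.startswith("ak-"):
        core = core[3:]
    # Drop any build suffix after a comma, e.g. "2013.06.05.7.0,1-1.x".
    core = core.split(",", 1)[0]
    fields = core.split(".")
    if len(fields) < 4 or not all(f.isdigit() for f in fields[:4]):
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(f) for f in fields[:4])


def is_affected(version):
    """True if the version sorts before the release that carries the fix."""
    return parse_ak_version(version) < FIXED_IN


if __name__ == "__main__":
    for v in ("ak-2013.06.05.6.x", "2013.06.05.7.0,1-1.x"):
        print(v, "affected" if is_affected(v) else "fixed")
```

A version that this sketch reports as affected should still be confirmed against the appliance's own version display before planning an upgrade.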

For customers unable to immediately upgrade to a version which resolves this issue, contact Oracle Support for a workaround. However, it is strongly recommended to perform this upgrade as soon as possible, as the workaround will not prevent future occurrences.

Manual Recovery Procedure (for customers with affected disks who are unable to upgrade immediately).

1. Reboot (forces a re-probe) and re-check pool status.

2. Alternatively, zpool clear and replace can be used to attempt recovery.

2(a) zpool clear (without the -F or -n options) can be used to attempt recovery by forcing a re-probe, as in the following example:

Syntax: zpool clear [-F [-n]] pool [device]

Example :

# zpool status pool0
...
  mirror-4        DEGRADED     0     0     0
    c4t8d0        UNAVAIL    115    39     0
    c4t9d0        ONLINE       0     0     0
spares
  c4t11d0         AVAIL
  c4t6d0          AVAIL

# zpool clear pool0

This command initiates a re-probe and either brings the disk back online or triggers a spare. In the following status output, the spare c4t11d0 has been brought in and is resilvering:

# zpool status pool0
...
  mirror-4        DEGRADED     0     0     0
    spare-0       DEGRADED     0     0     0
      c4t8d0      UNAVAIL    115    39     0
      c4t11d0     DEGRADED     0     0     0  (resilvering)
    c4t9d0        ONLINE       0     0     0
spares
  c4t11d0         INUSE
  c4t6d0          AVAIL
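To judge whether the clear had any effect, the status output can be checked for devices that are still UNAVAIL and for spares that are now INUSE. The Python sketch below does this on captured text. It assumes the usual one-device-per-line layout of zpool status, with the device name in the first column and the state in the second. The sample text and the function name are illustrative only.

```python
# Minimal sketch: summarize captured "zpool status" text.
# Assumes one device per line: NAME STATE [READ WRITE CKSUM] [notes].

SAMPLE = """\
  mirror-4        DEGRADED     0     0     0
    spare-0       DEGRADED     0     0     0
      c4t8d0      UNAVAIL    115    39     0
      c4t11d0     DEGRADED     0     0     0  (resilvering)
    c4t9d0        ONLINE       0     0     0
spares
  c4t11d0         INUSE
  c4t6d0          AVAIL
"""


def summarize_status(text):
    """Collect UNAVAIL devices, in-use spares, and resilvering devices."""
    unavail, spares_in_use, resilvering = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # section labels such as "spares", or blank lines
        name, state = fields[0], fields[1]
        if state == "UNAVAIL":
            unavail.append(name)
        elif state == "INUSE":
            spares_in_use.append(name)
        if "(resilvering)" in line:
            resilvering.append(name)
    return {
        "unavail": unavail,
        "spares_in_use": spares_in_use,
        "resilvering": resilvering,
    }


if __name__ == "__main__":
    print(summarize_status(SAMPLE))
```

An UNAVAIL device with no spare in use matches the symptom described in this note. An UNAVAIL device alongside an INUSE spare indicates that the spare was engaged.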

If this does not initiate a re-probe, try the following:

# zpool clear -F -n pool0

With "-n", this command checks whether discarding the last few transactions would return the pool to an openable state, but it does not actually discard any transactions.

To perform the recovery, run the command without "-n" as follows:

# zpool clear -F pool0

If this also does not work, proceed to the next step:

2(b) Run 'zpool replace' to invoke a spare manually, as in the following example :

Syntax: zpool replace [-f] pool old_device [new_device]

Example :

# zpool replace -f pool0 c4t8d0 c4t11d0
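The escalation order in steps 2(a) and 2(b) can be summarized in a short Python sketch that only builds the command lines and does not execute anything. It uses only the options shown in this note. The function name and the step descriptions are illustrative, and the pool and device names are the ones from the examples above.

```python
# Minimal sketch: list the recovery commands from this note in the order
# they are attempted. Nothing is executed; the commands are only built.


def recovery_steps(pool, failed_dev, spare_dev):
    """Return (description, argv) pairs in escalation order."""
    return [
        ("force a re-probe",
         ["zpool", "clear", pool]),
        ("check whether discarding recent transactions would help (dry run)",
         ["zpool", "clear", "-F", "-n", pool]),
        ("recover by discarding recent transactions",
         ["zpool", "clear", "-F", pool]),
        ("invoke a spare manually",
         ["zpool", "replace", "-f", pool, failed_dev, spare_dev]),
    ]


if __name__ == "__main__":
    for description, argv in recovery_steps("pool0", "c4t8d0", "c4t11d0"):
        print("%s:\n    # %s" % (description, " ".join(argv)))
```

Each step is attempted only if the previous one did not bring the disk back online or engage a spare, with pool status re-checked in between.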

Note: Even if the zpool clear/replace workaround is successful, customers on the ZFSSA platform should upgrade to 8.7 at the earliest opportunity to avoid this issue in the future.

Attachments

This solution has no attachment