Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2064305.1
Update Date: 2016-02-10
Keywords:

Solution Type: Problem Resolution Sure Solution

Solution 2064305.1: Oracle ZFS Storage Appliance: AK Software (and Solaris 10/11) may intermittently experience Write Hangs during Snapshot Destroys or Replication


Related Items
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS4-4
  • Sun ZFS Storage 7120

Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Changes
Cause
Solution
References


Applies to:

Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS4-4 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-4 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

Input/output activity to a ZFS pool appears to hang and the pool is non-responsive. On ZFS Storage Appliance (ZFSSA) arrays, the appliance itself may appear unresponsive.

 

Changes

There is known snapshot deletion activity, or replication activity is in progress.

 

Cause

When large but empty ZFS snapshots are destroyed, writes to the ZFS pool may hang until the snapshot destroy completes.

This activity may be triggered by the automatic snapshot deletions performed by replication.

 

Further analysis may be obtained using mdb:

# mdb -k
> ::stacks -c dsl_scan_sync

and looking for stacks similar to:

              zfs`dbuf_hold_impl+0x1
              zfs`dnode_hold_impl+0xc3
              zfs`dnode_hold+0x2b
              zfs`dmu_bonus_hold+0x32
              zfs`bpobj_open+0x6d
              zfs`bpobj_iterate_impl+0x35d
              zfs`bpobj_iterate_impl+0x3ff
              zfs`bpobj_iterate_impl+0x3ff
              zfs`bpobj_iterate_impl+0x3ff
              zfs`bpobj_iterate_impl+0x3ff
              zfs`bpobj_iterate_impl+0x3ff
              zfs`bpobj_iterate_impl+0x3ff
              zfs`bpobj_iterate_impl+0x3ff
              zfs`bpobj_iterate+0x23
              zfs`dsl_scan_sync+0x11b
              zfs`spa_sync+0x447
              zfs`txg_sync_thread+0x244
              unix`thread_start+0x8

The exact number of "bpobj_iterate_impl" entries may vary greatly, but repeated runs of the command will show the spa_sync thread in bpobj_iterate for an extended period of time.
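As a rough illustration, the depth of the iteration can be gauged by counting the repeated bpobj_iterate_impl frames in successive ::stacks dumps. The helper and sample text below are illustrative only (not part of any Oracle tool), and the offsets are abbreviated from the stack shown above:

```python
def count_bpobj_frames(stack_text):
    # Count bpobj_iterate_impl frames in one ::stacks dump; a count
    # that stays high across repeated runs suggests the txg_sync
    # thread is still iterating a large snapshot-destroy bpobj tree.
    return sum(1 for line in stack_text.splitlines()
               if "bpobj_iterate_impl" in line)

sample = """\
zfs`bpobj_open+0x6d
zfs`bpobj_iterate_impl+0x35d
zfs`bpobj_iterate_impl+0x3ff
zfs`bpobj_iterate_impl+0x3ff
zfs`bpobj_iterate+0x23
zfs`dsl_scan_sync+0x11b
"""
print(count_bpobj_frames(sample))  # prints 3
```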

Solution

There is no workaround for this issue.

There is no way to know in advance whether a given snapshot will cause this behavior when destroyed, or, once the destroy has started, how long it will take to complete.

The resolution is to apply the fix. Any future import of the pool on a release without the fix will resume the destroy.

On a clustered ZFSSA, do not NMI the affected node: that simply causes the destroy to continue when the pool is imported on the other cluster node.

On a single node system, you can:
- boot the system to milestone=none
- on a ZFSSA, move the stash entries for disk/zfs to /var/ak/dropbox
- reboot to milestone=none
- upgrade to a version containing the fix
- import the pools once upgrade is complete



The complete procedure is beyond the scope of this article; contact support (TSC/RPE) as needed.

This issue is addressed in the following releases:

SPARC Platform:
    Solaris 11.3 resolution is pending; the fix is expected in the s11u3sru2 release.

x86 Platform:
    Solaris 11.3 resolution is pending; the fix is expected in the s11u3sru2 release.

ZFSSA Platform:
    2013.1.4.6 (version string 2013.06.05.4.6,1-1.1) or later
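When checking whether a particular appliance build already contains the fix, the dotted release portion of the version string can be compared numerically, field by field. A minimal sketch, assuming the version-string layout shown above (the helper name is hypothetical; confirm the installed version via the appliance BUI/CLI):

```python
def has_fix(version, fixed="2013.06.05.4.6"):
    # Compare only the dotted release portion (before the comma),
    # numerically field by field; "or later" releases compare higher.
    parse = lambda v: [int(x) for x in v.split(",")[0].split(".")]
    return parse(version) >= parse(fixed)

print(has_fix("2013.06.05.4.6,1-1.1"))  # prints True
print(has_fix("2013.06.05.4.2,1-1.1"))  # prints False
```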



 

References

<BUG:20693077> - SNAPSHOT DELETES DON'T CHECK TIMEOUT WHILE ITERATING

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.