Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution 2064305.1: Oracle ZFS Storage Appliance: AK Software (and Solaris 10/11) may intermittently experience Write Hangs during Snapshot Destroys or Replication

Solution Type: Problem Resolution (Sure Solution)
Applies to:

Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS4-4 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-4 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

Input/output activity to a ZFS pool appears to hang and the pool is non-responsive. On ZFSSA arrays the array itself may appear unresponsive.
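As a quick check of the symptom from a Solaris shell (a sketch only, not part of the original note; "pool1" is a hypothetical pool name, and on a ZFS Storage Appliance the equivalent information is available from the Analytics I/O statistics), watching pool I/O at a short interval will typically show write operations and write bandwidth at or near zero while applications remain blocked:

# zpool iostat pool1 5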
Changes

There is known snapshot deletion activity, or there is replication activity in progress.
Cause

When large but empty ZFS snapshots are destroyed, writes to the ZFS pool may hang until the snapshot destroy is complete. This activity may be triggered by the automatic snapshot deletes performed by replication.
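For illustration only (the pool, dataset, and snapshot names below are hypothetical, and this sketch is not part of the original note): the space a snapshot appears to occupy gives little indication of how long its destroy will run, because the time is dominated by iterating the snapshot's block-pointer objects (the repeated bpobj_iterate_impl frames in the stack shown below) rather than by the amount of unique data the snapshot holds.

# zfs list -t snapshot -o name,used,referenced -r pool1
# zfs destroy pool1/data@backup-2015-06-01

A snapshot reporting only a few kilobytes of USED space can therefore still trigger a lengthy, write-blocking destroy.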
Further analysis may be obtained using mdb:

# mdb -k
> ::stacks -c dsl_scan_sync

and looking for stacks similar to:

zfs`dbuf_hold_impl+0x1
zfs`dnode_hold_impl+0xc3
zfs`dnode_hold+0x2b
zfs`dmu_bonus_hold+0x32
zfs`bpobj_open+0x6d
zfs`bpobj_iterate_impl+0x35d
zfs`bpobj_iterate_impl+0x3ff
zfs`bpobj_iterate_impl+0x3ff
zfs`bpobj_iterate_impl+0x3ff
zfs`bpobj_iterate_impl+0x3ff
zfs`bpobj_iterate_impl+0x3ff
zfs`bpobj_iterate_impl+0x3ff
zfs`bpobj_iterate_impl+0x3ff
zfs`bpobj_iterate+0x23
zfs`dsl_scan_sync+0x11b
zfs`spa_sync+0x447
zfs`txg_sync_thread+0x244
unix`thread_start+0x8

The exact number of "bpobj_iterate_impl" entries may vary greatly, but repeated runs of the command will show the spa_sync thread in bpobj_iterate for an extended period of time.
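To take the repeated samples mentioned above without an interactive mdb session, a small loop such as the following can be used (a sketch only, not from the original note; it assumes root access on the affected Solaris host or appliance shell):

#!/bin/sh
# Sample the txg sync thread every 30 seconds. While the destroy is still
# running, each sample should continue to show dsl_scan_sync sitting in
# bpobj_iterate / bpobj_iterate_impl.
while true
do
    date
    echo "::stacks -c dsl_scan_sync" | mdb -k
    sleep 30
done

When the destroy finally completes, the dsl_scan_sync stack disappears from the output and writes to the pool resume.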
Solution

There is no workaround for this issue. There is no method to know in advance that a given snapshot will cause this behavior when destroyed, or, once started, how long the destroy will take to complete. The resolution is to apply the fix. Any future imports of the pool without the fix will continue the destroy.

On a clustered ZFSSA, do not NMI the node with the problem, as it will simply cause the destroy to continue when the pool is imported on the other cluster node. On a single node system, you can:
This issue is addressed in the following releases:
References

<BUG:20693077> - SNAPSHOT DELETES DON'T CHECK TIMEOUT WHILE ITERATING

Attachments

This solution has no attachment.