Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 2099739.1 : Oracle ZFS Storage Appliance: Unexpected Reboot on Failback during Replication
Created from <SR 3-12062049081>

Applies to:
Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-4 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms
The ZFS Storage Appliance may panic and reboot during a failback. No crash dump is collected when this happens.
LOGS:

AKD service logs:
PANIC: failed to export ak:/zfs/nfsmw04: cannot export 'nfsmw04': pool is busy

akd.ak.txt:
Sat Jan 23 06:20:37 2016: PANIC: failed to export ak:/zfs/nfsmw04: cannot export 'nfsmw04': pool is busy

rm.ak log:
Sat Jan 23 06:20:17 2016: export of ak:/ndmp/nfsmw04 succeeded in 0.062s
Sat Jan 23 06:20:18 2016: export of ak:/replication/nfsmw04 succeeded in 0.250s
Sat Jan 23 06:20:18 2016: export of ak:/smb/aggr2 succeeded in 0.149s
Sat Jan 23 06:20:18 2016: export of ak:/net/aggr2 succeeded in 0.005s
Sat Jan 23 06:20:20 2016: export of ak:/nas/nfsmw04 succeeded in 2.380s
Sat Jan 23 06:20:37 2016: [zfs export] zpool_export_force() failed in 16.894s with 12 retries
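To check whether a set of logs shows the same signature, the two failure messages above can be matched with a short script. The following is a minimal Python sketch, not an Oracle tool: the function name find_export_failures and the SAMPLE text are assumptions made for illustration, and the patterns are derived only from the log excerpts in this document.

```python
import re

# Patterns derived from the log excerpts in this document.
PANIC_RE = re.compile(
    r"PANIC: failed to export (?P<resource>\S+): "
    r"cannot export '(?P<pool>[^']+)': pool is busy"
)
FORCE_FAIL_RE = re.compile(
    r"zpool_export_force\(\) failed in (?P<secs>[\d.]+)s "
    r"with (?P<retries>\d+) retries"
)


def find_export_failures(log_text):
    """Return (pools that hit the panic, forced-export failures).

    Each forced-export failure is a (seconds, retries) tuple.
    Hypothetical helper for illustration only.
    """
    panics = [m.group("pool") for m in PANIC_RE.finditer(log_text)]
    forced = [
        (float(m.group("secs")), int(m.group("retries")))
        for m in FORCE_FAIL_RE.finditer(log_text)
    ]
    return panics, forced


# Sample text copied from the akd.ak.txt and rm.ak excerpts above.
SAMPLE = """\
Sat Jan 23 06:20:20 2016: export of ak:/nas/nfsmw04 succeeded in 2.380s
Sat Jan 23 06:20:37 2016: [zfs export] zpool_export_force() failed in 16.894s with 12 retries
Sat Jan 23 06:20:37 2016: PANIC: failed to export ak:/zfs/nfsmw04: cannot export 'nfsmw04': pool is busy
"""

if __name__ == "__main__":
    panics, forced = find_export_failures(SAMPLE)
    print("pools that panicked on export:", panics)
    print("forced export failures (seconds, retries):", forced)
```

A match on both patterns for the same pool, close together in time, is consistent with the issue described here. It is not proof on its own, since other long-standing holds can also keep a pool busy.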
Cause
This issue occurs when a failback is initiated while a replication is in progress. The code in nas_repl_rm_pool_export() does not wait for the receive threads to finish before allowing the pool to be exported. This appears to be a bug: a thread can still be stuck in zfs_receive() when the export is attempted, which causes the pool export to fail with EBUSY.
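The ordering problem can be illustrated with a toy model. The sketch below is plain Python and is not the appliance code: ToyPool, receive, export_without_waiting and export_after_waiting are hypothetical names, and the "hold" counter is only an assumed stand-in for whatever keeps the real pool busy during a receive. It shows that an export attempted while a receive thread is still running fails with EBUSY, and that joining the receive threads first lets the export succeed.

```python
import errno
import threading


class ToyPool:
    """Toy stand-in for a pool: export fails while any receive holds it."""

    def __init__(self, name):
        self.name = name
        self._holds = 0
        self._lock = threading.Lock()

    def hold(self):
        with self._lock:
            self._holds += 1

    def release(self):
        with self._lock:
            self._holds -= 1

    def export(self):
        with self._lock:
            if self._holds:
                raise OSError(
                    errno.EBUSY,
                    f"cannot export '{self.name}': pool is busy",
                )
        return True


def receive(pool, started, may_finish):
    """Hypothetical receive worker: holds the pool until told to finish."""
    pool.hold()
    started.set()
    try:
        may_finish.wait()
    finally:
        pool.release()


def export_without_waiting(pool):
    """Models the reported behaviour: export while a receive may be running."""
    return pool.export()


def export_after_waiting(pool, receivers, may_finish):
    """Models the safer ordering: stop receives, join them, then export."""
    may_finish.set()        # ask the receive workers to stop
    for thread in receivers:
        thread.join()       # wait for every receive thread to finish
    return pool.export()


def demo():
    pool = ToyPool("nfsmw04")
    started, may_finish = threading.Event(), threading.Event()
    worker = threading.Thread(target=receive, args=(pool, started, may_finish))
    worker.start()
    started.wait()          # the receive is now in flight
    try:
        export_without_waiting(pool)
        first = None
    except OSError as exc:
        first = exc.errno   # EBUSY: the receive still holds the pool
    second = export_after_waiting(pool, [worker], may_finish)
    return first, second


if __name__ == "__main__":
    print(demo())
```

In this model the only difference between the failing and the succeeding export is whether the receive threads were joined first. That is the same ordering the workaround achieves manually when replications are stopped before a failback.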
Solution
Stop or cancel all replications before executing a takeover or failback. This problem does not occur when you execute a takeover and/or reboot one head, as there is no wait to export the pool(s).
This issue is fixed in Appliance Firmware Release 2013.1.6.0.
For a final solution, the fix needs to be backported for <BUG:20610346>.
References
<BUG:21608389> - LONGSTANDING ZFS HOLDS PREVENT POOL EXPORT
<BUG:19075997> - LONGSTANDING ZFS HOLDS PREVENT POOL EXPORT
<BUG:20610346> - NAS_REPL_RM_POOL_EXPORT DOES NOT WAIT FOR RECEIVE THREADS TO FINISH

Attachments
This solution has no attachment