Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2099739.1
Update Date:2017-10-05
Keywords:

Solution Type  Problem Resolution Sure

Solution  2099739.1 :   Oracle ZFS Storage Appliance: Unexpected Reboot on Failback during Replication  


Related Items
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS3-4
  • Sun Storage 7410 Unified Storage System
  • Sun ZFS Storage 7420
  • Sun Storage 7310 Unified Storage System
  • Oracle ZFS Storage ZS4-4
  • Sun ZFS Storage 7120
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: ZS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-12062049081>

Applies to:

Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-4 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

The ZFS Storage Appliance may panic and reboot unexpectedly during a failback.

No crash dump is collected when this issue occurs.

 

LOGS:

AKD service logs:

PANIC: failed to export ak:/zfs/nfsmw04: cannot export 'nfsmw04': pool is busy

 

akd.ak.txt:

Sat Jan 23 06:20:37 2016: PANIC: failed to export ak:/zfs/nfsmw04: cannot export 'nfsmw04': pool is busy

 

rm.ak log:

Sat Jan 23 06:20:17 2016: export of ak:/ndmp/nfsmw04 succeeded in 0.062s
Sat Jan 23 06:20:18 2016: export of ak:/replication/nfsmw04 succeeded in 0.250s
Sat Jan 23 06:20:18 2016: export of ak:/smb/aggr2 succeeded in 0.149s
Sat Jan 23 06:20:18 2016: export of ak:/net/aggr2 succeeded in 0.005s
Sat Jan 23 06:20:20 2016: export of ak:/nas/nfsmw04 succeeded in 2.380s
Sat Jan 23 06:20:37 2016: [zfs export] zpool_export_force() failed in 16.894s with 12 retries
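To check whether an unexpected reboot matches this signature, the log lines above can be matched with two regular expressions. The following is a minimal sketch, assuming the messages appear exactly as quoted in this note; match_signature, PANIC_RE and EXPORT_RETRY_RE are hypothetical names, and other firmware releases may word these messages differently. Feed it lines collected from both the akd and rm.ak logs.

```python
import re

# Hypothetical patterns, derived only from the log lines quoted in this note.
# The backreference (?P=pool) requires the same pool name in both places.
PANIC_RE = re.compile(
    r"PANIC: failed to export ak:/zfs/(?P<pool>\S+): "
    r"cannot export '(?P=pool)': pool is busy"
)
EXPORT_RETRY_RE = re.compile(
    r"\[zfs export\] zpool_export_force\(\) failed in "
    r"(?P<secs>[\d.]+)s with (?P<retries>\d+) retries"
)


def match_signature(lines):
    """Return the busy pool and export retry details, or None if no akd panic."""
    pool = None
    retry = None
    for line in lines:
        m = PANIC_RE.search(line)
        if m:
            pool = m.group("pool")
        m = EXPORT_RETRY_RE.search(line)
        if m:
            # (seconds spent in zpool_export_force(), number of retries)
            retry = (float(m.group("secs")), int(m.group("retries")))
    if pool is None:
        return None
    return {"pool": pool, "export_retry": retry}


log = [
    "Sat Jan 23 06:20:37 2016: [zfs export] zpool_export_force() failed in 16.894s with 12 retries",
    "Sat Jan 23 06:20:37 2016: PANIC: failed to export ak:/zfs/nfsmw04: cannot export 'nfsmw04': pool is busy",
]
print(match_signature(log))
# {'pool': 'nfsmw04', 'export_retry': (16.894, 12)}
```

A match only indicates that the logs look like this issue; confirm that a replication update was running at the time of the failback.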

 

Cause

This issue occurs when a failback is initiated while a replication update is in progress.

The code in nas_repl_rm_pool_export() does not wait for the receive threads to finish before allowing the pool to be exported.

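This is a bug (<BUG:20610346>): a receive thread can still be blocked in zfs_receive() when the export is attempted, which keeps the pool busy and causes the export to fail with EBUSY ("pool is busy"). The failed export triggers the akd PANIC shown in the logs above, and the head reboots.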
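The missing step is the one named in the bug title: the export path should wait for in-flight receive threads to finish. The sketch below models that pattern with a counter and a condition variable. It is a hypothetical illustration in Python, not the appliance code and not the actual fix; PoolExportGuard, begin_receive, end_receive and export are invented names, and time.sleep() merely stands in for a thread blocked in zfs_receive().

```python
import threading
import time


class PoolExportGuard:
    """Hypothetical model: tracks in-flight receives on one pool so that the
    export path can drain them before exporting. Not appliance code."""

    def __init__(self):
        self._cv = threading.Condition()
        self._active_receives = 0
        self._exporting = False

    def begin_receive(self):
        with self._cv:
            if self._exporting:
                return False  # refuse new receives once an export has started
            self._active_receives += 1
            return True

    def end_receive(self):
        with self._cv:
            self._active_receives -= 1
            self._cv.notify_all()  # wake an export waiting for the drain

    def export(self, timeout):
        """Block new receives, then wait for in-flight ones to finish.
        Returns True if the pool is idle (safe to export), or False if the
        timeout expires first, which corresponds to the EBUSY case."""
        with self._cv:
            self._exporting = True
            return self._cv.wait_for(lambda: self._active_receives == 0, timeout)


guard = PoolExportGuard()


def receive_stream():
    if not guard.begin_receive():
        return
    try:
        time.sleep(0.2)  # stands in for a zfs_receive() call in progress
    finally:
        guard.end_receive()


t = threading.Thread(target=receive_stream)
t.start()
time.sleep(0.05)  # let the receive start before the failback begins
ok = guard.export(timeout=5.0)
print(ok)  # True: the export waited for the receive instead of failing
t.join()
```

Refusing new receives once the export has started matters as much as waiting: otherwise a new replication update could take the pool busy again after the drain completes.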

 

Solution

Stop or cancel all replication updates before initiating a takeover or failback.

This problem does not occur when you perform a takeover or reboot one head, because in those cases the head does not wait to export the pool(s).

 

This issue is fixed in Appliance Firmware Release 2013.1.6.0.

For a final solution, the fix for <BUG:20610346> needs to be backported.

 

References

<BUG:21608389> - LONGSTANDING ZFS HOLDS PREVENT POOL EXPORT
<BUG:19075997> - LONGSTANDING ZFS HOLDS PREVENT POOL EXPORT
<BUG:20610346> - NAS_REPL_RM_POOL_EXPORT DOES NOT WAIT FOR RECEIVE THREADS TO FINISH

  Copyright © 2018 Oracle, Inc.  All rights reserved.