Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2003660.1
Update Date: 2018-01-05
Keywords:

Solution Type: Problem Resolution Sure Solution

Solution  2003660.1 :   Oracle ZFS Storage Appliance: Solaris client ZFS pool (constructed from FC LUNs exported from ZFS-SA) becomes suspended due to appliance takeover.  


Related Items
  • Sun ZFS Storage 7320
  • Sun Storage 7210 Unified Storage System
  • Oracle ZFS Storage ZS3-2
  • Sun Storage 7410 Unified Storage System
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7420
  • Sun Storage 7310 Unified Storage System
  • Sun ZFS Storage 7120
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-10406486941>

Applies to:

Sun Storage 7410 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun Storage 7310 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun Storage 7210 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

A Solaris 11.2 client with STMS/MPxIO configured reports its zpool as 'suspended' when there is a takeover on the ZFS appliance.

The Fibre Channel (FC) LUNs are mirrored by ZFS on the Solaris client.

# zpool status data01
   pool: data01
  state: SUSPENDED
 status: One or more devices are unavailable in response to IO failures.
         The pool is suspended.
 action: Make sure the affected devices are connected, then run 'zpool clear' or
         'fmadm repaired'.
         Run 'zpool status -v' to see device specific details.
    see: http://support.oracle.com/msg/ZFS-8000-HC
   scan: resilvered 5.57M in 0h0m with 0 errors on Tue Mar 24 17:28:47 2015
 config:
         NAME                                       STATE     READ WRITE CKSUM
         data01                                     SUSPENDED     0   110    0
           mirror-0                                 ONLINE       0   130     0
             c0t600144F0B97C139B00005510F3350002d0  ONLINE       0   140     0
             c0t600144F0D232395600005510F2A90001d0  ONLINE       0   138     0
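
For reference, the recovery path named in the 'action' text above looks like this once failback has completed and both paths are restored (a console sketch using this example's pool name data01):

```shell
# List outstanding faults to confirm what FMA flagged:
fmadm faulty

# Clear the suspended state and resume I/O on the example pool:
zpool clear data01

# Verify the pool is back online:
zpool status -v data01
```

Note that 'zpool clear' only succeeds once the underlying LUN paths are reachable again.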

 

The ZFS appliance shows a short takeover/failback time.

 

The ZFS-SA exported FC LUNs are configured correctly with a target and host group configured.

Solaris client FMA shows probe failures on both sides of the mirrored LUNs before I/O was suspended.

Mar 24 17:19:13 ZFS-8000-NX    fault.fs.zfs.vdev.probe_failure  600144f0b97c139b00005510f3350002  <<--
Mar 24 17:19:13 ZFS-8000-FD    fault.fs.zfs.vdev.io  600144f0b97c139b00005510f3350002
Mar 24 17:19:14 ZFS-8000-NX    fault.fs.zfs.vdev.probe_failure  600144f0d232395600005510f2a90001  <<--
Mar 24 17:19:15 ZFS-8000-FD    fault.fs.zfs.vdev.io   600144f0d232395600005510f2a90001
Mar 24 17:31:27 ZFS-8000-8A    fault.fs.zfs.object.corrupt_data pool_name=data01
Mar 24 17:31:29 ZFS-8000-HC    fault.fs.zfs.io_failure_wait pool_name=data01 <<-- suspended I/O

 

The 'rm.ak' and 'debug.sys' logs show:

Tue Mar 24 06:19:09 2015: takeover completed in 4.107s 
Mar 24 06:19:10 BRSUA2-SAN-HEAD02 fct: [ID 469330 kern.notice] NOTICE: qlt0,0  LINK UP, portid ef, topology Private Loop, speed 8G.
Tue Mar 24 06:27:58 2015: ak_rm_fail_back phase 1 complete in 2.997s
Tue Mar 24 06:28:03 2015: ak_rm_fail_back phase 2 complete in 4.706s
Mar 24 06:28:04 brsua2-san-head01 fct: [ID 469330 kern.notice] NOTICE: qlt0,0  LINK UP, portid ef, topology Private Loop, speed 8G.

 

Changes

FC was directly connected to the ZFS appliance without an FC switch.

 

Cause

Connectivity options: Point-to-Point (FC-P2P) and switch-attach (FC-SW) connectivity are supported except where specifically noted.

No support is provided for arbitrated loop (FC-AL) connectivity.

 

Solution

FC direct connection supportability is documented in the ZFSSA Interoperability Testing Matrix:

https://stbeehive.oracle.com/teamcollab/wiki/ZFSSA+Interop:ZFSSA+Interoperability+Testing+Matrix+-+2013.1.3.0#Fiber+Channel

Connectivity options: Point-to-Point (FC-P2P) and switch-attach (FC-SW) connectivity are supported except where specifically noted. No support is provided for arbitrated loop (FC-AL) connectivity.

 

The 16Gb QLogic FC HBA documentation indicates that FC-AL connections are not supported at 16Gb:

Topologies supported: FC-SW switched fabric (N_Port), FC-AL arbitrated loop (not supported at 16 Gb) (NL_Port), and Point-to-Point (N_Port)

http://docs.oracle.com/cd/E24651_01/html/E24460/z40003111016271.html#scrolltoc

 

In this case, the Solaris initiator should be forced to use Fibre Channel Point-to-Point (FC-P2P) by setting connection-options=1 in /kernel/drv/qlc.conf.
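
The single change can be sketched as the following /kernel/drv/qlc.conf fragment (the value meanings below are the standard QLogic connection options; verify them against the comments in your own qlc.conf):

```shell
# /kernel/drv/qlc.conf -- force the QLogic initiator to FC-P2P
#   0 = loop only
#   1 = point-to-point only
#   2 = loop preferred, otherwise point-to-point
connection-options=1;
```

A reconfiguration reboot is typically required for qlc.conf changes to take effect.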

 

I/O errors should be issued only after an appropriate timeout, so that transient port flaps are covered.

 

Update the Solaris client to SRU 11.2.9.5.0 at a minimum.
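
To confirm the client meets that minimum, the installed SRU can be read from the version of the 'entire' package (a console sketch; the branch version encodes Solaris 11.2 SRU 9.5 as 0.175.2.9.0.5.0):

```shell
# The VERSION column shows e.g. 0.5.11-0.175.2.9.0.5.0 for Solaris 11.2 SRU 9.5
pkg list entire
```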

The workaround and best practice is to use FC switches.

References

<BUG:20802234> - LUNS PRESENTED TO SOLARIS CLIENT BECOME SUSPENDED DURING ZFS APPLIANCE TAKEOVER
<NOTE:1434184.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot Fibre-Channel Problems
<NOTE:1672221.1> - Oracle Solaris 11.2 Support Repository Updates (SRU) Index
http://www.oracle.com/technetwork/server-storage/sun-unified-storage/documentation/o12-019-fclun-7000-rs-1559284.pdf
<NOTE:1402545.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot Cluster Problems
<BUG:18969626> - I/O STOPS WHEN OTHER PATH PULLED OUT AND INSERTED AFTER A PATH IS DEGRADED.

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.