Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2125074.1
Update Date:2017-01-25
Keywords:

Solution Type  Problem Resolution Sure

Solution  2125074.1 :   Oracle ZFS Storage Appliance: QLT port flapping at the end of resilver causes VMWARE clients to disconnect  


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7120
  • Sun ZFS Storage 7320
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-11217856511>

Applies to:

Oracle ZFS Storage ZS3-4 - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

At the end of a resilver process, customers notice Fibre Channel connectivity issues that cause their virtual machines to go down.

Alerts are seen similar to the following:

2015-8-18 09:49:40 8840dbae-3b03-ee6b-eb2b-b30e6c7b645e The ZFS pool 'pool-0' has finished resilvering. Minor Alert
2015-8-18 09:49:36 d184f76b-950c-e672-eaac-dc0f2514bea1 Fibre Channel connectivity via port 21:00:00:24:ff:3b:73:7e (PCIe 0: Port 1) has been lost. Major alert
2015-8-18 09:46:45 52a048dd-cae3-489a-a5a3-ab7c5be1a17b Fibre Channel connectivity via port 21:00:00:24:ff:3e:19:c7 (PCIe 5: Port 2) has been lost.

2015-8-12 03:54:34 3cf97282-9ee8-4201-d327-a8db85981225 Fibre Channel connectivity via port 21:00:00:24:ff:3e:18:c4 (PCIe 0: Port 1) has been established. Minor alert
2015-8-12 03:54:31 ac7fb417-89af-6d01-92f2-bed9624dd842 Fibre Channel connectivity via port 21:00:00:24:ff:3e:19:d5 (PCIe 9: Port 2) has been established. Minor alert
2015-8-12 03:54:21 383ab386-711a-4af7-97b4-90d187189dad The ZFS pool 'pool-1' has finished resilvering. Minor Alert
2015-8-12 03:53:38 cd9f1e88-12f7-ce77-e48d-bfba602722e8 Fibre Channel connectivity via port 21:00:00:24:ff:3e:19:d5 (PCIe 9: Port 2) has been lost. Major alert
2015-8-12 03:53:37 ef703072-4df3-c71d-b1e0-b503c7f92430 Fibre Channel connectivity via port 21:00:00:24:ff:3e:18:c4 (PCIe 0: Port 1) has been lost
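
When reviewing a longer alert export, it can help to pick out the "has been lost" alerts that fall close to a "has finished resilvering" alert. The following is a minimal Python sketch, not an Oracle tool. It assumes the alerts have been saved as text lines in the layout shown above; the function names and the five-minute window are illustrative assumptions, not values taken from the appliance.

```python
import re
from datetime import datetime, timedelta

# Timestamp at the start of an alert line, e.g. "2015-8-18 09:49:40".
TIMESTAMP = re.compile(r"^(\d{4}-\d{1,2}-\d{1,2} \d{2}:\d{2}:\d{2})")
# Port WWN as printed in the Fibre Channel connectivity alerts.
PORT_WWN = re.compile(r"via port ((?:[0-9a-f]{2}:){7}[0-9a-f]{2})")


def parse_alert(line):
    """Return (time, kind, wwn) for the alerts of interest, else None."""
    line = line.strip()
    match = TIMESTAMP.match(line)
    if not match:
        return None
    when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    if "has finished resilvering" in line:
        return when, "resilver_finish", None
    wwn = PORT_WWN.search(line)
    if wwn and "has been lost" in line:
        return when, "fc_lost", wwn.group(1)
    return None


def fc_losses_near_resilver(lines, window=timedelta(minutes=5)):
    """List (time, wwn) for FC losses within `window` of a resilver finish.

    The window size is an assumption chosen for illustration.
    """
    events = [e for e in map(parse_alert, lines) if e]
    finishes = [t for t, kind, _ in events if kind == "resilver_finish"]
    return sorted(
        (t, wwn)
        for t, kind, wwn in events
        if kind == "fc_lost" and any(abs(t - f) <= window for f in finishes)
    )
```

A match from this sketch only suggests that the timing fits the pattern described here; confirmation still requires the qlt logs.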

 

 

Cause

During the final cleanup at the end of a resilver, the QLT Fibre Channel ports flap briefly (link down, then link up), which causes VMware clients to disconnect.

Solution

This is a known issue and a code fix is available. To confirm that you have run into this bug, do the following:

  1. Check the appliance software (ak) release:
    - If you are running ak code 2013.1.4.9 or later, the qlt logs are included in the Support Bundle. Move to step 2.
    - If you are running a release earlier than 2013.1.4.9, run the attached workflow to collect the additional qlt driver logs, which are placed in the dropbox.
  2. Collect a Support Bundle, which will include the qlt logs.
  3. Contact Oracle Support to open a new Service Request. Oracle Support will verify the next steps.

 

The fix for this issue is available in Appliance Firmware Release OS8.6.0 / 2013.1.6.0 (or later).
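
The two release thresholds in this solution (2013.1.4.9 for qlt logs in the Support Bundle, 2013.1.6.0 for the fix) can be checked with a simple numeric comparison. The following Python sketch is illustrative only. It assumes the release is given in the short dotted form used in this document; the function names are hypothetical, and the full build string reported by an appliance may use a different layout that this sketch does not handle.

```python
def parse_ak_version(text):
    """Turn '2013.1.4.9' into (2013, 1, 4, 9) so releases compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


# Thresholds quoted in this document.
QLT_LOGS_IN_BUNDLE = parse_ak_version("2013.1.4.9")
FIX_RELEASE = parse_ak_version("2013.1.6.0")


def triage(version_text):
    """Report which of the two thresholds a release meets."""
    version = parse_ak_version(version_text)
    return {
        "qlt_logs_in_bundle": version >= QLT_LOGS_IN_BUNDLE,
        "has_fix": version >= FIX_RELEASE,
    }
```

Comparing integer tuples rather than strings avoids ordering mistakes such as treating 2013.1.4.10 as older than 2013.1.4.9.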

 

The fix for Bug 22599649 changed the response that the Fibre Channel ports return to the hosts from busy to queue full, which increases the time before the clients time out.

The attached workflow collects the qlt firmware dump and the qlt logs, which can help verify whether this bug was hit.

 

Alerts will show FC ports going down/up after a resilver:

Tue Aug 18 16:46:45 2015
nvlist version: 0
class = alert.ak.appliance.nas.fc.port.down
source = svc:/appliance/kit/akd:default
slot_label = PCIe 5
port_name = Port 2
port_wwn = 21:00:00:24:ff:3e:19:c7
uuid = 52a048dd-cae3-489a-a5a3-ab7c5be1a17b
link =

Tue Aug 18 16:49:36 2015
nvlist version: 0
class = alert.ak.appliance.nas.fc.port.down
source = svc:/appliance/kit/akd:default
slot_label = PCIe 0
port_name = Port 1
port_wwn = 21:00:00:24:ff:3b:73:7e
uuid = d184f76b-950c-e672-eaac-dc0f2514bea1
link =

Tue Aug 18 16:49:40 2015
nvlist version: 0
class = alert.fs.zfs.pool.resilver.finish
source = svc:/appliance/kit/akd:default
zpool_name = pool-0
zpool_guid = 3938589240219173693
link = 148cea48-a528-6843-a093-8e18c20248ec
uuid = 8840dbae-3b03-ee6b-eb2b-b30e6c7b645e

Tue Aug 18 16:49:51 2015
nvlist version: 0
class = alert.ak.appliance.nas.fc.port.up
source = svc:/appliance/kit/akd:default
slot_label = PCIe 0
port_name = Port 1
port_wwn = 21:00:00:24:ff:3b:73:7e
uuid = 4951e841-710e-eecf-9330-a40e4047c502
link =

Tue Aug 18 16:49:52 2015
nvlist version: 0
class = alert.ak.appliance.nas.fc.port.up
source = svc:/appliance/kit/akd:default
slot_label = PCIe 5
port_name = Port 2
port_wwn = 21:00:00:24:ff:3e:19:c7
uuid = 58bfcefe-7cfa-c1f5-9dbe-9086dd8d886f
link =
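
To see how long each port stayed down, the alert records above can be paired by port_wwn. The following Python sketch is one way to do that, assuming the records have been saved as text in the layout shown, with blank lines between records. The function names are hypothetical and the sketch is not part of the appliance software.

```python
import re
from datetime import datetime


def parse_records(text):
    """Split the dump into records and return one dict per record."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        # First line of a record is the date, e.g. "Tue Aug 18 16:46:45 2015".
        record = {"time": datetime.strptime(lines[0], "%a %b %d %H:%M:%S %Y")}
        for ln in lines[1:]:
            key, sep, value = ln.partition("=")
            if sep:  # skips lines without "=", such as "nvlist version: 0"
                record[key.strip()] = value.strip()
        records.append(record)
    return records


def port_outages(records):
    """Pair each fc.port.down with the next fc.port.up for the same WWN.

    Returns a list of (port_wwn, seconds_down) in order of recovery.
    """
    down_at = {}
    outages = []
    for rec in sorted(records, key=lambda r: r["time"]):
        cls = rec.get("class", "")
        wwn = rec.get("port_wwn")
        if cls.endswith("fc.port.down"):
            down_at.setdefault(wwn, rec["time"])
        elif cls.endswith("fc.port.up") and wwn in down_at:
            seconds = (rec["time"] - down_at.pop(wwn)).total_seconds()
            outages.append((wwn, seconds))
    return outages
```

Applied to the records above, this pairing gives the down time per port, which can then be compared against the host-side timeouts.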

 

Similar link down/up events appear in the qlt trace:

Feb 11 22:44:14 ATL-ZFS-1 qlt: [ID 882656 kern.notice] NOTICE: qlt0: LINK DOWN, pid(EF), topgy(2h) speed(8h)
Feb 11 22:44:14 ATL-ZFS-1 fct: [ID 580862 kern.notice] NOTICE: qlt0,0 LINK DOWN, portid ef, topology Private Loop,speed 8G
Feb 11 22:47:14 ATL-ZFS-1 fct: [ID 469330 kern.notice] NOTICE: qlt0,0 LINK UP, portid ef, topology Private Loop, speed 8G
Feb 11 23:28:11 ATL-ZFS-1 qlt: [ID 882656 kern.notice] NOTICE: qlt1: LINK DOWN, pid(EF), topgy(2h) speed(8h)
Feb 11 23:28:11 ATL-ZFS-1 fct: [ID 580862 kern.notice] NOTICE: qlt1,0 LINK DOWN, portid ef, topology Private Loop,speed 8G
Feb 11 23:31:03 ATL-ZFS-1 fct: [ID 469330 kern.notice] NOTICE: qlt1,0 LINK UP, portid ef, topology Private Loop, speed 8G
Feb 12 00:50:07 ATL-ZFS-1 qlt: [ID 882656 kern.notice] NOTICE: qlt0: LINK DOWN, pid(EF), topgy(2h) speed(8h)
Feb 12 00:50:07 ATL-ZFS-1 fct: [ID 580862 kern.notice] NOTICE: qlt0,0 LINK DOWN, portid ef, topology Private Loop,speed 8G
Feb 12 00:50:49 ATL-ZFS-1 fct: [ID 469330 kern.notice] NOTICE: qlt0,0 LINK UP, portid ef, topology Private Loop, speed 8G
Feb 12 00:53:12 ATL-ZFS-1 qlt: [ID 882656 kern.notice] NOTICE: qlt0: LINK DOWN, pid(EF), topgy(2h) speed(8h)
Feb 12 00:53:12 ATL-ZFS-1 fct: [ID 580862 kern.notice] NOTICE: qlt0,0 LINK DOWN, portid ef, topology Private Loop,speed 8G
Feb 12 00:55:14 ATL-ZFS-1 fct: [ID 469330 kern.notice] NOTICE: qlt0,0 LINK UP, portid ef, topology Private Loop, speed 8G
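
The link down/up pairs in a trace like the one above can be summarized per port. The following Python sketch uses only the fct lines, since those carry both the LINK DOWN and the LINK UP notices. It is an illustration under assumptions: the function name is hypothetical, and because syslog timestamps carry no year, the year is passed in by the caller.

```python
import re
from datetime import datetime

# Matches fct notices such as:
# "Feb 11 22:44:14 ATL-ZFS-1 fct: [ID ...] NOTICE: qlt0,0 LINK DOWN, ..."
FCT_LINK = re.compile(
    r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ fct: .*"
    r"NOTICE: (qlt\d+),\d+ LINK (DOWN|UP)"
)


def link_flaps(lines, year):
    """Return (port, seconds_down) for each DOWN/UP pair, in log order.

    `year` is supplied by the caller because the log lines do not include it.
    """
    down_at = {}
    flaps = []
    for line in lines:
        match = FCT_LINK.match(line.strip())
        if not match:
            continue  # ignores qlt driver lines and anything unrelated
        stamp, port, state = match.groups()
        when = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        if state == "DOWN":
            down_at.setdefault(port, when)
        elif port in down_at:
            flaps.append((port, (when - down_at.pop(port)).total_seconds()))
    return flaps
```

The result lists each flap with its duration in seconds, which makes repeated flaps on the same port easy to spot.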

 

References

<BUG:22080255> - ZFSSA BACKEND "RESILVERING" CAUSES FC FRONT END PORT INACCESIBLE TO REMOTE HOST
<BUG:21787694> - BACKPORT BUG 22518671 TO AK-2013-REL
<BUG:22599649> - QLT PORT FLAPPING AFTER RESILVERING COMPLETION DUE TO EXCHG NOT BEING TERMINATED
<BUG:20639544> - ADD QLT LOGS IN AK BUNDLE
<BUG:21071219> - COMSTAR FRAMEWORK SHOULD ALLOW A CMD TO ABORT PRIOR TO ZFS I/O COMPLETION

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.