Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 2081402.1 : SR-IOV - Virtual FC HBA Lost on Guest LDOM when IO Root Domain Reboot
Created from <SR 3-11147799001>

Applies to:
Fujitsu M10-4 - Version All Versions and later
Qlogic FC HBA - Version All Versions and later
Emulex FC HBA - Version All Versions and later
Solaris Operating System - Version 10 3/05 and later
Information in this document applies to any platform.

Symptoms
This is an Oracle Solaris 11.2.11.5.0 guest domain with two virtual FC HBAs:
------------------------------------------------------------------------------
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
guest01 active -n---- 5001 4 80G 26% 26% 39m

IO
DEVICE PSEUDONYM OPTIONS
two FC virtual functions:
pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2 /BB0/PCI6/IOVFC.PF0.VF0 --> from IO root domain rootio01
pci@8100/pci@4/pci@0/pci@0/pci@0/pci@0/pci@0/pci@1/pci@0/pci@10/pci@0/pci@0/SUNW,qlc@0,2 /BB0/PCI0/SLOT4/IOVFC.PF0.VF0 --> from control domain
------------------------------------------------------------------------------
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
rootio01 active -n--v- 5000 6 16G 0.1% 0.1% 15m

IO
DEVICE PSEUDONYM OPTIONS
pci@8200/pci@4/pci@0/pci@11 /BB0/PCI6
Jul 19 02:56:44 guest01 qlc: [ID 139792 kern.info] Qlogic qlc(0) FCA Driver v141204-5.03
Jul 19 02:56:44 guest01 qlc: [ID 139792 kern.info] Qlogic qlc(1) FCA Driver v141204-5.03
Jul 19 02:56:44 guest01 qlc: [ID 279254 kern.info] Qlogic qlc(1,0,2) WWPN=100000144ffc0101 : WWNN=200000144ffc0101
Jul 19 02:56:44 guest01 qlc: [ID 279254 kern.info] Qlogic qlc(0,0,2) WWPN=100000144ffc0100 : WWNN=200000144ffc0100
Jul 19 02:56:45 guest01 qlc: [ID 336086 kern.info] NOTICE: Qlogic qlc(0,0,2): Link ONLINE
Jul 19 02:56:45 guest01 qlc: [ID 874193 kern.info] NOTICE: Qlogic qlc(0): Firmware version 7.05.01, device id 2431
Jul 19 02:56:45 guest01 pcieb: [ID 586369 kern.info] PCIE-device: SUNW,qlc@0,2, qlc0
Jul 19 02:56:45 guest01 genunix: [ID 936769 kern.info] qlc0 is /pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2
Jul 19 02:56:45 guest01 qlc: [ID 336086 kern.info] NOTICE: Qlogic qlc(1,0,2): Link ONLINE
Jul 19 02:56:45 guest01 qlc: [ID 874193 kern.info] NOTICE: Qlogic qlc(1): Firmware version 7.05.01, device id 2431
Jul 19 02:56:45 guest01 pcieb: [ID 586369 kern.info] PCIE-device: SUNW,qlc@0,2, qlc1
Jul 19 02:56:45 guest01 genunix: [ID 936769 kern.info] qlc1 is /pci@8100/pci@4/pci@0/pci@0/pci@0/pci@0/pci@0/pci@1/pci@0/pci@10/pci@0/pci@0/SUNW,qlc@0,2
Jul 19 02:56:45 guest01 genunix: [ID 936769 kern.info] fp0 is /pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2/fp@0,0
Jul 19 02:56:45 guest01 genunix: [ID 936769 kern.info] fp2 is /pci@8100/pci@4/pci@0/pci@0/pci@0/pci@0/pci@0/pci@1/pci@0/pci@10/pci@0/pci@0/SUNW,qlc@0,2/fp@0,0
Jul 19 02:56:45 guest01 scsi: [ID 583861 kern.info] ssd0 at scsi_vhci0: unit-address g60060e801666500000016650000060df: f_sym
Jul 19 02:56:45 guest01 genunix: [ID 936769 kern.info] ssd0 is /scsi_vhci/ssd@g60060e801666500000016650000060df
Jul 19 02:56:45 guest01 genunix: [ID 408114 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060df (ssd0) online
Jul 19 02:56:45 guest01 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060df (ssd0) multipath status: degraded: path 1 fp0/ssd@w50060e801666500b,d is online
Jul 19 02:56:45 guest01 scsi: [ID 583861 kern.info] ssd1 at scsi_vhci0: unit-address g60060e801666500000016650000060de: f_sym
Jul 19 02:56:45 guest01 genunix: [ID 936769 kern.info] ssd1 is /scsi_vhci/ssd@g60060e801666500000016650000060de
Jul 19 02:56:45 guest01 genunix: [ID 408114 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060de (ssd1) online
Jul 19 02:56:45 guest01 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060de (ssd1) multipath status: degraded: path 2 fp0/ssd@w50060e801666500b,c is online
Jul 19 03:18:20 guest01 pcie: [ID 297812 kern.info] NOTICE: Live Suspend: port pci.0,80: child dev igbvf#0(4000d1ceee8) and descendants
Jul 19 03:18:20 guest01 mac: [ID 486395 kern.info] NOTICE: igbvf0 link down
Jul 19 03:18:20 guest01 pcie: [ID 286789 kern.info] NOTICE: Live Suspend: igbvf0 suspended successfully
Jul 19 03:18:20 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4/pci@0/pci@8/network@0,80, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Jul 19 03:18:20 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4/pci@0/pci@8, Reason: root domain is rebooting, Action: DDI Suspend, Result: success, Current state: suspended
Jul 19 03:18:20 guest01 pcie: [ID 297812 kern.info] NOTICE: Live Suspend: port pci.0,2: child dev qlc#0(4000d1ce618) and descendants
Jul 19 03:18:20 guest01 genunix: [ID 634673 kern.info] NOTICE: ddihp_lsr_suspend_branch: 4000d1ce038:fp#-1 is offline
Jul 19 03:18:20 guest01 pcie: [ID 286789 kern.info] NOTICE: Live Suspend: qlc0 suspended successfully
Jul 19 03:18:20 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Jul 19 03:18:20 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4/pci@0/pci@11, Reason: root domain is rebooting, Action: DDI Suspend, Result: success, Current state: suspended
Jul 19 03:18:20 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4/pci@0, Reason: root domain is rebooting, Action: DDI Suspend, Result: success, Current state: suspended
Jul 19 03:18:20 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4, Reason: root domain is rebooting, Action: DDI Suspend, Result: success, Current state: suspended
Jul 19 03:18:20 guest01 in.mpathd[88]: [ID 215189 daemon.error] The link has gone down on net0
Jul 19 03:18:20 guest01 in.mpathd[88]: [ID 968981 daemon.error] IP interface failure detected on net0 of group PUB_SVC_ipmp0
Jul 19 03:18:20 guest01 pseudo: [ID 129642 kern.info] pseudo-device: mdesc0
Jul 19 03:18:20 guest01 genunix: [ID 936769 kern.info] mdesc0 is /pseudo/mdesc@0
Jul 19 03:18:20 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200, Reason: root domain is rebooting, Action: DDI Suspend, Result: success, Current state: suspended
Jul 19 03:18:21 guest01 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060df (ssd0) multipath status: degraded: path 1 fp0/ssd@w50060e801666500b,d is offline
Jul 19 03:18:21 guest01 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060de (ssd1) multipath status: degraded: path 2 fp0/ssd@w50060e801666500b,c is offline
...
Jul 19 03:18:21 guest01 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060d3 (ssd12) multipath status: degraded: path 13 fp0/ssd@w50060e801666500b,1 is offline
Jul 19 03:18:21 guest01 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060d2 (ssd13) multipath status: degraded: path 14 fp0/ssd@w50060e801666500b,0 is offline
Jul 19 03:18:21 guest01 genunix: [ID 408114 kern.info] /pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2/fp@0,0 (fp0) offline
Jul 19 03:18:21 guest01 genunix: [ID 408114 kern.info] /pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2 (qlc0) offline
Jul 19 03:18:21 guest01 pcieb: [ID 586369 kern.info] PCIE-device: SUNW,qlc@0,2, qlc0
Jul 19 03:22:10 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200, Reason: root domain is booted, Action: DDI Resume, Result: success, Current state: online
Jul 19 03:22:10 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4, Reason: root domain is booted, Action: DDI Resume, Result: success, Current state: online
Jul 19 03:22:10 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4/pci@0, Reason: root domain is booted, Action: DDI Resume, Result: success, Current state: online
Jul 19 03:22:10 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4/pci@0/pci@11, Reason: root domain is booted, Action: DDI Resume, Result: success, Current state: online
Jul 19 03:22:10 guest01 qlc: [ID 139792 kern.info] Qlogic qlc(0) FCA Driver v141204-5.03
Jul 19 03:22:10 guest01 qlc: [ID 279254 kern.info] Qlogic qlc(0,0,2) WWPN=100000144ffc0100 : WWNN=200000144ffc0100
Jul 19 03:22:10 guest01 qlc: [ID 336086 kern.info] NOTICE: Qlogic qlc(0,0,2): Link ONLINE
Jul 19 03:22:10 guest01 qlc: [ID 874193 kern.info] NOTICE: Qlogic qlc(0): Firmware version 7.05.01, device id 2431
Jul 19 03:22:10 guest01 pcieb: [ID 586369 kern.info] PCIE-device: SUNW,qlc@0,2, qlc0
Jul 19 03:22:10 guest01 genunix: [ID 936769 kern.info] qlc0 is /pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2
Jul 19 03:22:10 guest01 genunix: [ID 408114 kern.info] /pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2 (qlc0) online
Jul 19 03:22:11 guest01 fcsm: [ID 517869 kern.info] NOTICE: fcsm(0): attached to path /pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2/fp@0,0
Jul 19 03:22:11 guest01 genunix: [ID 936769 kern.info] fp0 is /pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2/fp@0,0
Jul 19 03:22:11 guest01 genunix: [ID 408114 kern.info] /pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2/fp@0,0 (fp0) online
Jul 19 03:22:11 guest01 genunix: [ID 530209 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060df (ssd0) multipath status: optimal: path 1 fp0/ssd@w50060e801666500b,d is online: Load balancing: round-robin
Jul 19 03:22:11 guest01 genunix: [ID 530209 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060de (ssd1) multipath status: optimal: path 2 fp0/ssd@w50060e801666500b,c is online: Load balancing: round-robin
...
Jul 19 03:22:11 guest01 genunix: [ID 530209 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060d2 (ssd13) multipath status: optimal: path 14 fp0/ssd@w50060e801666500b,0 is online: Load balancing: round-robin
Jul 19 03:22:11 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2, Reason: root domain is booted, Action: Hotplug LSR Resume, Result: success, Current state: online
3) With no other explanation, the FC HBA unexpectedly goes offline again at "Jul 19 03:22:25":
Jul 19 03:22:25 guest01 pcie: [ID 297812 kern.info] NOTICE: Live Suspend: port pci.0,2: child dev qlc#0(4000d1ce618) and descendants
Jul 19 03:22:25 guest01 genunix: [ID 634673 kern.info] NOTICE: ddihp_lsr_suspend_branch: 4000d1ce038:fp#-1 is offline
Jul 19 03:22:25 guest01 pcie: [ID 286789 kern.info] NOTICE: Live Suspend: qlc0 suspended successfully
Jul 19 03:22:25 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Jul 19 03:22:25 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4/pci@0/pci@11, Reason: root domain is rebooting, Action: DDI Suspend, Result: success, Current state: suspended
Jul 19 03:22:25 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4/pci@0, Reason: root domain is rebooting, Action: DDI Suspend, Result: success, Current state: suspended
Jul 19 03:22:25 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200/pci@4, Reason: root domain is rebooting, Action: DDI Suspend, Result: success, Current state: suspended
Jul 19 03:22:25 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8200, Reason: root domain is rebooting, Action: DDI Suspend, Result: success, Current state: suspended
Jul 19 03:22:25 guest01 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060df (ssd0) multipath status: degraded: path 1 fp0/ssd@w50060e801666500b,d is offline
Jul 19 03:22:25 guest01 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060de (ssd1) multipath status: degraded: path 2 fp0/ssd@w50060e801666500b,c is offline
...
Jul 19 03:22:25 guest01 genunix: [ID 483743 kern.info] /scsi_vhci/ssd@g60060e801666500000016650000060d2 (ssd13) multipath status: degraded: path 14 fp0/ssd@w50060e801666500b,0 is offline
Jul 19 03:22:25 guest01 genunix: [ID 408114 kern.info] /pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2/fp@0,0 (fp0) offline
Jul 19 03:22:25 guest01 genunix: [ID 408114 kern.info] /pci@8200/pci@4/pci@0/pci@11/SUNW,qlc@0,2 (qlc0) offline
Jul 19 03:22:25 guest01 pcieb: [ID 586369 kern.info] PCIE-device: SUNW,qlc@0,2, qlc0
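The signature of this problem in the messages file is a virtual function whose last IOR notice reports "Current state: suspended" even though the root domain has booted. The following Python sketch shows one way to scan for that pattern. It is only an illustration: the function names `last_ior_states` and `suspended_vfs` and the `SAMPLE` constant are hypothetical, the regular expression assumes the IOR notice layout seen in the excerpts in this document, and treating "Hotplug LSR" actions as the ones that apply to virtual functions is an assumption drawn from those excerpts.

```python
import re

# Assumed layout of an IOR notice, taken from the excerpts in this document.
IOR_RE = re.compile(
    r"IOR dev:(?P<dev>\S+), Reason: (?P<reason>[^,]+), "
    r"Action: (?P<action>[^,]+), Result: (?P<result>[^,]+), "
    r"Current state: (?P<state>\w+)"
)


def last_ior_states(lines):
    """Return {device path: (action, state)} from the last IOR notice per device."""
    states = {}
    for line in lines:
        # finditer copes with several notices that ended up on one physical line
        for m in IOR_RE.finditer(line):
            dev = "/" + m.group("dev").lstrip("/")  # collapse the leading "////"
            states[dev] = (m.group("action"), m.group("state"))
    return states


def suspended_vfs(lines):
    """List devices whose final notice is a Hotplug LSR action ending in 'suspended'."""
    return sorted(
        dev
        for dev, (action, state) in last_ior_states(lines).items()
        if action.startswith("Hotplug LSR") and state == "suspended"
    )


# Three notices copied from the guest01 messages excerpt in this document.
SAMPLE = """\
Aug 13 01:21:26 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,2, Reason: root domain is booted, Action: Hotplug LSR Resume, Result: success, Current state: online
Aug 13 01:21:26 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,2, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Aug 13 01:21:26 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@0/network@0,81, Reason: root domain is booted, Action: Hotplug LSR Resume, Result: success, Current state: online
"""

if __name__ == "__main__":
    for dev in suspended_vfs(SAMPLE.splitlines()):
        print("still suspended:", dev)
```

A device that is listed here after the root domain has finished booting is a candidate for this problem; a device that is listed while the root domain is still rebooting is expected behaviour.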
After that, the qlc0 FC HBA is missing from the luxadm output on the guest domain:
# luxadm -e port
/devices/pci@8100/pci@4/pci@0/pci@0/pci@0/pci@0/pci@0/pci@1/pci@0/pci@10/pci@0/pci@0/SUNW,qlc@0,2/fp@0,0:devctl CONNECTED
#

Note: When the problem is present (missing FC HBA), it can be worked around by rebooting the guest domain. The luxadm command shows both FC HBAs again after the guest domain has rebooted.
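The luxadm check above can also be automated. The sketch below parses captured `luxadm -e port` output and compares the number of CONNECTED ports with the number expected for the domain. The function name `connected_fc_ports` and the constants `EXPECTED_PORTS` and `SAMPLE_OUTPUT` are hypothetical; the value 2 matches the guest domain described in this document and would differ on other configurations.

```python
EXPECTED_PORTS = 2  # assumption: guest01 in this document has two virtual FC HBAs


def connected_fc_ports(luxadm_output):
    """Return the device paths that `luxadm -e port` reports as CONNECTED."""
    ports = []
    for line in luxadm_output.splitlines():
        fields = line.split()
        if (
            len(fields) >= 2
            and fields[0].startswith("/devices/")
            and fields[1] == "CONNECTED"
        ):
            ports.append(fields[0])
    return ports


# Output copied from the failing guest domain shown in this document.
SAMPLE_OUTPUT = (
    "/devices/pci@8100/pci@4/pci@0/pci@0/pci@0/pci@0/pci@0/pci@1/pci@0/"
    "pci@10/pci@0/pci@0/SUNW,qlc@0,2/fp@0,0:devctl CONNECTED\n"
)

if __name__ == "__main__":
    ports = connected_fc_ports(SAMPLE_OUTPUT)
    if len(ports) < EXPECTED_PORTS:
        print(f"only {len(ports)} of {EXPECTED_PORTS} FC ports are connected")
```

The function works on text that has already been captured, so it can be run against saved output from any domain without needing access to the system itself.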
Changes
Adding or removing a NIC driver in the guest LDOM configuration changes the behavior, which points to the IOR/SR-IOV framework code rather than the qlc driver.

bash-4.1$ grep "Current state" messages |egrep "qlc|network"
Aug 13 01:08:06 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,2, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Aug 13 01:09:08 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,2, Reason: root domain is booted, Action: Hotplug LSR Resume, Result: success, Current state: online

Aug 13 01:13:12 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,2, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Aug 13 01:13:12 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,12, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Aug 13 01:14:16 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,2, Reason: root domain is booted, Action: Hotplug LSR Resume, Result: success, Current state: online
Aug 13 01:14:26 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,12, Reason: root domain is booted, Action: Hotplug LSR Resume, Result: success, Current state: online

Aug 13 01:17:11 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,2, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Aug 13 01:17:11 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@0/network@0,81, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Aug 13 01:18:17 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,2, Reason: root domain is booted, Action: Hotplug LSR Resume, Result: success, Current state: online
Aug 13 01:18:17 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@0/network@0,81, Reason: root domain is booted, Action: Hotplug LSR Resume, Result: success, Current state: online

Aug 13 01:20:21 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,2, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Aug 13 01:20:21 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@0/network@0,81, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Aug 13 01:20:21 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@9/network@0,80, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended
Aug 13 01:21:26 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,2, Reason: root domain is booted, Action: Hotplug LSR Resume, Result: success, Current state: online
Aug 13 01:21:26 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@1/SUNW,qlc@0,2, Reason: root domain is rebooting, Action: Hotplug LSR Suspend, Result: success, Current state: suspended <--!!
Aug 13 01:21:26 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@0/network@0,81, Reason: root domain is booted, Action: Hotplug LSR Resume, Result: success, Current state: online
Aug 13 01:21:26 guest01 pcie: [ID 486281 kern.info] NOTICE: IOR dev:////pci@8100/pci@4/pci@0/pci@9/network@0,80, Reason: root domain is booted, Action: Hotplug LSR Resume, Result: success, Current state: online

Cause
This is not an FC HBA issue but a virtualization issue. The fix is delivered in Oracle Solaris 11.3.0.30.0 (or greater).

Solution
Install Solaris 11.3 (or greater), see:
Oracle Solaris 11.3 Support Repository Updates (SRU) Index (Doc ID 2045311.1)

See also this related document:

References
<BUG:21284271> - FAILED TO RESUME ONE OR MORE VFS WITH MIXED DEVICE CLASS CONFIGURATIONS
<NOTE:1970596.1> - Guest/IO domain using SRIOV or DIO - expected behaviour when rebooting RootIOdomain/Primary domain
<NOTE:2045311.1> - Oracle Solaris 11.3 Support Repository Updates (SRU) Index
<BUG:21543339> - M10-4 /SR-IOV FC DEVICE LOST WHEN IO DOMAIN REBOOT

Attachments
This solution has no attachment