Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2327745.1
Update Date: 2017-11-16
Keywords:

Solution Type: Problem Resolution Sure Solution

2327745.1: Missing Disk Paths To Solaris 11 LDM Primary Control Domain


Related Items
  • Oracle Fabric Interconnect F1-15
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Oracle Virtual Networking




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-16085407831>

Applies to:

Oracle Fabric Interconnect F1-15 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

Newly added disks are missing on the primary (PRI) LDOM but are visible on the secondary (SEC) LDOM.

- Storage is HDS --> Fibre Channel Switches.
- FC switches connect to a pair of ORCL F1-15 (Xsigo) switches.
- Each F1-15 has a pair of InfiniBand interfaces going to the T5-4. (Total of 4 InfiniBand paths)
- F1-15 has multiple vHBAs going to the T5-4, encapsulated over the InfiniBand links.
- T5-4 is set up for LDOMs, with redundant primary and secondary control domains.
- Each FC target should have 4 paths to the T5; 2 paths for the pri control domain, 2 paths for the sec control domain.
------
- On the Xsigos, all paths to the newly added disks are present, indicating that zoning and LUN masking are correct.
- On the Solaris side, there are 14 logical units missing on the primary control domain that are visible on the secondary. The pipeline below counts SCSI device paths per vHBA initiator:

# ldm list-hba -l -d -p | awk -F\| '/hermon/ {print $4}' | awk -F\/ '{print $(NF-1)}' | sort -n | uniq -c | sort -nr
319 sec_hba2@5001397000569105
319 sec_hba1@5001397000611104
319 pri_hba2@500139700056911b
319 pri_hba1@500139700061111b
22 sec_hba9@500139700061110e
22 sec_hba10@500139700056910c
8 pri_hba5@500139700061110f
8 pri_hba10@500139700056910b
1 pri_hba9@..

- The two lines showing 8 LUs attached should both read 22, like the two sec_hba lines above them.

- pri_hba9 above is an extra vHBA on the Xsigo with nothing useful attached.

- Additionally, running the format command on the secondary control domain shows 277 devices, while the primary finds only 4 devices.
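
A quick way to compare device counts on each control domain (a sketch, assuming the standard Solaris format and mpathadm utilities):

Count the disks that format enumerates (EOF on stdin makes format exit after printing its list):

pri># echo | format 2>/dev/null | grep -c '^ *[0-9]*\. c'

Under MPxIO, mpathadm also reports total and operational path counts per logical unit:

pri># mpathadm list lu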

- Running devfsadm -Cv on the primary doesn't discover the missing disks.

- cfgadm tells us the same story:


pri># cfgadm -lav | awk -F\/ '/hermon/ {print $10}' | sort -n | uniq -c | sort -nr
319 pri_hba2@500139700056911b
319 pri_hba1@500139700061111b
8 pri_hba5@500139700061110f
8 pri_hba10@500139700056910b
1 pri_hba9@...

 

sec># cfgadm -lav | awk -F\/ '/hermon/ {print $10}' | sort -n | uniq -c | sort -nr
319 sec_hba2@..9105
319 sec_hba1@..1104
22 sec_hba9@..110e
22 sec_hba10@..910c


No obvious issues with the OVN FC vHBAs:

FOUND PATH TO 2 LEADVILLE HBA/CNA PORTS IN EXPLORER

C# INST# PORT WWN MODEL FCODE STATUS DEVICE PATH
-- ----- -------- ----- ----- ------ -----------
c3 qlc0 2100001b3204d5e2 SG-XPCIE1FC-QF4 2.01 CONNECTED /pci@300/pci@1/pci@0/pci@c/SUNW,qlc@0
c16 qlc5 2100001b3200e592 SG-XPCIE1FC-QF4 2.01 CONNECTED /pci@400/pci@1/pci@0/pci@8/SUNW,qlc@0

Summit_1 SG-XPCIE1FC-QF4 375-3355 QLE2460 142 PCI-E 4Gb 1 123305

c3 = qlc0 (fp13) -> /devices/pci@300/pci@1/pci@0/pci@c/SUNW,qlc@0/fp@0,0:devctl
================================================================================
Port_ID Port WWN Device Description Type
------- -------- ------------------ ----
147200 50060e8016018e40 -> HDS: SN# 398 (Disk device)
14d080 2100001b3204d5e2 -> QLogic HBA: Port 1 (Unknown Type,Host Bus Adapter)

c16 = qlc5 (fp0) -> /devices/pci@400/pci@1/pci@0/pci@8/SUNW,qlc@0/fp@0,0:devctl
================================================================================
Port_ID Port WWN Device Description Type
------- -------- ------------------ ----
a7200 50060e8016018e50 -> HDS: SN# 398 (Disk device)
ad080 2100001b3200e592 -> QLogic HBA: Port 1 (Unknown Type,Host Bus Adapter)
-bash-3.2$ more cfgadm-al-o_show_FCP_dev.out
Ap_Id Type Receptacle Occupant Condition
c3 fc-fabric connected configured unknown
c3::50060e8016018e40,0 disk connected configured unknown
c16 fc-fabric connected configured unknown
c16::50060e8016018e50,0 disk connected configured unknown


Control domain: host1
Domain role: LDoms control I/O service root
Domain name: primary
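
This domain information is typically gathered with the Solaris virtinfo utility (shown here as a sketch; its output includes the Domain role, Domain name, and Control domain fields above):

pri># virtinfo -a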

Changes

Trying to add new LUNs.

 

Cause

This is due to a known bug. In this example, LUN 0 (the storage controller LUN) was added after the other LUN IDs instead of first. Always add LUN 0 before adding any other LUN IDs; otherwise the affected vHBAs must be set down and then back up, and a rescan performed from the OVN 'admin' CLI, before the new target LUNs are found. Bug 17490439 - OVN[SCSAV3]: FAILED TO DETECT LUN 0 WHEN MPXIO IS ENABLED FROM THE HOST
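
A host-side spot check for the controller LUN on a direct FC port (a sketch, matching the cfgadm capture shown in the Symptoms section; LUN 0 appears as <target-port-WWN>,0):

# cfgadm -al -o show_FCP_dev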

Solution

Here are the targets displayed for each vHBA on each CDOM (both primary and secondary). Note that LUN 0 does not always appear at the beginning of the output, but LUN 0 *is* present for all vHBAs (initiators):

Fabric Interconnect 1:


pri_hba1.host1 50:06:0E:80:10:4B:6D:94 50:06:0E:80:10:4B:6D:94 0,1,2,3,4,5,6
pri_hba5.host1 50:06:0E:80:16:01:8E:42 50:06:0E:80:16:01:8E:42 1,2,3,4,5,7,8,0,6,9,10,11,12,13,14,15,16,17,18,19,20
pri_hba1.host1 50:06:0E:80:10:2B:68:34 50:06:0E:80:10:2B:68:34 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41
pri_hba1.host1 50:06:0E:80:16:01:8E:20 50:06:0E:80:16:01:8E:20 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,
  47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,
  91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,
  126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,
  159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
  192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,
  226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,208
  
sec_hba1.host1 50:06:0E:80:10:4B:6D:95 50:06:0E:80:10:4B:6D:95 0,1,2,3,4,5,6
sec_hba9.host1 50:06:0E:80:16:01:8E:00 50:06:0E:80:16:01:8E:00 1,2,3,4,5,7,8,0,6,9,10,11,12,13,14,15,16,17,18,19,20
sec_hba1.host1  50:06:0E:80:10:2B:68:35 50:06:0E:80:10:2B:68:35 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41
sec_hba1.host1  50:06:0E:80:16:01:8E:21 50:06:0E:80:16:01:8E:21 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,
  47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,
  91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,
  126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,
  159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,189,190,191,
  192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,209,210,211,212,213,214,215,216,217,218,219,220,221,222,223,224,225,
  226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,254,255,208


Fabric Interconnect 2:

pri_hba2.host1 50:06:0E:80:10:4B:6D:9C 50:06:0E:80:10:4B:6D:9C 0,1,2,3,4,5,6
pri_hba10.host1 50:06:0E:80:16:01:8E:52 50:06:0E:80:16:01:8E:52 1,2,3,4,5,7,8,0,6,9,10,11,12,13,14,15,16,17,18,19,20
pri_hba2.host1 50:06:0E:80:10:2B:68:3C 50:06:0E:80:10:2B:68:3C 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41
pri_hba2.host1 50:06:0E:80:16:01:8E:30 50:06:0E:80:16:01:8E:30 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,
  47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,
  90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,
  125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,
  157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,
  189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,209,210,211,212,213,214,215,216,217,218,219,220,221,
  222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,
  254,255,208


sec_hba2.host1 50:06:0E:80:10:4B:6D:9D 50:06:0E:80:10:4B:6D:9D 0,1,2,3,4,5,6
sec_hba10.host1 50:06:0E:80:16:01:8E:10 50:06:0E:80:16:01:8E:10 1,2,3,4,5,7,8,0,6,9,10,11,12,13,14,15,16,17,18,19,20
sec_hba2.host1 50:06:0E:80:10:2B:68:3D 50:06:0E:80:10:2B:68:3D 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41
sec_hba2.host1 50:06:0E:80:16:01:8E:31 50:06:0E:80:16:01:8E:31 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,
  47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,
  90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,
  125,126,127,128,129,130,131,132,133,134,135,136,137,138,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,
  157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,181,182,183,184,185,186,187,188,
  189,190,191,192,193,194,195,196,197,198,199,200,201,202,203,204,205,206,207,209,210,211,212,213,214,215,216,217,218,219,220,221,
  222,223,224,225,226,227,228,229,230,231,232,233,234,235,236,237,238,239,240,241,242,243,244,245,246,247,248,249,250,251,252,253,
  254,255,208
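
The listings above come from each Fabric Interconnect's 'admin' CLI, using the same targets command shown in the recovery steps later in this document, for example:

# show vhba pri_hba1.host1 targets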


From the OVN F1-15 perspective, the primary and secondary CDOMs have identical LUN counts for each corresponding vHBA, so the problem is not at the OVN F1-15 (switch) level. No fault was found on the OVN F1-15s, but it is apparent even from the OVN CLI that the storage controller LUN 0 was not added first, even though LUN 0 is present for all vHBAs/initiators.


On the host, LUN 0 (the controller LUN) is not being found/reported.

cfgadm -alv does not show LUN 0; for pri_hba5 it shows only LUNs 1-5, 7, and 8:

# grep pri_hba5 cfgadm-alv.out
unavailable scsi-fabric n /devices/pci@340/pci@1/pci@0/pci@c/pciex15b3,673c@0/hermon@4/ibport@1,ffff,xstn/pri_hba5@500139700061110f/iport@v0:scsi
unavailable disk-path n /devices/pci@340/pci@1/pci@0/pci@c/pciex15b3,673c@0/hermon@4/ibport@1,ffff,xstn/pri_hba5@500139700061110f/iport@v0:scsi::w50060e8016018e42,1
unavailable disk-path n /devices/pci@340/pci@1/pci@0/pci@c/pciex15b3,673c@0/hermon@4/ibport@1,ffff,xstn/pri_hba5@500139700061110f/iport@v0:scsi::w50060e8016018e42,2
unavailable disk-path n /devices/pci@340/pci@1/pci@0/pci@c/pciex15b3,673c@0/hermon@4/ibport@1,ffff,xstn/pri_hba5@500139700061110f/iport@v0:scsi::w50060e8016018e42,3
unavailable disk-path n /devices/pci@340/pci@1/pci@0/pci@c/pciex15b3,673c@0/hermon@4/ibport@1,ffff,xstn/pri_hba5@500139700061110f/iport@v0:scsi::w50060e8016018e42,4
unavailable disk-path n /devices/pci@340/pci@1/pci@0/pci@c/pciex15b3,673c@0/hermon@4/ibport@1,ffff,xstn/pri_hba5@500139700061110f/iport@v0:scsi::w50060e8016018e42,5
unavailable disk-path n /devices/pci@340/pci@1/pci@0/pci@c/pciex15b3,673c@0/hermon@4/ibport@1,ffff,xstn/pri_hba5@500139700061110f/iport@v0:scsi::w50060e8016018e42,7
unavailable disk-path n /devices/pci@340/pci@1/pci@0/pci@c/pciex15b3,673c@0/hermon@4/ibport@1,ffff,xstn/pri_hba5@500139700061110f/iport@v0:scsi::w50060e8016018e42,8
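
To confirm that LUN 0 is absent for this initiator, count the disk paths ending in LUN 0 in the same capture (a sketch; the WWN is the pri_hba5 target port from the listing above, and zero matches means LUN 0 is undetected):

# grep -c 'w50060e8016018e42,0$' cfgadm-alv.out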


This leads us back to a bug published in the OVN Solaris 11.3 Product Notes, where LUN 0 is visible from the OVN CLI but not from the host:

https://docs.oracle.com/cd/E38500_01/pdf/E66369.pdf

Bug 17490439
Failed to Detect LUN 0 When MPXIO Is Enabled From the Host
After the vHBA is created, Oracle Solaris 11.3 hosts fail to detect LUN 0 when LUN 0 is added to a vHBA. This is a SCSAv3 layer bug.
Workaround:
LUN 0, which is the controller LUN, should always be mapped to the vHBA manually.

From the bug notes for the above bug:

1. If LUN 0 is not mapped at all, then the other LUNs are easily discovered on the host with a mere OS rescan and the format command. No reboot is required.

2. If LUN 0 is mapped, then at least a vHBA down/up may be needed for the OS to detect LUN 0. A mere OS rescan discovers all the other LUN IDs apart from LUN 0. The good news is that a reboot is no longer required, as it was in earlier releases.

What the above bug notes are saying is that adding the storage controller LUN 0 after other LUNs requires a vHBA down/up followed by a rescan to find the new LUNs. If controller LUN 0 is added first, then a mere vHBA rescan discovers all LUNs, even at the host level. As an additional note, when using OVN vHBAs it is always best to rescan for new target LUNs from the OVN CLI rather than from the host CLI: an OVN CLI vHBA rescan pushes new LUNs out to the hosts, but rescanning LUNs from a host (CDOM) does not push discovered LUNs *back* to OVN.

Action plan to recover:

EXAMPLE:

1) Set the applicable vHBAs down and then back up from the OVN CLI so the OS can detect LUN 0. Once they are back up on the Fabric Interconnects, rescan the applicable vHBAs, then probe for LUNs from the host CLI: cfgadm -alv

If the above does not work, then:

2) Reboot the host.

To spell this out:

Run the commands below as 'admin' on the OVN F1-15:


# set vhba pri_hba5.host1 down


Verify the vHBA came fully down (down/down state):


# show vhba pri_hba5.host1


Set the vHBA back up once it has fully reached the down/down state:

# set vhba pri_hba5.host1 up


Verify the vHBA came fully up (up/up state):

# show vhba pri_hba5.host1


Display the number of vHBA targets/LUNs before rescanning:

# show vhba pri_hba5.host1 targets


Rescan Targets:

# set vhba pri_hba5.host1 rescan


Show vhba targets/LUNs after rescanning:

# show vhba pri_hba5.host1 targets


Do this for each vHBA that does not have a matching LUN count, for example pri_hba5, sec_hba9, pri_hba10, and sec_hba10. For other customers the vHBA names will differ; these names are just examples.

Note whether the target LUN count increased after the rescan. You may still have to rescan for new LUNs from the host (CDOM) CLI, as shown below.
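
A typical host-side rescan (a sketch, using the same standard Solaris commands referenced earlier in this document):

Rebuild the /dev links and clean out stale entries:

pri># devfsadm -Cv

Re-probe all attachment points:

pri># cfgadm -al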

Lastly, on the host CLI run: cfgadm -alv
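
To confirm recovery, compare the per-vHBA LUN counts on the primary against the secondary once more, reusing the pipeline from the Symptoms section; after the fix, pri_hba5 and pri_hba10 should each report 22, matching sec_hba9 and sec_hba10:

pri># cfgadm -lav | awk -F\/ '/hermon/ {print $10}' | sort -n | uniq -c | sort -nr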


 

 

References

<BUG:17490439> - OVN[SCSAV3]: FAILED TO DETECT LUN 0 WHEN MPXIO IS ENABLED FROM THE HOST

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.