Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1904446.1 : Oracle ZFS Storage Appliance: Infiniband IB Port not Activated
Created from <SR 3-8688320181>

Applies to:
Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-4 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms
The appliance, a ZS3-2 active-active cluster, is unable to communicate over its Infiniband ports.

The environment has two Sun Datacenter 36 Infiniband switches, cabled as follows:
Port1 of Head1 connected to Switch1 Port 0A
Port2 of Head1 connected to Switch2 Port 0A
Port1 of Head2 connected to Switch1 Port 0B
Port2 of Head2 connected to Switch2 Port 0B

Both switches have pkey 0xfe80.
Switch1 has an SM priority of 6 and Switch2 has an SM priority of 5.
All connected ports show as active and enabled on the switch side, but IB is not active on the storage side.
Changes
New Infiniband Installation

Cause
The support bundle Net logs showed this:

    LINK  HCAGUID         PORTGUID        PORT  STATE  PKEYS
    ibp1  10E000013284D0  10E000013284D2  2     up     FFFF    <<< Pkey FFFF
    ibp0  10E000013284D0  10E000013284D1  1     up     FFFF

dladm-show-part.out

    LINK        PKEY  OVER  STATE  FLAGS
    pfe80_ibp0  FE80  ibp0  down   f---    <<< Pkey FE80
    pfe80_ibp1  FE80  ibp1  down   f---

Solution
Looking at the support bundles for both appliances, the pkey was shown as FFFF in some instances and FE80 in others:

## dladm-show-ib.out

    LINK  HCAGUID         PORTGUID        PORT  STATE  PKEYS
    ibp1  10E000013284D0  10E000013284D2  2     up     FFFF    <<< FFFF
    ibp0  10E000013284D0  10E000013284D1  1     up     FFFF

(This subcommand displays the physical links, port GUID, port number, HCA GUID, and P_Keys present on the port at the time the command is run.)

## dladm-show-part.out (IB partition link information)

    LINK        PKEY  OVER  STATE  FLAGS
    pfe80_ibp0  FE80  ibp0  down   f---    <<< FE80
    pfe80_ibp1  FE80  ibp1  down   f---
The default PKEY on the appliance is FFFF, but the partition link output showed the links set to FE80, which implied the switch was set to a different PKEY value. The partition link state is down under the following conditions:
The appliance help also makes clear that the partition will remain "down" until the port GUID is a member of the subnet partition:
Partition Key: Use the partition (fabric domain) in which the underlying port device is a member. The partition key (pkey) is found on and configured by the subnet manager (SM). The pkey may be defined before configuring the subnet manager but the datalink will remain "down" until the subnet partition has been properly configured with the port GUID as a member. It is important to keep partition membership for HCA ports consistent with IPMP and clustering rules on the subnet manager.
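The comparison made above between the two support-bundle files can be sketched in a few lines of Python. This is only an illustrative sketch, not an Oracle tool: the function names `parse_table` and `missing_pkeys` are hypothetical, the input is assumed to be the whitespace-separated text shown in dladm-show-ib.out and dladm-show-part.out, and multiple P_Keys on a port are assumed to be comma-separated in the PKEYS column.

```python
def parse_table(text):
    """Parse whitespace-separated, header-first output into a list of dicts."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def missing_pkeys(show_ib_text, show_part_text):
    """Return (partition link, pkey, port link) for every partition whose
    pkey is not among the pkeys currently present on the underlying port."""
    # Assumption: several pkeys on one port appear comma-separated.
    port_pkeys = {
        row["LINK"]: {p.upper() for p in row["PKEYS"].split(",")}
        for row in parse_table(show_ib_text)
    }
    problems = []
    for part in parse_table(show_part_text):
        pkey = part["PKEY"].upper()
        if pkey not in port_pkeys.get(part["OVER"], set()):
            problems.append((part["LINK"], pkey, part["OVER"]))
    return problems


# Sample data copied from the support bundle output in this document.
SHOW_IB = """
LINK HCAGUID PORTGUID PORT STATE PKEYS
ibp1 10E000013284D0 10E000013284D2 2 up FFFF
ibp0 10E000013284D0 10E000013284D1 1 up FFFF
"""

SHOW_PART = """
LINK PKEY OVER STATE FLAGS
pfe80_ibp0 FE80 ibp0 down f---
pfe80_ibp1 FE80 ibp1 down f---
"""

for link, pkey, over in missing_pkeys(SHOW_IB, SHOW_PART):
    print(f"{link}: pkey {pkey} is not present on {over}")
```

With the sample data, both partition links are reported, which matches the "down" state seen in the bundle: the ports only carry FFFF, so the subnet manager has not yet made the port GUIDs members of the FE80 partition.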
Further data that can be collected to help isolate an Infiniband issue:

For Example:

    Switch : 0x002128f56f5da0a0 ports 36 "SUN DCS 36P QDR localhost 10.145.229.242" enhanced port 0 lid 1 lmc 0

CLI> confirm shell ibstat

For Example:

    CA 'mlx4_0'
        CA type: 0
        Number of ports: 2
        Firmware version: 2.6.000
        Hardware version: 160
        Node GUID: 0x00212800013f2416
        System image GUID: 0x00212800013f2419
        Port 1:
            State: Active
            Physical state: LinkUp
            Rate: 40
            Base lid: 21
            LMC: 0
            SM lid: 1
            Capability mask: 0x00000030
            Port GUID: 0x00212800013f2417
            Link layer: IB
        Port 2:
            State: Active
            Physical state: LinkUp
            Rate: 40
            Base lid: 22
            LMC: 0
            SM lid: 1
            Capability mask: 0x00000030
            Port GUID: 0x00212800013f2418
            Link layer: IB
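When many heads have to be checked, the ibstat example above can be summarized per port with a small script. The following is a minimal sketch that assumes the text layout shown in the example; `parse_ibstat` and `inactive_ports` are hypothetical helper names, and the sample text is an abbreviated copy of the output in this document.

```python
def parse_ibstat(text):
    """Return {port number: {field: value}} from ibstat-style text."""
    ports = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # "Port 1:" opens a port section; "Port GUID: ..." does not end in ":".
        if line.startswith("Port ") and line.endswith(":"):
            current = int(line[len("Port "):-1])
            ports[current] = {}
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            ports[current][key] = value.strip()
    return ports


def inactive_ports(ports):
    """List port numbers that are not both Active and LinkUp."""
    return sorted(
        num for num, info in ports.items()
        if info.get("State") != "Active"
        or info.get("Physical state") != "LinkUp"
    )


# Abbreviated sample taken from the ibstat example in this document.
IBSTAT = """
CA 'mlx4_0'
    Number of ports: 2
    Port 1:
        State: Active
        Physical state: LinkUp
        Base lid: 21
        SM lid: 1
        Port GUID: 0x00212800013f2417
    Port 2:
        State: Active
        Physical state: LinkUp
        Base lid: 22
        SM lid: 1
        Port GUID: 0x00212800013f2418
"""

ports = parse_ibstat(IBSTAT)
for num in sorted(ports):
    info = ports[num]
    print(num, info["State"], info["Physical state"], "SM lid", info["SM lid"])
print("inactive ports:", inactive_ports(ports))
```

Note that a port can be Active and LinkUp here while the partition datalink on top of it is still down, as in the case described in this document, so this check complements the P_Key comparison rather than replacing it.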
For Example:

    Switch 0x002128f56f5da0a0 SUN DCS 36P QDR localhost 10.145.229.242:
    1 1[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 2[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 3[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 4[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 5[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 6[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 7[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 8[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 16 2[ ] "zs3-2-a PCIe 6" ( )
    1 10[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 23 1[ ] "zs3-2-a PCIe 6" ( )
    1 11[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 12[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 13[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 14[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 15[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 16[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 17[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 18[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 19[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 20[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 21[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 22 2[ ] "s7420-a PCIe 4" ( )
    1 22[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 21 1[ ] "s7420-a PCIe 4" ( )
    1 23[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 24[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 25[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 19 1[ ] "s7420b1 PCIe 3" ( )
    1 26[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 20 2[ ] "s7420b1 PCIe 3" ( )
    1 27[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 18 1[ ] "s7420b2 PCIe 3" ( )
    1 28[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 17 2[ ] "s7420b2 PCIe 3" ( )
    1 29[ ] ==( Down/ Polling)==> [ ] "" ( )
    1 30[ ] ==( Down/ Polling)==> [ ] "" ( )
    1 31[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 32[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 33[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 34[ ] ==( Down/Disabled)==> [ ] "" ( )
    1 35[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 12 2[ ] "MT25408 ConnectX Mellanox Technologies" ( )
    1 36[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 11 1[ ] "MT25408 ConnectX Mellanox Technologies" ( )
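To confirm quickly which switch ports actually reach the appliance heads, the link listing above can be filtered for active rows. The sketch below assumes the row layout shown in the example (switch LID, switch port, link state, peer LID, peer port, quoted peer name); the regular expression and the `active_links` helper are illustrative assumptions, not part of any Oracle or InfiniBand utility.

```python
import re

# One row: "<lid> <port>[ ] ==( <state>)==> [<peer lid> <peer port>][ ] "<peer name>" ( )"
ROW = re.compile(
    r'^\s*(\d+)\s+(\d+)\[.*?\]\s+==\(\s*(.*?)\)==>\s+'
    r'(?:(\d+)\s+(\d+))?\[.*?\]\s+"(.*?)"'
)


def active_links(text):
    """Return (switch port, peer LID, peer port, peer name) for active rows."""
    links = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # header line or unrelated text
        _lid, port, state, peer_lid, peer_port, name = match.groups()
        if "Active" in state and peer_lid is not None:
            links.append((int(port), int(peer_lid), int(peer_port), name))
    return links


# A few rows copied from the example listing in this document.
LISTING = """
Switch 0x002128f56f5da0a0 SUN DCS 36P QDR localhost 10.145.229.242:
1 8[ ] ==( Down/Disabled)==> [ ] "" ( )
1 9[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 16 2[ ] "zs3-2-a PCIe 6" ( )
1 10[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 23 1[ ] "zs3-2-a PCIe 6" ( )
1 29[ ] ==( Down/ Polling)==> [ ] "" ( )
"""

for port, peer_lid, peer_port, name in active_links(LISTING):
    print(f"switch port {port} -> {name} (LID {peer_lid}, port {peer_port})")
```

Rows in the Down/Disabled or Down/Polling state are skipped, so the result can be compared directly against the expected cabling from each head to each switch.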
Sun Storage 7000 Unified Storage System: How to Troubleshoot Infiniband issues (Doc ID 1435063.1)

# listlinkup
# ibnetdiscover
# sminfo
# ibdiagnet -v -r | tee /var/ak/dropbox/ibdiagnet.out
http://docs.oracle.com/cd/E18476_01/doc.220/e18478/physical_part.htm
References
<NOTE:1587913.1> - Sun Storage 7000 Unified Storage System: How to rebuild network interfaces
<NOTE:1435063.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot Infiniband Issues

Attachments
This solution has no attachment