Exalogic VM lost network communication through both vnics when one IB gateway switch was rebooted

Asset ID:	1-72-2227109.1
Update Date:	2018-01-03
Keywords:

Solution Type Problem Resolution Sure

Solution 2227109.1 : Exalogic VM lost network communication through both vnics when one IB gateway switch was rebooted

Applies to:

Sun Network QDR InfiniBand Gateway Switch - Version All Versions and later
Oracle Exalogic Elastic Cloud Software - Version 1.0.0.0.0 and later
Information in this document applies to any platform.

Symptoms

A VM in an Exalogic system lost communication through all of its vnics when one of the InfiniBand gateway switches went down or was rebooted.

Cause

The problem was caused by a misconfiguration of the vnics. When a vnic is created on a gateway switch, the port GUID used to create it is expected to belong to the IB HCA port of the server node that is directly connected to that switch. For example, if the server has one HCA with two IB ports, with port 1 connected to gateway switch gw01 and port 2 connected to gw02, then a vnic created on gw01 must use the port GUID of port 1 of the HCA, and a vnic created on gw02 must use the port GUID of port 2. If these are swapped, the packets from the host through these vnics pass through both IB switches, so both switches must be up and active for either vnic to be operational. Failure of either switch breaks that path and causes both vnics to fail. That is what happened in this case. Only the VMs in which these ports are swapped show the problem; all other VMs are unaffected when one of the gateway switches is brought down.
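The dependency described above can be modeled in a few lines. The following Python sketch is illustrative only: the function names, the vnic names and the dictionary layout are assumptions made for this example and are not part of any Oracle tool. It treats a vnic as depending on the switch its HCA port is cabled to plus the gateway switch on which the vnic was created.

```python
# Illustrative model only; all names and the data layout are assumptions.

def switches_required(hca_port, vnic_switch, cabling):
    """Return the set of switches a vnic depends on.

    cabling maps an HCA port (for example "mlx4_0:1") to the switch it
    is directly cabled to. The vnic needs that switch plus the gateway
    switch on which the vnic was created.
    """
    return {cabling[hca_port], vnic_switch}


def surviving_vnics(vnics, cabling, failed_switch):
    """Return the names of the vnics that stay up when failed_switch goes down."""
    return sorted(
        name
        for name, (hca_port, vnic_switch) in vnics.items()
        if failed_switch not in switches_required(hca_port, vnic_switch, cabling)
    )


# Port 1 is cabled to gw01 and port 2 to gw02, as in the example above.
cabling = {"mlx4_0:1": "gw01", "mlx4_0:2": "gw02"}

# Each vnic is (HCA port whose GUID was used, switch the vnic was created on).
correct = {"vnicA": ("mlx4_0:1", "gw01"), "vnicB": ("mlx4_0:2", "gw02")}
swapped = {"vnicA": ("mlx4_0:2", "gw01"), "vnicB": ("mlx4_0:1", "gw02")}

print(surviving_vnics(correct, cabling, "gw01"))  # ['vnicB']
print(surviving_vnics(swapped, cabling, "gw01"))  # []
```

With the correct association, losing gw01 takes down only the vnic created on gw01. With the swapped association, each vnic depends on both switches, so losing either switch takes down both vnics.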

Here is an example:

Here is the output of ibstat in the VM.

# ibstat
CA 'mlx4_0'
        CA type: MT4100
        Number of ports: 2
        Firmware version: 2.11.1282
        Hardware version: 0
        Node GUID: 0x0010e00001436a10
        System image GUID: 0x0010e00001436a13
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 10
                LMC: 0
                SM lid: 107
                Capability mask: 0x02514868
                Port GUID: 0x0010e00067436a11
                Link layer: IB
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 13
                LMC: 0
                SM lid: 107
                Capability mask: 0x02514868
                Port GUID: 0x0010e00067436a12
                Link layer: IB

The output of ibstat on the corresponding compute node is identical.

The cabling of both ports of the HCA can be seen in the ibnetdiscover output, as follows:

vendid=0x2c9
devid=0x1003
sysimgguid=0x10e00001436a13
caguid=0x10e00001436a10
Ca 2 "H-0010e00001436a10" # "dpxlp01acn07 EL-C 192.168.52.107 HCA-1"
[1](10e00001436a11) "S-0010e031373cc0a0"[11] # lid 10 lmc 0 "SUN IB QDR GW switch dpxlp01agw01 10.10.172.133" lid 7 4xQDR
[2](10e00001436a12) "S-002128d1c3eac0a0"[11] # lid 13 lmc 0 "SUN IB QDR GW switch dpxlp01agw02 10.10.172.134" lid 107 4xQDR

This shows that port 1 (mlx4_0:1) of the HCA is cabled to port 11 of the IB gateway switch dpxlp01agw01, and port 2 (mlx4_0:2) of the HCA is cabled to port 11 of the switch dpxlp01agw02. So, the direct path from mlx4_0:1 is to gw01, and the direct path from mlx4_0:2 is to gw02.
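The port-to-switch mapping can also be read out of the ibnetdiscover text with a short script. The following Python sketch assumes that the port lines have the layout shown in the excerpt above and that the switch description contains the word "switch" followed by the switch host name. The function name and the mlx4_0 prefix are assumptions of this example; ibnetdiscover itself does not print the mlx4_0 device name.

```python
import re

# Matches a CA port line such as:
# [1](10e00001436a11) "S-0010e031373cc0a0"[11] # lid 10 lmc 0 "SUN IB ..." lid 7 4xQDR
PORT_LINE = re.compile(
    r'^\[(?P<port>\d+)\]\((?P<guid>[0-9a-fA-F]+)\)\s+'
    r'"S-[0-9a-fA-F]+"\[(?P<swport>\d+)\]\s+#.*?"(?P<desc>[^"]*)"'
)


def cabling_from_ibnetdiscover(text, ca_name="mlx4_0"):
    """Map "<ca_name>:<port>" to the host name of the directly cabled switch.

    Assumes the switch description looks like
    "SUN IB QDR GW switch <hostname> <ip>", as in the excerpt above.
    """
    cabling = {}
    for line in text.splitlines():
        match = PORT_LINE.match(line.strip())
        if not match:
            continue  # not a CA port line cabled to a switch
        name = re.search(r'switch\s+(\S+)', match.group("desc"))
        if name:
            cabling[f"{ca_name}:{match.group('port')}"] = name.group(1)
    return cabling


sample = '''Ca 2 "H-0010e00001436a10" # "dpxlp01acn07 EL-C 192.168.52.107 HCA-1"
[1](10e00001436a11) "S-0010e031373cc0a0"[11] # lid 10 lmc 0 "SUN IB QDR GW switch dpxlp01agw01 10.10.172.133" lid 7 4xQDR
[2](10e00001436a12) "S-002128d1c3eac0a0"[11] # lid 13 lmc 0 "SUN IB QDR GW switch dpxlp01agw02 10.10.172.134" lid 107 4xQDR
'''

print(cabling_from_ibnetdiscover(sample))
# {'mlx4_0:1': 'dpxlp01agw01', 'mlx4_0:2': 'dpxlp01agw02'}
```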

In a VM that lost connectivity when one of these two switches was shut down, it was found that the vnics were created as follows:

# mlx4_vnic_info -s
NETDEV_NAME eth651_2.1190
MAC 00:14:4f:f8:d2:71
VLAN 0x4a6
IOA_PORT mlx4_0:2 <<<<<<<<<<<<<<<<< Incorrect association of port and switch
BX_NAME dpxlp01agw01 <<<<<<<<<<<<<
EPORT_NAME 0A-ETH-2

NETDEV_NAME eth650_2.1818
MAC 00:14:4f:fb:7f:8f
VLAN 0x71a
IOA_PORT mlx4_0:1 <<<<<<<<<<<<<<<<< Incorrect association of port and switch
BX_NAME dpxlp01agw02 <<<<<<<<<<<<<<
EPORT_NAME 1A-ETH-2

This output shows that vnic eth651_2.1190 was created on the switch dpxlp01agw01 using the port GUID of port 2 (mlx4_0:2) of the HCA, and vnic eth650_2.1818 was created on the switch dpxlp01agw02 using the port GUID of port 1 (mlx4_0:1).
This is a misconfiguration. Port 2 is not directly connected to gw01, and port 1 is not directly connected to gw02. With this configuration, the packets sent through these vnics pass through both switches. So, for either of these vnics to be operational, both switches must be up and active. Failure of either switch results in the failure of both vnics of the VM.
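The same comparison can be scripted. The following Python sketch assumes that `mlx4_vnic_info -s` prints one "KEY value" pair per line, as in the excerpt above, and that the port-to-switch cabling is already known (here it is hard-coded from the ibnetdiscover output shown earlier). The function names are assumptions of this example, not part of any Oracle or Mellanox tool.

```python
def parse_vnic_info(text):
    """Parse "KEY value" lines into one dict per vnic.

    A new record starts at each NETDEV_NAME line or after a blank line.
    Anything after the second token on a line is ignored.
    """
    vnics, current = [], {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:  # a blank line ends the current record
            if current:
                vnics.append(current)
                current = {}
            continue
        if len(parts) < 2:
            continue  # key without a value, skip it
        if parts[0] == "NETDEV_NAME" and current:
            vnics.append(current)
            current = {}
        current[parts[0]] = parts[1]
    if current:
        vnics.append(current)
    return vnics


def find_swapped(vnics, cabling):
    """Return the vnics whose BX_NAME is not the switch cabled to IOA_PORT."""
    swapped = []
    for vnic in vnics:
        expected = cabling.get(vnic.get("IOA_PORT"))
        if expected is not None and expected != vnic.get("BX_NAME"):
            swapped.append(vnic.get("NETDEV_NAME", "?"))
    return swapped


# Cabling taken from the ibnetdiscover output of this example system.
cabling = {"mlx4_0:1": "dpxlp01agw01", "mlx4_0:2": "dpxlp01agw02"}

sample = """NETDEV_NAME eth651_2.1190
IOA_PORT mlx4_0:2
BX_NAME dpxlp01agw01

NETDEV_NAME eth650_2.1818
IOA_PORT mlx4_0:1
BX_NAME dpxlp01agw02
"""

print(find_swapped(parse_vnic_info(sample), cabling))
# ['eth651_2.1190', 'eth650_2.1818']
```

An empty list means that every vnic was created on the switch that its HCA port is directly cabled to.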

Normally, when EMOC creates these vnics, it creates them correctly, without this swapping. However, the swapping sometimes happens when the VM is rebooted or restarted manually or through xm.

Solution

To resolve this problem, re-create the affected vnics.

In an Exalogic system, restarting the VM through EMOC restores the correct configuration, so the simplest solution is to restart the VM through EMOC.
