Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type Technical Instruction Sure Solution 2162835.1 : How to fix the problem of ping failure or communication failure over ipoib interface when ibping works
Applies to:
Sun Datacenter InfiniBand Switch 36 - Version All Versions and later
Sun Network QDR InfiniBand Gateway Switch - Version All Versions and later
Oracle Exadata Hardware - Version 11.1.0.6 and later
Sun Infiniband HCA - Version All Versions and later
Oracle Exalogic Elastic Cloud Software - Version 1.0.0.0.0 and later
Information in this document applies to any platform.

Goal

In an InfiniBand network, communication between nodes over IPoIB interfaces may fail for several reasons. This document covers the case where ping over the IPoIB interface does not work, whereas ibping between the same two nodes using the LIDs of the ports works well and the ibstat output shows that the IB ports are up and active. It provides guidelines for fixing such an issue.

Solution

There are several possible reasons why ping over an IPoIB interface fails while ibping works. The following are some of them. (Note: It is assumed that the IB ports are up and active and that ibping works.)
1. The IB ports of the two hosts are not in the same IB partition.

To check this, first find the port GUIDs of the ports by running the following command on each host:

# ibstat

If the host is running Solaris, the following command also shows the port GUIDs and the partitions each port belongs to:

# dladm show-ib

Then log in to the IB switch that is running as the current master and run the following command to list the IB partitions in the IB fabric:

# smpartition list active

For connectivity between the two hosts, the port GUIDs of both hosts must belong to the same partition. Check the output of the above command and make sure that this condition is met. If the port GUID of one of these hosts is missing from the partition it is supposed to belong to, add it by running the following commands on the IB switch that is running as the master:

# smpartition start
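The membership check above can be partly scripted. The following is a minimal sketch, not an Oracle-supplied tool: the function names `port_guids` and `partitions_with_guid` are hypothetical, and the parsing assumes that `ibstat` prints lines of the form `Port GUID: 0x...` and that each partition definition in the saved `smpartition list active` output ends with `;` and lists its members after the first `:`. Verify both assumptions against the real output before relying on it.

```shell
# Hypothetical helper: print every port GUID found in ibstat-style output.
# Reads stdin, so it works on live output or on a saved file.
port_guids() {
    awk '/Port GUID:/ { print $NF }'
}

# Hypothetical helper: print the header (name, P_Key, flags) of every
# partition whose member list contains the GUID given as $1.
# Reads saved "smpartition list active" output on stdin.
# Assumed layout: banner line starts with "Content of", comment lines
# start with "#", each definition ends with ";", members follow the ":".
partitions_with_guid() {
    grep -v -e '^Content of' -e '^#' |
    tr '\n' ' ' | tr ';' '\n' |      # one partition definition per line
    awk -F: -v g="$1" '
        NF > 1 && index(tolower($2), tolower(g)) {
            gsub(/^ +| +$/, "", $1)  # trim the header before printing
            print $1
        }'
}

# Usage (not run here):
#   ibstat | port_guids                                   # on each host
#   partitions_with_guid 0x... < partitions.txt           # saved switch output
```

Both hosts can communicate over IPoIB only if at least one partition is reported for both GUIDs. Note that this sketch matches explicit GUID entries only; members admitted through a group keyword in the partition definition are not detected.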
2. The ipoib flag is not set in the IB partition.

Check the flags of each partition with the following command:

# smpartition list active

If the ipoib flag is missing in the partition, you may add it by running the following command on the switch running as master:

# smpartition start
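To spot partitions without the flag in a long listing, a filter along the following lines may help. It is a sketch under the same assumptions about the `smpartition list active` layout (banner line starting with `Content of`, comment lines starting with `#`, definitions terminated by `;`, flags placed before the first `:`); the function name `partitions_without_ipoib` is hypothetical.

```shell
# Hypothetical helper: print the header of every partition definition
# that does not carry the ipoib flag.
# Reads saved "smpartition list active" output on stdin.
partitions_without_ipoib() {
    grep -v -e '^Content of' -e '^#' |   # drop banner and comment lines (assumed)
    tr '\n' ' ' | tr ';' '\n' |          # one partition definition per line
    awk -F: '
        NF > 1 && $1 !~ /ipoib/ {
            gsub(/^ +| +$/, "", $1)      # trim the header before printing
            print $1
        }'
}

# Usage (not run here):
#   partitions_without_ipoib < partitions.txt
```

An empty result means every partition definition found in the input carries the flag.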
3. The MTU of the interface (IB layer MTU) does not match the MTU of the IB partition on the switch.

See Doc ID 1988452.1: dladm show-part shows link down over an IB partition.
4. The master subnet manager is not present in the IB fabric, is in a limbo state, or the SA database of the SM master is corrupted.

Run the following command on any IB switch to find out which switch is the current master:

# getmaster

Regardless of which switch the above command is run on, it should always point to the same master. There must be only one master in an IB fabric. It is possible that the SM master is not functioning well, or is in a limbo state, due to bug 17482244 or some other reason.

See Doc ID 2016560.1: Troubleshooting communication issues over an Infiniband fabric Using ibping, ping, and rds-ping.

To test using the port GUID of the destination, run the ibping command on the client as follows:

# ibping -G <port guid of the destination>

To fix this, reboot the switch that is running as the current master. Disabling the SM on this switch, instead of rebooting it, may also help.
References

<NOTE:2016560.1> - Troubleshooting communication issues over an Infiniband fabric Using ibping, ping, and rds-ping
<NOTE:1988452.1> - dladm show-part shows link down over an IB partition