Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Troubleshooting Sure Solution
1538237.1 : Gathering Troubleshooting Information for the Infiniband Network in Engineered Systems
This document lists the data gathering required to troubleshoot issues with the InfiniBand network in Exadata, Exalogic and SuperCluster Engineered Systems. It can also be used to gather information on any InfiniBand network where a Sun Datacenter InfiniBand Switch 36 and/or a Sun Network QDR Gateway Switch is used.
Applies to:
Exadata X3-2 Hardware
Sun Datacenter InfiniBand Switch 36 - Version Not Applicable to Not Applicable [Release N/A]
Sun Network QDR InfiniBand Gateway Switch - Version Not Applicable to Not Applicable [Release N/A]
Oracle SuperCluster Specific Software
Sun Microsystems > Boards > InfiniBand (IB)
Information in this document applies to any platform.

Purpose
This document includes the data gathering required to troubleshoot issues with an InfiniBand network in Exadata, Exalogic and SuperCluster Engineered Systems. It is also useful in gathering information on any InfiniBand network where a Sun Datacenter InfiniBand Switch 36 and/or a Sun Network QDR Gateway Switch is used.

Troubleshooting Steps
1. From all the InfiniBand switches in the network, collect the outputs of the following commands:
a) version
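Running the same command on every switch can be scripted. The following is a minimal sketch, not part of the original procedure: it assumes root SSH access to each switch, and the host names `ibsw1` and `ibsw2`, the `SWITCHES`, `OUTDIR` and `REMOTE` variables, and the `collect_from_switches` helper are all hypothetical. Only `version` is used here, since it is the command named above.

```shell
#!/bin/sh
# Sketch: run one command on every InfiniBand switch and save each output.
# SWITCHES, OUTDIR and the host names are placeholders (assumptions).
SWITCHES="${SWITCHES:-ibsw1 ibsw2}"
OUTDIR="${OUTDIR:-./ib-data}"
# REMOTE is the remote-execution prefix; set REMOTE=echo for a dry run
# that records the command line instead of contacting the switch.
REMOTE="${REMOTE:-ssh -n -l root}"

collect_from_switches() {
    cmd="$1"
    # Build a file-name-safe label from the command (non-alphanumerics become _).
    name=$(printf '%s' "$cmd" | tr -c 'A-Za-z0-9' '_')
    mkdir -p "$OUTDIR" || return 1
    for sw in $SWITCHES; do
        # One file per switch and command, e.g. ib-data/ibsw1_version.txt
        $REMOTE "$sw" "$cmd" > "$OUTDIR/${sw}_${name}.txt" 2>&1 \
            || echo "warning: '$cmd' failed on $sw" >&2
    done
}
```

After setting `SWITCHES` to the real switch names, calling `collect_from_switches version` writes one output file per switch into `OUTDIR`. A failure on one switch is reported as a warning and does not stop the loop.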
2. Copy the following files from all the InfiniBand switches:
4. From the IB switch running as master, collect the output of:
# smpartition list active
# ibqueryerrors.pl -rR -s RcvSwRelayErrors,XmtDiscards,XmtWait,VL15Dropped
# /usr/bin/ibdiagnet -skip dup_guids -ls 10 -lw 4x -pm
The ibdiagnet command creates several files in the /tmp directory. Copy these files. Example:
This captures all of the PM counters accumulated since the errors and counters were last cleared. Then wait for an hour and collect the ibdiagnet and ibqueryerrors output once more. NOTE: Alternatively, if immediate results are required, traffic may be generated manually...
# /usr/bin/ibdiagnet -c 500 -P all=1
(this will send 500 packets over all links)
...and collect the ibdiagnet and ibqueryerrors output once more:
# ibqueryerrors.pl -rR -s RcvSwRelayErrors,XmtDiscards,XmtWait,VL15Dropped
# /usr/bin/ibdiagnet -skip dup_guids -ls 10 -lw 4x -pm
Then copy the files from the /tmp directory as follows. Example:
NOTE: Once the information has been provided, please remove the pre- and post-clear-ibdiagnet.tar files from the switch.
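Bundling the ibdiagnet output before copying it off the switch might look like the sketch below. It rests on assumptions: that the files of interest are named `ibdiagnet*` in `/tmp`, and that a single tar archive per run is wanted. The `bundle_ibdiagnet` helper is hypothetical; the archive names follow the pre- and post-clear-ibdiagnet.tar files mentioned in the note above.

```shell
#!/bin/sh
# Sketch: bundle the ibdiagnet output files into one tar archive.
# Usage: bundle_ibdiagnet <label> [dir]   (label: e.g. pre-clear or post-clear)
bundle_ibdiagnet() {
    label="$1"
    dir="${2:-/tmp}"
    (
        cd "$dir" || exit 1
        # Assumption: the files of interest are named ibdiagnet* in $dir.
        set -- ibdiagnet*
        # If the glob matched nothing, $1 is the literal pattern and does not exist.
        [ -e "$1" ] || { echo "no ibdiagnet files in $dir" >&2; exit 1; }
        # The archive name starts with the label, so it never matches the glob itself.
        tar cf "${label}-ibdiagnet.tar" "$@"
    )
}
```

Calling `bundle_ibdiagnet pre-clear` after the first collection and `bundle_ibdiagnet post-clear` after the second leaves two archives in `/tmp` that can be copied off the switch and then removed.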
6. If there are Sun Network QDR Gateway Switches in the network (in Exalogic systems, for example), collect the outputs of the following commands on all of the Sun Network QDR Gateway Switches:
a) showvnics
The following data may also be collected on these nodes, if it is not already in the explorer, sosreport or support bundle.
ibstat
8. If the issue is with communication between nodes, additional data may be collected as per the following document:
Troubleshooting communication issues over an Infiniband fabric Using ibping, ping, and rds-ping (Doc ID 2016560.1)
tcpdump data may also be collected on the appropriate interfaces of both nodes while pinging from one node to the other.
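A capture of this kind might be started as in the sketch below. The interface name `ib0`, the output paths, and the `capture_cmd` helper are assumptions for illustration. The helper only prints the tcpdump command line, so it can be reviewed and adjusted before being run on each node while the ping is in progress.

```shell
#!/bin/sh
# Sketch: build a tcpdump command line for one interface.
# -i selects the interface; -w writes raw packets to a file for later analysis.
capture_cmd() {
    iface="${1:-ib0}"                 # assumption: name of the IPoIB interface
    out="${2:-/tmp/${iface}.pcap}"    # assumption: where the capture is stored
    printf 'tcpdump -i %s -w %s\n' "$iface" "$out"
}

# Print the command for a hypothetical node; run the printed line on that node.
capture_cmd ib0 /tmp/node1_ib0.pcap
```

Using a distinct output file per node keeps the two captures apart when both are uploaded together.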
NOTE: The -t option has been deprecated and is no longer available in newer versions of the 'verify-topology' script.
In this case, the 'ibnetdiscover' data can be reviewed instead to assess the IB topology.
10. If the issue is with the InfiniBand switch and its hardware, a snapshot of its ILOM may also be collected.

Attachments
This solution has no attachment