Asset ID: 1-71-1936816.1
Update Date: 2017-05-18
Solution Type: Technical Instruction Sure
Solution 1936816.1: Overview of Troubleshooting Tools and Utilities for IB and Network Health Between an Exadata and BDA
Related Items
- Big Data Appliance Hardware
Related Categories
- PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST
- _Old GCS Categories>Support>SET>DiagnosticTools>Health Check Rulesets
- _Old GCS Categories>ST>Server>Engineered Systems>Big Data Appliance>Image
Applies to:
Big Data Appliance Hardware - Version Not Applicable and later
Linux x86-64
Goal
This note outlines some tools and utilities that can be used to perform IB/Network diagnostics and verify the IB and Network
health between an Exadata and Oracle Big Data Appliance.
Solution
1. traceroute
traceroute traces the route that packets take across an IP network on their way to a given host.
Use "traceroute -n <destination host>" to confirm that there are no connection failures between an Exadata and BDA.
From an Exadata machine run:
$ traceroute -n <BDA host name or BDA IP>
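As a rough illustration of how the traceroute result might be checked, the sketch below counts hops for which every probe timed out. The function name count_timeout_hops and the BDA_HOST placeholder are hypothetical, and the sketch assumes the traceroute default of three probes per hop, so that a fully unanswered hop prints as "* * *".

```shell
# Hypothetical helper (not part of the original note): count hops in
# "traceroute -n" output for which all three default probes timed out.
count_timeout_hops() {
    # A fully unanswered hop looks like: " 2  * * *"
    awk '$2 == "*" && $3 == "*" && $4 == "*" { n++ } END { print n + 0 }'
}

# Example use on an Exadata node (BDA_HOST is a placeholder):
#   traceroute -n "$BDA_HOST" | count_timeout_hops
```

A non-zero count is only a hint, since some devices do not answer probes even when they forward traffic correctly; what matters most is whether the final hop is the BDA host.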
2. ibdiagnet
ibdiagnet performs InfiniBand fabric diagnostics. ibdiagnet scans the IB fabric using directed route packets and extracts all the available information regarding its connectivity and devices into a file /tmp/ibdiagnet.log.
Use this to confirm the IB health.
You can run this on the BDA or Exadata with:
# ibdiagnet -v -r
...
Please see /tmp/ibdiagnet.log for complete log
----------------------------------------------------------------
-I- Done. Run time was 35 seconds.
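Once the run has finished, the log can be scanned for problems. The following is a minimal sketch, assuming that error and warning lines in the log start with "-E-" and "-W-" in the same way that informational lines start with "-I-"; the function name summarize_ibdiagnet_log is hypothetical.

```shell
# Hypothetical helper: count error and warning lines in an ibdiagnet log.
# Assumption: errors are prefixed "-E-" and warnings "-W-", mirroring the
# "-I-" prefix used for informational lines.
summarize_ibdiagnet_log() {
    log="${1:-/tmp/ibdiagnet.log}"   # default log path named in this note
    awk '/^-E-/ { e++ } /^-W-/ { w++ }
         END { printf "errors=%d warnings=%d\n", e, w }' "$log"
}

# Example:
#   summarize_ibdiagnet_log /tmp/ibdiagnet.log
```

The counts are only a starting point; the full log still needs to be read to see which links or ports are affected.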
The InfiniBand leaf switch may report continuous errors such as:
# ibqueryerrors.pl -rR -s PortRcvSwitchRelayErrors,PortXmitDiscards,PortXmitWait,VL15Dropped
Suppressing: PortRcvSwitchRelayErrors,PortXmitDiscards,PortXmitWait,VL15Dropped
Errors for 0x* "SUN IB QDR GW switch bdasw-ib2 *.*.*.*"
GUID 0x* port 13: [SymbolErrors == 11]
Link info: 68 13[ ] ==( 4X 10.0 Gbps)==> 0x0010e035b911a0a0 36[ ] "SUN DCS 36P QDR bdasw-ibs01 *.*.*.*
In that case, collecting the output of the following command might help:
# ibdiagnet -c 100 -P all=1
3. sosreport
sosreport is a tool that collects information to assist with troubleshooting. It provides a snapshot of the resources/configuration at a given time.
For details on how to collect a sosreport, see the "sosreport" section of Oracle Big Data Appliance Diagnostic Tools and Log Files for Troubleshooting Analysis (Doc ID 1575575.1).
4. Generic IB and network checks:
a) On the BDA:
On the BDA, the tools for general IB and network checks are bdacheckib and bdachecknet.
Details are at: Oracle® Big Data Appliance Owner's Guide Release 4 (4.0), 13 Monitoring the Health of Oracle Big Data Appliance.
From there:
bdacheckib: Checks the InfiniBand cabling between the servers and switches of a single rack, when entered with no options.
bdachecknet: Checks whether the network configuration is working properly.
b) On the Exadata:
Use exadata-ibdiagtools, an Exadata-specific suite of utilities intended for use only on Exadata. exadata-ibdiagtools does not exist on the BDA; the BDA equivalents are bdacheckib and bdachecknet, described above.
c) On BDA and Exadata you can use the standard IB/network tools such as iputils, ibutils and infiniband-diags.
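Because the available tools differ between the BDA and the Exadata, it can help to confirm what is installed on a node before starting. A minimal sketch, using only the standard shell built-in command -v; the function name missing_tools is hypothetical.

```shell
# Hypothetical helper: print each named command that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example, using tool names mentioned in this note:
#   missing_tools traceroute ibdiagnet ibqueryerrors ib_write_bw iperf netperf
```

Empty output means every listed command was found on the node.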
5. ibqueryerrors
ibqueryerrors queries and reports non-zero IB port counters. The default behavior is to report the port error counters which exceed a threshold for each port in the fabric. The default threshold is zero (0).
ibqueryerrors can be run on an Exadata and on a BDA to see whether errors are reported in the IB fabric.
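To make a long report easier to review, the counter lines can be reduced to one row per counter. This is a sketch only, assuming report lines in the "GUID <guid> port <n>: [Counter == value]" form shown in the ibqueryerrors.pl sample in section 2; the function name list_port_errors is hypothetical, and other versions of the tool may format their output differently.

```shell
# Hypothetical helper: turn ibqueryerrors-style lines read from stdin into
# "guid port counter value" rows, one row per bracketed counter.
list_port_errors() {
    awk '/^ *GUID .* port [0-9]+:/ {
        guid = $2
        port = $4
        sub(":", "", port)              # "13:" -> "13"
        line = $0
        # Walk through every "[Name == N]" group on the line.
        while (match(line, /\[[A-Za-z0-9]+ == [0-9]+\]/)) {
            item = substr(line, RSTART + 1, RLENGTH - 2)
            split(item, kv, " == ")
            print guid, port, kv[1], kv[2]
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

# Example:
#   ibqueryerrors | list_port_errors
```

Running the command twice a few minutes apart and comparing the rows shows whether a counter is still increasing or only reflects historical errors.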
6. ib_write_bw
Use ib_write_bw to verify that the raw IB bandwidth is acceptable.
ib_write_bw performs a write bandwidth diagnostic. It is issued on a Linux InfiniBand host.
Use ib_write_bw to test the raw IB bandwidth between two BDAs, between two Exadatas, and between an Exadata and a BDA.
7. iperf
iperf is an open-source tool that measures TCP and UDP bandwidth performance.
Use it to measure performance between two BDAs, between two Exadatas, and between an Exadata and a BDA.
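A typical measurement starts iperf in server mode on one node (iperf -s) and in client mode on the other (iperf -c <server host>). The sketch below extracts the last reported rate from the client output. It assumes classic iperf (version 2) summary lines that end in a rate such as "941 Mbits/sec"; iperf3 formats its output differently. The function name last_bandwidth and the BDA_HOST placeholder are hypothetical.

```shell
# Hypothetical helper: print the last bandwidth figure found in iperf
# output read from stdin. Assumes lines that end in "<number> <unit>bits/sec".
last_bandwidth() {
    awk '/bits\/sec/ { bw = $(NF - 1) " " $NF } END { if (bw != "") print bw }'
}

# Example, run on the client node (BDA_HOST is a placeholder):
#   iperf -c "$BDA_HOST" | last_bandwidth
```

Repeating the test in both directions gives a fuller picture than a single one-way measurement.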
8. netperf
Netperf is a software application that provides network bandwidth testing between two hosts on a network.
Check whether default partitioning is in use on the Exadata. Generally, default partitioning should not be used on an Exadata, although it is acceptable on a BDA.
Also note that the IB transfer rate should not be sensitive to distance, and the number of hops should not add noticeable latency. Latency over copper cables increases with cable length, which is not the case for optical fiber.