Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2249406.1
Update Date:2018-04-23

Solution Type  Problem Resolution Sure

Solution 2249406.1: ibqueryerrors Reports PortXmitWait Errors on One or More IB Switch or HCA Ports


Related Items
  • Exadata X3-2 Hardware
  • Exadata X4-2 Hardware
  • Exadata X5-2 Hardware
  • Exadata X6-2 Hardware
  • Exadata Database Machine X2-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST




Created from <SR 3-14520832871>

Applies to:

Exadata Database Machine X2-2 Hardware - Version All Versions and later
Exadata X3-2 Hardware - Version All Versions and later
Exadata X4-2 Hardware - Version All Versions and later
Exadata X5-2 Hardware - Version All Versions and later
Exadata X6-2 Hardware - Version All Versions and later
Information in this document applies to any platform.

Symptoms

When the ibqueryerrors command is run, the output lists PortXmitWait entries with non-zero counts on many ports. For example:

     [root@MyExa-db1 ~]# ibqueryerrors
     Errors for 0x10e035bac0a0a0 "SUN DCS 36P QDR MyExa-bsw001 10.11.12.13"
     GUID 0x10e035bac0a0a0 port ALL: [PortXmitWait == 150]
     GUID 0x10e035bac0a0a0 port 0: [PortXmitWait == 150]
     Errors for "MyExa-db1 S 10.230.49.53 HCA-1"
     GUID 0x21280001fc683c port 2: [PortXmitWait == 206704]
     Errors for "MyExa-db1 S 10.230.49.54 HCA-1"
     GUID 0x21280001eff452 port 2: [PortXmitWait == 140008]
     Errors for "MyExa-db2 C 10.230.49.55 HCA-1"
     GUID 0x10e00001328c3a port 2: [PortXmitWait == 10194301]
     Errors for "MyExa-db2 S 10.230.49.60 HCA-1"
     GUID 0x10e000012913da port 2: [PortXmitWait == 256070]

     <SNIP>

 

Cause

The PortXmitWait entry in the ibqueryerrors output shows the number of "ticks" during which the port selected by PortSelect had data to transmit but no data was sent during the entire tick.  The data was not sent either because of insufficient credits or because of lack of arbitration.

For example, under end-node contention, such as two HCAs sending data to one HCA, each sending HCA can transmit only about 50% of the time, because the receiving HCA can consume no more than 100% of its link bandwidth. The XmitWait counters on the sending HCAs therefore increment. A non-zero PortXmitWait count does not mean that any packets were dropped; it indicates only a delay in forwarding packets. There is no actual problem.
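
The contention example can be put into numbers with a small sketch. It assumes an idealized model that is not taken from this note: N senders always have data queued and share the receiving link equally. The function names send_share and wait_share are hypothetical.

```shell
#!/bin/sh
# Idealized model (an assumption): N senders, each with data always queued,
# share one receiving HCA link equally.

# Percentage of time each sender can transmit.
send_share() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", 100 / n }'
}

# Percentage of time each sender has data but must wait,
# which is when its XmitWait counter would be incrementing.
wait_share() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", 100 - 100 / n }'
}

send_share 2   # two senders, as in the example above
wait_share 2
```

With two senders both functions report 50.0, matching the 50% figure above. Real fabrics will not split the link this evenly, so the numbers only illustrate why the counter grows under normal load.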

 

Solution

Use the following command, which suppresses the counters that can be ignored (including PortXmitWait), so that only the counters of concern are reported:

     ibqueryerrors.pl -rR -s PortRcvSwitchRelayErrors,PortXmitDiscards,PortXmitWait,VL15Dropped


If you see non-zero SymbolErrors counts, refer to Doc ID 1988445.1.

If you see other non-zero error counts, apply the information detailed in Doc ID 2276427.1.  If this does not resolve the problem, collect the data detailed in Doc ID 1683903.1 and open a Service Request with Oracle Support.
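
If ibqueryerrors output has already been saved to a file, the same filtering can be approximated afterwards. The following is a minimal sketch, not an Oracle-supplied tool: the function name filter_ignorable is hypothetical, and it assumes every counter appears on a GUID line in the "[Name == value]" layout shown in the Symptoms example.

```shell
#!/bin/sh
# Hypothetical helper: reads saved ibqueryerrors output on stdin and prints
# only the GUID lines that still carry a counter outside the ignorable set.
# Assumes the "[Name == value]" layout shown in the Symptoms example.
filter_ignorable() {
    awk '
    BEGIN {
        # Same counter names as the -s list in the filter command above.
        n = split("PortRcvSwitchRelayErrors PortXmitDiscards PortXmitWait VL15Dropped", names, " ")
        for (i = 1; i <= n; i++) ignore[names[i]] = 1
    }
    /^[ \t]*GUID / {
        kept = ""
        rest = $0
        # Walk every "[Name == value]" group on the line.
        while (match(rest, /\[[A-Za-z0-9]+ == [0-9]+\]/)) {
            group = substr(rest, RSTART, RLENGTH)
            name = substr(group, 2)
            sub(/ .*/, "", name)          # keep only the counter name
            if (!(name in ignore)) kept = kept " " group
            rest = substr(rest, RSTART + RLENGTH)
        }
        if (kept != "") {
            prefix = $0
            sub(/:.*/, ":", prefix)       # "GUID ... port N:"
            sub(/^[ \t]+/, "", prefix)    # drop leading indentation
            print prefix kept
        }
    }'
}
```

For example, `filter_ignorable < saved_output.txt` (the file name is a placeholder) prints nothing for the output shown in Symptoms, because every line there reports only PortXmitWait. A line that also carried another counter would be printed with just that other counter.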

 

References

<NOTE:1988445.1> - ibqueryerrors Reports SymbolErrors on One or More of the IB switch or HCA port
<NOTE:1518889.1> - Oracle Fabric Interconnect :: Description of error counters in IB-path command output
<NOTE:2276427.1> - Infiniband - Port HCA-#:# Is Showing Non-Zero Error Counts

  Copyright © 2018 Oracle, Inc.  All rights reserved.