Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1988445.1
Update Date:2018-05-23

Solution Type  Problem Resolution Sure

Solution  1988445.1 :   ibqueryerrors Reports SymbolErrors on One or More of the IB switch or HCA port  


Related Items
  • Big Data Appliance X3-2 Full Rack
  • Exadata X3-2 Hardware
  • Exadata X4-2 Hardware
  • Exalogic Elastic Cloud X5-2 Hardware
  • Big Data Appliance X3-2 In-Rack Expansion
  • Big Data Appliance X5-2 Full Rack
  • Oracle SuperCluster M7 Hardware
  • Oracle Exalogic Elastic Cloud Software
  • Exadata X5-8 Hardware
  • Exalogic Elastic Cloud X4-2 Hardware
  • Exadata X5-2 Hardware
  • Big Data Appliance X5-2 Hardware
  • Exadata X4-8 Hardware
  • Big Data Appliance X4-2 Starter Rack
  • Exadata X3-8 Hardware
  • Zero Data Loss Recovery Appliance X4 Hardware
  • Zero Data Loss Recovery Appliance X5 Hardware
  • Exalogic Elastic Cloud X3-2 Hardware
  • Big Data Appliance X4-2 In-Rack Expansion
  • Oracle Exalogic Elastic Cloud X2-2 Hardware
  • Big Data Appliance X3-2 Starter Rack
  • Oracle SuperCluster M6-32 Hardware
  • Oracle SuperCluster T5-8 Hardware
  • SPARC SuperCluster T4-4
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Symptoms
Cause
Solution


Created from <SR 3-10238951403>

Applies to:

Big Data Appliance X3-2 In-Rack Expansion - Version All Versions and later
Big Data Appliance X5-2 Full Rack - Version All Versions and later
Big Data Appliance X5-2 Hardware - Version All Versions and later
Big Data Appliance X3-2 Full Rack - Version All Versions and later
Exadata X3-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms

Note: The information in this note applies to BDA as well as Exadata, Exalogic, SuperCluster, Private Cloud Appliance, Zero Data Loss Recovery Appliance, and standalone InfiniBand switches.

ibqueryerrors reports [SymbolErrors] on one or more IB switch or HCA ports.

Note: This is not limited to Gateway switches.  The same applies to any IB Switch or HCA port.

  

For example:

[root@bdasw-ib2 ~]# ibqueryerrors.pl -rR -s PortRcvSwitchRelayErrors,PortXmitDiscards,PortXmitWait,VL15Dropped
  
Suppressing: PortRcvSwitchRelayErrors,PortXmitDiscards,PortXmitWait,VL15Dropped


Errors for * "SUN IB QDR GW switch bdasw-ib2 *.*.*.*"
GUID 0x002128d02ccac0a0 port 13: [SymbolErrors == 2]
      Link info:     68   13[  ]  ==( 4X 10.0 Gbps)==>  *   36[  ] "SUN DCS 36P QDR bdasw-ibs01 *.*.*.*"
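
Note: On a fabric with many ports it can help to reduce the report to just the ports that show symbol errors. The following is a minimal sketch, assuming the output format matches the example above. The function name symbol_error_ports is hypothetical and is not part of the InfiniBand tools.

```shell
# Hypothetical helper (not part of the InfiniBand tools).
# Reads ibqueryerrors output on stdin and prints "GUID PORT COUNT" for every
# port line that carries a [SymbolErrors == N] entry. Assumes the line format
# shown in the example: GUID <guid> port <n>: [SymbolErrors == <count>]
symbol_error_ports() {
    sed -n 's/^GUID \(0x[0-9a-fA-F]*\) port \([0-9]*\):.*\[SymbolErrors == \([0-9]*\)].*/\1 \2 \3/p'
}
```

For example, the report can be piped through the helper: ibqueryerrors.pl -rR -s PortRcvSwitchRelayErrors,PortXmitDiscards,PortXmitWait,VL15Dropped | symbol_error_ports. If sed is not available where the report is generated, the saved output can be filtered on another host.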

 

 

Cause

Symbol errors are almost always caused by a poorly seated or defective cable. In rare cases they can be caused by a defective switch port.

Solution

Open an SR to check the gateway.

Typically the cable needs to be reseated. After reseating the cable, follow the steps below to verify whether errors are still being logged:

1. Clear the errors and counters from the leaf switch:

# ibclearcounters
# ibclearerrors


2. Generate some traffic using the following command on the leaf switch:

# ibdiagnet -c 100 -P all=1


3. On the switch reporting the errors, run getportcounters to display the port counters and verify that the counters are cleared. Both the port number and the port label are valid arguments.

For example:

bdasw-ib2# getportcounters 13

or

bdasw-ib2# getportcounters 6b

 

Note: If you need a mapping of port numbers to port labels (connectors) on the switch where the operations are being performed, run:

bdasw-ib2# dcsport -printconnectors
DCS-GW connectors:
Connector 0A maps to Switch port 20
Connector 1A maps to Switch port 22
Connector 2A maps to Switch port 24
Connector 3A maps to Switch port 26
Connector 4A maps to Switch port 28
Connector 5A maps to Switch port 30
Connector 6A maps to Switch port 35
Connector 7A maps to Switch port 33
Connector 8A maps to Switch port 31
Connector 9A maps to Switch port 14
Connector 10A maps to Switch port 16
Connector 11A maps to Switch port 12
Connector 12A maps to Switch port 18
Connector 13A maps to Switch port 9
Connector 14A maps to Switch port 7
Connector 15A maps to Switch port 5
...
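
Note: To translate a switch port number from ibqueryerrors into the connector label printed on the switch, the mapping above can be searched directly. The following is a minimal sketch, assuming the "Connector <label> maps to Switch port <n>" line format shown above. The function name connector_for_port is hypothetical.

```shell
# Hypothetical helper (not part of the switch firmware).
# Reads "dcsport -printconnectors" output on stdin and prints the connector
# label mapped to the switch port number given as the first argument.
# Lines that do not start with "Connector" (such as the header) are ignored.
connector_for_port() {
    awk -v port="$1" '$1 == "Connector" && $NF == port { print $2 }'
}
```

For example, with the mapping listed above, dcsport -printconnectors | connector_for_port 14 prints 9A. Whether awk is available in the switch shell is an assumption; the saved output can also be searched on another host.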

  

When the counters are cleared, the output looks like this:

bdasw-ib2# getportcounters 6b
Port counters for connector 6B Switch port 36
SymbolErrors.....................0
LinkRecovers.....................0
LinkDowned.......................0
RcvErrors........................0
RcvRemotePhysErrors..............0
RcvSwRelayErrors.................0
XmtDiscards......................0
XmtConstraintErrors..............0
RcvConstraintErrors..............0
LinkIntegrityErrors..............0
ExcBufOverrunErrors..............0
VL15Dropped......................0
XmtData..........................0
RcvData..........................0
XmtPkts..........................0
RcvPkts..........................0
XmtWait..........................0


Ensure symbol errors are zero.
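
Note: The check for zero symbol errors can be scripted when several ports have to be verified. The following is a minimal sketch, assuming the "SymbolErrors.....N" line format shown in the getportcounters output above. The function name symbol_errors_count is hypothetical.

```shell
# Hypothetical helper (not part of the switch firmware).
# Reads getportcounters output on stdin and prints the SymbolErrors value.
# Exits with status 2 if no SymbolErrors line is found in the input.
symbol_errors_count() {
    awk '/^SymbolErrors\./ { sub(/^SymbolErrors\.+/, ""); print $0 + 0; found = 1 }
         END { if (!found) exit 2 }'
}
```

For example, getportcounters 6b | symbol_errors_count prints 0 for the output shown above. Any non-zero value after the cable has been reseated and the counters cleared means symbol errors are still being logged on that port.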

4. To verify the IB port counters from a server, do the following:

a) Run ibnetdiscover to discover the InfiniBand topology

# ibnetdiscover

b) Then run perfquery to query the InfiniBand port counters.

For example, on a node you can use the following perfquery command, where 13 is the LID and 1 is the port obtained from ibnetdiscover:

# perfquery 13 1

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.