Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1473622.1
Update Date:2016-03-17
Keywords:

Solution Type  Problem Resolution Sure

Solution  1473622.1 :   Exachk Reports InfiniBand Errors  


Related Items
  • Exadata Database Machine X2-2 Full Rack
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband


Exachk reports InfiniBand errors in a configuration with one Exadata X2-2 Full Rack connected to one Exalogic Half Rack.

Created from <SR 3-5863579341>

Applies to:

Exadata Database Machine X2-2 Full Rack - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

Exachk reports InfiniBand errors in a configuration with one Exadata X2-2 Full Rack connected to one Exalogic Half Rack:

Errors for 0x2128df102ac000 "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 0"
  GUID 0x2128df102ac000 port 1: [VL15Dropped == 5]
      Link info:     52   1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x002128df102ac0a0     74    4[  ] "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 0" ( )
  GUID 0x2128df102ac000 port 2: [VL15Dropped == 13]
      Link info:     54   2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x002128df102ac0a0     74    3[  ] "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 0" ( )
Errors for 0x2128df102ac040 "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 1"
  GUID 0x2128df102ac040 port 1: [VL15Dropped == 3]
      Link info:     63   1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x002128df102ac0a0     74    2[  ] "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 1" ( )
  GUID 0x2128df102ac040 port 2: [VL15Dropped == 14]
      Link info:     65   2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x002128df102ac0a0     74    1[  ] "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 1" ( )
Errors for 0x2128deac2ac000 "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 0"
  GUID 0x2128deac2ac000 port 1: [VL15Dropped == 6]
      Link info:     53   1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x002128deac2ac0a0     75    4[  ] "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 0" ( )
  GUID 0x2128deac2ac000 port 2: [VL15Dropped == 13]
      Link info:     55   2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x002128deac2ac0a0     75    3[  ] "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 0" ( )
Errors for 0x2128deac2ac040 "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 1"
  GUID 0x2128deac2ac040 port 1: [VL15Dropped == 5]
      Link info:     64   1[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x002128deac2ac0a0     75    2[  ] "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 1" ( )
  GUID 0x2128deac2ac040 port 2: [VL15Dropped == 26]
      Link info:     66   2[  ] ==( 4X 10.0 Gbps Active/  LinkUp)==>  0x002128deac2ac0a0     75    1[  ] "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 1" ( )

VL15 stands for Virtual Lane (VL) 15.

Cause

VL15 carries Subnet Manager (SM) traffic and is also used by diagnostic tools such as ibdiagnet. It has the highest priority, but it is not subject to flow control: if the receiving side has no buffer capacity available, the packet is dropped and the VL15Dropped counter increments.
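The mechanism can be illustrated with a toy model. The following Python sketch is not how a switch is implemented; the class name, the buffer size, and the burst size are all hypothetical, chosen only to show why a burst of management packets arriving at a full receive buffer raises the counter without indicating a link fault.

```python
from collections import deque


class Vl15Receiver:
    """Toy model of a VL15 receive buffer (hypothetical, for illustration)."""

    def __init__(self, buffer_slots):
        self.buffer = deque()
        self.buffer_slots = buffer_slots  # assumed capacity, not a real value
        self.vl15_dropped = 0             # plays the role of VL15Dropped

    def receive(self, packet):
        # With no flow control the sender never waits for buffer space,
        # so a packet that arrives at a full buffer is discarded.
        if len(self.buffer) >= self.buffer_slots:
            self.vl15_dropped += 1
            return False
        self.buffer.append(packet)
        return True

    def process_one(self):
        # The receiver consumes one buffered packet, freeing a slot.
        return self.buffer.popleft() if self.buffer else None


def simulate(burst_size, buffer_slots):
    """Send a burst with no processing in between; return the drop count."""
    rx = Vl15Receiver(buffer_slots)
    for seq in range(burst_size):
        rx.receive(seq)
    return rx.vl15_dropped


if __name__ == "__main__":
    # A burst larger than the buffer: the excess packets are dropped.
    print(simulate(burst_size=10, buffer_slots=4))
```

In this model a burst that fits in the buffer produces no drops, while every packet beyond the buffer capacity increments the counter, which matches the small, slowly accumulating VL15Dropped values shown under Symptoms.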

Solution

You can ignore/suppress these messages.

On any one IB switch, you can run the following commands to clear the cumulative IB counters and errors:

ibsw# date ; ibclearcounters ; ibclearerrors

On any one IB switch, you can also run the following command to monitor IB counters and errors while suppressing VL15Dropped and the other counters listed:

ibsw# ibqueryerrors.pl --help
ibsw# ibqueryerrors.pl -rR -s LinkDowned,RcvSwRelayErrors,VL15Dropped,XmtDiscards,XmtWait

On any one IB node (e.g. compute node, database node, or storage cell), you can also run the following command to monitor IB counters and errors while suppressing VL15Dropped and the other counters listed:

node# ibqueryerrors --help
node# ibqueryerrors -r -s LinkDowned,RcvSwRelayErrors,VL15Dropped,XmtDiscards,XmtWait

Please note that ibqueryerrors.pl on IB switches differs slightly from ibqueryerrors on IB nodes.
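The same suppression list can also be applied offline to a saved report. The sketch below is a minimal Python example, assuming the report text uses the "GUID ... port N: [Counter == value]" layout shown under Symptoms and that several counters on one port appear as consecutive bracketed entries. The function name remaining_errors is hypothetical and is not part of Exachk or the IB tools.

```python
import re

# Counter names taken from the -s option in the commands above.
SUPPRESSED = {"LinkDowned", "RcvSwRelayErrors", "VL15Dropped",
              "XmtDiscards", "XmtWait"}

# Assumed line layout: "GUID 0x... port N: [Name == value] ..."
PORT_LINE = re.compile(r"GUID\s+(0x[0-9a-fA-F]+)\s+port\s+(\d+):\s+(.*)")
COUNTER = re.compile(r"\[(\w+) == (\d+)\]")


def remaining_errors(report_text, suppressed=SUPPRESSED):
    """Return (guid, port, counters) for ports with non-suppressed counters."""
    findings = []
    for line in report_text.splitlines():
        match = PORT_LINE.search(line)
        if not match:
            continue  # headers and "Link info" lines carry no counters
        guid, port, rest = match.group(1), int(match.group(2)), match.group(3)
        counters = {name: int(value) for name, value in COUNTER.findall(rest)}
        kept = {n: v for n, v in counters.items() if n not in suppressed}
        if kept:
            findings.append((guid, port, kept))
    return findings


if __name__ == "__main__":
    sample = (
        'Errors for 0x2128df102ac000 "SUN IB QDR GW switch camgw01 '
        '10.0.55.193 Bridge 0"\n'
        "  GUID 0x2128df102ac000 port 1: [VL15Dropped == 5]\n"
        "  GUID 0x2128df102ac000 port 2: [VL15Dropped == 13]\n"
    )
    # Only VL15Dropped is present, so nothing is left to investigate.
    print(remaining_errors(sample))
```

An empty result means every reported counter is on the suppression list and the messages can be ignored as described above; any entry that remains would need to be investigated separately.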

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.