Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Solution Type: Problem Resolution Sure Solution

1473622.1 : Exachk Reports InfiniBand Errors

Exachk reports InfiniBand errors in a setup with 1 Exadata X2-2 Full Rack connected to 1 Exalogic Half Rack.

Created from <SR 3-5863579341>

Applies to:
Exadata Database Machine X2-2 Full Rack - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms
Exachk reports InfiniBand errors in a setup with 1 Exadata X2-2 Full Rack connected to 1 Exalogic Half Rack:

Errors for 0x2128df102ac000 "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 0"
   GUID 0x2128df102ac000 port 1: [VL15Dropped == 5]
      Link info: 52 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x002128df102ac0a0 74 4[ ] "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 0" ( )
   GUID 0x2128df102ac000 port 2: [VL15Dropped == 13]
      Link info: 54 2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x002128df102ac0a0 74 3[ ] "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 0" ( )
Errors for 0x2128df102ac040 "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 1"
   GUID 0x2128df102ac040 port 1: [VL15Dropped == 3]
      Link info: 63 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x002128df102ac0a0 74 2[ ] "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 1" ( )
   GUID 0x2128df102ac040 port 2: [VL15Dropped == 14]
      Link info: 65 2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x002128df102ac0a0 74 1[ ] "SUN IB QDR GW switch camgw01 10.0.55.193 Bridge 1" ( )
Errors for 0x2128deac2ac000 "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 0"
   GUID 0x2128deac2ac000 port 1: [VL15Dropped == 6]
      Link info: 53 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x002128deac2ac0a0 75 4[ ] "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 0" ( )
   GUID 0x2128deac2ac000 port 2: [VL15Dropped == 13]
      Link info: 55 2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x002128deac2ac0a0 75 3[ ] "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 0" ( )
Errors for 0x2128deac2ac040 "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 1"
   GUID 0x2128deac2ac040 port 1: [VL15Dropped == 5]
      Link info: 64 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x002128deac2ac0a0 75 2[ ] "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 1" ( )
   GUID 0x2128deac2ac040 port 2: [VL15Dropped == 26]
      Link info: 66 2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x002128deac2ac0a0 75 1[ ] "SUN IB QDR GW switch camgw02 10.0.55.194 Bridge 1" ( )

VL15 stands for Virtual Lane (VL) 15.

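To get a quick view of how many VL15 drops each port in such a report has accumulated, the report text can be filtered with a small awk helper. This is only a sketch: the function name `vl15_summary` is hypothetical, and it assumes each port's counters appear on a single line of the form `GUID <guid> port <n>: [Counter == value]`, which is how the tool normally prints them.

```shell
#!/bin/sh
# Sketch: summarise VL15Dropped counters from an ibqueryerrors-style report
# read on stdin. vl15_summary is a hypothetical helper, not an Oracle tool.
# Assumes one line per port: GUID <guid> port <n>: [Counter == value] ...
vl15_summary() {
    awk '
        /GUID/ && /VL15Dropped ==/ {
            guid = ""; port = ""
            for (i = 1; i <= NF; i++) {
                if ($i == "GUID") guid = $(i + 1)
                if ($i == "port") { port = $(i + 1); sub(/:$/, "", port) }
                if ($i == "[VL15Dropped" && $(i + 1) == "==") {
                    val = $(i + 2); sub(/\]$/, "", val)   # strip closing bracket
                    printf "%s port %s VL15Dropped=%d\n", guid, port, val
                    total += val
                }
            }
        }
        END { printf "total VL15Dropped=%d\n", total }
    '
}
```

One possible use is to save the report to a file and run `vl15_summary < report.txt`; the file name here is only an example.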
Cause
VL15 is used for Subnet Manager (SM) traffic, and also by tools such as ibdiagnet. It gets the highest priority but has no flow control: if the receiver has no available buffer capacity, the packet is dropped and the VL15Dropped counter increments.

Solution
You can ignore or suppress these messages.

On any one IB switch, you can run the following commands to clear the cumulative IB counters and errors:

ibsw# date ; ibclearcounters ; ibclearerrors
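
Because the counters are cumulative, it helps to keep a record of when they were last cleared. The following is a minimal sketch that wraps the same two clear commands and appends the timestamp to a log file. The function name `clear_ib_counters`, the `IB_CLEAR_LOG` variable, and the default log path are assumptions made for illustration, not Oracle settings.

```shell
#!/bin/sh
# Sketch: clear IB counters and errors, and log when it was done.
# clear_ib_counters and IB_CLEAR_LOG are hypothetical names.
clear_ib_counters() {
    log="${IB_CLEAR_LOG:-/tmp/ib_clear.log}"
    # Refuse to run if either tool is not on PATH.
    for tool in ibclearcounters ibclearerrors; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            return 1
        fi
    done
    stamp=$(date)
    # Log only if both clear commands succeed.
    ibclearcounters && ibclearerrors &&
        echo "IB counters cleared, started $stamp" >>"$log"
}
```

Any counter values seen in a later report then cover only the period since the last logged timestamp.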

On any one IB switch, you can also run the following command to monitor IB counters and errors while suppressing VL15Dropped and others:

ibsw# ibqueryerrors.pl --help
ibsw# ibqueryerrors.pl -rR -s LinkDowned,RcvSwRelayErrors,VL15Dropped,XmtDiscards,XmtWait

On any one IB node (e.g. compute node, database node, or storage cell), you can also run the following command to monitor IB counters and errors while suppressing VL15Dropped and others:

node# ibqueryerrors --help
node# ibqueryerrors -r -s LinkDowned,RcvSwRelayErrors,VL15Dropped,XmtDiscards,XmtWait

Please note that ibqueryerrors.pl on IB switches differs slightly from ibqueryerrors on IB nodes.
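
Since the switch and node variants take slightly different options, a script shared between both kinds of host can select the command line by tool name. The sketch below uses only the two invocations given above. The function names `ib_query_cmd` and `ib_query_detect` and the `SUPPRESS` variable are hypothetical.

```shell
#!/bin/sh
# Sketch: build the monitoring command for a switch or a node.
# ib_query_cmd, ib_query_detect and SUPPRESS are hypothetical names.
SUPPRESS="LinkDowned,RcvSwRelayErrors,VL15Dropped,XmtDiscards,XmtWait"

# Print the command line documented for the given tool name.
ib_query_cmd() {
    case "$1" in
        ibqueryerrors.pl) echo "ibqueryerrors.pl -rR -s $SUPPRESS" ;;
        ibqueryerrors)    echo "ibqueryerrors -r -s $SUPPRESS" ;;
        *) echo "unknown tool: $1" >&2; return 1 ;;
    esac
}

# Print the command line for whichever tool is found on PATH.
ib_query_detect() {
    for tool in ibqueryerrors.pl ibqueryerrors; do
        if command -v "$tool" >/dev/null 2>&1; then
            ib_query_cmd "$tool"
            return 0
        fi
    done
    echo "no ibqueryerrors tool found on PATH" >&2
    return 1
}
```

A caller could then run `cmd=$(ib_query_detect) && $cmd`. Checking the `--help` output on each host first remains advisable, because the two tools do not accept identical options.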

Attachments
This solution has no attachment