Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1989373.1
Update Date: 2018-05-14
Keywords:

Solution Type  Problem Resolution Sure

Solution  1989373.1 :   Troubleshooting Infiniband links when a node reports a link down  


Related Items
  • Exadata X4-2 Hardware
  • Exalogic Elastic Cloud X4-2 Hardware
  • Exadata X4-2 Quarter Rack
  • Exadata Database Machine X2-2 Hardware
  • Exadata X3-2 Hardware
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband




In this Document
Symptoms
Cause
Solution


Created from <SR 3-10409275186>

Applies to:

Exadata Database Machine X2-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exalogic Elastic Cloud X4-2 Hardware - Version X4 to X5 [Release X4 to X5]
Exadata X4-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X3-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X4-2 Quarter Rack - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

On the host, ibstat shows port 1 of the InfiniBand HCA with State Down and Physical state Polling:

[root@edx28bur09db02 ~]# ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.7.8130
        Hardware version: b0
        Node GUID: 0x0021280001a10790
        System image GUID: 0x0021280001a10793
        Port 1:
                State: Down    <<< The link is down
                Physical state: Polling    <<< Polling: the port detects no link partner at the other end of the cable
                Rate: 70
                Base lid: 142
                LMC: 0
                SM lid: 8
                Capability mask: 0x02510868
                Port GUID: 0x0021280001a10791
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 143
                LMC: 0
                SM lid: 8
                Capability mask: 0x02510868
                Port GUID: 0x0021280001a10792 
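
When several nodes need checking, the ibstat text can be scanned for ports that are not Active/LinkUp. The following is a minimal sketch, assuming the output layout shown above; the function names parse_ibstat and unhealthy_ports and the shortened SAMPLE text are illustrative and not part of any Oracle or OFED tool.

```python
import re

# Shortened copy of the ibstat layout shown above (illustrative sample only).
SAMPLE = """\
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Port 1:
                State: Down
                Physical state: Polling
        Port 2:
                State: Active
                Physical state: LinkUp
"""


def parse_ibstat(text):
    """Return {(ca_name, port_number): {field: value}} from ibstat text."""
    ports = {}
    ca = None
    port = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '([^']+)'", line)
        if m:
            # New channel adapter section; no port selected yet.
            ca, port = m.group(1), None
            continue
        m = re.match(r"Port (\d+):$", line)
        if m and ca is not None:
            port = (ca, int(m.group(1)))
            ports[port] = {}
            continue
        if port is not None and ":" in line:
            # "Key: value" line that belongs to the current port.
            key, _, value = line.partition(":")
            ports[port][key.strip()] = value.strip()
    return ports


def unhealthy_ports(ports):
    """List ports whose State is not Active or Physical state is not LinkUp."""
    return sorted(
        p for p, fields in ports.items()
        if fields.get("State") != "Active"
        or fields.get("Physical state") != "LinkUp"
    )


for ca_name, port_number in unhealthy_ports(parse_ibstat(SAMPLE)):
    print(f"{ca_name} port {port_number} needs attention")
```

In practice the text would come from running ibstat on the node; annotated copies of the output (with trailing "<<<" notes) would need the notes stripped first.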

On the InfiniBand switch, listlinkup shows the corresponding switch port as down:
 
[root@edx28bur09sw-ib3 ibdiag]# listlinkup
Connector  0A Present <-> Switch Port 20 is up (Enabled)
Connector  1A Present <-> Switch Port 22 is up (Enabled)
Connector  2A Present <-> Switch Port 24 is up (Enabled)
Connector  3A Present <-> Switch Port 26 is down (Enabled)
Connector  4A Present <-> Switch Port 28 is up (Enabled)
Connector  5A Present <-> Switch Port 30 is down (Enabled)    <<< Cable is plugged in to switch port 30; the physical connector is labeled 5A
Connector  6A Present <-> Switch Port 35 is down (Enabled)
Connector  7A Present <-> Switch Port 33 is down (Enabled)
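
To pick out candidate links quickly, the listlinkup lines can be filtered for connectors that are present but whose switch port is down. This is a rough sketch that assumes the line format shown above; the regular expression, the function name down_links, and the SAMPLE text are illustrative. A down port with a cable present may also simply mean the far end is unplugged or powered off, so the result is a list of links to inspect, not a diagnosis.

```python
import re

# Lines copied from the listlinkup output above (illustrative sample only).
SAMPLE = """\
Connector  0A Present <-> Switch Port 20 is up (Enabled)
Connector  1A Present <-> Switch Port 22 is up (Enabled)
Connector  2A Present <-> Switch Port 24 is up (Enabled)
Connector  3A Present <-> Switch Port 26 is down (Enabled)
Connector  4A Present <-> Switch Port 28 is up (Enabled)
Connector  5A Present <-> Switch Port 30 is down (Enabled)
Connector  6A Present <-> Switch Port 35 is down (Enabled)
Connector  7A Present <-> Switch Port 33 is down (Enabled)
"""

# Assumed line layout: connector label, presence text, port number, state.
LINK_RE = re.compile(
    r"Connector\s+(\S+)\s+(.*?)\s*<->\s*Switch Port\s+(\d+)\s+is\s+(up|down)\s+\((\w+)\)"
)


def down_links(text):
    """Return (connector, switch_port) pairs that are present but down."""
    result = []
    for line in text.splitlines():
        m = LINK_RE.search(line)
        if not m:
            continue  # skip prompts, blank lines, and unrecognized text
        connector, presence, port, state, _admin = m.groups()
        if presence == "Present" and state == "down":
            result.append((connector, int(port)))
    return result


for connector, port in down_links(SAMPLE):
    print(f"Connector {connector} (switch port {port}) is down")
```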

Cause

The cable was poorly seated in the InfiniBand switch port.

Solution

Re-seat both ends of the cable between Connector 5A of edx28bur09sw-ib3 and port 1 of edx28bur09db02. Then run ibstat on the node and verify that the port State is now Active and the Physical state is LinkUp.
If the port is still not Active after re-seating, replace the cable.

[root@edx28bur09db02 ~]# ibstat
CA 'mlx4_0'
        CA type: MT26428
        Number of ports: 2
        Firmware version: 2.7.8130
        Hardware version: b0
        Node GUID: 0x0021280001a10790
        System image GUID: 0x0021280001a10793
        Port 1:
                State: Active    <<< Link is active
                Physical state: LinkUp    <<< Link is up
                Rate: 40
                Base lid: 142
                LMC: 0
                SM lid: 8
                Capability mask: 0x02510868
                Port GUID: 0x0021280001a10791
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 143
                LMC: 0
                SM lid: 8
                Capability mask: 0x02510868
                Port GUID: 0x0021280001a10792


Check the link using the following command from the IB switch that is attached:

 # listlinkup peer

This command reports the switch port number, connector number, and node port GUID in a single output, which is enough to map the connection.


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.