Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2025628.1
Update Date: 2017-04-04
Keywords:

Solution Type  Problem Resolution Sure

Solution  2025628.1 :   BDA Nodes Inaccessible Via the Client Data Network Due to the EOIB Interface not Re-linking and Joining BONDETH0 after the IB Link is Down then Up


Related Items
  • Big Data Appliance X3-2 Hardware
  • Big Data Appliance X4-2 Hardware
  • Big Data Appliance X5-2 Hardware
  • Big Data Appliance Hardware
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Symptoms
Cause
Solution
 For BDA V4.2
 For BDA V4.1 and earlier releases
References


Created from <SR 3-10020854511>

Applies to:

Big Data Appliance X3-2 Hardware - Version All Versions and later
Big Data Appliance X5-2 Hardware - Version All Versions and later
Big Data Appliance Hardware - Version All Versions and later
Big Data Appliance X4-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms


The EoIB VNIC interfaces (eth8 and eth9) do not automatically re-link and re-join the bondeth0 client network interface when the InfiniBand (IB) link comes back up after an outage of that IB VNIC. The IB interfaces and bondib0 do come back up. As a result, once an IB port or gateway leaf switch has gone down, bondeth0 is no longer fully intact, even after the IB link is restored. If the second gateway leaf switch then goes down at a later time, bondeth0 goes down and the BDA nodes become inaccessible via the client data network.
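To see which interfaces are currently in the bond, the Linux bonding driver's status file can be parsed. The following is a minimal sketch, not part of the original note: the function name bond_slave_status is hypothetical, and the default path assumes the bond is named bondeth0 as described above. An interface that is missing from the output is not a member of the bond.

```shell
# Print "<interface> <MII status>" for every slave currently in a bond.
# The status file is the one exposed by the Linux bonding driver.
# Assumption: the bond is named bondeth0; pass another path to override.
bond_slave_status() {
    status_file="${1:-/proc/net/bonding/bondeth0}"
    awk '
        # Remember the slave name; its MII status follows on a later line.
        /^Slave Interface:/ { slave = $3 }
        # The bond-level "MII Status" line comes before any slave and is skipped.
        /^MII Status:/ && slave != "" { print slave, $3; slave = "" }
    ' "$status_file"
}
```

On an affected node, running bond_slave_status with no argument would be expected to list only one of eth8 and eth9 after the IB link has gone down and come back.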

Cause

The issue is caused by the following bugs:

BUG 20488920 - EOIB INTERFACE DOES NOT RE-LINK AND JOIN BONDETH0 AFTER IB LINK DOWN THEN UP
BUG 18906188 - E2E: JULYPSU DROP 4 - OEL6.5 PHYSICAL, EOIB NOT AVAILABLE AFTER REBOOT

Solution

A fix is planned for a future release.

The workarounds for current releases follow:

For BDA V4.2

For BDA version V4.2, an automatic workaround is implemented. It polls the interface status and automatically attempts to bring the interface up if it is down.

Note: The polling period for detecting link down and failing over, and for detecting link up and re-joining the bond, is 5 minutes.

Therefore, if two switches must go down for maintenance, confirm that failover occurs correctly: take the first switch offline, wait 5 minutes, check that redundancy is re-established, and only then take the second switch offline.
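The "check that redundancy is re-established" step can be scripted. The sketch below is an illustration only and is not Oracle's polling implementation. It assumes that redundancy means at least two slaves of bondeth0 report MII status "up" in the bonding driver's status file. The function names, the 600-second default timeout (twice the 5-minute polling period) and the 30-second check interval are all hypothetical choices.

```shell
# Count the slaves of a bond whose MII status is "up".
# Assumption: the bond is named bondeth0; pass another path to override.
count_up_slaves() {
    awk '
        /^Slave Interface:/ { in_slave = 1; next }
        # Only the MII line that follows a slave header is counted.
        in_slave && /^MII Status:/ { if ($3 == "up") n++; in_slave = 0 }
        END { print n + 0 }
    ' "${1:-/proc/net/bonding/bondeth0}"
}

# Wait until at least two slaves are up, or give up after a timeout.
# Arguments: status file, timeout in seconds, check interval in seconds.
# Returns 0 when redundancy is seen, 1 on timeout.
wait_for_redundancy() {
    status_file="${1:-/proc/net/bonding/bondeth0}"
    timeout="${2:-600}"    # hypothetical default: two polling periods
    interval="${3:-30}"    # hypothetical default
    waited=0
    while :; do
        if [ "$(count_up_slaves "$status_file")" -ge 2 ]; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
}
```

A possible use on each node, before the second switch is taken offline, is `wait_for_redundancy && echo "bondeth0 is redundant"`. A non-zero return means the bond still had fewer than two active slaves when the timeout expired.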

  

For BDA V4.1 and earlier releases

For BDA V4.1 and earlier releases a manual workaround is required. 

If a single IB port link goes down on a single node, then after the IB link is restored, manually bring the interface up again and verify that bondeth0 contains both interfaces, so that redundancy is restored before the next IB link outage occurs.

Typically, port 1 corresponds to eth8 and port 2 to eth9. Run the command for the affected interface:

# ifup eth8

or

# ifup eth9
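
To decide which interface needs ifup, and to verify afterwards that bondeth0 has both interfaces in it, the bond's slave list in sysfs can be compared against the expected interfaces. This is a minimal sketch rather than an Oracle-provided tool: the function name missing_slaves is hypothetical, and the path assumes the bond is named bondeth0 with eth8 and eth9 as its expected members.

```shell
# Print each expected interface that is NOT currently a slave of the bond.
# Arguments: the bond's sysfs slaves file, then the expected interface names.
missing_slaves() {
    slaves_file="$1"
    shift
    # Pad with spaces so whole interface names can be matched.
    current=" $(tr '\n' ' ' < "$slaves_file") "
    for iface in "$@"; do
        case "$current" in
            *" $iface "*) ;;        # already in the bond
            *) echo "$iface" ;;     # missing from the bond
        esac
    done
}

# Example use as root (assumes bondeth0, eth8 and eth9 as in this note):
#   for iface in $(missing_slaves /sys/class/net/bondeth0/bonding/slaves eth8 eth9); do
#       ifup "$iface"
#   done
```

Running missing_slaves again after the ifup commands should print nothing once both interfaces have re-joined the bond.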

References

<BUG:20488920> - EOIB INTERFACE DOES NOT RE-LINK AND JOIN BONDETH0 AFTER IB LINK DOWN THEN UP

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.