Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution (Sure Solution)

2251331.1: On a BDA X6-2 Server a Slave Bond Link Status Repeatedly Fails with "link status definitely down for interface"/"link status definitely up for interface"
Created from <SR 3-14498485681>

Applies to:
Big Data Appliance X6-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms

1. One BDA X6-2 server in a cluster shows a slave bond link repeatedly failing. Output looks like:

<timestamp> bdanodex kernel: bonding: bondeth1: link status definitely down for interface eth11, disabling it
<timestamp> bdanodex kernel: bonding: bondeth1: making interface eth10 the new active one.
<timestamp> bdanodex kernel: bonding: bondeth1: link status definitely up for interface eth11.
<timestamp> bdanodex kernel: bonding: bondeth1: making interface eth11 the new active one.
...<repeated over and over>...

2. Comparing the bondeth1 configuration on the node with the failing link status against a "healthy" node shows that the BONDING_OPTS options differ.

a) On the node with the failing link status, /etc/sysconfig/network-scripts/ifcfg-bondeth1 shows BONDING_OPTS to be:

BONDING_OPTS="mode=active-backup fail_over_mac=active arp_interval=100 arp_ip_target=<ip> primary=eth11"

b) On a "healthy" node, /etc/sysconfig/network-scripts/ifcfg-bondeth1 shows BONDING_OPTS to be:

BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 primary=eth11"

A way to check both of these symptoms from the command line is sketched below.
Cause

1. There are two ways to detect that a slave of the bond is down in order to switch to the other slave.

a) ARP ping monitoring: Linux ARP pings the gateway to confirm that the slave is still up. In this case the /etc/sysconfig/network-scripts/ifcfg-bondeth1 BONDING_OPTS parameter is:

BONDING_OPTS="mode=active-backup fail_over_mac=active arp_interval=100 arp_ip_target=<ip> primary=eth11"

b) MII status detection: This only checks the link state of the local interface, so it is normally a less accurate test. In this case the BONDING_OPTS parameter is:

BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 primary=eth11"

In recent BDA versions ARP detection is used by default unless a gateway is not found or MII status detection is explicitly selected. In older BDA versions MII status detection is used.

2. Within a BDA cluster it is generally not expected that the bond interface on some nodes uses MII status detection while on others it uses ARP ping detection. However, the symptoms indicate that ARP ping monitoring may not be working. One possibility is that on the node where the slave bond link repeatedly fails, a network reset script such as BdaUserConfigEoib was run with the more recent BDA code. The result is that all the other nodes in the cluster still have the originally created interface, while the one node where the slave bond link repeatedly fails has a more recently created interface with the new ARP ping monitoring. How to see which monitoring mode a bond is actually running is sketched below.
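The monitoring mode the bonding driver is actually running (as opposed to what the ifcfg file requests) can be read from the kernel. A minimal sketch, assuming the bond is named bondeth1; these are standard Linux bonding driver interfaces, not BDA-specific tools:

# cat /proc/net/bonding/bondeth1
# cat /sys/class/net/bondeth1/bonding/arp_interval
# cat /sys/class/net/bondeth1/bonding/miimon

With ARP ping monitoring the /proc output includes an "ARP Polling Interval (ms)" line plus the ARP IP target list, and arp_interval is non-zero; with MII status detection it includes an "MII Polling Interval (ms)" line and miimon is non-zero. A value of 0 means that monitoring method is disabled.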
Solution

On the node where the slave bond link repeatedly fails, try changing the /etc/sysconfig/network-scripts/ifcfg-bondeth1 BONDING_OPTS parameter to use MII status detection. Do so as follows as the 'root' user on the node where the slave bond link repeatedly fails:

1. Connect to the node via the admin interface.

# ssh <node>-adm

2. Stop the bond interface:

# ifdown <bond interface>

For example:

# ifdown bondeth1
3. Edit the associated configuration file, /etc/sysconfig/network-scripts/ifcfg-bondeth1. (A scripted alternative to steps 3a) and 3b) is sketched below.)

a) Back up the current file /etc/sysconfig/network-scripts/ifcfg-bondeth1 to some safe location.

b) Edit the file /etc/sysconfig/network-scripts/ifcfg-bondeth1 and change:

BONDING_OPTS="mode=active-backup fail_over_mac=active arp_interval=100 arp_ip_target=<ip> primary=eth11"

to:

BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 primary=eth11"
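If preferred, steps 3a) and 3b) can be done in one pass with standard tools. This is only a sketch, not part of the original note: the backup location /root is arbitrary, it assumes BONDING_OPTS is on a single line as shown above, and the resulting line should be checked before continuing.

# cp -p /etc/sysconfig/network-scripts/ifcfg-bondeth1 /root/ifcfg-bondeth1.bak
# sed -i 's|^BONDING_OPTS=.*|BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 primary=eth11"|' /etc/sysconfig/network-scripts/ifcfg-bondeth1
# grep BONDING_OPTS /etc/sysconfig/network-scripts/ifcfg-bondeth1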
4. Bring the bond interface back up.

# ifup <bond interface>

For example:

# ifup bondeth1
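Once the interface is back up, it is worth confirming that the bond is now using MII status detection and that the link has stopped flapping. This check is not part of the original note, only a minimal sketch:

# grep -i "polling interval" /proc/net/bonding/bondeth1
# grep -i "mii status" /proc/net/bonding/bondeth1
# grep "bondeth1: link status definitely" /var/log/messages | tail

The first command should report an MII Polling Interval of 100 ms, the second should show "MII Status: up" for the bond and both slaves, and no new "link status definitely down"/"link status definitely up" messages should appear in the log after the change.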
Attachments

This solution has no attachment