
Asset ID: 1-72-2014149.1
Update Date: 2015-06-24
Keywords:

Solution Type: Problem Resolution Sure Solution

Solution 2014149.1: Oracle ZFS Storage Appliance: IPMP Load Balancing does not return after a Link Failure


Related Items
  • Sun ZFS Storage 7420
  • Sun Storage 7110 Unified Storage System
  • Oracle ZFS Storage ZS3-2
  • Sun Storage 7210 Unified Storage System
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7120
  • Sun ZFS Storage 7320
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution


Created from <SR 3-10345895771>

Applies to:

Sun Storage 7210 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
Sun Storage 7110 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

An active/active IPMP interface is created from two 10Gb network interface cards. Shares are created and mounted over NFS against the IPMP interface's address. The interface is enabled and reports as "up".

NAS:> configuration net interfaces> select ipmp1 show
                         state = up
                      curaddrs = 10.152.224.105/24
                         class = ipmp
                         label = ipmp1
                        enable = true
                         admin = true
                         links = ixgbe0,ixgbe1
                       v4addrs = 10.152.224.105/24
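
For reference, a configuration like this can be built from the appliance CLI and consumed by NFS clients along the following lines. This is a minimal sketch only, not taken from the original SR: the links and v4addrs values are copied from the output above, while the export path and client mount point are hypothetical.

    NAS:> configuration net interfaces ipmp
    NAS:configuration net interfaces ipmp (uncommitted)> set links=ixgbe0,ixgbe1
    NAS:configuration net interfaces ipmp (uncommitted)> set v4addrs=10.152.224.105/24
    NAS:configuration net interfaces ipmp (uncommitted)> commit

    # on an NFS client (hypothetical export path and mount point)
    mount -o vers=3 10.152.224.105:/export/share1 /mnt/share1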

 
Heavy read/write activity is processed by the interface. By design, IPMP distributes only outbound (transmit) traffic across the interfaces in the group; from the appliance's perspective this is the data sent back to clients by read operations.

An analytics review of interface bytes broken down by interface does indeed show that read traffic is spread across the two 10Gb devices:

Interface bytes per second broken down by interface
NAS:> analytics dataset-026> read 3
    DATE/TIME                KB/SEC     KB/SEC BREAKDOWN
    2015-4-10 19:36:18       254067     127019 ipmp1
                                        80612 ixgbe1
                                        46407 ixgbe0
    2015-4-10 19:36:19       309636     154804 ipmp1
                                        104886 ixgbe1
                                        49917 ixgbe0
    2015-4-10 19:36:20       309410     154698 ipmp1
                                        105034 ixgbe1
                                        49650 ixgbe0
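
If such a dataset does not already exist, one can be created from the CLI. The sketch below is illustrative only; the statistic name and the dataset number assigned (dataset-026 in this example) vary by software release and configuration, so check "analytics datasets show" on your system.

    NAS:> analytics datasets
    NAS:analytics datasets> create nic.kilobytes[device]
    NAS:analytics datasets> show
    NAS:analytics datasets> select dataset-026
    NAS:analytics dataset-026> read 3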

  
The IPMP device has been configured for "Link Based Failure Detection". A link fault then occurs on ixgbe1 and a failover takes place; all I/O is routed to the surviving interface.

From the log files:

Apr 10 19:43:06 NAS in.mpathd[5449]: [ID x daemon.err] The link has gone down on ixgbe1
Apr 10 19:43:06 NAS in.mpathd[5449]: [ID x daemon.err] IP interface failure detected on ixgbe1 of group  ipmp1

 
From the CLI on the appliance:

NAS:> configuration net devices show
   Devices:
   DEVICE      UP     SPEED         MAC
   ixgbe0      true   10000 Mbit/s  0:1b:21:7e:c2:90
   ixgbe1      false  0 Mbit/s      0:1b:21:7e:c2:91

  
Analytics also shows that the failed device is no longer carrying traffic:

Interface bytes per second broken down by interface
NAS:> analytics dataset-026> read 3
    DATE/TIME                KB/SEC     KB/SEC BREAKDOWN
    2015-4-10 19:43:27       147555      73764 ixgbe0
                                         73764 ipmp1
    2015-4-10 19:43:28       140051      70012 ixgbe0
                                         70012 ipmp1
    2015-4-10 19:43:29       115513      57743 ixgbe0
                                         57743 ipmp1


The problem occurs once the device is repaired and brought back online. The interface (ixgbe1) comes online, but is never used for subsequent I/O.

From the logs we see the link come back online:

Apr 10 19:45:53 NAS in.mpathd[5449]: [ID x daemon.error] The link has come up on ixgbe1
Apr 10 19:48:43 NAS in.mpathd[5449]: [ID x daemon.error] All IP interfaces in group ipmp1 are now usable

The CLI also reports the device is up:

NAS:> configuration net devices show
Devices:

DEVICE      UP     SPEED         MAC
ixgbe0      true   10000 Mbit/s  0:1b:21:7e:c2:90
ixgbe1      true   10000 Mbit/s  0:1b:21:7e:c2:91

  
But analytics (the same can be confirmed from the Ethernet switch) shows that the repaired device carries essentially no I/O:

Interface bytes per second broken down by interface
NAS:> analytics dataset-026> read 3
DATE/TIME                KB/SEC     KB/SEC BREAKDOWN
2015-4-10 19:47:17       123911      61944 ixgbe0
                                     61944 ipmp1
                                         1 ixgbe1
2015-4-10 19:47:18       137855      68916 ipmp1
                                     68915 ixgbe0
                                         1 ixgbe1
2015-4-10 19:47:19       136876      68427 ipmp1
                                     68426 ixgbe0
                                         1 ixgbe1
 

 

Cause

IPMP, by design, is not intended as a "load spreading" utility.  There is no "round robin" setting one can implement to distribute I/O equally across the links. Outbound traffic is spread across the group only when there are multiple connections, and connections that failed over to the surviving interface remain bound to it after the repair; only new connections are spread across both links again (see the observations in the Solution section). When that set of circumstances is met, the condition described above occurs.

Solution

The easiest way to return I/O to all devices is to toggle the active/active IPMP device to active/standby and then back to active/active.

Toggle the working device from active to standby. Following the example above, this would be ixgbe0:

    NAS:> configuration net interfaces select ipmp1
    NAS:configuration net interfaces ipmp1> set standbys=ixgbe0
                         standbys = ixgbe0 (uncommitted)
    NAS:configuration net interfaces ipmp1> commit

Wait a few minutes, then return the device to active by clearing the standbys property:

    NAS:configuration net interfaces ipmp1> set standbys=
                         standbys =  (uncommitted)
    NAS:configuration net interfaces ipmp1> commit
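
After the second commit, the change can be verified from the same contexts; a brief sketch only (the dataset name follows the example above; no sample output is shown):

    NAS:configuration net interfaces ipmp1> show
    NAS:configuration net interfaces ipmp1> done
    NAS:> analytics datasets select dataset-026
    NAS:analytics dataset-026> read 3

Once the breakdown again reports traffic against both ixgbe0 and ixgbe1, the group has returned to active/active load spreading.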

 

 

Here are some more observations surrounding this issue:

  • Rebooting the node (followed by a failback on clustered systems) will solve the problem.
  • If you unmount and remount an affected NFS share, that share will start to use both IPMP devices (see the client-side sketch after this list).
  • If you use automounts instead of hard mounts, I/O will eventually return to both devices.
  • If you have just ONE NFS mount to an IPMP device, you will probably never see any load balancing at all.
  • Any new NFS mounts added after the interface is repaired will use both IPMP devices.
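
For the unmount/remount workaround noted above, a minimal client-side sketch (export path, mount point, and options are hypothetical and should match how the share was originally mounted):

    # on the NFS client
    umount /mnt/share1
    mount -o vers=3 10.152.224.105:/export/share1 /mnt/share1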

 



Attachments
This solution has no attachment