Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1632242.1
Update Date:2016-02-12
Keywords:

Solution Type  Problem Resolution Sure

Solution  1632242.1 :   Snapshot-based Backup via NFS Over Infiniband Network Will Freeze The Node  


Related Items
  • Exadata Database Machine V2
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-7822700871>

Applies to:

Exadata Database Machine V2 - Version All Versions and later
Information in this document applies to any platform.

Symptoms

The backup server goes into a hung state when performing a snapshot-based backup from an Exadata Compute Node via NFS over the InfiniBand network. The same backup procedure works fine over a standard 10Gb Ethernet card with default settings.

While the backup is running, the messages file on the node shows the following stack information:

Aug 28 10:27:43 exahostdb01 lvm[79582]: Monitoring snapshot VGExaDb-u01_snap <<<
..
Aug 28 10:31:14 exahostdb01 kernel: nfs: server exaNFSbackup not responding, still trying
Aug 28 10:39:45 exahostdb01 kernel: nfs: server exaNFSbackup OK
Aug 28 10:55:00 exahostdb01 kernel: nfs: server exaNFSbackup not responding, still trying
...
Aug 28 11:01:10 exahostdb01 kernel: nfs: server exaNFSbackup OK
Aug 28 11:13:00 exahostdb01 kernel: nfs: server exaNFSbackup not responding, still trying
Aug 28 11:13:53 exahostdb01 kernel: nfs: server exaNFSbackup OK
Aug 28 12:41:20 exahostdb01 kernel: RDS/IB: re-connect to 169.XXX.XXX.XXX is stalling for more than 1 min...(drops=12 err=0)
Aug 28 12:41:20 exahostdb01 kernel: RDS/IB: re-connect to 169.XXX.XXX.XXX is stalling for more than 1 min...(drops=12 err=0)
Aug 28 12:41:58 exahostdb01 kernel: RDS/IB: re-connect to 10.XXX.XXX.XXX is stalling for more than 1 min...(drops=1 err=0)
Aug 28 14:13:49 exahostdb01 kernel: RDS/IB: connected to 10.XXX.XXX.XXX version 3.1
Aug 28 14:16:47 exahostdb01 kernel: RDS/IB: connected to 169.XXX.XXX.XXX version 3.1
Aug 28 14:16:47 exahostdb01 kernel: RDS/IB: connected to 169.XXX.XXX.XXX version 3.1
....
Sep  5 12:16:12 exahostdb01 kernel: INFO: task lsof:65691 blocked for more than 120 seconds.
Sep  5 12:16:12 exahostdb01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep  5 12:16:12 exahostdb01 kernel: lsof          D 0000000000000000     0 65691  65680 0x00000080
Sep  5 12:16:12 exahostdb01 kernel:  ffff88116ec7bc08 0000000000000082 0000000000000000 ffffffffadf60c48
Sep  5 12:16:12 exahostdb01 kernel:  ffff88355e6ea080 ffffffff81aae4c0 ffff88355e6ea450 0000000176b6fa52
Sep  5 12:16:12 exahostdb01 kernel:  000000006ec7bc98 0000000000000000 0000000000000000 ffff88355e6ea080
Sep  5 12:16:12 exahostdb01 kernel: Call Trace:
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffff814569cc>] io_schedule+0x42/0x5c
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffffa0614b02>] nfs_wait_bit_uninterruptible+0xe/0x12 [nfs]
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffff81456efb>] __wait_on_bit+0x4a/0x7c
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffffa0614af4>] ? nfs_wait_bit_uninterruptible+0x0/0x12 [nfs]
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffffa0614af4>] ? nfs_wait_bit_uninterruptible+0x0/0x12 [nfs]
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffff81456fa0>] out_of_line_wait_on_bit+0x73/0x80
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffff8107706d>] ? wake_bit_function+0x0/0x2f
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffffa0614af2>] nfs_wait_on_request+0x2b/0x2d [nfs]
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffffa0618a6c>] nfs_sync_mapping_wait+0xec/0x1fa [nfs]
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffffa0619073>] nfs_write_mapping+0x77/0x9e [nfs]
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffff810432d6>] ? should_resched+0xe/0x2f
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffffa06190b4>] nfs_wb_nocommit+0x1a/0x1c [nfs]
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffffa060e184>] nfs_getattr+0x61/0xef [nfs]
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffff8111ea7b>] vfs_getattr+0x4c/0x69
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffff8111eae8>] vfs_fstatat+0x50/0x67
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffff8111ebe5>] vfs_stat+0x1b/0x1d
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffff8111ec06>] sys_newstat+0x1f/0x39
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffff810a9d23>] ? audit_syscall_entry+0x103/0x12f
Sep  5 12:16:12 exahostdb01 kernel:  [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
...
Sep 10 16:00:39 exahostdb01 kernel: ixgbe 0000:20:00.0: eth0: NIC Link is Up 1 Gbps, Flow Control: RX/TX
Sep 10 16:00:39 exahostdb01 kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Sep 10 16:00:41 exahostdb01 kernel: ib0: packet len 2398 (> 2048) too long to send, dropping
Sep 10 16:00:41 exahostdb01 last message repeated 2 times
Sep 10 16:11:34 exahostdb01 kernel: ixgbe 0000:30:00.1: eth5: NIC Link is Down <<<<
bondib0   Link encap:InfiniBand  HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00  (ib0 + ib1)
         inet addr:10.x.x.x  Bcast:10.x.x.255  Mask:255.255.255.0
         inet6 addr: fe80::221:2800:1fc:b3ed/64 Scope:Link
         UP BROADCAST RUNNING MASTER MULTICAST  MTU:65520  Metric:1

 

 



Cause

The stack shows that the NFS process was waiting on task status in the "nfs_wait_bit_uninterruptible" function, stuck in an uninterruptible state because communication to the source location was failing. The logs above clearly show that the underlying IB devices that are part of bondib0 go down intermittently and then rejoin. This is therefore a communication issue on the client side: network communication to the source over bondib0 is being lost.

This is because the MTU of the IB device was left at the default 64K size. This is a common issue with IB when the MTU is set to a large value.
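As a quick check before changing anything, the active MTU on the bonded IB interface and its slaves can be inspected. This is only a sketch; the interface names (bondib0, ib0, ib1) are the Exadata defaults seen in the output above and may differ on your system:

```shell
# Show the current MTU on the IB bond and its slave interfaces
# (bondib0/ib0/ib1 are the default Exadata interface names)
ip link show bondib0 | grep -o 'mtu [0-9]*'
ip link show ib0     | grep -o 'mtu [0-9]*'
ip link show ib1     | grep -o 'mtu [0-9]*'
```

An MTU of 65520 on bondib0, as in the ifconfig output above, indicates the default connected-mode IPoIB setting that triggers this issue.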


 

Solution

1) Reduce the MTU of the IB device to 7000 and restart the network service.
2) Remount the shares exported from the NFS server.
3) Take the backup over the IB devices.
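The three steps above can be sketched as follows. This is a hedged outline, not the supported procedure (see note 1586212.1 for that): it assumes the default Exadata interface names (bondib0, ib0, ib1), the usual ifcfg file layout under /etc/sysconfig/network-scripts, and example NFS names (exaNFSbackup:/export/backup mounted at /backup):

```shell
# 1) Reduce the MTU on the IB bond and its slaves to 7000
#    (edits the MTU= line in each ifcfg file; paths assume the
#    standard /etc/sysconfig/network-scripts layout)
for i in bondib0 ib0 ib1; do
    sed -i 's/^MTU=.*/MTU=7000/' /etc/sysconfig/network-scripts/ifcfg-$i
done
service network restart

# 2) Remount the NFS share exported by the backup server
#    (server, export path, and mount point are example names)
umount /backup
mount -t nfs exaNFSbackup:/export/backup /backup

# 3) Re-run the snapshot-based backup over the IB network as usual
```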

References

<NOTE:1546861.1> - [Linux OS] System Hung with Large Numbers of Page Allocation Failures with "order:5" on Exadata Environments
<NOTE:1586212.1> - How to Change MTU Size in Exadata Environment

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.