Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 2163699.1: Oracle VM: Hang With "mlx4_ib_tunnel_comp_worker at ffffffffa0435d50 [mlx4_ib]"
This note addresses bug 23538548.
Created from <SR 3-12971770131>

Applies to:
Oracle VM - Version 3.2.9 and later
Exalogic Elastic Cloud X5-2 Hardware - Version X5 to X5 [Release X5]
Information in this document applies to any platform.

Symptoms
This issue occurs on Oracle Server X5-2 with OVM 3.2.x installed. The affected kernel is 2.6.39-400.277.1.el5uek. When the bug is triggered, the dom0 server hangs. In the KDUMP-generated vmcore file, entries similar to the following can be seen:
```
crash64> bt -t -a
PID: 12039  TASK: ffff880179ad6080  CPU: 0  COMMAND: "kworker/u:2"
  START: panic at ffffffff8106f5cb
  [ffff8801752b7940] panic at ffffffff8106f5cb
  [ffff8801752b7960] printk at ffffffff8107089c
  [ffff8801752b79f0] __atomic_notifier_call_chain at ffffffff8150de32
  [ffff8801752b7a00] atomic_notifier_call_chain at ffffffff8150de56
  [ffff8801752b7a10] notify_die at ffffffff8150de8e
  [ffff8801752b7a40] unknown_nmi_error at ffffffff8150b1af
  [ffff8801752b7a60] default_do_nmi at ffffffff8150b378
  [ffff8801752b7a80] do_nmi at ffffffff8150b41e
  [ffff8801752b7ab0] nmi at ffffffff8150a810
  [ffff8801752b7b10] xen_hypercall_sched_op at ffffffff810013aa
  [ffff8801752b7b38] xen_hypercall_sched_op at ffffffff810013aa
  [ffff8801752b7b78] xen_poll_irq_timeout at ffffffff812f9a10
  [ffff8801752b7bb8] xen_poll_irq at ffffffff812f9a30
  [ffff8801752b7bc8] xen_spin_lock_slow at ffffffff81012a39
  [ffff8801752b7c18] xen_spin_lock at ffffffff81012b0a
  [ffff8801752b7c48] _raw_spin_lock at ffffffff81509d5e
  [ffff8801752b7c58] schedule_delayed at ffffffffa0445def [mlx4_ib]
  [ffff8801752b7c98] mlx4_ib_multiplex_cm_handler at ffffffffa04467cd [mlx4_ib]
  [ffff8801752b7ce8] mlx4_ib_multiplex_mad at ffffffffa04356e4 [mlx4_ib]
  [ffff8801752b7cf8] get_sw_cqe at ffffffffa04316f6 [mlx4_ib]
  [ffff8801752b7d28] mlx4_ib_poll_one at ffffffffa043211d [mlx4_ib]
  [ffff8801752b7d70] _raw_spin_unlock_irqrestore at ffffffff81509dfe
  [ffff8801752b7d88] mlx4_ib_poll_cq at ffffffffa04329be [mlx4_ib]
  [ffff8801752b7de8] mlx4_ib_tunnel_comp_worker at ffffffffa0435ddb [mlx4_ib]
  [ffff8801752b7e28] wake_up_process at ffffffff81068e77
  [ffff8801752b7e58] process_one_work at ffffffff8108c5e9
  [ffff8801752b7e68] mlx4_ib_tunnel_comp_worker at ffffffffa0435d50 [mlx4_ib]
  [ffff8801752b7ea8] worker_thread at ffffffff8108cf2a
  [ffff8801752b7ed0] worker_thread at ffffffff8108ce60
  [ffff8801752b7ee8] kthread at ffffffff81091507
  [ffff8801752b7f48] kernel_thread_helper at ffffffff81513644
  [ffff8801752b7f78] int_ret_from_sys_call at ffffffff81512743
  [ffff8801752b7f80] retint_restore_args at ffffffff8150a2e1
  [ffff8801752b7fd8] kernel_thread_helper at ffffffff81513640
...
PID: 346515  TASK: ffff88010ff04480  CPU: 1  COMMAND: "kworker/u:4"
  START: __schedule at ffffffff81507882
...
  [ffff8801a3a1dde8] mlx4_ib_tunnel_comp_worker at ffffffffa0435ddb [mlx4_ib]
  [ffff8801a3a1de58] process_one_work at ffffffff8108c5e9
  [ffff8801a3a1de68] mlx4_ib_tunnel_comp_worker at ffffffffa0435d50 [mlx4_ib]
  [ffff8801a3a1dea8] worker_thread at ffffffff8108cf2a
  [ffff8801a3a1ded0] worker_thread at ffffffff8108ce60
  [ffff8801a3a1dee8] kthread at ffffffff81091507
  [ffff8801a3a1df48] kernel_thread_helper at ffffffff81513644
  [ffff8801a3a1df78] int_ret_from_sys_call at ffffffff81512743
  [ffff8801a3a1df80] retint_restore_args at ffffffff8150a2e1
  [ffff8801a3a1dfd8] kernel_thread_helper at ffffffff81513640
...
```
The call stack may also appear in this way:

```
crash64> bt -t -a
PID: 301458  TASK: ffff880029534340  CPU: 1  COMMAND: "kworker/u:2"
```
Changes
None

Cause
This is a known bug: an ABBA deadlock in the mlx4 driver (bug 23538548).

Solution
This bug is fixed in UEK2 kernel 2.6.39-400.293.1 and later. UEK3 and UEK4 kernels are not affected.
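Whether a given dom0 is exposed can be checked from its running kernel version. The sketch below compares `uname -r` against the first fixed UEK2 release using a `sort -V` version comparison; this is an illustrative check, not an Oracle-provided tool, and the `.el5uek` suffix handling assumes the kernel naming seen in this note:

```shell
#!/bin/sh
# Compare the running dom0 kernel against the first fixed UEK2 release.
# With sort -V, if the fixed version sorts first (or equal), the running
# kernel is at or past the fix for bug 23538548.
fixed="2.6.39-400.293.1"
running="$(uname -r | sed 's/\.el5uek.*$//')"   # strip the .el5uek suffix

lowest="$(printf '%s\n%s\n' "$fixed" "$running" | sort -V | head -n 1)"
if [ "$lowest" = "$fixed" ]; then
    echo "kernel $running: contains the fix for bug 23538548"
else
    echo "kernel $running: exposed - upgrade to $fixed or later"
fi
```

For example, the affected kernel 2.6.39-400.277.1 sorts before 2.6.39-400.293.1 and would be reported as exposed.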
References
<BUG:23538548> - DOM0 HANG: ABBA DEAD LOCK IN MLX4 DRIVER

Attachments
This solution has no attachment