Grid Infrastructure May Reboot Nodes If Information Must Be Printed To The Serial Console

Asset ID:	1-72-2295141.1
Update Date:	2017-09-29
Keywords:

Solution Type Problem Resolution Sure

Solution 2295141.1 : Grid Infrastructure May Reboot Nodes If Information Must Be Printed To The Serial Console

Applies to:

Oracle Database Appliance - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

The system reboots unexpectedly. In some cases the lastgasp files show that CSS or cssdmonitor rebooted the node because interconnect communication was lost:

Network communication with node hpnplppmdb12 (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.320 seconds

Cluster Synchronization Service daemon (CSSD) clssnmvKillBlockThread_0 not scheduled for 21140 msecs.

In the kdump dmesg output or the OS messages file we can find:

kernel: INFO: rcu_sched_state detected stalls on CPUs/tasks: { 30} (detected by 14, t=60002)

The call stack has:

uart_console_write serial8250_console_write vprintk

PID: 0 TASK: ffff883f05660580 CPU: 23 COMMAND: "kworker/0:1"
#0 [ffff88407f2e6e70] crash_nmi_callback at ffffffff810326a6
[exception RIP: memcpy+5]
--- <NMI exception stack> ---
#7 [ffff88407f2e36b8] memcpy at ffffffff81262a55
#8 [ffff88407f2e36b8] vgacon_scroll at ffffffff812abd41
#9 [ffff88407f2e3708] scrup at ffffffff81323cf3
#11 [ffff88407f2e3768] vt_console_print at ffffffff81325c3b
#12 [ffff88407f2e37c8] __call_console_drivers at ffffffff8106e5f7
#13 [ffff88407f2e37f8] _call_console_drivers at ffffffff8106e65a
#14 [ffff88407f2e3818] console_unlock at ffffffff8106ecaf
#15 [ffff88407f2e3878] vprintk at ffffffff8106f38b
#16 [ffff88407f2e3928] printk at ffffffff8150bf5b

In OSW data, overall CPU usage is low, but occasional spikes in system CPU can be seen.

Changes

sysctl -a|grep printk
kernel.printk = 3 4 1 7

and iptables is enabled.
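As a reading aid, the four kernel.printk values can be labelled with the field names used in the Linux kernel sysctl documentation. Kernel messages whose priority value is numerically lower than console_loglevel are printed to the console. The helper name describe_printk below is hypothetical and is only a minimal sketch:

```shell
#!/bin/sh
# Label the four fields of kernel.printk.
# describe_printk is a hypothetical helper name, not part of any Oracle or Linux tool.
describe_printk() {
    # $1: the four values as one string, e.g. "3 4 1 7"
    set -- $1    # split the string into positional parameters
    echo "console_loglevel=$1"
    echo "default_message_loglevel=$2"
    echo "minimum_console_loglevel=$3"
    echo "default_console_loglevel=$4"
}

# Example on a live system:
#   describe_printk "$(cat /proc/sys/kernel/printk)"
```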

Cause

In this case the customer had configured incorrect tracing for iptables, but any other application that writes large volumes of messages to the console can have the same effect: Linux cannot schedule CPU time for cssd in time, and the node is rebooted.
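One way to look for the kind of iptables configuration described above is to list the rules that use a logging target. The sketch below assumes that the "trace" refers to rules with the LOG or TRACE target, which this note does not state explicitly. The helper name find_logging_rules is hypothetical:

```shell
#!/bin/sh
# Print iptables rules that use the LOG or TRACE target.
# Reads a rule listing (as produced by `iptables -S`) on standard input.
# find_logging_rules is a hypothetical helper name.
find_logging_rules() {
    grep -E -- '-j (LOG|TRACE)( |$)'
}

# Example on a live system (as root):
#   iptables -S | find_logging_rules
#   iptables -t raw -S | find_logging_rules
```

Any rule this reports is only a candidate; whether it is the source of the console output still has to be confirmed against the messages actually seen on the console.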

Solution

Temporarily lower the console log level with: dmesg -n 1

Or change kernel.printk to a lower level.
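A minimal sketch of changing kernel.printk follows. It replaces only the first field (the console log level) and keeps the other three values as they are. The helper name printk_with_console_level and the target level 1 are assumptions for illustration, not values prescribed by this note:

```shell
#!/bin/sh
# Build a kernel.printk value with a new console log level (first field),
# leaving the other three fields unchanged.
# printk_with_console_level is a hypothetical helper name.
printk_with_console_level() {
    # $1: current four values, e.g. "3 4 1 7"; $2: new console log level
    new_level=$2
    set -- $1    # split the current values into positional parameters
    echo "$new_level $2 $3 $4"
}

# Example on a live system (as root):
#   current=$(cat /proc/sys/kernel/printk)
#   sysctl -w kernel.printk="$(printk_with_console_level "$current" 1)"
```

A value set with sysctl -w, like dmesg -n, does not survive a reboot. Adding the kernel.printk line to /etc/sysctl.conf is the usual way to keep it.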

Attachments

This solution has no attachment