Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2295141.1
Update Date:2017-09-29
Keywords:

Solution Type  Problem Resolution Sure

Solution  2295141.1 :   Grid Infrastructure May Reboot Nodes If Information Must Be Printed To The Serial Console  


Related Items
  • Oracle Database Appliance
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Database Appliance>DB: ODA_EST


The node was rebooted by CSS because of a CPU stall.

In this Document
Symptoms
Changes
Cause
Solution


Applies to:

Oracle Database Appliance - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

The system reboots suddenly. Sometimes the lastgasp files show that cssd or cssdmonitor rebooted the node because interconnect communication was lost:

Network communication with node hpnplppmdb12 (2) missing for 90% of timeout interval. Removal of this node from cluster in 2.320 seconds

Cluster Synchronization Service daemon (CSSD) clssnmvKillBlockThread_0 not scheduled for 21140 msecs.

The kdump dmesg output or the OS messages file shows:

kernel: INFO: rcu_sched_state detected stalls on CPUs/tasks: { 30} (detected by 14, t=60002)
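
To check how often this stall message occurs, the kernel log can be searched for it. The following is a minimal sketch: the function name count_rcu_stalls is hypothetical, and the log path in the usage comment is an assumption that may differ on your system.

```shell
# Count RCU stall reports in kernel log text read from standard input.
# count_rcu_stalls is a hypothetical helper name, not an Oracle tool.
count_rcu_stalls() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when there are no matches (grep still prints 0).
    grep -c 'detected stalls on CPUs/tasks' || true
}

# Example with the message from this note:
printf '%s\n' 'kernel: INFO: rcu_sched_state detected stalls on CPUs/tasks: { 30} (detected by 14, t=60002)' | count_rcu_stalls

# Usage against a real log (path is an assumption):
#   count_rcu_stalls < /var/log/messages
```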

The call stack has:

uart_console_write  serial8250_console_write  vprintk

PID: 0 TASK: ffff883f05660580 CPU: 23 COMMAND: "kworker/0:1"
#0 [ffff88407f2e6e70] crash_nmi_callback at ffffffff810326a6
[exception RIP: memcpy+5]
--- <NMI exception stack> ---
#7 [ffff88407f2e36b8] memcpy at ffffffff81262a55
#8 [ffff88407f2e36b8] vgacon_scroll at ffffffff812abd41
#9 [ffff88407f2e3708] scrup at ffffffff81323cf3
#11 [ffff88407f2e3768] vt_console_print at ffffffff81325c3b
#12 [ffff88407f2e37c8] __call_console_drivers at ffffffff8106e5f7
#13 [ffff88407f2e37f8] _call_console_drivers at ffffffff8106e65a
#14 [ffff88407f2e3818] console_unlock at ffffffff8106ecaf
#15 [ffff88407f2e3878] vprintk at ffffffff8106f38b
#16 [ffff88407f2e3928] printk at ffffffff8150bf5b

In OSWatcher (OSW) data, overall CPU usage is low, but occasional spikes in system CPU can be seen.

Changes

 sysctl -a|grep printk
kernel.printk = 3 4 1 7

iptables is also enabled.
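
The four kernel.printk values shown above are, in order, the console log level, the default message log level, the minimum console log level, and the default console log level. Kernel messages whose priority number is lower than the first value are printed to the console. The sketch below labels the four fields; the function name describe_printk is hypothetical.

```shell
# Label the four fields of a kernel.printk value.
# describe_printk is a hypothetical helper name.
describe_printk() {
    # Word-split the value on whitespace (sysctl may print tabs or spaces).
    set -- $1
    printf 'console_loglevel=%s\n' "$1"
    printf 'default_message_loglevel=%s\n' "$2"
    printf 'minimum_console_loglevel=%s\n' "$3"
    printf 'default_console_loglevel=%s\n' "$4"
}

# Example with the value from this note:
describe_printk "3 4 1 7"

# Usage against the running kernel:
#   describe_printk "$(cat /proc/sys/kernel/printk)"
```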

Cause

 In this case the customer had configured the wrong trace for iptables, but any application that writes large volumes of messages to the console can have the same effect: Linux cannot schedule CPU time for cssd in time, and the node is rebooted.

Solution

 Run dmesg -n 1 as a temporary fix, or change kernel.printk to a lower level.
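
As an illustration of lowering the console log level, the sketch below builds a new kernel.printk value that changes only the first field and keeps the other three. The function name lower_console_level and the target level 1 are assumptions chosen for the example, not values prescribed by this note. The commands that actually change the system need root and are shown as comments only.

```shell
# Build a kernel.printk value with a new console log level (first field),
# keeping the remaining three fields unchanged.
# lower_console_level is a hypothetical helper name.
lower_console_level() {
    new_level=$2
    # Word-split the current value on whitespace (spaces or tabs).
    set -- $1
    printf '%s %s %s %s\n' "$new_level" "$2" "$3" "$4"
}

# Example with the value from this note and an assumed target level of 1:
lower_console_level "3 4 1 7" 1

# Applying it (run as root; shown as comments so the sketch has no side effects):
#   dmesg -n 1                             # temporary, lost at reboot
#   sysctl -w kernel.printk="1 4 1 7"      # same effect through sysctl
# To keep the setting across reboots, the kernel.printk line can be added
# to /etc/sysctl.conf (assumed location of the sysctl configuration).
```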


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.