Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1904612.1
Update Date: 2014-07-21
Keywords:

Solution Type  Problem Resolution Sure

Solution  1904612.1 :   Exadata: CPU Stalls Causing Node to Panic and ILOM Can Become Non-Responsive


Related Items
  • Oracle Exadata Storage Server Software
  • Exadata Database Machine X2-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST




Applies to:

Oracle Exadata Storage Server Software - Version 11.2.3.1.1 to 12.1.1.1.0 [Release 11.2 to 12.1]
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Information in this document applies to any platform.
A cell machine (X4270 M2) becomes non-responsive, eventually resulting in a node panic, and access to ILOM is also lost.
The power cables then have to be disconnected and reconnected to restore stability.

Symptoms

Two symptoms are experienced as a result of this hardware problem:

1) ILOM becomes non-responsive, and an engineer has to restore stability by disconnecting and reconnecting the power cables.
2) The node panics because of stalls detected on a particular CPU. These stalls may or may not occur with any regularity.

Console Dump -->
~~~~~~~~~~~~~~~~~
May  7 04:05:09 exacelmel01 kernel: Call Trace:
May  7 04:05:09 exacelmel01 kernel:  [<ffffffff81014ac6>] cpu_idle+0xc6/0xf0
May  7 04:05:09 exacelmel01 kernel:  [<ffffffff814fca20>] start_secondary+0xf0/0x100
May  7 04:07:20 exacelmel01 kernel: INFO: rcu_sched_state detected stalls on CPUs/tasks: { 7} (detected by 14, t=1860326 jiffies)
May  7 04:07:20 exacelmel01 kernel: sending NMI to all CPUs:
May  7 04:07:20 exacelmel01 kernel: NMI backtrace for cpu 0
May  7 04:07:20 exacelmel01 kernel: CPU 0
..
..
May  7 04:07:21 exacelmel01 kernel: Call Trace:
May  7 04:07:21 exacelmel01 kernel:  <IRQ>  [<ffffffff8109c6f8>] ktime_get+0x68/0xf0
May  7 04:07:21 exacelmel01 kernel:  [<ffffffff810a2fc0>] ? tick_clock_notify+0x60/0x60
May  7 04:07:21 exacelmel01 kernel:  [<ffffffff810a2fea>] tick_sched_timer+0x2a/0xd0
May  7 04:07:21 exacelmel01 kernel:  [<ffffffff810a2fc0>] ? tick_clock_notify+0x60/0x60
May  7 04:07:21 exacelmel01 kernel:  [<ffffffff81095ab3>] __run_hrtimer+0x83/0x1e0
May  7 04:07:21 exacelmel01 kernel:  [<ffffffff81095dc6>] hrtimer_interrupt+0xe6/0x240
May  7 04:07:21 exacelmel01 kernel:  [<ffffffff81033e6b>] local_apic_timer_interrupt+0x3b/0x70
May  7 04:07:21 exacelmel01 kernel:  [<ffffffff815108a5>] smp_apic_timer_interrupt+0x45/0x5a
May  7 04:07:21 exacelmel01 kernel:  [<ffffffff8150f733>] apic_timer_interrupt+0x13/0x20
May  7 04:07:21 exacelmel01 kernel:  <EOI>  [<ffffffff81098351>] ? sched_clock_idle_sleep_event+0x11/0x20
May  7 04:07:21 exacelmel01 kernel:  [<ffffffff8101de79>] ? mwait_idle+0x99/0x1c0
May  7 04:07:21 exacelmel01 kernel:  [<ffffffff81014ac6>] cpu_idle+0xc6/0xf0
May  7 04:07:21 exacelmel01 kernel:  [<ffffffff814fca20>] start_secondary+0xf0/0x100
May  7 04:08:09 exacelmel01 kernel: INFO: rcu_bh_state detected stalls on CPUs/tasks: { 7} (detected by 18, t=1860326 jiffies)
May  7 04:08:09 exacelmel01 kernel: sending NMI to all CPUs:
May  7 04:08:09 exacelmel01 kernel: NMI backtrace for cpu 0
May  7 04:08:09 exacelmel01 kernel: CPU 0
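The stall reports above follow a fixed pattern: the kernel lists the stalled logical CPUs inside `{ }`, names the CPU that detected the stall, and gives the elapsed time as `t=` in jiffies. When scanning a long saved console or syslog file for this signature, a small tally of which CPUs appear in those reports can show whether the same CPU keeps stalling. The Python sketch below assumes the message wording exactly as it appears in this dump (other kernel versions phrase RCU stall warnings differently), and the names `parse_stall`, `stalled_cpu_counts` and `SAMPLE_LOG` are hypothetical. Converting `t=` to seconds needs the kernel's HZ value, which this note does not state, so the sketch keeps the raw jiffies.

```python
import re
from collections import Counter
from typing import Iterable, NamedTuple, Optional, Tuple

# Hypothetical pattern for RCU stall warnings worded as in this console dump:
# "INFO: rcu_sched_state detected stalls on CPUs/tasks: { 7} (detected by 14, t=1860326 jiffies)"
STALL_RE = re.compile(
    r"INFO: (?P<flavor>rcu_\w+) detected stalls on CPUs/tasks: "
    r"\{(?P<cpus>[^}]*)\} \(detected by (?P<detector>\d+), t=(?P<jiffies>\d+) jiffies\)"
)


class RcuStall(NamedTuple):
    flavor: str                    # e.g. "rcu_sched_state" or "rcu_bh_state"
    stalled_cpus: Tuple[int, ...]  # logical CPUs listed inside { }
    detected_by: int               # logical CPU that reported the stall
    jiffies: int                   # raw t= value; seconds depend on the kernel's HZ


def parse_stall(line: str) -> Optional[RcuStall]:
    """Return the parsed stall report, or None if the line is not one."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    cpus = tuple(int(tok) for tok in m.group("cpus").split() if tok.isdigit())
    return RcuStall(m.group("flavor"), cpus, int(m.group("detector")), int(m.group("jiffies")))


def stalled_cpu_counts(lines: Iterable[str]) -> Counter:
    """Count how often each logical CPU is named as stalled."""
    counts: Counter = Counter()
    for line in lines:
        stall = parse_stall(line)
        if stall is not None:
            counts.update(stall.stalled_cpus)
    return counts


# The two stall reports from the console dump above.
SAMPLE_LOG = [
    "May  7 04:07:20 exacelmel01 kernel: INFO: rcu_sched_state detected stalls on CPUs/tasks: "
    "{ 7} (detected by 14, t=1860326 jiffies)",
    "May  7 04:08:09 exacelmel01 kernel: INFO: rcu_bh_state detected stalls on CPUs/tasks: "
    "{ 7} (detected by 18, t=1860326 jiffies)",
]

if __name__ == "__main__":
    print(stalled_cpu_counts(SAMPLE_LOG))  # Counter({7: 2})
```

A CPU number that recurs across reports is a hint worth passing on with the ILOM snapshot, not proof of a CPU fault on its own.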

 

Note - System panics can occur for many reasons, including application software, OS drivers, and hardware. The signature above is specific to this symptom of a bad CPU, but in general do not assume that a panic handled by CPU 0 indicates a CPU fault.

 

Cause

CPU stalls detected by the kernel, followed by a hard hang in which ILOM also stops responding, may indicate a faulty CPU.

Solution

This signature has been identified as likely to be a hardware problem. An ILOM snapshot will help confirm the problem.

In this case, CPU #0 was replaced to resolve the problem.
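The kernel's stall reports name logical CPUs (CPU 7 in the dump above), while the part replaced here was a physical processor. Before deciding which processor to replace, it can help to map the logical CPUs from the stall reports to physical sockets. On Linux, the socket of each logical CPU is exposed in sysfs as `/sys/devices/system/cpu/cpuN/topology/physical_package_id`. The Python sketch below reads that file once the cell is back up; the helper names are hypothetical, and it assumes, without confirmation from this note, that the kernel's package ids line up with the processor numbering used by ILOM, which should be checked against the ILOM snapshot.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

# Standard Linux sysfs location for per-CPU topology information.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def package_of(cpu: int, root: Path = SYSFS_CPU) -> int:
    """Return the physical package (socket) id the kernel reports for a logical CPU."""
    path = root / f"cpu{cpu}" / "topology" / "physical_package_id"
    return int(path.read_text().strip())


def group_by_package(cpus: Iterable[int], root: Path = SYSFS_CPU) -> Dict[int, List[int]]:
    """Group logical CPU numbers (e.g. from stall reports) by physical package id."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for cpu in sorted(set(cpus)):
        groups[package_of(cpu, root)].append(cpu)
    return dict(groups)
```

For example, on the recovered cell `group_by_package([7])` returns a single-entry dict whose key is the package id holding logical CPU 7. The `root` parameter exists only so the lookup can be pointed at a copy of the sysfs tree.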

References

<BUG:18480030> - SYSTEM HUNG & PANICED AFTER SEVERAL "ILOM HAS STOPPED RESPONDING" MESSAGES

  Copyright © 2018 Oracle, Inc.  All rights reserved.