Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1642258.1
Update Date:2014-12-22
Keywords:

Solution Type  Problem Resolution Sure

Solution  1642258.1 :   ODA: Node Inaccessible Due to OS "page allocation failure: order:1"  


Related Items
  • Oracle Database Appliance
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Database Appliance>DB: ODA_EST




Created from <SR 3-8678439001>

Applies to:

Oracle Database Appliance - Version All Versions and later
Information in this document applies to any platform.

Symptoms

The ODA node went offline and had to be rebooted manually with direct access to the machine, because the ILOM console was also inaccessible.

Once restarted, the ILOM console history was captured and reported a variety of error stacks indicating:  "page allocation failure: order:1"

Locations that you can check for diagnostic information include:

-----------------------------
> OS Messages
> OSW Meminfo
> ASM alert.log
> OSW Top

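The following is a minimal Python sketch, not an Oracle-supplied tool, for locating these messages in a saved copy of the OS messages file or the ILOM console history. The function name find_failures and the 4 KiB page size are assumptions made for illustration. The "order" field is the base-2 logarithm of the number of contiguous pages requested, so order:1 is 2 pages and order:2 is 4 pages.

```python
import re

# Matches console output ("swapper: page allocation failure. order:2, mode:0x20")
# and syslog lines that carry a "host kernel:" prefix before the same text.
FAILURE_RE = re.compile(
    r"(?P<comm>\S+): page allocation failure\.\s+"
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)

PAGE_SIZE = 4096  # assumption: 4 KiB pages (the x86_64 default)


def find_failures(lines):
    """Return one dict per 'page allocation failure' line found in lines."""
    found = []
    for line in lines:
        m = FAILURE_RE.search(line)
        if m:
            order = int(m.group("order"))
            found.append({
                "comm": m.group("comm"),
                "order": order,
                "mode": m.group("mode"),
                # An order-N request asks for 2**N contiguous pages.
                "bytes": PAGE_SIZE * 2 ** order,
            })
    return found


if __name__ == "__main__":
    sample = [
        "Dec 2 03:53:59 oda03 kernel: swapper: page allocation failure.   order:2, mode:0x20",
        "sshd: page allocation failure. order:1, mode:0x20",
        "Dec 2 03:53:59 oda03 kernel: Call Trace:",
    ]
    for failure in find_failures(sample):
        print(failure)
```

To scan a real file, pass an open file object instead of the sample list, for example find_failures(open("/var/log/messages")).
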
Example 1

swapper: page allocation failure. order:2, mode:0x20           <<< page allocation failure.
Pid: 0, comm: swapper Tainted: P           2.6.32-300.32.5.el5uek #1
Call Trace:
<IRQ>  [<ffffffff810ddd8b>] __alloc_pages_nodemask+0x524/0x595
[<ffffffff8110d6ef>] kmem_getpages+0x4f/0xf4
[<ffffffff8110d8ec>] fallback_alloc+0x158/0x1ce
[<ffffffff8110da83>] ____cache_alloc_node+0x121/0x134
[<ffffffff8110e0a3>] kmem_cache_alloc_node_notrace+0x84/0xb9
[<ffffffff8110e11e>] __kmalloc_node+0x46/0x73
[<ffffffff813b9518>] ? __alloc_skb+0x72/0x13d
[<ffffffff813b9518>] __alloc_skb+0x72/0x13d
[<ffffffffa0157d93>] ixgbe_alloc_rx_buffers+0x93/0x204 [ixgbe]
[<ffffffffa015ac08>] ixgbe_poll+0xeea/0x1071 [ixgbe]
[<ffffffffa0157df6>] ? ixgbe_alloc_rx_buffers+0xf6/0x204 [ixgbe]
[<ffffffff8123786c>] ? rb_insert_color+0x68/0xe3
[<ffffffff813c45d9>] net_rx_action+0xc6/0x1cd
[<ffffffff8105e8c5>] __do_softirq+0xd7/0x19e
[<ffffffff810aee94>] ? handle_IRQ_event+0x10a/0x120
[<ffffffff81012eec>] call_softirq+0x1c/0x30
[<ffffffff81014695>] do_softirq+0x46/0x89
[<ffffffff8105e74a>] irq_exit+0x3b/0x7a
[<ffffffff8145b8c1>] do_IRQ+0x99/0xb0
[<ffffffff81012713>] ret_from_intr+0x0/0x11
<EOI>  [<ffffffff810199d6>] ? mwait_idle+0x74/0x7f
[<ffffffff810199c9>] ? mwait_idle+0x67/0x7f
[<ffffffff81010d6f>] ? cpu_idle+0xa5/0xd4
[<ffffffff8145121f>] ? start_secondary+0x1fd/0x23c

...

rpciod/1: page allocation failure. order:2, mode:0x20
swapper: page allocation failure. order:2, mode:0x20
Pid: 0, comm: swapper Tainted: P           2.6.32-300.32.5.el5uek #1
Call Trace:
<IRQ>  [<ffffffff810ddd8b>] __alloc_pages_nodemask+0x524/0x595
[<ffffffff8110d6ef>] kmem_getpages+0x4f/0xf4
[<ffffffff8110d8ec>] fallback_alloc+0x158/0x1ce
[<ffffffff8110da83>] ____cache_alloc_node+0x121/0x134
[<ffffffff8110e0a3>] kmem_cache_alloc_node_notrace+0x84/0xb9
[<ffffffff8110e11e>] __kmalloc_node+0x46/0x73
[<ffffffff813b9518>] ? __alloc_skb+0x72/0x13d
[<ffffffff813b9518>] __alloc_skb+0x72/0x13d
[<ffffffffa0157d93>] ixgbe_alloc_rx_buffers+0x93/0x204 [ixgbe]
[<ffffffffa015ac08>] ixgbe_poll+0xeea/0x1071 [ixgbe]
[<ffffffff813c45d9>] net_rx_action+0xc6/0x1cd
[<ffffffff8105e8c5>] __do_softirq+0xd7/0x19e
[<ffffffff810aee94>] ? handle_IRQ_event+0x10a/0x120
[<ffffffff81012eec>] call_softirq+0x1c/0x30
[<ffffffff81014695>] do_softirq+0x46/0x89
[<ffffffff8105e74a>] irq_exit+0x3b/0x7a
[<ffffffff8145b8c1>] do_IRQ+0x99/0xb0
[<ffffffff81012713>] ret_from_intr+0x0/0x11
<EOI>  [<ffffffff810199d6>] ? mwait_idle+0x74/0x7f
[<ffffffff810199c9>] ? mwait_idle+0x67/0x7f
[<ffffffff81010d6f>] ? cpu_idle+0xa5/0xd4
[<ffffffff8145121f>] ? start_secondary+0x1fd/0x23c

...

Call Trace:
<IRQ>  [<ffffffff810ddd8b>] __alloc_pages_nodemask+0x524/0x595
[<ffffffff8110d6ef>] kmem_getpages+0x4f/0xf4
[<ffffffff8110d8ec>] fallback_alloc+0x158/0x1ce
[<ffffffff8110da83>] ____cache_alloc_node+0x121/0x134
[<ffffffff8110e0a3>] kmem_cache_alloc_node_notrace+0x84/0xb9
[<ffffffff8110e11e>] __kmalloc_node+0x46/0x73
[<ffffffff813b9518>] ? __alloc_skb+0x72/0x13d
[<ffffffff813b9518>] __alloc_skb+0x72/0x13d
[<ffffffffa0157d93>] ixgbe_alloc_rx_buffers+0x93/0x204 [ixgbe]
[<ffffffffa015ac08>] ixgbe_poll+0xeea/0x1071 [ixgbe]
[<ffffffff812b2d0e>] ? mix_pool_bytes_extract+0x145/0x154
[<ffffffff812b31a8>] ? add_timer_randomness+0x107/0x110
[<ffffffff813c45d9>] net_rx_action+0xc6/0x1cd
[<ffffffff8105e8c5>] __do_softirq+0xd7/0x19e
[<ffffffff81012eec>] call_softirq+0x1c/0x30
<EOI>  [<ffffffff81014695>] do_softirq+0x46/0x89
[<ffffffff8105df02>] _local_bh_enable_ip+0x82/0x93
[<ffffffff8105e00b>] local_bh_enable+0x12/0x14
[<ffffffff813c1b31>] rcu_read_unlock_bh+0xe/0x10
[<ffffffff813c4dac>] dev_queue_xmit+0x2ed/0x310
[<ffffffff813c8536>] neigh_resolve_output+0x1db/0x210
[<ffffffff813b9568>] ? __alloc_skb+0xc2/0x13d
[<ffffffff813eb026>] ip_finish_output2+0x1a1/0x1e5
[<ffffffff813eb0cc>] ip_finish_output+0x62/0x67
[<ffffffff813eb17f>] ip_output+0xae/0xb5
[<ffffffff813e957d>] dst_output+0x10/0x12
[<ffffffff813eabe4>] ip_local_out+0x23/0x28
[<ffffffff813ebc2e>] ip_queue_xmit+0x301/0x371
[<ffffffff813b9518>] ? __alloc_skb+0x72/0x13d
[<ffffffff813fcad5>] tcp_transmit_skb+0x62d/0x66d
[<ffffffff813fdeec>] tcp_write_xmit+0x6d7/0x7bd
[<ffffffff813fc151>] ? tcp_current_mss+0x4b/0x6a
[<ffffffff813fe037>] __tcp_push_pending_frames+0x2f/0x62
[<ffffffff813f13af>] tcp_push+0x86/0x88
[<ffffffff813f21bb>] tcp_sendpage+0x375/0x3b3
[<ffffffffa0418d14>] xs_sendpages+0x120/0x1b5 [sunrpc]
[<ffffffffa041abb3>] xs_tcp_send_request+0x49/0x11a [sunrpc]
[<ffffffffa0417ac6>] xprt_transmit+0x10d/0x1e7 [sunrpc]
[<ffffffffa04bd732>] ? nfs3_xdr_writeargs+0x0/0x7a [nfs]
[<ffffffffa0415164>] call_transmit+0x1d3/0x21e [sunrpc]
[<ffffffffa041b8ba>] __rpc_execute+0x85/0x270 [sunrpc]
[<ffffffffa041baa5>] ? rpc_async_schedule+0x0/0x17 [sunrpc]
[<ffffffffa041baba>] rpc_async_schedule+0x15/0x17 [sunrpc]
[<ffffffff81072d62>] worker_thread+0x14d/0x1ed
[<ffffffff81077028>] ? autoremove_wake_function+0x0/0x3d
[<ffffffff81072c15>] ? worker_thread+0x0/0x1ed
[<ffffffff81076c7f>] kthread+0x6e/0x76
[<ffffffff81012dea>] child_rip+0xa/0x20
[<ffffffff81076c11>] ? kthread+0x0/0x76
[<ffffffff81012de0>] ? child_rip+0x0/0x20


...

swapper: page allocation failure. order:2, mode:0x20
Pid: 0, comm: swapper Tainted: P           2.6.32-300.32.5.el5uek #1
Call Trace:
<IRQ>  [<ffffffff810ddd8b>] __alloc_pages_nodemask+0x524/0x595
[<ffffffff8110d6ef>] kmem_getpages+0x4f/0xf4
[<ffffffff8110d8ec>] fallback_alloc+0x158/0x1ce
[<ffffffff8110da83>] ____cache_alloc_node+0x121/0x134
[<ffffffff8110e0a3>] kmem_cache_alloc_node_notrace+0x84/0xb9
[<ffffffff8110e11e>] __kmalloc_node+0x46/0x73
[<ffffffff813b9518>] ? __alloc_skb+0x72/0x13d
[<ffffffff813b9518>] __alloc_skb+0x72/0x13d
[<ffffffffa0157d93>] ixgbe_alloc_rx_buffers+0x93/0x204 [ixgbe]
[<ffffffffa015ac08>] ixgbe_poll+0xeea/0x1071 [ixgbe]
[<ffffffff8101859a>] ? native_sched_clock+0x37/0x39
[<ffffffff8123786c>] ? rb_insert_color+0x68/0xe3
[<ffffffff813c45d9>] net_rx_action+0xc6/0x1cd
[<ffffffff8105e8c5>] __do_softirq+0xd7/0x19e
[<ffffffff810aee94>] ? handle_IRQ_event+0x10a/0x120
[<ffffffff81012eec>] call_softirq+0x1c/0x30
[<ffffffff81014695>] do_softirq+0x46/0x89
[<ffffffff8105e74a>] irq_exit+0x3b/0x7a
[<ffffffff8145b8c1>] do_IRQ+0x99/0xb0
[<ffffffff81012713>] ret_from_intr+0x0/0x11
<EOI>  [<ffffffff810199d6>] ? mwait_idle+0x74/0x7f
[<ffffffff810199c9>] ? mwait_idle+0x67/0x7f
[<ffffffff81010d6f>] ? cpu_idle+0xa5/0xd4
[<ffffffff8145121f>] ? start_secondary+0x1fd/0x23c


...


sshd: page allocation failure. order:1, mode:0x20
Pid: 9578, comm: sshd Tainted: P           2.6.32-300.32.5.el5uek #1
Call Trace:
<IRQ>  [<ffffffff810ddd8b>] __alloc_pages_nodemask+0x524/0x595
[<ffffffff8110d6ef>] kmem_getpages+0x4f/0xf4
[<ffffffff8110d8ec>] fallback_alloc+0x158/0x1ce
[<ffffffff8110da83>] ____cache_alloc_node+0x121/0x134
[<ffffffff8110ee1c>] kmem_cache_alloc+0x7f/0xf7
[<ffffffff813b5035>] sk_prot_alloc+0x3b/0x13e
[<ffffffff813b63ad>] sk_clone+0x1e/0x270
[<ffffffff813efe30>] inet_csk_clone+0x16/0x9c
[<ffffffff81404272>] tcp_create_openreq_child+0x23/0x3f5
[<ffffffff81402ce0>] tcp_v4_syn_recv_sock+0x5c/0x21a
[<ffffffff81404157>] tcp_check_req+0x1f3/0x2eb
[<ffffffff813efcbd>] ? inet_csk_search_req+0x3c/0x9d
[<ffffffff814015bf>] tcp_v4_do_rcv+0x225/0x352
[<ffffffff8105e00b>] ? local_bh_enable+0x12/0x14
[<ffffffff81402982>] tcp_v4_rcv+0x459/0x6d0
[<ffffffff813e6e52>] ip_local_deliver_finish+0x152/0x1fa
[<ffffffff813e724d>] ip_local_deliver+0x72/0x7d
[<ffffffff813e6c7e>] ip_rcv_finish+0x372/0x38c
[<ffffffff813f370b>] ? tcp_gro_receive+0x7e/0x1e5
[<ffffffff813e719c>] ip_rcv+0x2a2/0x2e1
[<ffffffff813c14ab>] __netif_receive_skb+0x41b/0x440
[<ffffffff813c1519>] netif_receive_skb+0x49/0x50
[<ffffffff813c15b5>] napi_skb_finish+0x2b/0x42
[<ffffffff813c1a2e>] napi_gro_receive+0x2f/0x34
[<ffffffffa01935e8>] igb_poll+0x808/0xb78 [igb]
[<ffffffff8101859a>] ? native_sched_clock+0x37/0x39
[<ffffffff810182e0>] ? sched_clock+0x9/0xd
[<ffffffff8107bd51>] ? sched_clock_cpu+0x4c/0xdc
[<ffffffff813c45d9>] net_rx_action+0xc6/0x1cd
[<ffffffff8105e8c5>] __do_softirq+0xd7/0x19e
[<ffffffff810aee94>] ? handle_IRQ_event+0x10a/0x120
[<ffffffff81012eec>] call_softirq+0x1c/0x30
[<ffffffff81014695>] do_softirq+0x46/0x89
[<ffffffff8105e74a>] irq_exit+0x3b/0x7a
[<ffffffff8145b8c1>] do_IRQ+0x99/0xb0
[<ffffffff81012713>] ret_from_intr+0x0/0x11
<EOI>


...


swapper: page allocation failure. order:2, mode:0x20
Pid: 0, comm: swapper Tainted: P           2.6.32-300.32.5.el5uek #1
Call Trace:
<IRQ>  [<ffffffff810ddd8b>] __alloc_pages_nodemask+0x524/0x595
[<ffffffff8110d6ef>] kmem_getpages+0x4f/0xf4
[<ffffffff8110d8ec>] fallback_alloc+0x158/0x1ce
[<ffffffff8110da83>] ____cache_alloc_node+0x121/0x134
[<ffffffff8110e0a3>] kmem_cache_alloc_node_notrace+0x84/0xb9
[<ffffffff8110e11e>] __kmalloc_node+0x46/0x73
[<ffffffff813b9518>] ? __alloc_skb+0x72/0x13d
[<ffffffff813b9518>] __alloc_skb+0x72/0x13d
[<ffffffffa0157d93>] ixgbe_alloc_rx_buffers+0x93/0x204 [ixgbe]
[<ffffffffa015ac08>] ixgbe_poll+0xeea/0x1071 [ixgbe]
[<ffffffff81044498>] ? __wake_up+0x48/0x55
[<ffffffff812b3098>] ? credit_entropy_bits+0x90/0x99
[<ffffffff813c45d9>] net_rx_action+0xc6/0x1cd
[<ffffffff8105e8c5>] __do_softirq+0xd7/0x19e
[<ffffffff810aee94>] ? handle_IRQ_event+0x10a/0x120
[<ffffffff81012eec>] call_softirq+0x1c/0x30
[<ffffffff81014695>] do_softirq+0x46/0x89
[<ffffffff8105e74a>] irq_exit+0x3b/0x7a
[<ffffffff8145b8c1>] do_IRQ+0x99/0xb0
[<ffffffff81012713>] ret_from_intr+0x0/0x11
<EOI>  [<ffffffff810199d6>] ? mwait_idle+0x74/0x7f
[<ffffffff810199c9>] ? mwait_idle+0x67/0x7f
[<ffffffff81010d6f>] ? cpu_idle+0xa5/0xd4
[<ffffffff8145121f>] ? start_secondary+0x1fd/0x23c
 

 

@ e.g. from SR 3-9966841401: Database lost connection because server was not accessible

Here are a few more examples by file type / location

Example 2

OS Messages
Dec 2 03:53:59 oda03 kernel: swapper: page allocation failure.   order:2, mode:0x20     <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Dec 2 03:53:59 oda03 kernel: Pid: 0, comm: swapper Tainted: P W 2.6.32-300.32.5.el5uek #1
Dec 2 03:53:59 oda03 kernel: Call Trace:
Dec 2 03:53:59 oda03 kernel: <IRQ> [<ffffffff810ddd8b>] __alloc_pages_nodemask+0x524/0x595
Dec 2 03:53:59 oda03 kernel: [<ffffffff8110d6ef>]  kmem_getpages+0x4f/0xf4
Dec 2 03:53:59 oda03 kernel: [<ffffffff8110d8ec>]  fallback_alloc+0x158/0x1ce
Dec 2 03:53:59 oda03 kernel: [<ffffffff8110da83>]  ____cache_alloc_node+0x121/0x134
Dec 2 03:53:59 oda03 kernel: [<ffffffff8110e0a3>]  kmem_cache_alloc_node_notrace+0x84/0xb9
Dec 2 03:53:59 oda03 kernel: [<ffffffff8110e11e>]  __kmalloc_node+0x46/0x73
Dec 2 03:53:59 oda03 kernel: [<ffffffff813b9518>]  ? __alloc_skb+0x72/0x13d
Dec 2 03:53:59 oda03 kernel: [<ffffffff813b9518>]  __alloc_skb+0x72/0x13d
Dec 2 03:53:59 oda03 kernel: [<ffffffffa0157d93>]  ixgbe_alloc_rx_buffers+0x93/0x204 [ixgbe]
Dec 2 03:53:59 oda03 kernel: [<ffffffffa015ac08>]  ixgbe_poll+0xeea/0x1071 [ixgbe]
Dec 2 03:53:59 oda03 kernel: [<ffffffff813c45d9>]  net_rx_action+0xc6/0x1cd
Dec 2 03:53:59 oda03 kernel: [<ffffffff8105e8c5>]  __do_softirq+0xd7/0x19e
Dec 2 03:53:59 oda03 kernel: [<ffffffff810aee94>]  ? handle_IRQ_event+0x10a/0x120
Dec 2 03:53:59 oda03 kernel: [<ffffffff81012eec>]  call_softirq+0x1c/0x30
Dec 2 03:53:59 oda03 kernel: [<ffffffff81014695>]  do_softirq+0x46/0x89
Dec 2 03:54:00 oda03 kernel: [<ffffffff8105e74a>]  irq_exit+0x3b/0x7a
Dec 2 03:54:00 oda03 kernel: [<ffffffff8145b8c1>]  do_IRQ+0x99/0xb0
Dec 2 03:54:00 oda03 kernel: [<ffffffff81012713>]  ret_from_intr+0x0/0x11
Dec 2 03:54:00 oda03 kernel: <EOI> [<ffffffff810199d6>] ? mwait_idle+0x74/0x7f
Dec 2 03:54:00 oda03 kernel: [<ffffffff810199c9>]  ? mwait_idle+0x67/0x7f
Dec 2 03:54:00 oda03 kernel: [<ffffffff81010d6f>]  ? cpu_idle+0xa5/0xd4
Dec 2 03:54:00 oda03 kernel: [<ffffffff8145121f>]  ? start_secondary+0x1fd/0x23c
Dec 2 03:54:00 oda03 kernel: Mem-Info:
... 
Dec 2 05:18:13 oda03 syslogd 1.4.1: restart.          <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Dec 2 05:18:13 oda03 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 2 05:18:13 oda03 kernel: 6-11,18-23 (cpu_power = 7068)
Dec 2 05:18:13 oda03 kernel: CPU1 attaching sched-domain:

 

MEMINFO  - (excerpt showing MemFree running out just before the reboot)

...

Meminfo
============
zzz ***Tue Dec 2 03:59:02 EST 2014
MemTotal:  98929480 kB
MemFree:     595328 kB <<<<<<<<<<<<< decreasing and now approaching reboot!
Buffers:     368332 kB
Cached:    36465928 kB
... 
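
To see how MemFree moves across many OSWatcher samples rather than one excerpt, a short script can help. This is a rough Python sketch that assumes each sample begins with a "zzz ***<timestamp>" line, as in the excerpt above. The function names and the threshold passed to low_memory are hypothetical and should be adjusted to the system being examined.

```python
import re

MEMFREE_RE = re.compile(r"^MemFree:\s+(\d+)\s+kB")
STAMP_PREFIX = "zzz ***"  # OSWatcher sample marker, as seen in the excerpt


def memfree_samples(lines):
    """Yield (timestamp_text, memfree_kb) pairs from OSWatcher meminfo output."""
    stamp = None
    for line in lines:
        line = line.strip()
        if line.startswith(STAMP_PREFIX):
            stamp = line[len(STAMP_PREFIX):]
        else:
            m = MEMFREE_RE.match(line)
            if m and stamp is not None:
                yield stamp, int(m.group(1))


def low_memory(samples, threshold_kb):
    """Return the samples whose MemFree is below threshold_kb."""
    return [(ts, kb) for ts, kb in samples if kb < threshold_kb]


if __name__ == "__main__":
    excerpt = [
        "zzz ***Tue Dec 2 03:59:02 EST 2014",
        "MemTotal:  98929480 kB",
        "MemFree:     595328 kB",
        "Buffers:     368332 kB",
    ]
    # 1048576 kB (1 GiB) is an arbitrary illustrative threshold.
    for ts, kb in low_memory(memfree_samples(excerpt), 1048576):
        print(ts, kb)
```
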


OSTop 

OS Top
========
top - 03:59:06 up 266 days, 2:00, 9 users, load average: 4.49, 4.03, 3.64

Tasks: 940 total, 2 running, 938 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.8%us, 3.6%sy, 0.0%ni, 78.2%id, 12.2%wa, 0.0%hi, 0.1%si, 0.0%st
Mem:  98929480k total, 98414808k used,   514672k free,   368256k buffers
Swap: 25165816k total,  3271740k used, 21894076k free, 36556856k cached      <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< Swapping is happening

PID      USER PR NI   VIRT  RES  SHR  S %CPU  %MEM     TIME+    COMMAND
25919  oracle 20  0  12.1g  11g  11g  D 65.0  12.2  299:50.42  /cloudfs/goldengate
10548  oracle 20  0  24.2g  26m  21m  S 11.5   0.0   36:55.47  oracleabcGP1 (LOCAL
26766  oracle 20  0  24.2g  28m  24m  S  8.5   0.0   61:01.08  oracleabcGP1 (LOCAL
13508  oracle 20  0  24.2g  26m  21m  S  4.9   0.0   23:35.33  oracleabcGP1 (LOCAL
24137  oracle 20  0  24.2g  27m  22m  S  4.9   0.0    4:32.45  oracleabcGP1 (LOCAL
12103  oracle 20  0  8425m 4996 3712  S  3.9   0.0    1385:42    ora_lck0_STMTNFP1
13947  oracle 20  0   324m 8936 4260  S  3.9   0.0   86:48.12  /cloudfs/goldengate
 2438  oracle 20  0  24.2g  24m  20m  S  3.0   0.0    0:17.16  oracleabcGP1 (LOCAL
10545  oracle 20  0  24.2g  26m  21m  S  3.0   0.0   23:33.25  oracleabcGP1 (LOCAL
...
...

 

ASM ALERT.LOG   - shows the instance restarting after the reboot, with time stamps that match the memory issues above

ASM1
=====
...
Tue Dec 02 03:54:05 2014
Time drift detected. Please check VKTM trace file for more details.
Tue Dec 02 05:16:19 2014    <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< No activity recorded on this instance at the time of the reboot until some time later (over an hour for this case)
NOTE: No asm libraries found in the system
* instance_number obtained from CSS = 1, checking for the existence of node 0... 
* node 0 does not exist. instance_number = 1 
NOTE: parameter asm_diskstring not allowed in ODA appliance; overriding asm_diskstring to "/dev/mapper/*D_*"
Starting ORACLE instance (normal)
...
...
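
A gap like the one marked above can be found mechanically by comparing consecutive time stamps. The Python sketch below is illustrative only. It assumes the alert.log time stamps stand alone on their own lines in the format shown above, that the process runs in an English locale (needed for the day and month names), and it uses a hypothetical 30 minute minimum gap.

```python
from datetime import datetime, timedelta

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Tue Dec 02 03:54:05 2014"


def parse_stamp(line):
    """Return a datetime if the line is an alert.log time stamp, else None."""
    try:
        return datetime.strptime(line.strip(), STAMP_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, min_gap=timedelta(minutes=30)):
    """Return (previous, next, gap) for consecutive stamps at least min_gap apart."""
    gaps = []
    previous = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue  # ordinary message line, not a time stamp
        if previous is not None and stamp - previous >= min_gap:
            gaps.append((previous, stamp, stamp - previous))
        previous = stamp
    return gaps


if __name__ == "__main__":
    excerpt = [
        "Tue Dec 02 03:54:05 2014",
        "Time drift detected. Please check VKTM trace file for more details.",
        "Tue Dec 02 05:16:19 2014",
        "NOTE: No asm libraries found in the system",
    ]
    for before, after, gap in find_gaps(excerpt):
        print(before, after, gap)
```

For the two time stamps in the excerpt, the gap is 1 hour 22 minutes 14 seconds.
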

 

Cause

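The value of vm.min_free_kbytes is too small. The failing allocations are made in interrupt context (note <IRQ> in the call traces) with mode:0x20, so they cannot wait for memory to be reclaimed and depend on the reserve of free pages that vm.min_free_kbytes controls.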

Solution

Upgrade to ODA 2.9, where the setting is increased.
 

As a workaround, you may manually set vm.min_free_kbytes (for example in /etc/sysctl.conf, applied with "sysctl -p") until you are able to upgrade.

vm.min_free_kbytes=512000
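
As a hedged illustration of the workaround, the Python sketch below reads the current value from /proc/sys/vm/min_free_kbytes and, if it is below the value given in this note, prints the commands an administrator could run as root. The function names are hypothetical, and the script only prints the commands; it does not change any setting itself.

```python
TARGET_KB = 512000  # workaround value from this note
PROC_PATH = "/proc/sys/vm/min_free_kbytes"  # standard Linux location of the tunable


def read_min_free_kbytes(path=PROC_PATH):
    """Return the current vm.min_free_kbytes value as an integer."""
    with open(path) as f:
        return int(f.read().strip())


def workaround_steps(current_kb, target_kb=TARGET_KB):
    """Return the commands to run as root, or [] if no change is needed."""
    if current_kb >= target_kb:
        return []
    return [
        # Apply to the running kernel immediately.
        f"sysctl -w vm.min_free_kbytes={target_kb}",
        # Persist across reboots, then reload /etc/sysctl.conf.
        f"echo 'vm.min_free_kbytes={target_kb}' >> /etc/sysctl.conf",
        "sysctl -p",
    ]


if __name__ == "__main__":
    current = read_min_free_kbytes()
    print(f"current vm.min_free_kbytes = {current}")
    for step in workaround_steps(current):
        print(step)
```

If /etc/sysctl.conf already contains a vm.min_free_kbytes entry, edit that line rather than appending a second one.
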

 

References

<BUG:14849704> - PAGE ALLOCATION FAILURE. ORDER:1, MODE:0X20
<NOTE:1546861.1> - [Linux OS] System Hung with Large Numbers of Page Allocation Failures with "order:5" on Exadata Environments

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.