Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1664223.1
Update Date:2014-04-21

Solution Type  Problem Resolution Sure

Solution  1664223.1 :   On Oracle Big Data Appliance (BDA) One Node Shuts Down Itself On Its Own - NUMA Disabled  


Related Items
  • Big Data Appliance Hardware
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-8697460131>

Applies to:

Big Data Appliance Hardware - Version All Versions and later
Linux x86-64

Symptoms

An Oracle Big Data Appliance (BDA) node shuts itself down almost every day. After it goes down, it must be rebooted from ILOM.

To check whether NUMA is enabled on an individual node, run the following commands:

# grep -i numa /var/log/dmesg

AND

# numactl --hardware

 

To check whether NUMA is enabled on all nodes of the BDA cluster, run the following from Node 1 as root:

# dcli -C grep -i numa /var/log/dmesg
# dcli -C numactl --hardware

 

The following is an example of output showing NUMA is disabled:

# grep -i numa /var/log/dmesg
  
Command line: ro root=/dev/md2 rhgb console=ttyS0,9600n8 console=tty1 crashkernel=256M loglevel=7 panic=60 debug audit=1 processor.max_cstate=1 nomce numa=off
NUMA turned off
Kernel command line: ro root=/dev/md2 rhgb console=ttyS0,9600n8 console=tty1 crashkernel=256M loglevel=7 panic=60 debug audit=1 processor.max_cstate=1 nomce numa=off
# numactl --hardware
  
available: 1 nodes (0)
node 0 size: 98295 MB
node 0 free: 3022 MB
No distance information available.



On nodes where NUMA is enabled, you should see output similar to the following:

# grep -i numa /var/log/dmesg
NUMA: Allocated memnodemap from 12040 - 43080
NUMA: Using 20 for the hash shift.
# numactl --hardware
available: 2 nodes (0-1)
node 0 size: 49143 MB
node 0 free: 1681 MB
node 1 size: 49152 MB
node 1 free: 1280 MB
node distances:
node 0 1
0: 10 21
1: 21 10
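
The two sample outputs above differ in the "available:" line: one node when NUMA is disabled, two when it is enabled. The following is a minimal sketch of a check built on that line. It assumes a two-socket server like the ones shown here; numa_state is a hypothetical helper name, not part of numactl or the BDA software.

```shell
# Hypothetical helper (not a BDA or numactl tool): reads `numactl --hardware`
# output on stdin and prints "enabled" if more than one NUMA node is reported,
# "disabled" if exactly one is reported, and "unknown" if no "available:"
# line is found. Assumes a two-socket server, as in the examples above.
numa_state() {
    awk '
        /^available: [0-9]+ nodes/ {
            found = 1
            if ($2 > 1) print "enabled"; else print "disabled"
            exit
        }
        END { if (!found) print "unknown" }
    '
}

# Example use on a node, as root:
#   numactl --hardware | numa_state
```

A single-socket system would also report one node, so this check is only meaningful on hardware where two nodes are expected.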

 

Cause

The grub.conf on the internal USB disk of the affected server specifies "numa=off". The Linux kernel on the server enables NUMA support by default, but this boot option turns it off.

It may be that this server boots only from the USB disk, or that an upgrade action fixed all the other servers on the BDA.

NUMA should be enabled because disabling it results in instability under heavy Hadoop workloads on BDA nodes. This issue affects earlier releases such as the v2.2.1 Mammoth software.

This is fixed in v2.4.0 and above.
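
Since the option comes from the boot configuration, it can be looked for directly in the running kernel's command line (/proc/cmdline) and in grub.conf on the USB disk once it is mounted (see Solution). The following is a minimal sketch; has_numa_off is a hypothetical helper name, not a BDA tool.

```shell
# Hypothetical helper: exits with status 0 if the file given as $1 contains
# "numa=off" as a whole whitespace-delimited token, non-zero otherwise.
has_numa_off() {
    grep -Eq '(^|[[:space:]])numa=off([[:space:]]|$)' "$1"
}

# Example use, as root:
#   has_numa_off /proc/cmdline && echo "running kernel was booted with numa=off"
#   has_numa_off /usbdisk/boot/grub/grub.conf && echo "grub.conf on USB has numa=off"
```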

Solution

Remove "numa=off" from grub.conf on the internal USB drive.

To do this, as root:

1. Mount the internal USB drive:

# mount /usbdisk


2. Back up /usbdisk/boot/grub/grub.conf. For example:

# cp -p /usbdisk/boot/grub/grub.conf /root/grub.confORIG

or back it up to any other safe location.

3. Edit the file /usbdisk/boot/grub/grub.conf and remove all instances of "numa=off".

4. Unmount the internal USB drive:

# umount /usbdisk


5. Reboot the server:

# reboot
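
The edit in step 3 can be made by hand in any editor. As an alternative, the following is a minimal sketch that removes the option with GNU sed. strip_numa_off is a hypothetical helper name, and the sketch assumes the option is always preceded by whitespace on the kernel line, as in the dmesg output shown under Symptoms. Take the backup from step 2 first.

```shell
# Hypothetical helper: removes every "numa=off" token that is preceded by
# whitespace from the file given as $1, editing the file in place.
# Uses GNU sed options (-i for in-place editing, -r for extended regex).
# The :a / ta loop repeats the substitution on a line until nothing is
# left to remove, which also handles back-to-back duplicates.
strip_numa_off() {
    sed -i -r -e ':a' \
        -e 's/[[:space:]]+numa=off([[:space:]]|$)/\1/' \
        -e 'ta' "$1"
}

# Example use between steps 2 and 4, as root:
#   strip_numa_off /usbdisk/boot/grub/grub.conf
#   grep -c 'numa=off' /usbdisk/boot/grub/grub.conf    # expect 0
```

Comparing the result against the backup with diff before unmounting is a cheap way to confirm that only "numa=off" was removed.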

 

References

<BUG:18537723> - EXPLICITLY TURN NUMA SUPPORT ON IN KERNEL OPTIONS FOR INSTALL AND UPGRADE.

  Copyright © 2018 Oracle, Inc.  All rights reserved.