Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution (Sure Solution)

1935983.1 : Node Constantly Booting into the Grub Menu After Reboot Step of Oracle BDA Mammoth V4.0.0 Upgrade
In this Document
  Symptoms
  Cause
  Solution
  References
Created from <SR 3-9733467061>

Applies to:
Big Data Appliance X4-2 Hardware - Version All Versions and later
x86_64

Symptoms
After the 'reboot' step of an Oracle Big Data Appliance upgrade to V4.0.0, one node continuously boots into the Grub menu. Additional troubleshooting of the three potential update problems shows no indication of errors. The three things being updated during the 'reboot' step are:

1. Both copies of grub.conf, i.e. /boot/grub/grub.conf and /usbdisk/boot/grub/grub.conf.
2. The kernel-uek kernel RPM.
3. The image version reported by 'imageinfo'.
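One quick way to compare all three items across the cluster in a single pass is sketched below (a minimal sketch, run as 'root' from Node 1, assuming /usbdisk is already mounted on every node; 'dcli -C' broadcasts the command to all cluster nodes):

# dcli -C "md5sum /boot/grub/grub.conf /usbdisk/boot/grub/grub.conf"
# dcli -C "rpm -qa | grep kernel-uek | sort"
# dcli -C "imageinfo | grep KERNEL_VERSION"

Identical checksums, RPM lists and KERNEL_VERSION values on every node indicate that all three were updated consistently.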
In the case here, all look to have been updated appropriately because:

1. Checking both copies of grub.conf, i.e. /boot/grub/grub.conf and /usbdisk/boot/grub/grub.conf, confirms that both were updated on all nodes of the cluster and that all are identical. This rules out the case where one copy was not updated when the kernel was upgraded. (Note that if usbdisk is not mounted, mount it with "mount usbdisk" before checking /usbdisk/boot/grub/grub.conf.)
2. The output from "rpm -qa | grep kernel-uek | sort" is the same on all nodes of the cluster.
3. The output from 'imageinfo' is correct on all nodes of the cluster.

Cause
The root cause is that one member partition of each of the RAID-1 devices /dev/md2 and /dev/md0 was removed on one node of the cluster, i.e. the node that continuously boots to the Grub menu. This can be seen in the 'mdadm --detail' output on the affected node:
# mdadm --detail --test /dev/md2
/dev/md2:
        Version : 1.1
  Creation Time : Mon Jun 16 17:26:45 2014
     Raid Level : raid1
     Array Size : 488149824 (465.54 GiB 499.87 GB)
  Used Dev Size : 488149824 (465.54 GiB 499.87 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Wed Oct 15 12:47:44 2014
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : localhost.localdomain:2
           UUID : 99f3d098:a655aefc:72a9322c:f004d4fa
         Events : 1390874

    Number   Major   Minor   RaidDevice State
       0       8       18        0      active sync   /dev/sdb2
       1       0        0        1      removed

# mdadm --detail --test /dev/md0
/dev/md0:
        Version : 1.0
  Creation Time : Mon Jun 16 17:26:44 2014
     Raid Level : raid1
     Array Size : 194496 (189.97 MiB 199.16 MB)
  Used Dev Size : 194496 (189.97 MiB 199.16 MB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

    Update Time : Tue Oct 14 17:30:17 2014
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : localhost.localdomain:0
           UUID : be320f94:05f780e6:9d4150a0:7e041533
         Events : 258

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       0        0        1      removed

Solution
Long term: Bug 19824921 - BDACHECKCLUSTER NOT CHECKING IF UNDERLYING RAID PARTITIONS ARE FULLY FUNCTIONAL was filed to check for this condition.

Note: Currently bdacheckcluster runs bdacheckhw and bdachecksw on all nodes.
bdacheckhw checks the physical health of all disks, so it does not error out here because all disks are healthy.

bdachecksw checks that all partitions are fully functional. It does not throw an error because / (/dev/md2) and /boot (/dev/md0) are still fully functional even with one disk out of each RAID partition. Only partition functionality is tested (and the partitions are functional here); the health of the underlying RAID partitions is not tested, so even though they are degraded (as they are here) no error is reported.
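Until such a check exists, a degraded but still functional RAID-1 array can be spotted manually. A minimal sketch, run as 'root' from Node 1, reusing the same "mdadm --detail --test" call shown above (its exit status is non-zero when the array is degraded):

# dcli -C "mdadm --detail --test /dev/md0 > /dev/null 2>&1 || echo md0 DEGRADED"
# dcli -C "mdadm --detail --test /dev/md2 > /dev/null 2>&1 || echo md2 DEGRADED"
# dcli -C "grep -A1 ^md /proc/mdstat"

In /proc/mdstat a healthy two-member RAID-1 device shows "[2/2] [UU]"; "[2/1]" with "[U_]" or "[_U]" indicates a missing member, as on the affected node here.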
To resolve the immediate problem, re-add the removed partitions to the degraded RAID devices on the affected node:

# mdadm --add /dev/md0 /dev/sda1
# mdadm --add /dev/md2 /dev/sda2

After re-adding the devices verify:

1. That reboot is successful.
2. That the output of "dcli -C imageinfo | grep KERNEL_VERSION", run as 'root' from Node 1, is the same and correct on all nodes.

# dcli -C imageinfo | grep KERNEL_VERSION
3. That the output of "dcli -C uname -a", run as 'root' from Node 1, is the same and correct on all nodes.

# dcli -C uname -a
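Optionally, the state of the re-added RAID members can also be confirmed cluster-wide (a minimal sketch, run as 'root' from Node 1; resynchronization of the re-added partitions may still be in progress for a while after the 'mdadm --add' commands):

# dcli -C "mdadm --detail /dev/md0 | grep 'State :'"
# dcli -C "mdadm --detail /dev/md2 | grep 'State :'"
# dcli -C uname -r | awk '{print $NF}' | sort -u

Once resync completes, every node should report "State : clean" (or "active") with no "degraded" flag, and the last command should print exactly one kernel release, assuming dcli prefixes each output line with the node name so that the final field is the kernel version.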
References
<BUG:19824921> - BDACHECKCLUSTER NOT CHECKING IF UNDERLYING RAID PARTITIONS ARE FULLY FUNCTIONAL