Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1947114.1
Update Date:2016-05-29

Solution Type  Technical Instruction Sure

Solution  1947114.1 :   How to boot Exadata database server with diagnostic ISO image  


Related Items
  • Exadata X4-2 Hardware
  • Exadata Database Machine V2
  • Exadata Database Machine X2-2 Hardware
  • Exadata X3-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST


How to boot Exadata database server with diagnostic ISO image

Created from <SR 3-9892129721>

Applies to:

Exadata X4-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata X3-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exadata Database Machine V2 - Version All Versions to All Versions [Release All Releases]
Exadata Database Machine X2-2 Hardware - Version All Versions to All Versions [Release All Releases]
Linux x86-64

Goal

This document provides the steps for booting an Exadata database server (compute node) with the diagnostic ISO image.

An alternative boot method may be required when the server does not boot, for example because of an incorrect kernel parameter value or some other operating system software change. The same procedure can be used to boot an Exadata storage cell, but this is normally not required: storage cells have built-in USB disks that can be used as an alternative to the normal boot sequence by choosing the CELL_USB_BOOT_CELLBOOT_usb_in_rescue_mode boot entry from the GRUB menu.

INTERNAL: For many types of issues, a faster approach to obtaining a shell is to boot in single user mode by following the steps in internal Doc ID 1321297.1 - How to unlock expired/locked root account on Exadata Compute nodes and Storage Cells. The steps therein are applicable to both compute nodes and storage cells.

 

Solution

NOTE: There are two flavours of the web based ILOM interface for Exadata servers. Prior to Exadata image version 11.2.3.3.0, the main menu was a tabbed interface at the top of the page. The new interface has a drop-down style main menu on the left hand side of the page. In this article the older style interface is marked with (old ILOM interface) and the later one with (new ILOM interface). Irrespective of the interface style, the ILOM should provide the same functionality.

1. Copy diag.iso to desktop machine

Copy the diag.iso from any database server or storage cell to your desktop machine, for example to the Downloads folder. The diagnostic ISO image is available on all Exadata database servers and storage cells as /opt/oracle.SupportTools/diagnostics.iso, which is a symbolic link to the diag.iso:

# ls -l /opt/oracle.SupportTools/diagnostics.iso
lrwxrwxrwx 1 root root 27 Aug  2 08:26 /opt/oracle.SupportTools/diagnostics.iso -> /opt/oracle.cellos/diag.iso
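
The copy can also be scripted. The following is a minimal sketch only, assuming a POSIX shell with readlink and scp available; the function names are hypothetical, and the server name (exadb01) and destination directory used with them are placeholders rather than values from this note.

```shell
# Print the real file behind a symbolic link such as
# /opt/oracle.SupportTools/diagnostics.iso (run on the Exadata server).
resolve_iso() {
    readlink -f "$1"
}

# Build (but do not run) the scp command that pulls diag.iso to the desktop.
# $1 = server name (placeholder), $2 = local destination directory.
iso_copy_command() {
    printf 'scp root@%s:/opt/oracle.cellos/diag.iso %s/\n' "$1" "$2"
}
```

For example, iso_copy_command exadb01 ~/Downloads prints the scp command to run from the desktop once the path has been confirmed with resolve_iso on the server.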

INTERNAL ONLY:

Recent versions of the diag.iso can also be obtained via MOS. To identify the customer facing patch number for the diag.iso of a specific Exadata Software image version, search for "EXADATA diag.iso" in the ISP Advanced Search and select the desired version. Generally, it is fine to use the latest available diag.iso, even if it is newer than the customer's Exadata Software image version.

 

2. Set up the server to boot from the diagnostic ISO image

2.1. Log in as root to the web based ILOM console for the server, i.e. enter <server name>-ilom in the web browser address field, and provide the root username and password in the login dialog box.

2.2. From the main ILOM menu click through to [Remote Control] -> [Redirection] -> [Launch Remote Console].

That should launch the Remote Console, a new window captioned [Oracle(R) Integrated Lights Out Manager Remote Console], which provides access to the server's video console (tty1).

2.3. From the Remote Console main menu, select [Devices] -> [CD-ROM Image...].

That should prompt for a file to open. Locate the diag.iso (e.g. in your Downloads folder), select it and click [Open].

This attaches the diag.iso as the virtual CD-ROM image from which the server will be booted.

Note for X5 and later systems:

  On Exadata X5 and later, the Devices option is not present. On these machines, do the following to connect an ISO image:

1.  Select KVMS to display the KVMS drop-down menu.
2.  Select Storage.  The Storage Devices dialog appears.
3.  In the Storage Devices dialog, select Add.  The Add Storage Device dialog appears.
4.  Browse to the ISO image, select it, and click Select.  The Storage Devices screen appears and lists the ISO image.
5.  Select the ISO image and click Connect.  The ISO image is mounted to the remote console and can be used to perform the OS installation.
6.  Click OK to dismiss the Storage Devices dialog.

Reference documentation: https://docs.oracle.com/cd/E41059_01/html/E48312/naplo.z4002d3d1472914.html

3. Set up a terminal (putty) session to the server ILOM

3.1. Use terminal emulator software (e.g. putty) on your desktop to log in to <server>-ilom as root. Alternatively, ssh from any server on your network to <server>-ilom. For example, if the server name is exadb01, run:
 

% ssh root@exadb01-ilom
Password:

3.2. That should take you to the ILOM command line interface prompt "->", which looks like this:

Oracle(R) Integrated Lights Out Manager
Version 3.1.2.20.c r86871
Copyright (c) 2014, Oracle and/or its affiliates. All rights reserved.

->

3.3. Set the next boot device to "CDROM". Run:

-> set /HOST boot_device=cdrom

Note: This setting is non-persistent, only taking effect for the next reboot, power cycle or power on/power off. Any subsequent reboots, power cycles, or power off/power on will use the default boot device configured in the BIOS, namely the LSI HBA controller disks on DB nodes and the internal USB thumb drive on storage cells. Should you need to again boot from diag.iso on a subsequent reboot, re-run the above command beforehand. 

3.4. Reboot the server, so as to boot from the diag.iso. Run from the ILOM prompt:

-> reset -force /SYS

3.5. Shortly afterwards, start the serial console (/SP/console, ttyS0). Run:

-> start /SP/console
Are you sure you want to start /SP/console (y/n)? y

That should show the following message:

Serial console started. To stop, type ESC (
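
The commands from steps 3.1 and 3.3 to 3.5 can be kept together as a small helper for reference. This is only a sketch, assuming a POSIX shell on the desktop; the function names are hypothetical, and nothing is sent to the ILOM, the commands are merely printed so they can be pasted at the "->" prompt in the order given above.

```shell
# Print the ssh command for reaching the ILOM of a given server.
# $1 = server name, e.g. exadb01 (the "-ilom" suffix follows step 3.1).
ilom_ssh_command() {
    printf 'ssh root@%s-ilom\n' "$1"
}

# Print the ILOM CLI commands from steps 3.3 to 3.5, in order.
# Nothing is executed against the ILOM.
ilom_boot_from_iso_commands() {
    cat <<'EOF'
set /HOST boot_device=cdrom
reset -force /SYS
start /SP/console
EOF
}
```

Because the boot_device setting is non-persistent, the whole sequence has to be repeated for every boot from the diag.iso.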

4. Wait for the server to boot from the diag.iso

The server should be booting from the diag.iso. The reboot messages will be seen in both the Remote Console and the putty/ssh session. In 11.2.3.3.1 and later Exadata releases, the primary console accepting user I/O is set as the serial ttyS0 /SP/console. Depending on the uplink speed between the desktop machine and the ILOM, loading the vmlinuz and initrd prior to booting the kernel may take some time. 

5. Enter the interactive diagnostics shell

In both the Remote Console window and the putty/ssh session window you will see the server go through BIOS POST, followed by the kernel boot messages.

At the end of the boot sequence, a menu prompt such as the one below should appear. Note: When the diag.iso shipped with Exadata software version 11.2.3.3.1 or later is used, the prompt is displayed on the serial /SP/console, not on the video Remote Console that was used to mount the image. The console that does not display the prompt may give the false impression that the boot sequence has hung, so check both consoles for the prompt below:

EXT3-fs: mounted filesystem with ordered data mode.
[date/time]  The current installation has version [Exadata image version]
[date/time]  Choose from the following by typing letter in '()':
[date/time]    (e)nter interactive diagnostics shell.
[date/time]      Use diagnostics shell password to login as root user
[date/time]      (reboot or power cycle to exit the shell),
[date/time]    (r)estore system from NFS backup archive,
Select:

Type 'e' to enter the interactive diagnostics shell.

Log in as root with the password obtained from the Exadata Support Team. Note that you cannot use your own root password at this point.


localhost login: root
Password: *********

Once logged in, enter the chroot environment like this:

-sh-3.2# chroot /mnt/cell

This provides full access to the server's file systems.

Steps 5.1 and 5.2 below are optional, only being required in certain scenarios.

5.1. Should you need to perform certain tasks that require access to filesystem/device special nodes under /dev from inside the chroot environment, such as mounting filesystems or recreating the initramfs, it's recommended to first set up the following mounts before running the chroot command:

cd /mnt/cell
mount -t proc proc proc/
mount -t sysfs sys sys/
mount -o bind /dev dev/
chroot /mnt/cell
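
The same mounts can be wrapped in a function that supports a dry run, so that the commands can be reviewed before anything is mounted. This is a minimal sketch, not part of the diag.iso; the names run, prepare_chroot and DRY_RUN are hypothetical, and it assumes the POSIX-style shell of the diagnostics environment.

```shell
# Run a command, or only print it when DRY_RUN=1 is set.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Set up /proc, /sys and /dev under the target root before chrooting.
# $1 = target root, defaulting to /mnt/cell as used in this note.
prepare_chroot() {
    root=${1:-/mnt/cell}
    run mount -t proc proc "$root/proc" || return 1
    run mount -t sysfs sys "$root/sys" || return 1
    run mount -o bind /dev "$root/dev" || return 1
}
```

With DRY_RUN=1 set, prepare_chroot /mnt/cell prints the three mount commands; without it they are executed, after which chroot /mnt/cell can be run as shown above.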

5.2. If access to /boot is also required, the filesystem can be mounted from inside the chroot, once the above mounts have been set up.

For DB nodes, run the following from inside the chroot to mount the /boot filesystem:

mount /dev/sda1 /boot

For storage cells, run instead:

mount /dev/md4 /boot
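
Since the /boot device differs between node types, a small lookup helps avoid mounting the wrong one. The sketch below only encodes the two devices named in this step; the function name and the "db"/"cell" labels are hypothetical.

```shell
# Print the /boot device for a node type, as listed in step 5.2.
# $1 = "db" for a database node, "cell" for a storage cell.
boot_device_for() {
    case "$1" in
        db)   echo /dev/sda1 ;;
        cell) echo /dev/md4 ;;
        *)    echo "unknown node type: $1" >&2; return 1 ;;
    esac
}
```

Inside the chroot, this could be used as: mount "$(boot_device_for db)" /boot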


5.3. At this point you can attempt to correct the problem that prevents the server from booting normally. For example, to revert a Linux kernel parameter value, edit the /etc/sysctl.conf file and save the changes. To disable SELinux, edit /etc/selinux/config.
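
As an illustration of such a correction, the sketch below comments out one parameter in a sysctl.conf-style file and sets SELINUX=disabled in an SELinux config file, keeping a .bak copy in each case. It assumes cp and sed are available inside the chroot; the function names are hypothetical and kernel.shmmax in the usage line is only an example parameter.

```shell
# Comment out one parameter in a sysctl.conf-style file, keeping a backup
# copy next to it as <file>.bak.
disable_sysctl_param() {
    file=$1
    key=$2
    cp -p "$file" "$file.bak" || return 1
    # Escape dots so that e.g. "kernel.shmmax" is matched literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    sed "s/^[[:space:]]*${pattern}[[:space:]]*=/# &/" "$file.bak" > "$file"
}

# Set SELINUX=disabled in an /etc/selinux/config-style file, with a backup.
disable_selinux() {
    file=$1
    cp -p "$file" "$file.bak" || return 1
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$file.bak" > "$file"
}
```

Example: disable_sysctl_param /etc/sysctl.conf kernel.shmmax. The original file remains available as /etc/sysctl.conf.bak should the change need to be undone.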

If the problem occurred during a DB node Exadata Software upgrade, the steps in the following note may be used to set the root device to the LVM volume on which the backup of the Exadata OS image was placed prior to the upgrade: Doc ID 1952372.1 - How to recover from a failed Linux Exadata DB Server dbnodeupdate or rollback.

Even if the problem occurred outside of an upgrade, the above approach may still prove useful, though it requires the Exadata Software image to be upgraded again once the OS has been restored.

Once corrective actions are completed, you are ready to attempt a normal server boot.

6. Detach the diag.iso/stop the CD-ROM redirection

In the Remote Console main menu, select [Devices] -> [CD-ROM Image...]. This will result in the following message: "Are you sure you want to stop CD-ROM redirection?" Click [Yes].

That will result in the following message in the putty/ssh session window:

sh-3.2# usb 1-3.2: USB disconnect, address 4

That is normal and can be ignored.

7. Reboot the server

To reboot the server, in the putty/ssh session window, type exit, followed by reboot:

sh-3.2# exit
exit
-sh-3.2# reboot

Alternatively, use the web based ILOM interface to reboot the server, or reboot it from the ILOM shell as per step 3.4 above. To exit from the /SP/console, use the ESC ( key sequence.

This should result in the normal server boot up sequence. If the original problem has been resolved, the server should boot up to multi-user mode.

8. Copying files from the server via the diag.iso

Should the need arise to copy files (e.g. logs, configuration files) from the server while it's booted via diag.iso, this can be easily achieved, assuming relatively healthy filesystem contents under /mnt/cell. The steps involve plumbing up the eth0 management interface and using scp to copy any files to/from another DB node or cell in the rack. Steps:

8.1. Once inside the chroot, retrieve the configuration of the eth0 management interface, noting the IPADDR and NETMASK values:

cat /etc/sysconfig/network-scripts/ifcfg-eth0

8.2. Plumb up the interface using the values from ifcfg-eth0

ifconfig eth0 <IPADDR> netmask <NETMASK> up

8.3. Verify network connectivity by attempting to ping another node in the rack. NOTE: The Ctrl + C escape sequence may not work in the diag.iso environment, therefore avoid running any commands that would need a Ctrl + C to terminate. For "ping" ensure that the "-c 3" option is used, so as to terminate the ping after 3 requests:

ping -c 3 <IP address of another node in the rack>
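
Steps 8.1 and 8.2 can be combined so that the addresses do not have to be retyped. The following is a minimal sketch, assuming sed and tr are available inside the chroot and that ifcfg-eth0 uses plain KEY=value lines; the function names are hypothetical. It only prints the ifconfig command, so the values can be checked before the interface is plumbed.

```shell
# Read one KEY=value entry from an ifcfg-style file, stripping optional quotes.
# $1 = key name, $2 = file.
ifcfg_value() {
    sed -n "s/^$1=//p" "$2" | tr -d "\"'" | sed -n 1p
}

# Print the ifconfig command for eth0 using IPADDR and NETMASK from the file.
# $1 = config file, defaulting to the path used in step 8.1.
eth0_plumb_command() {
    cfg=${1:-/etc/sysconfig/network-scripts/ifcfg-eth0}
    ip=$(ifcfg_value IPADDR "$cfg")
    mask=$(ifcfg_value NETMASK "$cfg")
    [ -n "$ip" ] && [ -n "$mask" ] || return 1
    printf 'ifconfig eth0 %s netmask %s up\n' "$ip" "$mask"
}
```

After running the printed command, verify connectivity with ping -c 3 as described in step 8.3.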

8.4. Use the scp command to copy files.

Note: The diag.iso itself does not ship scp. It does, however, include an ftp client as well as netcat (nc), which can be used to copy files if the filesystem on /mnt/cell has lost its contents or filesystem corruption has rendered it unmountable, making the above chroot based approach impractical. As DB nodes will typically have a copy of the previous Exadata Software image saved to /dev/VGExaDb/LVDbSys2, it is worth attempting to manually mount that volume to /mnt/cell and check its integrity.

 

References

<NOTE:1589715.1> - HOWTO: Boot From an ISO Image File Using the ILOM Remote Console
<NOTE:1321297.1> - How to unlock expired/locked root account on Exadata Compute nodes and Storage Cells
<BUG:19157124> - CANNOT RESTORE WITH 11.2.3.3.1 DIAGNOSTIC.ISO
<NOTE:2003016.1> - OS Bootup on Exadata hangs after GRUB screen, but GUI hostconsole does not show error or accept entry
https://docs.oracle.com/cd/E41059_01/html/E48312/naplo.z4002d3d1472914.html

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.