Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1003085.1
Update Date:2017-09-11
Keywords:

Solution Type  Technical Instruction Sure

Solution  1003085.1 :   Solaris[TM] Operating System: How to force a kernel core dump on an x86 or x64 system  


Related Items
  • Solaris Operating System
  • Sun Server X3-2
Related Categories
  • PLA-Support>Sun Systems>x86>Server>SN-x64: SERVER 64bit

PreviouslyPublishedAs
204226


Applies to:

Solaris Operating System - Version 8 2/04 U8 and later
Sun Server X3-2 - Version All Versions to All Versions [Release All Releases]
All Platforms

Goal

This document provides instructions on how to configure and initiate forced crash dumps (kernel core dumps) in x86 Solaris[TM]. This is usually performed in order to collect information for troubleshooting system hangs.

Two methods for generating a forced crash dump on Solaris x86 are covered:

  • The Solaris kernel debugger $<systemdump command: available on any platform running Solaris x86.
  • Using the Non Maskable Interrupt (NMI): available on many (but not all) platforms.

It is beyond the scope of this document to recommend when to enable or use either of these methods for forcing a crash dump - such direction would generally be provided by Oracle Customer Support, taking into consideration your operating environment and specific troubleshooting requirements.

It is a prerequisite that a suitable dump device is configured and available. For more information, refer to the dumpadm(1M) and savecore(1M) man pages and the documentation for your Operating System version.
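As a quick check of this prerequisite, the configured dump device can be read from the output of dumpadm run with no arguments. The following is a minimal sketch only: the helper name dump_device is hypothetical, and it assumes the usual "Dump device: <path> (<type>)" line in the dumpadm output.

```shell
# dump_device (hypothetical helper): read `dumpadm` output on stdin and
# print the path of the configured dump device, if a "Dump device:" line
# is present. Prints nothing when no such line is found.
dump_device() {
    sed -n 's/^ *Dump device: *\([^ ]*\).*$/\1/p'
}
```

On the system being prepared, this could be used as "dumpadm | dump_device"; an empty result suggests that the dump configuration should be reviewed before relying on either method below.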

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Sun x86 Systems

Solution

Solaris Kernel Debugger forced crash dumps

Solaris Configuration

This method requires booting the system with the kernel debugger loaded, using the method appropriate for the version of Solaris installed:

 

See 'x86: How to Boot a System With the Kernel Debugger in the GRUB Boot Environment (kmdb)'

Search for 'How to Boot a System With the Kernel Debugger in the GRUB Boot Environment' at http://docs.oracle.com/, then look for the relevant 'System Administration Guide: Basic Administration'.

 

Initiating The Crash Dump
  1. Invoke the kernel debugger from the operating system console by sending the console break sequence (see below for console break sequence information).
  2. Then at the kernel debugger prompt type $<systemdump and press ENTER to generate the system crash dump image.
[14]> $<systemdump
nopanicdebug: 0 = 0x1

panic[cpu14]/thread=fffffe8000d4bc60: BAD TRAP: type=e (#pf Page fault) rp=fffffe8000d87db0 addr=0 occurred in module "<unknown>" due to a NULL pointer dereference
...
syncing file systems... 112 48 done
dumping to /dev/dsk/c0t0d0s1, offset 108593152, content: kernel
0:04 100% done
100% done: 268695 pages dumped, dump succeeded
rebooting...

 

If you invoke the kernel debugger when the operating system console is set to a text or VGA graphics display, for example from a keyboard and monitor attached to the system, or from an ILOM Java Remote GUI Console session, it might not be possible for the system to display the debugger prompt, giving the appearance the system has frozen.

This is normal - the kernel debugger suspends the operating system, including GUI applications and window managers. The above command can still be typed to generate a crash dump (you just won't see any of the output), but it is preferable to use a serial port as the console where available.

 

Console Break Sequence

The break sequence required for invoking the kernel debugger will vary between systems, depending on the platform, console type, and other configuration (see the man page for kbd(1)). The following are the default console break sequences in Solaris.

  • Graphics Heads & ILOM Java Remote Console. Where the operating system console is set to be a text or graphics display with keyboard attached (or ILOM Java Remote GUI Console) the break sequence is sent by either of the following:
    • F1+A (hold down the F1 function-key & press letter A)
    • SHIFT+Break (hold down the SHIFT key and press Pause/Break)
  • Serial console. Where the console is a serial port on the system (including /SP/console, accessible from the ILOM command-line interface), a serial break signal is used by default to invoke the kernel debugger.
  • On server and blade platforms using an ILOM service processor, issue the following from the ILOM command-line interface prompt to connect to the serial console and send the serial break:
-> start /SP/console
Are you sure you want to start /SP/console (y/n)? y
Serial console started. To stop, type ESC (

<ESC> <SHIFT+B>
  • From a V20/40Z service processor:
sp $ platform console
[Enter `^Ec?' for help]

<CTRL+E> <c> <l> <0>
(CTRL+E, lower-case 'c', lower-case 'L', then the numeric zero)
  • On Blade B100x & B200x platforms, issue break sN (where 'N' is the blade slot number) from the system controller prompt:
sc> break s1
Are you sure you want to send break to FRU s1 (y/n)? y
s1: Break sent.


Non-Maskable Interrupt (NMI) forced crash dumps

Solaris Configuration
If you have an x86 system which supports NMI (Non-Maskable Interrupt), add the following two lines to /etc/system. The system must be rebooted for the changes to /etc/system to be loaded; once rebooted, you will be able to send an NMI the next time the system hangs.

set pcplusmp:apic_panic_on_nmi=1
set apix:apic_panic_on_nmi=1

Note that the above sets the same variable in two modules. Most systems currently use the pcplusmp module; newer systems may use the apix module. Only one of the two modules is used, so if desired you can check which one is loaded and add the setting for that module only. Loaded modules can be listed with the modinfo command, so "modinfo | egrep 'apix|pcplusmp'" shows which of the two modules is in use, and the corresponding single line above can then be used on its own. The extra line is otherwise simply ignored, although if both are defined in /etc/system, S10U11 will log "WARNING: forceload of drv/apix failed" for the module that is not in use.
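The module check described above can be sketched as a small filter that turns modinfo output into the matching /etc/system line. This is an illustrative sketch, not a supported tool: the function name nmi_etc_system_lines is hypothetical, it assumes the module name appears as a separate whitespace-delimited field in each modinfo line, and it assumes a POSIX-style awk (nawk on older Solaris releases).

```shell
# nmi_etc_system_lines (hypothetical helper): read `modinfo` output on
# stdin and print one /etc/system "set" line for each of the two
# interrupt modules (pcplusmp, apix) that appears as a module name.
nmi_etc_system_lines() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i == "pcplusmp" || $i == "apix") {
                print "set " $i ":apic_panic_on_nmi=1"
                break
            }
    }'
}
```

For example, "modinfo | nmi_etc_system_lines" prints the line to add. The output is meant to be reviewed and added to /etc/system by hand rather than appended automatically.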

In situations where the system has to start swapping, the pcplusmp and apix drivers could be unloaded to free memory, which prevents NMI from working. To prevent the drivers from being unloaded when the system is short of memory, add the following two lines to /etc/system and send an NMI the next time the system hangs:

forceload: drv/pcplusmp
forceload: drv/apix


In either case, you can verify that the system is properly prepared by checking the apic_panic_on_nmi variable via mdb to ensure that it is set to 1 with:

echo "apic_panic_on_nmi/X" | mdb -k

This variable can also be set on the running system, which avoids the need for a reboot (but please do add the settings to /etc/system as well, to ensure a reboot doesn't clear the setting), with:

echo "apic_panic_on_nmi/W1" | mdb -kw

The above two mdb commands will both work regardless of which module provides the variable.
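The verification step can be scripted by checking the value that mdb prints. The sketch below makes two assumptions: that the output has the usual mdb layout of a symbol header line followed by a "symbol: value" line, and that the helper name nmi_panic_enabled is free to use (it is hypothetical).

```shell
# nmi_panic_enabled (hypothetical helper): read the output of
#   echo "apic_panic_on_nmi/X" | mdb -k
# on stdin and succeed only if the last "name: value" line shows 1.
nmi_panic_enabled() {
    value=$(awk 'NF >= 2 { v = $NF } END { print v }')
    [ "$value" = "1" ]
}
```

Usage would be along the lines of: echo "apic_panic_on_nmi/X" | mdb -k | nmi_panic_enabled && echo "NMI panic is enabled".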

The two main entries that define NMI behaviour are:


X:apic_kmdb_on_nmi=1  Cause system to enter Kernel Debugger.
X:apic_panic_on_nmi=1  Cause system to panic.

        Where 'X' is either 'pcplusmp' or 'apix'

If both entries are present in /etc/system, apic_kmdb_on_nmi=1 takes precedence over apic_panic_on_nmi=1.

So if both are defined, an NMI will not generate a core file but will instead enter the kernel debugger, and the crash dump must then be forced manually from the debugger. If you only want to capture a core dump, use just the one value, X:apic_panic_on_nmi=1. Also note the directions above regarding the visibility of the debugger prompt.
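The precedence rule can be expressed as a small check over the contents of /etc/system. This is a simplified sketch under stated assumptions: the function name nmi_action is hypothetical, only entries of the exact form "set X:variable=1" are recognised, lines beginning with "*" are treated as comments, and no attempt is made to confirm which module is actually loaded.

```shell
# nmi_action (hypothetical helper): read /etc/system content on stdin and
# print the expected NMI behaviour: "kmdb", "panic", or "none".
# apic_kmdb_on_nmi=1 takes precedence over apic_panic_on_nmi=1.
nmi_action() {
    awk '
        /^[ \t]*\*/ { next }   # skip comment lines
        /^[ \t]*set[ \t]+(pcplusmp|apix):apic_kmdb_on_nmi[ \t]*=[ \t]*1[ \t]*$/  { kmdb = 1 }
        /^[ \t]*set[ \t]+(pcplusmp|apix):apic_panic_on_nmi[ \t]*=[ \t]*1[ \t]*$/ { panic = 1 }
        END {
            if (kmdb)       print "kmdb"
            else if (panic) print "panic"
            else            print "none"
        }'
}
```

Running "nmi_action < /etc/system" before the next reboot gives a quick indication of whether an NMI is expected to produce a crash dump directly ("panic") or to drop into the debugger first ("kmdb").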

Initiating The Crash Dump

To generate the Non-Maskable Interrupt and panic the system, one or more of the following are commonly available on Sun/Oracle x64 servers.

    • Press and release the NMI dump button on the server motherboard.
    • Use ipmitool to initiate the NMI remotely via the system's Service Processor (SP):
ipmitool -H <SP IP Addr> -U root chassis power diag

 

    • From ILOM's command-line interface (one of the following):
set /SP/diag generate_host_nmi=true

or

set /HOST generate_host_nmi=true

depending on your platform (cd and show will tell you which).

NOTE: Not all servers will have this ILOM object. If you get a message similar to:

set: No such property <...> generate_host_nmi

normal operating system procedures for creating a kernel coredump should be followed instead.

 

    • ILOM Service Processor Web User Interface. Log in, then navigate to:

Remote Control -> Diagnostics -> Generate NMI

 

Refer to the Service and Diagnostic documentation of your platform for confirmation on how NMI can be initiated.


The NMI might be logged to the service processor system event log:

100 | 09/04/2008 | 19:25:19 | Critical Interrupt | Software NMI | Asserted


And output similar to the following displayed on the system console:

panic[cpu13]/thread=fffffe8000c99c60: pcplusmp: NMI received

fffffe8000c99a10 pcplusmp:apic_nmi_intr+58 ()
fffffe8000c99a30 unix:av_dispatch_nmivect+1f ()
fffffe8000c99a40 unix:nmiint+17e ()
fffffe8000c99b50 acpica:AcpiOsReadPort+d6 ()
fffffe8000c99b70 unix:cpu_acpi_read_port+11 ()
fffffe8000c99be0 unix:acpi_cpu_cstate+2d7 ()
fffffe8000c99c10 unix:cpu_acpi_idle+ac ()
fffffe8000c99c20 unix:cpu_idle_adaptive+13 ()
fffffe8000c99c40 unix:idle+89 ()
fffffe8000c99c50 unix:thread_start+8 ()

syncing file systems... 2 done
dumping to /dev/dsk/c0t0d0s1, offset 108593152, content: kernel
0:04 100% done
100% done: 268707 pages dumped, dump succeeded
rebooting...
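If the console session was captured to a file (for example by a terminal emulator's logging feature), the log can be checked for the "dump succeeded" line before the core is collected. A minimal sketch, assuming the message format shown in the console output above; the function name dump_pages is hypothetical.

```shell
# dump_pages (hypothetical helper): read captured console output on stdin.
# Print the number of pages dumped if a "dump succeeded" line is present;
# otherwise print nothing and return non-zero.
dump_pages() {
    pages=$(sed -n 's/^100% done: \([0-9][0-9]*\) pages dumped, dump succeeded.*$/\1/p')
    [ -n "$pages" ] && echo "$pages"
}
```

For example, "dump_pages < console.log" (where console.log is a placeholder file name) prints the page count when the dump completed, and returns a non-zero status when no success line was found.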



Send the core to Oracle for analysis.


Previously Published As 15553

References

<NOTE:1004506.1> - How to Force a Crash Dump When the Solaris Operating System is Hung

  Copyright © 2018 Oracle, Inc.  All rights reserved.