Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

1173064.1 : Oracle ZFS Storage Appliance: How to generate a system core dump in case of system hang (BUI and CLI fails to respond) using NMI when directed to do so by an Oracle Support Engineer
Applies to:
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun Storage 7110 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Goal
How to generate a system core dump in case of a system hang (BUI and CLI fail to respond) using NMI.

Before performing the NMI, open a Service Request with Oracle Support so that an Engineer can verify the system status and collect any additional information that will be lost once the NMI is performed. The Oracle Engineer will confirm whether the NMI is necessary and, once no further data collection is required, that the NMI can be performed.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Disk Storage ZFS Storage Appliance Community
Solution
Before collecting a system crash dump, try to retrieve some akd information as per Doc 1401288.1 : Storage 7000 Unified Storage System: Data collection for akd hang issue

Also, please collect a gcore of the 'fmd' (FMA) daemon:
(shell) gcore -o /var/ak/dropbox/core.fmd `pgrep fmd` &

The following will stop a hung system by generating a Non-Maskable Interrupt (NMI). It should force a core dump and reboot the node.
-> cd /SP/diag
-> set generate_host_nmi=true

From the ILOM 3.x revision:
-> cd /HOST/
-> set generate_host_nmi=true

The console session should report something similar to the following:
panic[cpu2]/thread=ffffff001eccbc60: NMI received

ffffff001eccbac0 pcplusmp:apic_nmi_intr+7c ()
ffffff001eccbaf0 unix:av_dispatch_nmivect+30 ()
ffffff001eccbb00 unix:nmiint+154 ()
ffffff001eccbbf0 unix:mach_cpu_idle+b ()
ffffff001eccbc20 unix:cpu_idle+c2 ()
ffffff001eccbc40 unix:idle+114 ()
ffffff001eccbc50 unix:thread_start+8 ()

syncing file systems... done
dumping to /dev/zvol/dsk/system/dump, offset 65536, content: kernel + curproc
100% done: 356267 pages dumped, compression ratio 3.84, dump succeeded
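The choice between the two ILOM command sequences above can be sketched as a small shell helper. This is only an illustration: the function name nmi_commands and its ILOM major version argument are hypothetical, and it assumes (as the text above implies) that /SP/diag applies before ILOM 3.x and /HOST/ from 3.x onwards. It only prints the commands to type at the ILOM prompt; it does not connect to the service processor.

```shell
# nmi_commands ILOM_MAJOR_VERSION
# Hypothetical helper: prints the ILOM CLI commands that generate a host NMI.
# Assumption: versions before 3.x use /SP/diag, 3.x and later use /HOST/.
nmi_commands() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: nmi_commands ILOM_MAJOR_VERSION" >&2
            return 2
            ;;
    esac
    if [ "$1" -ge 3 ]; then
        printf '%s\n' 'cd /HOST/' 'set generate_host_nmi=true'
    else
        printf '%s\n' 'cd /SP/diag' 'set generate_host_nmi=true'
    fi
}
```

For example, nmi_commands 3 prints the /HOST/ sequence, which can then be entered manually in the ILOM session.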
PLEASE NOTE: The 'savecore' process, which copies the core file from the dump device into the root filesystem, must have completed before the supportbundle is collected. There is no supported method for the customer to verify that savecore has finished, and the time it takes can vary widely. A suggestion may be to wait a minimum of one hour, or to contact Oracle Support to confirm the next steps. If the supportbundle is generated too soon, it may contain an incomplete core, and the supportbundle is deleted from the system after upload. There is then no core dump from the NMI and no possibility of root cause analysis (RCA), so it is important that the bundle is not generated until savecore is complete.
Generate a bundle after the reboot and the core should be in the cores section of the bundle.
Oracle engineers can drop to the shell, check 'debug.sys' and wait for a message similar to:
Jan 9 17:29:44 hostname savecore: [ID 165606 auth.error] Decompress the crash dump with
Jan 9 17:29:44 hostname 'savecore -vf /var/ak/core/vmdump.2'
and possibly check whether the 'savecore' process is still running.
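The two checks described above could be combined roughly as follows. This is a minimal sketch for Oracle engineers with shell access, not a supported verification method. The function names are hypothetical, and the log file path is passed as an argument because the text above names only 'debug.sys' without its location.

```shell
# savecore_logged LOGFILE
# Returns 0 if LOGFILE contains the savecore "Decompress the crash dump" message.
savecore_logged() {
    grep -q 'savecore: .*Decompress the crash dump with' "$1"
}

# savecore_running
# Returns 0 while a process named exactly 'savecore' still exists.
savecore_running() {
    pgrep -x savecore >/dev/null
}

# savecore_finished LOGFILE
# Returns 0 only if the message has been logged and no savecore process remains.
savecore_finished() {
    savecore_logged "$1" && ! savecore_running
}
```

A caller would pass the path to the log, for example savecore_finished /path/to/debug.sys (placeholder path), and proceed with the supportbundle only when it returns 0.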
Refer to Sun Storage 7000 Unified Storage System: How to collect supportfile bundle using the BUI or CLI (Doc ID 1019887.1)
NMI switch location for 7120 and 7320:
NMI switch location for 7420, BA, ZS3-4 and ZS3-BA:
NMI switch location for ZS3-2: Please see https://support.oracle.com/handbook_private/Systems/ZS3_2/component.rear_zoom.html . The NMI switch is the pinhole between the VGA connector and the SER MGT port.
NMI switch location for ZS3-ES: The NMI switch is between the LEDs and the NET MGT port. Please see https://mosemp.us.oracle.com/handbook_internal/Systems/ZS3_ES/component.rear_zoom.html
NMI switch location for ZS4-4: Please see https://docs.oracle.com/cd/E38212_01/html/E38213/xffsm.gnjil.html

Tip for Oracle Storage-TSC Support Engineer: The gcore of akd must be generated as close as possible to the NMI so that the akd userland threads correspond to the kthread stacks in the dump. To minimize the delay between the akd core collection and the dump collection, put both commands in a script, for example script.sh:
#!/bin/bash
cd /var/ak/dropbox; gcore `pgrep -ox akd`
ipmitool chassis power diag

Then run it:
./script.sh
NOTE: 'ipmitool chassis power diag' may not work; if it does not, 'ipmitool power diag' can be used.

Example
ss7120-sin06-a# ipmitool power diag
Chassis Power Control: Diag

panic[cpu6]/thread=ffffff002eb05c40: NMI received

ffffff002eb05a70 pcplusmp:apic_nmi_intr+7c ()
ffffff002eb05aa0 unix:av_dispatch_nmivect+30 ()
ffffff002eb05ab0 unix:nmiint+152 ()
ffffff002eb05ba0 unix:i86_mwait+d ()
ffffff002eb05bf0 unix:cpu_idle_mwait+158 ()
ffffff002eb05c20 unix:idle+112 ()
ffffff002eb05c30 unix:thread_start+8 ()

syncing file systems... done
dumping to /dev/zvol/dsk/system/dump, offset 65536, content: kernel + curproc
0:15 13% done
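If the console session is being captured to a file, the dump progress lines shown in the examples can be checked with a short helper like the one below. This is a sketch only: the function name dump_status and the idea of a captured console log file are assumptions, and it relies on a grep that supports the -o option.

```shell
# dump_status CONSOLE_LOG
# Hypothetical helper: prints "succeeded" once the console log reports
# "dump succeeded", otherwise the last "N% done" progress figure seen,
# or "unknown" if no progress line is present yet.
dump_status() {
    if grep -q 'dump succeeded' "$1"; then
        echo succeeded
    else
        # Requires a grep with -o support (an assumption about the workstation)
        pct=$(grep -o '[0-9][0-9]*% done' "$1" | tail -n 1)
        echo "${pct:-unknown}"
    fi
}
```

Note that a successful dump only means the crash dump was written to the dump device; savecore must still complete after the reboot before a supportbundle is collected.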
Checked for Currency - 18-FEB-2017

Attachments
This solution has no attachment