Asset ID: 1-72-1546760.1
Update Date: 2018-03-20
Solution Type: Problem Resolution Sure Solution
1546760.1: Sun Storage 7000 Unified Storage System: System Panic: Bad Trap: Type=e (Page Fault) Occurred In Module "Zfs" Due To A Null Pointer Dereference
Related Items:
- Sun ZFS Storage 7320
- Sun Storage 7210 Unified Storage System
- Sun Storage 7410 Unified Storage System
- Sun ZFS Storage 7420
- Sun Storage 7310 Unified Storage System
- Sun Storage 7110 Unified Storage System
- Sun ZFS Backup Appliance
- Sun ZFS Storage 7120
Related Categories:
- PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS
ak-2011.1.4.2 system panic
Created from <SR 3-7061059631>
Applies to:
Sun Storage 7310 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun Storage 7410 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun ZFS Backup Appliance - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)
Symptoms
### Problem statement
ZFS Storage Appliance (ZFSSA) system panic.
### Symptoms
A kernel panic occurs on the node, and the appliance reports "The system has rebooted after a kernel panic."
Fault log (https://<nas-ip>:215/#maintenance/logs=fltlog): The system has rebooted after a kernel panic.
Alert log (https://<nas-ip>:215/#maintenance/logs=alert):
Description: The system has rebooted after a kernel panic.
Type: Major Defect
Impact: There may be some performance impact while the panic is copied to the savecore directory. Disk space usage by panics can be substantial.
Automated response: The failed system image has been preserved for analysis.
Recommended action: Generate a Support Bundle and contact you support provider with the id of that bundle.
Event time: 2013-4-14 19:22:20
Unique Identifier: 5886c184-ba89-c1f1-93d5-a1cca7b3080d
Status: This alert is associated with an active problem on the system. For more information specific to this alert, see the 'Active Problems' page.
1. The panic string is shown in logs/debug.sys of the support bundle:
bash-4.1$ cd /cores/support_bundle
bash-4.1$ grep panic logs/debug.sys
Apr 13 05:49:34 zfssa1 ^Mpanic[cpu78]/thread=fffffe8824edec40:
Apr 13 12:28:53 zfssa1 savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe8824ede6c0 addr=68 occurred in module "zfs" due to a NULL pointer dereference
Apr 14 22:33:30 zfssa1 savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe8824ede6c0 addr=68 occurred in module "zfs" due to a NULL pointer dereference
bash-4.1$
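When a node has panicked more than once, it can help to collapse the repeated savecore messages into one line per distinct panic string before comparing stacks. The following is a minimal bash sketch, assuming the "reboot after panic:" log format shown above; the function name summarize_panics is hypothetical and not part of the appliance software.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count distinct panic strings recorded by savecore
# in a support bundle's debug.sys (log format as in the grep output above).
summarize_panics() {
  local log="${1:-logs/debug.sys}"
  # Keep only the text after "reboot after panic: ", then count duplicates,
  # most frequent first.
  grep 'reboot after panic:' "$log" \
    | sed 's/.*reboot after panic: //' \
    | sort | uniq -c | sort -rn
}
```

Run from the bundle root as `summarize_panics logs/debug.sys`; for the log above it reports the same BAD TRAP string twice (Apr 13 and Apr 14).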
2. The panic stack trace is in logs/debug.sys (or logs/system.sys). The compressed crash dump (vmdump.#) is in the bundle's core/ directory, and scat's analyze command reports the same stack:
bash-4.1$ more logs/debug.sys
bash-4.1$ cd core/
panic[cpu78]/thread=fffffe8824edec40: BAD TRAP: type=e (#pf Page fault) rp=fffffe8824ede6c0 addr=68 occurred in module "zfs" due to a NULL pointer dereference
zpool-pool-0: #pf Page fault
Bad kernel fault at addr=0x68
pid=3647, pc=0xfffffffff79c2657, sp=0xfffffe8824ede7b0, eflags=0x10282
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 68 cr3: c400000 cr8: 0
rdi: fffffa8973d36660 rsi: 3 rdx: fffffe8824edec40
rcx: 0 r8: fffffffff79c2630 r9: 0
rax: 0 rbx: 100000 rbp: fffffe8824ede7f0
r10: fffff60000059008 r11: a6667117e70f115f r12: fffffa8973d36660
r13: 0 r14: 0 r15: fffff74719f68258
fsb: 0 gsb: fffff6018efab000 ds: 4b
es: 4b fs: 0 gs: 1c3
trp: e err: 0 rip: fffffffff79c2657
cs: 30 rfl: 10282 rsp: fffffe8824ede7b0
ss: 38
fffffe8824ede5a0 unix:die+dd ()
fffffe8824ede6b0 unix:trap+1799 ()
fffffe8824ede6c0 unix:cmntrap+e6 ()
fffffe8824ede7f0 zfs:arc_read_done+27 ()
fffffe8824ede860 zfs:zio_done+377 ()
fffffe8824ede890 zfs:zio_execute+8d ()
fffffe8824ede8f0 zfs:zio_notify_parent+a6 ()
fffffe8824ede960 zfs:zio_done+3d6 ()
fffffe8824ede990 zfs:zio_execute+8d ()
fffffe8824ede9f0 zfs:zio_notify_parent+a6 ()
fffffe8824edea60 zfs:zio_done+3d6 ()
fffffe8824edea90 zfs:zio_execute+8d ()
fffffe8824edeb30 genunix:taskq_thread+248 ()
fffffe8824edeb40 unix:thread_start+8 ()
syncing file systems... done
dumping to /dev/zvol/dsk/system/dump, offset 65536, content: kernel + curproc
Another variant of the panic stack (from debug.sys), tracked under bug 15817552:
Sep 11 21:24:15 adc07stor22 ^Mpanic[cpu11]/thread=ffffff03d0469c40:
Sep 11 21:24:15 adc07stor22 genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff03d0469630 addr=0 occurred in module "zfs" due to a NULL pointer dereference
Sep 11 21:24:16 adc07stor22 unix: [ID 839527 kern.notice] zpool-pool22a:
Sep 11 21:24:16 adc07stor22 unix: [ID 753105 kern.notice] #pf Page fault
Sep 11 21:24:16 adc07stor22 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x0
Sep 11 21:24:16 adc07stor22 unix: [ID 243837 kern.notice] pid=6512, pc=0xfffffffff79fb411, sp=0xffffff03d0469720, eflags=0x10246
Sep 11 21:24:16 adc07stor22 unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
Sep 11 21:24:16 adc07stor22 unix: [ID 624947 kern.notice] cr2: 0
Sep 11 21:24:16 adc07stor22 unix: [ID 625075 kern.notice] cr3: c400000
Sep 11 21:24:16 adc07stor22 unix: [ID 625715 kern.notice] cr8: 0
Sep 11 21:24:16 adc07stor22 unix: [ID 100000 kern.notice]
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice] rdi: fffff639e583f801 rsi: 0 rdx: fe
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice] rcx: 4000 r8: 0 r9: 4000
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice] rax: fe rbx: 100000 rbp: ffffff03d0469750
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice] r10: 0 r11: 1 r12: fffff622175aa010
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice] r13: fffff64bcfdd33f0 r14: 0 r15: fffff622175aa320
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice] fsb: 0 gsb: fffff600c9ccc040 ds: 4b
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice] trp: e err: 2 rip: fffffffff79fb411
Sep 11 21:24:17 adc07stor22 unix: [ID 592667 kern.notice] cs: 30 rfl: 10246 rsp: ffffff03d0469720
Sep 11 21:24:17 adc07stor22 unix: [ID 266532 kern.notice] ss: 38
Sep 11 21:24:17 adc07stor22 unix: [ID 100000 kern.notice]
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469510 unix:die+dd ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469620 unix:trap+1799 ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469630 unix:cmntrap+e6 ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469750 zfs:lzjb_decompress+a9 ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469790 zfs:zio_decompress_data+53 ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d04697c0 zfs:zio_decompress+56 ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d04697f0 zfs:zio_pop_transforms+3d ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469860 zfs:zio_done+152 ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469890 zfs:zio_execute+8d ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d04698f0 zfs:zio_notify_parent+a6 ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469960 zfs:zio_done+3d6 ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469990 zfs:zio_execute+8d ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d04699f0 zfs:zio_notify_parent+a6 ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469a60 zfs:zio_done+3d6 ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469a90 zfs:zio_execute+8d ()
... genunix:taskq_thread+248 ()
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469b40 unix:thread_start+8 ()
Sep 11 21:24:17 adc07stor22 unix: [ID 100000 kern.notice]
Sep 11 21:24:17 adc07stor22 genunix: [ID 672855 kern.notice] syncing file systems...
Sep 11 21:24:17 adc07stor22 genunix: [ID 904073 kern.notice] done
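The two stacks differ in the first zfs frame above the trap handler (unix:cmntrap): arc_read_done in the first, lzjb_decompress in the second. A rough bash sketch for pulling that frame out of a pasted stack follows. The function names and the frame-to-bug mapping are hypothetical triage aids drawn only from the two stacks in this note, not an official Oracle bug signature, so a match still needs confirmation by Oracle Support.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the first "zfs:<function>" frame in a stack,
# without the "+offset" part. Reads a file, or stdin when no file is given.
first_zfs_frame() {
  grep -oE 'zfs:[A-Za-z0-9_]+' "${1:--}" | head -n 1 | sed 's/^zfs://'
}

# Illustrative mapping based only on the two stack variants in this note.
classify_stack() {
  case "$(first_zfs_frame "$1")" in
    arc_read_done)   echo "arc_read_done variant (compare Bug 16370523 / 15812101)" ;;
    lzjb_decompress) echo "lzjb_decompress variant (compare Bug 15817552)" ;;
    "")              echo "no zfs frame found" ;;
    *)               echo "other zfs frame: needs review" ;;
  esac
}
```

For example, `grep 'zfs:' logs/debug.sys | classify_stack` classifies the first stack found in the log.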
### Configuration
Two Sun ZFS Storage 7420 controllers in a cluster:
Appliance Name: zfssa1
Appliance Type: Sun ZFS Storage 7420
Appliance Version: 2011.04.24.4.2,1-2.28.1.1
Last Booted: Tue Feb 05 2013 09:37:37 GMT+0000 (UTC)
Appliance Kit: ak/SUNW,otoro@2011.04.24.4.2,1-2.28.1.1
Operating System: SunOS 5.11 ak/generic@2011.04.24.4.2,1-2.28.1.1 64-bit
BIOS: American Megatrends Inc. 16030103 12/01/2011
Service Processor: 3.0.16.12
Appliance Name: zfssa2
Appliance Type: Sun ZFS Storage 7420
Appliance Version: 2011.04.24.4.2,1-2.28.1.1
Last Booted: Sun Apr 14 2013 19:22:20 GMT+0000 (UTC)
Appliance Kit: ak/SUNW,otoro@2011.04.24.4.2,1-2.28.1.1
Operating System: SunOS 5.11 ak/generic@2011.04.24.4.2,1-2.28.1.1 64-bit
BIOS: American Megatrends Inc. 16030103 12/01/2011
Service Processor: 3.0.16.12
Cause
Problems page (https://<nas-ip>:215/#maintenance/problems):
Description: The system has rebooted after a kernel panic.
Type: Major Defect
Impact: There may be some performance impact while the panic is copied to the savecore directory. Disk space usage by panics can be substantial.
Affected components: 100% sw:///:path=/var/ak/core/.<Unique Identifier> (in service)
Automated response: The failed system image has been preserved for analysis.
Recommended action: Generate a Support Bundle and contact you support provider with the id of that bundle. If you are a qualified service person, detailed information on this problem can be found at http://sun.com/msg/SUNOS-8000-KL
Event time: 2013-4-14 19:22:20
Unique Identifier: 5886c184-ba89-c1f1-93d5-a1cca7b3080d
Phoned home: Never
### Examining the core file
From the support_bundle/core/ directory, expand the compressed dump, wait until unix.# and vmcore.# have been written, and then open the core with scat:
bash-4.1$ savecore -f vmdump.# .
bash-4.1$ scat vmcore.#
core file: /cores/3-7061059631/core/vmcore.0
user: Roberto Martini (rmartini:107940)
release: 5.11 (64-bit)
version: ak/generic@2011.04.24.4.2,1-2.28.1.1
machine: i86pc
node name: adc39stor20
system type: i86pc
hostid: 0
dump_conflags: 0x40000 (DUMP_CURPROC) on /dev/zvol/dsk/system/dump(264G)
snooping: 0x0
moddebug: 0x10 (NOAUTOUNLOAD)
dump_uuid: f6b56ee7-ffcf-4402-e37c-fbc0bb4d14f4
time of crash: Sat Apr 13 05:49:33 UTC 2013
age of system: 66 days 19 hours 57 minutes 44.771754210 seconds
panic CPU: 78 (80 CPUs, 1023G memory)
panic string: BAD TRAP: type=e (#pf Page fault) rp=fffffe8824ede6c0 addr=68 occurred in module "zfs" due to a NULL pointer dereference
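When a bundle contains several dumps, the savecore step above can be repeated for each vmdump file. This is a minimal bash sketch, assuming the vmdump.N naming shown above and that `savecore -f` writes unix.N and vmcore.N into the target directory; the function name expand_dumps is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run "savecore -f vmdump.N <dir>" for every compressed
# dump in <dir> that has not been expanded yet (no matching vmcore.N).
expand_dumps() {
  local dir="${1:-.}" dump n
  for dump in "$dir"/vmdump.*; do
    [ -e "$dump" ] || continue      # the glob matched nothing
    n="${dump##*.}"                 # numeric suffix N of vmdump.N
    if [ -e "$dir/vmcore.$n" ]; then
      echo "vmcore.$n already present, skipping"
    else
      savecore -f "$dump" "$dir"    # writes unix.N and vmcore.N into $dir
    fi
  done
}
```

Run it as `expand_dumps .` from the bundle's core/ directory, then open each vmcore.N with scat. Skipping dumps that already have a vmcore.N avoids re-expanding large dumps.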
### The msgbuf does not show any message immediately before the panic.
### Stack trace for the panic (from scat vmcore.#):
CAT(vmcore.0/11X)> panic
panic on CPU 78
panic string: BAD TRAP: type=e (#pf Page fault) rp=fffffe8824ede6c0 addr=68 occurred in module "zfs" due to a NULL pointer dereference
==== panic kernel (with lwp) thread: 0xfffffe8824edec40 PID: 3647 on CPU: 78 affinity CPU: 78 ====
cmd: zpool-pool20a
t_procp: 0xfffff601913b6018
p_as: 0xfffffffffbc301a0(kas)
zone: global
t_stk: 0xfffffe8824edeb50 sp: 0xfffffe8824ede420 t_stkbase: 0xfffffe8824eda000
t_pri: 99(SDC) t_tid: 107 pctcpu: 4.121339
t_lwp: 0xfffff60262dac700 lwp_regs: 0xfffffe8824edeb50
mstate: LMS_SYSTEM ms_prev: LMS_KFAULT
ms_state_start: 93 days 3 hours 52 minutes 10.527300702 seconds later
ms_start: 66 days 19 hours 31 minutes 21.246733589 seconds earlier
psrset: 0 last CPU: 78
idle: 0 ticks (0s)
start: Tue Feb 5 09:58:21 2013
age: 5773872 seconds (66 days 19 hours 51 minutes 12 seconds)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_TALLOCSTK - thread structure allocated from stk
T_PANIC - thread initiated a system panic
tpflg: TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
TS_SIGNALLED - thread was awakened by cv_signal()
TS_RUNQMATCH
pflag: SSYS - system resident process
SNOWAIT - children never become zombies
pc: unix:vpanic_common+0x13a: addq $0xf0,%rsp
unix:vpanic_common+0x13a()
unix:panic+0x94(, , ...)
unix:die+0xdd(0xe, 0xfffffe8824ede6c0, 0x68, 0x4e)
unix:trap+0x1799(0xfffffe8824ede6c0, 0x68, 0x4e)
unix:cmntrap_pushed+0x3c()
-- panic trap data type: 0xe (Page fault)
addr 0x68 rp 0xfffffe8824ede6c0
trapno 0xe (Page fault)
err 0 (page not present,read,supervisor)
%rfl 0x10282 (negative|interrupt enable|resume)
savbp 0xfffffe8824ede7f0
savip zfs:arc_read_done+0x27: movq 0x68(%r13),%rdi
%rbp 0xfffffe8824ede7f0 %rsp 0xfffffe8824ede7b0
%rip zfs:arc_read_done+0x27: movq 0x68(%r13),%rdi
0%rdi 0xfffffa8973d36660 1%rsi 0x3 2%rdx 0xfffffe8824edec40
3%rcx 0 4%r8 0xfffffffff79c2630 5%r9 0
%rax 0 %rbx 0x100000
%r10 0xfffff60000059008 %r11 0xa6667117e70f115f %r12 0xfffffa8973d36660
%r13 0 %r14 0 %r15 0xfffff74719f68258
%cs 0x30 (KDS_SEL) %ds 0x4b (UCS_SEL)
%es 0x4b (UCS_SEL) %ss 0x38 (GDT_U32CODE,KPL)
%fs 0 (KFS_SEL) %gs 0x1c3 (LWPGS_SEL)
fsbase 0xfffe8824ede7f0ff
gsbase 0xfffe8824ede7f0ff
zfs:arc_read_done+0x27(0xfffffa8973d36660)
zfs:zio_done+0x377(0xfffffa8973d36660)
zfs:zio_execute+0x8d(0xfffffa8973d36660)
zfs:zio_notify_parent+0xa6(0xfffffa8973d36660, 0xfffff74934e38338, 0x1)
zfs:zio_done+0x3d6(0xfffff74934e38338)
zfs:zio_execute+0x8d(0xfffff74934e38338)
zfs:zio_notify_parent+0xa6(0xfffff74934e38338, 0xfffff9355ea6a010, 0x1)
zfs:zio_done+0x3d6(0xfffff9355ea6a010)
zfs:zio_execute+0x8d(0xfffff9355ea6a010)
genunix:taskq_thread+0x248(0xfffff601989487b8)
unix:thread_start+0x8()
-- end of kernel (with lwp) thread's stack --
CAT(vmcore.0/11X)>
### The panic occurs in arc_read_done(). Examining the arc_buf_t referenced by the zio's io_private field reveals the NULL pointer:
CAT(vmcore.0/11X)> sdump 0xfffffa8973d36660 zio io_private
io_private = 0xfffff74719f68258
CAT(vmcore.0/11X)> sdump 0xfffff74719f68258 arc_buf_t
{
b_hdr = NULL !! this NULL pointer triggered the BAD TRAP panic
b_next = NULL
b_evict_lock = {
_opaque = [ NULL ]
}
b_data_lock = {
_opaque = [ NULL ]
}
b_data = NULL
b_efunc = NULL
b_private = NULL
}
CAT(vmcore.0/11X)>
The problem is the NULL b_hdr pointer in this arc_buf_t: as annotated above, dereferencing it in arc_read_done() triggered the BAD TRAP panic.
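The fault address can be tied back to the faulting instruction by hand. At the trap, %rip was `zfs:arc_read_done+0x27: movq 0x68(%r13),%rdi` and the saved %r13 was 0, so the load address is 0 + 0x68 = 0x68, which matches addr=68 in the panic string and cr2: 68. A fault address this close to zero is the usual signature of reading a structure field through a NULL pointer. The bash sketch below performs that arithmetic; it handles only the simple disp(%reg) operand form, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: extract "disp reg" from an x86-64 base+displacement
# memory operand such as "movq 0x68(%r13),%rdi". Returns 1 if none is found.
parse_mem_operand() {
  local re='(0x[0-9a-fA-F]+)\(%([a-z0-9]+)\)'
  if [[ "$1" =~ $re ]]; then
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

# Effective address = base register value + displacement, printed in hex.
effective_address() {
  local disp="$1" base="$2"
  printf '0x%x\n' $(( disp + base ))
}
```

Usage for this panic: `read -r disp reg < <(parse_mem_operand 'movq 0x68(%r13),%rdi')` yields 0x68 and r13, and `effective_address "$disp" 0` (the saved %r13) prints 0x68.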
Solution
Please open an SR for this issue with Oracle Support and submit a Support Bundle for review.
To generate a support bundle, open https://<nas-ip>:215/#maintenance/system and click the '+' (plus) sign next to Support Bundles, so that Oracle can review the preserved core sw:///:path=/var/ak/core/.<Unique Identifier>.
The failed system image has been preserved for analysis.
The stack trace matches Bug 16370523 "7420 CLUSTER PEER PANIC'D "BAD TRAP...."ZFS" DUE TO A NULL POINTER DEREFERENCE", which is a duplicate of Bug 15812101 "SUNBT7193862 *someone* is abusing zio_buf_free()", fixed in the ak-2011.1.6.0 release.
Upgrade the ZFSSA to ak-2011.1.6.0 or later.
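Whether a given appliance predates the fix can be checked from the Appliance Version string in the configuration above. This bash sketch assumes that a build string of the form 2011.04.24.X.Y corresponds to release ak-2011.1.X.Y (as with 2011.04.24.4.2 and ak-2011.1.4.2 in this note) and relies on `sort -V` (GNU coreutils) for version ordering; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Assumption: "2011.04.24.X.Y,<pkg>" maps to ak-2011.1.X.Y.
to_ak_release() {
  local v="${1%%,*}"            # drop the ",1-2.28.1.1" package suffix
  case "$v" in
    2011.04.24.*) echo "2011.1.${v#2011.04.24.}" ;;
    *) return 1 ;;              # other release families are not handled here
  esac
}

# Returns 0 if the release is older than ak-2011.1.6.0, 1 if it is 2011.1.6.0
# or later, and 2 if the version string is not recognized.
needs_upgrade() {
  local have fixed="2011.1.6.0"
  have="$(to_ak_release "$1")" || return 2
  # sort -V orders dotted version strings numerically (4.2 < 6.0 < 10.1).
  [ "$have" != "$fixed" ] &&
    [ "$(printf '%s\n%s\n' "$have" "$fixed" | sort -V | head -n 1)" = "$have" ]
}
```

For both controllers in this note, `needs_upgrade '2011.04.24.4.2,1-2.28.1.1'` succeeds, meaning the upgrade applies.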
References
<BUG:15812101> - SUNBT7193862 *SOMEONE* IS ABUSING ZIO_BUF_FREE
<BUG:16370523> - 7420 CLUSTER PEER PANIC'D "BAD TRAP...."ZFS" DUE TO A NULL POINTER DEREFERENCE"
<BUG:15817552> - SUNBT7200081 BAD TRAP TYPE=E OCCURRED IN MODULE "ZFS" DUE TO A NULL POINTER DERE
<NOTE:1173706.1> - SUNOS-8000-KL - Kernel Panic