Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1546760.1
Update Date:2018-03-20
Keywords:

Solution Type  Problem Resolution Sure

Solution  1546760.1 :   Sun Storage 7000 Unified Storage System: System Panic: Bad Trap: Type=e (Page Fault) Occurred In Module "Zfs" Due To A Null Pointer Dereference  


Related Items
  • Sun ZFS Storage 7320
  • Sun Storage 7210 Unified Storage System
  • Sun Storage 7410 Unified Storage System
  • Sun ZFS Storage 7420
  • Sun Storage 7310 Unified Storage System
  • Sun Storage 7110 Unified Storage System
  • Sun ZFS Backup Appliance
  • Sun ZFS Storage 7120
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS


ak-2011.1.4.2 system panic

In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-7061059631>

Applies to:

Sun Storage 7310 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun Storage 7410 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun ZFS Backup Appliance - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms


### Problem statement

zfssa system panic

### Symptoms

The node suffers a kernel panic and, after rebooting, the appliance reports "The system has rebooted after a kernel panic." in its fault and alert logs:


https://<nas-ip>:215/#maintenance/logs=fltlog
The system has rebooted after a kernel panic.

https://<nas-ip>:215/#maintenance/logs=alert

Description The system has rebooted after a kernel panic.
Type Major Defect
Impact There may be some performance impact while the panic is copied to the savecore directory. Disk space usage by panics can be substantial.
Automated response The failed system image has been preserved for analysis.
Recommended action Generate a Support Bundle and contact you support provider with the id of that bundle.
Event time 2013-4-14 19:22:20
Unique Identifier 5886c184-ba89-c1f1-93d5-a1cca7b3080d
Status This alert is associated with an active problem on the system. For more information specific to this alert, see the 'Active Problems' page.

 



1. debug.sys shows the panic string:


bash-4.1$ cd /cores/support_bundle
bash-4.1$ grep panic logs/debug.sys
Apr 13 05:49:34 zfssa1 ^Mpanic[cpu78]/thread=fffffe8824edec40:
Apr 13 12:28:53 zfssa1 savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe8824ede6c0 addr=68 occurred in module "zfs" due to a NULL pointer dereference
Apr 14 22:33:30 zfssa1 savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe8824ede6c0 addr=68 occurred in module "zfs" due to a NULL pointer dereference
bash-4.1$

2. The stack trace for the panic is:

bash-4.1$ more logs/debug.sys    (or logs/system.sys)
bash-4.1$ cd core/    (the vmdump file is located here; the trace below is part of what "analyze" reports in scat)

 

panic[cpu78]/thread=fffffe8824edec40: BAD TRAP: type=e (#pf Page fault) rp=fffffe8824ede6c0 addr=68 occurred in module "zfs" due to a NULL pointer dereference

zpool-pool-0: #pf Page fault
Bad kernel fault at addr=0x68
pid=3647, pc=0xfffffffff79c2657, sp=0xfffffe8824ede7b0, eflags=0x10282
cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
cr2: 68 cr3: c400000 cr8: 0

rdi: fffffa8973d36660 rsi: 3 rdx: fffffe8824edec40
rcx: 0 r8: fffffffff79c2630 r9: 0
rax: 0 rbx: 100000 rbp: fffffe8824ede7f0
r10: fffff60000059008 r11: a6667117e70f115f r12: fffffa8973d36660
r13: 0 r14: 0 r15: fffff74719f68258
fsb: 0 gsb: fffff6018efab000 ds: 4b
es: 4b fs: 0 gs: 1c3
trp: e err: 0 rip: fffffffff79c2657
cs: 30 rfl: 10282 rsp: fffffe8824ede7b0
ss: 38

fffffe8824ede5a0 unix:die+dd ()
fffffe8824ede6b0 unix:trap+1799 ()
fffffe8824ede6c0 unix:cmntrap+e6 ()
fffffe8824ede7f0 zfs:arc_read_done+27 ()
fffffe8824ede860 zfs:zio_done+377 ()
fffffe8824ede890 zfs:zio_execute+8d ()
fffffe8824ede8f0 zfs:zio_notify_parent+a6 ()
fffffe8824ede960 zfs:zio_done+3d6 ()
fffffe8824ede990 zfs:zio_execute+8d ()
fffffe8824ede9f0 zfs:zio_notify_parent+a6 ()
fffffe8824edea60 zfs:zio_done+3d6 ()
fffffe8824edea90 zfs:zio_execute+8d ()
fffffe8824edeb30 genunix:taskq_thread+248 ()
fffffe8824edeb40 unix:thread_start+8 ()

syncing file systems... done
dumping to /dev/zvol/dsk/system/dump, offset 65536, content: kernel + curproc

 

Another variant of the panic stack (from debug.sys), tracked under bug 15817552:

Sep 11 21:24:15 adc07stor22 ^Mpanic[cpu11]/thread=ffffff03d0469c40:
Sep 11 21:24:15 adc07stor22 genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff03d0469630 addr=0 occurred in module "zfs" due to a NULL pointer dereference

Sep 11 21:24:16 adc07stor22 unix: [ID 839527 kern.notice] zpool-pool22a:                                                          
Sep 11 21:24:16 adc07stor22 unix: [ID 753105 kern.notice] #pf Page fault                                                          
Sep 11 21:24:16 adc07stor22 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x0 to statd at 10.224.38.185, RPC: Timed out(5)
Sep 11 21:24:16 adc07stor22 unix: [ID 243837 kern.notice] pid=6512, pc=0xfffffffff79fb411, sp=0xffffff03d0469720, eflags=0x10246  
Sep 11 21:24:16 adc07stor22 unix: [ID 211416 kern.notice] cr0: 8005003b<pg,wp,ne,et,ts,mp,pe> cr4: 6f8<xmme,fxsr,pge,mce,pae,pse,de>
Sep 11 21:24:16 adc07stor22 unix: [ID 624947 kern.notice] cr2: 0                                                                  
Sep 11 21:24:16 adc07stor22 unix: [ID 625075 kern.notice] cr3: c400000                                                            
Sep 11 21:24:16 adc07stor22 unix: [ID 625715 kern.notice] cr8: 0                                                                  
Sep 11 21:24:16 adc07stor22 unix: [ID 100000 kern.notice]                                                                          
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice]       rdi: fffff639e583f801 rsi:                0 rdx:               fe  
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice]       rcx:             4000  r8:                0  r9:             4000  
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice]       rax:               fe rbx:           100000 rbp: ffffff03d0469750  
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice]       r10:                0 r11:                1 r12: fffff622175aa010  
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice]       r13: fffff64bcfdd33f0 r14:                0 r15: fffff622175aa320  
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice]       fsb:                0 gsb: fffff600c9ccc040  ds:               4b  
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice]        es:               4b  fs:                0  gs:              1c3  
Sep 11 21:24:16 adc07stor22 unix: [ID 592667 kern.notice]       trp:                e err:                2 rip: fffffffff79fb411  
Sep 11 21:24:17 adc07stor22 unix: [ID 592667 kern.notice]        cs:               30 rfl:            10246 rsp: ffffff03d0469720  
Sep 11 21:24:17 adc07stor22 unix: [ID 266532 kern.notice]        ss:               38                                              
Sep 11 21:24:17 adc07stor22 unix: [ID 100000 kern.notice]                                                                          
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469510 unix:die+dd ()                                      
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469620 unix:trap+1799 ()                                    
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469630 unix:cmntrap+e6 ()                                  
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469750 zfs:lzjb_decompress+a9 ()                            
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469790 zfs:zio_decompress_data+53 ()                        
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d04697c0 zfs:zio_decompress+56 ()                            
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d04697f0 zfs:zio_pop_transforms+3d ()                        
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469860 zfs:zio_done+152 ()                                  
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469890 zfs:zio_execute+8d ()                                
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d04698f0 zfs:zio_notify_parent+a6 ()                          
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469960 zfs:zio_done+3d6 ()                                  
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469990 zfs:zio_execute+8d ()                                
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d04699f0 zfs:zio_notify_parent+a6 ()                          
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469a60 zfs:zio_done+3d6 ()                                  
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469a90 zfs:zio_execute+8d ()                                
nunix:taskq_thread+248 ()                                                                                                          
Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] ffffff03d0469b40 unix:thread_start+8 ()                              
Sep 11 21:24:17 adc07stor22 unix: [ID 100000 kern.notice]                                                                          
Sep 11 21:24:17 adc07stor22 genunix: [ID 672855 kern.notice] syncing file systems...                                              
Sep 11 21:24:17 adc07stor22 genunix: [ID 904073 kern.notice]  done
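The two stack variants above differ in their top ZFS frame (arc_read_done versus lzjb_decompress) but share the same zio_done/zio_execute/zio_notify_parent chain. A minimal Python sketch of how the frames could be pulled out of either log format for comparison is shown here. The helper names `stack_frames` and `signature` are hypothetical and are not part of any Oracle tool; the regular expression only assumes the `<frame pointer> module:function+offset ()` layout visible in the logs above.

```python
import re

# Matches "<16 hex digits> module:function+offset ()" as printed in the panic
# stack, with or without a leading syslog prefix.
FRAME_RE = re.compile(
    r'\b[0-9a-f]{16} (?P<module>\w+):(?P<func>\w+)\+(?P<off>[0-9a-f]+) \(\)'
)

def stack_frames(text):
    """Return (module, function, offset) tuples in the order they appear."""
    return [
        (m.group("module"), m.group("func"), int(m.group("off"), 16))
        for m in FRAME_RE.finditer(text)
    ]

def signature(frames, skip=("die", "trap", "cmntrap")):
    """Drop the generic trap-handling frames and keep module:function names."""
    return [f"{mod}:{fn}" for mod, fn, _ in frames if fn not in skip]

first = """\
fffffe8824ede5a0 unix:die+dd ()
fffffe8824ede6b0 unix:trap+1799 ()
fffffe8824ede6c0 unix:cmntrap+e6 ()
fffffe8824ede7f0 zfs:arc_read_done+27 ()
fffffe8824ede860 zfs:zio_done+377 ()
"""
second = (
    "Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] "
    "ffffff03d0469750 zfs:lzjb_decompress+a9 ()\n"
    "Sep 11 21:24:17 adc07stor22 genunix: [ID 655072 kern.notice] "
    "ffffff03d0469790 zfs:zio_decompress_data+53 ()\n"
)

sig_first = signature(stack_frames(first))
sig_second = signature(stack_frames(second))
print(sig_first[0], "|", sig_second[0])
```

Comparing the first entry of each signature is only a rough way to tell the variants apart; matching a stack against a specific bug still requires review by Oracle Support.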

 

### Configuration

2x7420 in cluster

Appliance Name: zfssa1
Appliance Type: Sun ZFS Storage 7420
Appliance Version: 2011.04.24.4.2,1-2.28.1.1
Last Booted: Tue Feb 05 2013 09:37:37 GMT+0000 (UTC)
Appliance Kit: ak/SUNW,otoro@2011.04.24.4.2,1-2.28.1.1
Operating System: SunOS 5.11 ak/generic@2011.04.24.4.2,1-2.28.1.1 64-bit
BIOS: American Megatrends Inc. 16030103 12/01/2011
Service Processor: 3.0.16.12


Appliance Name: zfssa2
Appliance Type: Sun ZFS Storage 7420
Appliance Version: 2011.04.24.4.2,1-2.28.1.1
Last Booted: Sun Apr 14 2013 19:22:20 GMT+0000 (UTC)
Appliance Kit: ak/SUNW,otoro@2011.04.24.4.2,1-2.28.1.1
Operating System: SunOS 5.11 ak/generic@2011.04.24.4.2,1-2.28.1.1 64-bit
BIOS: American Megatrends Inc. 16030103 12/01/2011
Service Processor: 3.0.16.12

Cause

 https://<nas-ip>:215/#maintenance/problems

Description The system has rebooted after a kernel panic.
Type Major Defect
Impact There may be some performance impact while the panic is copied to the savecore directory. Disk space usage by panics can be substantial.
Affected components
100% sw:///:path=/var/ak/core/.<Unique Identifier> (in service)
Automated response The failed system image has been preserved for analysis.
Recommended action Generate a Support Bundle and contact you support provider with the id of that bundle. If you are a qualified service person, detailed information on this problem can be found at http://sun.com/msg/SUNOS-8000-KL
Event time 2013-4-14 19:22:20
Unique Identifier 5886c184-ba89-c1f1-93d5-a1cca7b3080d
Phoned home Never

### Let's see what is in the core file:

In support_bundle/core/, expand the compressed dump:
bash-4.1$ savecore -f vmdump.# .
Wait for the unix.# and vmcore.# files to be built, then open the core with scat:
bash-4.1$ scat vmcore.#


core file: /cores/3-7061059631/core/vmcore.0
user: Roberto Martini (rmartini:107940)
release: 5.11 (64-bit)
version: ak/generic@2011.04.24.4.2,1-2.28.1.1
machine: i86pc
node name: adc39stor20
system type: i86pc
hostid: 0
dump_conflags: 0x40000 (DUMP_CURPROC) on /dev/zvol/dsk/system/dump(264G)
snooping: 0x0
moddebug: 0x10 (NOAUTOUNLOAD)
dump_uuid: f6b56ee7-ffcf-4402-e37c-fbc0bb4d14f4
time of crash: Sat Apr 13 05:49:33 UTC 2013
age of system: 66 days 19 hours 57 minutes 44.771754210 seconds
panic CPU: 78 (80 CPUs, 1023G memory)
panic string: BAD TRAP: type=e (#pf Page fault) rp=fffffe8824ede6c0 addr=68 occurred in module "zfs" due to a NULL pointer dereference
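The panic string reported by savecore in debug.sys and by scat follows the same layout in every occurrence shown in this document. As an illustration only, the following Python sketch extracts its fields from a log line. The function name `parse_panic_string` and the dictionary keys are hypothetical; the pattern is derived solely from the lines quoted in this document and may not cover other panic types.

```python
import re

# Pattern derived from the "BAD TRAP" lines quoted in this document.
PANIC_RE = re.compile(
    r'BAD TRAP: type=(?P<type>[0-9a-f]+) \((?P<desc>[^)]*)\) '
    r'rp=(?P<rp>[0-9a-f]+) addr=(?P<addr>[0-9a-f]+) '
    r'occurred in module "(?P<module>[^"]+)"'
)

def parse_panic_string(line):
    """Return the trap fields as a dict, or None if the line has no BAD TRAP."""
    m = PANIC_RE.search(line)
    if m is None:
        return None
    return {
        "trap_type": int(m.group("type"), 16),   # 0xe is a page fault
        "description": m.group("desc"),
        "rp": int(m.group("rp"), 16),            # address of the saved registers
        "addr": int(m.group("addr"), 16),        # faulting address
        "module": m.group("module"),
    }

sample = (
    'Apr 13 12:28:53 zfssa1 savecore: [ID 570001 auth.error] reboot after panic: '
    'BAD TRAP: type=e (#pf Page fault) rp=fffffe8824ede6c0 addr=68 '
    'occurred in module "zfs" due to a NULL pointer dereference'
)
info = parse_panic_string(sample)
print(info["module"], hex(info["addr"]))
```

Run over `grep panic logs/debug.sys` output, a helper like this makes it easy to see whether repeated panics share the same module and fault address.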

### msgbuf doesn't show any message just before panic.

### The stack trace for the panic (using scat vmcore.#):

CAT(vmcore.0/11X)> panic

panic on CPU 78
panic string: BAD TRAP: type=e (#pf Page fault) rp=fffffe8824ede6c0 addr=68 occurred in module "zfs" due to a NULL pointer dereference
==== panic kernel (with lwp) thread: 0xfffffe8824edec40 PID: 3647 on CPU: 78 affinity CPU: 78 ====
cmd: zpool-pool20a
t_procp: 0xfffff601913b6018
  p_as: 0xfffffffffbc301a0(kas)
  zone: global
t_stk: 0xfffffe8824edeb50 sp: 0xfffffe8824ede420 t_stkbase: 0xfffffe8824eda000
t_pri: 99(SDC) t_tid: 107 pctcpu: 4.121339
t_lwp: 0xfffff60262dac700 lwp_regs: 0xfffffe8824edeb50
  mstate: LMS_SYSTEM ms_prev: LMS_KFAULT
  ms_state_start: 93 days 3 hours 52 minutes 10.527300702 seconds later
  ms_start: 66 days 19 hours 31 minutes 21.246733589 seconds earlier
psrset: 0 last CPU: 78
idle: 0 ticks (0s)
start: Tue Feb 5 09:58:21 2013
age: 5773872 seconds (66 days 19 hours 51 minutes 12 seconds)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_TALLOCSTK - thread structure allocated from stk
  T_PANIC - thread initiated a system panic
tpflg: TP_MSACCT - collect micro-state accounting information
tsched: TS_LOAD - thread is in memory
  TS_DONT_SWAP - thread/LWP should not be swapped
  TS_SIGNALLED - thread was awakened by cv_signal()
  TS_RUNQMATCH
pflag: SSYS - system resident process
  SNOWAIT - children never become zombies

pc: unix:vpanic_common+0x13a: addq $0xf0,%rsp

unix:vpanic_common+0x13a()
unix:panic+0x94(, , ...)
unix:die+0xdd(0xe, 0xfffffe8824ede6c0, 0x68, 0x4e)
unix:trap+0x1799(0xfffffe8824ede6c0, 0x68, 0x4e)
unix:cmntrap_pushed+0x3c()
-- panic trap data type: 0xe (Page fault)
  addr 0x68 rp 0xfffffe8824ede6c0
  trapno 0xe (Page fault)
  err 0 (page not present,read,supervisor)
  %rfl 0x10282 (negative|interrupt enable|resume)
  savbp 0xfffffe8824ede7f0
  savip zfs:arc_read_done+0x27: movq 0x68(%r13),%rdi

  %rbp 0xfffffe8824ede7f0 %rsp 0xfffffe8824ede7b0
  %rip zfs:arc_read_done+0x27: movq 0x68(%r13),%rdi

  0%rdi 0xfffffa8973d36660 1%rsi 0x3 2%rdx 0xfffffe8824edec40
  3%rcx 0 4%r8 0xfffffffff79c2630 5%r9 0

  %rax 0 %rbx 0x100000
  %r10 0xfffff60000059008 %r11 0xa6667117e70f115f %r12 0xfffffa8973d36660
  %r13 0 %r14 0 %r15 0xfffff74719f68258
  %cs 0x30 (KDS_SEL) %ds 0x4b (UCS_SEL)
  %es 0x4b (UCS_SEL) %ss 0x38 (GDT_U32CODE,KPL)
  %fs 0 (KFS_SEL) %gs 0x1c3 (LWPGS_SEL)
  fsbase 0xfffe8824ede7f0ff
  gsbase 0xfffe8824ede7f0ff
zfs:arc_read_done+0x27(0xfffffa8973d36660)
zfs:zio_done+0x377(0xfffffa8973d36660)
zfs:zio_execute+0x8d(0xfffffa8973d36660)
zfs:zio_notify_parent+0xa6(0xfffffa8973d36660, 0xfffff74934e38338, 0x1)
zfs:zio_done+0x3d6(0xfffff74934e38338)
zfs:zio_execute+0x8d(0xfffff74934e38338)
zfs:zio_notify_parent+0xa6(0xfffff74934e38338, 0xfffff9355ea6a010, 0x1)
zfs:zio_done+0x3d6(0xfffff9355ea6a010)
zfs:zio_execute+0x8d(0xfffff9355ea6a010)
genunix:taskq_thread+0x248(0xfffff601989487b8)
unix:thread_start+0x8()

 -- end of kernel (with lwp) thread's stack --

CAT(vmcore.0/11X)>

### So the panic occurs in the arc_read_done function. Looking at the ARC buffer, we find the NULL pointer:

CAT(vmcore.0/11X)> sdump 0xfffffa8973d36660 zio io_private
  io_private = 0xfffff74719f68258
CAT(vmcore.0/11X)> sdump 0xfffff74719f68258 arc_buf_t
{
  b_hdr = NULL !! this NULL pointer triggered the BAD TRAP panic
  b_next = NULL
  b_evict_lock = {
  _opaque = [ NULL ]
  }
  b_data_lock = {
  _opaque = [ NULL ]
  }
  b_data = NULL
  b_efunc = NULL
  b_private = NULL
}
CAT(vmcore.0/11X)>

 

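This is the problem: b_hdr = NULL. The faulting instruction, movq 0x68(%r13),%rdi, executed with %r13 = 0, which is consistent with arc_read_done reading offset 0x68 from the NULL b_hdr pointer; hence addr=68 in the panic string.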
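The arithmetic behind the fault address can be checked with a short Python sketch. It computes the effective address of a base-plus-displacement operand such as 0x68(%r13) and applies a simple heuristic for a NULL pointer dereference. The heuristic (fault address below one page) and the 4 KiB page size are assumptions made for illustration; this document does not describe how the kernel itself classifies the fault.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB base page size

def effective_address(base, displacement):
    """Address accessed by an operand of the form displacement(%base), 64-bit wrap."""
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF

def looks_like_null_deref(fault_addr, page_size=PAGE_SIZE):
    """Heuristic (assumed): a fault inside the first page suggests a NULL base pointer."""
    return fault_addr < page_size

r13 = 0x0                      # %r13 from the register dump
fault = effective_address(r13, 0x68)
print(hex(fault), looks_like_null_deref(fault))
```

With %r13 = 0 the computed address is 0x68, matching cr2 and the addr field of the panic string. The second stack variant reports addr=0, which falls under the same heuristic.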

 

Solution

Please open an SR for this issue with Oracle Support and submit a Support Bundle for review.

To generate a support bundle, go to https://<nas-ip>:215/#maintenance/system and click the plus sign ('+') next to Support Bundles. Oracle will review the preserved image at sw:///:path=/var/ak/core/.<Unique Identifier>

The failed system image has been preserved for analysis.

 

The stack trace matches Bug 16370523 "7420 CLUSTER PEER PANIC'D "BAD TRAP...."ZFS" DUE TO A NULL POINTER DEREFERENCE", which is a duplicate of Bug 15812101 "SUNBT7193862 *someone* is abusing zio_buf_free()", fixed in the ak-2011.1.6.0 release.

Please upgrade the ZFSSA system to ak-2011.1.6.0 or later.
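A rough Python sketch of the "ak-2011.1.6.0 or later" check follows. It compares short-form release names of the kind used in this document (for example ak-2011.1.4.2) component by component. The helper names `parse_ak_release` and `has_fix` are hypothetical. The sketch does not handle the long-form version string reported by the appliance (such as 2011.04.24.4.2,1-2.28.1.1); converting between the two forms is outside what this document states.

```python
def parse_ak_release(name):
    """Turn a short-form name such as 'ak-2011.1.6.0' into a tuple of integers."""
    prefix = "ak-"
    if not name.startswith(prefix):
        raise ValueError(f"not a short-form ak release name: {name!r}")
    return tuple(int(part) for part in name[len(prefix):].split("."))

# Release named in this document as containing the fix.
FIXED_IN = parse_ak_release("ak-2011.1.6.0")

def has_fix(release):
    """True if the release is at or above the fixed release (tuple comparison)."""
    return parse_ak_release(release) >= FIXED_IN

for rel in ("ak-2011.1.4.2", "ak-2011.1.6.0"):
    print(rel, has_fix(rel))
```

Comparing integer tuples rather than raw strings avoids ordering mistakes when a component reaches two digits.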

References

<BUG:15812101> - SUNBT7193862 *SOMEONE* IS ABUSING ZIO_BUF_FREE
<BUG:16370523> - 7420 CLUSTER PEER PANIC'D "BAD TRAP...."ZFS" DUE TO A NULL POINTER DEREFERENCE"
<BUG:15817552> - SUNBT7200081 BAD TRAP TYPE=E OCCURRED IN MODULE "ZFS" DUE TO A NULL POINTER DERE
<NOTE:1173706.1> - SUNOS-8000-KL - Kernel Panic

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.