Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 1674811.1 : Oracle ZFS Storage Appliance: Panic with ZFS Null Pointer Dereference in arc_hash_remove()
Created from <SR 3-8933876631>

Applies to:
Oracle ZFS Storage ZS3-2 - Version All Versions and later
Sun Storage 7110 Unified Storage System - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun Storage 7310 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms
The system panics and reboots. In a clustered appliance, the pool and data services are taken over by the peer appliance head.

Cause
System panic:
May 1 06:01:25 hostname ^Mpanic[cpu19]/thread=ffffff03d04c9c40:
May 1 06:01:25 hostname genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff03d04c9a40 addr=28 occurred in module "zfs" due to a NULL pointer dereference
May 1 06:01:25 hostname unix: [ID 100000 kern.notice]
May 1 06:01:25 hostname unix: [ID 839527 kern.notice] sched:
May 1 06:01:25 hostname unix: [ID 753105 kern.notice] #pf Page fault
May 1 06:01:25 hostname unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x28
May 1 06:01:25 hostname unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffff7a0a378, sp=0xffffff03d04c9b30, eflags=0x10213
May 1 06:01:25 hostname unix: [ID 211416 kern.notice] cr0: 8005003b cr4: 6f8
May 1 06:01:25 hostname unix: [ID 624947 kern.notice] cr2: 28
May 1 06:01:25 hostname unix: [ID 625075 kern.notice] cr3: 7217000
May 1 06:01:25 hostname unix: [ID 625715 kern.notice] cr8: 0
May 1 06:01:25 hostname unix: [ID 100000 kern.notice]
May 1 06:01:25 hostname unix: [ID 592667 kern.notice] rdi: fffff6face6e0038 rsi: 7 rdx: ffffff03d04c9c40
May 1 06:01:25 hostname unix: [ID 592667 kern.notice] rcx: 28 r8: fffff6face64e438 r9: 1
May 1 06:01:25 hostname unix: [ID 592667 kern.notice] rax: 0 rbx: 34f3a8 rbp: ffffff03d04c9b40
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] r10: ffffff03d49aec40 r11: ffffff03d04c9c40 r12: fffff6028e93b700
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] r13: fffff6028e93b748 r14: fffff6face6e0038 r15: fffff60083c22a00
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] fsb: 0 gsb: fffff600cabe4040 ds: 4b
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] trp: e err: 0 rip: fffffffff7a0a378
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] cs: 30 rfl: 10213 rsp: ffffff03d04c9b30
May 1 06:01:26 hostname unix: [ID 266532 kern.notice] ss: 0
May 1 06:01:26 hostname unix: [ID 100000 kern.notice]
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9960 unix:die+105 ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9a30 unix:trap+152b ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9a40 unix:cmntrap+e7 ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9b40 zfs:arc_hash_remove+28 ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9bb0 zfs:l2arc_evict+1a5 ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9c20 zfs:l2arc_feed_thread+163 ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9c30 unix:thread_start+8 ()
May 1 06:01:26 hostname unix: [ID 100000 kern.notice]
May 1 06:01:26 hostname genunix: [ID 672855 kern.notice] syncing file systems...
May 1 06:01:27 hostname genunix: [ID 904073 kern.notice] done
May 1 06:01:25 hostname genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/system/dump, offset 65536, content: kernel + curproc
May 1 09:53:46 hostname genunix: [ID 100000 kern.notice]
May 1 09:53:46 hostname genunix: [ID 665016 kern.notice] ^M100% done: 38706698 pages dumped,
May 1 09:53:46 hostname genunix: [ID 851671 kern.notice] dump succeeded

> ::status
debugging crash dump vmcore.0 (64-bit) from slcnas460
operating system: 5.11 ak/generic@2013.06.05.1.8,1-1.1 (i86pc)
image uuid: c10eb749-c0b5-c6f1-aade-dd98d2d37e23
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff03d04c9a40 addr=28 occurred in module "zfs" due to a NULL pointer dereference
dump content: kernel pages only

> ::stack
arc_hash_remove+0x28(fffff6face6e0038)
l2arc_evict+0x1a5(fffff6028e93b700, 800000, 0)
l2arc_feed_thread+0x163()
thread_start+8()

The underlying bug fix and its explanation are available in CR 18695640 : RACE BETWEEN ARC_WRITE_DONE() AND ARC_EVICT_BUF().

The fixes for CR 18695640 and CR 19013554 are backported to Solaris 10 in patches 150400-21 (SPARC) and 150401-21 (x86).

See also CR 19013554 : READ COMPLETION IS RACY. Its fix is included in micro release 2013.1.2.6.

Example panic stack trace for CR 19013554:
Jul 2 11:49:46 levsc02 genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff00b81d39a0 addr=3f24ef1c28 occurred in module "zfs" due to an illegal access to a user address
Jul 2 11:49:46 levsc02 unix: [ID 100000 kern.notice]
Jul 2 11:49:46 levsc02 unix: [ID 839527 kern.notice] sched:
Jul 2 11:49:46 levsc02 unix: [ID 753105 kern.notice] #pf Page fault
Jul 2 11:49:46 levsc02 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x3f24ef1c28
Jul 2 11:49:47 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d38c0 unix:die+105 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3990 unix:trap+152b ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d39a0 unix:cmntrap+e7 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3aa0 zfs:l2arc_destroy_cookie+1d ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3b10 zfs:l2arc_free_buffers+f6 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3b50 zfs:l2arc_evict_one+67 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3ba0 zfs:l2arc_feed_pool+139 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3c00 zfs:l2arc_feed_thread+125 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3c10 unix:thread_start+8 ()
Jul 2 11:49:48 levsc02 unix: [ID 100000 kern.notice]
Jul 2 11:49:48 levsc02 genunix: [ID 672855 kern.notice] syncing file systems...
Jul 2 11:49:48 levsc02 genunix: [ID 904073 kern.notice] done
Jul 2 11:49:49 levsc02 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/system/dump, offset 65536, content: kernel + curproc
Jul 2 12:30:26 levsc02 genunix: [ID 100000 kern.notice]
Jul 2 12:30:26 levsc02 genunix: [ID 665016 kern.notice] ^M100% done: 8350742 pages dumped,
Jul 2 12:30:26 levsc02 genunix: [ID 851671 kern.notice] dump succeeded

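
When triaging a new panic, it can help to compare its zfs stack frames with the two traces quoted above. The following Python sketch is one possible way to do that. The names FRAME_RE, SIGNATURES, zfs_frames and classify are illustrative and not part of any Oracle tool, and the frame sequences are copied only from the two example panics in this document. A match is a hint that the panic resembles one of these CRs, not a confirmed diagnosis.

```python
import re

# Matches a "module:function+offset ()" frame as printed in the panic log.
FRAME_RE = re.compile(r"\b([a-z_0-9]+):([A-Za-z_0-9]+)\+[0-9a-f]+ \(\)")

# zfs frame sequences, innermost first, taken from the two panics
# quoted in this document (assumption: other builds may differ).
SIGNATURES = {
    "CR 18695640": [
        "arc_hash_remove",
        "l2arc_evict",
        "l2arc_feed_thread",
    ],
    "CR 19013554": [
        "l2arc_destroy_cookie",
        "l2arc_free_buffers",
        "l2arc_evict_one",
        "l2arc_feed_pool",
        "l2arc_feed_thread",
    ],
}


def zfs_frames(log_text):
    """Return the zfs function names from a panic log, in log order."""
    return [fn for mod, fn in FRAME_RE.findall(log_text) if mod == "zfs"]


def classify(log_text):
    """Return the CR whose signature matches the zfs frames, or None."""
    frames = zfs_frames(log_text)
    for cr, signature in SIGNATURES.items():
        if frames[: len(signature)] == signature:
            return cr
    return None


if __name__ == "__main__":
    sample = """
    genunix: [ID 655072 kern.notice] ffffff03d04c9a40 unix:cmntrap+e7 ()
    genunix: [ID 655072 kern.notice] ffffff03d04c9b40 zfs:arc_hash_remove+28 ()
    genunix: [ID 655072 kern.notice] ffffff03d04c9bb0 zfs:l2arc_evict+1a5 ()
    genunix: [ID 655072 kern.notice] ffffff03d04c9c20 zfs:l2arc_feed_thread+163 ()
    genunix: [ID 655072 kern.notice] ffffff03d04c9c30 unix:thread_start+8 ()
    """
    print(classify(sample))
```

The comparison uses whole function names, so l2arc_evict and l2arc_evict_one are not confused with each other.
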
Solution
Upgrade the appliance firmware to release 2013.1.2.13, 2013.1.3.0, or later.
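
As a rough aid for checking a fleet of appliances against this requirement, the sketch below compares a dotted release string with the minimum fixed release. It assumes releases are written in the four-part dotted form used in this document (for example 2013.1.2.13) and compares them numerically, component by component. The build string shown by ::status (ak/generic@2013.06.05.1.8,1-1.1) is a different format and is not handled here. The names parse_release, MIN_FIXED and has_fix are illustrative only.

```python
def parse_release(release):
    """Turn a dotted release such as '2013.1.2.13' into a tuple of ints."""
    return tuple(int(part) for part in release.split("."))


# Minimum release named in the Solution section of this document.
MIN_FIXED = parse_release("2013.1.2.13")


def has_fix(release):
    """Return True if the release is at or above the minimum fixed release."""
    return parse_release(release) >= MIN_FIXED


if __name__ == "__main__":
    for candidate in ("2013.1.2.6", "2013.1.2.13", "2013.1.3.0"):
        print(candidate, has_fix(candidate))
```

Comparing integer tuples rather than raw strings matters here: as strings, "2013.1.2.6" would sort after "2013.1.2.13", which would wrongly report the older release as fixed.
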

References
<BUG:19013554> - READ COMPLETION IS RACY
<BUG:18247089> - BAD TRAP PANIC IN ARC_HASH_REMOVE
<BUG:18695640> - RACE BETWEEN ARC_WRITE_DONE AND ARC_EVICT_BUF
<BUG:18779253> - BACKPORT 18695640 TO AK-2013-REL

Attachments
This solution has no attachment