Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1674811.1
Update Date:2018-01-08

Solution Type  Problem Resolution Sure

Solution  1674811.1 :   Oracle ZFS Storage Appliance: Panic with ZFS Null Pointer Dereference in arc_hash_remove()  


Related Items
  • Oracle ZFS Storage ZS3-2
  • Sun Storage 7110 Unified Storage System
  • Sun Storage 7210 Unified Storage System
  • Sun ZFS Storage 7420
  • Sun Storage 7410 Unified Storage System
  • Oracle ZFS Storage ZS3-4
  • Sun Storage 7310 Unified Storage System
  • Sun ZFS Storage 7120
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-8933876631>

Applies to:

Oracle ZFS Storage ZS3-2 - Version All Versions and later
Sun Storage 7110 Unified Storage System - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun Storage 7310 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

The system panics and reboots.

In a clustered appliance, the pool and data services are taken over by the peer appliance head.

Cause

The system panics with a message similar to the following:

May 1 06:01:25 hostname panic[cpu19]/thread=ffffff03d04c9c40:
May 1 06:01:25 hostname genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff03d04c9a40 addr=28 occurred in module "zfs" due to a NULL pointer dereference
May 1 06:01:25 hostname unix: [ID 100000 kern.notice]
May 1 06:01:25 hostname unix: [ID 839527 kern.notice] sched:
May 1 06:01:25 hostname unix: [ID 753105 kern.notice] #pf Page fault
May 1 06:01:25 hostname unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x28
May 1 06:01:25 hostname unix: [ID 243837 kern.notice] pid=0, pc=0xfffffffff7a0a378, sp=0xffffff03d04c9b30, eflags=0x10213
May 1 06:01:25 hostname unix: [ID 211416 kern.notice] cr0: 8005003b cr4: 6f8
May 1 06:01:25 hostname unix: [ID 624947 kern.notice] cr2: 28
May 1 06:01:25 hostname unix: [ID 625075 kern.notice] cr3: 7217000
May 1 06:01:25 hostname unix: [ID 625715 kern.notice] cr8: 0
May 1 06:01:25 hostname unix: [ID 100000 kern.notice]
May 1 06:01:25 hostname unix: [ID 592667 kern.notice] rdi: fffff6face6e0038 rsi: 7 rdx: ffffff03d04c9c40
May 1 06:01:25 hostname unix: [ID 592667 kern.notice] rcx: 28 r8: fffff6face64e438 r9: 1
May 1 06:01:25 hostname unix: [ID 592667 kern.notice] rax: 0 rbx: 34f3a8 rbp: ffffff03d04c9b40
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] r10: ffffff03d49aec40 r11: ffffff03d04c9c40 r12: fffff6028e93b700
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] r13: fffff6028e93b748 r14: fffff6face6e0038 r15: fffff60083c22a00
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] fsb: 0 gsb: fffff600cabe4040 ds: 4b
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] es: 4b fs: 0 gs: 1c3
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] trp: e err: 0 rip: fffffffff7a0a378
May 1 06:01:26 hostname unix: [ID 592667 kern.notice] cs: 30 rfl: 10213 rsp: ffffff03d04c9b30
May 1 06:01:26 hostname unix: [ID 266532 kern.notice] ss: 0
May 1 06:01:26 hostname unix: [ID 100000 kern.notice]
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9960 unix:die+105 ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9a30 unix:trap+152b ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9a40 unix:cmntrap+e7 ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9b40 zfs:arc_hash_remove+28 ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9bb0 zfs:l2arc_evict+1a5 ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9c20 zfs:l2arc_feed_thread+163 ()
May 1 06:01:26 hostname genunix: [ID 655072 kern.notice] ffffff03d04c9c30 unix:thread_start+8 ()
May 1 06:01:26 hostname unix: [ID 100000 kern.notice]
May 1 06:01:26 hostname genunix: [ID 672855 kern.notice] syncing file systems...
May 1 06:01:27 hostname genunix: [ID 904073 kern.notice] done
May 1 06:01:25 hostname genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/system/dump, offset 65536, content: kernel + curproc
May 1 09:53:46 hostname genunix: [ID 100000 kern.notice]
May 1 09:53:46 hostname genunix: [ID 665016 kern.notice] 100% done: 38706698 pages dumped,
May 1 09:53:46 hostname genunix: [ID 851671 kern.notice] dump succeeded

> ::status
debugging crash dump vmcore.0 (64-bit) from slcnas460
operating system: 5.11 ak/generic@2013.06.05.1.8,1-1.1 (i86pc)
image uuid: c10eb749-c0b5-c6f1-aade-dd98d2d37e23
panic message: BAD TRAP: type=e (#pf Page fault) rp=ffffff03d04c9a40 addr=28 occurred in module "zfs" due to a NULL pointer dereference
dump content: kernel pages only

 

> ::stack
arc_hash_remove+0x28(fffff6face6e0038)
l2arc_evict+0x1a5(fffff6028e93b700, 800000, 0)
l2arc_feed_thread+0x163()
thread_start+8()

 

The base bug fix and its explanation are available in CR 18695640: RACE BETWEEN ARC_WRITE_DONE() AND ARC_EVICT_BUF().

At the time of writing this document, the fix is available in 2013.1.1.8 IDR #1.1, 2013.1.2.2, and Solaris 11.2-SRU.

The fixes for CR 18695640 and CR 19013554 are backported to Solaris 10 in patches 150400-21 (SPARC) and 150401-21 (x86).

See also CR 19013554: READ COMPLETION IS RACY. The fix for that CR is included in micro release 2013.1.2.6.

Example panic stacktrace for CR 19013554:

Jul 2 11:49:46 levsc02 genunix: [ID 335743 kern.notice] BAD TRAP: type=e (#pf Page fault) rp=ffffff00b81d39a0 addr=3f24ef1c28 occurred in module "zfs" due to an illegal access to a user address
Jul 2 11:49:46 levsc02 unix: [ID 100000 kern.notice]
Jul 2 11:49:46 levsc02 unix: [ID 839527 kern.notice] sched:
Jul 2 11:49:46 levsc02 unix: [ID 753105 kern.notice] #pf Page fault
Jul 2 11:49:46 levsc02 unix: [ID 532287 kern.notice] Bad kernel fault at addr=0x3f24ef1c28

Jul 2 11:49:47 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d38c0 unix:die+105 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3990 unix:trap+152b ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d39a0 unix:cmntrap+e7 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3aa0 zfs:l2arc_destroy_cookie+1d ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3b10 zfs:l2arc_free_buffers+f6 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3b50 zfs:l2arc_evict_one+67 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3ba0 zfs:l2arc_feed_pool+139 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3c00 zfs:l2arc_feed_thread+125 ()
Jul 2 11:49:48 levsc02 genunix: [ID 655072 kern.notice] ffffff00b81d3c10 unix:thread_start+8 ()
Jul 2 11:49:48 levsc02 unix: [ID 100000 kern.notice]
Jul 2 11:49:48 levsc02 genunix: [ID 672855 kern.notice] syncing file systems...
Jul 2 11:49:48 levsc02 genunix: [ID 904073 kern.notice] done
Jul 2 11:49:49 levsc02 genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/system/dump, offset 65536, content: kernel + curproc
Jul 2 12:30:26 levsc02 genunix: [ID 100000 kern.notice]
Jul 2 12:30:26 levsc02 genunix: [ID 665016 kern.notice] 100% done: 8350742 pages dumped,
Jul 2 12:30:26 levsc02 genunix: [ID 851671 kern.notice] dump succeeded
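
The two panic stacks shown in this document differ in their innermost zfs frames: arc_hash_remove called from l2arc_evict for CR 18695640, and l2arc_destroy_cookie called from l2arc_free_buffers for CR 19013554. As a rough triage aid, the following Python sketch pulls the zfs frames out of syslog panic text and compares the top two against those pairs. The names FRAME_RE, SIGNATURES, zfs_frames and classify_panic are hypothetical and are not part of any Oracle tool. The signature table is an assumption built only from the two examples above, so a match is suggestive rather than a diagnosis.

```python
import re

# Matches the "module:function+offset ()" part of a syslog panic stack line,
# for example "zfs:arc_hash_remove+28 ()".
FRAME_RE = re.compile(r"\b(\w+):(\w+)\+[0-9a-f]+ \(\)")

# Assumed signature table: the two innermost zfs frames from each example
# stack in this note, mapped to the CR the note associates with that stack.
SIGNATURES = {
    ("arc_hash_remove", "l2arc_evict"): "CR 18695640",
    ("l2arc_destroy_cookie", "l2arc_free_buffers"): "CR 19013554",
}


def zfs_frames(log_text):
    """Return the function names of zfs-module frames, innermost first."""
    return [fn for mod, fn in FRAME_RE.findall(log_text) if mod == "zfs"]


def classify_panic(log_text):
    """Return the CR whose example stack matches, or None if neither does."""
    frames = zfs_frames(log_text)
    return SIGNATURES.get(tuple(frames[:2]))
```

Passing the text of /var/adm/messages style panic output to classify_panic() returns the matching CR string, or None when the stack does not resemble either example. Frames from other modules (unix, genunix) are ignored, so the trap-handling frames above the zfs frames do not affect the result.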


 

Solution

  

Upgrade the appliance firmware to release 2013.1.2.13, 2013.1.3.0, or later.
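
To check a list of appliances against this minimum, a release string can be compared numerically field by field. The Python sketch below assumes release strings in the plain dotted form used in this document (for example "2013.1.2.13"). It does not handle IDR suffixes or the build-string form shown in the ::status output. The names FIXED_RELEASE, parse_release and has_fix are hypothetical.

```python
# Minimum release named in the Solution section of this note.
FIXED_RELEASE = (2013, 1, 2, 13)


def parse_release(text):
    """Turn a dotted release string such as '2013.1.2.13' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def has_fix(release):
    """Return True if the release is at or above the Solution's minimum.

    Assumption: every release that sorts at or above 2013.1.2.13 carries
    the fix. Tuple comparison is numeric per field, so 2013.1.2.13 sorts
    above 2013.1.2.6, which a plain string comparison would get wrong.
    """
    return parse_release(release) >= FIXED_RELEASE
```

For example, has_fix("2013.1.3.0") is True, while has_fix("2013.1.2.6") is False even though the fix for CR 19013554 alone is included in that micro release, because the check uses the Solution section's minimum.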

  

References

<BUG:19013554> - READ COMPLETION IS RACY
<BUG:18247089> - BAD TRAP PANIC IN ARC_HASH_REMOVE
<BUG:18695640> - RACE BETWEEN ARC_WRITE_DONE AND ARC_EVICT_BUF
<BUG:18779253> - BACKPORT 18695640 TO AK-2013-REL

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.