
Asset ID: 1-72-1913790.1
Update Date: 2018-05-25
Keywords:

Solution Type: Problem Resolution (Sure Solution)

Solution 1913790.1: Oracle ZFS Storage Appliance: Appliance Kit Daemon (akd) Panic in zfs_send_impl()


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS5-2
  • Sun Storage 7110 Unified Storage System
  • Oracle ZFS Storage ZS3-2
  • Sun Storage 7210 Unified Storage System
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7410 Unified Storage System
  • Oracle ZFS Storage ZS5-4
  • Sun ZFS Storage 7120
  • Sun Storage 7310 Unified Storage System
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA

Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-6969876821>

Applies to:

Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

The Appliance Kit Daemon (akd) is suspected to have restarted unexpectedly, as indicated by the following fault management alert:

  SUNW-MSG-ID: AK-8001-RK, TYPE: alert, VER: 1, SEVERITY: Minor
  EVENT-TIME: Sat Mar 23 15:00:39 2013
  PLATFORM: i86pc, CSN: 1101FMJ01D, HOSTNAME: adc08stor07
  SOURCE: svc:/appliance/kit/akd:default, REV: 1.0
  EVENT-ID: 0ba484c7-e754-cb67-f453-9e61bcd9fc78
  DESC: Communication with the cluster peer via a cluster interconnect link has been lost.
  AUTO-RESPONSE: None.
  IMPACT: Cluster reliability is impaired. If the cluster peer is functioning normally but no cluster interconnects remain active, arbitrary and unwanted cluster takeover may occur.
  REC-ACTION: Check the cluster interconnect cables and the state of the cluster peer. Contact your vendor for support if an interconnect link remains inexplicably down.

 

The following alerts were noticed in the BUI:

Description A cluster interconnect link has been restored.
Type Minor alert
Impact Cluster reliability has improved.
Automated response None.
Recommended action None.
Event time 2013-3-23 20:36:56
Unique Identifier f685cc2f-69e0-e623-d62b-ce225994a8eb
Status This alert is not associated with a problem.

Description A cluster interconnect link has been restored.
Type Minor alert
Impact Cluster reliability has improved.
Automated response None.
Recommended action None.
Event time 2013-3-23 20:36:56
Unique Identifier 9c903ddd-9cd3-6414-f574-d5e56aa365cb
Status This alert is not associated with a problem.

Description A cluster interconnect link has been restored.
Type Minor alert
Impact Cluster reliability has improved.
Automated response None.
Recommended action None.
Event time 2013-3-23 20:37:01
Unique Identifier f7e3244d-511e-c014-c3d7-b8f86724ce5f
Status This alert is not associated with a problem.

Description The appliance has rejoined the cluster.
Type Minor alert
Impact Cluster failover is now available.
Automated response None.
Recommended action None.
Event time 2013-3-23 20:39:07
Unique Identifier ab7114a5-9ca6-c347-fd53-fe7c7d2dd8d7
Status This alert is not associated with a problem.

 

 

Cause

AKD was restarted on adc08stor08.

AKD service log:

Assertion failed: 0 == close(sdd.cleanup_fd), file ../common/libzfs_sendrecv.c, line 1545, function zfs_send
[ Mar 23 15:05:28 Stopping because process dumped core. ]
[ Mar 23 15:05:28 Executing stop method (:kill). ]
[ Mar 23 15:05:39 Executing start method ("exec /usr/lib/ak/akd"). ]
[ Mar 23 15:07:12 Method "start" exited with status 0. ]
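
The service log shows the failure sequence: an assert/VERIFY-style check in zfs_send() treats a nonzero return from close(sdd.cleanup_fd) as fatal and calls abort(), the resulting SIGABRT makes akd dump core, and SMF then restarts the service. Below is a minimal, self-contained C sketch of that pattern, for illustration only; the VERIFY_ZERO macro and the deliberately invalid file descriptor are invented for this example and are not the actual libzfs_sendrecv.c code.

/*
 * Illustration only (not the actual libzfs_sendrecv.c source): an
 * assert/VERIFY-style check on the return value of close() calls abort()
 * when close() fails, terminating the process with SIGABRT and producing
 * a core dump -- the same sequence seen in the akd service log above.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for the VERIFY/assert-style macro used by libzfs. */
#define VERIFY_ZERO(expr)                                               \
        do {                                                            \
                if ((expr) != 0) {                                      \
                        (void) fprintf(stderr,                          \
                            "Assertion failed: 0 == %s, "               \
                            "file %s, line %d\n",                       \
                            #expr, __FILE__, __LINE__);                 \
                        abort();   /* raises SIGABRT, dumps core */     \
                }                                                       \
        } while (0)

int
main(void)
{
        int cleanup_fd = -1;    /* invalid descriptor so close() fails */

        /* close(-1) fails with EBADF, so the check fires and we abort. */
        VERIFY_ZERO(close(cleanup_fd));

        return (0);
}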

 

AKD application coredump:

> ::status
debugging core file of akd (32-bit) from adc08stor08
initial argv: /usr/lib/ak/akd
threading model: native threads
status: process terminated by SIGABRT (Abort), pid=1326 uid=0 code=-1
panic message: Assertion failed: 0 == close(sdd.cleanup_fd), file ../common/libzfs_sendrecv.c, line 1545, function zfs_send
>

> $C
f2aec6c8 libc_hwcap1.so.1`_lwp_kill+0x15(98, 6, f2aec6e8, fee608b1)
f2aec6e8 libc_hwcap1.so.1`raise+0x25(6, 0, f2aec738, fee3869d)
f2aec738 libc_hwcap1.so.1`abort+0xf5(65737341, 6f697472, 6166206e, 64656c69, 2030203a, 63203d3d)
f2aec948 0xfee38ad0(fe69c64c, fe69d500, 6d6, fe685040)
f2aed2c8 libzfs.so.1`zfs_send_impl+0xc9b(689402c8, 6f3c2dcb, 67ced313, 6, 152, fd364f70)
f2aed308 libzfs.so.1`zfs_send+0x2e(689402c8, 6f3c2dcb, 67ced313, 6, 152, fd364f70)
f2aed588 nas.so`nas_repl_send_stream_send+0x262(f2aed5c0, 65cc0ac8, fd3b9478, fd367966)
f2aedf28 nas.so`nas_repl_eng_send+0xca(80a1c08)
f2aedf98 libak.so.1`ak_engine_worker+0x170(88aa7f8, 0, 0, f013cde9)
f2aedfc8 libak.so.1`ak_thread_start+0x6a(8cf0408, fef51000, f2aedfe8, feeb36d9)
f2aedfe8 libc_hwcap1.so.1`_thrp_setup+0x9d(f4a11140)
f2aedff8 libc_hwcap1.so.1`_lwp_start(f4a11140, 0, 0, 0, 0, 0)
>

> ::vmem
ADDR NAME INUSE TOTAL SUCCEED FAIL
fe99dc28 sbrk_top 1275564032 3451412480 857568324 4135 <<<<<<<<<
fe99e09c sbrk_heap 1275564032 1275564032 857568324 731
fe99e510 vmem_internal 43978752 43978752 24581317 0
fe99e984 vmem_seg 41811968 41811968 10208 0
fe99edf8 vmem_hash 2150656 2154496 26 0
fe99f26c vmem_vmem 17100 19128 24571101 0
08062000 umem_internal 17249280 17252352 81924 0
08062474 umem_cache 402320 577536 51 0
080628e8 umem_hash 1239040 1245184 224 0
08063000 umem_log 0 0 0 0
08063474 umem_firewall_va 0 0 0 0
080638e8 umem_firewall 0 0 0 0
08064000 umem_oversize 147805517 151138304 831194045 731 <<<<<<<<<
08064474 umem_memalign 4456464 4464640 714301 0
080648e8 umem_default 1058729984 1058729984 996737 0
>

 

The highlighted nonzero FAIL counts for sbrk_top and umem_oversize indicate failed memory allocations within the 32-bit akd address space. The crash is an instance of bug CR 15826181 - akd crashed in libzfs with Assertion failed: 0 == close(sdd.cleanup_fd).

 

Solution

Upgrade to Appliance Firmware Release 2011.1.8.0 (or later), or to Appliance Firmware Release 2013.1.0.1 (or later).

 

 

 

***Checked for relevance on 25-MAY-2018***

References

<NOTE:1494369.1> - Sun Storage 7000 Unified Storage System: BUI unavailable and seeing errors like "failed to update kstat chain: Not enough space"
<NOTE:1019887.1> - Sun Storage 7000 Unified Storage System: How to Collect a Support Bundle using the BUI or CLI
<NOTE:1325025.1> - Sun Storage 7000 Unified Storage System: aksh fatal error: no memory
<NOTE:1401282.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot Unresponsive Administrative Interface (BUI/CLI hang)
<NOTE:1401288.1> - Sun Storage 7000 Unified Storage System: Data collection for akd hang issues
<BUG:15826181> - SUNBT7207252 AKD CRASHED IN LIBZFS WITH ASSERTION FAILED: 0 == CLOSE(SDD.CLEANUP_FD)

Attachments
This solution has no attachment