
Asset ID: 1-72-1602779.1
Update Date:2018-01-08
Keywords:

Solution Type: Problem Resolution Sure Solution

Solution 1602779.1: Sun Storage 7000 Unified Storage System: AKD fails to start in cluster - Error: resume() called with existing user ...


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS5-2
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7410 Unified Storage System
  • Oracle ZFS Storage ZS3-4
  • Sun Storage 7310 Unified Storage System
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS


When user information is stored twice in the stash, AKD will not start.

In this Document
Symptoms
Changes
Cause
Solution


Created from <SR 3-7859365511>

Applies to:

Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
7000 Appliance OS (Fishworks)

This situation occurs when a new user is added on one cluster head. Propagation of the new user information to the other head can take some time, which may tempt the administrator to add the same user on the 'other' head as well. In that case the user information is stored twice in the stash, which prevents AKD from starting.
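
Before creating the user again on the peer head, it is safer to first check whether the account has already propagated. A minimal sketch from the appliance CLI, assuming the standard 'configuration users' context ('otherhead' is a hypothetical hostname, and the exact prompt and output layout may differ by release):

otherhead:> configuration users
otherhead:configuration users> show

If the new user is already listed in the output, do not add it a second time.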

Symptoms

We can observe that the Appliance Kit Daemon (AKD) is unable to start.

The visible symptoms are that the BUI cannot be accessed and CLI connections drop to the 'emergency' shell.

 

The SMF logs show this error message:

NAS# svcs -xv
... akd in the list ...
NAS# tail -2 /var/svc/log/*akd*
Sep 25 12:55:05 svc.startd[89]: svc:/appliance/kit/akd:default: Method "exec /usr/lib/ak/akd" failed with exit status 2.
ABORT: resume() called with existing user 'dumgui01'
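
The complete service log can help confirm the abort message. A brief sketch using standard SMF commands; the log path shown follows the usual SMF naming convention and may differ slightly on a given system:

NAS# svcs -l svc:/appliance/kit/akd:default
... the 'logfile' line gives the full path, typically /var/svc/log/appliance-kit-akd:default.log ...
NAS# tail -50 /var/svc/log/appliance-kit-akd:default.log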

 

Changes

A user has been added twice (once per head) through the "configuration users" menu in the BUI or CLI.

Users must be added on one head only; the new user information is then propagated automatically to the other head.
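
For reference, a hedged sketch of creating a user once, on one head only, from the appliance CLI; the hostname, username and property shown are illustrative and the exact property names and prompts may vary by release:

head1:> configuration users
head1:configuration users> user dumgui01
head1:configuration users dumgui01 (uncommitted)> set fullname="Example User"
head1:configuration users dumgui01 (uncommitted)> commit

After the commit, allow the entry to propagate to the peer head instead of creating it there a second time.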

 

Cause

This is caused by a duplicate entry of the user information in the stash.

 

In this real example, duplicate entries are found for user dumgui01.

NAS# cd /var/ak/stash/com/sun/ak/xmlrpc/user
NAS# aknv */obj | grep logname
  logname = dusben99
  logname = dumgui01   ---> duplicate entry
  logname = dumgui01   ---> duplicate entry
  logname = fildom01
  logname = dusben01
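
A quick way to spot such duplicates is to count the logname values with standard shell tools; this sketch simply reuses the aknv output above, and any logname with a count greater than 1 is stored twice:

NAS# cd /var/ak/stash/com/sun/ak/xmlrpc/user
NAS# aknv */obj | grep logname | sort | uniq -c | sort -rn
   2   logname = dumgui01
   ... all other users appear with a count of 1 ...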

 

Solution

The customer had duplicate entries in the stash for some users, having configured the same user(s) from both cluster nodes via the BUI or CLI.

In this situation, the customer must engage Support (TSC) via a Service Request.

Support (TSC) will remove the offending duplicate entry, after which AKD will be able to start correctly.

 

Because the stash is shared between cluster nodes, you will need to take one head down and work on the surviving head to make these changes.

It is best to bring up the head on which AKD will not start to milestone 'none' by adding -m milestone=none to the kernel line in GRUB.
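
For illustration, the GRUB menu entry is edited at boot (the 'e' key) so that the Solaris kernel line ends with the milestone option; the exact kernel path and -B arguments on the appliance may differ, so this line is only a sketch:

kernel$ /platform/i86pc/kernel/$ISADIR/unix -B $ZFS-BOOTFS -m milestone=none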

You will then use both heads to investigate the issue.

In this real example, you should look at the ls -l and aknv output for the users on both heads to determine which entry is the broken one.

Most likely, it is on the head in milestone none.

Move the single bad (directory) entry that creates the duplicate entry for dumgui01 into the dropbox, both for safekeeping and to get it out of the stash.

NAS# cd /var/ak/stash/com/sun/ak/xmlrpc/user
NAS# ls -l
NAS# aknv */obj | more      
NAS# mv 24be90dd-ca7c-cf82-c3fd-c8c4bee0a434 /var/ak/dropbox/
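
After the move, the same check from the Cause section can be repeated to confirm that only one entry now remains for the affected user (output based on the example above):

NAS# aknv */obj | grep logname
  logname = dusben99
  logname = dumgui01
  logname = fildom01
  logname = dusben01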

Next, restart AKD on the surviving head and make sure the offending (exported) user entry is no longer in the list:

NAS# cd /var/ak/dropbox
NAS# echo ::ak_rm_elem | mdb -p `pgrep -ox akd` | grep user
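
If AKD itself needs to be restarted on the surviving head, standard SMF commands can be used; a minimal sketch assuming the service FMRI reported in the SMF log above (the STIME value is illustrative, the STATE column should return to 'online'):

NAS# svcadm restart svc:/appliance/kit/akd:default
NAS# svcs svc:/appliance/kit/akd:default
STATE          STIME    FMRI
online         12:58:01 svc:/appliance/kit/akd:default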

Next, bring up the passive head with a full, clean reboot, and verify the users are consistent between the heads.
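
Consistency can be checked by running the same logname query on each head and comparing the results; a sketch reusing the commands already shown ('head1'/'head2' are hypothetical prompts), where the two outputs should be identical:

head1# aknv /var/ak/stash/com/sun/ak/xmlrpc/user/*/obj | grep logname | sort
head2# aknv /var/ak/stash/com/sun/ak/xmlrpc/user/*/obj | grep logname | sort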

Finally, fail back, if needed, to return to active/active mode.
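
Failback can be performed from the Cluster screen in the BUI or from the CLI; a hedged sketch of the CLI form, run on the head that currently owns the resources (confirm the exact procedure for your release before running it):

NAS:> configuration cluster
NAS:configuration cluster> failback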


Attachments
This solution has no attachment