Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1546309.1
Update Date: 2016-08-23
Keywords:

Solution Type: Problem Resolution (Sure)

Solution  1546309.1 :   Sun Storage 7000 Unified Storage System: Head fails to join cluster - attempt to import nonexistent STMF object  


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Oracle ZFS Storage ZS3-4
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-7050173161>

Applies to:

Oracle ZFS Storage ZS3-BA - Version All Versions and later
Oracle ZFS Storage Appliance Racked System ZS4-4 - Version All Versions and later
Oracle ZFS Storage ZS4-4 - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Disk Storage ZFS Storage Appliance

 

NOTE: To confirm that the cluster 'links' cabling is correctly configured, see Document 2081179.1.

 

The following error messages show up after reboot:

  ABORT: attempt to import nonexistent STMF object 'view/b0708cab-ee15-cffc-dcf3-c9cf5e8c677e'; did resume fail on the local but succeed on the cluster peer?
  Apr 11 06:01:15 svc.startd[83]: svc:/network/ntp:default: Method "exec /usr/lib/ak/svc/method/akntp start" failed with exit status 95.
  Apr 11 06:01:15 svc.startd[83]: network/ntp:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
  Apr 11 06:01:47 svc.startd[83]: svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal TERM.
  Apr 11 06:02:17 svc.startd[83]: svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal TERM.
  Apr 11 06:02:47 svc.startd[83]: svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal TERM.
  Apr 11 06:02:47 svc.startd[83]: appliance/kit/http:default failed: transitioned to maintenance (see 'svcs -xv' for details)

 

The following error messages appear before dropping into the shell:

  node1:> ABORT: attempt to import nonexistent STMF object 'view/b0708cab-ee15-cffc-dcf3-c9cf5e8c677e'; did resume fail on the local but succeed on the cluster peer?
  aksh: uncaught internal exception: { akStack: { aks_content: [{ args: [], caller: }, { args: [], caller: }] },
  akWrapped: { stack: 'akshCall("utask.listTasks",[object Array])@:0\n()@akService.js:50\n()@akInterpreter.js:1879\n@/usr/lib/ak/js/shell/akLocore.js:629\n', lineNumber: 50, fileName: 'akService.js', message: 'couldn\'t make door call: Bad file number' }, message:'XML-RPC failed'}
  aksh-wrapper: FATAL: The appliance shell failed unexpectedly with error code 1.
  Dropping into failsafe shell ...
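
The `svcs -xv` hint in the log can be followed up from the failsafe shell. A minimal sketch, assuming the standard SMF tooling is available there; the `smf_log_path` helper is illustrative (not an appliance command) and simply encodes the SMF convention that slashes in the FMRI become dashes under /var/svc/log:

```shell
# Map an SMF FMRI to its default log file (SMF convention: strip the
# svc:/ scheme, then replace '/' with '-' under /var/svc/log).
smf_log_path() {
  fmri=${1#svc:/}
  echo "/var/svc/log/$(echo "$fmri" | tr '/' '-').log"
}

# From the failsafe shell one would typically run:
#   svcs -xv                                                  # explain failed services
#   tail -20 "$(smf_log_path svc:/appliance/kit/http:default)"
```

The log files for the failing services in the symptoms above (network/ntp, appliance/kit/http) usually show why the start method exited or was killed.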

 

Cause

AKD (the Appliance Kit Management Daemon) core dumped because the two cluster heads hold different stash (configuration) values for the same resource.

More details about the issue are described in the following two bugs:
Bug 16008643 ABORT: ATTEMPT TO IMPORT NONEXISTENT STMF OBJECT (a duplicate of Bug 15668474)
Bug 15668474 SUNBT6984151 AKD DYING DUE TO RM.INSERT() BEING HANDED NULL STMF VIEW STASH NODE
 
One cluster head core dumped and became unresponsive, which led to the other cluster head performing a takeover of all cluster resources.
 

Solution

If the appliance is running OS 8.6 or later, and akd on the cluster peer is down or the peer is powered off, switch the watchdog to warn-only mode before disabling or restarting akd on the surviving node:

    echo "watchdog_warn_only/W 1" | mdb -kw

Once the service actions are completed, if the watchdog mode was modified, switch it back to standard (panic) mode:

    echo "watchdog_warn_only/W 0" | mdb -kw

 

Under no circumstances should the tunable be left enabled permanently.
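
The two mdb writes above can be wrapped so the restore step is harder to forget. A hedged sketch; the `watchdog_mode` function and its `DRY_RUN` guard are illustrative additions, while the piped mdb command is exactly the one given in the steps above (OS 8.6 or later, root shell required):

```shell
# Toggle the akd watchdog between warn-only (1) and standard/panic (0) mode.
# With DRY_RUN=1 the command is only printed, not executed, so the sequence
# can be reviewed before touching the live kernel.
watchdog_mode() {
  mode=$1    # 1 = warn-only, 0 = standard (panic)
  cmd="echo \"watchdog_warn_only/W $mode\" | mdb -kw"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$cmd"
  else
    eval "$cmd"
  fi
}

# watchdog_mode 1   # before disabling or restarting akd
# ... service actions ...
# watchdog_mode 0   # always restore standard (panic) mode afterwards
```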

 

1.  Power off the node where akd crashes (the affected node).

2.  Restart akd on the other node (the healthy node).

3.  Power up the affected node.
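
The three steps above can be sketched as an ordered command sequence. The ILOM `stop`/`start /SYS` commands are standard Oracle ILOM CLI; the akd FMRI shown is the conventional appliance-kit service name and is an assumption here. The function below only prints the plan, so each step can be run deliberately on the correct node:

```shell
# Print the recovery sequence from the steps above; nothing is executed.
recovery_plan() {
  cat <<'EOF'
# Step 1 - on the AFFECTED node's ILOM: power it off
stop -f /SYS
# Step 2 - on the HEALTHY node's shell: restart akd
svcadm restart svc:/appliance/kit/akd:default
# Step 3 - on the AFFECTED node's ILOM: power it back on
start /SYS
EOF
}
```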

 

 

References

<BUG:15668474> - SUNBT6984151 AKD DYING DUE TO RM.INSERT BEING HANDED NULL STMF VIEW STASH NODE

  Copyright © 2018 Oracle, Inc.  All rights reserved.