Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1546309.1
Update Date: 2016-08-23
Keywords:

Solution Type: Problem Resolution (Sure)

Solution  1546309.1 :   Sun Storage 7000 Unified Storage System: Head fails to join cluster - attempt to import nonexistent STMF object  


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Oracle ZFS Storage ZS3-4
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-7050173161>

Applies to:

Oracle ZFS Storage ZS3-BA - Version All Versions and later
Oracle ZFS Storage Appliance Racked System ZS4-4 - Version All Versions and later
Oracle ZFS Storage ZS4-4 - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Disk Storage ZFS Storage Appliance

 

NOTE: To confirm that the cluster 'links' cabling is correctly configured, see Document 2081179.1.

 

The following error messages show up after reboot:

  ABORT: attempt to import nonexistent STMF object 'view/b0708cab-ee15-cffc-dcf3-c9cf5e8c677e'; did resume fail on the local but succeed on the cluster peer?
  Apr 11 06:01:15 svc.startd[83]: svc:/network/ntp:default: Method "exec /usr/lib/ak/svc/method/akntp start" failed with exit status 95.
  Apr 11 06:01:15 svc.startd[83]: network/ntp:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)
  Apr 11 06:01:47 svc.startd[83]: svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal TERM.
  Apr 11 06:02:17 svc.startd[83]: svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal TERM.
  Apr 11 06:02:47 svc.startd[83]: svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal TERM.
  Apr 11 06:02:47 svc.startd[83]: appliance/kit/http:default failed: transitioned to maintenance (see 'svcs -xv' for details)

 

The following error messages appear before dropping into the shell:

  node1:> ABORT: attempt to import nonexistent STMF object 'view/b0708cab-ee15-cffc-dcf3-c9cf5e8c677e'; did resume fail on the local but succeed on the cluster peer?
  aksh: uncaught internal exception: { akStack: { aks_content: [{ args: [], caller: }, { args: [], caller: }] },
  akWrapped: { stack: 'akshCall("utask.listTasks",[object Array])@:0\n()@akService.js:50\n()@akInterpreter.js:1879\n@/usr/lib/ak/js/shell/akLocore.js:629\n', lineNumber: 50, fileName: 'akService.js', message: 'couldn\'t make door call: Bad file number' }, message:'XML-RPC failed'}
  aksh-wrapper: FATAL: The appliance shell failed unexpectedly with error code 1.
  Dropping into failsafe shell ...
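
The `svcs -xv` hint in the log can be followed up from the failsafe shell. A minimal sketch, assuming the standard SMF tooling is available there; the `smf_log_path` helper is illustrative (not an appliance command) and simply encodes the SMF convention that slashes in the FMRI become dashes under /var/svc/log:

```shell
# Map an SMF FMRI to its default log file (SMF convention: strip the
# svc:/ scheme, then replace '/' with '-' under /var/svc/log).
smf_log_path() {
  fmri=${1#svc:/}
  echo "/var/svc/log/$(echo "$fmri" | tr '/' '-').log"
}

# From the failsafe shell one would typically run:
#   svcs -xv                                                  # explain failed services
#   tail -20 "$(smf_log_path svc:/appliance/kit/http:default)"
```

The log files for the failing services in the symptoms above (network/ntp, appliance/kit/http) usually show why the start method exited or was killed.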

 

Cause

AKD (the Appliance Kit Management Daemon) core dumped because the two cluster heads hold different stash (configuration) values for the same resource.

More details about the issue are described in the following two bugs:
Bug 16008643 ABORT: ATTEMPT TO IMPORT NONEXISTENT STMF OBJECT (a duplicate of Bug 15668474)
Bug 15668474 SUNBT6984151 AKD DYING DUE TO RM.INSERT() BEING HANDED NULL STMF VIEW STASH NODE
 
One cluster head core dumped and became unresponsive, which led to the other cluster head performing a takeover of all cluster resources.
 

Solution

If the appliance is running OS 8.6 or later, and akd on the cluster peer is down or the peer is powered off, switch the watchdog to warn-only mode before disabling or restarting akd on the surviving node:

    echo "watchdog_warn_only/W 1" | mdb -kw

Once the service actions are completed, if the watchdog mode was modified, switch it back to standard (panic) mode:

    echo "watchdog_warn_only/W 0" | mdb -kw

 

Under no circumstances should the tunable be left enabled permanently.
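
The two mdb writes above can be wrapped so the restore step is harder to forget. A hedged sketch; the `watchdog_mode` function and its `DRY_RUN` guard are illustrative additions, while the piped mdb command is exactly the one given in the steps above (OS 8.6 or later, root shell required):

```shell
# Toggle the akd watchdog between warn-only (1) and standard/panic (0) mode.
# With DRY_RUN=1 the command is only printed, not executed, so the sequence
# can be reviewed before touching the live kernel.
watchdog_mode() {
  mode=$1    # 1 = warn-only, 0 = standard (panic)
  cmd="echo \"watchdog_warn_only/W $mode\" | mdb -kw"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$cmd"
  else
    eval "$cmd"
  fi
}

# watchdog_mode 1   # before disabling or restarting akd
# ... service actions ...
# watchdog_mode 0   # always restore standard (panic) mode afterwards
```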

 

1.  Power off the node where akd crashes (the affected node).

2.  Restart akd on the other node (the healthy node).

3.  Power up the affected node.
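
The three steps above can be sketched as an ordered command sequence. The ILOM `stop`/`start /SYS` commands are standard Oracle ILOM CLI; the akd FMRI shown is the conventional appliance-kit service name and is an assumption here. The function below only prints the plan, so each step can be run deliberately on the correct node:

```shell
# Print the recovery sequence from the steps above; nothing is executed.
recovery_plan() {
  cat <<'EOF'
# Step 1 - on the AFFECTED node's ILOM: power it off
stop -f /SYS
# Step 2 - on the HEALTHY node's shell: restart akd
svcadm restart svc:/appliance/kit/akd:default
# Step 3 - on the AFFECTED node's ILOM: power it back on
start /SYS
EOF
}
```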

 

 

References

<BUG:15668474> - SUNBT6984151 AKD DYING DUE TO RM.INSERT BEING HANDED NULL STMF VIEW STASH NODE

  Copyright © 2018 Oracle, Inc.  All rights reserved.