Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2347792.1
Update Date:2018-01-11
Keywords:

Solution Type  Problem Resolution Sure

Solution  2347792.1 :   Oracle ZFS Storage Appliance: AKD hang after http/akhttpd service methods timeout  


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS5-2
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS4-4
  • Oracle ZFS Storage ZS5-4
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7120
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: ZS-ES

In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-16514036271>

Applies to:

Oracle ZFS Storage ZS5-4 - Version All Versions and later
Oracle ZFS Storage ZS5-2 - Version All Versions and later
Oracle ZFS Storage ZS4-4 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

The ZFSSA cannot be accessed through the BUI, and a login to the shell returns the following:

Last login: Thu Dec 28 10:45:09 2017 from 10.254.172.40
Waiting for the appliance shell to start ...
The appliance shell is taking longer than usual to start.
Press Ctrl-C to exit or wait 45 seconds for the emergency shell.


In addition, many replication actions are failing.

The EMOC dashboard of this Exalogic reports an SMF-8000-YX alert related to the storage nodes.

 

A TSC engineer joined a WebEx session. Observations and actions taken:

- AKD appeared hung on both cluster heads
- Head02 was taken down
- A restart of AKD was attempted on Head01, but AKD would not die
- Customer shut down all application servers
- Head01 was then rebooted and came fully online
- Head02 was then booted and successfully rejoined the cluster
- Customer restarted all application servers

=> All now OK/online.

 

Head 01

## debug.sys

Dec 28 00:03:07 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 942278.
Dec 28 00:03:08 el51-sn01 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:05:09 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 942279.
Dec 28 00:05:10 el51-sn01 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:07:11 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 942280.
Dec 28 00:07:11 el51-sn01 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:07:11 el51-sn01 svc.startd[118]: [ID 748625 daemon.error] appliance/kit/http:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Dec 28 14:55:47 el51-sn01 svc.startd[118]: [ID 694882 daemon.notice] instance svc:/system/console-login:default exited with status 1
Dec 28 15:01:37 el51-sn01 svc.startd[118]: [ID 694882 daemon.notice] instance svc:/system/console-login:default exited with status 1

Dec 29 13:37:19 el51-sn01 reboot: [ID 330035 auth.crit] initiated by root on /dev/pts/1
Dec 29 13:37:26 el51-sn01 in.mpathd[13507]: [ID 758932 daemon.warning] Disabling IP multipathing failure and repair detection with IPMP interfaces configured.
Dec 29 13:37:26 el51-sn01 rpcbind: [ID 851619 daemon.notice] rpcbind terminating on signal TERM
Dec 29 13:37:34 el51-sn01 genunix: [ID 672855 kern.notice] syncing file systems...
Dec 29 13:37:35 el51-sn01 genunix: [ID 904073 kern.notice] done

 

## appliance-kit-http:default.log

[ Dec 28 00:01:06 Stopping because service restarting. ]
[ Dec 28 00:01:06 Executing stop method ("exec /usr/lib/ak/svc/method/akhttpd stop"). ]
[ Dec 28 00:01:07 Method "stop" exited with status 0. ]
[ Dec 28 00:01:07 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:03:07 Method or service exit timed out. Killing contract 942278. ]
[ Dec 28 00:03:08 Method "start" failed due to signal KILL. ]
[ Dec 28 00:03:08 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:05:09 Method or service exit timed out. Killing contract 942279. ]
[ Dec 28 00:05:10 Method "start" failed due to signal KILL. ]
[ Dec 28 00:05:10 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:07:11 Method or service exit timed out. Killing contract 942280. ]
[ Dec 28 00:07:11 Method "start" failed due to signal KILL. ]

[ Dec 29 13:40:15 Enabled. ]
[ Dec 29 13:41:10 Rereading configuration. ]
[ Dec 29 13:43:19 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 29 13:43:23 Method "start" exited with status 0. ]

 

Head 02

## debug.sys

Dec 28 00:03:05 el51-sn02 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 850988.
Dec 28 00:03:07 el51-sn02 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:05:08 el51-sn02 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 850989.
Dec 28 00:05:08 el51-sn02 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:07:09 el51-sn02 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 850991.
Dec 28 00:07:10 el51-sn02 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:07:10 el51-sn02 svc.startd[118]: [ID 748625 daemon.error] appliance/kit/http:default failed: transitioned to maintenance (see 'svcs -xv' for details)

Dec 29 09:56:24 el51-sn02 reboot: [ID 330035 auth.crit] initiated by root on /dev/console
Dec 29 09:56:31 el51-sn02 syslogd: going down on signal 15
Dec 29 09:56:37 el51-sn02 genunix: [ID 672855 kern.notice] syncing file systems...
Dec 29 09:56:38 el51-sn02 genunix: [ID 904073 kern.notice] done
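
The pattern in the debug.sys excerpts above (three start-method timeouts followed by a transition to maintenance) can be checked for mechanically. The following is a minimal Python sketch, not a supported Oracle tool. The function name summarize and the regular expression are illustrative assumptions based only on the log lines quoted in this note.

```python
import re

# Matches the svc.startd syslog lines quoted in this note:
# "<Mon> <day> <time> <host> svc.startd[pid]: [ID n facility.level] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"svc\.startd\[\d+\]:\s+\[ID \d+ \S+\]\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count method timeouts per host and note when maintenance was reached."""
    summary = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a svc.startd line
        entry = summary.setdefault(
            m["host"], {"timeouts": 0, "maintenance_at": None}
        )
        msg = m["msg"]
        if "Method or service exit timed out" in msg:
            entry["timeouts"] += 1
        elif "transitioned to maintenance" in msg:
            entry["maintenance_at"] = m["ts"]
    return summary


# Sample input copied from the Head 01 debug.sys excerpt.
sample = [
    "Dec 28 00:03:07 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] "
    "svc:/appliance/kit/http:default: Method or service exit timed out. "
    "Killing contract 942278.",
    "Dec 28 00:05:09 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] "
    "svc:/appliance/kit/http:default: Method or service exit timed out. "
    "Killing contract 942279.",
    "Dec 28 00:07:11 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] "
    "svc:/appliance/kit/http:default: Method or service exit timed out. "
    "Killing contract 942280.",
    "Dec 28 00:07:11 el51-sn01 svc.startd[118]: [ID 748625 daemon.error] "
    "appliance/kit/http:default failed: transitioned to maintenance "
    "(see 'svcs -xv' for details)",
]

print(summarize(sample))
```

A host that shows three timeouts and a non-empty maintenance_at matches the symptom described here. The sketch only recognizes the message texts that appear in this note.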

 

## appliance-kit-http:default.log

[ Dec 28 00:01:04 Stopping because service restarting. ]
[ Dec 28 00:01:04 Executing stop method ("exec /usr/lib/ak/svc/method/akhttpd stop"). ]
[ Dec 28 00:01:05 Method "stop" exited with status 0. ]
[ Dec 28 00:01:05 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:03:05 Method or service exit timed out. Killing contract 850988. ]
[ Dec 28 00:03:07 Method "start" failed due to signal KILL. ]
[ Dec 28 00:03:07 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:05:08 Method or service exit timed out. Killing contract 850989. ]
[ Dec 28 00:05:08 Method "start" failed due to signal KILL. ]
[ Dec 28 00:05:08 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:07:09 Method or service exit timed out. Killing contract 850991. ]
[ Dec 28 00:07:10 Method "start" failed due to signal KILL. ]
[ Dec 29 13:54:22 Enabled. ]
[ Dec 29 13:55:10 Rereading configuration. ]
[ Dec 29 13:55:58 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 29 13:56:03 Method "start" exited with status 0. ]
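
The service logs above show each akhttpd start attempt being killed about two minutes after it began. As a rough illustration, the Python sketch below pairs each "Executing start method" entry with the following timeout entry and reports the elapsed seconds. The function name start_attempt_durations is hypothetical, and the year is an assumption because the service log omits it (the login banner in this note shows 2017).

```python
import re
from datetime import datetime

# Matches SMF service log entries of the form "[ Dec 28 00:01:07 <message> ]".
ENTRY_RE = re.compile(
    r"^\[ (?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) (?P<msg>.*) \]$"
)


def start_attempt_durations(lines, year=2017):
    """Return seconds between each start attempt and its timeout kill."""
    durations = []
    started = None
    for line in lines:
        m = ENTRY_RE.match(line.strip())
        if not m:
            continue
        # The log has no year, so the caller supplies one (assumption).
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        msg = m["msg"]
        if msg.startswith("Executing start method"):
            started = ts
        elif msg.startswith("Method or service exit timed out") and started:
            durations.append(int((ts - started).total_seconds()))
            started = None
    return durations


# Sample input copied from the Head 01 appliance-kit-http:default.log excerpt.
sample = [
    '[ Dec 28 00:01:07 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]',
    "[ Dec 28 00:03:07 Method or service exit timed out. Killing contract 942278. ]",
    '[ Dec 28 00:03:08 Method "start" failed due to signal KILL. ]',
    '[ Dec 28 00:03:08 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]',
    "[ Dec 28 00:05:09 Method or service exit timed out. Killing contract 942279. ]",
    '[ Dec 28 00:05:10 Method "start" failed due to signal KILL. ]',
    '[ Dec 28 00:05:10 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]',
    "[ Dec 28 00:07:11 Method or service exit timed out. Killing contract 942280. ]",
    '[ Dec 28 00:07:11 Method "start" failed due to signal KILL. ]',
]

print(start_attempt_durations(sample))  # prints [120, 121, 121]
```

Three consecutive attempts of roughly 120 seconds each, with no successful start in between, are what precede the transition to maintenance in both heads' logs.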

 

Changes

None

Cause

This appears to be an instance of Bug 23107191 (The SMF refresh method timeout), which was closed as a duplicate of Bug 22474743 (SMF refresh times out due to net class xml-rpc call).

Bug 22474743 is fixed in Appliance Firmware Release 2013.1.7.0.

 

Solution

Upgrade to Appliance Firmware Release 2013.1.7.0 (or later).
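
When checking several appliances, the "2013.1.7.0 (or later)" condition can be expressed as a numeric comparison of the dotted release string. The sketch below is a minimal illustration. It assumes the release is written in the dotted numeric form used in this note, and the helper names parse_release and has_fix are hypothetical. Other version string formats reported by the appliance would need to be mapped to this form first.

```python
def parse_release(release):
    """Turn a dotted release such as '2013.1.7.0' into a tuple of integers."""
    parts = [int(part) for part in release.strip().split(".")]
    # Pad short strings (for example '2013.1.7') to four components.
    parts += [0] * (4 - len(parts))
    return tuple(parts)


# Release in which Bug 22474743 is fixed, according to this note.
FIXED_IN = parse_release("2013.1.7.0")


def has_fix(release):
    """Return True if the release is 2013.1.7.0 or later."""
    return parse_release(release) >= FIXED_IN


# Illustrative values only; substitute the release of the appliance being checked.
for candidate in ("2013.1.6.11", "2013.1.7.0", "2013.1.8.2"):
    print(candidate, has_fix(candidate))
```

Comparing integer tuples avoids the pitfall of string comparison, where "2013.1.10.0" would sort before "2013.1.7.0".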

 

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.