Asset ID: 1-72-2347792.1
Update Date: 2018-01-11
Keywords:
Solution Type
Problem Resolution Sure
Solution 2347792.1: Oracle ZFS Storage Appliance: AKD hang after http/akhttpd service methods timeout
Related Items
- Sun ZFS Storage 7420
- Oracle ZFS Storage ZS5-2
- Oracle ZFS Storage ZS3-2
- Oracle ZFS Storage ZS4-4
- Oracle ZFS Storage ZS5-4
- Oracle ZFS Storage ZS3-4
- Sun ZFS Storage 7120
- Sun ZFS Storage 7320
- Oracle ZFS Storage ZS3-BA
Related Categories
- PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: ZS-ES
Created from <SR 3-16514036271>
Applies to:
Oracle ZFS Storage ZS5-4 - Version All Versions and later
Oracle ZFS Storage ZS5-2 - Version All Versions and later
Oracle ZFS Storage ZS4-4 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
7000 Appliance OS (Fishworks)
Symptoms
We are unable to access the ZFSSA through the BUI, and the shell prompt shows the following:
Last login: Thu Dec 28 10:45:09 2017 from 10.254.172.40
Waiting for the appliance shell to start ...
The appliance shell is taking longer than usual to start.
Press Ctrl-C to exit or wait 45 seconds for the emergency shell.
In addition, many replication actions are failing.
On the EMOC dashboard of this Exalogic, we get an SMF-8000-YX alert related to the storage nodes.
A TSC engineer joined a WebEx session:
- AKD appeared hung on both cluster heads
- Took Head02 'down'
- Attempted to restart AKD on Head01 ... AKD would not 'die'
- Customer shut down all application servers
- Head01 was then rebooted and came fully online
- Head02 was then booted and successfully re-joined the cluster
- Customer restarted all application servers
=> All now OK/online.
Head 01
## debug.sys
Dec 28 00:03:07 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 942278.
Dec 28 00:03:08 el51-sn01 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:05:09 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 942279.
Dec 28 00:05:10 el51-sn01 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:07:11 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 942280.
Dec 28 00:07:11 el51-sn01 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:07:11 el51-sn01 svc.startd[118]: [ID 748625 daemon.error] appliance/kit/http:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Dec 28 14:55:47 el51-sn01 svc.startd[118]: [ID 694882 daemon.notice] instance svc:/system/console-login:default exited with status 1
Dec 28 15:01:37 el51-sn01 svc.startd[118]: [ID 694882 daemon.notice] instance svc:/system/console-login:default exited with status 1
Dec 29 13:37:19 el51-sn01 reboot: [ID 330035 auth.crit] initiated by root on /dev/pts/1
Dec 29 13:37:26 el51-sn01 in.mpathd[13507]: [ID 758932 daemon.warning] Disabling IP multipathing failure and repair detection with IPMP interfaces configured.
Dec 29 13:37:26 el51-sn01 rpcbind: [ID 851619 daemon.notice] rpcbind terminating on signal TERM
Dec 29 13:37:34 el51-sn01 genunix: [ID 672855 kern.notice] syncing file systems...
Dec 29 13:37:35 el51-sn01 genunix: [ID 904073 kern.notice] done
## appliance-kit-http:default.log
[ Dec 28 00:01:06 Stopping because service restarting. ]
[ Dec 28 00:01:06 Executing stop method ("exec /usr/lib/ak/svc/method/akhttpd stop"). ]
[ Dec 28 00:01:07 Method "stop" exited with status 0. ]
[ Dec 28 00:01:07 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:03:07 Method or service exit timed out. Killing contract 942278. ]
[ Dec 28 00:03:08 Method "start" failed due to signal KILL. ]
[ Dec 28 00:03:08 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:05:09 Method or service exit timed out. Killing contract 942279. ]
[ Dec 28 00:05:10 Method "start" failed due to signal KILL. ]
[ Dec 28 00:05:10 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:07:11 Method or service exit timed out. Killing contract 942280. ]
[ Dec 28 00:07:11 Method "start" failed due to signal KILL. ]
[ Dec 29 13:40:15 Enabled. ]
[ Dec 29 13:41:10 Rereading configuration. ]
[ Dec 29 13:43:19 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 29 13:43:23 Method "start" exited with status 0. ]
Head 02
## debug.sys
Dec 28 00:03:05 el51-sn02 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 850988.
Dec 28 00:03:07 el51-sn02 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:05:08 el51-sn02 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 850989.
Dec 28 00:05:08 el51-sn02 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:07:09 el51-sn02 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 850991.
Dec 28 00:07:10 el51-sn02 svc.startd[118]: [ID 636263 daemon.warning] svc:/appliance/kit/http:default: Method "exec /usr/lib/ak/svc/method/akhttpd start" failed due to signal KILL.
Dec 28 00:07:10 el51-sn02 svc.startd[118]: [ID 748625 daemon.error] appliance/kit/http:default failed: transitioned to maintenance (see 'svcs -xv' for details)
Dec 29 09:56:24 el51-sn02 reboot: [ID 330035 auth.crit] initiated by root on /dev/console
Dec 29 09:56:31 el51-sn02 syslogd: going down on signal 15
Dec 29 09:56:37 el51-sn02 genunix: [ID 672855 kern.notice] syncing file systems...
Dec 29 09:56:38 el51-sn02 genunix: [ID 904073 kern.notice] done
## appliance-kit-http:default.log
[ Dec 28 00:01:04 Stopping because service restarting. ]
[ Dec 28 00:01:04 Executing stop method ("exec /usr/lib/ak/svc/method/akhttpd stop"). ]
[ Dec 28 00:01:05 Method "stop" exited with status 0. ]
[ Dec 28 00:01:05 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:03:05 Method or service exit timed out. Killing contract 850988. ]
[ Dec 28 00:03:07 Method "start" failed due to signal KILL. ]
[ Dec 28 00:03:07 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:05:08 Method or service exit timed out. Killing contract 850989. ]
[ Dec 28 00:05:08 Method "start" failed due to signal KILL. ]
[ Dec 28 00:05:08 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 28 00:07:09 Method or service exit timed out. Killing contract 850991. ]
[ Dec 28 00:07:10 Method "start" failed due to signal KILL. ]
[ Dec 29 13:54:22 Enabled. ]
[ Dec 29 13:55:10 Rereading configuration. ]
[ Dec 29 13:55:58 Executing start method ("exec /usr/lib/ak/svc/method/akhttpd start"). ]
[ Dec 29 13:56:03 Method "start" exited with status 0. ]
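The signature in the logs above is the same on both heads: three consecutive start-method timeouts for svc:/appliance/kit/http:default, followed by a transition to maintenance. A check for that pattern can be sketched in Python as below. This is a minimal illustration, not an Oracle-supplied tool: the function name summarize_smf_timeouts and both regular expressions are assumptions written to match the debug.sys lines quoted in this note, and other message formats may need different patterns.

```python
import re
from collections import defaultdict

# Hypothetical patterns, written against the debug.sys lines quoted in this note.
TIMEOUT_RE = re.compile(
    r"svc\.startd\[\d+\]: \[ID \d+ daemon\.warning\] "
    r"(?P<fmri>svc:/\S+): Method or service exit timed out\. "
    r"Killing contract (?P<contract>\d+)\."
)
MAINT_RE = re.compile(
    r"svc\.startd\[\d+\]: \[ID \d+ daemon\.error\] "
    r"(?P<name>\S+) failed: transitioned to maintenance"
)


def summarize_smf_timeouts(lines):
    """Count method timeouts per service and note which services hit maintenance."""
    contracts = defaultdict(list)
    maintenance = set()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            contracts[match.group("fmri")].append(int(match.group("contract")))
            continue
        match = MAINT_RE.search(line)
        if match:
            # The maintenance message omits the "svc:/" prefix, so add it back.
            maintenance.add("svc:/" + match.group("name"))
    return {
        fmri: {
            "timeouts": len(killed),
            "contracts": killed,
            "maintenance": fmri in maintenance,
        }
        for fmri, killed in contracts.items()
    }


# Lines copied from the Head 01 debug.sys excerpt in this note.
SAMPLE = [
    "Dec 28 00:03:07 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 942278.",
    "Dec 28 00:05:09 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 942279.",
    "Dec 28 00:07:11 el51-sn01 svc.startd[118]: [ID 122153 daemon.warning] svc:/appliance/kit/http:default: Method or service exit timed out. Killing contract 942280.",
    "Dec 28 00:07:11 el51-sn01 svc.startd[118]: [ID 748625 daemon.error] appliance/kit/http:default failed: transitioned to maintenance (see 'svcs -xv' for details)",
]

if __name__ == "__main__":
    for fmri, info in summarize_smf_timeouts(SAMPLE).items():
        print(fmri, info)
```

The sketch only classifies log text that has already been collected (for example from a support bundle). It does not query the appliance and cannot by itself confirm that AKD is hung.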
Changes
None
Cause
This appears to be an instance of Bug 23107191 (the SMF refresh method timeout), which was closed as a duplicate of Bug 22474743 (SMF refresh times out due to net class xml-rpc call).
Bug 22474743 is fixed in Appliance Firmware Release 2013.1.7.0.
Solution
Upgrade to Appliance Firmware Release 2013.1.7.0 (or later).
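To decide whether a given head already carries the fix, the installed release can be compared with 2013.1.7.0. The Python sketch below shows one way to do that comparison. It assumes the installed version has already been written in the dotted release form used in this note (for example 2013.1.7.0), which may differ from the version string the appliance itself reports. The names parse_release and has_fix are hypothetical.

```python
def parse_release(version, width=4):
    """Turn a dotted release such as '2013.1.7.0' into a tuple of integers.

    Short versions are padded with zeros so that '2013.1.7' compares equal
    to '2013.1.7.0'.
    """
    parts = [int(part) for part in version.strip().split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


# Release in which Bug 22474743 is fixed, according to this note.
FIXED_IN = parse_release("2013.1.7.0")


def has_fix(installed):
    """Return True if the installed release is 2013.1.7.0 or later."""
    return parse_release(installed) >= FIXED_IN


if __name__ == "__main__":
    for release in ("2013.1.6.11", "2013.1.7.0", "2013.1.8.2"):
        print(release, "has fix" if has_fix(release) else "needs upgrade")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "2013.1.10.0" as older than "2013.1.7.0".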
Attachments
This solution has no attachment