Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 2174141.1 : Oracle ZFS Storage Appliance: Restart of the Appliance Kit Daemon (akd) May Panic a ZFS Cluster Node
Created from <SR 3-12810845751>

Applies to:
Oracle ZFS Storage ZS4-4 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-4 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms
An Oracle ZFSSA cluster has transitioned into a degraded state. In this particular issue, three things will be true.
The reason for the degraded state may be known (for example, Node-B is turned off), or it may be unknown. You can see these cluster states from the CLI with the following commands.

ZFSSA:> configuration cluster show
ZFSSA:> configuration cluster links
clustron2:0/clustron_uart:0 = AKCIOS_TIMEDOUT

While in this cluster state, the management software on Node-A (the Appliance Kit Daemon, or akd) is restarted. If akd on Node-A is stopped (or restarted) while akd on Node-B is down or in an unknown state, or while the Node-B head is powered off, Node-A WILL PANIC to prevent a situation that might corrupt the data in the pool. In summary, if akd is stopped on one head while akd is not running correctly on the other head, the head will panic to prevent data corruption.

ZFSSA:> confirm maintenance system restart
If you have access to the console, you will see the following panic string in the console log.

panic[cpu11]/thread=ffffff005d0eec20: akd_failed:pools_imported;no_working_uarts

ffffff005d0eeb10 clustron:clustron_akd_watchdog+d2 ()

syncing file systems... 1 done
Changes
This new watchdog feature was added to AK 8.6.0 (2013.05.06.6.0) (2013.1.6) cluster systems. This feature is also present in AK 8.7.0, and will likely remain in future code releases.

Cause
Prior to the release of 2013.1.6, a serious data integrity issue could occur if the Appliance Kit Daemon software stopped while the cluster links were down, because both heads could attempt to write to the data pool(s), causing corruption. To avoid corrupting data, the head now panics when akd goes down while akd on the other head is not communicating, for whatever reason. This is expected behavior in AK 8.6 and later releases.
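The panic string shown in the console log (akd_failed:pools_imported;no_working_uarts) suggests that the watchdog acts only when three conditions hold at once. The following is a minimal Python sketch of that reading, not the actual clustron implementation: the `HeadState` type, its field names, and the `watchdog_should_panic` function are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class HeadState:
    """Hypothetical snapshot of one cluster head (names are illustrative)."""
    akd_running: bool      # is the Appliance Kit Daemon running on this head?
    pools_imported: bool   # does this head currently have data pools imported?
    working_uarts: int     # number of cluster links still communicating


def watchdog_should_panic(state):
    """Return True when all three conditions named in the panic string hold.

    Assumption: the conditions are read directly from the panic string
    'akd_failed:pools_imported;no_working_uarts'. The real watchdog logic
    is not documented here and may differ.
    """
    return (
        not state.akd_running
        and state.pools_imported
        and state.working_uarts == 0
    )
```

Under this reading, stopping akd while the cluster links are healthy would not trigger a panic, which matches the advice to repair the links before restarting akd.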
Solution
The solution is to fix the underlying cluster link issue. Whether that is as simple as powering up the other node or something more complicated that requires Oracle Support, check the cluster links before restarting the Appliance Kit software.
From the CLI, the links should report like this:

ZFSSA:> configuration cluster links
clustron2:0/clustron_uart:0 = AKCIOS_ACTIVE
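The link check can also be scripted against captured CLI output. The Python sketch below assumes only the "name = AKCIOS_STATE" line format shown in this document; the function names are hypothetical, and real output may list more links or use additional states not covered here.

```python
import re

# Matches lines such as "clustron2:0/clustron_uart:0 = AKCIOS_ACTIVE".
# Assumption: every link is reported on one line in this "name = STATE" form.
LINK_LINE = re.compile(r"^\s*(\S+)\s*=\s*(AKCIOS_\w+)\s*$")


def parse_cluster_links(output):
    """Map each link name to its reported state; other lines are ignored."""
    links = {}
    for line in output.splitlines():
        match = LINK_LINE.match(line)
        if match:
            links[match.group(1)] = match.group(2)
    return links


def links_all_active(output):
    """Return True only if at least one link was found and all are active.

    Finding no links at all is treated as NOT safe, so that empty or
    unexpected output never looks like a healthy cluster.
    """
    links = parse_cluster_links(output)
    return bool(links) and all(
        state == "AKCIOS_ACTIVE" for state in links.values()
    )
```

For example, passing the text captured from `configuration cluster links` to `links_all_active` returns False when any link reports AKCIOS_TIMEDOUT, in which case akd should not be restarted until the link issue is resolved.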
Note: this problem does NOT apply to Standalone Appliances (there will be no cluster commands) or Appliances where the cluster is not configured.

ZFSSA:> configuration cluster show
There are circumstances in which Oracle Support personnel are required to put a cluster into this known state. The most obvious reason is when a cleanup of cluster objects (also known as stash objects) is required. The steps should NOW be:
Without disabling the watchdog, node-B will panic as soon as akd is disabled.
Ref: Bug 24484064 ZFS appliance panics during akd restart if cluster links are down
Attachments
This solution has no attachment