Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure Solution

2216800.1 : Oracle ZFS Storage Appliance: How to address False Clustron FMA Faults
An FMA fault on Clustron devices may be observed even though the hardware is functioning correctly. This document explains the background, occurrences, and workaround.
Applies to:
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
Oracle ZFS Storage ZS4-4 - Version All Versions and later
7000 Appliance OS (Fishworks)

Goal

To help identify faulted Clustron links, 8.6 introduced new functionality that provides some visibility into Clustron connectivity faults. In certain situations, however, the Clustron link is correctly marked offline but the subsequent online event is not recognized, so the link is incorrectly left marked as faulted. This can occur after a reboot, or at any other time that akd restarts.

Faults caused by this bug do not affect the functionality of the clustering subsystem. Genuine hardware faults are flagged in the same way, however, so do not assume that every such fault is due to this bug.

Solution

The following workaround is available for this issue. Before clearing the faults, the customer should first verify the state of the Clustron hardware with the “configuration cluster links” command and confirm that all 3 links are active. If they are active, the fault can be cleared with the “markrepaired” command under “maintenance problems”.

This is an example of a case where the hardware is good, and any fault may be cleared:

hostname:> configuration cluster links show
clustron2_embedded:0/clustron_uart:0 = AKCIOS_ACTIVE

In this example, the hardware is correctly faulted and should not be marked repaired:

hostname:> configuration cluster links show
clustron2_embedded:0/clustron_uart:0 = AKCIOS_TIMEDOUT
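The verification step above can be sketched as a small script that reads captured output of "configuration cluster links show" and reports whether clearing the fault looks safe. This is only an illustration under stated assumptions: the output is assumed to contain one "name = AKCIOS_*" line per link, as in the examples, and the function names and the way the output is captured are hypothetical rather than part of the appliance software.

```python
import re

# Matches lines such as "clustron2_embedded:0/clustron_uart:0 = AKCIOS_ACTIVE".
# Assumption: every link is reported on its own line in this "name = state" form.
LINK_LINE = re.compile(r"^\s*(\S+)\s*=\s*(AKCIOS_\w+)\s*$")


def parse_link_states(cli_output):
    """Return {link_name: state} for every 'name = AKCIOS_*' line found."""
    states = {}
    for line in cli_output.splitlines():
        match = LINK_LINE.match(line)
        if match:
            states[match.group(1)] = match.group(2)
    return states


def safe_to_mark_repaired(states, expected_links=3):
    """True only if all expected links were found and each is AKCIOS_ACTIVE.

    The default of 3 follows the text ("all 3 links are active").
    """
    return len(states) == expected_links and all(
        state == "AKCIOS_ACTIVE" for state in states.values()
    )
```

The check is deliberately conservative: a missing link or any state other than AKCIOS_ACTIVE (for example AKCIOS_TIMEDOUT) yields False, in which case the fault should be left in place and Oracle Support engaged.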
When the hardware is good, the 'markrepaired' functionality should be used to clear this issue:

hostname:> maintenance problems select problem-000 markrepaired
That should be done for problems of these types:

Communication with the cluster peer via the serial link is lost.
Communication with the cluster peer via the Ethernet port is lost.
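As a rough sketch of how the two problem types listed above could be selected, the snippet below builds the 'markrepaired' command line for each matching problem. The input, a list of (problem id, description) pairs, is a hypothetical structure assumed to have been collected by hand from the appliance; it is not an actual appliance API or output format. The commands should only be run once the cluster links have been verified as active.

```python
# Descriptions quoted from the article; only these fault types qualify.
CLUSTRON_FAULT_DESCRIPTIONS = (
    "Communication with the cluster peer via the serial link is lost.",
    "Communication with the cluster peer via the Ethernet port is lost.",
)


def markrepaired_commands(problems):
    """Build one 'markrepaired' CLI line per qualifying problem.

    `problems` is a hypothetical list of (problem_id, description) pairs,
    for example [("problem-000", "Communication with the cluster peer ...")].
    Problems with any other description are skipped.
    """
    commands = []
    for problem_id, description in problems:
        if description.strip() in CLUSTRON_FAULT_DESCRIPTIONS:
            commands.append(
                f"maintenance problems select {problem_id} markrepaired"
            )
    return commands
```

Matching on the exact description keeps unrelated faults from being cleared by accident; anything that does not match is left for normal fault handling.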
If the problem persists despite this action, the customer should gather a support bundle from each cluster node and Oracle Support will need to be engaged. If an AKCIOS_TIMEDOUT state is observed on any of the links, Oracle Support will also need to be engaged.
Bug 23092294 describes this issue (fixed in Micro Release 2013.1.6.13).