Sun Storage 7000 Unified Storage System: How to identify a faulted cache disk in a cluster

Asset ID:	1-72-1557748.1
Update Date:	2018-01-05
Keywords:

Solution Type Problem Resolution Sure

Solution 1557748.1 : Sun Storage 7000 Unified Storage System: How to identify a faulted cache disk in a cluster

Applies to:

Sun ZFS Storage 7320 - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
Sun Storage 7720 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)
How to identify if a cache disk is really faulted and which head of a cluster a cache disk belongs to.

Symptoms

A 'faulted' status on a (readzilla) cache disk can happen on 7x10, 7x20, ZS3-x & ZS4-x clustered configurations.

Changes

Some takeover/failback operations have been performed earlier.

The cluster is working fine, but some or all of the cache disks remain in a faulted status.

This might cause performance issues on the cluster heads.

Cause

This applies when both heads of a ZFS Storage Appliance cluster have one or more readzillas.

When examining the zpool status from one of the cluster heads, we notice that some or all of the cache devices are faulted.

Solution

To determine the real status of a cache device, we should first determine which cluster head 'owns' which cache disks:

* The cache disk always belongs to the pool.

* The pool is owned by a node.

* The node currently owning the pool might or might not be the initialized owner of the pool.

To do this, we should check which node was initialized as the owner of each pool by using the CLI command:

NAS> configuration cluster resources show

The owner node is the node that SHOULD own the pool when the cluster is in a 'normal running' status.

To see which node currently owns each pool, use the CLI command:

NAS> configuration storage show

This will indicate which pool is currently owned by a node.

If the two outputs do not match (the head that is supposed to own the pool does not currently own it) and the pool contains cache disks, then those cache disks will appear as faulted.

Those disks must not be replaced. Instead, the pool must be rebalanced (using 'failback') so that the designated owner head owns its pool again.

Alternatively, the 'designated' owner can be changed to match the current owner by modifying the "owner" field in the configuration cluster menu of the BUI.

If the 'current' owner and the 'assigned' owner are the same, then the cache disks may really be faulted or failed. In this case, customers should open a Service Request with Oracle Support.
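
The decision described above can be summarized in a short Python sketch. It does not talk to the appliance or parse any CLI output. The administrator is assumed to have read the assigned owner from 'configuration cluster resources show' and the current owner from 'configuration storage show', and to enter those values by hand. The names PoolOwnership, recommend_action, "head-a", "head-b" and "pool-0" are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolOwnership:
    """Hand-entered facts about one pool (hypothetical structure)."""
    pool: str
    assigned_owner: str      # owner shown by 'configuration cluster resources show'
    current_owner: str       # owner shown by 'configuration storage show'
    has_faulted_cache: bool  # True if any cache device of the pool shows as faulted


def recommend_action(p: PoolOwnership) -> str:
    """Return the next step suggested by this note for one pool."""
    if not p.has_faulted_cache:
        # Nothing is reported as faulted, so there is nothing to do.
        return "none"
    if p.assigned_owner != p.current_owner:
        # The pool is not on its designated head: the fault is expected.
        # Do not replace the disks; fail back (or change the designated owner).
        return "failback"
    # The pool is on its designated head and the cache is still faulted.
    return "open-service-request"


if __name__ == "__main__":
    example = PoolOwnership("pool-0", "head-a", "head-b", True)
    print(example.pool, "->", recommend_action(example))
```

For the example values, the pool is currently owned by a head other than its assigned owner, so the sketch suggests a failback rather than a disk replacement.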

Attachments

This solution has no attachment