
Asset ID: 1-72-1523629.1
Update Date: 2013-02-12
Keywords:

Solution Type: Problem Resolution Sure

Solution 1523629.1: Precautions while working with guest LDoms containing virtual devices on SPARC SuperCluster


Related Items
  • SPARC SuperCluster T4-4 Half Rack
  • SPARC SuperCluster T4-4
  • SPARC SuperCluster T4-4 Full Rack
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>SPARC SuperCluster>DB: SuperCluster_EST
  • _Old GCS Categories>ST>Server>Engineered Systems>SPARC SuperCluster>Generic Errors


This document highlights some of the DOs and DON'Ts when working with guest LDoms containing virtual disks on SPARC SuperCluster.

In this Document
Symptoms
Cause
Solution
References


Applies to:

SPARC SuperCluster T4-4 Full Rack - Version All Versions to All Versions [Release All Releases]
SPARC SuperCluster T4-4 Half Rack - Version All Versions to All Versions [Release All Releases]
SPARC SuperCluster T4-4 - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on SPARC (64-bit)

Symptoms

In SPARC SuperCluster configurations with more than two LDoms/domains, such as ConfigE/ConfigF of the SSC 1.0.1 software version, the first general purpose (GP) domain has virtual devices as its boot and boot-mirror disks. Rebooting any of the I/O domains causes the zpool of the first GP domain to go into a degraded state, and manual intervention is required to correct the zpool state. Rebooting the first GP domain while its zpool is degraded can corrupt the zpool data, because the mirrors are out of sync, and can eventually result in an unbootable or unstable domain.

The first GP domain has the following virtual disks:

root@ssccn1-app1 # echo | format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0d0 <SUN-DiskSlice-279GB cyl 32546 alt 2 hd 27 sec 668>
          /virtual-devices@100/channel-devices@200/disk@0
       1. c0d1 <SUN-DiskSlice-279GB cyl 32546 alt 2 hd 27 sec 668>
          /virtual-devices@100/channel-devices@200/disk@1

When one of its I/O provider domains is rebooted, the loss of access to one of its root drives results in the following messages on the console:

The message below indicates that one of the I/O domains might be down, which makes the vdisk (vdisk@0) sourced from that domain inaccessible.

Apr 17 18:30:01 ssccn1-app1 vdc: NOTICE: vdisk@0 disk access failed

The message below indicates that the I/O domain is back up and running, so the vdisk sourced from that domain is accessible again.

Apr 17 18:30:27 ssccn1-app1 vdc: NOTICE: vdisk@0 disk access recovered
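
If the console was not being watched at the time, the same vdc notices can normally be reviewed from the guest domain's system log. A minimal sketch, assuming default Solaris syslog settings:

root@ssccn1-app1 # grep "vdc:" /var/adm/messages
Apr 17 18:30:01 ssccn1-app1 vdc: NOTICE: vdisk@0 disk access failed
Apr 17 18:30:27 ssccn1-app1 vdc: NOTICE: vdisk@0 disk access recovered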

Logging in to the first GP domain, we see that the root pool (BIrpool-2) is degraded because one of the mirror devices is 'UNAVAIL', which is the result of the reboot of the I/O domain that sources the vdisk (c0d0s0).

ssccn1-app1 console login: root
Password:

root@ssccn1-app1 # zpool status
  pool: BIrpool-2
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scan: resilvered 8.82G in 0h6m with 0 errors on Tue Apr 17 17:43:09 2012
config:

        NAME         STATE     READ WRITE CKSUM
        BIrpool-2    DEGRADED     0     0     0
          mirror-0   DEGRADED     0     0     0
            c0d1s0   ONLINE       0     0     0
            c0d0s0   UNAVAIL      4   319     0  cannot open

errors: No known data errors

The disk remains in this state until a) the serving domain is back up, and b) you manually repair the zpool using 'zpool clear'. If you do not manually repair the zpool, the disk will stay in this state and receive no further updates; it will remain a stale mirror indefinitely.
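
Once the serving I/O domain is back up, the manual repair is a short console sequence. A minimal sketch, assuming the pool name BIrpool-2 and the device names from the output above:

root@ssccn1-app1 # zpool status -x              # identify which pool is degraded
root@ssccn1-app1 # zpool clear BIrpool-2        # clear the errors; ZFS reattaches c0d0s0 and resilvers it
root@ssccn1-app1 # zpool status BIrpool-2       # wait until the resilver completes and both sides show ONLINE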

If all three domains are rebooted simultaneously, the primary domain will serve its vdisk before the last domain is able to serve its vdisk. When the middle domain boots from c0d0 (the stale mirror), the up-to-date mirror on c0d1 is not visible yet, so ZFS cannot know that a more up-to-date mirror exists. It mounts the old mirror and continues as if everything were OK, effectively having jumped back in time. When the last domain becomes available and its mirror is attached, that mirror simply looks "different", so ZFS resilvers it from the stale side and all of the later data is lost.
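
To avoid booting from a stale mirror, confirm from the control (primary) domain that the serving domains are active, and from the guest that the pool is healthy, before the GP domain itself is rebooted. A minimal sketch; the prompts and host names are illustrative:

root@ssccn1 # ldm list                  # in the control domain: serving domains should show STATE "active"
root@ssccn1-app1 # zpool status -x      # in the GP domain: should report "all pools are healthy"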

Cause

In SPARC SuperCluster configurations with more than two LDoms, such as ConfigE/ConfigF of the SSC 1.0.1 software version, the first general purpose (GP) domain has virtual devices as its boot and boot-mirror disks. This is by design, since this LDom only owns a root complex with no physical disks.

We have a max of 4 LDoms/domains per T4-4 node. Each domain owns at least one Root Complex (RC).

There are four Root Complexes (RC) on each T4-4 node (their ownership can be confirmed with the ldm listing sketched after this list):

  • RC0 - pci@400 - owns 4 physical disks
  • RC1 - pci@500 - No disks owned
  • RC2 - pci@600 - No disks owned
  • RC3 - pci@700 - has 2 physical disks and 2 SSDs
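
A minimal sketch of that check from the control domain; the exact columns and bus names in the output vary with the LDoms software version:

root@ssccn1 # ldm list-io               # lists each PCIe bus and the domain that currently owns it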

There are four possible LDOM types in SPARC SuperCluster:

  • LDOM for an Oracle DB stack running on Solaris 11 (2-PCI root complexes minimum)
  • LDOM for a General Purpose stack running on Solaris 11 (1-PCI root complex minimum)
  • LDOM for a General Purpose stack running on Solaris 10 (1-PCI root complex minimum)
  • LDOM for a Middleware stack running on Solaris 11 (1-PCI root complex maximum)

In a SPARC SuperCluster LDom/domain configuration with more than two LDoms/domains, RC1 (pci@500) and RC2 (pci@600) are assigned disk slices from the other root complexes (RC0 and RC3) to act as their boot and boot-mirror virtual disks.

For example, in Config F on SPARC SuperCluster we have:

  • Primary LDom (11gR2) has two root complexes, RC0 and RC1
  • First General Purpose domain has RC2 root complex
  • Second General Purpose domain has RC3 root complex

Here the first general purpose domain, which owns only the RC2 root complex, uses virtual disks sourced from RC0 and RC3 as its boot and boot-mirror disks.
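
Which service domains actually export those slices to the first GP domain can be verified from the control domain. A minimal sketch, with illustrative domain names (the real names are site-specific):

root@ssccn1 # ldm list -o disk primary         # VDS volumes exported by the primary domain (RC0 side)
root@ssccn1 # ldm list -o disk ssccn1-app2     # VDS volumes exported by the second GP domain (RC3 side)
root@ssccn1 # ldm list -o disk ssccn1-app1     # virtual disks consumed by the first GP domain (RC2)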

 

Solution

To avoid this issue:

  1. Before performing any OS work on a SuperCluster domain (such as QMU installation and reboots), always check "zpool status" to ensure the root zpool is in a healthy state. If the zpool is degraded, run "zpool clear" to fix the pool state.
  2. When rebooting domains on a SuperCluster, always do it sequentially and in order, so that the middle (diskless) domain(s) do not lose both of their mirrors simultaneously, and always run "zpool clear" afterward to ensure the zpool has recovered. Refer to MOS note 1487791.1, How to cleanly shutdown and startup a SPARC SuperCluster T4-4.
  3. A new tool called ssctuner, delivered through the exa-family repository, is available with the January QMU. One of its functions is to check the health of the root zpools every two minutes and run "zpool clear" whenever it detects a faulted vdisk, so the administrator does not have to remember to do it manually when the other domains are rebooted (see the console sketch after this list).
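
Putting the steps together, the pre-work check and the ssctuner verification are only a few commands. A minimal sketch, assuming the pool name BIrpool-2 and a Solaris 11 domain; the exact ssctuner package and SMF service names depend on the QMU level:

root@ssccn1-app1 # zpool status -x              # step 1: must report "all pools are healthy" before any OS work
root@ssccn1-app1 # zpool clear BIrpool-2        # if it is degraded, clear it and let the resilver finish first
root@ssccn1-app1 # pkg list | grep -i ssctuner  # step 3: confirm the ssctuner tool from the exa-family repository is installed
root@ssccn1-app1 # svcs -a | grep -i ssctuner   #         and that its SMF service is online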

References

<NOTE:1487791.1> - How to cleanly shutdown and startup a SPARC SuperCluster T4-4
<BUG:15786278> - SUNBT7162378 REBOOTING SUPERCLUSTER LDOM'S CAN LEAVE AN LDOM UNBOOTABLE (CONFIGE
<BUG:15788025> - SUNBT7164499 VDISK DRIVER SHOULD SEND DISK OFFLINE/ONLINE SYSTEM EVENTS

Attachments
This solution has no attachment