Oracle ZFS Storage Appliance: Cluster peer's readzillas are listed under problems after upgrade from 2011.1.x to 2013.1.x

Asset ID:	1-72-2010507.1
Update Date:	2018-05-30

Solution Type Problem Resolution Sure

Applies to:

Sun Storage 7310 Unified Storage System - Version All Versions and later
Oracle ZFS Storage ZS4-4 - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

After patching Exalogic to Jan-2015-PSU, the following problems were reported on the second ZFS head:

DATE               DESCRIPTION                                                                                              TYPE         PHONED HOME
2015-3-15 16:13:58 ZFS device 'id1,sd@SATA_____TOSHIBA_THNSNC51________23RS100VTM6Z/a' in pool 'exalogic' failed.          Major Fault  Never
2015-3-15 16:13:58 ZFS device 'id1,sd@SATA_____TOSHIBA_THNSNC51________23RS100STM6Z/a' in pool 'exalogic' failed.          Major Fault  Never
2015-3-15 16:13:58 ZFS device 'id1,sd@SATA_____TOSHIBA_THNSNC51________23RS1008TM6Z/a' in pool 'exalogic' failed.          Major Fault  Never
2015-3-15 16:13:57 ZFS device 'id1,sd@SATA_____TOSHIBA_THNSNC51________23RS100DTM6Z/a' in pool 'exalogic' failed.

The 'exalogic' ZFS pool, owned by 'el01sn01' (head 1), is fully online with no issues:

pool: exalogic
state: ONLINE
status: The pool is formatted using an older on-disk format.
action: Upgrade the pool using 'zpool upgrade'.

NAME                       STATE     READ WRITE CKSUM
exalogic                   ONLINE       0     0     0
mirror-0                 ONLINE       0     0     0
    c0t5000CCA03EAF49B0d0 ONLINE       0     0     0 1322FMD00G bay=10
    c0t5000CCA03EAFB7CCd0 ONLINE       0     0     0 1322FMD00G bay=9
mirror-1                 ONLINE       0     0     0
    c0t5000CCA03EAFF2B4d0 ONLINE       0     0     0 1322FMD00G bay=2
    c0t5000CCA03EAFF320d0 ONLINE       0     0     0 1322FMD00G bay=5
mirror-2                 ONLINE       0     0     0
    c0t5000CCA03EB3C494d0 ONLINE       0     0     0 1322FMD00G bay=8
    c0t5000CCA03EB3DE30d0 ONLINE       0     0     0 1322FMD00G bay=6
mirror-3                 ONLINE       0     0     0
    c0t5000CCA03EB06DACd0 ONLINE       0     0     0 1322FMD00G bay=13
    c0t5000CCA03EB39C28d0 ONLINE       0     0     0 1322FMD00G bay=7
mirror-4                 ONLINE       0     0     0
    c0t5000CCA03EB42AD4d0 ONLINE       0     0     0 1322FMD00G bay=18
    c0t5000CCA03EB42C30d0 ONLINE       0     0     0 1322FMD00G bay=15
mirror-5                 ONLINE       0     0     0
    c0t5000CCA03EB42D90d0 ONLINE       0     0     0 1322FMD00G bay=17
    c0t5000CCA03EB48E14d0 ONLINE       0     0     0 1322FMD00G bay=11
mirror-6                 ONLINE       0     0     0
    c0t5000CCA03EB48E98d0 ONLINE       0     0     0 1322FMD00G bay=14
    c0t5000CCA03EB48F40d0 ONLINE       0     0     0 1322FMD00G bay=12
mirror-7                 ONLINE       0     0     0
    c0t5000CCA03EB398D0d0 ONLINE       0     0     0 1322FMD00G bay=4
    c0t5000CCA03EB428C0d0 ONLINE       0     0     0 1322FMD00G bay=0
mirror-8                 ONLINE       0     0     0
    c0t5000CCA03EB42824d0 ONLINE       0     0     0 1322FMD00G bay=1
    c0t5000CCA03EB42970d0 ONLINE       0     0     0 1322FMD00G bay=3
logs
c0t5000A72030081B6Fd0    ONLINE       0     0     0 1322FMD00G bay=23
c0t5000A72030081B92d0    ONLINE       0     0     0 1322FMD00G bay=21
c0t5000A72030081B95d0    ONLINE       0     0     0 1322FMD00G bay=20
c0t5000A72030081C82d0    ONLINE       0     0     0 1322FMD00G bay=22
cache
c2t2d0                   ONLINE       0     0     0 1333FMM014 bay=2
c2t3d0                   ONLINE       0     0     0 1333FMM014 bay=3
c2t4d0                   ONLINE       0     0     0 1333FMM014 bay=4
c2t5d0                   ONLINE       0     0     0 1333FMM014 bay=5
c2t2d0                   UNAVAIL      0     0     0
c2t3d0                   UNAVAIL      0     0     0
c2t4d0                   UNAVAIL      0     0     0
c2t5d0                   UNAVAIL      0     0     0
spares
c0t5000CCA03EB49070d0    AVAIL    1322FMD00G bay=16
c0t5000CCA03EB49098d0    AVAIL    1322FMD00G bay=19

errors: No known data errors

The FMA events are reported on 'el01sn02', but they refer to the readzillas installed in 'el01sn01':

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Mar 15 14:13:57 84ae0767-e083-41d6-a09f-cc6cb2499065 ZFS-8000-D3    Major

Problem Status    : open
    Server_Name   : el01sn02
   Fault class : fault.fs.zfs.device
   Affects     : zfs://pool=64cdce9a513e717c/vdev=955ea6498c365966/pool_name=exalogic/vdev_name=id1,sd@SATA_____TOSHIBA_THNSNC51________23RS100DTM6Z/a

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Mar 15 14:13:58 70c5255a-ceb4-469b-a9ba-f846d4505b5d ZFS-8000-D3    Major

Problem Status    : open
    Server_Name   : el01sn02
   Fault class : fault.fs.zfs.device
   Affects     : zfs://pool=64cdce9a513e717c/vdev=925dcb6ff4e932e4/pool_name=exalogic/vdev_name=id1,sd@SATA_____TOSHIBA_THNSNC51________23RS100VTM6Z/a

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Mar 15 14:13:58 bd26440a-56db-63e6-918d-d620ae1ecb81 ZFS-8000-D3    Major

Problem Status    : open
    Server_Name   : el01sn02
   Fault class : fault.fs.zfs.device
   Affects     : zfs://pool=64cdce9a513e717c/vdev=3ada275798fec99b/pool_name=exalogic/vdev_name=id1,sd@SATA_____TOSHIBA_THNSNC51________23RS1008TM6Z/a

--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Mar 15 14:13:58 728c79b5-bbd8-c394-85d0-89ac6b05762e ZFS-8000-D3    Major

Problem Status    : open
    Server_Name   : el01sn02
   Fault class : fault.fs.zfs.device
   Affects     : zfs://pool=64cdce9a513e717c/vdev=559457cb0f886dfb/pool_name=exalogic/vdev_name=id1,sd@SATA_____TOSHIBA_THNSNC51________23RS100STM6Z/a

             NAME        STATE     MANUFACTURER            MODEL                   SERIAL
chassis-000 el01sn01    ok        Oracle                  Sun ZFS Storage 7320    1333FMM014
disk-002     HDD 2       ok        TOSHIBA                 THNSNC512GBSJ           23RS100DTM6Z    <<<<<<<<
disk-003     HDD 3       ok        TOSHIBA                 THNSNC512GBSJ           23RS100VTM6Z    <<<<<<<<
disk-004     HDD 4       ok        TOSHIBA                 THNSNC512GBSJ           23RS100STM6Z    <<<<<<<<
disk-005     HDD 5       ok        TOSHIBA                 THNSNC512GBSJ           23RS1008TM6Z    <<<<<<<<

             NAME        STATE     MANUFACTURER            MODEL                   SERIAL
chassis-000 el01sn02    ok        Oracle                  Sun ZFS Storage 7320    1333FMM00W
disk-002     HDD 2       ok        TOSHIBA                 THNSNC512GBSJ           23RS107QTM6Z
disk-003     HDD 3       ok        TOSHIBA                 THNSNC512GBSJ           23RS107TTM6Z
disk-004     HDD 4       ok        TOSHIBA                 THNSNC512GBSJ           23RS1080TM6Z
disk-005     HDD 5       ok        TOSHIBA                 THNSNC512GBSJ           23RS108GTM6Z
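
The comparison above (faulted device serials against each head's hardware listing) can be sketched in a few lines of Python. This is only an illustration, not an Oracle tool: the function names and the regular expression are assumptions made for this example, and the serial numbers are copied by hand from the listings above.

```python
import re

# Readzilla serial numbers per head, copied from the hardware listings above.
INVENTORY = {
    "el01sn01": {"23RS100DTM6Z", "23RS100VTM6Z", "23RS100STM6Z", "23RS1008TM6Z"},
    "el01sn02": {"23RS107QTM6Z", "23RS107TTM6Z", "23RS1080TM6Z", "23RS108GTM6Z"},
}

# Assumption: the serial is the last underscore-separated token of the devid,
# immediately before the slice suffix (for example "/a").
SERIAL_RE = re.compile(r"_([0-9A-Z]+)/[a-z]$")


def serial_from_vdev_name(vdev_name):
    """Return the serial embedded in a vdev_name, or None if not recognised."""
    match = SERIAL_RE.search(vdev_name)
    return match.group(1) if match else None


def owning_head(serial, inventory=INVENTORY):
    """Return the head whose chassis lists this serial, or None if unknown."""
    for head, serials in inventory.items():
        if serial in serials:
            return head
    return None


def reported_for_peer(vdev_name, reporting_head, inventory=INVENTORY):
    """True when the fault names a device installed in the other head."""
    owner = owning_head(serial_from_vdev_name(vdev_name), inventory)
    return owner is not None and owner != reporting_head
```

Calling reported_for_peer() with any of the four vdev_name values from the FMA events and 'el01sn02' as the reporting head returns True, which matches the pattern described in this note: the faults are raised on one head for readzillas that sit in its peer.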

Cause

This is a known issue:

Bug 19024367 (Peer node's RZ listed under problems and new problem entry is created on repair)
Bug 20073746 (Include fru in ereport if chassis information is absent)

This issue is hit consistently during upgrades from 2011.1.x to 2013.1.2.x builds.

This triggers a false alert on the Maintenance > Problems page when the cluster is in an 'akcs_owner' state, and can mislead administrators during an upgrade window into believing that there is a problem with the readzillas.

The issue involves the device 'vdev_devchassis/vdev_chassissn' fields when going from 2011.1.x to 2013.1.x.

Solution

Workaround: In the BUI, under 'Maintenance > Problems', 'markrepaired' any problems associated with the readzillas installed in the 'other' head.

NOTE: If this issue is seen at the 'intermediate' stage of a cluster upgrade (i.e. one head has been upgraded to the 'new' release and the 'other' head has not),
continue and upgrade the 'other' head to the 'new' release. Then 'markrepaired' any problems associated with the readzillas.

Resolution: Bug 19024367 is fixed in Appliance Firmware Release 2013.1.4.0. Bug 20073746 is fixed in Appliance Firmware Release 2013.1.5.2.
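
To check whether a given release should already contain both fixes, the release names can be compared field by field. The Python sketch below is illustrative only. It assumes the release is written in the dotted form used in this note (for example '2013.1.4.0'), not the longer internal version string an appliance may report, and it assumes that fixes are carried forward into later releases. The function names are made up for this example.

```python
# Fix releases as stated in the Resolution above.
FIXED_IN = {
    "19024367": "2013.1.4.0",
    "20073746": "2013.1.5.2",
}


def parse_release(release):
    """Turn a dotted release name such as '2013.1.4.0' into a tuple of ints."""
    return tuple(int(part) for part in release.split("."))


def fixes_included(release, fixed_in=FIXED_IN):
    """Map each bug number to True if 'release' is at or past its fix release.

    Assumption: a fix delivered in one release is present in all later ones.
    """
    current = parse_release(release)
    return {bug: current >= parse_release(fixed) for bug, fixed in fixed_in.items()}
```

For example, fixes_included('2013.1.4.5') reports the fix for Bug 19024367 as present and the fix for Bug 20073746 as absent, so false alerts could still be expected on that release.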

***Checked for relevance on 30-MAY-2018***

References

<BUG:19024367> - PEER NODE'S RZ LISTED UNDER PROBLEMS AND NEW PROBLEM ENTRY IS CREATED ON REPAIR
<BUG:20073746> - IGNORE EREPORTS IF CHASSIS INFORMATION IS ABSENT
