Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1966470.1
Update Date:2018-05-25
Keywords:

Solution Type  Problem Resolution Sure

Solution  1966470.1 :   Oracle ZFS Storage Appliance: Unable to 'markrepaired' persistent FMA events for a replaced readzilla  


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS5-2
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7410 Unified Storage System
  • Oracle ZFS Storage ZS5-4
  • Sun ZFS Storage 7120
  • Oracle ZFS Storage ZS3-4
  • Sun Storage 7310 Unified Storage System
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution


Created from <SR 3-10173633549>

Applies to:

Oracle ZFS Storage ZS3-BA - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

Symptoms observable by the customer:

  •     Problem is reported in 'maintenance problems'
  •     Attempting to 'markrepaired' the problem rarely succeeds; when it does succeed, the problem re-appears at a later date.
  •     Performance issues (when readzillas are not configured into a pool)

    Reported alerts:

      ZFS device 'id1,sd@SATA_____TOSHIBA_THNSNC51________522S10AZTM6Z/a' in pool '<POOL>' failed.
      ZFS device 'id1,sd@SATA_____TOSHIBA_THNSNC51________42JS100LTM6Z/a' in pool '<POOL>' failed to open.

 

Problems are reported against a readzilla in the data pool:

NAS1:maintenance problems> show
Problems:

COMPONENT    DIAGNOSED           TYPE         DESCRIPTION
problem-000  2015-1-24 02:23:25  Major Fault  ZFS device 'id1,sd@SATA_____TOSHIBA_THNSNC51________522S10AZTM6Z/a' in pool 'exalogic' failed.
problem-001  2015-1-26 02:11:43  Major Fault  ZFS device 'id1,sd@SATA_____TOSHIBA_THNSNC51________522S10AZTM6Z/a' in pool 'exalogic' failed.
problem-002  2015-1-26 06:03:06  Major Fault  ZFS device 'id1,sd@SATA_____TOSHIBA_THNSNC51________522S10AZTM6Z/a' in pool 'exalogic' failed.


However, the readzilla with the reported serial number '522S10AZTM6Z' was successfully replaced several months ago.


 - The service processor (SP) does NOT show any faults:

  -> show faulty
  Target | Property | Value
  --------------------+------------------------+---------------------------------

 

FMA event:

NAS1# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jan 22 08:37:33 202f34d0-66af-edd0-de30-bdf8b8b0fd4d  ZFS-8000-D3    Major

Problem Status    : open
Diag Engine       : zfs-diagnosis / 1.0
System
    Manufacturer  : unknown
    Name          : unknown
    Part_Number   : unknown
    Serial_Number : unknown

System Component
    Manufacturer  : Oracle-Corporation
    Name          : SUN-FIRE-X4170-M2-SERVER
    Part_Number   : 7046051
    Serial_Number : 1225FMM0H1
    Host_ID       : 00000000

----------------------------------------
Suspect 1 of 1 :
   Fault class : fault.fs.zfs.device
   Certainty   : 100%
   Affects     : zfs://pool=26b2fed27b4904f7/vdev=4722f5983e1f4b4a/pool_name=exalogic/vdev_name=id1,sd@SATA_____TOSHIBA_THNSNC51________522S10AZTM6Z/a
   Status      : faulted and taken out of service

   FRU
     Name             : "zfs://pool=26b2fed27b4904f7/vdev=4722f5983e1f4b4a/pool_name=exalogic/vdev_name=id1,sd@SATA_____TOSHIBA_THNSNC51________522S10AZTM6Z/a"
        Status        : faulty

Description : ZFS device
              'id1,sd@SATA_____TOSHIBA_THNSNC51________522S10AZTM6Z/a' in pool
              'exalogic' failed.

Response    : No automated response will occur.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Replace the affected device. If you are a qualified service
              person, detailed information on this problem can be found at
              http://support.oracle.com/msg/ZFS-8000-D3
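
The 'Affects' and 'FRU' strings above are zfs:// FMRIs made up of key=value components (pool and vdev identifiers, pool name, vdev name). As an illustration only, the following Python sketch splits such a string into its parts. The function name `parse_zfs_fmri` is hypothetical and is not part of any appliance tooling; the parsing rules are inferred from the sample output in this document, and the sketch assumes that a path segment without '=' (such as the trailing '/a') belongs to the preceding value.

```python
def parse_zfs_fmri(fmri):
    """Split a zfs:// FMRI into a dict of its key=value components.

    Hypothetical helper: the format is inferred from the sample
    'fmadm faulty' output in this document, not from a specification.
    """
    prefix = "zfs://"
    if not fmri.startswith(prefix):
        raise ValueError("not a zfs:// FMRI: %r" % fmri)

    fields = {}
    last_key = None
    for part in fmri[len(prefix):].split("/"):
        if "=" in part:
            # A new key=value component.
            last_key, value = part.split("=", 1)
            fields[last_key] = value
        elif last_key is not None:
            # Assumption: a segment without '=' (e.g. the slice suffix 'a')
            # continues the previous value, so re-attach it with its '/'.
            fields[last_key] += "/" + part
    return fields


fmri = ("zfs://pool=26b2fed27b4904f7/vdev=4722f5983e1f4b4a/"
        "pool_name=exalogic/"
        "vdev_name=id1,sd@SATA_____TOSHIBA_THNSNC51________522S10AZTM6Z/a")
fields = parse_zfs_fmri(fmri)
for key in ("pool", "vdev", "pool_name", "vdev_name"):
    print(key, "=", fields[key])
```

The `vdev_name` component carries the device ID that also appears in the 'maintenance problems' description, which makes it the useful field for matching an FMA event against a physical device.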



NOTE: The readzilla serial number given in 'maintenance problems' ('522S10AZTM6Z' above) does not appear in the hardware inventory of either cluster node, which shows that this readzilla is NOT present in the current cluster configuration:

             NAME        STATE     MANUFACTURER            MODEL                   SERIAL
chassis-000  slce34sn01  ok        Oracle                  Sun ZFS Storage 7320    1225FMM0H1
........
disk-002     HDD 2       ok        TOSHIBA                 THNSNC512GBSJ           52VS104XTM6Z
disk-003     HDD 3       ok        TOSHIBA                 THNSNC512GBSJ           52VS1099TM6Z
disk-004     HDD 4       ok        TOSHIBA                 THNSNC512GBSJ           52VS105ITM6Z
disk-005     HDD 5       ok        TOSHIBA                 THNSNC512GBSJ           Z1IS102VTM6Z


             NAME        STATE     MANUFACTURER            MODEL                   SERIAL
chassis-000  slce34sn02  ok        Oracle                  Sun ZFS Storage 7320    1224FMM00J
........
disk-002     HDD 2       ok        TOSHIBA                 THNSNC512GBSJ           522S10ALTM6Z
disk-003     HDD 3       ok        TOSHIBA                 THNSNC512GBSJ           522S10AMTM6Z
disk-004     HDD 4       ok        TOSHIBA                 THNSNC512GBSJ           43AS101HTM6Z
disk-005     HDD 5       ok        TOSHIBA                 THNSNC512GBSJ           522S101FTM6Z
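
The comparison described in the NOTE above can be sketched in Python as follows. This is a minimal illustration, not a supported procedure: the helper names `serial_from_devid` and `inventory_serials` are hypothetical, the sketch assumes that the serial number is the last underscore-delimited token of the device ID before the slice suffix (as in the samples in this document), and it assumes that the SERIAL column is the last field of each 'disk-NNN' row.

```python
import re


def serial_from_devid(devid):
    """Return the trailing serial token of a device ID, or None.

    Assumption: the serial is the alphanumeric token between the last
    underscore and the '/<slice>' suffix, as in the samples above.
    """
    match = re.search(r"_([A-Za-z0-9]+)/[a-z]$", devid)
    return match.group(1) if match else None


def inventory_serials(listing):
    """Collect the last field (SERIAL) of every 'disk-NNN' row."""
    serials = set()
    for line in listing.splitlines():
        columns = line.split()
        if columns and columns[0].startswith("disk-"):
            serials.add(columns[-1])
    return serials


# Disk rows copied from the two hardware listings in this document.
listing = """\
disk-002     HDD 2       ok        TOSHIBA    THNSNC512GBSJ    52VS104XTM6Z
disk-003     HDD 3       ok        TOSHIBA    THNSNC512GBSJ    52VS1099TM6Z
disk-004     HDD 4       ok        TOSHIBA    THNSNC512GBSJ    52VS105ITM6Z
disk-005     HDD 5       ok        TOSHIBA    THNSNC512GBSJ    Z1IS102VTM6Z
disk-002     HDD 2       ok        TOSHIBA    THNSNC512GBSJ    522S10ALTM6Z
disk-003     HDD 3       ok        TOSHIBA    THNSNC512GBSJ    522S10AMTM6Z
disk-004     HDD 4       ok        TOSHIBA    THNSNC512GBSJ    43AS101HTM6Z
disk-005     HDD 5       ok        TOSHIBA    THNSNC512GBSJ    522S101FTM6Z
"""

devid = "id1,sd@SATA_____TOSHIBA_THNSNC51________522S10AZTM6Z/a"
serial = serial_from_devid(devid)
stale = serial not in inventory_serials(listing)
print(serial, "is NOT in the inventory" if stale else "is in the inventory")
```

A serial number that is missing from both listings, as here, suggests that the fault refers to a device that is no longer installed. Several installed serial numbers differ from the faulted one by a single character, so an exact comparison is safer than checking by eye.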



The FMA event re-appears after it has been manually removed as described in the resolution of MOS Document ID 1587750.1:

  Sun Storage 7000 Unified Storage System: Disk remains marked as 'faulty' in hardware view, despite several replacements done. (Unable to clear FMA event)

 

Cause

The system 'stash' contains bogus (stale) entries for readzillas.


Related bugs:

    CR 19639176 (cannot replace CACHE drive in 7420)
    CR 19832375 (Akd enumerates wrongly devices)

 

Solution

Recommended action for the customer:

Open a Service Request with Oracle Support so that Oracle Support Services can confirm this issue and then carry out the appropriate activities to resolve it.


Recommended actions for the Oracle Support engineer:

  Delete the 'bogus (stale)' stash entries for the affected readzillas - see the following wiki document:

    https://stbeehive.oracle.com/teamcollab/wiki/AmberRoadSupport:Solaris+shell+procedure+to+remove+bogus+stash+entries+for+readzillas


  If you cannot access this document, engage NAS Storage-TSC for assistance.

 

 

***Checked for relevance on 25-MAY-2018***


  Copyright © 2018 Oracle, Inc.  All rights reserved.