Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2176634.1
Update Date: 2017-01-18

Solution Type: Problem Resolution Sure

Solution 2176634.1: Executing 'zpool replace' or 'zpool attach' May Generate a 'fault.fs.zfs.vdev.dtl' (ZFS-8000-QJ) FMA Fault


Related Items
  • Fujitsu M10-1
  • Solaris x64/x86 Operating System
  • Solaris Operating System
Related Categories
  • PLA-Support>Sun Systems>SAND>Kernel>SN-SND: Sun Kernel ZFS




In this Document
Symptoms
Changes
Cause
Solution
References


Applies to:

Solaris Operating System - Version 11 11/11 to 11.2 [Release 11.0]
Solaris x64/x86 Operating System - Version 11 11/11 to 11.2 [Release 11.0]
Fujitsu M10-1
Information in this document applies to any platform.

Symptoms

When the issue described in this document is encountered, FMA may generate a new fault shortly after executing 'zpool replace pool device new_device' or 'zpool attach pool device new_device'. The fault summary can be seen in /var/adm/messages:

Aug 25 12:30:08 hostname fmd: [ID 377184 daemon.error] SUNW-MSG-ID: ZFS-8000-QJ, TYPE: Fault, VER: 1, SEVERITY: Minor
Aug 25 12:30:08 hostname EVENT-TIME: Thu Aug 25 12:30:07 EDT 2016
Aug 25 12:30:08 hostname PLATFORM: unknown, CSN: unknown, HOSTNAME: hostname
Aug 25 12:30:08 hostname SOURCE: zfs-diagnosis, REV: 1.0
Aug 25 12:30:08 hostname EVENT-ID: 96d96404-35b6-4fef-8c73-a606186ccdbb
Aug 25 12:30:08 hostname DESC: Missing data on ZFS device 'id1,sd@n5000cca0703af118/a' in pool 'rpool'. Applications are unaffected if sufficient replicas exist.
Aug 25 12:30:08 hostname AUTO-RESPONSE: An attempt will be made automatically to recover the data. The device and pool will be degraded.
Aug 25 12:30:08 hostname IMPACT: The device and pool may continue functioning in degraded state until data is recovered.
Aug 25 12:30:08 hostname REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Run 'zpool status -lx' for more information. Please refer to the associated reference document at http://support.oracle.com/msg/ZFS-8000-QJ for the latest service procedures and policies regarding this diagnosis.

The fault details can be viewed using 'fmadm faulty':

$ fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 25 12:30:07 96d96404-35b6-4fef-8c73-a606186ccdbb  ZFS-8000-QJ    Minor   

Problem Status    : resolved
Diag Engine       : fmd / 1.2
System
    Manufacturer  : unknown
    Name          : unknown
    Part_Number   : unknown
    Serial_Number : unknown
    Host_ID       : 84f97beb

----------------------------------------
Suspect 1 of 1 :
   Fault class : fault.fs.zfs.vdev.dtl
   Certainty   : 100%
   Affects     : zfs://pool=66a5950dd1f75e4/vdev=9d8c69da15185db3/pool_name=rpool/vdev_name=id1,sd@n5000cca0703af118/a

   FRU
     Name             : "zfs://pool=66a5950dd1f75e4/vdev=9d8c69da15185db3/pool_name=rpool/vdev_name=id1,sd@n5000cca0703af118/a"
        Status        : repaired

Description : Missing data on ZFS device 'id1,sd@n5000cca0703af118/a' in pool
              'rpool'. Applications are unaffected if sufficient replicas
              exist.

Response    : An attempt will be made automatically to recover the data. The
              device and pool will be degraded.

Impact      : The device and pool may continue functioning in degraded state
              until data is recovered.

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
              Run 'zpool status -lx' for more information. Please refer to the
              associated reference document at
              http://support.oracle.com/msg/ZFS-8000-QJ for the latest service
              procedures and policies regarding this diagnosis.

Running 'fmdump -e' will show a DTL (Dirty Time Log) ereport:

$ fmdump -e
Aug 25 12:29:54.8026 ereport.fs.zfs.vdev.dtl

To get the event detail:

$ fmdump -eV -c "ereport.fs.zfs.vdev.dtl"
Aug 25 2016 12:29:54.802620422 ereport.fs.zfs.vdev.dtl
nvlist version: 0
        class = ereport.fs.zfs.vdev.dtl
        ena = 0xc0547f0e25404c01
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x66a5950dd1f75e4
                vdev = 0x9d8c69da15185db3
        (end detector)

        pool = rpool
        pool_guid = 0x66a5950dd1f75e4
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0x9d8c69da15185db3
        vdev_type = disk
        vdev_path = /dev/dsk/c0t5000CCA0703AF118d0s0 
        vdev_devid = id1,sd@n5000cca0703af118/a
        parent_guid = 0xcafd14107238129e
        parent_type = mirror
        prev_state = 0x0
        __ttl = 0x1
        __tod = 0x57bf1d02 0x2fd70406

  

Note: The disk (vdev_path) will be the new device being introduced into the zpool.
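Pulling the pool name and the new device's path out of a saved 'fmdump -eV' capture can be scripted; a minimal sketch, assuming the capture has been saved to a file (the file name is hypothetical, and the field layout matches the nvlist output shown above):

```shell
# Sample capture of the relevant nvlist fields (taken from the output above).
cat > /tmp/dtl-ereport.txt <<'EOF'
        pool = rpool
        vdev_type = disk
        vdev_path = /dev/dsk/c0t5000CCA0703AF118d0s0
        vdev_devid = id1,sd@n5000cca0703af118/a
EOF

# Print the pool name and the vdev_path (the newly introduced device).
awk -F' = ' '$1 ~ /^ *pool$/    { print "pool:", $2 }
             $1 ~ /vdev_path$/  { print "new device:", $2 }' /tmp/dtl-ereport.txt
```

On a live system the capture would come from 'fmdump -eV -c "ereport.fs.zfs.vdev.dtl" > /tmp/dtl-ereport.txt' instead of the heredoc.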

Reviewing 'zpool history -il' output for the pool in question will confirm which command was executed at or around the time of the DTL ereport and fault:

$ zpool history -il rpool
2016-08-25.12:03:16 zpool create -f -B rpool c0t5000CCA070384F60d0 [user root on solaris:global]
......
2016-08-25.12:29:54 [internal pool scrub txg:513] func=2 mintxg=3 maxtxg=516 logs=0 [user root on hostname]
2016-08-25.12:30:11 [internal vdev attach txg:518] attach vdev=/dev/dsk/c0t5000CCA0703AF118d0s0 to vdev=/dev/dsk/c0t5000CCA070384F60d0s0 [user root on hostname]
2016-08-25.12:30:11 zpool attach -f rpool c0t5000CCA070384F60d0 c0t5000CCA0703AF118d0 [user root on hostname:global]
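The timestamp correlation above can also be scripted against saved captures; a minimal sketch, assuming the history has been saved to a file and using the ereport minute plus the following minute as a rough window (file name and window size are assumptions):

```shell
# Sample saved 'zpool history -il' capture (entries from the output above).
cat > /tmp/rpool-history.txt <<'EOF'
2016-08-25.12:03:16 zpool create -f -B rpool c0t5000CCA070384F60d0 [user root on solaris:global]
2016-08-25.12:30:11 zpool attach -f rpool c0t5000CCA070384F60d0 c0t5000CCA0703AF118d0 [user root on hostname:global]
EOF

# The DTL ereport fired at 12:29:54, so look at the 12:29 and 12:30 minutes.
# This is a rough correlation by minute, not an exact timestamp join.
grep -E '\.12:(29|30):' /tmp/rpool-history.txt
```

Only the 'zpool attach' entry falls inside the window, confirming it as the command that triggered the ereport.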

 

Changes

The issue can be observed when attaching (mirroring) or replacing an existing vdev in a zpool.

Cause

ZFS uses DTLs (Dirty Time Logs) to track which devices are missing data, and automatically performs the necessary resilver operation. When a device is attached as a mirror or used to replace an existing pool vdev (virtual device), it is expected to contain no data, so ZFS initiates an internal DTL and resilver event. Due to 'Bug 19304740 - Events related to missing DTLs cause submirrors to be marked as degraded', the DTL was reported as a fault rather than being handled silently internally.

Solution

This issue is addressed in the following releases:

SPARC & x86 Platform

  • Solaris 11.3
  • Solaris 11.2 with SRU 14.5 or higher
Note: Solaris 10 is not affected by this issue.
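Whether an installed Solaris 11.2 system already carries the fix can be checked from the 'entire' package version; a minimal sketch, assuming the conventional 0.175.&lt;update&gt;.&lt;sru&gt;.0.&lt;respin&gt;.0 branch encoding (the sample VERSION string is an assumption; on a live system it comes from 'pkg info entire'):

```shell
# Sample branch version; 0.175.2.14.0.5.0 corresponds to 11.2 SRU 14.5.
version="0.5.11-0.175.2.14.0.5.0"
branch=${version#*-}                      # 0.175.2.14.0.5.0
update=$(echo "$branch" | cut -d. -f3)    # 2  -> Solaris 11.2
sru=$(echo "$branch" | cut -d. -f4)       # 14 -> SRU 14
respin=$(echo "$branch" | cut -d. -f6)    # 5  -> .5 respin

# Fixed in 11.3 (update > 2) or 11.2 SRU 14.5 and later.
if [ "$update" -gt 2 ] \
   || { [ "$update" -eq 2 ] && [ "$sru" -gt 14 ]; } \
   || { [ "$update" -eq 2 ] && [ "$sru" -eq 14 ] && [ "$respin" -ge 5 ]; }; then
    echo "fix present (11.2 SRU 14.5+ or 11.3+)"
else
    echo "pre-fix release: the DTL fault may be spurious"
fi
```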

 

References

<BUG:19304740> - EVENTS RELATED TO MISSING DTLS CAUSE SUBMIRRORS TO BE MARKED AS DEGRADED.

Attachments
This solution has no attachment