Asset ID: 1021607.1
Update Date: 2018-03-27
Keywords:
Solution Type: Predictive Self-Healing Sure Solution
1021607.1: ZFS-8000-8A - A file or directory could not be read due to corrupt data
Related Items
- SPARC T8-1
- SPARC T3-1B
- SPARC SuperCluster T4-4 Full Rack
- SPARC T8-4
- SPARC T3-4
- SPARC M7-8
- SPARC M8-8
- SPARC T7-4
- Oracle SuperCluster M7 Hardware
- SPARC T4-2
- SPARC SuperCluster T4-4
- SPARC T8-2
- Solaris Operating System
- Oracle SuperCluster M8 Hardware
- SPARC T7-2
- SPARC T3-2
- SPARC T4-1
- SPARC T4-1B
- SPARC M7-16
- SPARC T7-1
- SPARC T3-1
- SPARC T4-4
Related Categories
- PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun PSH
Previously Published As: ZFS-8000-8A
Applies to:
SPARC M7-8
SPARC M7-16
Solaris Operating System - Version 10 3/05 and later
SPARC T8-1
SPARC T8-2
All Platforms
Purpose
Provide additional information for message ID: ZFS-8000-8A
Details
Predictive Self-Healing Article
ZFS-8000-8A - Corrupted data
Corrupted data
Type
- Fault
- fault.fs.zfs.object.corrupt_data
Severity
- Critical
Description
- A file or directory could not be read due to corrupt data.
Automated Response
- No automated response will be taken.
Impact
- The file or directory is unavailable.
Suggested Action for System Administrator
- Run 'zpool status -x' to determine which pool is damaged:
# zpool status -x
pool: test
state: ONLINE
status: One or more devices has experienced an error and no valid replicas
are available. Some filesystem data is corrupt, and applications
may have been affected.
action: Destroy the pool and restore from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
test ONLINE 0 0 2
c0t0d0 ONLINE 0 0 2
c0t0d1 ONLINE 0 0 0
errors: 1 data errors, use '-v' for a list
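When monitoring many systems, the 'zpool status -x' summary can also be checked from a script: when every pool is healthy the command prints "all pools are healthy". A minimal sketch, where the helper function name is illustrative and not a standard command:

```shell
# pools_healthy: reads 'zpool status -x' output on stdin and succeeds
# only when no pool reports a problem.
pools_healthy() {
  grep -q 'all pools are healthy'
}

# Usage sketch:
# zpool status -x | pools_healthy && echo "OK" || echo "check pools"
```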
- The checksum errors reported above can occur anywhere in the I/O data path once ZFS has submitted an I/O: Solaris target driver and HBA driver bugs, DMA transfers, HBA firmware, the I/O link, the SAN and its components (if involved, such as SAN switches), and the end target device (including the storage array controller, its firmware, and the devices that make up the LUNs presented as vdevs to ZFS pools). ZFS has no control over any of these, but it is the only conventional file system with the ability (thanks to built-in checksums) to detect and report such errors. Note that ZFS can report checksum errors even when there are no associated I/O errors (such as SCSI errors).
- Unfortunately, if the data cannot be repaired, the only way to recover it is to restore the pool from backup. Applications attempting to access the corrupted data will receive an error (EIO), and data may be permanently lost.
On recent versions of Solaris, the list of affected files can be retrieved by using the '-v' option to 'zpool status':
# zpool status -xv
pool: test
state: ONLINE
status: One or more devices has experienced an error and no valid replicas
are available. Some filesystem data is corrupt, and applications
may have been affected.
action: Destroy the pool and restore from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
test ONLINE 0 0 2
c0t0d0 ONLINE 0 0 2
c0t0d1 ONLINE 0 0 0
errors: Permanent errors have been detected in the following files:
/export/example/foo
Damaged files may or may not be removable, depending on the type of corruption. If the corruption is within the plain file data, the file should be removable. If the corruption is in the file metadata, the file cannot be removed, though it can be moved to an alternate location. In either case, the data should be restored from a backup source. It is also possible for the corruption to be within pool-wide metadata, resulting in entire datasets being unavailable. If this is the case, the only option is to destroy the pool and re-create the datasets from backup.
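The move-aside-and-restore step for a single affected file can be sketched as a small shell function. The function name and paths are hypothetical: the damaged path would come from 'zpool status -xv', and a known-good backup copy is assumed to exist.

```shell
# restore_file: move a damaged file aside, then restore a good copy.
#   $1 = damaged file (as listed by 'zpool status -xv')
#   $2 = known-good backup copy
restore_file() {
  damaged=$1
  backup=$2
  # Moving the file aside works even when metadata corruption
  # prevents removing it outright.
  mv "$damaged" "$damaged.corrupt" &&
  cp "$backup" "$damaged"
}

# Usage sketch (hypothetical paths):
# restore_file /export/example/foo /backup/export/example/foo
```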
Whether running "zpool scrub" can help via the ZFS "self-heal" feature depends on the pool configuration: replicated (mirror or raidz) versus non-replicated (a simple stripe, as above). All pool metadata is replicated even in non-replicated pool configurations, so it has a good chance of being recovered by self-healing when a scrub is performed. Similarly, on replicated pool configurations a scrub may be able to repair the checksum errors if a good copy is available.
It is therefore a good idea to run:
# zpool clear test
followed by:
# zpool scrub test
and then, once the scrub has finished:
# zpool status -v test
to see whether the scrub was able to "self-heal" any of the corrupted data.
If the errors remain "Permanent" even after the scrub, follow the action described in the 'zpool status -v' output: remove and/or restore the files in question to return the pool to a healthy state.
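The final check of that sequence can be scripted. The helper below uses an illustrative name (not a standard command); it reads 'zpool status -v' output on stdin and reports whether permanent errors survived the scrub:

```shell
# check_errors: reads 'zpool status -v' output on stdin; reports
# whether permanent errors remain after a scrub.
check_errors() {
  if grep -q 'Permanent errors'; then
    echo "errors remain: restore the listed files from backup"
  else
    echo "pool is healthy"
  fi
}

# Usage sketch (requires a real pool named 'test'):
# zpool clear test
# zpool scrub test
# ...wait for the scrub to finish...
# zpool status -v test | check_errors
```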
Details
- The message ID ZFS-8000-8A indicates that corrupted data exists in the current pool.
Product
Solaris Operating System
Product_uuid
596ffcfa-63d5-11d7-9886-ac816a682f92
Attachments
This solution has no attachment