Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1021621.1
Update Date: 2018-04-17
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution  1021621.1 :   ZFS-8000-GH - Too many checksum errors on ZFS device  


Related Items
  • SPARC T8-1
  • SPARC T8-4
  • SPARC T7-4
  • SPARC M8-8
  • SPARC M7-8
  • Oracle SuperCluster M7 Hardware
  • Solaris Operating System
  • SPARC T8-2
  • Oracle SuperCluster M8 Hardware
  • SPARC T7-2
  • OpenSolaris Operating System
  • SPARC T7-1
  • SPARC M7-16
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun PSH

PreviouslyPublishedAs
ZFS-8000-GH


Applies to:

Oracle SuperCluster M8 Hardware
SPARC T7-4
SPARC T7-2
SPARC T7-1
SPARC T8-4
All Platforms

Purpose

This document provides additional information for message ID: ZFS-8000-GH

Details

Predictive Self-Healing Article

Too many checksum errors on ZFS device

Type

Fault
  fault.fs.zfs.vdev.checksum

Severity

Major

Description

The message ID ZFS-8000-GH indicates that a ZFS device has experienced too many checksum errors to continue and may be faulty.

Automated Response

The device has been marked as degraded. An attempt will be made to activate a hot spare if available.

Impact

The fault tolerance of the pool may be affected.

Suggested Action for System Administrator

 

A device within a ZFS pool experienced too many checksum errors. Use 'zpool status -x' to determine exactly which device failed and why:
 
# zpool status -x
  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress, 44.83% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        pool1         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            spare     DEGRADED     0     0     0
              disk1   DEGRADED     0     0   162  too many errors
              spare1  ONLINE       0     0     0
            disk2     ONLINE       0     0     0
        spares
          spare1      INUSE     currently in use

errors: No known data errors.

We can use FMA to get additional information:

# fmadm faulty
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Feb 18 09:56:24 d82d1716-c920-6243-e899-b7ddd386902e  ZFS-8000-GH    Major    

Fault class : fault.fs.zfs.vdev.checksum

Description : The number of checksum errors associated with a ZFS device
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/ZFS-8000-GH for more information.

Response    : The device has been marked as degraded.  An attempt
              will be made to activate a hot spare if available.

Impact      : Fault tolerance of the pool may be compromised.

Action      : Run 'zpool status -x' and replace the bad device.
 
This tells us everything we need to know. The device disk1 accumulated 162 checksum errors, enough that ZFS marked it as degraded and automatically substituted the hot spare spare1.
The spare is resilvering, and a full complement of data replicas will be available once the resilver completes. The entire process is automatic and fully observable.
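
The reading of the CKSUM column described above can also be scripted. The following is a minimal sketch, not part of the original article: the function name cksum_suspects is hypothetical, and the parsing assumes the column layout of the sample 'zpool status' output shown in this document. Other releases may format the table differently.

```shell
#!/bin/sh
# cksum_suspects (hypothetical helper): read 'zpool status' output on stdin
# and print "<device> <state> <cksum>" for every config row whose CKSUM
# count is a nonzero integer.
cksum_suspects() {
    awk '
        # The config table starts after the NAME STATE READ WRITE CKSUM header.
        $1 == "NAME" && $5 == "CKSUM" { in_config = 1; next }
        # The spares list and the trailing "errors:" line end the table.
        $1 == "spares" || $1 == "errors:" { in_config = 0 }
        # Device rows carry at least five fields; field 5 is the CKSUM count.
        in_config && NF >= 5 && $5 ~ /^[0-9]+$/ && $5 > 0 { print $1, $2, $5 }
    '
}
```

On a live system it would be fed from the command itself, for example: zpool status pool1 | cksum_suspects. With the sample output above it would report disk1 as DEGRADED with 162 checksum errors.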
 

To repair the pool, replace the physical device in the system and issue a 'zpool replace' command:

 
# zpool replace pool1 disk1

To replace the disk with a different disk, specify the replacement disk as the second argument to 'zpool replace':

# zpool replace pool1 disk1 new-disk

This begins resilvering data to the new device. Use 'zpool status' to monitor resilvering progress. When resilvering completes, any hot spare that was in use is detached and returns to the list of available spares, and the pool returns to a healthy state.
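
Monitoring resilver progress can be reduced to extracting the percentage from the status line. The sketch below is illustrative only: the function name resilver_progress is hypothetical, and the pattern assumes the single-line "scrub: resilver in progress, N% done" format shown in this article's sample output. Releases that report progress in a different layout would need an adjusted pattern.

```shell
#!/bin/sh
# resilver_progress (hypothetical helper): read 'zpool status' output on
# stdin and print the resilver completion percentage, or nothing if no
# resilver line in the assumed format is present.
resilver_progress() {
    # -n suppresses default output; the p flag prints only lines that matched.
    sed -n 's/^ *scrub: *resilver in progress, \([0-9.]*\)% done.*/\1/p'
}
```

A possible use is to poll it periodically, for example: zpool status pool1 | resilver_progress.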

If the device has been diagnosed in error, then run 'zpool clear pool1' to clear the errors and the associated status. If the errors persist, the device may be diagnosed as faulty again. Replace the disk as described above, or contact your service provider.
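
To see whether the fault is listed again after the errors have been cleared, the summary rows of 'fmadm faulty' can be filtered by message ID. This is a hedged sketch: the function name fault_events is hypothetical, and it assumes the summary row layout (date, time, EVENT-ID, MSG-ID, SEVERITY) shown in the sample output in this article.

```shell
#!/bin/sh
# fault_events (hypothetical helper): read 'fmadm faulty' output on stdin and
# print the EVENT-ID of each summary row whose MSG-ID equals the first argument.
fault_events() {
    awk -v msgid="$1" '
        # Summary rows look like: "Feb 18 09:56:24 <uuid> <MSG-ID> <SEVERITY>",
        # so field 4 is the event UUID and field 5 is the message ID.
        NF >= 6 && $5 == msgid && $4 ~ /^[0-9a-f-]+$/ { print $4 }
    '
}
```

For example: fmadm faulty | fault_events ZFS-8000-GH. Empty output means no fault with that message ID is currently listed.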


Product
Solaris Operating System

Product_uuid
596ffcfa-63d5-11d7-9886-ac816a682f92

References

HTTPS://BLOGS.ORACLE.COM/BOBN/ENTRY/ZFS_AND_FMA_TWO_GREAT

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.