Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Predictive Self-Healing Sure Solution
1021621.1 : ZFS-8000-GH - Too many checksum errors on ZFS device

Previously Published As: ZFS-8000-GH

Applies to:
Oracle SuperCluster M8 Hardware
SPARC T7-4
SPARC T7-2
SPARC T7-1
SPARC T8-4
All Platforms

Purpose
This document provides additional information for message ID: ZFS-8000-GH

Details
Predictive Self-Healing Article
Too many checksum errors on ZFS device
Type
Severity
Description
Automated Response
Impact
Suggested Action for System Administrator
# zpool status -x
  pool: pool1
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver in progress, 44.83% done, 0h0m to go
config:

        NAME          STATE     READ WRITE CKSUM
        pool1         DEGRADED     0     0     0
          mirror      DEGRADED     0     0     0
            spare     DEGRADED     0     0     0
              disk1   DEGRADED     0     0   162  too many errors
              spare1  ONLINE       0     0     0
            disk2     ONLINE       0     0     0
        spares
          spare1      INUSE     currently in use

errors: No known data errors.
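When a pool has many devices, the CKSUM column can be scanned mechanically. The following is a minimal sketch, not part of the original article: `cksum_errors` is a hypothetical helper name, and it assumes the `zpool status` config table keeps the column order shown above (NAME, STATE, READ, WRITE, CKSUM).

```shell
# Print "<device> <cksum-count>" for every row of the config table whose
# CKSUM column is not zero. Reads `zpool status` text on stdin, so it can
# be run against live output or against a saved copy.
# Assumption: columns are NAME STATE READ WRITE CKSUM, as in the output above.
cksum_errors() {
    awk '
        # Header row of the config table: start looking at device rows.
        $1 == "NAME" && $NF == "CKSUM" { in_config = 1; next }
        # The spares list and the errors line end the device rows.
        $1 == "spares" || $1 == "errors:" { in_config = 0 }
        # Device row with a non-zero value in the fifth (CKSUM) column.
        in_config && NF >= 5 && $5 ~ /^[0-9]/ && $5 != "0" { print $1, $5 }
    '
}

# Typical use on a live system:
#   zpool status pool1 | cksum_errors
```

Run against the output above, the helper reports only `disk1` with its count of 162; the pool, mirror, and spare rows all carry a zero in that column and are skipped.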
We can use FMA to get additional information:
# fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME            EVENT-ID                             MSG-ID         SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 18 09:56:24 d82d1716-c920-6243-e899-b7ddd386902e ZFS-8000-GH    Major

Fault class : fault.fs.zfs.vdev.checksum
Description : The number of checksum errors associated with a ZFS device
              exceeded acceptable levels.  Refer to
              http://sun.com/msg/ZFS-8000-GH for more information.
Response    : The device has been marked as degraded.  An attempt
              will be made to activate a hot spare if available.
Impact      : Fault tolerance of the pool may be compromised.
Action      : Run 'zpool status -x' and replace the bad device.
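If several faults are outstanding, the EVENT-ID for a given message ID can be pulled out of this listing. The sketch below is illustrative only: `fault_ids` is a hypothetical helper name, and it assumes the summary row layout shown above (date, time, EVENT-ID, MSG-ID, severity).

```shell
# Print the EVENT-ID (UUID) of every fault summary row that carries the
# given MSG-ID. Reads `fmadm faulty` text on stdin.
# Assumption: a summary row has exactly six whitespace-separated fields:
#   Mon DD HH:MM:SS <event-id> <msg-id> <severity>
fault_ids() {
    awk -v id="$1" 'NF == 6 && $5 == id { print $4 }'
}

# Typical use on a live system:
#   fmadm faulty | fault_ids ZFS-8000-GH
```

For the listing above, this prints the single UUID `d82d1716-c920-6243-e899-b7ddd386902e`.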
This tells us everything we need to know. ZFS recorded 162 checksum errors on device disk1, enough for the device to be marked degraded and replaced automatically by the hot spare spare1. At the time of this output the spare was still resilvering; once that completes, the mirror again holds a full complement of data replicas. The entire process was automatic and fully observable.
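The `zpool status` action text offers two follow-up paths: clear the error counters with 'zpool clear', or swap the device with 'zpool replace'. The sketch below only prints the command for the chosen path so that it can be reviewed before anything is run. `next_steps` is a hypothetical helper, and the replacement device name used in the usage comment (`disk3`) is an assumption, not something taken from this article.

```shell
# Print (do not execute) the follow-up commands for a degraded device.
#   $1 = pool name, $2 = degraded device,
#   $3 = replacement device (optional; omit it to take the "clear" path)
next_steps() {
    pool=$1
    dev=$2
    new=${3:-}
    if [ -n "$new" ]; then
        # The device is judged bad: replace it with a new one.
        echo "zpool replace $pool $dev $new"
    else
        # The errors are judged transient: reset the counters, keep the device.
        echo "zpool clear $pool $dev"
    fi
    # In both cases, re-check pool health afterwards.
    echo "zpool status -x $pool"
}

# Typical use:
#   next_steps pool1 disk1          # clear path
#   next_steps pool1 disk1 disk3    # replace path (disk3 is a made-up name)
```

Keeping this as a dry run is a deliberate choice: whether a device with checksum errors should be cleared or replaced is a judgment about the hardware that the administrator has to make.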
References
HTTPS://BLOGS.ORACLE.COM/BOBN/ENTRY/ZFS_AND_FMA_TWO_GREAT

Attachments
This solution has no attachment