Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-1366035.1
Update Date:2018-05-01

Solution Type  Troubleshooting Sure

Solution  1366035.1 :   Oracle ZFS Storage Appliance: Troubleshooting Disk Drive Failures  


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage Appliance Racked System ZS5-4
  • Oracle ZFS Storage ZS5-2
  • Exalogic Elastic Cloud X4-2 Quarter Rack
  • Sun Storage 7110 Unified Storage System
  • Oracle ZFS Storage ZS3-2
  • OPC ZFS ZS5-ES Rack
  • Sun Storage 7210 Unified Storage System
  • Oracle ZFS Storage ZS4-4
  • Oracle ZFS Storage ZS5-4
  • Sun Storage 7410 Unified Storage System
  • Sun ZFS Storage 7120
  • Oracle ZFS Storage ZS3-4
  • Sun Storage 7310 Unified Storage System
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • PDIT Single Rack ZFS Storage ZS4-4
  • Sun ZFS Storage 7320
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS
  • _Old GCS Categories>Sun Microsystems>Storage - Disk>Unified Storage



In this Document
Purpose
Troubleshooting Steps
 1. What problem are you encountering?
 2. Verify the problems list from the Appliance
 3. Check against the following FMA faults
 4. Check drive status
 5. Disk fault matches DISK-8000-12
 6. Clear fault
 7. Check pool status
 8. Are the faulted drives located on the same tray?
 9. Did the drives fail in close succession?
 10. Check if the HW configuration is correct
 11. Re-seat the drive
References


Applies to:

Sun Storage 7210 Unified Storage System - Version All Versions and later
Exalogic Elastic Cloud X4-2 Quarter Rack - Version X4 to X4 [Release X4]
Oracle ZFS Storage Appliance Racked System ZS4-4 - Version All Versions and later
PDIT Single Rack ZFS Storage ZS4-4 - Version All Versions to All Versions [Release All Releases]
Sun Storage 7110 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Purpose

The purpose of this document is to troubleshoot disk drive failures on a Sun Storage 7000 ZFS Appliance.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Disk Storage ZFS Storage Appliance Community


NAS head revision : not dependent
BIOS revision : not dependent
ILOM revision : not dependent
JBODs Model : J4400|J4410|J4500|Oracle Storage DE2-24C
CLUSTER related : not dependent

Troubleshooting Steps

1. What problem are you encountering?

  • A drive has just failed: go to step 2.
  • Multiple drives have failed: go to step 7.
  • One or more drives were added to the appliance but were not recognized by the system: go to step 10.
  • A drive has been replaced and the new drive is not working: go to step 11.

2. Verify the problems list from the Appliance

Check the problems list

From CLI

maintenance problems show  


From BUI

  1. Click Maintenance
  2. Click Problems

From support bundle

cat /fm/fmadm.out

Found any faults?

  • If YES, proceed to step 3.
  • If NO, contact Oracle to investigate this issue.
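
When working from a support bundle, the check above can be scripted. The following is a minimal Python sketch, not an Oracle tool. It assumes that message IDs have the shape seen in this document (for example DISK-8000-0X). The function name find_fault_codes and the sample text are hypothetical and are used only for illustration; the sample is not real fmadm output.

```python
import re

# Assumed shape of an FMA message ID, based on the codes listed in this
# document (for example DISK-8000-5C or AK-8003-Y6).
MSG_ID = re.compile(r"\b[A-Z]+-[0-9A-Z]{4}-[0-9A-Z]{2}\b")

def find_fault_codes(text):
    """Return the distinct message IDs found in text, in order of first appearance."""
    seen = []
    for code in MSG_ID.findall(text):
        if code not in seen:
            seen.append(code)
    return seen

# Invented sample text for illustration only (not real fmadm output).
sample = (
    "DISK-8000-0X SMART health-monitoring firmware reported that a failure is imminent on disk.\n"
    "ZFS-8000-FD The number of I/O errors associated with the device has exceeded acceptable levels.\n"
    "DISK-8000-0X reported again for the same disk\n"
)

codes = find_fault_codes(sample)
print(codes)
```

An empty result corresponds to the "NO" branch above; any returned codes can be compared against the list in step 3.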

 

3. Check against the following FMA faults

For each faulted disk identified in the problems list, check whether it matches one of these FMA faults/problem descriptions:

Error Code     Description
DISK-8000-5C   The device has failed. The service may have been lost or degraded.
DISK-8000-0X   SMART health-monitoring firmware reported that a failure is imminent on disk.
DISK-8000-2J   The disk has failed.
DISK-8000-3E   The device has failed. The service may have been lost or degraded.
DISK-8000-4Q   SCSI fault for media.
DISK-8000-D5   SCSI disk in zpool has transient ZFS checksum fault.
DISK-8000-74   SCSI disk medium read fault.
DISK-8000-8D   SCSI disk medium write fault.
DISK-8000-CY   SCSI disk in zpool has ZFS checksum fault.
DISK-8000-6R   SCSI transport unstable fault.
ZFS-8000-LR    ZFS device failed to open. Please refer to Document 2133261.1.
ZFS-8000-FD    The number of I/O errors associated with the device has exceeded acceptable levels.
ZFS-8000-D3    Fault tolerance of the pool may be compromised.
AK-8000-H7     The disks contained within the enclosure cannot be used as part of a storage pool.
AK-8003-Y6     The JBOD is configured incorrectly.

Any match?

  • If YES, proceed to step 4.
  • If none of the above, but found DISK-8000-12, proceed to step 5.
  • If none of the above, contact Oracle to investigate the issue.
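
The decision above can be written down as a small lookup. This is an illustrative Python sketch only: the set of codes is copied from the table in this step, while the function name next_step and the returned strings are hypothetical.

```python
# Codes copied from the table in step 3 of this document.
STEP3_CODES = {
    "DISK-8000-5C", "DISK-8000-0X", "DISK-8000-2J", "DISK-8000-3E",
    "DISK-8000-4Q", "DISK-8000-D5", "DISK-8000-74", "DISK-8000-8D",
    "DISK-8000-CY", "DISK-8000-6R", "ZFS-8000-LR", "ZFS-8000-FD",
    "ZFS-8000-D3", "AK-8000-H7", "AK-8003-Y6",
}

def next_step(codes):
    """Map the message IDs seen for a disk to the next step of this document."""
    codes = set(codes)
    if codes & STEP3_CODES:
        # At least one code matches the table: check the drive status.
        return "step 4"
    if "DISK-8000-12" in codes:
        # No table match, but DISK-8000-12 is present: possible fan fault.
        return "step 5"
    return "contact Oracle"

print(next_step(["DISK-8000-2J"]))
print(next_step(["DISK-8000-12"]))
```

A match against the table takes priority over DISK-8000-12, which mirrors the order of the bullets above.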

 

4. Check drive status

Check if the status of the drive is "absent" or "removed".

From CLI

maintenance hardware show  


From BUI

  1. Click Maintenance
  2. Click Hardware
  3. Click the chassis of the disk

From support bundle

cat /hw/hw.aksh

What's the status of the drive?

  • If absent/faulted, replace the disk drive.
  • If ok, proceed to step 6.

 

5. Disk fault matches DISK-8000-12

We may be dealing with a fan fault instead of a drive fault.

Check the list of problems / FMA faults again for SENSOR-8000-26 events ("External sensors indicate that a fan is no longer operating correctly") against fans in the tray where the drive is located.
Also check whether any fans are in "Failed" or "Unavailable" status.

See Oracle ZFS Storage Appliance: Solaris Fault Manager received an event DISK-8000-12 (Doc ID 1966841.1) for more details.
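
As an illustration of the cross-check described in this step, the Python sketch below looks for a SENSOR-8000-26 event on the same tray as the drive. It assumes the problems list has already been noted by hand as (message ID, tray) pairs; that data layout, the tray labels and the function name fan_fault_suspected are hypothetical.

```python
def fan_fault_suspected(problems, drive_tray):
    """Return True if a SENSOR-8000-26 event is recorded for the drive's tray.

    problems: list of (msg_id, tray) pairs noted from the problems list
    (a hypothetical layout chosen for this sketch).
    """
    return any(
        msg_id == "SENSOR-8000-26" and tray == drive_tray
        for msg_id, tray in problems
    )

# Invented example data: one disk event and one fan event on the same tray.
problems = [
    ("DISK-8000-12", "tray-1"),
    ("SENSOR-8000-26", "tray-1"),
]
print(fan_fault_suspected(problems, "tray-1"))
```

The sketch covers only the event check; fans in "Failed" or "Unavailable" status still need to be checked separately.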

 

6. Clear fault

From BUI:

Maintenance -> Problems -> click the problem -> click Mark Repaired

From CLI:

maintenance problems select <problem-id> markrepaired

Did this clear the fault?

  • If YES, the problem was solved.
  • If NO, contact Oracle to investigate this issue.

 

7. Check pool status

From CLI

> configuration storage
> show

 

Example:

zs3-2-ftlauder-a:configuration storage> ls
Properties:
                          pool = pool-de2-24c4t
                        status = online
                        errors = 0
                       profile = mirror
                   log_profile = log_stripe
                 cache_profile = cache_stripe
                         scrub = scrub completed after 0h0m with 0 errors at 2015-12-22 14:26:55


From BUI

  1. Click Configuration
  2. Click Storage
  3. Check the status of the available pools

From support bundle

cat zfs/status.out

Example:

  pool: POOL501
 state: ONLINE
 scan: none requested
config:

    NAME                       STATE     READ WRITE CKSUM
    POOL501                    ONLINE       0     0     0
      mirror-0                 ONLINE       0     0     0
        c2t5000CCA03E1E376Cd0  ONLINE       0     0     0
        c2t5000CCA03E22A74Cd0  ONLINE       0     0     0
      mirror-1                 ONLINE       0     0     0
        c2t5000CCA03EAAAADCd0  ONLINE       0     0     0
        c2t5000CCA03EBB4504d0  ONLINE       0     0     0
      [...]

 

  • If all the pools are online, proceed to step 8.
  • If one or more pools are NOT online, contact Oracle to investigate this issue.
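
For a support bundle, the pool check can be sketched in Python as follows. This is a rough example, not an Oracle utility: it assumes the status text uses the "pool:" and "state:" lines shown in the example above, and the second pool (POOL502, DEGRADED) is invented to show the failing case. The function names are hypothetical.

```python
import re

def pool_states(status_text):
    """Return {pool name: state} from text laid out like the example above."""
    states = {}
    current = None
    for line in status_text.splitlines():
        m = re.match(r"\s*pool:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"\s*state:\s*(\S+)", line)
        if m and current is not None:
            states[current] = m.group(1)
    return states

def pools_not_online(status_text):
    """Return the names of pools whose state is anything other than ONLINE."""
    return [name for name, state in pool_states(status_text).items() if state != "ONLINE"]

# POOL501 is taken from the example above; POOL502 is invented for illustration.
sample = (
    "  pool: POOL501\n"
    " state: ONLINE\n"
    "  scan: none requested\n"
    "\n"
    "  pool: POOL502\n"
    " state: DEGRADED\n"
)
print(pool_states(sample))
print(pools_not_online(sample))
```

An empty list from pools_not_online corresponds to "all the pools are online".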

 

8. Are the faulted drives located on the same tray?

Get a list of the faulted drives by checking the problems list for disk issues. Write down each disk's location (tray and slot) and the time the fault was reported.

From CLI

maintenance problems show  


From BUI

  1. Click Maintenance
  2. Click Problems

From support bundle

cat /fm/fmadm.out

 

  • If the faulted drives are on the SAME tray, proceed to step 9.
  • If the faulted drives are on DIFFERENT trays, take each drive and analyze it separately. In order to do that, proceed to step 3.
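
Once the locations are written down, grouping them by tray makes the answer obvious. The Python sketch below is illustrative only; the (tray, slot) pairs, the tray labels and the function names are hypothetical.

```python
from collections import defaultdict

def group_by_tray(faulted):
    """Group (tray, slot) pairs into {tray: [slots]}."""
    trays = defaultdict(list)
    for tray, slot in faulted:
        trays[tray].append(slot)
    return dict(trays)

def all_on_same_tray(faulted):
    """Return True when every faulted drive sits in a single tray."""
    return len(group_by_tray(faulted)) == 1

# Invented example: two drives in one tray, one drive in another.
faulted = [("tray-1", 3), ("tray-1", 7), ("tray-2", 0)]
print(group_by_tray(faulted))
print(all_on_same_tray(faulted))
```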

 

9. Did the drives fail in close succession?

Compare the fault timestamps of the drives identified in step 8; the problems list shows when each fault was reported.

Did the faulted drives fail in close succession?

  • If YES, contact Oracle to investigate this issue.
  • If NO, take each drive and analyze it separately. In order to do that, proceed to step 3.
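
This document does not define how close "close succession" is. The Python sketch below therefore uses a one-hour window purely as an assumption; adjust it to your own judgement. It reports whether any two consecutive faults fall within that window. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# ASSUMPTION: this document gives no threshold; one hour is an arbitrary choice.
ASSUMED_WINDOW = timedelta(hours=1)

def failed_in_close_succession(fault_times, window=ASSUMED_WINDOW):
    """Return True if any two consecutive faults are at most `window` apart."""
    ordered = sorted(fault_times)
    return any(
        later - earlier <= window
        for earlier, later in zip(ordered, ordered[1:])
    )

# Invented timestamps: two faults 20 minutes apart, a third two days later.
times = [
    datetime(2018, 5, 1, 10, 0),
    datetime(2018, 5, 1, 10, 20),
    datetime(2018, 5, 3, 8, 0),
]
print(failed_in_close_succession(times))
```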

10. Check if the HW configuration is correct

Check that the new disks were inserted in the correct slots, that the disk type is supported by this appliance, and so on.

Please consult Sun Storage 7000 Unified Storage System: Quick Reference for ZFS Storage Appliance Hardware Configuration (Doc ID 1554743.1) for more details.

  • If the HW configuration is incorrect, make the necessary corrections or report the error to sales.
  • If the HW configuration is correct, contact Oracle to check if the new components are compatible with the current firmware version.

TSC ONLY

  • If the HW configuration is correct, check the system handbook (under the table of disk components) for the minimum ak version.

Does the appliance contain HDDs or SSDs that are not compatible with the current ak release?

 

11. Re-seat the drive

It may be that the disk did not make proper contact the first time. Try to re-seat the drive.

If the drive was part of a pool, go to Configuration -> Storage to check whether the pool is resilvering.
When the resilvering completes, the drive should become optimal (status "ok").

Did reseating the drive solve the issue?

  • If YES, the problem was solved.
  • If NO, contact Oracle to investigate this issue.

 

TSC ONLY

If you reached this scenario, you'll have to look into this issue deeper. 

Check the Disk Replacement Insider's Guide for more details.

References

<NOTE:1164934.1> - Sun Storage 7000 Unified Storage System: ZFS - Slow resilvering and/or zpool scrub
<NOTE:1399057.1> - Oracle ZFS Storage Appliance: How To Recover From An Unavailable / Faulted Or Corrupted Boot Disk After Replacement
<NOTE:1532677.1> - Sun Storage 7000 Unified Storage System: How to perform FCO 328 ( 600GB Hitachi Drives )
<NOTE:1427028.1> - Sun Storage 7000 Unified Storage System: How to Collect SMART Data for Disks failing repetitively
<NOTE:1523277.1> - Sun Storage 7000 Unified Storage System: ASR Misconfigured Chassis Alarm Verification
https://www.freebsd.org/doc/en/books/handbook/zfs-term.html#zfs-term-scrub
<NOTE:2133261.1> - Zpool Errors At Boot Time - ZFS-8000-LR ZFS device in pool 'rpool' failed to open
<NOTE:1447054.1> - How To Recover From System Disk Failing To Re-silver For ZFS Unified Storage
<NOTE:1388529.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot ZFS System Pool Issues
<NOTE:1400613.1> - Sun Storage 7000 Unified Storage System: How to check if excessive ZFS checksum errors are due to a failing disk
<NOTE:1410463.1> - How To Replace A Hard Disk Or Solid State Drive In A Oracle ZFS Storage ZS3, ZS4, ZS5 & Sun Storage 7000 Series [VCAP]

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.