Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Troubleshooting Sure Solution

1366035.1 : Oracle ZFS Storage Appliance: Troubleshooting Disk Drive Failures
Applies to:
Sun Storage 7210 Unified Storage System - Version All Versions and later
Exalogic Elastic Cloud X4-2 Quarter Rack - Version X4 to X4 [Release X4]
Oracle ZFS Storage Appliance Racked System ZS4-4 - Version All Versions and later
PDIT Single Rack ZFS Storage ZS4-4 - Version All Versions to All Versions [Release All Releases]
Sun Storage 7110 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Purpose
The purpose of this document is to troubleshoot disk drive failures on a Sun Storage 7000 ZFS Appliance.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Disk Storage ZFS Storage Appliance Community.
Troubleshooting Steps

1. What problem are you encountering?
2. Verify the problems list from the Appliance
Check the problems list.
From CLI: maintenance problems show
From support bundle: cat /fm/fmadm.out
Found any faults?
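When working from a support bundle, a quick tally of the fault message IDs in /fm/fmadm.out can help decide which of the following steps applies. The sketch below is only an illustration: the helper name `count_message_ids` is hypothetical, and the regular expression assumes that message IDs follow the shape of the IDs quoted in this document (for example DISK-8000-12, SENSOR-8000-26, ZFS-8000-LR). It makes no other assumption about the file layout.

```python
import re
from collections import Counter

# Assumption: message IDs look like DISK-8000-12 or ZFS-8000-LR
# (uppercase class, four digits, two-character suffix).
MSG_ID = re.compile(r"\b[A-Z]+-\d{4}-[0-9A-Z]{2}\b")


def count_message_ids(text: str) -> Counter:
    """Count how often each FMA-style message ID occurs in the given text."""
    return Counter(MSG_ID.findall(text))
```

For example, `count_message_ids(open("fm/fmadm.out", errors="replace").read()).most_common()` would list the IDs found in an extracted bundle, most frequent first. The path is relative to wherever the bundle was unpacked.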
3. Check against the following FMA faults
For the disk you have identified at step 1, check if it matches one of these FMA faults/problem descriptions:
Any match?
4. Check drive status
Check if the status of the drive is "absent" or "removed".
From CLI: maintenance hardware show
From support bundle: cat /hw/hw.aksh
What's the status of the drive?
5. Disk fault matches DISK-8000-12
We may be dealing with a fan fault instead of a drive fault. Check the list of problems / FMA faults again for SENSOR-8000-26 events ("External sensors indicate that a fan is no longer operating correctly") against fans in the tray where the drive is located. See Oracle ZFS Storage Appliance: Solaris Fault Manager received an event DISK-8000-12 (Doc ID 1966841.1) for more details.
6. Clear fault
From BUI: Maintenance -> Problems -> Click on Problem -> Click Marked Repaired
From CLI: maintenance problems select <problem-id> markrepaired
Did this clear the fault?
7. Check pool status
From CLI:
    > configuration storage
    > show
Example:
    zs3-2-ftlauder-a:configuration storage> ls
    Properties:
        pool = pool-de2-24c4t
        status = online
        errors = 0
        profile = mirror
        log_profile = log_stripe
        cache_profile = cache_stripe
        scrub = scrub completed after 0h0m with 0 errors at 2015-12-22 14:26:55
From support bundle: cat zfs/status.out
Example:
      pool: POOL501
     state: ONLINE
      scan: none requested
    config:

            NAME                       STATE     READ WRITE CKSUM
            POOL501                    ONLINE       0     0     0
              mirror-0                 ONLINE       0     0     0
                c2t5000CCA03E1E376Cd0  ONLINE       0     0     0
                c2t5000CCA03E22A74Cd0  ONLINE       0     0     0
              mirror-1                 ONLINE       0     0     0
                c2t5000CCA03EAAAADCd0  ONLINE       0     0     0
                c2t5000CCA03EBB4504d0  ONLINE       0     0     0
            [...]
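On a large pool it is easy to overlook a single device that is not ONLINE or that has non-zero error counters. The following is a minimal sketch, not an Oracle tool, that scans text in the layout of the zfs/status.out example above and returns the rows that deserve a closer look. The names `Device`, `parse_config` and `suspects` are hypothetical, and the parser assumes the NAME / STATE / READ / WRITE / CKSUM header shown in the example; rows that do not fit that shape are skipped rather than guessed at.

```python
from typing import List, NamedTuple


class Device(NamedTuple):
    name: str
    state: str
    read: str   # counters are kept as text, since their exact format is not assumed
    write: str
    cksum: str


def parse_config(status_text: str) -> List[Device]:
    """Collect the rows that follow each NAME/STATE/READ/WRITE/CKSUM header."""
    devices = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_table = True          # a new config table starts here
            continue
        if not fields:
            in_table = False         # a blank line ends the current table
            continue
        if in_table and len(fields) >= 5:
            devices.append(Device(*fields[:5]))
    return devices


def suspects(devices: List[Device]) -> List[Device]:
    """Rows whose state is not ONLINE or whose counters are not all zero."""
    return [d for d in devices
            if d.state != "ONLINE" or (d.read, d.write, d.cksum) != ("0", "0", "0")]
```

Applied to the example output above, `suspects(parse_config(text))` would return an empty list, because every row is ONLINE with zero read, write and checksum errors.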
8. Are the faulted drives located on the same tray?
Get a list of the faulted drives by checking the problems list for disk issues. Write down the disk location (tray and slot).
From CLI: maintenance problems show
From support bundle: cat /fm/fmadm.out
9. Did the drives fail in close succession?
Compare the timestamps of the faulted drives in the problems list gathered at the previous step. Did the faulted drives fail in close succession?
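Once the tray, slot and timestamp of each faulted drive have been noted, the two questions above (same tray? close succession?) can be answered mechanically. The sketch below is illustrative only: the function names are hypothetical, the faults are assumed to have been typed in by hand as (tray, slot, timestamp) tuples, and the one-hour default window is an arbitrary assumption, since this document does not define how close "close succession" is.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def group_by_tray(faults):
    """Map each tray to the (slot, timestamp) pairs of its faulted drives."""
    trays = defaultdict(list)
    for tray, slot, when in faults:
        trays[tray].append((slot, when))
    return dict(trays)


def failed_in_close_succession(timestamps, window=timedelta(hours=1)):
    """True if any two faults that are adjacent in time are at most `window` apart.

    The default window is an assumption, not a documented threshold.
    """
    ordered = sorted(timestamps)
    return any(later - earlier <= window
               for earlier, later in zip(ordered, ordered[1:]))
```

A tray that maps to several faulted drives, especially with timestamps that fall inside the chosen window, suggests checking for a cause shared by the whole tray rather than for several independent drive failures.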
10. Check if the HW configuration is correct
Check that any new disks were inserted in the correct slots, that the disk type is supported by this appliance, and so on. Please consult Sun Storage 7000 Unified Storage System: Quick Reference for ZFS Storage Appliance Hardware Configuration (Doc ID 1554743.1) for more details.
TSC ONLY
Does the appliance contain HDDs or SSDs that are not compatible with the current ak release?
11. Re-seat the drive
The disk may not have made proper contact when it was first inserted. Try re-seating the drive. If the drive was part of a pool, go to "Configuration Storage" to check whether the pool is being resilvered. Did re-seating the drive solve the issue?
TSC ONLY
If you reached this scenario, you'll have to look into this issue deeper. Check the Disk Replacement Insider's Guide for more details.

References
<NOTE:1164934.1> - Sun Storage 7000 Unified Storage System: ZFS - Slow resilvering and/or zpool scrub
<NOTE:1399057.1> - Oracle ZFS Storage Appliance: How To Recover From An Unavailable / Faulted Or Corrupted Boot Disk After Replacement
<NOTE:1532677.1> - Sun Storage 7000 Unified Storage System: How to perform FCO 328 ( 600GB Hitachi Drives )
<NOTE:1427028.1> - Sun Storage 7000 Unified Storage System: How to Collect SMART Data for Disks failing repetitively
<NOTE:1523277.1> - Sun Storage 7000 Unified Storage System: ASR Misconfigured Chassis Alarm Verification
https://www.freebsd.org/doc/en/books/handbook/zfs-term.html#zfs-term-scrub
<NOTE:2133261.1> - Zpool Errors At Boot Time - ZFS-8000-LR ZFS device in pool 'rpool' failed to open
<NOTE:1447054.1> - How To Recover From System Disk Failing To Re-silver For ZFS Unified Storage
<NOTE:1388529.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot ZFS System Pool Issues
<NOTE:1400613.1> - Sun Storage 7000 Unified Storage System: How to check if excessive ZFS checksum errors are due to a failing disk
<NOTE:1410463.1> - How To Replace A Hard Disk Or Solid State Drive In A Oracle ZFS Storage ZS3, ZS4, ZS5 & Sun Storage 7000 Series [VCAP]

Attachments
This solution has no attachment