Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1628328.1
Update Date:2014-03-10
Keywords:

Solution Type  Technical Instruction Sure

Solution  1628328.1 :   Sun Storage 7000 Unified Storage System: Abnormal Disk Resilver Process  


Related Items
  • Sun ZFS Storage 7420
  • Sun Storage 7110 Unified Storage System
  • Sun Storage 7210 Unified Storage System
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Sun ZFS Storage 7120
  • Sun ZFS Storage 7320
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Goal
Solution
References


Oracle Confidential PARTNER - Available to partners (SUN).
Reason: Useful information for internal engineers and partners
Created from <SR 3-8516129448>

Applies to:

Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun Storage 7210 Unified Storage System - Version All Versions and later
Sun Storage 7110 Unified Storage System - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
7000 Appliance OS (Fishworks)

Goal

A disk faulted.
An FSE (Field Service Engineer) replaced the failed disk.
Another disk faulted during the resilver.

However, the FSE found that once the spare disk had completed the resilver for the first failed disk, the system did not trigger it to resilver for the second faulted disk.
The spare disk only started to resilver when the FSE pulled the second failed disk from the shelf.

Why?

 

Solution

From your description of the fault, this is the sequence of events:
  1. A pool has 1 spare drive.
  2. A drive fails; the only spare kicks in and starts resilvering to replace the faulted drive.
  3. While this resilver is running, another drive fails.
  4. At this point in time, "No" spares are available.
  5. Once the resilver to the spare has completed, the first faulty drive is replaced.
  6. This starts a resilver to the new drive; when it completes, the spare detaches.
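To make the end state of these six steps concrete, here is a minimal Python sketch of the pool's spare bookkeeping. It is a hypothetical, simplified model, assuming a single spare and collapsing each resilver into one step; the class `Pool` and the names "disk1", "disk2" and "spare0" are illustrative only and are not appliance internals.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of the six steps. Pool, "disk1", "disk2"
# and "spare0" are illustrative names, not appliance internals, and each
# resilver is collapsed into a single step.


@dataclass
class Pool:
    spares: dict = field(default_factory=dict)   # spare name -> "AVAIL" or "INUSE"
    faulted: set = field(default_factory=set)    # data disks currently faulted
    covered: dict = field(default_factory=dict)  # faulted disk -> spare standing in for it

    def fault(self, disk):
        """A data disk faults; try once to bring in an AVAIL spare."""
        self.faulted.add(disk)
        for name, state in self.spares.items():
            if state == "AVAIL":
                self.spares[name] = "INUSE"
                self.covered[disk] = name
                return name
        return None  # no AVAIL spare: the fault is left uncovered

    def replace(self, disk):
        """A new drive replaces `disk`; once it has resilvered, the spare detaches."""
        self.faulted.discard(disk)
        spare = self.covered.pop(disk, None)
        if spare is not None:
            self.spares[spare] = "AVAIL"
        return spare

    def uncovered_faults(self):
        """Faulted disks that have no spare standing in for them."""
        return sorted(self.faulted - set(self.covered))


pool = Pool(spares={"spare0": "AVAIL"})  # 1. pool with one spare
pool.fault("disk1")                      # 2. spare0 resilvers in place of disk1
pool.fault("disk2")                      # 3-4. disk2 fails; no spare is AVAIL
pool.replace("disk1")                    # 5-6. disk1 replaced, spare0 detaches
print(pool.spares)                       # {'spare0': 'AVAIL'}
print(pool.uncovered_faults())           # ['disk2']
```

After step 6 the spare is AVAIL again while the second faulted disk is still uncovered, which is exactly the situation the question below asks about.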

Question:  Should the newly available spare drive then start resilvering in place of the 2nd faulted drive?

With the current implementation, a spare becoming free does not cause outstanding drive faults to be reprocessed, so the newly freed spare is not pulled into the pool.
It remains AVAIL in the spares list until an event occurs that requires it to be resilvered.

When the 2nd fault occurs, the FMA ZFS module that handles it looks for spares. If FMD does not find any in an AVAIL state,
processing of that fault stops at that point. In this implementation the code does not 'go back' and re-process the fault later.
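The event-driven behaviour described above can be sketched in Python as follows. This is a hypothetical illustration, not the appliance's FMA or ZFS code: the functions `handle_disk_event` and `spare_detached` and the dictionaries `spares` and `resilvering` are invented names. The sketch assumes that each fault or removal event is handled exactly once and that a spare detaching triggers no handler for outstanding faults.

```python
# Hypothetical sketch of the event-driven behaviour described above. The
# names below are illustrative; this is not the appliance's FMA/ZFS code.

spares = {"spare0": "INUSE"}  # the only spare is busy with the first failed disk
resilvering = {}              # faulted disk -> spare resilvering in its place


def handle_disk_event(disk):
    """Handle one fault or removal event for `disk` by looking for an AVAIL spare."""
    for name, state in spares.items():
        if state == "AVAIL":
            spares[name] = "INUSE"
            resilvering[disk] = name
            return name
    # No AVAIL spare: handling of this event ends here, and nothing
    # records it for a later retry.
    return None


def spare_detached(name):
    """The spare returns to the spare list; outstanding faults are not re-processed."""
    spares[name] = "AVAIL"


print(handle_disk_event("disk2"))  # None: the 2nd fault finds no AVAIL spare
spare_detached("spare0")           # 1st disk replaced and resilvered, spare0 detaches
print(resilvering)                 # {}: spare0 sits AVAIL, disk2 is still uncovered
print(handle_disk_event("disk2"))  # spare0: a new event (disk2 pulled) picks it up
```

The spare is only used when a fresh event for the faulted disk arrives while the spare is AVAIL; in the reported case, that event was the FSE pulling the 2nd faulty drive.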

This is why the spare only started to resilver when a new event for the 2nd faulty drive was raised while the spare was in an AVAIL state, i.e. when the FSE removed the 2nd faulty drive.

 

References

<NOTE:1416406.1> - Sun ZFS Storage Appliances Troubleshooting Resource Center
<NOTE:1366035.1> - Sun Storage 7000 Unified Storage System: Troubleshooting Disk Drive Failures

  Copyright © 2018 Oracle, Inc.  All rights reserved.