Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 1392228.1 : Pillar Axiom: Fibre Channel Brick RAID Rebuilds Subsequent to a Drive Failure May, on Rare Occasions, Rebuild to an Incorrect Spare Drive
A race condition may occur when a drive in an FC Brick momentarily goes offline and then returns before the two RAID controllers can synchronize the drive's failure status. If this occurs, the RAID CU that detected the drive offline may initiate recovery to the spare drive while the other CU does not. The result is ongoing data corruption, because the two RAID controllers do not agree on the RAID array members.
Applies to:

Pillar Axiom 600 Storage System - Version Not Applicable to Not Applicable [Release N/A]
Pillar Axiom 300 Storage System - Version Not Applicable to Not Applicable [Release N/A]
Pillar Axiom 500 Storage System - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Symptoms

Axiom Fibre Channel Brick RAID rebuilds subsequent to a drive failure may, on rare occasions, rebuild to an incorrect spare drive.

This issue affects all Axiom systems with Fibre Channel disk drives that are on Axiom Releases 02.04.01 through 04.03.17 and 05.00.00 through 05.00.06. This defect does not affect SATA drives or systems on any Axiom release prior to 02.04.02 or after 05.00.06.
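The affected release ranges above can be checked mechanically. The following is a minimal sketch, not an Oracle or Pillar tool: the function names and the `has_fc_drives` flag are hypothetical, and the range bounds are copied from the affected ranges as stated in this document (02.04.01 through 04.03.17, and 05.00.00 through 05.00.06). It assumes release strings are always three dot-separated numeric fields.

```python
from typing import List, Tuple

Version = Tuple[int, ...]

# Affected ranges as stated in this document (inclusive on both ends).
AFFECTED_RANGES: List[Tuple[Version, Version]] = [
    ((2, 4, 1), (4, 3, 17)),
    ((5, 0, 0), (5, 0, 6)),
]


def parse_release(release: str) -> Version:
    """Turn a release string such as '04.03.17' into (4, 3, 17)."""
    return tuple(int(part) for part in release.split("."))


def is_affected(release: str, has_fc_drives: bool) -> bool:
    """Return True if a system on this release is exposed to the defect.

    Systems without Fibre Channel drives (for example SATA-only systems)
    are not affected, regardless of release.
    """
    if not has_fc_drives:
        return False
    version = parse_release(release)
    return any(low <= version <= high for low, high in AFFECTED_RANGES)
```

For example, `is_affected("04.03.17", True)` is true, while `is_affected("04.03.18", True)` and `is_affected("05.00.03", False)` are both false. Tuple comparison is used so that the fields compare numerically rather than as strings.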
Cause

There was a very narrow corner case in which the firmware on each of the two companion RAID controllers of the Axiom Fibre Channel Brick could choose different target drives for a rebuild. This would only occur if a candidate target drive suddenly went offline during the rebuild preparation phase and then suddenly came back online within the span of one second. This was an exceedingly rare case in which a drive happened to fail in a very specific way during a very small timing window.

Should this condition occur, the two RAID controllers in the Fibre Channel or Fibre Channel V2 Brick will not agree on the members of the RAID array. The CU that detected the original drive offline will remove that drive from the array, begin recovery to the spare, and use the spare. The CU that did not receive the drive status update will continue to use the original drive, which went offline and returned before that CU was made aware of the condition.

IMPORTANT: Contact Pillar Data Systems Customer Support immediately if this issue is encountered.
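The disagreement between the two controllers can be illustrated with a toy model. This is only a sketch of the outcome described above, not the Brick firmware logic: the class, the drive names (`d0`, `d1`, `d2`, `spare`), and the single `drive_returns_before_sync` flag are all hypothetical simplifications of the timing window.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RaidController:
    """One controller's private view of which drives make up the array."""
    name: str
    members: List[str]

    def observe_offline(self, drive: str, spare: str) -> None:
        # Swap the failed drive for the spare in this controller's view.
        self.members = [spare if m == drive else m for m in self.members]


def controllers_agree(drive_returns_before_sync: bool) -> bool:
    """Return True if both controllers end up with the same array members."""
    array = ["d0", "d1", "d2"]
    cu0 = RaidController("CU0", list(array))
    cu1 = RaidController("CU1", list(array))

    # CU0 sees d1 go offline and begins recovery to the spare.
    cu0.observe_offline("d1", "spare")

    if not drive_returns_before_sync:
        # Normal case: the failure status reaches CU1 while d1 is
        # still offline, so CU1 also switches to the spare.
        cu1.observe_offline("d1", "spare")
    # Race case: d1 is back online before CU1 learns of the failure,
    # so CU1 keeps using d1 and the two views diverge.

    return cu0.members == cu1.members
```

In this model `controllers_agree(False)` is true and `controllers_agree(True)` is false. In the false case one controller writes to the spare while the other writes to the original drive, which is the membership mismatch that leads to data corruption.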
Solution

Software fixes have been propagated to mitigate the problem scenario and are included in Axiom Releases 04.03.18 and above (R4, e.g. 04.05.00) and 05.00.07 and above (R5, all of 05.02.xx and higher). An upgrade to the currently recommended software level will prevent this issue and is highly recommended.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Disk Storage Pillar Axiom System
Attachments

This solution has no attachment