Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-1544234.1
Update Date: 2014-08-26
Keywords:

Solution Type  Sun Alert Sure

Solution  1544234.1 :   Pillar Axiom: Sector Repair Operation May Overlap With a Read or Write Data Operation From the Companion RAID Controller  


Related Items
  • Pillar Axiom 300 Storage System
  • Pillar Axiom 500 Storage System
  • Pillar Axiom 600 Storage System
  • Sun Hardware - Generic
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert
  • _Old GCS Categories>Sun Microsystems>Sun Alert>Criteria Category>Data Loss
  • _Old GCS Categories>Sun Microsystems>Sun Alert>Release Phase>Resolved




In this Document
Description
Occurrence
Symptoms
Workaround
Patches
History
References


Applies to:

Pillar Axiom 300 Storage System
Pillar Axiom 500 Storage System
Pillar Axiom 600 Storage System
Sun Hardware - Generic
Sun Microsystems > Storage - Disk
Information in this document applies to any platform.
_________________________________________



Date of Resolved Release: 09-Apr-2013
_________________________________________

Description

On Pillar Axiom systems with software releases earlier than 4.6.6 (Release 4) or 5.3.8 (Release 5), there is a very small probability that a Fibre Channel disk write error and the corresponding (normal) sector repair operation may overlap with a read or write data operation from the companion RAID controller. This may be accompanied by a drive failure, or the affected drive may continue in operation if the sector repair operation completes successfully.

Occurrence

This issue can occur on the following platforms:

  • Pillar Axiom Ax300, Ax500, and Ax600 storage systems with software versions below 4.6.6
  • Pillar Axiom Ax500 and Ax600 storage systems on Release 5.x.x with software below 5.3.8
Notes:

1. This issue requires that a data read or write operation overlap with an internal sector repair operation on the Fibre Channel Brick. This is a rare occurrence as the timing window for overlap is minute, but if the issue does occur, user data located on that sector of the RAID array may be at risk.

2. This issue does not affect SATA Storage Enclosures (Bricks).
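The release thresholds and the FC Brick condition listed above can be expressed as a small check. The following Python sketch is illustrative only: the function names parse_release and is_exposed are hypothetical, and it assumes the installed release is already known and written as a dotted string such as 4.6.5 or 05.03.08. It is not an Oracle tool and does not query the Axiom.

```python
def parse_release(text):
    """Turn a dotted release string such as '4.6.6' or '05.03.08' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_exposed(release, has_fc_bricks=True):
    """Return True if this release falls in the affected range described in this alert.

    Assumptions (hypothetical helper, not an Oracle API):
      - releases with major version 5 or later are compared against 5.3.8
      - all earlier releases are compared against 4.6.6
      - systems without Fibre Channel Bricks are not affected
    """
    if not has_fc_bricks:
        return False
    version = parse_release(release)
    if version[0] >= 5:
        return version < (5, 3, 8)
    return version < (4, 6, 6)


if __name__ == "__main__":
    for release in ("4.6.5", "4.6.6", "5.3.7", "5.3.8"):
        print(release, "exposed" if is_exposed(release) else "not exposed")
```

A result of "not exposed" only means that the release contains the fix. It says nothing about data that may have been affected before the upgrade.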

 
To determine the installed software release on the system, log in to the Axiom Storage Services Manager and go to the Support -> Software Modules screen. The installed software release is the "Pilot Software" component:

[Screenshot: example of the R4 screen]

[Screenshot: example of the R5 screen]

Symptoms

Symptoms can range from no effect at all to incorrect-data error messages from user applications, file systems, or database software. There may be no indication to the user application that the data is incorrect, and unless the application explicitly checks the integrity of its data, the issue may go unreported. (The absence of such messages does not rule out a silent instance of this issue.) Read operations that fetch the data from permanent storage may return missing, zero-filled, or incorrect data.

If the sector repair fails, the drive will fail. If the sector repair succeeds, the drive continues operating without any obvious symptoms; however, the data on the drive can still be affected. (There will almost always be host data at the location of the sector repair, because the issue is triggered only when a host read or write reaches that location.)

If a drive fails and the host notices data issues, the issue can be confirmed from the host error messages/logs, plus the logs from the Axiom taken automatically as the drive fails. In this case, the issue would be identified while there is still a backup or other source of data reasonably available to restore from.

An example of immediate notification that indicates this issue has occurred may be similar to the following:

      Windows VM system event logs
      =====================================================
      Error    25/08/2012 14:16:46    Ntfs    55    The file system structure on the disk is
      corrupt and unusable. Please run the chkdsk utility on the volume C:.
      Error    25/08/2012 14:16:46    Ntfs    55    The file system structure on the disk is
      corrupt and unusable. Please run the chkdsk utility on the volume C:.
      Error    25/08/2012 13:27:20    Ntfs    55    The file system structure on the disk is
      corrupt and unusable. Please run the chkdsk utility on the volume

Or a database block corruption report similar to the following:

      Corrupt block relative dba: 0x034ddae8 (file 13, block 908008)
        Bad check value found during buffer read
        Data in bad block:
         type: 6 format: 2 rdba: 0x034ddae8
         last change scn: 0x0000.3408a8d6 seq: 0x1 flg: 0x04
         spare1: 0x0 spare2: 0x0 spare3: 0x0
         consistency value in tail: 0xa8d60601
         check value in block header: 0x14c0
         computed block checksum: 0x19d1
         Reread of rdba: 0x034ddae8 (file 13, block 908008) found same corrupted data
         Sat Aug 25 15:00:54 CEST 2012
         Corrupt Block Found
             TSN = 4, TSNAME = PSAPR3S
             RFN = 13, BLK = 908008, RDBA = 55433960
             OBJN = 212814, OBJD = 212814, OBJECT = MONI, SUBOBJECT =
             SEGMENT OWNER = SAPR3S, SEGMENT TYPE = Table Segment

Or from database log files:

      Hex dump of (file 13, block 908008) in trace file
      /oracle/KET/saptrace/usertrace/ket_ora_10627.trc
      Corrupt block relative dba: 0x034ddae8 (file 13, block 908008)
      Bad check value found during buffer read
      Data in bad block:
        type: 6 format: 2 rdba: 0x034ddae8
        last change scn: 0x0000.3408a8d6 seq: 0x1 flg: 0x04
        spare1: 0x0 spare2: 0x0 spare3: 0x0
        consistency value in tail: 0xa8d60601
        check value in block header: 0x14c0
        computed block checksum: 0x19d1
      Reread of rdba: 0x034ddae8 (file 13, block 908008) found same corrupted data
      Sat Aug 25 15:00:54 CEST 2012
      Corrupt Block Found
           TSN = 4, TSNAME = PSAPR3S
           RFN = 13, BLK = 908008, RDBA = 55433960
           OBJN = 212814, OBJD = 212814, OBJECT = MONI, SUBOBJECT =
           SEGMENT OWNER = SAPR3S, SEGMENT TYPE = Table Segment 
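The hexadecimal "rdba" value and the "(file 13, block 908008)" pair in the messages above describe the same location. As a rough illustration, the Python sketch below decodes the value and collects the distinct corrupt blocks reported in a log excerpt. It assumes the common smallfile layout in which the upper 10 bits of the relative data block address hold the relative file number and the lower 22 bits hold the block number; the names decode_rdba and corrupt_blocks are hypothetical and are not part of any Oracle utility.

```python
import re

# Assumption: smallfile tablespace layout (10-bit relative file number, 22-bit block number).
BLOCK_BITS = 22

# Matches lines such as:
#   Corrupt block relative dba: 0x034ddae8 (file 13, block 908008)
CORRUPT_LINE = re.compile(r"Corrupt block relative dba: (0x[0-9a-fA-F]+)")


def decode_rdba(rdba):
    """Split a relative data block address into (relative file number, block number)."""
    return rdba >> BLOCK_BITS, rdba & ((1 << BLOCK_BITS) - 1)


def corrupt_blocks(log_text):
    """Return the distinct (file, block) pairs reported as corrupt, in order of appearance."""
    found = []
    for match in CORRUPT_LINE.finditer(log_text):
        location = decode_rdba(int(match.group(1), 16))
        if location not in found:
            found.append(location)
    return found


if __name__ == "__main__":
    # The value from the example above; 0x034ddae8 is 55433960 in decimal.
    print(decode_rdba(0x034ddae8))  # (13, 908008)
```

The decoded pair matches both the "file 13, block 908008" text and the "RFN = 13, BLK = 908008, RDBA = 55433960" line in the example, which is a quick way to confirm that repeated messages refer to one block rather than several.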

Workaround

The minimum release to avoid this issue on Axiom systems with FC Bricks is 4.6.6 for Release 4 systems and 5.3.8 for Release 5 systems.

Oracle Engineering highly recommends upgrading to the current recommended release:

 

 

      Hardware                     Configuration  Minimum Release for FC Bricks  Current Recommended Release
      Pillar Axiom Storage System  SAN Only       5.3.8                          5.4.1
      Pillar Axiom Storage System  NAS Only       4.6.6                          4.6.11
      Pillar Axiom Storage System  SAN/NAS        4.6.6                          4.6.11

Note: This recommended upgrade will prevent this issue from happening, but will not resolve issues that may have already happened during sector repair prior to upgrading your software. To resolve those, use the data checking utilities appropriate to the file system in use.

In some cases, running a file system scan such as fsck(1M) or chkdsk will report inconsistencies in file systems. These typically require taking the file system offline.

Some file systems have T10 built in, or use other methods of checksumming the end-user data inside the file system, and would flag or in some cases quarantine (set read-only) that data, giving fairly immediate recognition from the host. Other file systems do not perform these checks continuously, but do have a means of adding integrity information to the end-user data and suggest that the administrator run periodic checks, for example a ZFS scrub, which can be done online.

Similarly, databases vary in their ability to detect or scrub end user data compared to more obvious metadata.
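Where neither the file system nor the database checksums end-user data, an application-level check is one way to notice silently changed files. The Python sketch below is a minimal illustration, not a supported utility: the helper names file_digest, build_manifest, and changed_files are hypothetical, SHA-256 is an arbitrary choice of digest, and the approach assumes a manifest was recorded while the data was known to be good (for example, from a verified backup).

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(root, manifest):
    """Return the files from the manifest whose content is now missing or different."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

A file listed by changed_files is only a candidate: any legitimate modification since the manifest was recorded produces the same result, so the list needs to be compared against expected changes before restoring anything.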

File recovery from this issue may require re-writing the affected data or restoring from backup.

Axiom Releases and Patches are available on My Oracle Support.

Pillar Axiom Storage System Software/Firmware download instructions:

      Pillar Axiom: How to Find Pillar Axiom firmware and patches in MOS <Document:1422199.1>
     
Upgrading the Pillar Axiom Storage System:

      Pillar Axiom: Software/Firmware upgrade procedure R3.x to R4.x <Document:1472345.1>
      Pillar Axiom: Software/Firmware upgrade procedure R4.x to R4.x <Document:1517987.1>
      Pillar Axiom: Software/Firmware upgrade procedure R4.x to R5.x <Document:1472278.1>
      Pillar Axiom: Software/Firmware upgrade procedure R5.x to R5.x <Document:1441772.1>

Note: Be sure to use the correct patch for your Axiom model.

Please see the following documents for more information on the Pillar/Axiom Software.

My Oracle Support Infodocs:

      Information Center: Pillar Axiom 600 Storage System <Document:1450100.2>
      Information Center: Pillar Axiom 500 Storage System <Document:1450089.2>
      Information Center: Pillar Axiom 300 Storage System <Document:1450084.2>

Pillar/Axiom Release Notes can be found at:

      http://docs.oracle.com/cd/E39446_01/index.htm or:
      http://www.oracle.com/technetwork/documentation/oracle-unified-ss-193371.html

Patches

Ax300, Ax500, Ax600 Patch 4.6.6 or higher
Ax500, Ax600 Patch 5.3.8 or higher
Ax500, Ax600 Release 5.4.0 or higher

History

09-Apr-2013: Document released, status Resolved
10-Apr-2013: Minor maintenance edit, no change in content
26-Aug-2014: Minor maintenance edit, no change in content

Pillar Axiom: Recommended Patches and Software Releases Matrix Document:1525438.1
is an Internal Only document intended for support personnel. The link was originally
posted in this document but was rejected by the HealthPlan Activity.

Questions regarding this document should be addressed to
sunalertpublication_us_grp@oracle.com, with a copy to the
responsible engineer and contributor listed below.

Internal Contributor/Submitter: Pillar/Axiom Sustaining (axiom_raid_sustaining_us_grp@oracle.com)
Internal Eng Responsible Engineer: vishvanathan.alur.ramamurthy@oracle.com
Internal Services Knowledge Analyst: david.mariotto@oracle.com
Internal Eng Business Unit Group: Storage - Disk
Internal Escalation ID: 3-6936749701, 3-6939598611

References

Fixed in 04.06.06
Fixed in 05.03.08
Fixed in 05.04.00

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.