Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-1991445.1
Update Date:2015-06-25
Keywords:

Solution Type  Sun Alert Sure

Solution  1991445.1 :   Bug 19695225 - Running Many Create or Alter Griddisk Commands Over Time Causes Cell Disk Metadata Corruption (ORA-600 [addNewSegmentsToGDisk_2]) and Loss of Cell Disk Content  


Related Items
  • Oracle Exadata Storage Server Software
  • Oracle SuperCluster T5-8 Half Rack
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST




In this Document
Description
Occurrence
 Risk and Detection
Symptoms
Workaround
Patches
History
References


Applies to:

Oracle Exadata Storage Server Software - Version 11.2.1.2.0 to 12.1.1.1.1 [Release 11.2 to 12.1]
Oracle SuperCluster T5-8 Half Rack
Information in this document applies to any platform.

Description

On Exadata Storage Server version 12.1.1.1.1 and earlier, cell disk metadata corruption and loss of cell disk content (i.e., grid disks and ASM disks) will occur if many CREATE GRIDDISK or ALTER GRIDDISK commands that modify cell disk space configuration are run over time against the same cell disk.  CellCLI griddisk commands are commonly run in parallel on all storage servers, so the issue can occur on multiple storage servers at the same time.  If all redundant disk extents are lost for files in an ASM disk group, the disk group will dismount, the database will crash, and the affected files must be restored from backup.  Rolling cell maintenance commands that only change grid disk state, such as ALTER GRIDDISK INACTIVE and ALTER GRIDDISK ACTIVE, do not contribute to this issue.

This problem is filed as <bug 19695225>.

Occurrence

If you have recreated or reconfigured grid disks with the CellCLI commands CREATE GRIDDISK or ALTER GRIDDISK more than 31 times since initial system deployment, the likelihood of occurrence is high.

Risk and Detection

The risk to test and development systems is expected to be higher than the risk to production systems because of the dynamic manner in which they may be reconfigured.

To determine if your system is exposed to this issue, and how close the system is to having cell disk metadata corruption, download and run the script attached to this document on all storage servers as the root user.

# ./check_bug19695225.sh

or via dcli

# dcli -l root -g cell_group -x check_bug19695225.sh

 

The script reports one of the following results.  Each script output is listed with its corresponding action.

Script output:
  ALERT: One or more celldisks are at immediate risk to metadata corruption and data loss due to bug 19695225 caused by a high number of CREATE/ALTER GRIDDISK commands.
Action:
  Immediately stop issuing CREATE GRIDDISK or ALTER GRIDDISK commands.

  The recommended action is to upgrade storage servers to Exadata 12.1.2.1.1 or later.  An acceptable alternative is to apply <patch 19695225> to storage servers.

Script output:
  WARNING: System does not contain the fix for bug 19695225. Celldisks are at risk to metadata corruption and data loss due to bug 19695225 caused by a high number of CREATE/ALTER GRIDDISK commands.
Action:
  The recommended action is to upgrade storage servers to Exadata 12.1.2.1.1 or later.  An acceptable alternative is to apply <patch 19695225> to storage servers.

  Avoid issuing CREATE GRIDDISK or ALTER GRIDDISK commands until storage servers are upgraded or the patch is applied.

Script output:
  SUCCESS: System contains the fix for bug 19695225
Action:
  No action is necessary.  The system already contains the fix to bug 19695225 and is not susceptible to the cell disk metadata corruption issue.

Script output:
  INFO: You are currently running very old version of cell software : 11.2.2.x.x. Please contact support to check if the patch for bug 19695225 is needed for your systems.
Action:
  On Exadata versions older than 11.2.2.3.0 it is not possible to programmatically determine how many previous CREATE GRIDDISK or ALTER GRIDDISK commands have been issued.  Contact Oracle Support for further guidance.

  Attached script check_bug19695225.sh uses utility cellutil to determine how many CREATE GRIDDISK or ALTER GRIDDISK commands have been previously issued.  This is not possible in versions older than 11.2.2.3.0, hence their current risk cannot be determined.  The recommendation to customers running such an old version is to stop issuing CREATE GRIDDISK or ALTER GRIDDISK commands and upgrade to an Exadata release that contains the fix to bug 19695225.
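
When the check is run through dcli, each output line is prefixed with the name of the storage server that produced it.  The sketch below reduces that combined output to the single most severe status, which may help when many storage servers are checked at once.  It assumes the script prints its result as a line beginning with ALERT:, WARNING:, INFO: or SUCCESS:, as quoted in this document; the function name worst_status is hypothetical and is not part of the attached script.

```shell
# Sketch only: reads check_bug19695225.sh output on stdin (with or without
# a "cellname: " prefix added by dcli) and prints the most severe status.
# Assumed severity order: ALERT > WARNING > INFO > SUCCESS.
worst_status() {
    awk '
        /(^|: )ALERT:/   { alert = 1 }
        /(^|: )WARNING:/ { warning = 1 }
        /(^|: )INFO:/    { info = 1 }
        /(^|: )SUCCESS:/ { success = 1 }
        END {
            if (alert)        print "ALERT"
            else if (warning) print "WARNING"
            else if (info)    print "INFO"
            else if (success) print "SUCCESS"
            else              print "UNKNOWN"   # no recognised status line
        }'
}

# Possible usage, run as root from a host that can reach every cell:
#   dcli -l root -g cell_group -x check_bug19695225.sh | worst_status
```

A result of UNKNOWN means no recognised status line was seen, so the raw output should be inspected by hand.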

 

The script writes additional details about each cell disk to /tmp/check_bug19695225.log on each storage server.  For each cell disk it reports the number of records in the last segmap sector, which increases when CREATE GRIDDISK or ALTER GRIDDISK commands that modify cell disk space configuration are run.  A command that causes the number of records to exceed 31 will trigger bug 19695225.  When the fix is not yet applied, the script reports ALERT if it detects 25 or more records and WARNING if it detects fewer than 25 records.
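
The thresholds described above can be expressed as a small sketch.  This is not the logic of the attached script, which is not reproduced here; it only restates the documented rules (ALERT at 25 or more records, WARNING below 25, SUCCESS when the fix is present, and a limit of 31 records).  The function names classify_celldisk and records_until_limit are hypothetical, and the INFO case for versions older than 11.2.2.3.0 is not modelled.

```shell
# Sketch only: restates the thresholds given in this document.
# classify_celldisk RECORDS FIX_APPLIED
#   RECORDS      number of records in the last segmap sector of a cell disk
#   FIX_APPLIED  "yes" if the fix for bug 19695225 is installed, otherwise "no"
classify_celldisk() {
    cc_records=$1
    cc_fix_applied=$2
    if [ "$cc_fix_applied" = "yes" ]; then
        echo "SUCCESS"
    elif [ "$cc_records" -ge 25 ]; then
        echo "ALERT"
    else
        echo "WARNING"
    fi
}

# records_until_limit RECORDS
# Prints how many more records fit before the count would exceed 31.
# This counts records, not commands: the document does not state how many
# records a single CREATE GRIDDISK or ALTER GRIDDISK command adds.
records_until_limit() {
    rl_left=$((31 - $1))
    if [ "$rl_left" -lt 0 ]; then
        rl_left=0
    fi
    echo "$rl_left"
}
```

For example, classify_celldisk 27 no prints ALERT, and records_until_limit 27 prints 4.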

The number of records can only be reset by recreating cell disks, which requires dropping grid disks first.  This is not a recommended course of action.

 

Symptoms

Possible symptoms that cell disk metadata corruption has occurred as a result of this bug include the following:

  1. ASM disk group(s) dismount and database crash following CREATE GRIDDISK or ALTER GRIDDISK.
  2. ASM disk group(s) cannot be mounted following the disk group dismount.
  3. Error ORA-600 [addNewSegmentsToGDisk_2] is reported in the cell alert.log.  
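
Symptom 3 can be checked with a simple text search.  The sketch below looks for the error argument addNewSegmentsToGDisk_2 rather than the full "ORA-600 [...]" string, on the assumption that the exact formatting of the error line may vary.  The function name is hypothetical, and the location of the cell alert.log is deliberately left as an argument because it is not stated in this document.

```shell
# Sketch only: returns 0 (true) if the given cell alert.log contains the
# ORA-600 argument associated with bug 19695225, non-zero otherwise.
# has_bug19695225_signature ALERT_LOG
has_bug19695225_signature() {
    grep -F -q 'addNewSegmentsToGDisk_2' "$1"
}

# Possible usage (the path is a placeholder, not a real location):
#   if has_bug19695225_signature /path/to/cell/alert.log; then
#       echo "ORA-600 [addNewSegmentsToGDisk_2] found: stop griddisk changes"
#   fi
```

A match is only one of the listed symptoms; the absence of a match does not show that a cell disk is safe from the bug.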

Workaround

The cell disk corruption cannot be repaired once it occurs.  Recovery requires recreating cell disks, grid disks, and ASM disk groups, then restoring affected databases from backup.

Patches

Perform one of the following actions to prevent bug 19695225:

  1. Upgrade to Exadata Storage Server version 12.1.2.1.1 or later.  Exadata 12.1.2.1.0 contains the fix for this issue; however, 12.1.2.1.1 or later is the recommended version.
  2. Upgrade to Exadata Storage Server version 12.1.1.1.2 or a later 12.1.1.1.x release.
  3. Apply <patch 19695225> to all Exadata Storage Servers.  At the time of writing, a patch is available for Exadata versions 12.1.1.1.1, 11.2.3.3.1, and 11.2.3.3.0.
  4. Avoid running the CellCLI commands CREATE GRIDDISK or ALTER GRIDDISK until the code fix is applied via upgrade or patch.
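
The release levels named in the list above can be compared against a storage server's version string with a short sketch.  It assumes GNU sort with the -V (version sort) option is available and that the version is supplied as a dotted string such as 12.1.1.1.1, for example as reported by the cell's releaseVersion attribute.  The function names are hypothetical.  The check covers release levels only: a server on an older release that has had <patch 19695225> applied will still be reported as lacking the fix, so the attached script remains the authoritative check.

```shell
# Sketch only: release-level check based on the versions listed above.
# version_ge A B: true if dotted version A is greater than or equal to B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# release_has_fix VERSION
# True if VERSION is 12.1.1.1.2 or later within 12.1.1.1.x,
# or 12.1.2.1.0 or later otherwise.
release_has_fix() {
    case $1 in
        12.1.1.1.*) version_ge "$1" 12.1.1.1.2 ;;
        *)          version_ge "$1" 12.1.2.1.0 ;;
    esac
}

# Possible usage:
#   if release_has_fix 12.1.1.1.1; then
#       echo "release contains the fix"
#   else
#       echo "release does not contain the fix; upgrade or patch"
#   fi
```

Any release outside the 12.1.1.1.x line that sorts below 12.1.2.1.0 is treated as unfixed, which is the conservative reading of the list above.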

History

13-May-2015 - Add additional detail about the number of records in the last segmap sector
27-Apr-2015 - Creation

References

<BUG:19695225> - SPECIFIC ORDER OF CREATE/ALTER GRIDDISK CAUSES ORA-600 [ADDNEWSEGMENTSTOGDISK_2]

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.