Asset ID: 1-77-2162312.1
Update Date: 2016-11-17
Keywords:
Solution Type: Sun Alert Sure Solution
2162312.1: (EX29) 600GB High Performance Disk Drives in Exadata Storage Servers V2, X2, X3 May Experience High Rates of Failure
Related Items
- Exadata X3-8 Hardware
- Exadata Database Machine X2-2 Hardware
- Exadata Database Machine X2-8
- Oracle Exadata Storage Server Software
- Exadata X3-2 Hardware
- Exadata Database Machine V2
- Oracle SuperCluster Specific Software
Related Categories
- PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST
Applies to:
Exadata Database Machine V2
Exadata X3-8 Hardware
Oracle Exadata Storage Server Software
Oracle SuperCluster
Exadata X3-2 Hardware
Information in this document applies to any platform.
Description
Certain 600GB High Performance disk drives in Exadata Storage Servers V2, X2, and X3 may experience high failure rates.
Occurrence
This issue may occur when the following conditions are met:
- Exadata Storage Servers are V2, X2, or X3 hardware with one or more 600GB High Performance disk drive model "HITACHI HUS1560SCSUN600G"
  - Note that drive model "HITACHI HUS1560SCSUN600G" may be received as a replacement drive.
- Exadata software on storage servers is 12.1.2.3.2 or lower, with disk drive firmware lower than A8C0.
  - There is potential for higher disk drive failure rate after updating to a release or patch that contains disk drive firmware version A880.
- Drives have had a long service life
- Drives have low I/O activity (which can be caused by high flash cache hit rates)
Due to related critical issue EX25 (<Document 2073916.1>) the guidance in this document should also be followed for all V2, X2, or X3 Exadata Storage Servers with 600GB High Performance disk drives running any Exadata version 12.1.2.3.2 or lower, regardless of drive models or drive firmware versions currently installed.
1. Exadata Storage Server model is determined with the following CellCLI command:
CellCLI> list cell attributes makeModel
The affected 600GB High Performance disk drives are supplied in the following systems:
- Exadata Storage Server V2 with High Performance disks (model X4275)
- Exadata Storage Server X2 with High Performance disks (model X4270 M2)
- Exadata Storage Server X3 with High Performance disks (model X4270 M3)
2. Disk drive model and drive firmware version on an Exadata Storage Server are determined with the following CellCLI command:
CellCLI> list physicaldisk attributes makeModel, physicalFirmware where diskType = HardDisk
There may be a mix of disk drive models in an Exadata Storage Server. It is necessary to query every disk drive on every storage server. Note that drive model "HITACHI HUS1560SCSUN600G" may be received as a replacement drive.
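Because every drive on every storage server has to be checked, it can help to filter the command output with a script. The following is a minimal Python sketch, not an Oracle-supplied tool. It assumes the output has one drive per line, with the model (possibly in double quotes) followed by the firmware version, and it assumes firmware versions such as A820, A880, and A8C0 can be ordered as hexadecimal numbers. The sample output, including the model name "EXAMPLE OTHERMODEL600G", is invented for illustration.

```python
AFFECTED_MODEL = "HITACHI HUS1560SCSUN600G"
FIXED_FIRMWARE = "A8C0"  # first firmware version that contains the fix


def parse_rows(text):
    """Return (model, firmware) pairs from 'list physicaldisk' style output."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("CellCLI>"):
            continue
        # Assumption: the firmware version is the last whitespace-separated field.
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        model, firmware = parts
        rows.append((model.strip('"'), firmware))
    return rows


def firmware_below_fix(firmware):
    """True if the firmware sorts lower than A8C0 (assumed hexadecimal ordering)."""
    try:
        return int(firmware, 16) < int(FIXED_FIRMWARE, 16)
    except ValueError:
        return True  # unrecognized format: flag the drive for manual review


def find_affected(text):
    """List drives of the affected model whose firmware is lower than A8C0."""
    return [
        (model, firmware)
        for model, firmware in parse_rows(text)
        if model == AFFECTED_MODEL and firmware_below_fix(firmware)
    ]


# Invented sample output, for illustration only.
SAMPLE_OUTPUT = '''
         "HITACHI HUS1560SCSUN600G"      A700
         "HITACHI HUS1560SCSUN600G"      A8C0
         "EXAMPLE OTHERMODEL600G"        0001
'''
```

An empty result from find_affected does not mean a storage server is out of scope: the recommended action in this document applies regardless of the drive models and firmware versions currently installed.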
Symptoms
Possible symptoms include the following:
- Increasing number of unrecoverable media errors, predictive disk failures, and disk failures causing periods of reduced ASM redundancy.
- Multiple disks failing at or around the same time, soon after applying firmware A880, causing possible disk group dismount and potential data loss.
Bug 23750777
Workaround
None
Patches
Recommended Action
The recommendation applies to:
- Any X3, X2, or V2 Exadata Storage Server that contains 600GB High Performance disk drives.
- Even if the storage server currently contains no affected drive model "HITACHI HUS1560SCSUN600G" (since this drive model may be received as a replacement for a failed or predictive failure drive).
- And the storage server is running Exadata version 12.1.2.3.2 or lower (including any prior update released for this issue).
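The version condition above can be expressed as a simple comparison. The following Python sketch is an illustration only. It assumes the storage server image version has already been collected as a dotted string such as 12.1.2.3.1.160718, and it compares only the first five components, ignoring the trailing date code. The function names are hypothetical.

```python
FIRST_FIXED_RELEASE = (12, 1, 2, 3, 3)  # Exadata 12.1.2.3.3


def parse_version(version):
    """Convert a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in version.strip().split("."))


def needs_update(image_version):
    """True if the image version is 12.1.2.3.2 or lower.

    Only the first five components are compared, so the date code
    (for example 160718) does not affect the result.
    """
    return parse_version(image_version)[:5] < FIRST_FIXED_RELEASE
```

For example, needs_update("12.1.2.3.1.160718") is True because the revised 12.1.2.3.1 release is still lower than 12.1.2.3.3.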
There are two recommended actions (perform both actions).
- Enable weekly Automatic Hard Disk Scrub and Repair
- Update Exadata Storage Server software to 12.1.2.3.3 or higher
Please review the following information carefully. Contact Oracle Support if you have additional questions.
1. Enable weekly Automatic Hard Disk Scrub and Repair
This issue can cause large numbers of corrupt sectors on the disk drives. The only way to repair the corrupt sectors is to make sure that Automatic Hard Disk Scrub and Repair is implemented and enabled in the software. If the corrupt sectors are not corrected then permanent data loss could occur. Automatic Hard Disk Scrub and Repair is set to run biweekly, by default. Change the setting to run scrubbing weekly by running the following CellCLI command on all Exadata Storage Servers:
CellCLI> ALTER CELL hardDiskScrubInterval = weekly
Cell dm01cel01 successfully altered
The ALTER CELL hardDiskScrubInterval command sets the interval for proactive resilvering of latent bad sectors. Disk scrubbing will throttle itself using IORM based on disk activity. When requests come in, disk scrubbing activity will decrease. Customer workloads should not be affected by disk scrubbing. If the system is idle, disk scrubbing can drive the disk utilization to 100%. This is expected.
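Since the setting must be changed on all Exadata Storage Servers, the CellCLI command can be wrapped in dcli so that it runs on every cell in a group file. The Python sketch below only builds the command line and does not execute it. The group file name cell_group and the function name are hypothetical, and the sketch assumes that root access to the cells through dcli is already configured.

```python
import shlex

SCRUB_COMMAND = "alter cell hardDiskScrubInterval=weekly"


def build_scrub_command(cell_group_file, user="root"):
    """Build a dcli argument list that runs the CellCLI command on each cell.

    cell_group_file is a file listing the storage server host names
    (hypothetical name, supplied by the caller).
    """
    remote = 'cellcli -e "{}"'.format(SCRUB_COMMAND)
    return ["dcli", "-l", user, "-g", cell_group_file, remote]


# Render the argument list as a shell command line for review before running it.
command_line = shlex.join(build_scrub_command("cell_group"))
```

Reviewing the rendered command line before running it is deliberate: the sketch leaves execution to the administrator.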
The Grid Infrastructure home version and patch level required for full automatic hard disk scrubbing and repair support is one of the following:
- 12.1.0.2.4 (Jan 2015) or higher
- 11.2.0.4.16 (Apr 2015) or higher
Systems running an older version or patch level of Grid Infrastructure must upgrade for disk scrubbing to correct this issue.
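The minimum Grid Infrastructure levels above can be checked as follows. This Python sketch is an illustration under stated assumptions: the Grid Infrastructure version is available as a five-part dotted string, 11.x versions are compared against 11.2.0.4.16, and all later versions are compared against 12.1.0.2.4. Treating 12.1.0.1.x as below the minimum is a conservative assumption of this sketch, not a statement from this document.

```python
MIN_11G = (11, 2, 0, 4, 16)  # 11.2.0.4.16 (Apr 2015)
MIN_12C = (12, 1, 0, 2, 4)   # 12.1.0.2.4 (Jan 2015)


def gi_supports_full_scrub_repair(gi_version):
    """True if the Grid Infrastructure version meets the stated minimum."""
    version = tuple(int(part) for part in gi_version.strip().split("."))
    if version[0] < 11:
        return False
    if version[0] == 11:
        return version >= MIN_11G
    # Assumption: 12.1.0.1.x is treated as below the minimum.
    return version >= MIN_12C
```

A result of False means Grid Infrastructure must be upgraded before disk scrubbing can correct this issue.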
History of scrub-related feature adoption and fixes (Grid Infrastructure home)
- 12.1.0.2.4 / 11.2.0.4.16 - includes fix for <bug 19900800> - this is recommended minimum
- 12.1.0.2.0 - contains kff scrub repair
- 11.2.0.4.7 - adds logging of blocks that ASM skipped for repair (BP7 requires patch 19513710 due to build issue)
- 11.2.0.4.5 - adds kff scrub repair (repairs more cases, more robust)
- 11.2.0.4.0 - contains resilver-based repair only
See Oracle Exadata Database Machine System Overview, 12c Release 1 (12.1) for additional information about Automatic Hard Disk Scrub and Repair.
2. Update Exadata Storage Server software
Update storage servers to Exadata 12.1.2.3.3, or higher (see <Document 888828.1>).
Notes:
- Storage server rollback - it is recommended not to roll back storage servers updated to 12.1.2.3.3, or higher. However, if rolling back, drive firmware must not be downgraded during the rollback. Retaining the drive firmware is the default patchmgr rollback behavior, but if file /.updfrm_exact exists on the storage server, the drive firmware is downgraded instead. To ensure default patchmgr behavior, remove file /.updfrm_exact, if it exists, from every storage server prior to rollback.
- Manual firmware downgrade - do not downgrade (manually or with CheckHWnFWProfile) drive firmware to an older version.
The complete fix for this issue consists of 3 parts:
- Disk drive firmware A8C0 or higher. Firmware A8C0 supersedes all prior versions.
- Disabling drive overall command timer (set OCT=0)
- Fixes for bug 22387980 and bug 22161196
Exadata 12.1.2.3.3 is the first release to contain all 3 parts.
Note - The following information is obsolete and is retained for reference purposes only due to the history of this issue. The only supported action is to update storage servers to a release that supplies firmware A8C0. Exadata 12.1.2.3.3 is the first release to supply firmware A8C0.
Alternate Action
- On all storage servers remove file /.updfrm_exact if it exists
# dcli -l root -g <cell_group> 'rm -fv /.updfrm_exact'
- Update storage servers to the revised 12.1.2.3.1 or 12.1.2.2.2 release (see <Document 888828.1>).
Note the following about Exadata 12.1.2.3.1 and 12.1.2.2.2:
- The original Exadata 12.1.2.3.1 and 12.1.2.2.2 releases, and the April 2016 QFSDP, were revised to contain a fix for this issue and re-released on 20-Jul-2016.
- The full image version and date code of the revised releases are as follows:
  - Exadata 12.1.2.3.1.160718 (<Patch 24306177>) - supersedes original release 12.1.2.3.1.160411
  - Exadata 12.1.2.2.2.160715 (<Patch 24306258>) - supersedes original release 12.1.2.2.2.160410
- The revised releases are full releases and may be used to update storage servers from any prior release, including the original 12.1.2.3.1 and 12.1.2.2.2 releases. They are installed like any Exadata release using the patchmgr tool.
- The revised releases affect storage servers only. There has been no change to already released database server or InfiniBand switch software.
Notes for Both Recommended and Alternate Actions
- Expected drive firmware version - After updating storage servers to one of the releases specified above:
  - Existing drives that already had firmware A880 installed are expected to retain firmware A880.
  - Existing drives that had firmware A820 or earlier installed are expected to have firmware A820.
- Storage server rollback - Storage servers that contain the affected disk drives and are running an Exadata version that supplied A880 (e.g. 12.1.2.3.1.160411, 12.1.2.2.2.160410, or an earlier version patch) may be rolled back to the inactive image version. However, drive firmware for the affected drives must not be downgraded during the rollback and must remain at A880. Retaining the drive firmware is the default patchmgr rollback behavior, but if file /.updfrm_exact exists on the storage server, the drive firmware is downgraded instead. To ensure default patchmgr behavior that will retain drive firmware A880 upon storage server rollback, remove file /.updfrm_exact, if it exists, from every storage server prior to rollback.
- Drive firmware downgrade - Do not downgrade (manually or with CheckHWnFWProfile) drive firmware from A880 to an older version. Due to the differences between A880 and older firmware versions, and what metadata structures are retained through downgrade, a drive that had firmware updated to A880, then downgraded to an older version, has a higher likelihood of failing than if the firmware remains at A880. Storage server software may be rolled back, but do not downgrade (manually, or with CheckHWnFWProfile, or by creating file /.updfrm_exact prior to cell upgrade or rollback) disk drive firmware from A880 to an older version. Default storage server rollback behavior is to retain higher version firmware (i.e. firmware is not downgraded on rollback).
- Replacement drive firmware version - After updating storage servers to one of the releases specified above, then receiving a replacement drive that is model "HITACHI HUS1560SCSUN600G":
  - A drive that arrives with firmware A880 installed is expected to retain firmware A880.
  - A drive that arrives with firmware A820 installed is expected to retain firmware A820.
  - A drive that arrives with firmware earlier than A820 installed is expected to be upgraded to A820 automatically upon insertion.
History
17-Nov-2016 - Recommended action revised to state storage servers running any 12.1.2.3.2 or lower release/patch shall update to 12.1.2.3.3
29-Jul-2016 - Add reference to EX25
21-Jul-2016 - Add reference to revised 12.1.2.3.1 and 12.1.2.2.2 releases and appropriate guidance regarding actions to take
18-Jul-2016 - Created
References
<NOTE:2199949.1> - (EX32) V2 and X2 storage servers with 600GB high performance disks running Exadata version 11.2.2.4.0 or lower require software update to receive replacement drives