Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-2238782.1
Update Date:2018-01-05
Keywords:

Solution Type  Sun Alert Sure

Solution  2238782.1 :   Alert:ODA X5-2 Physical Disk(s) Which are Good May Be Dropped from ASM Because of an IO Error  


Related Items
  • Oracle Database Appliance X5-2
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Database Appliance>DB: ODA_EST


Because of disk firmware bug 25114213, on the ODA X5-2 platform ASM may drop a disk after an IO error even though the disk itself is good. The same problem can affect more than one disk. Evidence includes OS messages such as "issue target reset:" and "_scsi_send_scsi_io: timeout". The ASM alert.log may show "Time waited on I/O: 0 usec", along with the disk being offlined and not automatically brought back online.

In this Document
Description
Occurrence
Symptoms
Workaround
Patches
History
References


Applies to:

Oracle Database Appliance X5-2 - Version All Versions to All Versions [Release All Releases]
Linux x86-64
This happens only on the ODAHA X5-2 platform using 8 TB disks at version 12.1.2.9 or earlier.

Description

 Because of disk firmware bug 25114213, on the ODA X5-2 platform ASM may drop a disk after a transient (not terminal) IO error even though the disk itself is physically good.

Occurrence

This problem occurs only on the ODA X5-2 platform at version 12.1.2.9 or earlier.
It can occur on either ODA X5-2 (Bare Metal) or the ODAVP X5-2HA.
Sometimes multiple disks are dropped within a short period because of this issue.

Symptoms

Disk checks show that the physical disk is good, and the disk can be added back to ASM manually.

Various OS/HW checks show that the disk is good even though it has been dropped from the ASM diskgroups.

OS Messages

 The OS messages contain many mpt3sas0 errors and timeouts, for example:

sd X:0:X:0: device_blocked, handle(0x000d)
kernel: mpt3sas0: log_info(0x31120101):originator(PL), code(0x12), sub_code(0x0101)
[sdXX] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
mpt3sas0: issue target reset: handle(0x000d)
mpt3sas0: _scsi_send_scsi_io: timeout
mpt3sas0: TEST_UNIT_READY: handle(0x002c), lun(0)
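
As a rough aid for reviewing a large OS log, the two message fragments this note names as evidence ("issue target reset:" and "_scsi_send_scsi_io: timeout") can be searched for with a short script. This is only a sketch: the function and variable names are hypothetical, and the sample input is simply the excerpt shown above. A match suggests the log is worth a closer look, it does not by itself confirm the problem.

```python
# Signature fragments quoted in this note as evidence of the problem.
SIGNATURES = ("issue target reset:", "_scsi_send_scsi_io: timeout")


def find_signature_lines(lines):
    """Return (line_number, signature, text) for every line containing a signature."""
    hits = []
    for number, text in enumerate(lines, start=1):
        for signature in SIGNATURES:
            if signature in text:
                hits.append((number, signature, text.rstrip("\n")))
                break  # report each line only once
    return hits


# The OS messages shown in this note, used here as sample input.
SAMPLE = [
    "sd X:0:X:0: device_blocked, handle(0x000d)",
    "kernel: mpt3sas0: log_info(0x31120101):originator(PL), code(0x12), sub_code(0x0101)",
    "[sdXX] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK",
    "mpt3sas0: issue target reset: handle(0x000d)",
    "mpt3sas0: _scsi_send_scsi_io: timeout",
    "mpt3sas0: TEST_UNIT_READY: handle(0x002c), lun(0)",
]

for number, signature, text in find_signature_lines(SAMPLE):
    print(f"line {number}: matched {signature!r}: {text}")
```

The function accepts any iterable of lines, so an open log file can be passed in directly; which file holds the OS messages on a given system is left to the reader.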

 

ASM alert log

The following messages may be found in the alert_+ASM1.log or alert_+ASM2.log:

WARNING: Write Failed. group:3 disk:30 AU:2 offset:602112 size:4096
path:/dev/mapper/HDD_E1_S14_1XXXXp2 incarnation:0xe96894db synchronous result:'I/O error'
subsys:System krq:0x7f7918ff3908 bufp:0x7be8f000 osderr1:0x69b5 osde
IO elapsed time: 0 usec Time waited on I/O: 0 usec

 

Note: "Write Failed" messages in the ASM alert log can be related to multiple issues.
To confirm that you have hit this problem, review the OS log and OSW (there should be no outstanding IO on any disk at the time of the error).
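
To help pick the relevant entries out of a long alert log, the "Write Failed" records can be extracted together with their "Time waited on I/O" value. The sketch below assumes the record layout shown in the excerpt above (a WARNING line, followed by a path line, followed by a line with the wait time); the function name and dictionary keys are hypothetical. A zero wait time matches the symptom described in this note, but the OS log and OSW review is still needed.

```python
import re

WRITE_FAILED = re.compile(r"WARNING: Write Failed\. group:(\d+) disk:(\d+)")
PATH = re.compile(r"path:(\S+)")
WAITED = re.compile(r"Time waited on I/O: (\d+) usec")


def parse_write_failures(lines):
    """Collect one record per 'Write Failed' warning found in the given lines."""
    records = []
    current = None
    for text in lines:
        started = WRITE_FAILED.search(text)
        if started:
            current = {
                "group": int(started.group(1)),
                "disk": int(started.group(2)),
                "path": None,
                "waited_usec": None,
            }
            records.append(current)
            continue
        if current is None:
            continue  # not inside a Write Failed record
        path = PATH.search(text)
        if path and current["path"] is None:
            current["path"] = path.group(1)
        waited = WAITED.search(text)
        if waited:
            current["waited_usec"] = int(waited.group(1))
            current = None  # the wait-time line closes the record
    return records


# The alert log excerpt shown in this note, used as sample input.
SAMPLE = [
    "WARNING: Write Failed. group:3 disk:30 AU:2 offset:602112 size:4096",
    "path:/dev/mapper/HDD_E1_S14_1XXXXp2 incarnation:0xe96894db synchronous result:'I/O error'",
    "subsys:System krq:0x7f7918ff3908 bufp:0x7be8f000 osderr1:0x69b5 osde",
    "IO elapsed time: 0 usec Time waited on I/O: 0 usec",
]

for record in parse_write_failures(SAMPLE):
    if record["waited_usec"] == 0:
        print(f"group {record['group']} disk {record['disk']} "
              f"({record['path']}): write failed with zero wait time")
```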

Workaround

 After confirming that the disk is good, manually online it. To check the disk, run:

oakcli stordiag e#_pd_<slot#>

For example:

    oakcli stordiag e0_pd_04        << this example is for the first enclosure (0) (the second enclosure is 1), physical disk in slot #4
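
When several disks need checking, the stordiag argument can be built from the enclosure and slot numbers rather than typed by hand. The sketch below only constructs the command text and does not run it. It rests on two assumptions that are not stated in this note: that the slot number is always zero-padded to two digits (inferred from e0_pd_04), and that an ASM device name such as HDD_E1_S14_... encodes the enclosure after "E" and the slot after "S". All function names are hypothetical.

```python
import re

# Assumption: device names like HDD_E1_S14_... encode enclosure (E) and slot (S).
DEVICE_NAME = re.compile(r"HDD_E(\d+)_S(\d+)_")


def stordiag_target(enclosure, slot):
    """Build the e#_pd_<slot#> argument, e.g. enclosure 0, slot 4 -> 'e0_pd_04'."""
    if enclosure not in (0, 1):
        raise ValueError("this note only describes enclosures 0 and 1")
    if slot < 0:
        raise ValueError("slot must not be negative")
    # Assumption: two-digit zero padding, as in the e0_pd_04 example.
    return f"e{enclosure}_pd_{slot:02d}"


def target_from_path(path):
    """Derive the stordiag argument from an ASM device path, or None if no match."""
    match = DEVICE_NAME.search(path)
    if match is None:
        return None
    return stordiag_target(int(match.group(1)), int(match.group(2)))


def stordiag_command(target):
    """Return the command as an argument list; it is not executed here."""
    return ["oakcli", "stordiag", target]


print(" ".join(stordiag_command(stordiag_target(0, 4))))
print(target_from_path("/dev/mapper/HDD_E1_S14_1XXXXp2"))
```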

 

 add example from good and bad stordiag
 add syntax on how to add the disk back to the disk group
 add example of syntax adding the disk back and results including rebalance
 confirm the steps, or point to a note, on how to address the problem depending on the number of offlined disks (the more disks offlined, the more complex the steps)

Patches

ODA 12.1.2.11 and higher include the fixed disk firmware.

All ODAs using X5-2 should be upgraded to 12.1.2.11 as soon as possible.

The fix actually involves two components: FW13 and mpt3sas driver 13.
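
The version boundaries in this note can be expressed as a small comparison, which may be convenient when triaging several appliances. This is a sketch with hypothetical names. It treats 12.1.2.9 and earlier as affected and 12.1.2.11 and later as fixed, exactly as stated above, and reports anything in between as unspecified because this note does not say. It applies only to the X5-2 platform and does not check the hardware.

```python
LAST_AFFECTED = (12, 1, 2, 9)   # "12.1.2.9 or earlier" per this note
FIRST_FIXED = (12, 1, 2, 11)    # "12.1.2.11 and higher" per this note


def parse_version(text):
    """Turn a dotted version string such as '12.1.2.11.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def classify(version_text):
    """Return 'affected', 'fixed' or 'unspecified' for an ODA software version."""
    key = parse_version(version_text)[:4]  # compare the first four components only
    if key <= LAST_AFFECTED:
        return "affected"
    if key >= FIRST_FIXED:
        return "fixed"
    return "unspecified"  # between the two boundaries; not covered by this note


for version in ("12.1.2.8.0", "12.1.2.9.0", "12.1.2.10.0", "12.1.2.11.0"):
    print(version, "->", classify(version))
```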

History

28-Feb-2017 Created.

03-Mar-2017 Reviewed with minor editorial changes and added comments for clarification.

19-Jul-2017 Minor changes to sentence structure; changed the fixed version to 12.1.2.11.0.

References

<BUG:25114213> - MULTIPLE DISK OFFLINE FROM ASM IN SHORT PERIOD

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.