Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-2195194.1
Update Date: 2016-10-19

Solution Type  Sun Alert Sure

Solution  2195194.1 :   Alert : ODA V1: SSD Disks Used for the ODA ASM +REDO Diskgroup Are Turning Into "Write Protected" Mode Preventing I/O Write Operations  


Related Items
  • Oracle Database Appliance
  • Oracle Database - Enterprise Edition
  • Oracle Database Appliance Software
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Database Appliance>DB: ODA_EST




In this Document
Description
Occurrence
Symptoms
Workaround
History


Applies to:

Oracle Database Appliance - Version All Versions to All Versions [Release All Releases]
Oracle Database Appliance Software - Version 2.10.0.0 to 12.1.2.7
Oracle Database - Enterprise Edition - Version 12.1.0.1 to 12.1.0.2 [Release 12.1]
Information in this document applies to any platform.

Description

SSD disks used for the ODA ASM +REDO diskgroup enter "Write Protected" mode, preventing I/O write operations.

Occurrence

To date, the problem has only been reported on ODA V1 configurations, for example:

[root@asmcloud1 ~]# oakcli show env_hw
BM ODA V1
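
As a quick in-scope check, the hardware string reported by the command above can be tested directly; a minimal sketch (reads the `oakcli show env_hw` output from stdin, so it also works against a saved capture):

```shell
# Succeed only when the hardware string identifies an ODA V1.
# Feed it the output of `oakcli show env_hw` on stdin, e.g.:
#   oakcli show env_hw | is_oda_v1 && echo "in scope for this alert"
is_oda_v1() {
  grep -q 'ODA V1'
}
```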

 

Symptoms

1) The ODA ASM +REDO diskgroup cannot be mounted, failing with "WARNING: Write Failed" errors:

SQL> ALTER DISKGROUP REDO MOUNT /* asm agent *//* {1:21142:425} */
NOTE: cache registered group REDO number=3 incarn=0xcf4880b4
NOTE: cache began mount (first) of group REDO number=3 incarn=0xcf4880b4
NOTE: Assigning number (3,23) to disk (/dev/mapper/SSD_E1_S23_805696743p1)
NOTE: Assigning number (3,22) to disk (/dev/mapper/SSD_E1_S22_805699136p1)
NOTE: Assigning number (3,21) to disk (/dev/mapper/SSD_E0_S21_805699139p1)
NOTE: Assigning number (3,20) to disk (/dev/mapper/SSD_E0_S20_805699133p1)
Sat Aug 27 17:01:09 2016
NOTE: cache closing disk 20 of grp 3: (not open) SSD_E0_S20_805699133P1
NOTE: cache closing disk 22 of grp 3: (not open) SSD_E1_S22_805699136P1
WARNING: Write Failed. group:3 disk:23 AU:1 offset:4190208 size:4096
WARNING: Hbeat write to PST disk 23.3915935947 in group 3 failed. [4]
ERROR: GMON could not set any hearbeat (grp 3)
NOTE: cache dismounting (clean) group 3/0xCF4880B4 (REDO)
NOTE: messaging CKPT to quiesce pins Unix process pid: 52344, image: oracle@asmcloud1
NOTE: dbwr not being msg'd to dismount
NOTE: lgwr not being msg'd to dismount
NOTE: cache dismounted group 3/0xCF4880B4 (REDO)
NOTE: cache ending mount (fail) of group REDO number=3 incarn=0xcf4880b4
NOTE: cache deleting context for group REDO 3/0xcf4880b4
GMON dismounting group 3 at 19 for pid 31, osid 52344
NOTE: Disk SSD_E0_S20_805699133P1 in mode 0x1 marked for de-assignment
NOTE: Disk SSD_E0_S21_805699139P1 in mode 0x0 marked for de-assignment
NOTE: Disk SSD_E1_S22_805699136P1 in mode 0x1 marked for de-assignment
NOTE: Disk SSD_E1_S23_805696743P1 in mode 0x7f marked for de-assignment
ERROR: diskgroup REDO was not mounted
ORA-15032: not all alterations performed
ORA-15017: diskgroup "REDO" cannot be mounted

 

2) Attempting to recreate the ODA ASM +REDO diskgroup on the original SSD disks also fails, with "Input/output error" write errors:

SQL> create diskgroup REDO HIGH REDUNDANCY DISK
2 '/dev/mapper/SSD_E1_S23_805696743p1' NAME SSD_E1_S23_805696743p1 FORCE,
3 '/dev/mapper/SSD_E1_S22_805699136p1' NAME SSD_E1_S22_805699136p1 FORCE,
4 '/dev/mapper/SSD_E0_S21_805699139p1' NAME SSD_E0_S21_805699139p1 FORCE,
5 '/dev/mapper/SSD_E0_S20_805699133p1' NAME SSD_E0_S20_805699133p1 FORCE
6 attribute 'compatible.asm'='11.2.0.4', 'compatible.rdbms'='11.2.0.2','sector_size'='512','AU_SIZE'='4M','content.type'='redo';
create diskgroup REDO HIGH REDUNDANCY DISK
*
ERROR at line 1:
ORA-15018: diskgroup cannot be created
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-15080: synchronous I/O operation to a disk failed
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-15080: synchronous I/O operation to a disk failed
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-15080: synchronous I/O operation to a disk failed
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 65536
ORA-15080: synchronous I/O operation to a disk failed
ORA-27061:

 

3) The "oakcli show disk" command still reports the SSD disks in "Good" state:

NAME PATH TYPE STATE STATE_DETAILS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pd_00 /dev/sdc HDD ONLINE Good
pd_01 /dev/sdm HDD ONLINE Good
pd_02 /dev/sdo HDD ONLINE Good
pd_03 /dev/sdy HDD ONLINE Good
pd_04 /dev/sdd HDD ONLINE Good
pd_05 /dev/sdn HDD ONLINE Good
pd_06 /dev/sdp HDD ONLINE Good
pd_07 /dev/sdz HDD ONLINE Good
pd_08 /dev/sde HDD ONLINE Good
pd_09 /dev/sdk HDD ONLINE Good
pd_10 /dev/sdq HDD ONLINE Good
pd_11 /dev/sdw HDD ONLINE Good
pd_12 /dev/sdf HDD ONLINE Good
pd_13 /dev/sdl HDD ONLINE Good
pd_14 /dev/sdr HDD ONLINE Good
pd_15 /dev/sdx HDD ONLINE Good
pd_16 /dev/sdg HDD ONLINE Good
pd_17 /dev/sdi HDD ONLINE Good
pd_18 /dev/sds HDD ONLINE Good
pd_19 /dev/sdu HDD ONLINE Good
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pd_20 /dev/sdh SSD ONLINE Good <(===
pd_21 /dev/sdj SSD ONLINE Good <(===
pd_22 /dev/sdt SSD ONLINE Good <(===
pd_23 /dev/sdv SSD ONLINE Good <(===

 

4) Nevertheless, the SSD disks report "Write protected" errors in the OS log (/var/log/messages), which confirms the physical disk failure:

a) pd_23 = /dev/sdv:

Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:20:0: [sdv] Unhandled sense code
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:20:0: [sdv] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:20:0: [sdv] Sense Key : Data Protect [current]
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:20:0: [sdv] Add. Sense: Write protected <(==== 
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:20:0: [sdv] CDB: Write(10): 2a 00 00 00 17 00 00 00 80 00

b) pd_22 = /dev/sdt:

Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:18:0: [sdt] Unhandled sense code
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:18:0: [sdt] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:18:0: [sdt] Sense Key : Data Protect [current]
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:18:0: [sdt] Add. Sense: Write protected <(==== 
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:18:0: [sdt] CDB: Write(10): 2a 00 00 00 17 00 00 00 80 00

c) pd_21 = /dev/sdj:

Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:7:0: [sdj] Unhandled sense code
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:7:0: [sdj] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:7:0: [sdj] Sense Key : Data Protect [current]
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:7:0: [sdj] Add. Sense: Write protected <(==== 
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:7:0: [sdj] CDB: Write(10): 2a 00 00 00 17 00 00 00 80 00

d) pd_20 = /dev/sdh:

Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:5:0: [sdh] Unhandled sense code
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:5:0: [sdh] Result: hostbyte=invalid driverbyte=DRIVER_SENSE
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:5:0: [sdh] Sense Key : Data Protect [current]
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:5:0: [sdh] Add. Sense: Write protected <(==== 
Oct 14 14:53:07 asmcloud1 kernel: sd 6:0:5:0: [sdh] CDB: Write(10): 2a 00 00 00 17 00 00 00 80 00
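
The affected devices can be pulled out of the OS log in one pass; a minimal sketch (assumes the default /var/log/messages location and the kernel message format shown above):

```shell
# List the unique SCSI devices (sdX) that the kernel has flagged as
# write protected, scanning the given log (default: /var/log/messages).
list_write_protected() {
  grep 'Add. Sense: Write protected' "${1:-/var/log/messages}" |
    sed -n 's/.*\[\(sd[a-z]*\)\].*/\1/p' |
    sort -u
}
```

Each device name printed should correspond to an SSD slot (pd_20 through pd_23 in the listing above).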

 

5) The problem occurs because the SSD disks are faulty (end-of-life) and need to be replaced.

 

Workaround

1) Open a Service Request with Oracle Support to have the faulty SSD disk(s) replaced right away.

2) If all the SSD disks are affected at the same time, you will need to recreate the +REDO diskgroup on new SSD disks and recreate the associated ACFS filesystems.
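
Once the replacement SSDs are in place, the diskgroup is recreated with the same CREATE DISKGROUP statement shown in symptom 2, pointed at the new multipath partitions. As a sketch, the statement can be assembled from the new device paths (the paths used below are illustrative placeholders; the attribute values are the ones from this note):

```shell
# Assemble the CREATE DISKGROUP statement for the +REDO diskgroup from a
# list of multipath partition paths (one argument per new SSD partition).
# Attribute values match the ones shown in this note; adjust as needed.
build_redo_ddl() {
  echo 'create diskgroup REDO HIGH REDUNDANCY DISK'
  first=1
  for dev in "$@"; do
    [ "$first" -eq 1 ] || echo ','
    printf "'%s' NAME %s FORCE" "$dev" "$(basename "$dev")"
    first=0
  done
  echo
  echo "attribute 'compatible.asm'='11.2.0.4', 'compatible.rdbms'='11.2.0.2',"
  echo "'sector_size'='512', 'AU_SIZE'='4M', 'content.type'='redo';"
}
```

The generated statement can then be reviewed and run from SQL*Plus as the Grid Infrastructure owner.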

 


  


History

[16-OCT-2016] - [Alert: 2195194.1 was created]


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.