Asset ID: 1-72-1637898.1
Update Date: 2016-08-17
Doc ID: 1637898.1
Solution Type: Problem Resolution Sure Solution
ODA : Old Disk Path information still exist in ASM
Related Items:
- Oracle Database Appliance
Related Categories:
- PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Database Appliance>DB: ODA_EST
Applies to:
Oracle Database Appliance - Version All Versions and later
Information in this document applies to any platform.
Symptoms
After a disk replacement on an ODA machine, the old disk information still exists in the environment.
The following symptoms can be observed for the old disk:
1. The output of v$asm_disk still shows the old disk path with group_number=0 (group 0 means the disk is unowned).
2. The vgs output still reports I/O errors for the old disk.
3. /etc/multipath.conf still contains the old disk information.
4. /dev/mapper still shows the old disk entries.
5. /dev still contains the old device nodes.
6. /dev/mpath still has an entry for the old disk.
7. /var/log/messages continues to record I/O errors.
8. The ASM alert log also displays I/O errors.
Along with the entries for the old disk, the new disk information is also available, and the new disk can be added back to the diskgroup successfully.
*****These symptoms appear on both nodes.*****
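The symptom checks above can be sketched as a quick scan of the device directories. This is a minimal illustration, not part of the original note; OLD_ID is the example WWID used in this document, so substitute the serial of the disk that was actually replaced in your environment.

```shell
# Minimal sketch: look for leftover device entries of a replaced disk.
# OLD_ID is the example WWID from this note; substitute your old disk's serial.
OLD_ID="HDD_E1_S03_1211732875"

check_stale() {
    # Prints "stale" if any node matching OLD_ID remains in the given directory.
    if ls "$1"/*"$OLD_ID"* >/dev/null 2>&1; then
        echo "stale"
    else
        echo "clean"
    fi
}

MAPPER_STATE=$(check_stale /dev/mapper)
MPATH_STATE=$(check_stale /dev/mpath)
echo "/dev/mapper: $MAPPER_STATE, /dev/mpath: $MPATH_STATE"
```

On an affected system both checks report "stale" for the old disk until the cleanup in the Solution section is performed.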
Example:
------------
Check /dev/mapper for the old disk information:
[root@NODE1 mapper]# ls -altr *S03*
brw-rw---- 1 grid asmadmin 253, 24 Mar 27 2013 HDD_E1_S03_1211732875
brw-rw---- 1 grid asmadmin 253, 30 Mar 7 13:23 HDD_E1_S03_1211732875p1
brw-rw---- 1 grid asmadmin 253, 31 Mar 7 13:30 HDD_E1_S03_1211732875p2
brw-rw---- 1 grid asmadmin 253, 72 Mar 10 14:19 HDD_E1_S03_1797259843
brw-rw---- 1 grid asmadmin 253, 74 Mar 10 16:29 HDD_E1_S03_1797259843p2
brw-rw---- 1 grid asmadmin 253, 73 Mar 10 16:29 HDD_E1_S03_1797259843p1
Disk pd_03 was replaced in this environment, but the ls output from /dev/mapper still shows both the old and the new disk information.
Path HDD_E1_S03_1211732875 belongs to the old disk, while path HDD_E1_S03_1797259843 represents the new disk.
The output of "vgs" shows I/O errors like the following:
[root@NODE1 mapper]# vgs
/dev/mpath/HDD_E1_S03_1211732875: read failed after 0 of 4096 at 600127176704: Input/output error
/dev/mpath/HDD_E1_S03_1211732875: read failed after 0 of 4096 at 600127258624: Input/output error
/dev/mpath/HDD_E1_S03_1211732875: read failed after 0 of 4096 at 0: Input/output error
/dev/mpath/HDD_E1_S03_1211732875: read failed after 0 of 4096 at 4096: Input/output error
/dev/mpath/HDD_E1_S03_1211732875p1: read failed after 0 of 4096 at 515396009984: Input/output error
/dev/mpath/HDD_E1_S03_1211732875p1: read failed after 0 of 4096 at 515396067328: Input/output error
/dev/mpath/HDD_E1_S03_1211732875p1: read failed after 0 of 4096 at 0: Input/output error
/dev/mpath/HDD_E1_S03_1211732875p1: read failed after 0 of 4096 at 4096: Input/output error
/dev/mpath/HDD_E1_S03_1211732875p2: read failed after 0 of 512 at 84726382592: Input/output error
/dev/mpath/HDD_E1_S03_1211732875p2: read failed after 0 of 512 at 84726472704: Input/output error
/dev/mpath/HDD_E1_S03_1211732875p2: read failed after 0 of 512 at 0: Input/output error
/dev/mpath/HDD_E1_S03_1211732875p2: read failed after 0 of 512 at 4096: Input/output error
/dev/mpath/HDD_E1_S03_1211732875p2: read failed after 0 of 2048 at 0: Input/output error
VG #PV #LV #SN Attr VSize VFree
VolGroupSys 1 4 0 wz--n- 465.66G 251.66G
These error messages relate to the old disk.
ASM alert log error messages:
Mon Mar 10 13:59:39 2014
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_pz99_29322.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 4096
WARNING: Read Failed. group:0 disk:51 AU:0 offset:0 size:4096
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_pz99_29322.trc:
ORA-27061: waiting for async I/Os failed
Linux-x86_64 Error: 5: Input/output error
Additional information: -1
Additional information: 4096
ORA-15080: synchronous I/O operation to a disk failed
V$ASM_DISK output:
SQL> select path, name, header_status, mode_status, mount_status, state, failgroup, group_number from v$asm_disk order by path;
PATH NAME HEADER_STATUS MODE_ST MOUNT_S STATE FAILGROUP GROUP_NUMBER
------------------------------------- ----------------- ---------------- ---------- ------------ --------- ---------------- ------------------------ ------------
/HDD_E0_S19_1130281880p1 HDD_E0_S19_1130281880P1 MEMBER ONLINE CACHED NORMAL HDD_E0_S19_1130281880P1 2 << --- New disk already added back to diskgroup 2
/HDD_E0_S19_1230211442p1 HDD_E0_S19_1230211442p1 MEMBER ONLINE CLOSED NORMAL 0 << --- Group #0 (Old disk )
***Group 0 means the disk does not belong to any of the expected disk groups.
Changes
The failed disk was replaced.
Cause
The old disk information was not removed.
It should have been removed automatically, but due to the following bug the old disk information remains:
Bug 16964646 : DISK REPLACEMENT/FCO ISSUES WITH BOTH OLD AND NEW DISK SEEN (closed as a duplicate of Bug 14223113)
Bug 14223113 : ASM DISK NOT RELEASED BY CRSD.BIN PROCESS AFTER DROPPING DISK
Solution
To remove the old disk entries, follow the steps in any one of the following options.
Option 1:
A reboot of both nodes can resolve this issue automatically.
OR
If a reboot is not possible, follow the steps given in Option 2.
Option 2:
Remove the old disk information manually.
*****These steps must be run on both nodes.*****
STEP 1.
--- Remove multipath information from "/dev/mapper" ---
[root@NODE1 mapper]# ls -altr *S03*
brw-rw---- 1 grid asmadmin 253, 24 Mar 27 2013 HDD_E1_S03_1211732875 ------>> OLD DISK
brw-rw---- 1 grid asmadmin 253, 30 Mar 7 13:23 HDD_E1_S03_1211732875p1 ------>> OLD DISK
brw-rw---- 1 grid asmadmin 253, 31 Mar 7 13:30 HDD_E1_S03_1211732875p2 ------>> OLD DISK
brw-rw---- 1 grid asmadmin 253, 72 Mar 10 14:19 HDD_E1_S03_1797259843 ------>> NEW DISK
brw-rw---- 1 grid asmadmin 253, 74 Mar 10 16:29 HDD_E1_S03_1797259843p2 ------>> NEW DISK
brw-rw---- 1 grid asmadmin 253, 73 Mar 10 16:29 HDD_E1_S03_1797259843p1 ------>> NEW DISK
Remove the old disk files:
[root@NODE1 mapper]# rm HDD_E1_S03_1211732875p2 HDD_E1_S03_1211732875p1 HDD_E1_S03_1211732875
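As an alternative sketch (not part of the original note), the stale device-mapper mappings themselves can be torn down with dmsetup instead of removing the /dev/mapper nodes with rm. It assumes the example WWID from this note and only acts when the mapping actually exists; partition mappings are removed before the base device.

```shell
# Sketch: remove stale device-mapper mappings instead of rm-ing the nodes.
# OLD_ID is the example WWID from this note; substitute your old disk's serial.
OLD_ID="HDD_E1_S03_1211732875"
if command -v dmsetup >/dev/null 2>&1 && dmsetup info "$OLD_ID" >/dev/null 2>&1; then
    # Remove partition mappings first, then the whole-disk mapping.
    dmsetup remove "${OLD_ID}p2" "${OLD_ID}p1" "$OLD_ID"
    STATUS="mappings removed"
else
    STATUS="mapping not present"
fi
echo "$STATUS"
```

Run "dmsetup ls" first to confirm which mappings exist before removing anything.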
STEP 2.
--- Remove entry from "/dev/mpath" ---
[root@NODE1 mpath]# ls -altr *S03*
lrwxrwxrwx 1 root root 8 Mar 27 2013 HDD_E1_S03_1211732875 -> ../dm-24 ------>> OLD DISK
lrwxrwxrwx 1 root root 8 Mar 27 2013 HDD_E1_S03_1211732875p2 -> ../dm-31 ------>> OLD DISK
lrwxrwxrwx 1 root root 8 Mar 27 2013 HDD_E1_S03_1211732875p1 -> ../dm-30 ------>> OLD DISK
lrwxrwxrwx 1 root root 8 Mar 10 13:15 HDD_E1_S03_1797259843p1 -> ../dm-73 ------>> NEW DISK
lrwxrwxrwx 1 root root 8 Mar 10 13:16 HDD_E1_S03_1797259843 -> ../dm-72 ------>> NEW DISK
lrwxrwxrwx 1 root root 8 Mar 10 13:16 HDD_E1_S03_1797259843p2 -> ../dm-74 ------>> NEW DISK
[root@NODE1 mpath]# rm HDD_E1_S03_1211732875 HDD_E1_S03_1211732875p1 HDD_E1_S03_1211732875p2
rm: remove symbolic link `HDD_E1_S03_1211732875'? y
rm: remove symbolic link `HDD_E1_S03_1211732875p1'? y
rm: remove symbolic link `HDD_E1_S03_1211732875p2'? y
STEP 3.
--- Remove device Information from "/dev" ---
[root@NODE1 dev]# rm dm-24 dm-30 dm-31
rm: remove block special file `dm-24'? y
rm: remove block special file `dm-30'? y
rm: remove block special file `dm-31'? y
OR
Option 3:
Apply the solution from Document 1485163.1, "Disks of Dismounted Diskgroup Are Still Hold / Lock By Oracle Process on 11.2.0.3", which is not specific to ODA.
- Note that this workaround can also be used on ODA versions higher than 11.2.0.3.
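Document 1485163.1 deals with Oracle processes that still hold the dropped disk. A hypothetical check along those lines (not part of the original note) is to ask fuser which processes, if any, keep the stale node open; DEV is an example path built from the old WWID used in this document.

```shell
# Sketch: see whether any process still holds the stale device node open.
# DEV is a hypothetical example path using the old WWID from this note.
DEV="/dev/mapper/HDD_E1_S03_1211732875"
if [ -e "$DEV" ]; then
    # fuser exits non-zero when no process has the file open.
    fuser -v "$DEV" 2>&1 || echo "no process holds $DEV"
else
    HOLD="device node already gone"
    echo "$HOLD"
fi
```

Any PIDs reported here belong to processes (such as crsd.bin, per Bug 14223113) that must release the device before cleanup can complete.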
Steps to verify:
Verify the following to confirm that the old disk information has been removed:
1. There are no longer any I/O errors in the vgs output.
[root@NODE1 dev]# vgs
VG #PV #LV #SN Attr VSize VFree
VolGroupSys 1 4 0 wz--n- 465.66G 251.66G
2. Check the v$asm_disk output; the old disk information should be removed.
3. Check the /var/log/messages file; the I/O errors should have stopped.
4. Check the ASM alert log; there should be no new I/O errors.
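The log check in step 3 above can be scripted as a simple grep. This is a minimal sketch; the path and WWID are the example values used throughout this note, so adjust them for your system.

```shell
# Sketch: confirm the system log no longer accumulates errors for the old disk.
# OLD_ID and LOG are the example values from this note; adjust as needed.
OLD_ID="HDD_E1_S03_1211732875"
LOG="/var/log/messages"
if [ -r "$LOG" ] && grep -q "$OLD_ID" "$LOG"; then
    RESULT="errors still logged for $OLD_ID"
else
    RESULT="no entries for $OLD_ID"
fi
echo "$RESULT"
```

Old entries written before the cleanup may still appear in rotated logs; what matters is that no new I/O errors are recorded after the stale paths are removed.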
References
<NOTE:1485163.1> - Disks of Dismounted Diskgroup Are Still Hold / Lock By Oracle Process on 11.2.0.3
<BUG:13869294> - DISMOUNTING DISKGROUP IN ASM BUT DEVICE STILL IN USE BY AN ASM PROCESS
<BUG:14223113> - ASM DISK NOT RELEASED BY CRSD.BIN PROCESS AFTER DROPPING DISK
<NOTE:1644043.1> - ODA : 24 Extra Disk paths exists in ASM with group_number 0 and without suffix p1 or p2
<NOTE:1981125.1> - Oracle Database Appliance (ODA) Reference to Disk and Storage Issue Notes
Attachments
This solution has no attachment