Asset ID: 1-72-2180519.1
Update Date: 2016-11-10
Solution Type: Problem Resolution (Sure)
Solution 2180519.1:
ODA: Unable to Add Multipath Disk Partition to ASM Diskgroup Due to ORA-00600: [kfgCanRepartner04] (Includes a Troubleshooting Demo)
Related Items:
- Oracle Database - Enterprise Edition
- Oracle Database Appliance Software
- Oracle Database Appliance
Related Categories:
- PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Database Appliance>DB: ODA_EST
Created from <SR 3-13084708391>
Applies to:
Oracle Database Appliance Software - Version 12.1.2 to 12.1.2.7 [Release 12.1]
Oracle Database - Enterprise Edition - Version 12.1.0.1 to 12.2 BETA1 [Release 12.1 to 12.2]
Oracle Database Appliance - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
Symptoms
1) The following p1 and p2 partitions cannot be added to the DATA and RECO diskgroups, respectively:
SQL> select HEADER_STATUS, path from v$asm_disk where GROUP_NUMBER = 0;
HEADER_STATUS PATH
------------------------------
CANDIDATE /dev/mapper/HDD_E0_S17_1092437096p2
CANDIDATE /dev/mapper/HDD_E0_S17_1092437096p1
2) The two multipath disk partitions are readable at the OS level:
SQL> !ls -l /dev/mapper/HDD_E0_S17_1092437096p2
brw-rw---- 1 grid asmadmin 252, 39 Aug 31 18:02 /dev/mapper/HDD_E0_S17_1092437096p2
SQL> !ls -l /dev/mapper/HDD_E0_S17_1092437096p1
brw-rw---- 1 grid asmadmin 252, 38 Aug 31 18:02 /dev/mapper/HDD_E0_S17_1092437096p1
SQL> !dd if=/dev/mapper/HDD_E0_S17_1092437096p2 of=/dev/null count=100 bs=8192
100+0 records in
100+0 records out
819200 bytes (819 kB) copied, 0.00654007 seconds, 125 MB/s
SQL> !dd if=/dev/mapper/HDD_E0_S17_1092437096p1 of=/dev/null count=100 bs=8192
100+0 records in
100+0 records out
819200 bytes (819 kB) copied, 0.0119715 seconds, 68.4 MB/s
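The per-partition dd checks above can be looped over every candidate device. A minimal sketch, using ordinary temp files in place of the real /dev/mapper block devices so that it is runnable anywhere; on an actual ODA the loop would glob /dev/mapper/HDD_*p* instead:

```shell
# Sketch: loop the dd readability check over every candidate partition.
# Ordinary temp files stand in for the real /dev/mapper block devices.
tmpdir=$(mktemp -d)
dd if=/dev/zero of="$tmpdir/HDD_E0_S17_1092437096p1" bs=8192 count=100 2>/dev/null
dd if=/dev/zero of="$tmpdir/HDD_E0_S17_1092437096p2" bs=8192 count=100 2>/dev/null

readable=$(for part in "$tmpdir"/HDD_*; do
  # A device that can serve 100 x 8192-byte reads is considered readable here.
  if dd if="$part" of=/dev/null bs=8192 count=100 2>/dev/null; then
    echo "READABLE: ${part##*/}"
  fi
done)
echo "$readable"
```

An unreadable device would simply be missing from the output, which is a quick way to spot a path problem before involving ASM.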
3) However, adding the two partitions reports the following errors:
SQL> alter diskgroup DATA add failgroup HDD_E0_S17_1092437096p1 disk '/dev/mapper/HDD_E0_S17_1092437096p1' name HDD_E0_S17_1092437096p1 force;
alter diskgroup DATA add failgroup HDD_E0_S17_1092437096p1 disk '/dev/mapper/HDD_E0_S17_1092437096p1' name HDD_E0_S17_1092437096p1 force
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535],[100], [], [], [], [], [], [], [], []
SQL> alter diskgroup DATA add disk '/dev/mapper/HDD_E0_S17_1092437096p1' force;
alter diskgroup DATA add disk '/dev/mapper/HDD_E0_S17_1092437096p1' force
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535],[100], [], [], [], [], [], [], [], []
SQL> alter diskgroup /*+ _OAK_AsmCookie */ DATA add failgroup HDD_E0_S17_1092437096p1 disk '/dev/mapper/HDD_E0_S17_1092437096p1' name HDD_E0_S17_1092437096p1 force;
alter diskgroup /*+ _OAK_AsmCookie */ DATA add failgroup HDD_E0_S17_1092437096p1 disk '/dev/mapper/HDD_E0_S17_1092437096p1' name HDD_E0_S17_1092437096p1 force
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535],[100], [], [], [], [], [], [], [], []
SQL> alter diskgroup RECO add failgroup HDD_E0_S17_1092437096p2 disk '/dev/mapper/HDD_E0_S17_1092437096p2' name HDD_E0_S17_1092437096p2 force;
alter diskgroup RECO add failgroup HDD_E0_S17_1092437096p2 disk '/dev/mapper/HDD_E0_S17_1092437096p2' name HDD_E0_S17_1092437096p2 force
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535],[100], [], [], [], [], [], [], [], []
SQL> alter diskgroup RECO add disk '/dev/mapper/HDD_E0_S17_1092437096p2' force;
alter diskgroup RECO add disk '/dev/mapper/HDD_E0_S17_1092437096p2' force
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535],[100], [], [], [], [], [], [], [], []
SQL> alter diskgroup /*+ _OAK_AsmCookie */ RECO add failgroup HDD_E0_S17_1092437096p2 disk '/dev/mapper/HDD_E0_S17_1092437096p2' name HDD_E0_S17_1092437096p2 force;
alter diskgroup /*+ _OAK_AsmCookie */ RECO add failgroup HDD_E0_S17_1092437096p2 disk '/dev/mapper/HDD_E0_S17_1092437096p2' name HDD_E0_S17_1092437096p2 force
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535],[100], [], [], [], [], [], [], [], []
4) The /opt/oracle/extapi/asmappl.config file on both nodes appears to be in good shape:
[grid@asmcloud07 ~]$ grep E0_S17 /opt/oracle/extapi/asmappl.config
disk /dev/mapper/HDD_E0_S17_1092437096p1 0 17 1
disk /dev/mapper/HDD_E0_S17_1092437096p2 0 17 2
[grid@asmcloud08 ~]$ grep E0_S17 /opt/oracle/extapi/asmappl.config
disk /dev/mapper/HDD_E0_S17_1092437096p1 0 17 1
disk /dev/mapper/HDD_E0_S17_1092437096p2 0 17 2
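The cross-node comparison implied by the two grep runs above can be scripted. A hedged sketch, using synthetic local copies in place of the per-node files; on a real ODA the second copy would be fetched from the other node (e.g. with scp) before diffing:

```shell
# Sketch: verify asmappl.config matches between the two ODA nodes.
# Synthetic local copies stand in for the real per-node files here.
tmpdir=$(mktemp -d)
cat > "$tmpdir/node1.asmappl.config" <<'EOF'
disk /dev/mapper/HDD_E0_S17_1092437096p1 0 17 1
disk /dev/mapper/HDD_E0_S17_1092437096p2 0 17 2
EOF
cp "$tmpdir/node1.asmappl.config" "$tmpdir/node2.asmappl.config"

if diff -q "$tmpdir/node1.asmappl.config" "$tmpdir/node2.asmappl.config" >/dev/null
then msg="asmappl.config is in sync between nodes"
else msg="asmappl.config DIFFERS between nodes"
fi
echo "$msg"
```

Note that this check (which oakcli stordiag also performs) only proves the nodes agree with each other, not that the entries match the actual device-mapper names, as the Cause section below shows.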
5) Also, the physical disk in slot #17 is in good shape:
[grid@asmcloud07 ~]$ oakcli show disk
NAME PATH TYPE STATE STATE_DETAILS
pd_00 /dev/sdc HDD ONLINE Good
pd_01 /dev/sdm HDD ONLINE Good
pd_02 /dev/sdb HDD ONLINE Good
pd_03 /dev/sdx HDD ONLINE Good
pd_04 /dev/sdd HDD ONLINE Good
pd_05 /dev/sdn HDD ONLINE Good
pd_06 /dev/sdo HDD ONLINE Good
pd_07 /dev/sdy HDD ONLINE Good
pd_08 /dev/sde HDD ONLINE Good
pd_09 /dev/sdk HDD ONLINE Good
pd_10 /dev/sdp HDD ONLINE Good
pd_11 /dev/sdv HDD ONLINE Good
pd_12 /dev/sdf HDD ONLINE Good
pd_13 /dev/sdl HDD ONLINE Good
pd_14 /dev/sdq HDD ONLINE Good
pd_15 /dev/sdw HDD ONLINE Good
pd_16 /dev/sdg HDD ONLINE Good
pd_17 /dev/sdi HDD ONLINE Good <(====
pd_18 /dev/sdr HDD ONLINE Good
pd_19 /dev/sdt HDD ONLINE Good
pd_20 /dev/sdh SSD ONLINE Good
pd_21 /dev/sdj SSD ONLINE Good
pd_22 /dev/sds SSD ONLINE Good
pd_23 /dev/sdu SSD ONLINE Good
[grid@asmcloud08 ~]$ oakcli show disk pd_17
Resource: pd_17
ActionTimeout : 1500
ActivePath : /dev/sdi
AsmDiskList : |data_17||reco_17|
AutoDiscovery : 1
AutoDiscoveryHi : |data:86:HDD||reco:14:HDD||redo:100:SSD|
CheckInterval : 300
ColNum : 1
DependListOpr : add
Dependency : |0|
DiskId : 35000cca0411d4468
DiskType : HDD
Enabled : 0
ExpNum : 0
IState : 0
Initialized : 0
IsConfigDepende : false
MonitorFlag : 0
MultiPathList : |/dev/sdas||/dev/sdi|
Name : pd_17
NewPartAddr : 0
OSUserType : |userType:Multiuser|
PrevState : Invalid
PrevUsrDevName :
SectorSize : 512
SerialNum : 001250KJ2Z6N
Size : 600127266816
SlotNum : 17
State : Online
StateChangeTs : 1472677767
StateDetails : Good
TotalSectors : 1172123568
TypeName : 0
UsrDevName : HDD_E0_S17_1092437096
gid : 0
mode : 660
uid : 0
[grid@asmcloud08 ~]$ su -
Password:
[root@asmcloud08 ~]# oakcli stordiag pd_17
Node Name : asmcloud08
Test : Diagnostic Test Description
1 : OAK Check
NAME PATH TYPE STATE STATE_DETAILS
pd_17 /dev/sdi HDD ONLINE Good
2 : ASM Check
ASM Disk Status : group_number state mode_s mount_s header_s
/dev/mapper/HDD_E0_S17_1092437096p1 : 0 NORMAL ONLINE CLOSED MEMBER
/dev/mapper/HDD_E0_S17_1092437096p2 : 0 NORMAL ONLINE CLOSED MEMBER
3 : Multipathd Status
multipathd running on system
4 : Multipath Status
Device List : /dev/sdi /dev/sdas
Info:
HDD_E0_S17_1092437096 (35000cca0411d4468) dm-9 HITACHI,HUS1560SCSUN600G
size=559G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| '- 6:0:6:0 sdi 8:128 active ready running
'-+- policy='round-robin 0' prio=1 status=enabled
'- 7:0:19:0 sdas 66:192 active ready running
5 : Check Partition using fdisk
Check using active device path: /dev/sdi
Partition check on device /dev/sdi : PASS
Partition list found by fdisk for active device path: /dev/sdi
Device Boot Start End Blocks Id System
/dev/sdi1 1 62661 503316480 83 Linux
/dev/sdi2 62661 72232 76882638+ 83 Linux
Check using passive device path: /dev/sdas
Partition check on device /dev/sdas : PASS
Partition list found by fdisk for passive device path: /dev/sdas
Device Boot Start End Blocks Id System
/dev/sdas1 1 62661 503316480 83 Linux
/dev/sdas2 62661 72232 76882638+ 83 Linux
6 : Device Mapper Diagnostics
Mapper Device : dm-9
Partition List: HDD_E0_S17_1092437096p2 HDD_E0_S17_1092437096p1
Permissions :
/dev/mapper/HDD_E0_S17_1092437096p2 : brw-rw---- grid asmadmin
/dev/mapper/HDD_E0_S17_1092437096p1 : brw-rw---- grid asmadmin
Open Ref Count:
7 : asmappl.config and multipath.conf consistency check
/opt/oracle/extapi/asmappl.config file is in sync between nodes
/etc/multipath.conf file is in sync between nodes
8 : fwupdate
ID Manufacturer Model Chassis Slot Type Media Size (GB) FW Version XML Support
c1d17 HITACHI HUS1560SCSUN600G 0 17 sas HDD 559 A820 N/A
c2d17 HITACHI HUS1560SCSUN600G 0 17 sas HDD 559 A820 N/A
9 : Fishwrap
Controller "mpt2sas:0d:00.0"
Disk /dev/sdi: HITACHI HUS1560SCSUN600G (s/n "001250KJ2Z6N CZVJ2Z6N"), bay 17
Controller "mpt2sas:1f:00.0"
Disk /dev/sdas: HITACHI HUS1560SCSUN600G (s/n "001250KJ2Z6N CZVJ2Z6N"), bay 17
10 : SCSI INQUIRY
Active multipath device /dev/sdi : PASS
Passive multipath device /dev/sdas : PASS
11 : Multipath Conf for device
multipath {
wwid 35000cca0411d4468
alias HDD_E0_S17_1092437096
}
12 : Last few LSI Events Received for slot 17
[INFO]: No LSI events are recorded in OAKD logs
13 : Version Information
OAK : 12.1.2.4.0
kernel : 2.6.39-400.250.6.el5uek
mpt2sas : 17.00.06.00
Multipath : 0.4.9
Disk Firmware : A820
14 : OAK Conf Parms
Device : queue_depth Timeout max_sectors_kb nr_requests read_ahead_kb scheduler
/dev/sdi : 32 32 1024 4096 128 noop [deadline] cfq
/dev/sdas : 32 32 1024 4096 128 noop [deadline] cfq
******************************
********** 2nd NODE **********
******************************
Node Name : asmcloud07
Test : Diagnostic Test Description
1 : OAK Check
NAME PATH TYPE STATE STATE_DETAILS
pd_17 /dev/sdi HDD ONLINE Good
2 : ASM Check
ASM Disk Status : group_number state mode_s mount_s header_s
/dev/mapper/HDD_E0_S17_1092437096p1 : 0 NORMAL ONLINE CLOSED MEMBER
/dev/mapper/HDD_E0_S17_1092437096p2 : 0 NORMAL ONLINE CLOSED MEMBER
3 : Multipathd Status
multipathd running on system
4 : Multipath Status
Device List : /dev/sdi /dev/sdar
Info:
HDD_E0_S17_1092437096 (35000cca0411d4468) dm-10 HITACHI,HUS1560SCSUN600G
size=559G features='0' hwhandler='0' wp=rw
|-+- policy='round-robin 0' prio=1 status=active
| '- 6:0:7:0 sdi 8:128 active ready running
'-+- policy='round-robin 0' prio=1 status=enabled
'- 7:0:19:0 sdar 66:176 active ready running
5 : Check Partition using fdisk
Check using active device path: /dev/sdi
Partition check on device /dev/sdi : PASS
Partition list found by fdisk for active device path: /dev/sdi
Device Boot Start End Blocks Id System
/dev/sdi1 1 62661 503316480 83 Linux
/dev/sdi2 62661 72232 76882638+ 83 Linux
Check using passive device path: /dev/sdar
Partition check on device /dev/sdar : PASS
Partition list found by fdisk for passive device path: /dev/sdar
Device Boot Start End Blocks Id System
/dev/sdar1 1 62661 503316480 83 Linux
/dev/sdar2 62661 72232 76882638+ 83 Linux
6 : Device Mapper Diagnostics
Mapper Device : dm-10
Partition List: HDD_E0_S17_1092437096p2 HDD_E0_S17_1092437096p1
Permissions :
/dev/mapper/HDD_E0_S17_1092437096p2 : brw-rw---- grid asmadmin
/dev/mapper/HDD_E0_S17_1092437096p1 : brw-rw---- grid asmadmin
Open Ref Count:
7 : asmappl.config and multipath.conf consistency check
/opt/oracle/extapi/asmappl.config file is in sync between nodes
/etc/multipath.conf file is in sync between nodes
8 : fwupdate
ID Manufacturer Model Chassis Slot Type Media Size (GB) FW Version XML Support
c1d17 HITACHI HUS1560SCSUN600G 0 17 sas HDD 559 A820 N/A
c2d17 HITACHI HUS1560SCSUN600G 0 17 sas HDD 559 A820 N/A
9 : Fishwrap
Controller "mpt2sas:0d:00.0"
Disk /dev/sdi: HITACHI HUS1560SCSUN600G (s/n "001250KJ2Z6N CZVJ2Z6N"), bay 17
Controller "mpt2sas:1f:00.0"
Disk /dev/sdar: HITACHI HUS1560SCSUN600G (s/n "001250KJ2Z6N CZVJ2Z6N"), bay 17
10 : SCSI INQUIRY
Active multipath device /dev/sdi : PASS
Passive multipath device /dev/sdar : PASS
11 : Multipath Conf for device
multipath {
wwid 35000cca0411d4468
alias HDD_E0_S17_1092437096
}
12 : Last few LSI Events Received for slot 17
[INFO]: No LSI events are recorded in OAKD logs
13 : Version Information
OAK : 12.1.2.4.0
kernel : 2.6.39-400.250.6.el5uek
mpt2sas : 17.00.06.00
Multipath : 0.4.9
Disk Firmware : A820
14 : OAK Conf Parms
Device : queue_depth Timeout max_sectors_kb nr_requests read_ahead_kb scheduler
/dev/sdi : 32 32 1024 4096 128 noop [deadline] cfq
/dev/sdar : 32 32 1024 4096 128 noop [deadline] cfq
6) The "check all norepair" health check does not report ASM metadata inconsistencies in the diskgroups:
[grid@asmcloud07 ~]$ sqlplus " /as sysasm"
SQL*Plus: Release 12.1.0.2.0 Production on Thu Sep 1 13:05:23 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup data check all norepair;
Diskgroup altered.
SQL>
.
.
.
Thu Sep 01 13:06:19 2016
SUCCESS: check of diskgroup DATA found no errors
Thu Sep 01 13:06:19 2016
SUCCESS: alter diskgroup data check all norepair
[grid@asmcloud08 ~]$ sqlplus "/as sysasm"
SQL*Plus: Release 12.1.0.2.0 Production on Thu Sep 1 13:06:24 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.
Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup reco check all norepair;
Diskgroup altered.
SQL>
.
.
.
Thu Sep 01 13:06:43 2016
SUCCESS: check of diskgroup RECO found no errors
Thu Sep 01 13:06:43 2016
SUCCESS: alter diskgroup reco check all norepair
7) The AMDU health check does not report corruption issues in either diskgroup:
------------------------- SUMMARY FOR DISKGROUP RECO -------------------------
Allocated AU's: 200616
Free AU's: 174784
AU's read for dump: 522
Block images saved: 142321
Map lines written: 522
Heartbeats seen: 3
Corrupt metadata blocks: 0
Corrupt AT blocks: 0
------------------------- SUMMARY FOR DISKGROUP DATA -------------------------
Allocated AU's: 1107107
Free AU's: 1350493
AU's read for dump: 541
Block images saved: 155860
Map lines written: 541
Heartbeats seen: 3
Corrupt metadata blocks: 0
Corrupt AT blocks: 0
Cause
1) In order to obtain an accurate diagnostic, additional tracing information was generated as follows:
1.1) Connected as the grid OS user to the first ODA node (+ASM).
1.2) All the logs in the /u01/app/grid/diag/asm/+asm/+ASM1/trace/ directory were moved to another location.
1.3) Then, the problem was reproduced as follows:
a) Connect to the +ASM1 instance as sysasm and execute the following steps:
SQL> alter system set events'trace[KGF] disk highest';
SQL> alter system set events'trace[KFD] disk highest';
SQL> alter system set events'trace[ASM] disk highest';
SQL> alter system set events '600 trace name errorstack level 3; name systemstate level 267';
SQL> alter diskgroup DATA add failgroup HDD_E0_S17_1092437096p1 disk '/dev/mapper/HDD_E0_S17_1092437096p1' name HDD_E0_S17_1092437096p1 force;
SQL> alter diskgroup RECO add failgroup HDD_E0_S17_1092437096p2 disk '/dev/mapper/HDD_E0_S17_1092437096p2' name HDD_E0_S17_1092437096p2 force;
SQL> alter system set events'trace[KGF] off';
SQL> alter system set events'trace[KFD] off';
SQL> alter system set events'trace[ASM] off';
SQL> alter system set events '600 trace name errorstack off';
2) The entire /u01/app/grid/diag/asm/+asm/+ASM1/trace/ directory was provided (as asm1-trace.tar.Z):
# tar -cvf asm1-trace.tar /u01/app/grid/diag/asm/+asm/+ASM1/trace/
# compress asm1-trace.tar
3) The alert_+ASM1.log (included in asm1-trace.tar.Z) reported the following information:
.
.
.
Fri Sep 02 17:24:35 2016
OS Pid: 1296 executed alter system set events 'trace[KGF] off'
Fri Sep 02 17:24:44 2016
OS Pid: 1296 executed alter system set events 'trace[KFD] off'
OS Pid: 1296 executed alter system set events 'trace[ASM] off'
OS Pid: 1296 executed alter system set events '600 trace name errorstack off'
Fri Sep 02 17:25:13 2016
OS Pid: 1296 executed alter system set events 'trace[KGF] disk highest'
OS Pid: 1296 executed alter system set events 'trace[KFD] disk highest'
OS Pid: 1296 executed alter system set events 'trace[ASM] disk highest'
OS Pid: 1296 executed alter system set events '600 trace name errorstack level 3; name systemstate level 267'
Fri Sep 02 17:25:18 2016
SQL> alter diskgroup DATA add failgroup HDD_E0_S17_1092437096p1 disk '/dev/mapper/HDD_E0_S17_1092437096p1' name HDD_E0_S17_1092437096p1 force
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (1,17) to disk (/dev/mapper/HDD_E0_S17_1092437096p1)
NOTE: requesting all-instance membership refresh for group=1
NOTE: Disk 17 in group 1 is assigned fgnum=18
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1296.trc (incident=121856):
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535], [100], [], [], [], [], [], [], [], []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_121856/+ASM1_ora_1296_i121856.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Sep 02 17:25:20 2016
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_121856/+ASM1_ora_1296_i121856.trc:
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535], [100], [], [], [], [], [], [], [], []
Fri Sep 02 17:25:20 2016
Dumping diagnostic data in directory=[cdmp_20160902172520], requested by (instance=1, osid=1296), summary=[incident=121856].
Fri Sep 02 17:25:21 2016
System State dumped to trace file /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_121856/+ASM1_ora_1296_i121856.trc
Fri Sep 02 17:25:21 2016
Dumping diagnostic data in directory=[cdmp_20160902172521], requested by (instance=1, osid=1296), summary=[incident=121856].
Fri Sep 02 17:25:25 2016
NOTE: membership refresh pending for group 1/0xe3d92177 (DATA)
Fri Sep 02 17:25:25 2016
GMON querying group 1 at 93 for pid 21, osid 36635
GMON querying group 1 at 94 for pid 21, osid 36635
Fri Sep 02 17:25:25 2016
NOTE: Disk HDD_E0_S17_1092437096P1 in mode 0x0 marked for de-assignment
Fri Sep 02 17:25:25 2016
GMON querying group 1 at 95 for pid 21, osid 36635
Fri Sep 02 17:25:25 2016
SUCCESS: refreshed membership for 1/0xe3d92177 (DATA)
Fri Sep 02 17:25:26 2016
Sweep [inc][121856]: completed
Sweep [inc2][121856]: completed
NOTE: Attempting voting file refresh on diskgroup DATA
NOTE: Refresh completed on diskgroup DATA
. Found 3 voting file(s).
Fri Sep 02 17:25:28 2016
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535], [100], [], [], [], [], [], [], [], []
Fri Sep 02 17:25:28 2016
ERROR: alter diskgroup DATA add failgroup HDD_E0_S17_1092437096p1 disk '/dev/mapper/HDD_E0_S17_1092437096p1' name HDD_E0_S17_1092437096p1 force
Fri Sep 02 17:25:31 2016
SQL> alter diskgroup RECO add failgroup HDD_E0_S17_1092437096p2 disk '/dev/mapper/HDD_E0_S17_1092437096p2' name HDD_E0_S17_1092437096p2 force
NOTE: GroupBlock outside rolling migration privileged region
NOTE: Assigning number (2,17) to disk (/dev/mapper/HDD_E0_S17_1092437096p2)
NOTE: requesting all-instance membership refresh for group=2
NOTE: Disk 17 in group 2 is assigned fgnum=18
Errors in file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1296.trc (incident=121857):
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535], [100], [], [], [], [], [], [], [], []
Incident details in: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_121857/+ASM1_ora_1296_i121857.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Sep 02 17:25:32 2016
NOTE: membership refresh pending for group 2/0xe3d92178 (RECO)
.
.
4) The associated trace file (/u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1296.trc) identified the affected disk partitions as follows:
Trace file /u01/app/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_1296.trc
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /u01/app/12.1.0.2/grid
System name: Linux
Node name: asmcloud07
Release: 2.6.39-400.250.6.el5uek
Version: #1 SMP Tue Jun 23 23:41:58 PDT 2015
Machine: x86_64
Instance name: +ASM1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
Unix process pid: 1296, image:
*** 2016-09-02 17:24:29.663
kgfmReadOak: max_disk_count is 100
kgfmReadOak: appliance_name is ODA
kgfmReadOak: diskstring is /dev/mapper/*D_*p*
kgfmReadOak: file_version is 2
kgfmReadOak: oda_version is 2
kgfmReadOak: jbod_count is 1
kgfmReadOak: jbod_slot_count is 24
kgfmReadOak: data_slot_count is 20
kgfmReadOak: reco_slot_count is 20
kgfmReadOak: redo_slot_count is 4
kgfmReadOak: max_missing is 0
kgfmReadOak: min_partners is 2
kgfmReadOak: agent_sql_identifier is /*+ _OAK_AsmCookie
kgfmReadOak: asm_compatibility is 12.1.0.2
kgfmReadOak: rdbms_compatibility is 11.2.0.2
kgfmReadOak: _asm_hbeatiowait is 100
NOTE: GroupBlock outside rolling migration privileged region
.
.
.
-09-02 17:25:18.651682 : kfdDiscoverPostCb: Not found ODA disk=16(DATA_0016) slot=65535 path=/dev/mapper/HDD_E0_S16_2522793145p1 group=1 in config file
.
.
.
disk: DATA_0016 num: 16/140737109414364 grp: 1/140737016045943 compat: 12.1.0.2.0 dbcompat: 11.2.0.2.0
fg: DATA_0016 path: /dev/mapper/HDD_E0_S16_2522793145p1
mnt: C hdr: M mode: v v(rw) p(rw) a(x) d(x) sta: N flg: 1011 lflg: 4
totau: 122880 ddeau: 122880 cmdau: 0
slot 65535 ddeslot 16 numslots 0 dtype 0 enc 0 part 0 flags 0
kfts: 2016/08/25 01:44:12.880000
kfts: 2016/08/25 01:44:12.903000
pcnt: 4 (2 3 19 18) ()
apcnt: 4 (2 3 19 18) ()
kfkid: 0x95911d30, label: , status: IDENTIFIED
path: /dev/mapper/HDD_E0_S16_2522793145p1
fob: (KSFD)0x9917d2e0, magic: bebe ausize: 4194304
kfdds: dn=16 inc=0xe969d1dc dsk=0x967a7cc0 usrp=0x7f7cbb71b7a8
kfdds: gn=1 inc=0xe3d92177 magic=0xcc56 rel=0 pendio=0 trace=0
kfkds 0x7f7cba807ce8, kfkid 0x95911d30, magic abbe, libnum 0, bpau 8192, fob 0x9918c520
Incident 121856 created, dump file: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_121856/+ASM1_ora_1296_i121856.trc
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535], [100], [], [], [], [], [], [], [], []
2016-09-02 17:25:25.900938 : kfgbSendWithPin kfgbmt=3
.
.
.
2016-09-02 17:25:31.341697 : kfdDiscoverPostCb: Not found ODA disk=16(RECO_0016) slot=65535 path=/dev/mapper/HDD_E0_S16_2522793145p2 group=2 in config file
.
.
.
.
disk: RECO_0016 num: 16/140737109414384 grp: 2/140737016045944 compat: 12.1.0.2.0 dbcompat: 11.2.0.2.0
fg: RECO_0016 path: /dev/mapper/HDD_E0_S16_2522793145p2
mnt: C hdr: M mode: v v(rw) p(rw) a(x) d(x) sta: N flg: 1019 lflg: 4
totau: 18770 ddeau: 18770 cmdau: 0
slot 65535 ddeslot 16 numslots 0 dtype 0 enc 0 part 0 flags 0
kfts: 2016/08/25 07:14:19.434000
kfts: 2016/08/25 07:14:19.452000
pcnt: 4 (6 7 10 11) ()
apcnt: 4 (6 7 10 11) ()
kfkid: 0x95911860, label: , status: IDENTIFIED
path: /dev/mapper/HDD_E0_S16_2522793145p2
fob: (KSFD)0x9917d1a0, magic: bebe ausize: 4194304
kfdds: dn=16 inc=0xe969d1f0 dsk=0x967a7898 usrp=0x7f7cbb062b10
kfdds: gn=2 inc=0xe3d92178 magic=0xcc56 rel=0 pendio=0 trace=0
kfkds 0x7f7cba807d98, kfkid 0x95911860, magic abbe, libnum 0, bpau 8192, fob 0x9918c3e0
Incident 121857 created, dump file: /u01/app/grid/diag/asm/+asm/+ASM1/incident/incdir_121857/+ASM1_ora_1296_i121857.trc
ORA-00600: internal error code, arguments: [kfgCanRepartner04], [16], [65535], [100], [], [], [], [], [], [], [], []
2016-09-02 17:25:32.150783 : kfgbSendWithPin kfgbmt=3
5) This problem occurs because ASM cannot find or detect the following disks (they are not defined in the /opt/oracle/extapi/asmappl.config configuration file); as a result, the ORA-00600: [kfgCanRepartner04] error is raised:
path=/dev/mapper/HDD_E0_S16_2522793145p1
path=/dev/mapper/HDD_E0_S16_2522793145p2
6) The /opt/oracle/extapi/asmappl.config configuration file confirms that the correct disk names are not present:
Node #1: /opt/oracle/extapi/asmappl.config (incorrect values):
disk /dev/mapper/HDD_E0_S16_1092445448p1 0 16 1
disk /dev/mapper/HDD_E0_S16_1092445448p2 0 16 2
Node #2: /opt/oracle/extapi/asmappl.config (incorrect values):
disk /dev/mapper/HDD_E0_S16_1092445448p1 0 16 1
disk /dev/mapper/HDD_E0_S16_1092445448p2 0 16 2
7) The correct names should be as follows:
Node #1:
# ls -ltr /dev/mapper/HDD_E0_S16*
brw-rw---- 1 grid asmadmin 252, 26 Aug 31 18:02 /dev/mapper/HDD_E0_S16_2522793145
brw-rw---- 1 grid asmadmin 252, 32 Aug 31 18:04 /dev/mapper/HDD_E0_S16_2522793145p2
brw-rw---- 1 grid asmadmin 252, 29 Aug 31 18:15 /dev/mapper/HDD_E0_S16_2522793145p1
Correct values:
disk /dev/mapper/HDD_E0_S16_2522793145p1 0 16 1
disk /dev/mapper/HDD_E0_S16_2522793145p2 0 16 2
Node #2:
# ls -ltr /dev/mapper/HDD_E0_S16*
brw-rw---- 1 grid asmadmin 252, 26 Aug 31 18:02 /dev/mapper/HDD_E0_S16_2522793145
brw-rw---- 1 grid asmadmin 252, 32 Aug 31 18:04 /dev/mapper/HDD_E0_S16_2522793145p2
brw-rw---- 1 grid asmadmin 252, 29 Aug 31 18:15 /dev/mapper/HDD_E0_S16_2522793145p1
Correct values:
disk /dev/mapper/HDD_E0_S16_2522793145p1 0 16 1
disk /dev/mapper/HDD_E0_S16_2522793145p2 0 16 2
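A mismatch like this can be detected mechanically by cross-checking every path in asmappl.config against the actual device-mapper names. A minimal sketch, using synthetic copies of the config lines and device listing from this case; on a real ODA the device listing would come from ls /dev/mapper/HDD_*p*:

```shell
# Sketch: flag asmappl.config entries whose /dev/mapper path no longer exists.
# Both inputs below are synthetic stand-ins modeled on this case.
tmpdir=$(mktemp -d)
cat > "$tmpdir/asmappl.config" <<'EOF'
disk /dev/mapper/HDD_E0_S16_1092445448p1 0 16 1
disk /dev/mapper/HDD_E0_S16_1092445448p2 0 16 2
disk /dev/mapper/HDD_E0_S17_1092437096p1 0 17 1
EOF
cat > "$tmpdir/devices" <<'EOF'
/dev/mapper/HDD_E0_S16_2522793145p1
/dev/mapper/HDD_E0_S16_2522793145p2
/dev/mapper/HDD_E0_S17_1092437096p1
EOF

# Each config line has the form: disk <path> <enclosure> <slot> <partition>
stale=$(while read -r _ path _; do
  grep -qx "$path" "$tmpdir/devices" || echo "STALE: $path"
done < "$tmpdir/asmappl.config")
echo "$stale"
```

Here the two S16 entries are flagged as stale while the S17 entry passes, matching what the trace file reported.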
Solution
ASM will not allow new disks to be added until the inconsistencies with the existing disk members (e.g. /dev/mapper/HDD_E0_S16_#p#) in the /opt/oracle/extapi/asmappl.config file are fixed, as follows:
1) As the root OS user, update the /opt/oracle/extapi/asmappl.config file on both nodes as follows:
Replace the following lines:
disk /dev/mapper/HDD_E0_S16_1092445448p1 0 16 1
disk /dev/mapper/HDD_E0_S16_1092445448p2 0 16 2
With the following lines:
disk /dev/mapper/HDD_E0_S16_2522793145p1 0 16 1
disk /dev/mapper/HDD_E0_S16_2522793145p2 0 16 2
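The step-1 edit can be sketched with sed. This demo operates on a throwaway temp copy of the file (the serial numbers are the ones from this case and will differ on another system); on a real ODA the target would be /opt/oracle/extapi/asmappl.config on both nodes, edited as root after taking a backup:

```shell
# Sketch of the step-1 edit on a temporary copy of asmappl.config.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
disk /dev/mapper/HDD_E0_S16_1092445448p1 0 16 1
disk /dev/mapper/HDD_E0_S16_1092445448p2 0 16 2
EOF
cp "$tmp" "$tmp.bak"   # keep a backup before editing
# Swap the stale serial for the one that matches the actual /dev/mapper names.
sed -i 's/HDD_E0_S16_1092445448/HDD_E0_S16_2522793145/' "$tmp"
cat "$tmp"
```

A single substitution covers both the p1 and p2 lines, since only the serial portion of the device name changed.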
2) The disks were then successfully added to the DATA and RECO diskgroups, respectively:
SQL> alter diskgroup DATA add failgroup HDD_E0_S17_1092437096p1 disk '/dev/mapper/HDD_E0_S17_1092437096p1' name HDD_E0_S17_1092437096p1 FORCE rebalance power 32;
Diskgroup altered.
SQL> alter diskgroup RECO add failgroup HDD_E0_S17_1092437096p2 disk '/dev/mapper/HDD_E0_S17_1092437096p2' name HDD_E0_S17_1092437096p2 FORCE rebalance power 32;
Diskgroup altered.
SQL> select * from v$asm_operation;
GROUP_NUMBER OPERA PASS STAT POWER ACTUAL SOFAR EST_WORK
------------ ----- --------- ---- ---------- ---------- ---------- ----------
EST_RATE EST_MINUTES ERROR_CODE CON_ID
---------- ----------- -------------------------------------------- ----------
1 REBAL RESYNC DONE 32 32 0 0
0 0 0
1 REBAL REBALANCE RUN 32 32 1120 115688
4779 23 0
1 REBAL COMPACT WAIT 32 32 0 0
0 0 0
Another example:
If the following error is reported:
======================================================
ORA-15037: disk 'o/192.168.10.3/RECO_CD_08_pexa01cel01' is smaller than mimimum of 16 MBs
======================================================
In order to obtain an accurate diagnostic, perform the following action plan:
1) Connect as the grid OS user to the first Exadata compute node (+ASM).
2) Move all the logs in the /u01/app/grid/diag/asm/+asm/+ASM1/trace/ directory to another location.
3) Then, reproduce the problem as follows:
a) Connect to the +ASM1 instance as sysasm and execute the following steps:
======================================================
SQL> alter system set events'trace[KGF] disk highest';
SQL> alter system set events'trace[KFD] disk highest';
SQL> alter system set events'trace[ASM] disk highest';
SQL> alter system set events '15037 trace name errorstack level 3; name systemstate level 267';
SQL> alter diskgroup RECO_PEXA01 add failgroup pexa01cel01 disk
'o/192.168.10.3/RECO_CD_09_pexa01cel01' SIZE 197632 ,
'o/192.168.10.3/RECO_CD_10_pexa01cel01' SIZE 197632 ,
'o/192.168.10.3/RECO_CD_05_pexa01cel01' SIZE 197632 ,
'o/192.168.10.3/RECO_CD_07_pexa01cel01' SIZE 197632 ,
'o/192.168.10.3/RECO_CD_01_pexa01cel01' SIZE 197632 ,
'o/192.168.10.3/RECO_CD_06_pexa01cel01' SIZE 197632 ,
'o/192.168.10.3/RECO_CD_11_pexa01cel01' SIZE 197632 ,
'o/192.168.10.3/RECO_CD_00_pexa01cel01' SIZE 197632 ,
'o/192.168.10.3/RECO_CD_02_pexa01cel01' SIZE 197632 ,
'o/192.168.10.3/RECO_CD_03_pexa01cel01' SIZE 197632 ,
'o/192.168.10.3/RECO_CD_04_pexa01cel01' SIZE 197632 ,
'o/192.168.10.3/RECO_CD_08_pexa01cel01' SIZE 197632 rebalance power 32;
SQL> alter system set events'trace[KGF] off';
SQL> alter system set events'trace[KFD] off';
SQL> alter system set events'trace[ASM] off';
SQL> alter system set events '15037 trace name errorstack off';
======================================================
4) Then, provide the entire /u01/app/grid/diag/asm/+asm/+ASM1/trace/ directory (as asm1-trace.tar.Z):
======================================================
# tar -cvf asm1-trace.tar /u01/app/grid/diag/asm/+asm/+ASM1/trace/
# compress asm1-trace.tar
======================================================
Attachments
This solution has no attachment