Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2166445.1
Update Date: 2016-07-28

Solution Type: Problem Resolution Sure Solution

Solution 2166445.1: SuperCluster: 12.1.0.2 ASM XDMG process causes hang in PR_P_LOCK on ORA-15311: process terminated due to fenced I/O


Related Items
  • Solaris Operating System
  • Oracle SuperCluster T5-8 Full Rack
  • SPARC SuperCluster T4-4 Full Rack
  • Oracle SuperCluster M7 Hardware
  • Oracle Database - Enterprise Edition
  • Oracle SuperCluster T5-8 Half Rack
  • SPARC SuperCluster T4-4 Half Rack
  • Oracle SuperCluster M6-32 Hardware
  • SPARC SuperCluster T4-4
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>SPARC SuperCluster>DB: SuperCluster_EST
  • _Old GCS Categories>ST>Server>Engineered Systems>SPARC SuperCluster>Install




In this Document
Symptoms
Changes
Cause
Solution
References


Applies to:

Oracle SuperCluster M6-32 Hardware - Version All Versions and later
Solaris SPARC Operating System - Version 11.1 to 11.2 [Release 11.0]
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
SPARC SuperCluster T4-4 - Version All Versions and later
Oracle SuperCluster M7 Hardware - Version All Versions and later
Oracle Solaris on SPARC (64-bit)
A zone can become stuck in the shutting-down state because XDMG exits abnormally while still holding a lock; the global zone (LDom) may also hang.

Symptoms

Solaris zones and/or Logical Domains can hang. A core file obtained from the affected system will show:

==== user (LWP_SYS) thread: 0xc4010c3c1700 PID: 29194 ====
 cmd: asm_xdmg_+ASM6
 fmri: lrc:/etc/rc3_d/S96ohasd
 t_wchan: 0x2a10b7bf5d8 sobj: condition var (from
 genunix:vmtask_run_xjob+0x1f0)
 t_procp: 0xc4013eb1e010
 p_as: 0xc400be67f2f0 size: 351559680 RSS: 326246400
 a_hat: 0xc400be265000
 cnum: CPU48:4/46021
 cpusran: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63
 p_zone: 0x106affd8 (global)
 t_stk: 0x2a10b7bfad0 sp: 0x2a10b7bec51 t_stkbase: 0x2a10b7b8000
 t_pri: 60 (TS) pctcpu: 0.000000
 t_transience: 0 t_wkld_flags: 0
 t_lwp: 0xc4010466b2a0 t_tid: 1
 machpcb: 0x2a10b7bfad0
 lwp_ap: 0x2a10b7bfbc0
 t_mstate: LMS_SLEEP ms_prev: LMS_SYSTEM
 ms_state_start: 16 hours 8 minutes 30.525219929 seconds earlier
 ms_start: 23 hours 23 minutes 6.660387809 seconds earlier
 t_cpupart: 0x1041fa20(0) last CPU: 62
 idle: 58110525212482 nsec (16h8m30.525212482s)
 start: Sat Mar 5 11:43:25 2016
 age: 84191 seconds (23 hours 23 minutes 11 seconds)
 t_state: TS_SLEEP
 t_flag: 0x1000 (T_LWPREUSE)
 t_proc_flag: 0x104 (TP_TWAIT|TP_MSACCT)
 t_schedflag: 0x8003 (TS_LOAD|TS_DONT_SWAP|TS_WKLD_PERM)
 t_acflag: 3 (TA_NO_PROCESS_LOCK|TA_BATCH_TICKS)
 p_flag: 0x4a004000 (SEXECED|SMSACCT|SAUTOLPG|SMSFORK)

 pc: genunix:cv_wait+0x3c: call unix:swtch

 void genunix:cv_wait+0x3c((kcondvar_t *), (kmutex_t *)0x2a10b7bf5d0)
 int genunix:vmtask_run_xjob+0x1f0((ulong_t), (ulong_t), (vmtask_func_t),
 (void *), (vmtask_undo_t *), (vmtask_ops_t *))
 genunix:vmtask_run_job((ulong_t), (ulong_t)1, (vmtask_func_t)0x129c608, (void
 *)0x2a10b7bf6c8, (vmtask_undo_t *)0) - frame recycled
 void genunix:anon_free_pages+0x44((struct anon_hdr *)0xc4010bf4a6f0,
 (ulong_t)0, (size_t)0x70000, (uint_t)1)
 void genunix:segvn_free+0x90((struct seg *)0xc40134bd1b38)
 void genunix:seg_free+0x28((struct seg *)0xc40134bd1b38)
 int genunix:segvn_unmap+0x334((struct seg *)0xc40134bd1b38,
 (caddr_t)0xffffffff7cb70000, (size_t)0x70000)
 int genunix:as_unmap+0x1e4((struct as *)0xc400be67f2f0,
 (caddr_t)0xffffffff7cb70000, (size_t)0x80000)
 int genunix:munmap+0x50((caddr_t), (size_t)0x80000)
 unix:_syscall_no_proc_exit+0x58()
 -- switch to user thread's user stack --
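
If a kernel crash dump is available, the hung thread can be located directly with mdb. A minimal sketch, assuming savecore has already expanded the dump into unix.0/vmcore.0 (the ::pgrep, ::walk, and ::findstack dcmds are standard mdb; the asm_xdmg process name pattern is taken from the core above):

 # Open the expanded crash dump (savecore -f vmdump.0 produces these files)
 mdb unix.0 vmcore.0

 # Find the XDMG process, walk its threads, and print each kernel stack
 > ::pgrep asm_xdmg | ::walk thread | ::findstack -v

A thread blocked in genunix:cv_wait under vmtask_run_xjob during munmap(2), as shown above, matches this issue.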

 

The ASM XDMG trace files will show termination due to fenced I/O:

 

Reconnect: Attempts: 1 Last TS: 23486775047 Last Use TS: 22936303758 ctime: 22936303759 is_idle: 0 has_open_disks: Yes
2016-03-05 20:50:06.925218 : Exadata operations failed at oss_open failed  with err: I/O request fenced (221)
error 15311 detected in background process
OPIRIP: Uncaught error 447. Error stack:
ORA-447: fatal error in background process
ORA-15311: process terminated due to fenced I/O
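
On a running system, the same failure signature can be confirmed by searching the ASM trace files for the fenced-I/O errors. A minimal sketch, assuming a default diagnostic destination and instance name +ASM1 (both are illustrative; adjust for the local node):

 # List ASM trace files containing the fenced-I/O termination
 grep -l "ORA-15311" /u01/app/oracle/diag/asm/+asm/+ASM1/trace/*.trc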

Changes

None. The issue impacts only Oracle Database 12.1.0.2.

Cause

Bug 22882992 : PR_P_LOCK BEING HELD BY ASM_XDMG_+ASM1
 

Solution

Obtain a backport for Bug 22882992 : PR_P_LOCK BEING HELD BY ASM_XDMG_+ASM1. The fix is not yet included in any PSU. This patch should be considered mandatory.
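
Whether the backport is already in place can be verified against the Oracle home inventory. A minimal sketch, assuming OPatch under the Grid Infrastructure home and that the interim patch carries the bug number (the delivered patch ID may differ):

 # Check the inventory for the fix for Bug 22882992
 $ORACLE_HOME/OPatch/opatch lsinventory | grep 22882992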

 

The following DTrace script can also help confirm the cause, although it should not be necessary given that the patch is mandatory.

1) Create the DTrace script dtrace.sh:

 #!/usr/sbin/dtrace -s

 /* Print a user stack each time the traced process calls munmap(2) */
 syscall::munmap:entry
 /pid == $target/
 {
         ustack(32, 1024);
 }

 Then make it executable:

 chmod +x dtrace.sh

2) Find the XDMG PID after the ASM instance starts, then run the script against it:

 ./dtrace.sh -p <xdmg pid> > dtrace.out &
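
The XDMG PID can be obtained with pgrep once the ASM instance is up. For example (the asm_xdmg pattern matches the background process name seen in the core above; this assumes a single ASM instance on the node):

 # Find the XDMG background process and attach the script to it
 ./dtrace.sh -p $(pgrep -f asm_xdmg) > dtrace.out &

The resulting dtrace.out records a user stack for each munmap(2) call XDMG makes, which is the code path seen blocked in the kernel thread above.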

 

 

References

<NOTE:1452277.1> - SuperCluster Critical Issues

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.