Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2166445.1
Update Date: 2016-07-28

Solution Type: Problem Resolution Sure Solution

Solution 2166445.1: SuperCluster: 12.1.0.2 ASM XDMG process causes hang in PR_P_LOCK on ORA-15311: process terminated due to fenced I/O


Related Items
  • Solaris Operating System
  • Oracle SuperCluster T5-8 Full Rack
  • SPARC SuperCluster T4-4 Full Rack
  • Oracle SuperCluster M7 Hardware
  • Oracle Database - Enterprise Edition
  • Oracle SuperCluster T5-8 Half Rack
  • SPARC SuperCluster T4-4 Half Rack
  • Oracle SuperCluster M6-32 Hardware
  • SPARC SuperCluster T4-4
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>SPARC SuperCluster>DB: SuperCluster_EST
  • _Old GCS Categories>ST>Server>Engineered Systems>SPARC SuperCluster>Install




In this Document
Symptoms
Changes
Cause
Solution
References


Applies to:

Oracle SuperCluster M6-32 Hardware - Version All Versions and later
Solaris SPARC Operating System - Version 11.1 to 11.2 [Release 11.0]
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
SPARC SuperCluster T4-4 - Version All Versions and later
Oracle SuperCluster M7 Hardware - Version All Versions and later
Oracle Solaris on SPARC (64-bit)
A zone can become stuck in the shutting-down state because XDMG exits abnormally while still holding a lock; the global zone (LDom) may also hang.

Symptoms

Solaris zones and/or Logical Domains can hang. A core file obtained from the affected system will show:

==== user (LWP_SYS) thread: 0xc4010c3c1700 PID: 29194 ====
 cmd: asm_xdmg_+ASM6
 fmri: lrc:/etc/rc3_d/S96ohasd
 t_wchan: 0x2a10b7bf5d8 sobj: condition var (from
 genunix:vmtask_run_xjob+0x1f0)
 t_procp: 0xc4013eb1e010
 p_as: 0xc400be67f2f0 size: 351559680 RSS: 326246400
 a_hat: 0xc400be265000
 cnum: CPU48:4/46021
 cpusran: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63
 p_zone: 0x106affd8 (global)
 t_stk: 0x2a10b7bfad0 sp: 0x2a10b7bec51 t_stkbase: 0x2a10b7b8000
 t_pri: 60 (TS) pctcpu: 0.000000
 t_transience: 0 t_wkld_flags: 0
 t_lwp: 0xc4010466b2a0 t_tid: 1
 machpcb: 0x2a10b7bfad0
 lwp_ap: 0x2a10b7bfbc0
 t_mstate: LMS_SLEEP ms_prev: LMS_SYSTEM
 ms_state_start: 16 hours 8 minutes 30.525219929 seconds earlier
 ms_start: 23 hours 23 minutes 6.660387809 seconds earlier
 t_cpupart: 0x1041fa20(0) last CPU: 62
 idle: 58110525212482 nsec (16h8m30.525212482s)
 start: Sat Mar 5 11:43:25 2016
 age: 84191 seconds (23 hours 23 minutes 11 seconds)
 t_state: TS_SLEEP
 t_flag: 0x1000 (T_LWPREUSE)
 t_proc_flag: 0x104 (TP_TWAIT|TP_MSACCT)
 t_schedflag: 0x8003 (TS_LOAD|TS_DONT_SWAP|TS_WKLD_PERM)
 t_acflag: 3 (TA_NO_PROCESS_LOCK|TA_BATCH_TICKS)
 p_flag: 0x4a004000 (SEXECED|SMSACCT|SAUTOLPG|SMSFORK)

 pc: genunix:cv_wait+0x3c: call unix:swtch

 void genunix:cv_wait+0x3c((kcondvar_t *), (kmutex_t *)0x2a10b7bf5d0)
 int genunix:vmtask_run_xjob+0x1f0((ulong_t), (ulong_t), (vmtask_func_t),
 (void *), (vmtask_undo_t *), (vmtask_ops_t *))
 genunix:vmtask_run_job((ulong_t), (ulong_t)1, (vmtask_func_t)0x129c608, (void
 *)0x2a10b7bf6c8, (vmtask_undo_t *)0) - frame recycled
 void genunix:anon_free_pages+0x44((struct anon_hdr *)0xc4010bf4a6f0,
 (ulong_t)0, (size_t)0x70000, (uint_t)1)
 void genunix:segvn_free+0x90((struct seg *)0xc40134bd1b38)
 void genunix:seg_free+0x28((struct seg *)0xc40134bd1b38)
 int genunix:segvn_unmap+0x334((struct seg *)0xc40134bd1b38,
 (caddr_t)0xffffffff7cb70000, (size_t)0x70000)
 int genunix:as_unmap+0x1e4((struct as *)0xc400be67f2f0,
 (caddr_t)0xffffffff7cb70000, (size_t)0x80000)
 int genunix:munmap+0x50((caddr_t), (size_t)0x80000)
 unix:_syscall_no_proc_exit+0x58()
 -- switch to user thread's user stack --
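
If a kernel crash dump is available, the hung thread can be located directly with mdb. A minimal sketch, assuming savecore has already expanded the dump into unix.0/vmcore.0 (the ::pgrep, ::walk, and ::findstack dcmds are standard mdb; the asm_xdmg process name pattern is taken from the core above):

 # Open the expanded crash dump (savecore -f vmdump.0 produces these files)
 mdb unix.0 vmcore.0

 # Find the XDMG process, walk its threads, and print each kernel stack
 > ::pgrep asm_xdmg | ::walk thread | ::findstack -v

A thread blocked in genunix:cv_wait under vmtask_run_xjob during munmap(2), as shown above, matches this issue.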

 

The ASM XDMG trace files will show termination due to fenced I/O:

 

Reconnect: Attempts: 1 Last TS: 23486775047 Last Use TS: 22936303758 ctime: 22936303759 is_idle: 0 has_open_disks: Yes
2016-03-05 20:50:06.925218 : Exadata operations failed at oss_open failed  with err: I/O request fenced (221)
error 15311 detected in background process
OPIRIP: Uncaught error 447. Error stack:
ORA-447: fatal error in background process
ORA-15311: process terminated due to fenced I/O
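
On a running system, the same failure signature can be confirmed by searching the ASM trace files for the fenced-I/O errors. A minimal sketch, assuming a default diagnostic destination and instance name +ASM1 (both are illustrative; adjust for the local node):

 # List ASM trace files containing the fenced-I/O termination
 grep -l "ORA-15311" /u01/app/oracle/diag/asm/+asm/+ASM1/trace/*.trc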

Changes

None. The issue impacts only Oracle Database 12.1.0.2.

Cause

Bug 22882992 : PR_P_LOCK BEING HELD BY ASM_XDMG_+ASM1
 

Solution

Obtain a backport for Bug 22882992 : PR_P_LOCK BEING HELD BY ASM_XDMG_+ASM1. The fix is not yet included in any PSU. This patch should be considered mandatory.
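
Whether the backport is already in place can be verified against the Oracle home inventory. A minimal sketch, assuming OPatch under the Grid Infrastructure home and that the interim patch carries the bug number (the delivered patch ID may differ):

 # Check the inventory for the fix for Bug 22882992
 $ORACLE_HOME/OPatch/opatch lsinventory | grep 22882992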

 

The following DTrace script can also help confirm the cause, although it should not be necessary given that the patch is mandatory.

1) Create the DTrace script dtrace.sh:

 #!/usr/sbin/dtrace -s

 /* Print a user stack each time the traced process calls munmap(2) */
 syscall::munmap:entry
 /pid == $target/
 {
         ustack(32, 1024);
 }

 Then make it executable:

 chmod +x dtrace.sh

2) Find the XDMG PID after the ASM instance starts, then run the script against it:

 ./dtrace.sh -p <xdmg pid> > dtrace.out &
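
The XDMG PID can be obtained with pgrep once the ASM instance is up. For example (the asm_xdmg pattern matches the background process name seen in the core above; this assumes a single ASM instance on the node):

 # Find the XDMG background process and attach the script to it
 ./dtrace.sh -p $(pgrep -f asm_xdmg) > dtrace.out &

The resulting dtrace.out records a user stack for each munmap(2) call XDMG makes, which is the code path seen blocked in the kernel thread above.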

 

 

References

<NOTE:1452277.1> - SuperCluster Critical Issues

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.