
Asset ID: 1-72-2120400.1
Update Date:2018-02-15
Keywords:

Solution Type: Problem Resolution (Sure Solution)

Solution 2120400.1: Flash Disk Group and Cache Performance Problems in a Clustered Configuration Including ODA X5-2; Symptoms: Very High DBWR CPU, DB FLASH Cache Waits (User I/O); Buffer Cache / Busy Waits (Concurrency); enq: TX - Row Lock Contention (Concurrency)


Related Items
  • Oracle Database - Enterprise Edition
  • Oracle Database Appliance X5-2
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST


Storing data on Flash disks or using the Flash Cache in a clustered configuration can result in poor performance, up to and including locks or hangs such as "library cache load lock", "write complete waits", and "latch: cache buffers lru chain" during operations that use FLASH. The Oracle Database Appliance X5-2, which is a 2-node cluster configuration, by default enables remote access to the FLASH disk group, which can be used for storage or for flash cache functionality.
A workaround that may confirm you are hitting this issue is to disable the flash cache using ALTER SYSTEM SET db_flash_cache_size = 0 (see documentation for usage).
While this issue is normally seen on ODA implementations, it can occur on any platform that accepts and uses Flash Cache; a quick check of the current setting is sketched below.
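A minimal sketch, assuming SQL*Plus access as a DBA user, for confirming whether Database Smart Flash Cache is currently enabled before testing the workaround:

      -- A non-zero db_flash_cache_size means the flash cache is active on this instance
      SHOW PARAMETER db_flash_cache_size
      SHOW PARAMETER db_flash_cache_file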

Applies to:

Oracle Database Appliance X5-2 - Version All Versions to All Versions [Release All Releases]
Oracle Database - Enterprise Edition - Version 11.2.0.1 to 12.2.0.1 [Release 11.2 to 12.2]
Information in this document applies to any platform.
ODA, performance problems, CPU 100%, Hang, FLASH, ODI TRUNCATE, LOCK, RAC, DBWR, SPIN, CHKPT, outage

Symptoms

This note discusses two independent forms of FLASH usage:

  • FLASH CACHE: Database Smart Flash Cache, a database feature available in Oracle Database 11g Release 2 and later; in-memory database functionality also uses the flash cache.
  • FLASH DISK GROUP: The ODA X5-2 introduced a hybrid HDD and SSD topology.
       The FLASH disk group on the ODA X5-2 is built on four shared NVMe drives which are accessible on the remote node as well as the local node.

 

 

Symptoms 

This note discusses a collection of symptoms that have a common source:
     Database Flash Cache enabled on two or more nodes.
The FLASH / buffer cache symptoms discussed share one factor: disabling the FLASH CACHE works around the problem.
Storing data in the FLASH DISK GROUP on the ODA X5-2 can produce similar symptoms even if FLASH CACHE is not explicitly enabled.

Using Database Flash Cache does not mean you will hit this problem with your usage.
There are potentially two or more bugs which have similar symptoms but unique fixes; all share the prerequisites of:

  • Flash Cache enabled, or storage on shared FLASH disks
  • Two or more nodes ¹

This can occur on any platform that uses Flash Cache on more than the local node.
When using the DB Smart Flash Cache feature on the ODA X5-2, the remote FLASH CACHE is automatically accessible to the opposing node.
Otherwise, this problem typically requires a RAC configuration with the database flash cache enabled and accessible on two or more nodes (a quick check is sketched below).
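One way to confirm the prerequisite that the flash cache is enabled on more than one instance (a minimal sketch, run from any cluster node with SQL*Plus):

      -- List the flash cache size for every instance in the cluster;
      -- more than one non-zero value meets the multi-node prerequisite
      SELECT inst_id, name, value
        FROM gv$parameter
       WHERE name = 'db_flash_cache_size';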

 

(Doc ID 2120400.1). Formerly titled: ODA X5-2: Lock, Hang for Checkpoint (CKPT) or DBwriter (DBWR), 'GC CURRENT REQUEST' or Intermittent Bad Performance, High CPU and/or Very Poor IO

* The Oracle Database Appliance (ODA) X5-2 by default automatically enables remote access to the FLASH DISK GROUP on both nodes if the cache size is non-zero (<> 0),
    regardless of the instance type: even when using Enterprise Edition (EE), which is a single-instance database.

   Please upgrade to the newest version of the ODA if using FLASH on the X5-2. There are also several single patches and merges available or in the process of being created.

 
Note:     The Oracle Database Appliance X5-2 introduces four additional 400 GB SSDs in slot numbers 16-19 that can be used to host database files, or they can be used as a database flash cache in addition to the buffer cache.

An ASM disk group named +FLASH with Normal Redundancy is provisioned on these SSDs. All of the storage in the +FLASH disk group is allocated to an ASM Dynamic Volume (flashdata) and formatted as an ACFS file system.
Storage in this flashdata file system is then made available as an ACFS file system and is used to create database flash cache files that accelerate read operations.

     The file that contains the flash cache is automatically created for each database and is specified using the database init.ora parameter db_flash_cache_file.
By default, the flash cache size (db_flash_cache_size) is set to 3 times the size of the SGA, up to 196 GB, unless there is not enough space, in which case it is set to 0.
Changing the flash cache size requires restarting the database in order to use the newly sized flash cache.
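As a quick check on an ODA database (a minimal sketch; the exact file path under the flashdata ACFS file system varies per deployment), the current flash cache file and size can be inspected with:

      -- Show the flash cache file and size configured for this instance
      -- (on the ODA X5-2 the file normally resides on the flashdata ACFS file system)
      SELECT name, value
        FROM v$parameter
       WHERE name IN ('db_flash_cache_file', 'db_flash_cache_size');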

Rediscovery

A RAC or clustered* configuration experiences very poor performance², up to and including complete hangs.


Pre-requisites

db_flash_cache_size <> 0
ODA X5-2 or a RAC / cluster configuration
Database flash cache enabled on two or more nodes,
or remote storage enabled in the FLASH disk group


Avoids symptoms

Disable the database flash cache

SYMPTOMS

High CPU 

  • DBWR at 90% up to 100%
  • DBWR for parallel slave processes
  • CKPT sessions waiting between nodes on the DBWR process
  • Very high CPU elapsed time for SQL

Flash metrics confirm active usage during slow performance time windows.
NOTE - The existence of these metrics does not by itself indicate a problem in progress.
          The following AWR statistics will confirm flash cache usage:

  • flash cache inserts       
  • flash cache eviction: invalidated
  • flash cache insert skip: DBWR overloaded ***
  • flash cache insert skip: not current
  • flash cache insert skip: not useful

    Also - the following statistic can grow substantially during the problem period:
  • free buffer inspected ***

     These may be indicators of the problem, but DBWR CPU is usually an easier metric to monitor at this point (a query sketch for these statistics follows).
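A minimal sketch, assuming SQL*Plus access, for pulling the statistics named above directly from v$sysstat (compare values captured a few minutes apart during the slow window):

      -- Flash cache and buffer inspection statistics referenced above
      SELECT name, value
        FROM v$sysstat
       WHERE name LIKE 'flash cache%'
          OR name = 'free buffer inspected'
       ORDER BY name;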

RAC node instability after enabling FLASH on both nodes and restarting the instances,
- including general node instability with very high I/O times.

Other Waits (a query sketch follows this list)

  • "cache buffers lru chain" and "cache buffers chains"
  • GC Buffer, 'GC QUIESCE' or 'GC CURRENT REQUEST' waits
  • 'ENQ: RO - FAST OBJECT REUSE'
  • 'ENQ: TM - CONTENTION'
  • "library cache load lock"
  • "write complete waits"
  • "latch: cache buffers lru chain"

 ¹ Note that the ODA X5-2 can implicitly use the FLASH on both nodes for a single-instance database as well as for RAC [One Node].
 ² A similar symptom of slower RAC performance may be seen that is not related to the flash cache problems discussed in this note:

    - RAC PERF: CR TEST FOREGROUNDS NOW USE INDIRECT SENDS CAUSING 25% LESS THROUGHPUT (bug 22894949)

Bug 22894949 has fixes / PSEs for multiple platforms and versions.


 

Changes

Pre-Requisites

  • Using a clustered or RAC configuration *
  • Flash cache is enabled on both nodes
  • RDBMS 11.2.0.3.x, 11.2.0.4.x, and higher, including 12.1.2.x **

This problem is most frequently seen when moving from a non-RAC configuration to the ODA X5-2.
                         This problem is not seen on V1, X3-2 or X4-2 ODAs because they do not use Flash Cache.

* Note that the ODA can implicitly use the FLASH on both nodes for RAC or RAC One Node.
** Not confirmed or tested against 10.x

NOTE:   While this document was originally written for the ODA X5-2, the bug is generic to multiple nodes + FLASH usage on RDBMS 11.2.0.3.x and higher, including 12.x.
Several RDBMS fixes now exist for this problem and others are in progress.
We recommend applying these single patches on your 12.x RDBMS versions for now; the fixes will be included in the next ODA Patch Bundle.

 

Cause

 Internal Bugs for the most common issues (May not be comprehensive or up to date):

 20048359 - PARALLEL QUERY CONSUMES CPU TIME ON RAC WITH DATABASE SMART FLASH CACHE
 22083366 - LNX64-112-CMT: CKPT BLOCKING SESSION DURING INDEX REBUILD USING FLASH CACHE

 21422580*   - DBW USING 100% CPU WITH ENQ: RO - FAST OBJECT REUSE DURING TRUNCATE OR DROP TABLE
 21794615** - LNX64-122-RAC-CDB:LMS HUNG ON "LATCH: GC ELEMENT',LMD ORA-600[KJMSCNDSCQ:TIMEOUT

Solution

Apply the most current Patch Bundle:

Then, as appropriate, apply merge or single-patch bug fixes for the following known issues:

Fixed on or by 12.1.2.9.0, released 12/7/2016:

 20048359 - PARALLEL QUERY CONSUMES CPU TIME ON RAC WITH DATABASE SMART FLASH CACHE
 22083366 - LNX64-112-CMT: CKPT BLOCKING SESSION DURING INDEX REBUILD USING FLASH CACHE

Patch Set Exceptions (PSEs) are available for the following versions:
    *21422580   - DBW USING 100% CPU WITH ENQ: RO - FAST OBJECT REUSE DURING TRUNCATE OR DROP TABLE
    *12.1.2.9.0 does not include the fix for bug 21422580


** 21794615 - LNX64-122-RAC-CDB:LMS HUNG ON "LATCH: GC ELEMENT',LMD ORA-600[KJMSCNDSCQ:TIMEOUT
     Fixed in 12.1.2.10, scheduled for early 2017, which includes patch set exceptions (one-off patches for individual RDBMS versions).

--

The fix is applied as an RDBMS bug fix; most current RDBMS versions will have fixes for the above bugs.
Merges either exist or are in progress for the above bugs as well.
If you believe you may be encountering this issue, try the workaround first to help confirm the diagnosis, then open an SR requesting patches or information on fixed versions.

Workaround: Disable Flash on one or both nodes

      ALTER SYSTEM set db_flash_cache_size = 0   (see documentation for usage)

DB_FLASH_CACHE_SIZE specifies the size of the Database Smart Flash Cache (flash cache).
This parameter may only be specified at instance startup.
You can dynamically change this parameter to 0 (disabling the flash cache) after the database is started.
You can re-enable flash cache by setting this parameter to the original value when the database was started.
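A minimal sketch of the workaround, assuming SQL*Plus access; the 64G value shown for re-enabling is illustrative only and should be replaced with the value the database was originally started with:

      -- Disable the Database Smart Flash Cache dynamically (workaround)
      ALTER SYSTEM SET db_flash_cache_size = 0;

      -- Re-enable later by restoring the original value
      -- (64G is a placeholder; use your original setting)
      ALTER SYSTEM SET db_flash_cache_size = 64G;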

Please refer to the docs.oracle.com for latest usage     http://docs.oracle.com/database/121/REFRN/GUID-9B2E54F0-F2C0-4C3A-88B7-C2F64F32376F.htm#REFRN10316

* Note that the ODA can implicitly use both nodes of the FLASH for RAC or RACone or EE.

Potential Workaround:

Some Flash Cache problems can be worked around by altering the Flash Cache mode in a cluster configuration.

The hidden parameter _GCS_CLUSTER_FLASH_CACHE_MODE changes how the database Flash Cache is handled on and between nodes in a cluster configuration.

The allowable settings of _gcs_cluster_flash_cache_mode are:

0 - Disable the cluster flash cache.
1 - Enable the cluster flash cache: the master instance will allow a non-master instance to read its flash cache buffers; a non-master instance will not use the flash cache to cache local buffers.
3 - Allow both master and non-master instances to use the flash cache. The master flash cache can be shared between master and non-master instances; a non-master flash cache can only be accessed by the local instance, not by a remote (or master) instance.


  _gcs_cluster_flash_cache_mode=3     -- regular flash cache behavior; should avoid several of the problem scenarios

  _gcs_cluster_flash_cache_mode=0     -- a second option; after testing with level 3 we also recommend testing level 0
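A minimal sketch of setting the mode, assuming this hidden parameter applies to your version; underscore parameters should only be changed under the direction of Oracle Support, and this sketch assumes a restart of the instances is acceptable:

      -- Set the cluster flash cache mode for all instances in the SPFILE,
      -- then restart the instances for the change to take effect
      ALTER SYSTEM SET "_gcs_cluster_flash_cache_mode" = 3 SCOPE=SPFILE SID='*';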

 

The above potential workaround was added at the request of bug 26197067 but has not been tested;
feedback is needed to confirm whether this is another workaround.

RELATED Bugs and Symptoms
===========


18628484 - TST&PERF:HIGH 'ENQ:RO - FAST OBJECT REUSE' WAIT TIME WHEN TRUNCATING TABLE ON RAC
Symptoms:
  • high 'ENQ: RO - FAST OBJECT REUSE' ;
  • blocking session is the process CKPT ; 
  • DBWR is the top process on one instance 

    "...We have seen two issues. One is DBWR not picking up the write permission granted to start the write for a while.  This issue is looking at by buffer cache.
      From DBW trace, PI buffers not getting written
     N/A -"...There is another issue from long latency with IPC. This is running RDS/IP on exadata. We have seen long delay occasionally for IPC messages from 4 seconds or more between two nodes.his is not DLM issue, but possible RDS driver or environment issue [Exadata] ..."

    Per checking v$session/v$session_blockers, we found this event occurs while truncate tables on instance 1, the
    blocking session is the process CKPT on instance 2, and the  process DBWR on instance 2 is the top one.

    Diagnostic:
    Do CKPT and DBWR trace using the parameter _obj_ckpt_tracing = 1 and _dbwr_tracing=26.
    The result is  *1_* is the trace of instance 1 and *2_* is trace of another node.
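A minimal sketch of enabling the tracing mentioned above; these are hidden parameters, so whether they can be set dynamically on your version is an assumption here, and they should only be set when Oracle Support asks for these traces:

      -- Enable CKPT object-checkpoint tracing and DBWR tracing on the affected instances
      -- (hidden parameters; set only at Oracle Support's direction, then reset to 0 when done)
      ALTER SYSTEM SET "_obj_ckpt_tracing" = 1;
      ALTER SYSTEM SET "_dbwr_tracing" = 26;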
19347458 - SESSIONS WAITING INDEFINITELY FOR 'GC CURRENT REQUEST'
Symptoms:
  • gc buffer busy acquire waiting on two sessions between two nodes
  • Hit cache fusion lock hang when flash cache is enabled.
  • Session hang as the request was dropped at master due to rapidly wrapped lock sequence.
  • Not consistently reproducible; in previous tests, it took up to 10 hours with the workload running continuously for the sessions to hang.

SQL> select inst_id, sid, event, blocking_instance, blocking_session, seconds_in_wait, p1, p2, p3
       from gv$session
      where seconds_in_wait > 0
        and event like 'gc %';

See both nodes GC waits

Requires
cluster_database=true
Flash enabled

19565714 - DROPPING A TABLE STORED IN FLASH CACHE TAKES TOO LONG, WITH DBWR AT 100% CPU -- Related to bug 22083366 --- included in DBPSU 12.1.0.2.161018 and DBBP 12.1.0.2.161018
Symptoms:

DIAGNOSTIC ANALYSIS:
--------------------
DBW processes were near 100% CPU:

PID   USER PR NI VIRT RES SHR S   %CPU  %MEM  TIME+ COMMAND

4981 oracle 20 0  18.4g 33m 21m R  100.0   0.0  172:09.84 ora_dbw1_tst1
4998 oracle 20 0  18.4g 34m 23m R  100.0   0.0  172:08.60 ora_dbw4_tst1
4976 oracle 20 0  18.4g 36m 24m R    99.8   0.0  172:15.84 ora_dbw0_tst1
4985 oracle 20 0  18.4g 33m 21m R    99.8   0.0  172:11.11 ora_dbw2_tst1
...
...
Short stacks from DBW processes:

ksedsts()+647<-ksdxfstk()+58<-ksdxcb()+901<-sspuser()+227<-__sighandler()<-kcbo_service_ockpt()+1265<-kcbbdrv()+8841<-ksbabs()+539<-ksbrdp()+1209
<-opirip()+1475<-opidrv()+610<-sou2o()+145<-opimai_real()+270<-ssthrdmain()+412<-main()+236<-__libc_start_main()+244


10046 trace files show DBWR processes making very little progress:

*** 2014-09-02 12:08:16.369
WAIT #0: nam='db file parallel write' ela= 6125 requests=1 interrupt=0
timeout=2147483647 obj#=-1 tim=3158416120020
WAIT #0: nam='db file async I/O submit' ela= 2 requests=1 interrupt=0
timeout=0 obj#=-1 tim=3158416151066
WAIT #0: nam='db file parallel write' ela= 4849 requests=1 interrupt=0
timeout=2147483647 obj#=-1 tim=3158416155959

    "...Tried the same test with different configurations:( and the problem does not reproduce )
     - Single instance, cluster_database=FALSE
     - RAC 2 nodes, workload on node 1, node 2 idle
...
      In both cases, the drop statement completed in less than 30 seconds vs. 26+ hours.
      DBWR processes were at 100% CPU utilization during that time..."

Requires
Cluster configuration
Flash enabled

20048359 - Poor RAC performance and excessive CPU usage using database flash cache ---included in DBPSU 12.1.0.2.5 DBBP 12.1.0.2.12
Symptoms:

'free buffer inspected' increases dramatically -- this statistic represents the number of buffers examined for reuse that could not be reused.

CPU time accounts for 85% of the overall execution time.
The CPU time is used by parallel slave processes.

Under the query coordinator (QC), 12 parallel slave processes are running.
These processes repeatedly get and release the latches "cache buffers lru chain" and "cache buffers chains".

"...DBWR cannot keep up with foreground requests to move SCURRENT buffers to flash cache..."

Requires
Cluster Configuration
Flash enabled


20229502 -WAITS FOR 'GC QUIESCE' AND 'ENQ: TM - CONTENTION' WITH FLASH CACHE
Symptoms:

The "gc quiesce" wait is high with high flash cache activities and occasional high wait on TM enqueue blocked by flash cache wait.

 
Top wait events:

Event                               Waits   Total Wait Time (s)   Avg Wait (ms)   % DB time   Wait Class
------------------------------  ---------  --------------------  --------------  ----------  -------------
db file sequential read           691,990               4,717.9            6.82        55.6   User I/O
enq: TM - contention                   16               2,748.1      171,758.77        32.4   Application
db flash cache single block     1,526,730                 481.6            0.32         5.7   User I/O
DB CPU                                                    428.8                         5.1
gc quiesce                         19,303                 153.1            7.93         1.8   Cluster
gc cr grant 2-way                 284,869                    64            0.22          .8   Cluster
gc cr grant cluster flash cach    268,546                    45            0.17          .5   Cluster
free buffer waits                     772                   5.8            7.55          .1   Configuration
 

  

DIAGNOSTIC ANALYSIS:
--------------------
The waits for 'enq: TM - contention' were caused by sessions waiting for long periods on 'gc current request'.

Requires
Cluster Configuration:
Flash enabled
 

21137784 - DBW0 SPINS IF IT CAN NOT SHRINK AN LE TO FLASH LOCK ---- Fix pulled in favor of 21794615
Symptoms:
  • DBWR might spin with flash cache in RAC
  • dbw0 spins if it can not shrink an LE to Flash lock.

*** 2015-05-22T13:42:59.198974+17:00
KCL: L83: cannot shrink: not master
GLOBAL CACHE ELEMENT DUMP (address: 0x61f62130):
id1: 0xae8e id2: 0x1 pkey: OBJ#0,0,18 block: (1/44686)
--
*** 2015-05-22T13:43:00.620439+17:00
KCL: L83: cannot shrink: not master
GLOBAL CACHE ELEMENT DUMP (address: 0x61f62130):
id1: 0xae8e id2: 0x1 pkey: OBJ#0,0,18 block: (1/44686)
--
*** 2015-05-22T13:43:02.437729+17:00
KCL: L83: cannot shrink: not master
GLOBAL CACHE ELEMENT DUMP (address: 0x61f62130):
id1: 0xae8e id2: 0x1 pkey: OBJ#0,0,18 block: (1/44686)

Requires
Cluster Configuration
Flash enabled
 

21422580 - DBW USING 100% CPU WITH ENQ: RO - FAST OBJECT REUSE DURING TRUNCATE OR DROP TABLE

Symptoms:
  • Continuously consuming close to 100% DBW background processes.
Linux OSWbb
zzz .....
top -...

PID     USER  PR NI VIRT RES SHR  S %CPU %MEM TIME+ COMMAND

34745 oracle 20 0 2415m 33m 23m R   99.8  0.0 8179:44 ora_dbw0_...  <<<<<<<<<<<
34749 oracle 20 0 2413m 31m 21m R   98.8  0.0 8181:48 ora_dbw1_... <<<<<<<<<<<


Both of the DBW processes are consuming 100% CPU all the time.
.
##############################################################################
##
Procwatcher Top CPU Consumers Report
##############################################################################
##
Thu Jul 23 08:30:52 CEST 2015: Procwatcher Top CPU Consumers:
PID     USER  PR NI VIRT RES SHR S   %CPU %MEM TIME+ COMMAND
67906 oracle 20 0 2422m 38m 23m R 99.9 0.0 39181:20 ora_dbw0_
67910 oracle 20 0 2421m 37m 21m R 97.9 0.0 42113:02 ora_dbw1_
##############################################################################
##
Thu Jul 23 08:31:31 CEST 2015: Procwatcher Top CPU Consumers:
PID     USER PR NI VIRT RES SHR S   %CPU %MEM TIME+ COMMAND
67906 oracle 20 0 2422m 38m 23m R 100.0 0.0 39181:59 ora_dbw0_
67910 oracle 20 0 2421m 37m 21m R   99.8 0.0 42113:41 ora_dbw1_

The AWR reports indicate there is hardly any workload in the database.
The physical read and write activity was very low.

.
ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-_IO_vfprintf()+3355<-vsnprintf()+154<-00007FFC0F908735
 .
ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-lseek()+16<-sdbgrfsf_seek_file()+37<-dbgtfdFileWrite()+2339<-dbgtfdFileAccessCbk(+357 <-dbgtfPutStr()+592 <-dbktPri()+149<-ksdwrf()+551 <-ksdwrfn()+187<-kcbbdrv()+22416<-ksbabs()+874<-ksbrdp()+1068<-opirip()+1488<-opidrv()+616<-sou2o()+145<-opimai_real()+270<-ssthrdmain()+412<-main()+236<-__libc_start_main()+244
 .
ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-write()+16<-sdbgrfwf_write_file()+64<-dbgtfdFileWrite()+674<-dbgtfdFileAccessCbk()+357<-dbgtfPutStr()+592<-dbktPri()+149<-ksdwrf()+551
<-ksdwrfn()+187<-kcbbdrv()+12541<-ksbabs()+874<-ksbrdp()+1068<-opirip()+1488<-opidrv()+616<-sou2o()+145<-opimai_real()+270<-ssthrdmain()+412<-main()+236<-__libc_start_main()+244
----------
.
From the wait chain data, we can see the checkpoint process on both instances blocking sessions from acquiring the RO enqueue.
.
Eg:
.
----------
PROC 21181 : Current Process: 21181 SID: 482 SER#: 13637 INST osabc
INST #: 1
PROC 21181 : Blocking Process: 37002 from Instance 1 Number of
waiters: 0
PROC 21181 : Final Blocking Process: 37002 from Instance 1 Program:
oracle@oda1base1 (CKPT) <<<<

ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-kcbbcwr()+134<-kcbbdrv()+5078<-ksbabs()+874<-ksbrdp()+1068<-opirip()+1488<-opidrv
()+616<-sou2o()+145<-opimai_real()+270<-ssthrdmain()+412<-main()+236<-__libc_start_main()+244

Requires
Cluster Configuration
Flash enabled
21540885 - DBW HIGH CPU USAGE WITH ENQ: RO - FAST OBJECT REUSE  --- dup of 21422580
Symptoms:
  • dbwrs busy with writes
  • dbwr high cpu (top)

"...From traces, during reported time dbwrs seem busy with writes they don't look stuck or spinning. T
here are many buffers which are not getting written on first request due to fusion write permission issue on PI buffers.
-
  KCBB: going to write
  KCBB: rwrite status 1
  KCBB: write permission pending, requests 6, last successful write 0 seconds ago
..
Also, during the next occurrence, could you please ask the customer to get the output of the following queries?

Diagnostic

   1. select ckpt_priority, ckpt_flags, count(*)
      from x$activeckpt
      where ckpt_type=2
      group by ckpt_priority, ckpt_flags;

   2. select * from x$kcbwds;

   3. select * from x$kcbbes;   (every 20 seconds, about 15 times)


zzz ***Tue ....

top - 13:36:02 up 161 days, 16:00, 3 users, load average: 4.46, 4.40, 4.36
Tasks: 1056 total, 5 running, 1051 sleeping, 0 stopped, 0 zombie
Cpu(s): 35.9%us, 17.0%sy, 0.0%ni, 46.4%id, 0.7%wa, 0.0%hi, 0.0%si,0.0%st
Mem: 264286692k total, 203135524k used, 61151168k free, 1928424k buffers
Swap: 25165820k total, 11099360k used, 14066460k free, 46073292k cached
.
PID      USER PR NI  VIRT    RES  SHR  S %CPU %MEM TIME+ COMMAND
20470 oracle 20  0  2420m  46m  27m  R   99.7     0.0  4:25.34 ora_dbw1_....

Bug 22848234 has been marked as a potential duplicate of this bug

Requires
Cluster configuration
Flash enabled
21794615 - LNX64-122-RAC-CDB:LMS HUNG ON "LATCH: GC ELEMENT',LMD ORA-600[KJMSCNDSCQ:TIMEOUT

...stress testing an 8-node database...

Symptoms:

Instance LMS processes stuck on 'latch: gc element'.

  • LMON might crash the instance during normal shutdown.
  • Instance crashed.
  • lmd0 hit ora-600[kjmscndscq:timeout]
  • 'latch: gc element'
  • 'latch: cache buffers chains'
  • LMS (ospid: ###) has not moved for ## sec
Alert_test3.log
...
  LMS2 (ospid: 8096) waits for event 'latch: gc element' for 89 secs.
  LMS2 (ospid: 8096) waits for           latch 'gc element' for 89 secs.
  LMS7 (ospid: 8116) waits for event 'latch: cache buffers chains' for 243 secs.
  LMS7 (ospid: 8116) waits for latch 'cache buffers chains' for 243 secs.
  LMSA (ospid: 8128) waits for event 'latch: cache buffers chains' for 400 secs.
  LMSA (ospid: 8128) waits for latch 'cache buffers chains' for 400 secs.


test3_lmd0_8765.trc
...
(incident=12345) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kjmscndscq:timeout], [480], [240], [0], [0], [4], [8224], [], [], [], [], []

Incident details in:
/ora12/app/gridbase/diag/rdbms/test/test3/incident/incdir_5631685/test3_lmd0_......trc
*** 20...:10:04.267967+17:00 (CDB$ROOT(1))

==============================
     LMS2 (ospid: 8096) has not moved for 90 sec (1441188604.1441188514)
     : heartbeat check status 6 (no-heartbeat) (threshold 70 sec)


DBW1 attempted to close a fusion lock with latch held.

Process DBW1 (ospid: ####): latch state and session wait history

Waiting For:
   0x6802ede10 Child 'gc element' (level=3, SGA latch, child#=19)
    Holder Location: kcl2.h LINE:4755 ID:kcl_le2flash:
Holder Context: 0
Waiter Location: kcl2.h LINE:4602 ID:kclantilock_1:
Latch State:
state=busy (exclusive) [value=0x100000000000002e]


STACK TRACE:
------------
Short stack dump:

ksedsts()+409<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+215<-__sighandler()<-semop()+7<-skgpwwait()+200<-kslgess()+2789<-
ksl_get_shared_latch_int()+3318<-kclantilock()+501<-kjblprmexp()+994<-kjbmprmexp()+1668<-kjmxmpm()+936<-kjmpbmsg()+6259<-
kjmsm()+26026<-ksbrdp()+1065<-opirip()+545<-opidrv()+594<-sou2o()+145<-opimai_real()+205<-ssthrdmain()+412<-main()+262<-__libc_start_main()+253

Also

Current Wait Stack:
1: waiting for 'gcs drm freeze in enter server mode'
...
ksedsts()+409<-ksdxfstk()+71<-ksdxcb()+912<-sspuser()+215<-__sighandler()<-semtimedop()+10<-skgpwwait()+200<-ksliwat()+2192<-kslwaitctx()+196<-
kjsdrmwt()+1172<-kjsmbesmi()+10365<-kjbclose()+3304<-kcl_le2flash()+2371<-kcbbic_l2()+4361<-kcbbiop_l2()+1157<-kcbbdrv()+8619<-ksbabs()+665<-
ksbrdp()+1065<-opirip()+545<-opidrv()+594<-sou2o()+145<-opimai_real()+205<-ssthrdmain()+412<-main()+262<-__libc_start_main()+253
...

Requires

Cluster configuration
Flash enabled



22083366 - LNX64-112-CMT: CKPT BLOCKING SESSION DURING INDEX REBUILD USING FLASH CACHE  --- Base Bug 19347458

Symptoms:
  • CKPT is blocking other sessions
  • "library cache load lock"
  • "write complete waits"
  • "latch: cache buffers lru chain"
  • Checkpoint hang which is blocked by flash buffers.
  • The customer can't reproduce the hang after disabling the flash cache.
    The hang appears to be independent of the size of the redo logs.
Requires
Cluster Configuration
Flash enabled
22084935 - TRUNCATE STATEMENT HANGING ON GC CURRENT REQUEST WAIT EVENT
Symptoms:
  • Truncate statement hanging on "gc current request"
  • Waiting for 'gc current request'
  • We had to restart the complete database every two days
  • Can't reproduce the hang after disabling the flash cache.

ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-__poll()+47<-ssskgxp_poll()+63<-sskgxp_selectex()+423<-skgxpiwait()+3894<-skgxpwaiti()+1900<-skgxpwait()+178<-
ksxpwait()+2585<-ksliwat()+9356<-kslwaitctx()+161<-ksxprcv_int()+6182<-ksxprcvimdwctx()+59<-ipc_wait_ctx()+359<-kcllkopb()+2133<-kclgrlk()+1574<-kcbzib()+31106<-kcbgcur()+22133<-
ktbgcur()+130<-kdifrb()+645<-kkbtis_trunc_idx_seg()+1333<-kkbindl()+745<-kkbtrndotbl()+571<-kkbtrndo1()+321<-kkbtrn()+2389<-opiexe()+23473<-opiosq0()+4559<

Requires
Cluster Configuration
Flash enabled
23614682 - TRUNCATE HUNG FOR 'GC CURRENT REQUEST' AFTER APPLYING 19347458  -- dup of bug 22083366
Symptoms:
  • waiting for 'gc current request'
  • A process hung waiting for 'gc current request'.
    Resource dump on the master instance showed it was blocked by a flash shadow.
  • Truncate Table
  • wait times: snap=1263 min 54 sec, exc=1263 min 54 sec, total=1263 min 54 sec <<<<<<<<<<
    wait times: max=infinite
  • request was blocked by a flash shadow on the master instance (instance 2)
  • GCS RESOURCE END
    -> GCS SHADOW END

Requires
Cluster configuration
Flash enabled
 

 

Bug 22894949 is added to this note because MERGE requests should also make sure that its fix is included.
Reason: the problems encountered are more likely on RAC and involve poor performance.

Bug 22894949 - RAC PERF: CR TEST FOREGROUNDS NOW USE INDIRECT SENDS CAUSING 25% LESS THROUGHPUT

ODA BASED - Here are more ODA X5-2 filed bugs which expand on the above similar bugs

Bug:20248397 - ODA - FLASH CACHE REMOTE SERVICE MISBEHAVE ON RAC DATABASE
Bug:20780960 - FOR ODA ONLY: WHEN ONE INSTANCE CRASHES ITS FLASH CACHE IS LOST
Bug:21171684 - FOR ODA ONLY: RESERVE MEMORY FOR A DEAD INSTANCE'S FLASH CACHE
Bug:21175460 - FOR ODA ONLY: RECOVER A DEAD INSTANCE'S FLASH CACHE 

OVERVIEW

 When using the DB Smart Flash Cache feature in ODA, the parameter '_cluster_flash_cache_slave_file' is automatically set to point to the remote instance's DB flash cache.
- Slightly understated: you do NOT need a second instance such as RAC to see this problem: THE ODA USES BOTH NODES' FLASH REGARDLESS OF DATABASE TYPE.

@ INTERNAL PROBLEM DESCRIPTION:

@ When using the DB Smart Flash Cache feature in ODA, instead of
@ manually setting parameter '_cluster_flash_cache_slave_file' on
@ each instance, it will now be automatically set to point to the
@ remote instance's DB flash cache.
@
@ This reduces the number of parameters the customer needs to set.

@ INTERNAL FIX DESCRIPTION:

@ In short, when 'ODA' is used, after both instances are up, a new
@ message ('KJX_CFC_SND') is sent to the remote instance. This message
@ contains the list of 'db_flash_cache_file' filenames. When the
@ remote instance receives this message, it calls function 'kspsetv0()'
@ to set parameter '_cluster_flash_cache_slave_file' with these values.
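A minimal sketch (assuming SYSDBA access, since X$ views are used) for checking whether the ODA has auto-populated this hidden parameter on an instance:

      -- Display the value of the hidden parameter _cluster_flash_cache_slave_file
      -- (hidden parameters are not visible through v$parameter by default)
      SELECT x.ksppinm  AS parameter_name,
             y.ksppstvl AS parameter_value
        FROM x$ksppi x, x$ksppcv y
       WHERE x.indx = y.indx
         AND x.ksppinm = '_cluster_flash_cache_slave_file';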


 

 

STATE of FLASH PATCH UPDATE  -- 12/8/2016 [CL]

Pending - to be fixed in 12.1.2.10

21794615 - LNX64-122-RAC-CDB:LMS HUNG ON "LATCH: GC ELEMENT',LMD ORA-600[KJMSCNDSCQ:TIMEOUT

  Refer to Bug Tree - https://bug.oraclecorp.com/pls/bug/webbug_reports.bugtreecom?bug_no_param=21794615

       - Fixed Ver: 12.1.0.2.170117DBBP < Not yet created

While several BLRs have been created, including the following, no actual patch has been created.
-- No confirmed fix is available for download, just requests and in-progress backports.

*** ARU aru-bug_us
@ *** (DEL: Patch Automation Attributes -> Status - APF-INPROGRESS)
@ *** (CHG: Patch Automation Attributes -> NULL->Status - APF-MERGE)

11/07/16 Fixed->12.1.0.2.160419DBPSU)   - yet no patch
09/12/16 Fixed->11.2.0.4.8DBPSU)            - yet no patch
09/19/16 Fixed->11.2.0.4.7DBPSU)            - yet no patch

 


 

 The following bugs are fixed as of 12.1.2.9.0, and most have various PSEs, but conflict-free status is not confirmed; check for patch conflicts.

20048359 - PARALLEL QUERY CONSUMES CPU TIME ON RAC WITH DATABASE SMART FLASH CACHE

Refer to Bug Tree    https://bug.oraclecorp.com/pls/bug/webbug_reports.bugtreecom?bug_no_param=20048359

- Fixed Ver: 12.1.0.2.5DBPSU - Should be inclusive from this version forward

20332536 - (O) - 93 - PSE FOR BASE BUG 20048359 ON TOP OF DATABASE PSU 12.1.0.2.2 FOR LINUX X86-64 [22
20774586 - (O) - 93 - PSE FOR BASE BUG 20048359 ON TOP OF DATABASE PSU 12.1.0.2.3 FOR LINUX X86-64 [22
...etc
Plus several PSEs for 11.2.0.4.x


21422580 - DBW USING 100% CPU WITH ENQ: RO - FAST OBJECT REUSE DURING TRUNCATE OR DROP TABLE

Refer to Bug Tree    https://bug.oraclecorp.com/pls/bug/webbug_reports.bugtreecom?bug_no_param=21422580

         - Fixed Ver: 12.1.0.2.12DBBP - Should be inclusive from this version forward


23743629 - (O) - 93 - PSE FOR BASE BUG 21422580 ON TOP OF 11.2.0.4.0 FOR LINUX X86-64 [226] (PSE #5260
23291459 - (O) - 93 - PSE FOR BASE BUG 21422580 ON TOP OF 12.1.0.2.0 FOR LINUX X86-64 [226] (PSE #5190
24621458 - (O) - 93 - PSE FOR BASE BUG 21422580 ON TOP OF DATABASE BP 12.1.0.2.160719 FOR LINUX X86-64
...etc


22083366 - LNX64-112-CMT: CKPT BLOCKING SESSION DURING INDEX REBUILD USING FLASH CACHE

         - Refer to Bug Tree https://bug.oraclecorp.com/pls/bug/webbug_reports.bugtreecom?bug_no_param=22083366

    - Fixed Ver: 12.1.0.2.160927FA-DBBP

24603992 - (Z) - 35 - CI BACKPORT OF BUG 22083366 FOR INCLUSION IN FA DATABASE BP 12.1.0.2.160927
24354097 - (Z) - 35 - CI BACKPORT OF BUG 22083366 FOR INCLUSION IN DATABASE BP 12.1.0.2.161018
24009671 - (Z) - 35 - CI BACKPORT OF BUG 22083366 FOR INCLUSION IN DATABASE PSU 12.1.0.2.161018
...etc

Bug / Note   Key symptom(s)                                         Abstract                                             Fixed
20229502     WAITS FOR 'GC QUIESCE', 'ENQ: TM - CONTENTION'
20248397
19565714     DBWR AT 100% CPU                                       Drop Table Slow
21171684
21794615     'LATCH: GC ELEMENT', ORA-600 [KJMSCNDSCQ:TIMEOUT]
20780960     INSTANCE CRASH
22083366     CKPT BLOCKING SESSION                                  INDEX REBUILD
22894949                                                            25% LESS THROUGHPUT
20048359                                                            PARALLEL QUERY
21175460
21967137     ENQ: RO - FAST OBJECT REUSE
22090426     ENQ: RO - FAST OBJECT REUSE CONTENTION                 DROP or TRUNCATE
22488837     'ENQ: RO - FAST OBJECT REUSE'                          TRUNCATE PARTITION HANG
22848234     'ENQ: RO - FAST OBJECT REUSE'                          TRUNCATE HANG
22848234     'ENQ: RO - FAST OBJECT REUSE' BLOCKED BY CKPT
23191293     'ENQ: RO - FAST OBJECT REUSE', DBW HIGH CPU
24325501     DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL HANG                 HANG
19849781     ORA-600 [KJBRCHKPKEYWAIT:TIMEOUT]
23614682     'GC CURRENT REQUEST'                                   AFTER APPLYING 19347458
20048359     Excessive CPU, Poor RAC performance
1981471.1                                                           Problems In a RAC Database Using Smart Flash Cache
2104793.1                                                           Truncate Table
2075831.1    Instance May Crash, ORA-600 [kjbrchkpkeywait:timeout]

 

 

 

 

References

<BUG:23191293> - DBW HIGH CPU USAGE WITH ENQ: RO - FAST OBJECT REUSE ODA 12.1.0.2.3
<BUG:24325501> - DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL HANGS
<BUG:19565714> - DROPPING A TABLE STORED IN FLASH CACHE TAKES TOO LONG, WITH DBWR AT 100% CPU
<BUG:21967137> - WAIT ON ENQ: RO - FAST OBJECT REUSE
<BUG:20229502> - WAITS FOR 'GC QUIESCE' AND 'ENQ: TM - CONTENTION' WITH FLASH CACHE
<BUG:21171684> - FOR ODA ONLY: RESERVE MEMORY FOR A DEAD INSTANCE'S FLASH CACHE
<BUG:21794615> - LNX64-122-RAC-CDB:LMS HUNG ON "LATCH: GC ELEMENT',LMD ORA-600[KJMSCNDSCQ:TIMEOUT
<BUG:20248397> - ODA - FLASH CACHE REMOTE SERVICE MISBEHAVE ON RAC DATABASE
<BUG:20780960> - FOR ODA ONLY: WHEN ONE INSTANCE CRASHES ITS FLASH CACHE IS LOST
<BUG:22083366> - LNX64-112-CMT: CKPT BLOCKING SESSION DURING INDEX REBUILD USING FLASH CACHE
<BUG:22090426> - ENQ: RO - FAST OBJECT REUSE CONTENTION DURING "DROP" OR "TRUNCATE" OPERATION.
<BUG:22894949> - RAC PERF: CR TEST FOREGROUNDS NOW USE INDIRECT SENDS CAUSING 25% LESS THROUGHPUT
<BUG:23519538> - MERGE REQUEST ON TOP OF DATABASE PSU 11.2.0.4.8 FOR BUGS 22083366 22894949
<BUG:20048359> - PARALLEL QUERY CONSUMES CPU TIME ON RAC WITH DATABASE SMART FLASH CACHE
<BUG:21175460> - FOR ODA ONLY: RECOVER A DEAD INSTANCE'S FLASH CACHE
<NOTE:2091770.1> - ODA (Oracle Database Appliance Machine ) Critical Issues
<BUG:22488837> - TRUNCATE PARTITION HANG WITH 'ENQ: RO - FAST OBJECT REUSE' FOR 259 MIN 18 SEC
<BUG:22848234> - TRUNCATE HANG WITH 'ENQ: RO - FAST OBJECT REUSE' BLOCKED BY CKPT
<NOTE:1981471.1> - Performance Problems In a RAC Database Using Smart Flash Cache
<BUG:20054368> - ODA: GLOBAL FLASH CACHE QUERY HANGS AFTER STARTING THE SECOND INSTANCE
<BUG:22120580> - ACFS MOUNTS HANG DURING CROSS PLATFORM RESTORE TO ACFS MOUNTS
<BUG:22174238> - STATFS/OFSGENSTATFS TAKES GBM DLM LOCK UNNECESSARILY, CAUSING LOCK CONTENTION
<BUG:22198176> - KSMALLOC CAN SPIN INDEFINITELY WITH KMALLOC FAILURE ON 3.8 CAUSING HANGS
<BUG:21675080> - TIMEOUT PANIC AS THINGS ARE REALLY SLOW - POSSIBLE DEADLOCK
<NOTE:2104793.1> - Truncate Table Hanging on "gc current request" Wait
<NOTE:2075831.1> - ODA X5-2: When Using DB Flash Cache on RAC, Instance May Crash With ORA-600 [kjbrchkpkeywait:timeout]
<BUG:19849781> - ORA-600 [KJBRCHKPKEYWAIT:TIMEOUT], WRONG PKEY/OBJD ON IOT
<BUG:23614682> - TRUNCATE HUNG FOR 'GC CURRENT REQUEST' AFTER APPLYING 19347458
<NOTE:20048359.8> - Bug 20048359 - Poor RAC performance and excessive CPU usage using database flash cache

Attachments
This solution has no attachment