Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

2120400.1 : Flash Disk Group and Cache Performance Problems in a Clustered Configuration Including ODA X5-2

Symptoms: Very High DBWR CPU; DB FLASH cache waits (User I/O); Buffer Cache / Busy Waits (Concurrency); enq: TX - row lock contention (Concurrency)
Storing data on flash disks or using FLASH Cache in a clustered configuration can lead to poor performance, up to and including locks or hangs such as "library cache load lock", "write complete waits" and "latch: cache buffers lru chain", during operations that use FLASH. The Oracle Database Appliance X5-2, which is a 2-node cluster configuration, by default enables remote access to the FLASH disk group, which can be used for storage or flash_cache functionality. A workaround that may confirm you are hitting this issue is to disable FLASH using ALTER SYSTEM set db_flash_cache_size = 0 (see documentation for usage). While this issue is normally seen on ODA implementations, it can occur on any platform that accepts and uses Flash Cache.

Applies to:
Oracle Database Appliance X5-2 - Version All Versions to All Versions [Release All Releases]
Oracle Database - Enterprise Edition - Version 11.2.0.1 to 12.2.0.1 [Release 11.2 to 12.2]
Information in this document applies to any platform.

ODA, performance problems, CPU 100%, Hang, FLASH, ODI TRUNCATE, LOCK, RAC, DBWR, SPIN, CHKPT, Lock, outage

Symptoms
This note discusses two independent forms of FLASH usage:
Using Database Flash Cache does not mean you will hit this problem with your usage.
This can occur on any platform that uses Flash Cache on more than the local node (Doc ID 2120400.1).

Formerly Labeled -- ODA X5-2: Lock, Hang for Checkpoint (CKPT) or DBwriter (DBWR), 'GC CURRENT REQUEST' or Intermittent Bad Performance, High CPU and / or Very Poor IO

* The Oracle Database Appliance (ODA) X5-2 by default automatically enables remote access to the FLASH DISK GROUP* on both nodes if cache size <> 0:
An ASM diskgroup named +FLASH with Normal Redundancy is provisioned on these SSDs. All of the storage in the +FLASH diskgroup is allocated to an ASM Dynamic Volume (flashdata) and formatted as an ACFS file system. The file that contains the flash cache is automatically created for each database and is specified using the database init.ora parameter db_flash_cache_file.

Rediscovery
SYMPTOMS
High CPU
Flash metrics confirm active usage during slow performance time windows:
RAC node instability after enabling FLASH on both nodes and restarting the instances.

¹ Note that the ODA X5-2 can implicitly use both nodes of the FLASH for Single Instance databases as well as for RAC[one].
- RAC PERF: CR TEST FOREGROUNDS NOW USE INDIRECT SENDS CAUSING 25% LESS THROUGHPUT (bug 22894949)
Bug 22894949 has fixes / PSEs for multiple platforms and versions.
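To check whether flash cache is configured and active on each node, a minimal sketch along the following lines may help. It assumes the standard gv$parameter and gv$sysstat views are accessible; the LIKE pattern on statistic names is an assumption, because the exact flash-related statistic names vary between versions.

```sql
-- Flash cache configuration on every instance.
-- A db_flash_cache_size of 0 means the flash cache is disabled on that instance.
SELECT inst_id, name, value
  FROM gv$parameter
 WHERE name IN ('db_flash_cache_file', 'db_flash_cache_size', 'cluster_database')
 ORDER BY inst_id, name;

-- Flash-related activity counters (pattern match is an assumption, see above).
-- Sample twice during a slow window and compare the values.
SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE LOWER(name) LIKE '%flash cache%'
 ORDER BY inst_id, name;
```

Counters that keep growing on both instances during the slow window suggest the flash cache is in active use across nodes, which is the configuration this note describes.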
Changes

Pre-Requisites
This problem is most frequently seen when moving from a non-RAC configuration to the ODA X5-2.
* Note that the ODA can implicitly use both nodes of the FLASH for RAC or RACone.

NOTE: While this document was originally written for the ODA X5-2, the bug is generic to multi-node FLASH usage on RDBMS 11.2.0.3.x and higher, including 12.x.

Several RDBMS fixes now exist for this problem and others are in progress. We recommend applying these single patches on your 12.x RDBMS versions for now; we will include them in the next ODA Patch Bundle.
Cause
Internal bugs for the most common issues (may not be comprehensive or up to date):
20048359 - PARALLEL QUERY CONSUMES CPU TIME ON RAC WITH DATABASE SMART FLASH CACHE

Solution
Apply the most current Patch Bundle:
Then, as appropriate:
Apply merge or single-patch bug fixes for the following known issues:
20048359 - PARALLEL QUERY CONSUMES CPU TIME ON RAC WITH DATABASE SMART FLASH CACHE
Patch Set Exceptions (PSEs) are available for the following versions
*21422580 - DBW USING 100% CPU WITH ENQ: RO - FAST OBJECT REUSE DURING TRUNCATE OR DROP TABLE
*12.1.2.9.0 does not include the fix for bug 21422580
** 21794615 - LNX64-122-RAC-CDB:LMS HUNG ON "LATCH: GC ELEMENT',LMD ORA-600[KJMSCNDSCQ:TIMEOUT
fixed in 12.1.2.10 - Scheduled for Early 2017
includes patchset exceptions (one-offs for the following RDBMS versions)
-- The fix is applied as an RDBMS bug fix: most current RDBMS versions will have fixes for the above bugs.

Workaround: Disable Flash on one or both nodes
ALTER SYSTEM set db_flash_cache_size = 0 (see documentation for usage)

DB_FLASH_CACHE_SIZE specifies the size of the Database Smart Flash Cache (flash cache).
This parameter may only be specified at instance startup.
You can dynamically change this parameter to 0 (disabling the flash cache) after the database is started. You can re-enable flash cache by setting this parameter to the original value when the database was started.
Please refer to docs.oracle.com for the latest usage:
http://docs.oracle.com/database/121/REFRN/GUID-9B2E54F0-F2C0-4C3A-88B7-C2F64F32376F.htm#REFRN10316
* Note that the ODA can implicitly use both nodes of the FLASH for RAC or RACone or EE.

Potential Workaround: Some Flash Cache problems can be worked around by altering the Flash Cache mode in a cluster configuration. Here are the allowable settings of _gcs_cluster_flash_cache_mode:
0 - disable cluster flash cache
1 - enable cluster flash cache - the master instance will allow a non-master instance to read its flash cache buffers. The non-master instance will not use flash cache to cache local buffers.
3 - allow both master and non-master instances to use flash cache. The master flash cache can be shared between master and non-master instances. The non-master flash cache can only be accessed by the local instance, not by a remote (or master) instance.

_gcs_cluster_flash_cache_mode=0 is a second option. After testing with level 3 we also recommend testing level 0.
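The two workarounds above might be applied roughly as follows. This is a sketch under stated assumptions rather than a verified procedure: the SCOPE and SID clauses are illustrative choices, the note does not say whether _gcs_cluster_flash_cache_mode can be changed without a restart (SCOPE=SPFILE is assumed here), and hidden parameters are normally changed only under Oracle Support guidance.

```sql
-- Workaround 1: disable Database Smart Flash Cache at run time.
-- SID='*' targets all instances; name a single instance instead to
-- disable flash cache on one node only.
ALTER SYSTEM SET db_flash_cache_size = 0 SCOPE=MEMORY SID='*';

-- Workaround 2 (reported as untested in this note): change the cluster
-- flash cache mode. Hidden parameter names must be double-quoted.
-- SCOPE=SPFILE is an assumption; the change would take effect at the
-- next instance restart.
ALTER SYSTEM SET "_gcs_cluster_flash_cache_mode" = 3 SCOPE=SPFILE SID='*';
```

Because the first statement uses SCOPE=MEMORY, the original db_flash_cache_size value returns at the next restart, which keeps the test easy to back out.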
The above potential workaround was added by request for bug 26197067 but has not been tested.

RELATED Bugs and Symptoms

18628484 - TST&PERF:HIGH 'ENQ:RO - FAST OBJECT REUSE' WAIT TIME WHEN TRUNCATING TABLE ON RAC
Symptoms:

19347458 - SESSIONS WAITING INDEFINITELY FOR 'GC CURRENT REQUEST'
Symptoms:
SQL> select inst_id, sid, event, blocking_instance, blocking_session, seconds_in_wait, p1,p2,p3 from gv$session where seconds_in_wait > 0 and event like 'gc %'
See GC waits on both nodes
Requires cluster_database=true
Flash enabled

19565714 - DROPPING A TABLE STORED IN FLASH CACHE TAKES TOO LONG, WITH DBWR AT 100% CPU -- Related to bug 22083366 --- included in DBPSU 12.1.0.2.161018 and DBBP 12.1.0.2.161018
Symptoms:
DIAGNOSTIC ANALYSIS:
--------------------
DBW processes were near 100% CPU:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4981 oracle 20 0 18.4g 33m 21m R 100.0 0.0 172:09.84 ora_dbw1_tst1
ksedsts()+647<-ksdxfstk()+58<-ksdxcb()+901<-sspuser()+227<-__sighandler()<-kcbo_service_ockpt()+1265<-kcbbdrv()+8841<-ksbabs()+539<-ksbrdp()+1209
*** 2014-09-02 12:08:16.369
"...Tried the same test with different configurations:( and the problem does not reproduce )

20048359 - Poor RAC performance and excessive CPU usage using database flash cache ---included in DBPSU 12.1.0.2.5 DBBP 12.1.0.2.12
Symptoms: 'free buffer inspected' increases dramatically -- this statistic counts buffers that were examined for reuse but could not be reused.
"...DBWR cannot keep up with foreground requests to move SCURRENT buffer to flash cache..."

20229502 - WAITS FOR 'GC QUIESCE' AND 'ENQ: TM - CONTENTION' WITH FLASH CACHE
Symptoms: The "gc quiesce" wait is high with high flash cache activity, with occasional high waits on the TM enqueue blocked by a flash cache wait.
Top wait events: Event Total Wait Wait % DB Wait
DIAGNOSTIC ANALYSIS:
Cluster Configuration: Flash enabled

21137784 - DBW0 SPINS IF IT CAN NOT SHRINK AN LE TO FLASH LOCK ---- Fix pulled in favor of 21794615
Symptoms:
*** 2015-05-22T13:42:59.198974+17:00
KCL: L83: cannot shrink: not master
GLOBAL CACHE ELEMENT DUMP (address: 0x61f62130): id1: 0xae8e id2: 0x1 pkey: OBJ#0,0,18 block: (1/44686)
--
*** 2015-05-22T13:43:00.620439+17:00
KCL: L83: cannot shrink: not master
GLOBAL CACHE ELEMENT DUMP (address: 0x61f62130): id1: 0xae8e id2: 0x1 pkey: OBJ#0,0,18 block: (1/44686)
--
*** 2015-05-22T13:43:02.437729+17:00
KCL: L83: cannot shrink: not master
GLOBAL CACHE ELEMENT DUMP (address: 0x61f62130): id1: 0xae8e id2: 0x1 pkey: OBJ#0,0,18 block: (1/44686)
Requires Cluster Configuration
Flash enabled

21422580 - DBW USING 100% CPU WITH ENQ: RO - FAST OBJECT REUSE DURING TRUNCATE OR DROP TABLE
Symptoms:
zzz .....
top -...
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
34745 oracle 20 0 2415m 33m 23m R 99.8 0.0 8179:44 ora_dbw0_... <<<<<<<<<<< Both the dbw processes consuming 100% all the time.

##############################################################################
## Procwatcher Top CPU Consumers Report
##############################################################################
## Thu Jul 23 08:30:52 CEST 2015: Procwatcher Top CPU Consumers:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
67906 oracle 20 0 2422m 38m 23m R 99.9 0.0 39181:20 ora_dbw0_
67910 oracle 20 0 2421m 37m 21m R 97.9 0.0 42113:02 ora_dbw1_
##############################################################################
## Thu Jul 23 08:31:31 CEST 2015: Procwatcher Top CPU Consumers:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
67906 oracle 20 0 2422m 38m 23m R 100.0 0.0 39181:59 ora_dbw0_
67910 oracle 20 0 2421m 37m 21m R 99.8 0.0 42113:41 ora_dbw1_

The AWR indicates there is hardly any workload in the database. The physical read and write activities were very low.

ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-_IO_vfprintf()+3355<-vsnprintf()+154<-00007FFC0F908735

ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-lseek()+16<-sdbgrfsf_seek_file()+37<-dbgtfdFileWrite()+2339<-dbgtfdFileAccessCbk()+357<-dbgtfPutStr()+592<-dbktPri()+149<-ksdwrf()+551<-ksdwrfn()+187<-kcbbdrv()+22416<-ksbabs()+874<-ksbrdp()+1068<-opirip()+1488<-opidrv()+616<-sou2o()+145<-opimai_real()+270<-ssthrdmain()+412<-main()+236<-__libc_start_main()+244

ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-write()+16<-sdbgrfwf_write_file()+64<-dbgtfdFileWrite()+674<-dbgtfdFileAccessCbk()+357<-dbgtfPutStr()+592<-dbktPri()+149<-ksdwrf()+551<-ksdwrfn()+187<-kcbbdrv()+12541<-ksbabs()+874<-ksbrdp()+1068<-opirip()+1488<-opidrv()+616<-sou2o()+145<-opimai_real()+270<-ssthrdmain()+412<-main()+236<-__libc_start_main()+244
----------

And from the wait chain data, we can see the checkpoint process on both instances blocking sessions from acquiring the RO enqueue.

Eg:
----------
PROC 21181 : Current Process: 21181 SID: 482 SER#: 13637 INST osabc INST #: 1
PROC 21181 : Blocking Process: 37002 from Instance 1 Number of waiters: 0
PROC 21181 : Final Blocking Process: 37002 from Instance 1 Program: oracle@oda1base1 (CKPT) <<<<
ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-kcbbcwr()+134<-kcbbdrv()+5078<-ksbabs()+874<-ksbrdp()+1068<-opirip()+1488<-opidrv()+616<-sou2o()+145<-opimai_real()+270<-ssthrdmain()+412<-main()+236<-__libc_start_main()+244
Requires Cluster Configuration
Flash enabled

21540885 - DBW HIGH CPU USAGE WITH ENQ: RO - FAST OBJECT REUSE --- dup of 21422580
Symptoms:
"...From traces, during reported time dbwrs seem busy with writes they don't look stuck or spinning. There are many buffers which are not getting written on first request due to fusion write permission issue on PI buffers. -
KCBB: going to write
KCBB: rwrite status 1
KCBB: write permission pending, requests 6, last successful write 0 seconds ago
..
Also, during the next occurrence could you please ask customer to get output of following queries?
Diagnostic
1. select ckpt_priority, ckpt_flags, count(*) from x$activeckpt where ckpt_type=2 group by ckpt_priority, ckpt_flags;
2. select * from x$kcbwds;
3. select * from x$kcbbes;
(every 20 secs. about 15 times)

zzz ***Tue ....
top - 13:36:02 up 161 days, 16:00, 3 users, load average: 4.46, 4.40, 4.36
Tasks: 1056 total, 5 running, 1051 sleeping, 0 stopped, 0 zombie
Cpu(s): 35.9%us, 17.0%sy, 0.0%ni, 46.4%id, 0.7%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 264286692k total, 203135524k used, 61151168k free, 1928424k buffers
Swap: 25165820k total, 11099360k used, 14066460k free, 46073292k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
20470 oracle 20 0 2420m 46m 27m R 99.7 0.0 4:25.34 ora_dbw1_....

Bug 22848234 has been marked as a potential duplicate of this bug
Requires Cluster configuration
Flash enabled

21794615 - LNX64-122-RAC-CDB:LMS HUNG ON "LATCH: GC ELEMENT',LMD ORA-600[KJMSCNDSCQ:TIMEOUT
...stress testing an 8-nodes database...
Symptoms: Instance LMS stuck on 'latch: gc element',
...
LMS2 (ospid: 8096) waits for event 'latch: gc element' for 89 secs.
LMS2 (ospid: 8096) waits for latch 'gc element' for 89 secs.
LMS7 (ospid: 8116) waits for event 'latch: cache buffers chains' for 243 secs.
LMS7 (ospid: 8116) waits for latch 'cache buffers chains' for 243 secs.
LMSA (ospid: 8128) waits for event 'latch: cache buffers chains' for 400 secs.
LMSA (ospid: 8128) waits for latch 'cache buffers chains' for 400 secs.
test3_lmd0_8765.trc
...
(incident=12345) (PDBNAME=CDB$ROOT):
ORA-00600: internal error code, arguments: [kjmscndscq:timeout], [480], [240], [0], [0], [4], [8224], [], [], [], [], []
Incident details in: /ora12/app/gridbase/diag/rdbms/test/test3/incident/incdir_5631685/test3_lmd0_......trc
*** 20...:10:04.267967+17:00 (CDB$ROOT(1))
==============================
Process DBW1 (ospid: ####): latch state and session wait history
Waiting For:
Current Wait Stack:

22083366 - LNX64-112-CMT: CKPT BLOCKING SESSION DURING INDEX REBUILD USING FLASH CACHE --- Base Bug 19347458
Symptoms:
Cluster Configuration
Flash enabled

22084935 - TRUNCATE STATEMENT HANGING ON GC CURRENT REQUEST WAIT EVENT
Symptoms:
ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()<-__poll()+47<-ssskgxp_poll()+63<-sskgxp_selectex()+423<-skgxpiwait()+3894<-skgxpwaiti()+1900<-skgxpwait()+178<-
Cluster Configuration
Flash enabled

23614682 - TRUNCATE HUNG FOR 'GC CURRENT REQUEST' AFTER APPLYING 19347458 -- dup of bug 22083366
Symptoms:
Requires Cluster configuration
Flash enabled
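The wait events named in the bug symptoms above can be checked in one pass with a query like the sketch below. It extends the gv$session query quoted earlier in this note; the event list is taken from the symptoms described here and may not be exhaustive, and the final_blocking_* columns are assumed to be available (they exist in 11.2 and later).

```sql
-- Sessions currently waiting on events associated with the flash cache bugs.
SELECT inst_id, sid, event, seconds_in_wait,
       blocking_instance, blocking_session,
       final_blocking_instance, final_blocking_session
  FROM gv$session
 WHERE seconds_in_wait > 0
   AND event IN ('gc current request',
                 'gc quiesce',
                 'enq: RO - fast object reuse',
                 'enq: TM - contention',
                 'write complete waits',
                 'latch: gc element')
 ORDER BY seconds_in_wait DESC;

-- 'free buffer inspected' per instance; a rapid increase between two
-- samples matches the symptom described for bug 20048359.
SELECT inst_id, name, value
  FROM gv$sysstat
 WHERE name = 'free buffer inspected'
 ORDER BY inst_id;
```

If the final blocker resolves to CKPT or a DBWn process while that process is near 100% CPU in top, the pattern is consistent with the bugs listed in this section.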
Bug 22894949 is added to this note because MERGE requests should also make sure that its fix is included:
Bug 22894949 - RAC PERF: CR TEST FOREGROUNDS NOW USE INDIRECT SENDS CAUSING 25% LESS THROUGHPUT

ODA BASED - Here are more bugs filed against the ODA X5-2 which expand on the similar bugs above
Bug:20248397 - ODA - FLASH CACHE REMOTE SERVICE MISBEHAVE ON RAC DATABASE
@ INTERNAL PROBLEM DESCRIPTION:
@ When using the DB Smart Flash Cache feature in ODA, instead of
@ INTERNAL FIX DESCRIPTION:
@ In short, when 'ODA' is used, after both instances are up, a new
STATE of FLASH PATCH UPDATE -- 12/8/2016 [CL]
Refer to Bug Tree - https://bug.oraclecorp.com/pls/bug/webbug_reports.bugtreecom?bug_no_param=21794615
- Fixed Ver: 12.1.0.2.170117DBBP < Not yet created
While several BLRs have been created, including the following, no actual patch has been created
*** ARU aru-bug_us 11/07/16 Fixed->12.1.0.2.160419DBPSU) - yet no patch

The following bugs are fixed as of 12.1.2.9.0, and most have various PSEs, but there is no confirmation that the patches are conflict-free, so check for conflicts.
20048359 - PARALLEL QUERY CONSUMES CPU TIME ON RAC WITH DATABASE SMART FLASH CACHE
Refer to Bug Tree https://bug.oraclecorp.com/pls/bug/webbug_reports.bugtreecom?bug_no_param=20048359
- Fixed Ver: 12.1.0.2.5DBPSU - Should be inclusive from this version forward
20332536 - (O) - 93 - PSE FOR BASE BUG 20048359 ON TOP OF DATABASE PSU 12.1.0.2.2 FOR LINUX X86-64 [22
Refer to Bug Tree https://bug.oraclecorp.com/pls/bug/webbug_reports.bugtreecom?bug_no_param=21422580
- Fixed Ver: 12.1.0.2.12DBBP - Should be inclusive from this version forward
- Refer to Bug Tree https://bug.oraclecorp.com/pls/bug/webbug_reports.bugtreecom?bug_no_param=22083366
- Fixed Ver: 12.1.0.2.160927FA-DBBP
24603992 - (Z) - 35 - CI BACKPORT OF BUG 22083366 FOR INCLUSION IN FA DATABASE BP 12.1.0.2.160927
References
<BUG:23191293> - DBW HIGH CPU USAGE WITH ENQ: RO - FAST OBJECT REUSE ODA 12.1.0.2.3
<BUG:24325501> - DBMS_AUDIT_MGMT.CLEAN_AUDIT_TRAIL HANGS
<BUG:19565714> - DROPPING A TABLE STORED IN FLASH CACHE TAKES TOO LONG, WITH DBWR AT 100% CPU
<BUG:21967137> - WAIT ON ENQ: RO - FAST OBJECT REUSE
<BUG:20229502> - WAITS FOR 'GC QUIESCE' AND 'ENQ: TM - CONTENTION' WITH FLASH CACHE
<BUG:21171684> - FOR ODA ONLY: RESERVE MEMORY FOR A DEAD INSTANCE'S FLASH CACHE
<BUG:21794615> - LNX64-122-RAC-CDB:LMS HUNG ON "LATCH: GC ELEMENT',LMD ORA-600[KJMSCNDSCQ:TIMEOUT
<BUG:20248397> - ODA - FLASH CACHE REMOTE SERVICE MISBEHAVE ON RAC DATABASE
<BUG:20780960> - FOR ODA ONLY: WHEN ONE INSTANCE CRASHES ITS FLASH CACHE IS LOST
<BUG:22083366> - LNX64-112-CMT: CKPT BLOCKING SESSION DURING INDEX REBUILD USING FLASH CACHE
<BUG:22090426> - ENQ: RO - FAST OBJECT REUSE CONTENTION DURING "DROP" OR "TRUNCATE" OPERATION.
<BUG:22894949> - RAC PERF: CR TEST FOREGROUNDS NOW USE INDIRECT SENDS CAUSING 25% LESS THROUGHPUT
<BUG:23519538> - MERGE REQUEST ON TOP OF DATABASE PSU 11.2.0.4.8 FOR BUGS 22083366 22894949
<BUG:20048359> - PARALLEL QUERY CONSUMES CPU TIME ON RAC WITH DATABASE SMART FLASH CACHE
<BUG:21175460> - FOR ODA ONLY: RECOVER A DEAD INSTANCE'S FLASH CACHE
<NOTE:2091770.1> - ODA (Oracle Database Appliance Machine ) Critical Issues
<BUG:22488837> - TRUNCATE PARTITION HANG WITH 'ENQ: RO - FAST OBJECT REUSE' FOR 259 MIN 18 SEC
<BUG:22848234> - TRUNCATE HANG WITH 'ENQ: RO - FAST OBJECT REUSE' BLOCKED BY CKPT
<NOTE:1981471.1> - Performance Problems In a RAC Database Using Smart Flash Cache
<BUG:20054368> - ODA: GLOBAL FLASH CACHE QUERY HANGS AFTER STARTING THE SECOND INSTANCE
<BUG:22120580> - ACFS MOUNTS HANG DURING CROSS PLATFORM RESTORE TO ACFS MOUNTS
<BUG:22174238> - STATFS/OFSGENSTATFS TAKES GBM DLM LOCK UNNECESSARILY, CAUSING LOCK CONTENTION
<BUG:22198176> - KSMALLOC CAN SPIN INDEFINITELY WITH KMALLOC FAILURE ON 3.8 CAUSING HANGS
<BUG:21675080> - TIMEOUT PANIC AS THINGS ARE REALLY SLOW - POSSIBLE DEADLOCK
<NOTE:2104793.1> - Truncate Table Hanging on "gc current request" Wait
<NOTE:2075831.1> - ODA X5-2: When Using DB Flash Cache on RAC, Instance May Crash With ORA-600 [kjbrchkpkeywait:timeout]
<BUG:19849781> - ORA-600 [KJBRCHKPKEYWAIT:TIMEOUT], WRONG PKEY/OBJD ON IOT
<BUG:23614682> - TRUNCATE HUNG FOR 'GC CURRENT REQUEST' AFTER APPLYING 19347458
<NOTE:20048359.8> - Bug 20048359 - Poor RAC performance and excessive CPU usage using database flash cache

Attachments
This solution has no attachment