Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

2149887.1 : SuperCluster: Transient Threads Can Lead to Instance Crashes, Node Evictions and Random Database or Application Performance Issues
This document describes a SuperCluster critical issue. If classified as a critical issue, the item specified as the solution is considered mandatory.

Applies to:
Oracle SuperCluster T5-8 Half Rack - Version All Versions and later
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
SPARC SuperCluster T4-4 - Version All Versions and later
SPARC SuperCluster T4-4 Half Rack - Version All Versions and later
Oracle SuperCluster M6-32 Hardware - Version All Versions and later
Oracle Solaris on SPARC (64-bit)

Symptoms
Node evictions due to heartbeat timeouts with no apparent problem on the IB fabric.
Instance evictions due to delayed lms or lmd ping acknowledgements with no apparent interconnect problems.
Unexplained delays in Java or shell script code on very busy application nodes.
Changes
The issue seems to be more prevalent in Solaris 11.2 and 11.3 and is frequently triggered by excessive CPU saturation.

Cause
Unpublished Bug 17697871

With the introduction of workload characterization optimization for threads, a thread can be marked as TRANSIENT if its CPU utilization does not exceed a certain threshold. This is intended to identify threads that use few CPU resources. The transience counter (the t_transience field of kthread_t) is incremented each time a thread consumes less than thread_transience_pct (0.02% by default) of a CPU's resources. If that counter reaches 10, the thread is flagged as TRANSIENT. CPUs running TRANSIENT threads are also flagged as CPU_DISP_TRANSIENT.

This triggers some optimizations within the scheduler. Usually, idle CPUs steal threads from the dispatch queues of other, busy CPUs. In this case, however, the dispatch queues of a CPU flagged as CPU_DISP_TRANSIENT are not considered for processing, even if there are idle CPUs, because Solaris assumes that the transient thread will leave the CPU soon.

The problem appears when a transient thread running on a CPU starts to behave as non-transient. Threads can then stay too long in the dispatch queues for that CPU, causing latency bubbles and possible scheduling issues.

Solution
Edit the /etc/system file in every global zone to add the following settings, then reboot:

set thread_transience_kernel=0
set thread_transience_user=0
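
Because the settings must be present in every global zone, it can help to check each /etc/system file before rebooting. The following Python code is a minimal sketch of such a check, not an Oracle-supplied tool. The function names are hypothetical, and it assumes the settings appear as plain `set name=value` lines and that comment lines begin with `*` or `#`.

```python
import re

# The two settings this document requires, with their expected values.
REQUIRED = {
    "thread_transience_kernel": "0",
    "thread_transience_user": "0",
}


def parse_system_settings(text):
    """Return {name: value} for every 'set name=value' line in the text.

    Blank lines and comment lines (starting with '*' or '#') are skipped.
    If a name is set more than once, the last occurrence wins.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "*#":
            continue
        match = re.match(r"set\s+([\w:]+)\s*=\s*(\S+)$", line)
        if match:
            settings[match.group(1)] = match.group(2)
    return settings


def missing_settings(text):
    """Return the sorted names of required settings that are absent or wrong."""
    found = parse_system_settings(text)
    return sorted(name for name, value in REQUIRED.items()
                  if found.get(name) != value)
```

To use it, pass the contents of a global zone's /etc/system to `missing_settings`. An empty list means both settings are present with the value 0.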

When a future QFSDP that contains the fix is released, this document will be updated to reflect it.
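
To illustrate the mechanism described under Cause, the following Python toy model mimics the transience counter and the effect of the flag on thread stealing. It is a simplified sketch, not Solaris kernel code. The class and function names are hypothetical, and the length of an accounting interval is left abstract. The document does not say how the counter or flag behaves once a thread becomes busy, so the model leaves both unchanged, which is an assumption.

```python
THREAD_TRANSIENCE_PCT = 0.02  # percent of one CPU; default quoted in this document
TRANSIENT_COUNT = 10          # counter value at which a thread is flagged TRANSIENT


class Thread:
    """Toy stand-in for a kernel thread; only models the transience state."""

    def __init__(self):
        self.t_transience = 0   # mirrors the t_transience counter
        self.transient = False  # True once the thread is flagged TRANSIENT

    def account(self, cpu_pct):
        """Record one accounting interval with the given CPU usage in percent."""
        if cpu_pct < THREAD_TRANSIENCE_PCT:
            self.t_transience += 1
            if self.t_transience >= TRANSIENT_COUNT:
                self.transient = True
        # Assumption: a busy interval leaves the counter and flag untouched.
        # The real kernel behavior here is not described in this document.


def queue_is_stealable(running_thread):
    """Idle CPUs skip the dispatch queue of a CPU running a TRANSIENT thread."""
    return not running_thread.transient


t = Thread()
for _ in range(TRANSIENT_COUNT):
    t.account(0.01)   # ten low-usage intervals flag the thread
t.account(50.0)       # the thread now behaves as non-transient
print(t.transient, queue_is_stealable(t))
```

In this model the thread stays flagged after it becomes busy, so other threads queued behind it are not picked up by idle CPUs. That is the situation the two settings above are meant to avoid.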

References
<NOTE:1424503.2> - Information Center: SuperCluster
<NOTE:2088923.1> - Oracle SuperCluster Application Domain and Zones Best Practices
<NOTE:2004702.1> - Oracle SuperCluster Best Practices
<NOTE:1625975.1> - On-proc TRANSIENT Threads Can Delay Runnable Threads Leading to Cluster Node Evictions
<BUG:17697871> - SUNBT7199390 RUNNABLE THREAD OCCASIONALLY STAYS IN RUN QUEUE FOR TOO LONG

Attachments
This solution has no attachment