Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure

Solution 1498896.1: Tuning for SPARC SuperCluster and Solaris X86-64 Exadata (X2-2) RDS issues contributing to RDS Latency, RAC Node Evictions, Intermittent spikes in cluster waits and ORA-27300 MTU errors
Applies to:

Exadata Database Machine X2-2 Full Rack - Version All Versions to All Versions [Release All Releases]
Exadata Database Machine X2-2 Half Rack - Version All Versions to All Versions [Release All Releases]
SPARC SuperCluster T4-4 - Version All Versions to All Versions [Release All Releases]
Oracle Database - Enterprise Edition - Version 11.2.0.3 to 11.2.0.3 [Release 11.2]
Oracle SuperCluster T5-8 Hardware - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on x86-64 (64-bit)
Oracle Solaris on SPARC (64-bit)

Symptoms
Scope

This tuning is only appropriate for NUMA systems such as the T4-4 compute node in SPARC SuperCluster and Solaris x86-64 on X2-2. The tuning is not appropriate for general purpose LDoms within the SPARC SuperCluster.

Problem

There have been instances of split-brain and node evictions on some customer SuperCluster and Solaris x86-64 on X2-2 systems. These issues appear to be workload related and can affect some customers more than others. To determine whether your customer is hitting this issue, review the symptoms below.
Symptoms
diskmon.log

    2012-10-14 00:30:58.012: [ DISKMON][11154:12] SKGXP:[100f97ba0.1139]{0}: SKGXP_DO_HEART_BEAT_RESP: NO HB PENDING source: 0 (max 2) in response from 192.168.20.9 mhbr 192.168.20.0/9
    2012-10-14 00:30:58.013: [ DISKMON][11154:12] SKGXP:[100f97ba0.1140]{0}: SSKGXPT 101105b60 flags 0x2 { WRITE } sockno 122 IP 192.168.20.9 RDS 20143 lerr 0
    2012-10-14 00:30:58.013: [ DISKMON][11154:12] SKGXP:[100f97ba0.1141]{0}: SSKGXPT 101105b90 flags 0x2 { WRITE } sockno 123 IP 192.168.20.9 RDS 20143 lerr 0
    2012-10-14 00:30:58.013: [ DISKMON][11154:12] SKGXP:[100f97ba0.1142]{0}: SKGXPID 1105b14 vers 0 conproto 1 flags 8 magic 4c89
    . . .
    2012-10-14 00:30:58.037: [ DISKMON][11154:12] dskm_ant_rsc_monitor_start: rscnam: o/192.168.20.9 rsc: 1010b83e0 state: UNREACHABLE reconn_attempts: 7 last_reconn_ts: 1350199850
    2012-10-14 00:30:58.037: [ DISKMON][11154:12] dskm_queue_tcpmon_request: posting
    2012-10-14 00:30:58.037: [ DISKMON][11154:12] dskm_post_tcpmon_thrd
    2012-10-14 00:30:58.037: [ DISKMON][11154:3] dskm_tcpmon_thrd_main: posted, poll returned with retcode = 45
    2012-10-14 00:30:58.037: [ DISKMON][11154:3] dskm_tcpmon_thrd_main: Got a request with type 2, cellname = o/192.168.20.9, cellname length 15, cell incarnation = 0
    2012-10-14 00:30:58.048: [ DISKMON][11154:12] dskm_health_check_ssb2: Checking if Cell o/192.168.20.9 is UNREACHABLE from all the nodes
    . . .
    2012-10-14 00:31:02.271: [ DISKMON][11154:4] dskm_get_evt_mbr: member 2 signaled the event
    2012-10-14 00:31:02.280: [ DISKMON][11154:4] dskm_cell_health_resp1: Encounter a split-brain with node 2, suicide self....
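When comparing diskmon.log across several nodes, it can help to pull out only the entries that matter for an eviction. The following is a minimal sketch rather than an Oracle-supplied tool: the function names `split_brain_events` and `unreachable_cells` are hypothetical, the patterns assume log lines shaped like the excerpt above, and the `AWK` variable is an assumption added so that a POSIX awk (for example nawk) can be substituted where the default awk lacks `sub()`.

```shell
#!/bin/sh
# Hypothetical helpers; patterns assume diskmon.log lines shaped like the
# excerpt in this note. Set AWK=nawk if the default awk is not POSIX awk.
AWK=${AWK:-awk}

# Print the date and time of every split-brain message read from stdin.
split_brain_events() {
    "$AWK" '/Encounter a split-brain/ { sub(/:$/, "", $2); print $1, $2 }'
}

# Print each cell (the rscnam value) reported in state UNREACHABLE, once.
unreachable_cells() {
    "$AWK" '/state: UNREACHABLE/ {
        for (i = 1; i < NF; i++)
            if ($i == "rscnam:") print $(i + 1)
    }' | sort -u
}

# Example (the log path is a placeholder, not a documented location):
#   split_brain_events < /path/to/diskmon.log
#   unreachable_cells  < /path/to/diskmon.log
```

Matching the split-brain timestamps against the UNREACHABLE cells gives a quick timeline to line up with the OSWatcher data described below.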
Panic string / System corefile

The system corefile generated from the resultant panic should be examined to determine if other factors are at play. The panic string typically generated is:
    panic[cpu64]/thread=30167a04700: forced crash dump initiated at user request

    000002a11f863930 genunix:kadmin+5a0 (0, 0, 10, 125c400, 5, 1)
      %l0-3: 000000000125c420 000000000125c400 0000000000000004 0000000000000004
      %l4-7: 0000000000000208 0000000000000010 0000000000000004 0000000000000004
    000002a11f863a00 genunix:uadmin+1c0 (1, 604e7775a98, 0, 1, 5, 5)
      %l0-3: 00000000fd4a0000 000000000000fd4a 0000000000000004 000003002a5b6000
      %l4-7: 00005a2cf7153e2e 0000000000000000 0000000000000000 0000000000000000
rds-ping latencies and ib_tx_ring_full

Large latencies in rds-ping times (thousands of microseconds) are observed on the system around the time of the event. These are recorded in OSWatcher data. Normal response time between database nodes is around 100 usec.

    $ rds-ping -c 10 -I 192.168.10.10 192.168.10.8
    1: 7701 usec
    2: 6634 usec
    3: 8448 usec
    4: 5395 usec

ib_tx_ring_full will increment rapidly over a short period prior to the event. Note: ib_tx_ring_full is cumulative since boot. A large value does not necessarily indicate a problem, but a steady, rapid increase over a short period does.
    # date; rds-info -c | egrep 'tx_ring_full'
    Mon Oct 15 07:44:26 PDT 2012
    ib_tx_ring_full 102047725
    # date; rds-info -c | egrep 'tx_ring_full'
    Mon Oct 15 07:44:26 PDT 2012
    ib_tx_ring_full 102049072
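Because ib_tx_ring_full is cumulative, the useful figure is its growth between two samples, and rds-ping replies are only interesting when they exceed the normal latency. The sketch below illustrates both checks under stated assumptions: `slow_pings`, `ring_full_value` and `ring_full_rate` are hypothetical helper names, the parsing assumes output shaped like the samples above, and the threshold and sampling interval are chosen by the caller rather than taken from this note.

```shell
#!/bin/sh
# Hypothetical helpers; parsing assumes output shaped like the samples in
# this note. Set AWK=nawk if the default awk does not support -v.
AWK=${AWK:-awk}

# Print rds-ping reply lines (read from stdin) slower than the given
# threshold in microseconds.
slow_pings() {
    limit_usec=$1
    "$AWK" -v limit="$limit_usec" '$3 == "usec" && $2 + 0 > limit + 0 { print }'
}

# Extract the ib_tx_ring_full counter from `rds-info -c` output on stdin.
ring_full_value() {
    "$AWK" '$1 == "ib_tx_ring_full" { print $2; exit }'
}

# Print the counter increase per second (integer division) given two
# counter values and the number of seconds between the samples.
ring_full_rate() {
    first=$1; second=$2; seconds=$3
    [ "$seconds" -gt 0 ] || return 1
    echo $(( (second - first) / seconds ))
}

# Example on a live node (not executed here; 10 s interval is arbitrary):
#   a=$(rds-info -c | ring_full_value); sleep 10
#   b=$(rds-info -c | ring_full_value)
#   ring_full_rate "$a" "$b" 10
#   rds-ping -c 10 -I 192.168.10.10 192.168.10.8 | slow_pings 1000
```

The two counter values shown above differ by 1347; what counts as a rapid increase depends on the interval between samples, so record the interval alongside the values.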
ORA-27300 errors

The instance alert log shows evidence of:

    ORA-00603: ORACLE server session terminated by fatal error
    ORA-27504: IPC error creating OSD context
    ORA-27300: OS system dependent operation:mtu select abnormal return failed with status: 0
    ORA-27301: OS failure message: Error 0
    ORA-27302: failure occurred at: skgxpvfymtu
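A quick way to confirm this signature is to search the instance alert log for the error stack shown above. This is a minimal sketch: `mtu_error_lines` and `has_mtu_error` are hypothetical names, and the alert log path is supplied by the caller because its location varies by installation.

```shell
#!/bin/sh
# Hypothetical helpers; the alert log path is passed in by the caller.

# Print, with line numbers, every entry of the error stack found in the log.
mtu_error_lines() {
    grep -n -e 'ORA-00603' -e 'ORA-27504' -e 'ORA-2730[012]' "$1"
}

# Succeed only if the log contains both the ORA-27300 "mtu select" line
# and the skgxpvfymtu failure location.
has_mtu_error() {
    grep 'ORA-27300' "$1" | grep 'mtu select' >/dev/null &&
        grep 'skgxpvfymtu' "$1" >/dev/null
}

# Example (placeholder path):
#   has_mtu_error /path/to/alert_SID.log && mtu_error_lines /path/to/alert_SID.log
```

Requiring both the `mtu select` text and `skgxpvfymtu` keeps unrelated ORA-27300 errors from being counted as this issue.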
Changes

NA

Cause
These events are caused by latencies in RDSv3 communication between nodes of the SuperCluster. It has been determined that these latencies can be caused by Oracle RT processes starving rdsv3 worker threads of CPU time. The current remediation for this issue is to disable NUMA object binding within the kernel. This allows the rdsv3 worker threads to be scheduled on an alternate CPU in the system.

Solution

On SuperCluster, make sure you have installed and run the ssctuner service. It will set this and all other /etc/system best practices. The manual tuning approach applies to Solaris Exadata as well as other Solaris 11.0 and 11.1 RAC systems outside of engineered systems.

Tuning
Please note that even though this text is marked as internal only, the tuning steps may be delivered to customers via a Service Request. A reboot is required after making these changes.

SuperCluster

To effect this change, /etc/system should be updated on all 'Exa' domains in the SuperCluster, i.e. all domains running the 11gR2 database as part of the Exadata stack. After /etc/system has been updated, a reboot is required.

    exclude:nxge
    set numaio_bind_objects=0

It is also recommended at this time to disable intrd. The following is persistent across reboots.

    # svcadm disable intrd

Please note that numaio_bind_objects=0 is no longer a valid parameter in Solaris 11.2. When SuperCluster moves to this version, ssctuner will remove this setting.

Please note that for databases in zones, these settings are all made in the global zone, except for the FX-60 thread priority. All work is still accomplished with ssctuner. Also, FX-60 is the default thread priority for LMS and LGWR.

Solaris Exadata

To effect this change, /etc/system should be updated on all compute nodes in the Solaris Exadata.

    set numaio_bind_objects=0

It is also recommended at this time to disable intrd. The following is persistent across reboots.

    # svcadm disable intrd
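Before scheduling the reboot, it can be useful to confirm that the manual tuning is actually in place on each node. The sketch below is an illustration only: `has_numaio_setting` and `intrd_disabled` are hypothetical names, the check treats lines starting with `*` or `#` as /etc/system comments (an assumption about the file format), and it only inspects the file, not the value in the running kernel. It does not apply to Solaris 11.2, where the parameter is no longer valid.

```shell
#!/bin/sh
# Hypothetical helpers. Set AWK=nawk if the default awk is not POSIX awk.
AWK=${AWK:-awk}

# Succeed if the file (default /etc/system) has an active, uncommented
# "set numaio_bind_objects=0" line. Lines starting with * or # are
# assumed to be comments and are skipped.
has_numaio_setting() {
    file=${1:-/etc/system}
    "$AWK" '
        /^[ \t]*[*#]/ { next }
        /^[ \t]*set[ \t]+numaio_bind_objects[ \t]*=[ \t]*0[ \t]*$/ { found = 1 }
        END { exit !found }
    ' "$file"
}

# Succeed if SMF reports the intrd service as disabled.
# Returns failure where svcs is unavailable.
intrd_disabled() {
    [ "$(svcs -H -o state intrd 2>/dev/null)" = "disabled" ]
}

# Example:
#   has_numaio_setting && intrd_disabled && echo "tuning present; reboot pending or done"
```

On SuperCluster this check is informational only, since ssctuner manages these settings.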
References

<BUG:15821624> - SUNBT7203790 RDS STALLS ON SPARC SUPER CLUSTER
<BUG:15748320> - SUNBT7100788 MULTI-CPU BINDING FOR NUMA I/O
<BUG:12951619> - DATABASE TO USE CRITICAL THREADS FEATURE IN SOLARIS
<NOTE:1903388.1> - SuperCluster - ssctuner reference document
<NOTE:1424503.2> - Information Center: SuperCluster

Attachments

This solution has no attachment