Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1916992.1 : SuperCluster - GI and DB homes linked with UDP instead of RDS lead to CSSD reporting "has a disk HB, but no network HB" and "CSSD aborting from thread GMClientListener"
Created from <SR 3-9391073151>

Applies to:
Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Solaris SPARC Operating System - Version 11.1 to 11.2 [Release 11.0]
Oracle SuperCluster M6-32 Hardware - Version All Versions and later
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
SPARC SuperCluster T4-4 - Version All Versions and later
Oracle Solaris on SPARC (64-bit)
Oracle SuperCluster, any version, with Grid Infrastructure and/or Database Homes installed without using Java One Command (JOC)

Symptoms
RAC CRS services on one or more nodes shut down intermittently and cannot be restarted.

OCSSD log:
[ CSSD][5]clssgmShutDown: Received abortive shutdown request from client.
One node will usually remain up, typically the master node. On that node, the following command will show rapidly accumulating Idle connections on the private interconnect. Typically the other RAC nodes start to be evicted when the count reported by the command below reaches around 2200 idle connections. Please note that if the GI/DB in question is in Oracle Solaris Zones, you must run the netstat command below from within the local zone (non-global zone). If the GI/DB in question is at the LDom level, run it from the global zone.
netstat -an | grep Idle | grep 192 | wc -l
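To watch the count over time rather than re-running the command by hand, the check above can be wrapped in a small loop. This is a minimal sketch and not part of the original procedure: the function names, the 60-second interval and the warning limit of 2000 are assumptions chosen for illustration. The note itself only observes that evictions tend to start at around 2200 idle connections.

```shell
#!/bin/sh
# Sketch only: helper names, interval and warning limit are illustrative assumptions.

# count_idle PREFIX
# Reads `netstat -an` output on stdin and prints the number of lines that are
# in the Idle state and match PREFIX (the private interconnect address prefix).
# PREFIX is used as a grep pattern, exactly like `grep 192` in the note.
count_idle() {
    grep Idle | grep -c "$1"
}

# over_limit COUNT LIMIT
# Succeeds (exit status 0) when COUNT has reached LIMIT.
over_limit() {
    [ "$1" -ge "$2" ]
}

# watch_idle PREFIX LIMIT INTERVAL
# Prints a timestamped count every INTERVAL seconds until interrupted.
watch_idle() {
    while :; do
        n=$(netstat -an | count_idle "$1")
        echo "$(date '+%H:%M:%S') idle=$n"
        if over_limit "$n" "$2"; then
            echo "WARNING: idle connections at or above $2"
        fi
        sleep "$3"
    done
}
```

For example, `watch_idle 192 2000 60` could be run in the local zone or the global zone, following the same rule as the netstat command above.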
While Idle connections are accumulating, run the following dtrace probe for udp_bind, the call that allocates UDP sockets, for a few seconds and then press Control+C to stop it.
dtrace -n 'udp_bind:entry{@x[pid,execname,ustack()] = count();}'
If you see multiple entries for skgxp functions, that is indicative of the problem. A few every now and again is not bad, but more than 10 or so in a matter of a few seconds is bad. This case returned over 100 matching calls in 5 seconds.
43317 oracle
libc.so.1`_so_bind+0x4
libskgxp11.so`sskgxp_createport+0x2fc
libskgxp11.so`_$o1cexiH0.skgxpicini+0x770
libskgxp11.so`skgxpcini_with_stats+0x174
oracle`ksxposdcini+0x32e0
oracle`ksxppluginosd+0x1308
oracle`ksxp_open+0x58c
oracle`ksucrp+0x9f0
oracle`opiino+0x5b4
oracle`opiodr+0x48c
oracle`opidrv+0x408
oracle`sou2o+0x58
oracle`opimai_real+0x1f8
oracle`ssthrdmain+0x13c
oracle`main+0x13c
oracle`_start+0x17c
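The probe run and the count of matching stacks can be scripted so the result does not depend on eyeballing the output. The sketch below is an illustration only: the function names are hypothetical, and it assumes the probe output has been saved to a file or piped in. It adds a `tick-Ns` clause (the standard DTrace profile-provider probe) so the run ends by itself instead of needing Control+C, and it counts one hit per aggregated pid/stack entry that passes through `sskgxp_createport`, the frame shown in the example stack above.

```shell
#!/bin/sh
# Sketch only: function names are illustrative assumptions.

# run_udp_bind_probe SECONDS
# Runs the udp_bind probe from this note and exits after SECONDS seconds.
# Requires DTrace privileges on the Solaris host.
run_udp_bind_probe() {
    dtrace -n "udp_bind:entry{@x[pid,execname,ustack()] = count();} tick-${1}s{exit(0);}"
}

# count_skgxp_stacks
# Reads saved probe output on stdin and prints how many aggregated stacks
# contain the sskgxp_createport frame.
count_skgxp_stacks() {
    grep -c 'sskgxp_createport'
}
```

A possible use is `run_udp_bind_probe 5 | count_skgxp_stacks`, then comparing the result with the rough guideline above of more than 10 or so in a few seconds.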
Changes
The environment has Grid Infrastructure and Database Homes that were installed without using Java One Command (JOC).

Cause
Homes that were installed without Java One Command (JOC) were not subsequently linked with RDS. RAC by default is installed with UDP.

Solution
Verify the condition by setting ORACLE_HOME to the home you are investigating and running the skgxpinfo command to see whether it reports rds or udp.
$ORACLE_HOME/bin/skgxpinfo
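When several homes need checking, the skgxpinfo test can be repeated per home. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and it assumes skgxpinfo prints the bare word `rds` or `udp` as described above. Anything else is reported as UNKNOWN rather than guessed at.

```shell
#!/bin/sh
# Sketch only: function names are illustrative assumptions.

# classify_ipc PROTOCOL
# Maps the word reported by skgxpinfo to a verdict.
classify_ipc() {
    case "$1" in
        rds) echo "OK" ;;
        udp) echo "RELINK" ;;
        *)   echo "UNKNOWN" ;;
    esac
}

# check_home HOME_PATH
# Runs skgxpinfo from the given home with ORACLE_HOME set to that home and
# prints the home, the reported protocol and the verdict on one line.
check_home() {
    proto=$(ORACLE_HOME="$1" "$1/bin/skgxpinfo" 2>/dev/null)
    echo "$1 ${proto:-none} $(classify_ipc "$proto")"
}
```

Each Grid Infrastructure or Database home path is passed to `check_home` in turn, run as the owner of that home. Any home reported as RELINK is linked with UDP.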
Alternatively, you can find the information in the ASM and/or Database alert logs.
grep 'cluster interconnect IPC version' /<path_to_oracle_base>/diag/rdbms/<sid_name>/<instance_name>/trace/alert*.log
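An alert log can contain one such line per instance startup, so only the most recent one reflects the current link state. The sketch below, with a hypothetical function name, reads a single alert log on stdin and classifies the last matching line using the two strings this note gives for RDS and UDP. It is an illustration, not an Oracle-supplied tool.

```shell
#!/bin/sh
# Sketch only: the function name is an illustrative assumption.

# last_ipc_version
# Reads alert log text on stdin and prints RDS, UDP or UNKNOWN based on the
# most recent "cluster interconnect IPC version" line.
last_ipc_version() {
    line=$(grep 'cluster interconnect IPC version' | tail -n 1)
    case "$line" in
        *"Oracle RDS/IP"*) echo RDS ;;
        *"Oracle UDP/IP"*) echo UDP ;;
        *)                 echo UNKNOWN ;;
    esac
}
```

Feeding it one alert log at a time keeps the result tied to a specific instance.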
The supported homes will report "cluster interconnect IPC version:Oracle RDS/IP (generic)".
The unsupported homes will report "cluster interconnect IPC version:Oracle UDP/IP (generic)".
If either the Grid Infrastructure or Database Homes are showing UDP, you will need to relink them with RDS. Please note this is an offline operation.
1) As the ORACLE_HOME/GI_HOME owner, stop all resources (database, listener, ASM, etc.) that are running from the home. When stopping a database, use the NORMAL or IMMEDIATE option.
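For the database part of step 1, a small guard can make sure only the NORMAL or IMMEDIATE option is ever used. This is a hedged sketch: the function name and the database name `orcl` are hypothetical, the function only prints the command rather than running it, and it assumes the 11.2 `srvctl stop database -d <name> -o <option>` syntax. Listener and ASM resources are not covered.

```shell
#!/bin/sh
# Sketch only: the function name is an illustrative assumption.

# stop_db_cmd DB_NAME MODE
# Prints (does not run) the srvctl command for stopping a database, accepting
# only the NORMAL or IMMEDIATE stop options that step 1 allows.
stop_db_cmd() {
    mode=$(echo "$2" | tr 'A-Z' 'a-z')
    case "$mode" in
        normal|immediate)
            echo "srvctl stop database -d $1 -o $mode"
            ;;
        *)
            echo "refusing stop option: $2" >&2
            return 1
            ;;
    esac
}
```

For example, `stop_db_cmd orcl IMMEDIATE` prints the command for review, while `stop_db_cmd orcl abort` is rejected with a non-zero exit status.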
Please note that exachk will catch this condition for all Oracle Homes known to the OCR. If the home is not known to the OCR, then the databases running out of these homes need to be indicated by passing the dbnames flag to exachk. This is documented in the Exachk Users Guide, which is included with the software. Also note that, as a safety net, an enhancement is being added to SSCTUNER to check for this condition as well.
References
<BUG:19375096> - SSCTUNER SHOULD CHECK THAT DB HOMES ARE LINKED AGAINST RDS, INCLUDING DB ZONES.
<BUG:19341923> - ASM_XDMG_+ASM2 PROCESS HANGING IN MUNMAP
<BUG:19362035> - CSS ABORTS ON SECOND NODE AS ABORTING FROM THREAD GMCLIENTLISTENER
<BUG:17997507> - 11.2.0.4: XDMG PROCESS EXITS WITHOUT CLOSING SKGXP CONTEXT WHEN ORA-15311 IS SEE
<NOTE:1374110.1> - Top 5 issues for Instance Eviction
<NOTE:1676719.1> - Clusterware do not start on ALL nodes after reboot
<NOTE:330358.1> - Oracle Clusterware 10gR2/ 11gR1/ 11gR2/ 12cR1 Diagnostic Collection Guide

Attachments
This solution has no attachment