Asset ID: 1-72-2166436.1
Update Date: 2016-07-28
Solution Type: Problem Resolution Sure Solution
2166436.1: SuperCluster: RAC: CRS not able to rejoin the cluster following node eviction or reboot due to CSSD.
Related Items
- Solaris Operating System
- Oracle SuperCluster T5-8 Full Rack
- SPARC SuperCluster T4-4 Full Rack
- Oracle SuperCluster M7 Hardware
- Oracle Database - Enterprise Edition
- Oracle SuperCluster T5-8 Half Rack
- SPARC SuperCluster T4-4 Half Rack
- Oracle SuperCluster M6-32 Hardware
- SPARC SuperCluster T4-4
Related Categories
- PLA-Support>Eng Systems>Exadata/ODA/SSC>SPARC SuperCluster>DB: SuperCluster_EST
- _Old GCS Categories>ST>Server>Engineered Systems>SPARC SuperCluster>Install
Applies to:
Oracle SuperCluster M6-32 Hardware - Version All Versions and later
Solaris SPARC Operating System - Version 11.1 to 11.2 [Release 11.0]
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
SPARC SuperCluster T4-4 - Version All Versions and later
Oracle SuperCluster M7 Hardware - Version All Versions and later
Oracle Solaris on SPARC (64-bit)
Grid Infrastructure on Oracle SuperCluster. Evictions. CRS Startup. CSSD. GIPCD.
Symptoms
On a RAC node, CRS and CSSD will not restart following a node eviction.
The OCSSD log will have entries similar to:
[ CSSD][28]clssnmvDHBValidateNcopy: node 1, nodename , has a disk HB, but no network HB, DHB has rcfg 329583179, wrtcnt, 50671015, LATS 2293246942, lastSeqNo 50671012, uniqueness 1434089571, timestamp 1446423142/3810007342
The GIPCD log will have entries similar to:
[GIPCDMON][7] gipcdMonitorCssCheck: Failure querying CSS NodeList ret 3
[GIPCDMON][7] gipcdMonitorFailZombieNodes: Forcing zombie failure, node nodename, now 0, last 2267807392,
[GIPCDNDE][6] gipcdNodeDisconnect: Deleting information for remote con host(nodename), id (0000000000000000,0000000000000912)
[GIPCDNDE][6] gipcdNodeDisconnect: Deleting information of all clients on remote endps (0000000000000000,0000000000000912), (0000000000000000, 0000000000000a0d)
[GIPCDCLT][5] gipcdDeleteInterfaces: No interface object exist in the map for haname(1a02-b661-3eaa-f863)
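The log signatures above can be searched for mechanically. The following is a minimal sketch, not an Oracle-supplied tool: the substrings are copied from the excerpts in this note, while the function name `count_signatures` and the dictionary keys are hypothetical names chosen for illustration.

```python
# Substrings copied from the OCSSD and GIPCD log excerpts in this note.
# The dictionary keys are hypothetical labels, not Oracle terminology.
SIGNATURES = {
    "ocssd_disk_hb_no_network_hb": "has a disk HB, but no network HB",
    "gipcd_css_nodelist_failure": "gipcdMonitorCssCheck: Failure querying CSS NodeList",
    "gipcd_zombie_failure": "gipcdMonitorFailZombieNodes: Forcing zombie failure",
}


def count_signatures(lines):
    """Count how many of the given log lines contain each signature substring."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, needle in SIGNATURES.items():
            if needle in line:
                counts[name] += 1
    return counts
```

An open log file can be passed directly, since a file object iterates over its lines. A non-zero count only shows that the message text is present; it does not by itself confirm the bug named in the Cause section.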
The output of the following command will show that cssd has not started:
# crsctl status res -t -init
A pstack of the cssd process, taken before it times out, will look similar to the following (the main stacks being looped through):
------------ lwp# 36 / thread# 36 ---------------
ffffffff7ed473a4 lwp_park (0, 0, 0)
ffffffff7ed40ba4 cond_wait_queue (100398050, 100c7edb0, 0, 0, ffffffff7ee8cfc0, 0) + 4c
ffffffff7ed41198 cond_wait (100398050, 100c7edb0, deadbeef, 1092bc, 1, 1) + 10
ffffffff57967708 sltspcwait (100399f10, 100cd3dd8, 100cd3dc0, 1, 0, 0) + 8
ffffffff59c1e260 clsucvwait (100399f10, 100cd3dd8, 100cd3dc0, 0, 0, ffffffff7ee86000) + 28
000000010009f710 clssgmProcDeadClntq (10191bd10, 100cd3d88, 1, ffffffff7e509a40, 0, fffc00) + 1e8
000000010009f4a8 clssgmDeathChkThread (10191bd10, ff000000, ffffffff7ee8cfc0, 1, 0, fffc00) + 190
00000001000117f8 clssscthrdmain (10191bd10, 0, 0, 100011718, 0, 1) + e0
ffffffff7ed47328 _lwp_start (0, 0, 0, 0, 0, 0)
------------ lwp# 37 / thread# 37 ---------------
ffffffff7ed4b794 portfs (5, 60, 103aea010, 0, 0, 0)
ffffffff59ebec20 sgipcwEpollWaitHelper (0, 1003a0610, 0, 0, 0, 0) + 2b8
ffffffff59eb8e4c sgipcwWait (0, 1000, 0, 1003a0610, 1000, 0) + 3fc
ffffffff59d62f7c gipcWaitOsd (0, 1000, 1003a0610, 0, 0, ffffffff79fe0c4c) + 18c
ffffffff59d4c5a0 gipcInternalWaitEpoll (ffffffff79fe1298, 10191b450, ffffffff79fe1860, 200, ffffffff79fe185c, ffffffff) + 1278
ffffffff59d46420 gipcInternalWait (ffffffff, 10191b450, 100105533, 1000ddb78, bbe, ffffffff) + 1c40
ffffffff59ce60b8 gipcWaitF (ffffffff, 1000ddb78, 100105533, 1000ddb78, bbe, 200) + b58
00000001000150d4 clssscSelect (10191b590, 100cd3a20, 977, 100060d88, 0, fffc00) + 9c
000000010005aa6c clssgmPeerListener (100cd3a20, ff000000, ffffffff7ee8cfc0, 1, 0, fffc00) + 1d34
00000001000117f8 clssscthrdmain (10191b590, 0, 0, 100011718, 0, 1) + e0
ffffffff7ed47328 _lwp_start (0, 0, 0, 0, 0, 0)
------------ lwp# 38 / thread# 38 ---------------
ffffffff7ed473a4 lwp_park (0, ffffffff79dfb540, 0)
ffffffff7ed40ba4 cond_wait_queue (100f73510, 10118bc80, ffffffff79dfb540, 0, 0, 0) + 4c
ffffffff7ed41080 cond_wait_common (100f73510, 10118bc80, ffffffff79dfb540, 2946, 0, 0) + 28c
ffffffff7ed41250 __cond_timedwait (100f73510, 10118bc80, ffffffff79dfb6b8, 0, ffffffff79dfb540, 0) + 60
ffffffff7ed41314 cond_timedwait (100f73510, 10118bc80, ffffffff79dfb6b8, 0, 0, 1d12880) + 14
ffffffff57967814 sltspctimewait (20, 101188998, 101188980, 1, 0, 1) + f4
ffffffff59c1e2ec clsucvtimewait (100399f10, 101188998, 101188980, 3e8, 10000000, 0) + 34
00000001000ab000 clssnmWaitThread (10191a910, 101188010, 2, 3e8, 0, 2) + 2f8
00000001000a7d98 clssnmPollingThread (1000faaa8, 10118cb4c, 2ed, 2ed, 0, fffc00) + 770
00000001000117f8 clssscthrdmain (10191a910, 0, 0, 100011718, 0, 1) + e0
ffffffff7ed47328 _lwp_start (0, 0, 0, 0, 0, 0)
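If several pstack captures of the cssd process are taken, a small parser can reduce each one to the function names per LWP, so that stacks can be compared across captures. This is only a sketch under assumptions: it expects the header and frame layout shown above, and the names `parse_pstack` and `unchanged_threads` are hypothetical helpers, not part of any Oracle or Solaris tool.

```python
import re

# Header such as "------------ lwp# 36 / thread# 36 ---------------".
LWP_HEADER = re.compile(r"^-+\s*lwp#\s*(\d+)\s*/\s*thread#\s*\d+\s*-+$")
# Frame such as "ffffffff7ed473a4 lwp_park (0, 0, 0)": address, name, "(".
FRAME = re.compile(r"^[0-9a-f]{16}\s+(\w+)\s*\(")


def parse_pstack(text):
    """Map each LWP id to its list of function names, innermost frame first."""
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = LWP_HEADER.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(1))
        # Other lines are treated as wrapped argument lists and skipped.
    return threads


def unchanged_threads(sample_a, sample_b):
    """Return the LWP ids whose call stacks are identical in both captures."""
    a, b = parse_pstack(sample_a), parse_pstack(sample_b)
    return sorted(lwp for lwp in a if a[lwp] == b.get(lwp))
```

Comparing function names rather than raw lines ignores argument values, which can differ between captures even when a thread is waiting in the same place.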
Changes
None
Cause
Bug 21327402 - OCSSD spins while waiting for partial send completion <Document 21327402.8>
Solution
Apply the fix for Bug 21327402, which is provided in 12.1.0.2.GIPSU:160419, 11.2.0.4.GIPSU:160419, and later. The fix was first included in the APR 2016 QFSDP.
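The patch-level comparison can be sketched as follows. This is an illustration under stated assumptions: it assumes a patch label has exactly the form used in this note (base release, then ".GIPSU:", then a six-digit stamp) and that the stamp is a YYMMDD date that orders numerically. The function name `includes_fix` is hypothetical, and how the label is obtained from an installed system is outside this sketch.

```python
import re

# Fix level and base releases as stated in this note.
FIX_STAMP = 160419
FIXED_BASES = {"12.1.0.2", "11.2.0.4"}

# Assumed label form: "<a.b.c.d>.GIPSU:<six digits>", as written in this note.
LABEL = re.compile(r"^(\d+(?:\.\d+){3})\.GIPSU:(\d{6})$")


def includes_fix(label):
    """Return True or False for a recognised label, or None if it cannot be judged.

    None is returned when the label is not in the assumed form or names a
    base release that this note does not mention.
    """
    match = LABEL.match(label.strip())
    if not match or match.group(1) not in FIXED_BASES:
        return None
    # Assumption: the stamp is YYMMDD, so a larger number means a later bundle.
    return int(match.group(2)) >= FIX_STAMP
```

Returning None instead of guessing keeps unrecognised labels visible for manual checking against the patch documentation.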
References
<NOTE:1452277.1> - SuperCluster Critical Issues