Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1200103.1
Update Date:2013-12-20
Keywords:

Solution Type  Problem Resolution Sure

Solution  1200103.1 :   Completed addnode in Exadata DB Machine but CRS fails to start after running root.sh  


Related Items
  • Exadata X3-2 Hardware
  • Exadata X3-8 Hardware
  • Exadata Database Machine X2-2 Hardware
  • Oracle Exadata Hardware
  • Oracle Database - Enterprise Edition
  • Exadata Database Machine V2
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST




Created from <SR 3-2057093531>

Applies to:

Oracle Exadata Hardware - Version 11.2.0.1 and later
Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Exadata Database Machine V2 - Version All Versions and later
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Exadata X3-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms


An addnode operation is performed on an Exadata Database Machine.

The last step of addnode requires running root.sh on the added node, but CRS fails to start on that node.

The Clusterware CSSD log (ocssd.log) on the added node shows:


2010-09-02 11:23:40.730: [GIPCGMOD][4054673488]gipcmodGipcPassInitializeNetwork: using host information 192.168.14.2
2010-09-02 11:23:40.730: [ CSSD][4054673488]clssnmOpenGIPCEndp: listening on gipc://dsdwdb02:nm_dsdw#192.168.14.2#16220
2010-09-02 11:23:40.730: [ CSSD][4054673488]clssnmInitNMInfo: Initializing uniqueness 0
2010-09-02 11:23:40.730: [ CSSD][4054673488]clssnmReadDiscoveryProfile: voting file discovery string(o/*/*)
2010-09-02 11:23:40.730: [ CSSD][4054673488]clssnkInit: NK generic layer initializing.
2010-09-02 11:23:40.731: [ CSSD][4054673488]clssscGetParameterOLR: OLR fetch for parameter GIPC NM trclvl (12) failed with rc 21
2010-09-02 11:23:41.042: [ SKGFD][1138522432]ERROR: -8(OS Error 1 (if_not_found,skgxpvaddr9,requested interface 192.168.14.1 not found. Check output from ifconfig command,Error 0)
)
2010-09-02 11:23:41.042: [ SKGFD][1138522432]ERROR: -10(OSS Operation oss_initialize failed with error 4 [Network initialization failed]
)
2010-09-02 11:23:41.042: [ CSSD][1138522432]clsssnmvDDiscThread: Unable to create clsf context
2010-09-02 11:23:41.042: [ CSSD][1138522432]###################################
2010-09-02 11:23:41.042: [ CSSD][1138522432]clssscExit: CSSD aborting from thread clssnmvDDiscThread
2010-09-02 11:23:41.042: [ CSSD][1138522432]###################################
2010-09-02 11:23:41.042: [ CSSD][1138522432]

Changes

On Exadata, addnode has an additional task: ensure that the relevant cell parameters are copied across to the node being added.

The files in /etc/oracle/cell/network-config must be correct for that node.

Cause

An extract from ocssd.log on the added node shows an interface lookup that fails:


2010-09-02 11:23:41.042: [ SKGFD][1138522432]ERROR: -8(OS Error 1 (if_not_found,skgxpvaddr9,requested interface 192.168.14.1 not found. Check output from ifconfig command,Error 0))
2010-09-02 11:23:41.042: [ SKGFD][1138522432]ERROR: -10(OSS Operation oss_initialize failed with error 4 [Network initialization failed]
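In a long ocssd.log it can help to pull out just the address that CSSD asked for. The following is a minimal sketch, not an Oracle-supplied tool: the function name requested_interfaces is hypothetical, and it assumes the error text has the same "requested interface ... not found" wording as the extract above.

```shell
#!/bin/sh
# Print each distinct address that CSSD requested but could not find.
# Reads ocssd.log text on stdin; relies on the message wording shown
# in the extract above.
requested_interfaces() {
    sed -n 's/.*requested interface \([0-9][0-9.]*\) not found.*/\1/p' | sort -u
}

# Example (OCSSD_LOG is a placeholder for the ocssd.log path on the added node):
# requested_interfaces < "$OCSSD_LOG"
```

For the log shown above this would print 192.168.14.1, which is the address to compare against ifconfig.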


GPNP does pass the correct interface; however, the address in the local cellinit.ora file does not match the address that ifconfig reports.


Confirm that the entry in '/etc/oracle/cell/network-config/cellinit.ora'
matches the bond0 address on the DB machine for the added node.

E.g.


% cat /etc/oracle/cell/network-config/cellinit.ora
192.168.14.1/22

% ifconfig bond0
bond0 Link encap:InfiniBand HWaddr 80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00
inet addr:192.168.14.2 Bcast:192.168.15.255 Mask:255.255.252.0
inet6 addr: fe80::221:2800:13e:8d9b/64 Scope:Link
UP BROADCAST RUNNING MASTER MULTICAST MTU:65520 Metric:1
RX packets:42772 errors:0 dropped:0 overruns:0 frame:0
TX packets:4757 errors:0 dropped:33 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:3501294 (3.3 MiB) TX bytes:1020611 (996.6 KiB)


Here 192.168.14.1/22 in cellinit.ora does not match the bond0 inet addr value, 192.168.14.2.
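The same comparison can be scripted. This is a minimal sketch under stated assumptions: the helper names (cellinit_addr, ifconfig_addr, check_cellinit) are hypothetical, cellinit.ora is assumed to contain one IPv4 address with a /prefix as in the example, and ifconfig is assumed to print the "inet addr:" format shown above.

```shell
#!/bin/sh
# First IPv4 address (without the /prefix) in cellinit.ora-style text on stdin.
cellinit_addr() {
    grep -Eo '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# IPv4 address from "ifconfig <iface>" output on stdin, assuming the
# "inet addr:A.B.C.D" format; the "inet6 addr:" line does not match.
ifconfig_addr() {
    sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p' | head -n 1
}

# Report whether the two addresses agree; returns non-zero on a mismatch.
check_cellinit() {
    cell=$1
    iface=$2
    if [ -n "$cell" ] && [ "$cell" = "$iface" ]; then
        echo "OK: cellinit.ora matches bond0 ($iface)"
    else
        echo "MISMATCH: cellinit.ora has '$cell', bond0 has '$iface'"
        return 1
    fi
}

# Example on the added node:
# check_cellinit "$(cellinit_addr < /etc/oracle/cell/network-config/cellinit.ora)" \
#                "$(ifconfig bond0 | ifconfig_addr)"
```

With the values in this note, the check would report a mismatch between 192.168.14.1 and 192.168.14.2.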

Solution

1). Shut down CRS on the problem node (Node 2)

% crsctl stop crs -f

2). Modify /etc/oracle/cell/network-config/cellinit.ora

Ensure the address matches bond0 for the node. It should read:

192.168.14.2/22

3). Start CRS

% crsctl start crs

4). Confirm with 'crsctl stat res' and the CRS alert logs that the reconfiguration completed successfully.
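The edit in step 2 can also be done non-interactively. The sketch below is illustrative only: set_cellinit_addr is a hypothetical helper, it assumes the address in cellinit.ora is followed by a /prefix as in the example, and it keeps a .bak copy next to the file before changing it.

```shell
#!/bin/sh
# Replace OLD with NEW in FILE, keeping FILE.bak as a backup.
# Only an address followed by "/" is changed, so the /22 suffix is kept.
set_cellinit_addr() {
    file=$1
    old=$2
    new=$3
    cp -p "$file" "$file.bak" || return 1
    # Escape the dots so they match literally in the sed pattern.
    old_re=$(printf '%s' "$old" | sed 's/\./\\./g')
    sed "s|$old_re/|$new/|" "$file.bak" > "$file"
}

# As root on the added node, between steps 1 and 3:
# crsctl stop crs -f
# set_cellinit_addr /etc/oracle/cell/network-config/cellinit.ora 192.168.14.1 192.168.14.2
# crsctl start crs
```

Checking the file with cat afterwards, as in the Cause section, confirms the new value before CRS is restarted.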


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.