
Asset ID: 1-79-1665754.1
Update Date:2018-01-08
Keywords:

Solution Type: Predictive Self-Healing

Solution 1665754.1: ODA (Oracle Database Appliance): GI Patching Troubleshooting


Related Items
  • Oracle Database Appliance
  • Oracle Database Appliance Software
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Database Appliance>DB: ODA_EST




In this Document
Purpose
Scope
Details
 GI Patching Troubleshooting
 Acronyms, Terms and Procedures Used in This Note
 Relevant Log and Trace Locations
 Location of needed OAK GI patching logs/traces
 Location of needed runInstaller (OUI) logs/traces
 Location of needed GI logs/traces
 Case Studies
 1. GI update is failing due to invalid response file
 2. Out of place upgrade failing
 3. During GI update the rootupgrade.sh did not complete successfully (because the OC4J resource failed to start)
 4. During GI update the rootupgrade.sh did not complete successfully (because the OC4J resource failed to stop)
 5. During GI update the rootupgrade.sh did not complete successfully (because ASM was not able to start up successfully)
 6. Successful GI upgrade but ASM is crashing with ORA-600 [kfdJoin3]
 7. GI upgrade failure after a previous failure
References


Applies to:

Oracle Database Appliance - Version All Versions to All Versions [Release All Releases]
Oracle Database Appliance Software - Version 2.2.0.0 to 2.10.0.0
Information in this document applies to any platform.
***Checked for relevance on 03-AUG-2016***

Purpose

There are cases in which the GI patching fails and the customer needs to reapply it.
The intent of this article is to explain how to debug where the GI patching failed, where the relevant logs are located, and what the inventory.xml should look like.

Scope

Basic Grid Infrastructure & oakcli knowledge is required.

Details

GI Patching Troubleshooting

Acronyms, Terms and Procedures Used in This Note

Refer to Note 1374275.1 for the abbreviations, acronyms, terms and procedures used in this note.

Relevant Log and Trace Locations

Location of needed OAK GI patching logs/traces

The relevant OAK GI patching phase logs/traces are the following:

- Output of the following commands (from both nodes):

oakcli show version -detail

and

oakcli show disk

 - Files from /opt/oracle/oak/log/<nodename>/patch/<patch_version>/*

ie:
/opt/oracle/oak/log/odanode1/patch/2.9.0.0.0/*

- Files from /opt/oracle/oak/pkgrepos/System/<patch_version>/bin/tmp/*

ie:
/opt/oracle/oak/pkgrepos/System/2.9.0.0.0/bin/tmp/*

- Files from /opt/oracle/oak/onecmd/tmp with a timestamp matching the patch date:

ls -lt /opt/oracle/oak/onecmd/tmp | grep '<timestamp path date>'

ie:
ls -lt /opt/oracle/oak/onecmd/tmp | grep 'Apr 24'
Location of needed runInstaller (OUI) logs/traces

- Collect the following file: /u01/app/oraInventory/ContentsXML/inventory.xml

- From both nodes, collect the output of the command:

opatch lsinventory -detail
Location of needed GI logs/traces

- Files from /u01/app/<GI version>/grid/cfgtoollogs/* from both old and new GI homes:

ie:
/u01/app/11.2.0.3/grid/cfgtoollogs/*
/u01/app/11.2.0.4/grid/cfgtoollogs/*

- Files  from /u01/app/<GI version>/grid/log/<nodename>/* from both old and new GI homes:

ie:
/u01/app/11.2.0.3/grid/log/odanode1/*
/u01/app/11.2.0.4/grid/log/odanode1/*
Note:
starting with GI 12.1 the logs are under
    /u01/app/grid/crsdata/<nodename>/*
    /u01/app/grid/diag/crs/<nodename>/crs/trace/*

  

- Files from /u01/app/<GI version>/grid/install/* from the new GI home:

ie:
/u01/app/11.2.0.4/grid/install/*

- Files from /u01/app/<GI version>/grid/inventory/* from both old and new GI homes:

ie:
/u01/app/11.2.0.3/grid/inventory/*
/u01/app/11.2.0.4/grid/inventory/*

  

- Output of the following command:

cat /etc/oracle/olr.loc

- The ASM alert.log from both nodes:

ie:
(node1) /u01/app/grid/diag/asm/+asm/+ASM1/trace/
(node2) /u01/app/grid/diag/asm/+asm/+ASM2/trace/

 

Note: to help you collect the above logs/traces with one command, you can run the attached bash script GIupdiag.sh on both nodes:

# ./GIupdiag.sh -h
Usage:
GIupDiag.sh
GIupDiag.sh <Patching Date format YYYYMMDD>

i.e.:
    GIupDiag.sh 20120928                # it will collect log/trace above that day

Note:
    default <Patching Date> is 20110101 (a big dump is expected)

GIupdiag.sh will create a compressed file under /tmp/GIupDiag_<hostname>_<timestamp>.tar.gz
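
If the attached script cannot be used, a minimal manual collection along the same lines is sketched below. Run it as root on each node; the patch version, GI home path, node name detection and date filter are assumptions to adapt to your environment.

#!/bin/bash
# Minimal manual collection sketch (not the attached GIupdiag.sh).
# PATCHVER, GI_HOME and the 'Apr 24' date filter are assumptions; adjust them.
PATCHVER=2.9.0.0.0
GI_HOME=/u01/app/11.2.0.3/grid
NODE=$(hostname -s)
OUT=/tmp/GIupDiag_manual_${NODE}_$(date +%Y%m%d%H%M%S)
mkdir -p $OUT

oakcli show version -detail > $OUT/oakcli_show_version.txt 2>&1
oakcli show disk            > $OUT/oakcli_show_disk.txt 2>&1
cat /etc/oracle/olr.loc     > $OUT/olr.loc.txt 2>&1
su - grid -c "$GI_HOME/OPatch/opatch lsinventory -detail -oh $GI_HOME" > $OUT/opatch_lsinventory.txt 2>&1

cp -r /opt/oracle/oak/log/$NODE/patch/$PATCHVER         $OUT/oak_patch_log 2>/dev/null
cp -r /opt/oracle/oak/pkgrepos/System/$PATCHVER/bin/tmp $OUT/pkgrepos_tmp 2>/dev/null
cp /u01/app/oraInventory/ContentsXML/inventory.xml      $OUT/ 2>/dev/null
ls -lt /opt/oracle/oak/onecmd/tmp | grep 'Apr 24'       > $OUT/onecmd_tmp.lst

tar czf ${OUT}.tar.gz -C /tmp $(basename $OUT)
echo "Collected: ${OUT}.tar.gz"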

  

Case Studies

1. GI update is failing due to invalid response file

During the GI update you are getting an error message like the following:

(...)
INFO   : Running on the local node: /bin/su grid -c /tmp/opfile
INFO   : Background process 28421 (node: zaoda101) gets done with the exit code 0
INFO   : This is root, will become grid and run: /bin/su grid -c /usr/bin/ssh -l grid zaoda102 /tmp/opfile
INFO   : Background process 28490 (node: zaoda102) gets done with the exit code 0
Inventory load failed... OPatch cannot load inventory for the given Oracle Home.
Possible causes are:
   Oracle Home dir. path does not exist in Central Inventory
   Oracle Home is a symbolic link
   Oracle Home inventory is corrupted
LsInventorySession failed: OracleHomeInventory gets null oracleHomeInfo
Inventory load failed... OPatch cannot load inventory for the given Oracle Home.
Possible causes are:
   Oracle Home dir. path does not exist in Central Inventory
   Oracle Home is a symbolic link
   Oracle Home inventory is corrupted
LsInventorySession failed: OracleHomeInventory gets null oracleHomeInfo
Inventory load failed... OPatch cannot load inventory for the given Oracle Home.
Possible causes are:
   Oracle Home dir. path does not exist in Central Inventory
   Oracle Home is a symbolic link
   Oracle Home inventory is corrupted
LsInventorySession failed: OracleHomeInventory gets null oracleHomeInfo
ERROR: 2012-06-12 08:10:24: Failed to upgrade the GI to 11.2.0.3.2(13696216,13696251)
...

and in /u01/app/<GI version>/grid/cfgtoollogs/ you see:

INFO: Createing properties map - in ExtendedPropertyFileFormat.loadPropertiesMap()
Jun 12, 2012 8:08:34 AM oracle.install.commons.util.exception.DefaultErrorAdvisor$AbstractErrorAdvisor getDetailedMessage
SEVERE: [FATAL] [INS-10105] The given response file /opt/oracle/oak/pkgrepos/System/2.2.0.0.0/bin/tmp/grid.rsp is not valid.
   CAUSE: Syntactically incorrect response file. Either unexpected variables are specified or expected variables are not specified in the response file.
   ACTION: Refer the latest product specific response file template
   SUMMARY:
       - cvc-datatype-valid.1.2.1: '1521,1522' is not a valid value for 'integer'.
cvc-type.3.1.3: The value '1521,1522' of element 'oracle.install.crs.config.gpnp.scanPort' is not valid.

oracle.install.commons.base.driver.common.InstallerException: [INS-10105] The given response file /opt/oracle/oak/pkgrepos/System/2.2.0.0.0/bin/tmp/grid.rsp is not valid.
        at oracle.install.commons.base.driver.common.Installer.validateResponseFile(Installer.java:375)
        at oracle.install.commons.base.driver.common.Installer.run(Installer.java:327)
        at oracle.install.ivw.common.util.OracleInstaller.run(OracleInstaller.java:106)
        at oracle.install.commons.util.Application.startup(Application.java:891)
        at oracle.install.commons.flow.FlowApplication.startup(FlowApplication.java:165)
        at oracle.install.commons.flow.FlowApplication.startup(FlowApplication.java:182)
        at oracle.install.commons.base.driver.common.Installer.startup(Installer.java:348)
        at oracle.install.ivw.crs.driver.CRSConfigWizard.startup(CRSConfigWizard.java:84)
        at oracle.install.ivw.crs.driver.CRSConfigWizard.main(CRSConfigWizard.java:91)
Caused by: java.lang.Exception: cvc-datatype-valid.1.2.1: '1521,1522' is not a valid value for 'integer'.
cvc-type.3.1.3: The value '1521,1522' of element 'oracle.install.crs.config.gpnp.scanPort' is not valid.

        at oracle.install.commons.util.XmlSupport.validate(XmlSupport.java:110)
        at oracle.install.commons.bean.xml.XmlBeanStoreFormat.validate(XmlBeanStoreFormat.java:201)
        at oracle.install.commons.bean.xml.PropertyFileFormat.validate(PropertyFileFormat.java:144)
        at oracle.install.commons.base.driver.common.Installer.validateResponseFile(Installer.java:373)
        ... 8 more
Jun 12, 2012 8:08:34 AM oracle.install.commons.util.exception.DefaultErrorAdvisor$AbstractErrorAdvisor advise
INFO: Advice is ABORT
Jun 12, 2012 8:08:34 AM oracle.install.commons.util.exception.DefaultExceptionHandler handleException
SEVERE: Unconditional Exit
Jun 12, 2012 8:08:34 AM oracle.install.commons.util.ExitStatusSet add
INFO: Adding ExitStatus FAILURE to the exit status set

In this case you should follow the note ODA (Oracle Database Appliance): GI update is failing with oraInventory corruption (Doc ID 1466664.1)
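
Before re-running the GI update, you can also check whether the generated response file still carries the offending two-port scanPort value reported above (the 2.2.0.0.0 path below comes from this example; adjust it to your patch version):

# Check the scanPort value in the generated GI response file
grep -i scanPort /opt/oracle/oak/pkgrepos/System/2.2.0.0.0/bin/tmp/grid.rsp
# A single integer such as 1521 is expected; a value like '1521,1522'
# reproduces the INS-10105 validation failure shown above.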

2. Out of place upgrade failing

The "Out of place" upgrade has the following procedure:

Step  1)    Create a new GI home.
Step  2)    Run clone.pl from the new GI home.
Step  3)    Stop the CRS stack.
Step  4)    Run gi_home/crs/config/config.sh from the new GI home.
Step  5)    Run rootupgrade.sh from the new GI home.
Step  6)    Update the inventory using:
            runInstaller -updateNodeList ORACLE_HOME=new_gi_home CRS=\"false\" -local
            runInstaller -updateNodeList ORACLE_HOME=new_gi_home CRS=\"true\"  -local
Step  7)    Detach the old gi_home: detachHome.sh -silent -local

In some cases, if something goes wrong during step 2, 4 or 5, the inventory has already been updated with the new GI home and the old home has been detached.
- Step 2 can fail due to a previous failed install: the directory pre-existed.
- Step 4 can fail due to two SCAN listener ports, i.e. an invalid grid.rsp (see above).
- Step 5 can fail because the OCR is inaccessible.
In this condition the old GI is still running but the inventory is pointing to the new GI home.


You should verify:

- the active GI version, by issuing the following command (from both nodes):

crsctl query crs activeversion

- the GI software version, by issuing the following command (from both nodes):

crsctl query crs softwareversion

- that the inventory entry is pointing to the right GI home (on both nodes):

see /u01/app/oraInventory/ContentsXML/inventory.xml
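
A quick way to run all three checks on one node is sketched below (repeat it on the second node; the GI home path is only an example, use the home reported in /etc/oracle/olr.loc):

#!/bin/bash
# Verification sketch: compare the active and software CRS versions and
# check which home the central inventory flags as the clusterware home.
# GI_HOME below is an assumption for illustration.
GI_HOME=/u01/app/11.2.0.3/grid

$GI_HOME/bin/crsctl query crs activeversion
$GI_HOME/bin/crsctl query crs softwareversion
grep 'CRS="true"' /u01/app/oraInventory/ContentsXML/inventory.xml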

Example: the active CRS is 11.2.0.3.6 but the inventory is pointing to the new GI home, which is 11.2.0.4.0:

# crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.3.6]


# cat  /u01/app/oraInventory/ContentsXML/inventory.xml
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2011, Oracle. All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>11.2.0.3.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="OraGrid11gR4" LOC="/u01/app/11.2.0.4/grid" TYPE="O" IDX="1" CRS="true">
   <NODE_LIST>
      <NODE NAME="rwsoda309c1n1"/>
      <NODE NAME="rwsoda309c1n2"/>
   </NODE_LIST>
</HOME>
<HOME NAME="OraDb11204_home1" LOC="/u01/app/oracle/product/11.2.0.4/dbhome_1" TYPE="O" IDX="2">
   <NODE_LIST>
      <NODE NAME="rwsoda309c1n1"/>
      <NODE NAME="rwsoda309c1n2"/>
   </NODE_LIST>
</HOME>
<HOME NAME="OraDb11203_home1" LOC="/u01/app/oracle/product/11.2.0.3/dbhome_1" TYPE="O" IDX="3">
   <NODE_LIST>
      <NODE NAME="rwsoda309c1n1"/>
      <NODE NAME="rwsoda309c1n2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
<COMPOSITEHOME_LIST>
</COMPOSITEHOME_LIST>
</INVENTORY>

Note also that in the inventory the OLD home can be marked as "REMOVED":

<HOME NAME="OraGrid11gR3" LOC="/u01/app/11.2.0.3/grid" TYPE="O" IDX="1" REMOVED="T"/>

If step 5 (rootupgrade.sh) fails because the OCR is inaccessible:

crsctl query crs activeversion failed with PROC-26 Error while accessing the physical storage
ORA-29701: unable to connect to Cluster Synchronization Service

You could try a reboot of both nodes. If this does not help, investigate whether any disk shows mode_status MISSING, header_status UNKNOWN, or mount_status OFFLINE/CLOSED. In that case collect the output of the following commands:

oakcli show disk

SQL query against the ASM instance, executed as the grid user connected as SYSASM:

set pages 40000
set lines 300
col PATH for a40
SELECT GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE,OS_MB,TOTAL_MB,FREE_MB,NAME,FAILGROUP,PATH FROM V$ASM_DISK order by path;
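
A minimal way to run this query is sketched below (it assumes the grid OS user owns the GI home and the local ASM instance on node 1 is +ASM1; adjust ORACLE_HOME and ORACLE_SID to your environment):

# Run as root; become grid and query V$ASM_DISK as SYSASM.
# ORACLE_HOME and ORACLE_SID below are assumptions for illustration.
su - grid -c '
export ORACLE_HOME=/u01/app/11.2.0.3/grid
export ORACLE_SID=+ASM1
$ORACLE_HOME/bin/sqlplus -s / as sysasm <<EOF
set pages 40000
set lines 300
col PATH for a40
SELECT GROUP_NUMBER,DISK_NUMBER,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS,STATE,
       OS_MB,TOTAL_MB,FREE_MB,NAME,FAILGROUP,PATH FROM V\$ASM_DISK ORDER BY PATH;
EOF'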

Moreover, you should provide the logs/traces described in the "Location of needed OAK GI patching logs/traces" section.

3. During GI update the rootupgrade.sh did not complete successfully (because the OC4J resource failed to start)

During the GI update process you are getting an error message like:

(...)

INFO: 2014-08-14 06:31:22: Running root scripts
ERROR  : Ran '/usr/bin/ssh -l root oda2
/u01/app/12.1.0.2/grid/rootupgrade.sh' and it returned code(25) and output
is:
         Check
/u01/app/12.1.0.2/grid/install/root_oda2_2014-08-14_06-43-45.log for the
output of root script

error at<Command = /usr/bin/ssh -l root oda2
/u01/app/12.1.0.2/grid/rootupgrade.sh> and errnum=<25>
ERROR  : Command = /usr/bin/ssh -l root oda2
/u01/app/12.1.0.2/grid/rootupgrade.sh did not complete successfully. Exit
code 25 #Step -1#
Exiting...

..........done

INFO: GI patching summary on node: zaoda1

INFO: GI patching summary on node: zaoda2

INFO: Running post-install scripts
..........done

INFO: Started Oakd
INFO: Setting up the SSH
..........done

and checking the above log, in this example "/u01/app/12.1.0.2/grid/install/root_oda2_2014-08-14_06-43-45.log", you observe the following:

(...)
Started to upgrade the Oracle Clusterware. This operation may take a few
minutes.
Started to upgrade the OCR.
Started to upgrade the CSS.
The CSS was successfully upgraded.
Started to upgrade Oracle ASM.
Started to upgrade the CRS.
The CRS was successfully upgraded.
Successfully upgraded the Oracle Clusterware.
Oracle Clusterware operating version was successfully set to 12.1.0.2.0
2014/08/14 06:52:35 CLSRSC-479: Successfully set Oracle Clusterware active
version
^[[0m
2014/08/14 06:52:38 CLSRSC-476: Finishing upgrade of resource types
^[[0m
2014/08/14 06:53:05 CLSRSC-482: Running command: 'upgrade model  -s
11.2.0.4.0 -d 12.1.0.2.0 -p last'
^[[0m
2014/08/14 06:53:05 CLSRSC-477: Successfully completed upgrade of resource
types
^[[0m
2014/08/14 07:03:19 CLSRSC-1003: Failed to start resource OC4J
^[[0m
2014/08/14 07:03:20 CLSRSC-1007: Failed to start OC4J resource
^[[0m
Died at /u01/app/12.1.0.2/grid/crs/install/crsupgrade.pm line 4214.
The command '/u01/app/12.1.0.2/grid/perl/bin/perl
-I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install


At this point the GI may in fact have been upgraded anyway:

# /u01/app/12.1.0.2/grid/bin/crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [12.1.0.2.0]
# /u01/app/12.1.0.2/grid/bin/crsctl query crs softwareversion
Oracle Clusterware version on node [zaoda1] is [12.1.0.2.0]

and also "oakcli show version -detail" is showing the right entry:

# oakcli show version -detail |grep GI
              GI_HOME                   12.1.0.2.0                Up-to-date    

In this case you should modify the inventory '/u01/app/oraInventory/ContentsXML/inventory.xml':

# cat /u01/app/oraInventory/ContentsXML/inventory.xml
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2014, Oracle and/or its affiliates.
All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>12.1.0.2.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="OraGrid11gR4" LOC="/u01/app/11.2.0.4/grid" TYPE="O" IDX="1" CRS="true">
   <NODE_LIST>
      <NODE NAME="zaoda1"/>
      <NODE NAME="zaoda2"/>
   </NODE_LIST>
</HOME>
<HOME NAME="OraDb11204_home1" LOC="/u01/app/oracle/product/11.2.0.4/dbhome_1"
TYPE="O" IDX="2">
   <NODE_LIST>
      <NODE NAME="zaoda1"/>
      <NODE NAME="zaoda2"/>
   </NODE_LIST>
</HOME>
<HOME NAME="OraGrid12102" LOC="/u01/app/12.1.0.2/grid" TYPE="O" IDX="3">
   <NODE_LIST>
      <NODE NAME="zaoda1"/>
      <NODE NAME="zaoda2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
<COMPOSITEHOME_LIST>
</COMPOSITEHOME_LIST>
</INVENTORY>

because the active CRS home should be the new one, in this case "OraGrid12102", and the old one should be marked as removed. The correct inventory should look as follows:

# cat /u01/app/oraInventory/ContentsXML/inventory.xml
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2014, Oracle and/or its affiliates.
All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>12.1.0.2.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="OraGrid11gR4" LOC="/u01/app/11.2.0.4/grid" TYPE="O" IDX="1" REMOVED="T">
   <NODE_LIST>
      <NODE NAME="zaoda1"/>
      <NODE NAME="zaoda2"/>
   </NODE_LIST>
</HOME>
<HOME NAME="OraDb11204_home1" LOC="/u01/app/oracle/product/11.2.0.4/dbhome_1"
TYPE="O" IDX="2">
   <NODE_LIST>
      <NODE NAME="zaoda1"/>
      <NODE NAME="zaoda2"/>
   </NODE_LIST>
</HOME>
<HOME NAME="OraGrid12102" LOC="/u01/app/12.1.0.2/grid" TYPE="O" IDX="3" CRS="true">
   <NODE_LIST>
      <NODE NAME="zaoda1"/>
      <NODE NAME="zaoda2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
<COMPOSITEHOME_LIST>
</COMPOSITEHOME_LIST>
</INVENTORY>   
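
Instead of editing the file by hand, the same flag change can normally be made with the OUI tools (see step 6 of the out-of-place procedure above and Note 1053393.1); a sketch, run as the grid user on each node, with the home paths taken from this example:

# Mark the new 12.1.0.2 home as the clusterware home in the inventory
/u01/app/12.1.0.2/grid/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/12.1.0.2/grid CRS=true -local

# Detach the old 11.2.0.4 home so it is flagged as removed
/u01/app/11.2.0.4/grid/oui/bin/runInstaller -detachHome -silent -local ORACLE_HOME=/u01/app/11.2.0.4/grid

Then verify the result with: grep 'CRS="true"' /u01/app/oraInventory/ContentsXML/inventory.xml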
4. During GI update the rootupgrade.sh did not complete successfully (because the OC4J resource failed to stop)

You were performing the GI upgrade from version 11.2.0.3.6 (ODA v. 2.6) to 12.1.0.2 and you get the following error messages (/opt/oracle/oak/log/<hostname>/patch/12.1.2.0.0/gidbupdate_xxxx.log):

(...)
2014-10-28 09:55:14: Running config.sh
2014-10-28 09:55:14: INFO : Building up the config.sh response file...
2014-10-28 09:55:15: INFO : This is root, will become grid and run: /bin/su grid -c /usr/bin/ssh -l grid srv-odabenn1 /opt/oracle/oak/onecmd/tmp/gridconfig.sh
2014-10-28 09:55:15: INFO : Running on the local node: /bin/su grid -c /opt/oracle/oak/onecmd/tmp/gridconfig.sh
2014-10-28 09:56:25: INFO:  Running root scripts
2014-10-28 09:56:25: INFO : Running /u01/app/12.1.0.2/grid/rootupgrade.sh on srv-odabenn1
2014-10-28 09:56:26: INFO : Running as root: /usr/bin/ssh -l root srv-odabenn1 /u01/app/12.1.0.2/grid/rootupgrade.sh
2014-10-28 10:11:21: ERROR : Ran '/usr/bin/ssh -l root srv-odabenn1 /u01/app/12.1.0.2/grid/rootupgrade.sh' and it returned code(25) and output is:
         Check /u01/app/12.1.0.2/grid/install/root_srv-odabenn1_2014-10-28_09-56-26.log for the output of root script

2014-10-28 10:11:21: ERROR : Command = /usr/bin/ssh -l root srv-odabenn1 /u01/app/12.1.0.2/grid/rootupgrade.sh did not complete successfully. Exit code 25 #Step -1#

and checking the above log, in this example "/u01/app/12.1.0.2/grid/install/root_srv-odabenn1_2014-10-28_09-56-26.log", you observe the following:

(...)
clscfg: EXISTING configuration version 5 detected.
clscfg: version 5 is 11g Release 2.
Successfully taken the backup of node specific configuration in OCR.
Successfully accumulated necessary OCR keys.
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
2014/10/28 10:11:17 CLSRSC-1009: failed to stop resource OC4J
[0m
2014/10/28 10:11:18 CLSRSC-1006: Failed to create the wallet APPQOSADMIN or associated users during upgrade.
[0m
Died at /u01/app/12.1.0.2/grid/crs/install/crsupgrade.pm line 4094.
The command '/u01/app/12.1.0.2/grid/perl/bin/perl -I/u01/app/12.1.0.2/grid/perl/lib -I/u01/app/12.1.0.2/grid/crs/install /u01/app/12.1.0.2/grid/crs/install/rootcrs.pl  -upgrade' execution failed

At this point, on node 0:

$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.3.0]

$  crsctl query crs softwareversion
Oracle Clusterware version on node [srv-odabenn1] is [12.1.0.2.0]

and on node 1:

$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [11.2.0.3.0]

$  crsctl query crs softwareversion
Oracle Clusterware version on node [srv-odabenn1] is [11.2.0.3.0] 

Then you need to manually complete the GI upgrade:

1. Stop the running databases (you may need to use sqlplus rather than srvctl; a minimal sketch is shown after this procedure).

2. Complete the GI upgrade on node 0:

as root
# cd <12.1.0.2 GRID_HOME>
# ./rootupgrade.sh

3. Check the CRS active version on node 0:

$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [12.1.0.2.0]

$  crsctl query crs softwareversion
Oracle Clusterware version on node [srv-odabenn1] is [12.1.0.2.0]

4. Perform the GI upgrade on node 1:

as root
# cd <12.1.0.2 GRID_HOME>
# ./rootupgrade.sh

5. Check the CRS active version on node 1:

$ crsctl query crs activeversion
Oracle Clusterware active version on the cluster is [12.1.0.2.0]

$  crsctl query crs softwareversion
Oracle Clusterware version on node [srv-odabenn1] is [12.1.0.2.0]

6. Change the inventory.xml on both nodes so that the 12.1.0.2 Grid home is the one marked CRS="true":

# cat /u01/app/oraInventory/ContentsXML/inventory.xml
<?xml version="1.0" standalone="yes" ?>
<!-- Copyright (c) 1999, 2014, Oracle and/or its affiliates.
All rights reserved. -->
<!-- Do not modify the contents of this file by hand. -->
<INVENTORY>
<VERSION_INFO>
   <SAVED_WITH>12.1.0.2.0</SAVED_WITH>
   <MINIMUM_VER>2.1.0.6.0</MINIMUM_VER>
</VERSION_INFO>
<HOME_LIST>
<HOME NAME="OraGrid11gR3" LOC="/u01/app/11.2.0.3/grid" TYPE="O" IDX="1" REMOVED="T">
   <NODE_LIST>
      <NODE NAME="zaoda1"/>
      <NODE NAME="zaoda2"/>
   </NODE_LIST>
</HOME>
<HOME NAME="OraDb11204_home1" LOC="/u01/app/oracle/product/11.2.0.4/dbhome_1"
TYPE="O" IDX="2">
   <NODE_LIST>
      <NODE NAME="zaoda1"/>
      <NODE NAME="zaoda2"/>
   </NODE_LIST>
</HOME>
<HOME NAME="OraGrid12102" LOC="/u01/app/12.1.0.2/grid" TYPE="O" IDX="3" CRS="true">
   <NODE_LIST>
      <NODE NAME="zaoda1"/>
      <NODE NAME="zaoda2"/>
   </NODE_LIST>
</HOME>
</HOME_LIST>
<COMPOSITEHOME_LIST>
</COMPOSITEHOME_LIST>
</INVENTORY> 

7. Check "/opt/oracle/oak/install/oakdrun" on both nodes; it should contain the following entry:

# cat /opt/oracle/oak/install/oakdrun
start
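
For step 1, if srvctl cannot be used to stop an instance, a minimal sqlplus alternative is sketched below. The database home is taken from the inventory example above, while the ORACLE_SID is a placeholder; repeat for each running instance on the node.

# Hypothetical example for step 1: stop one database instance with sqlplus.
# ORACLE_SID below is a placeholder; adjust it and the home path to your environment.
su - oracle -c '
export ORACLE_HOME=/u01/app/oracle/product/11.2.0.4/dbhome_1
export ORACLE_SID=orcl1
$ORACLE_HOME/bin/sqlplus -s / as sysdba <<EOF
shutdown immediate
EOF'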

 
 

5. During GI update the rootupgrade.sh did not complete successfully (because ASM was not able to start up successfully)

During the GI update process you are getting an error message like:

(...)
SUCCESS: All nodes in /opt/oracle/oak/onecmd/tmp/db_nodes are pingable and
alive.
INFO: 2014-07-24 01:39:40: Installing GI clone
INFO: 2014-07-24 01:51:49: Running root scripts
ERROR  : Ran '/usr/bin/ssh -l root jupiter1
/u01/app/11.2.0.4/grid/rootupgrade.sh' and it returned code(25) and output
is:
         Check
/u01/app/11.2.0.4/grid/install/root_oda1_2014-07-24_01-51-50.log for the
output of root script

error at<Command = /usr/bin/ssh -l root oda1
/u01/app/11.2.0.4/grid/rootupgrade.sh> and errnum=<25>
ERROR  : Command = /usr/bin/ssh -l root oda1
/u01/app/11.2.0.4/grid/rootupgrade.sh did not complete successfully. Exit
code 25 #Step -1#
Exiting...

............done 

and checking the related log, in this example "/u01/app/11.2.0.4/grid/install/root_oda1_2014-07-24_01-51-50.log", you observe the following:

(...)

ASM upgrade has initialized on first node.


OLR initialization - successful
Replacing Clusterware entries in inittab
Start of resource "ora.asm" failed
CRS-2672: Attempting to start 'ora.drivers.acfs' on 'oda1'
CRS-2676: Start of 'ora.drivers.acfs' on 'oda1' succeeded
CRS-2672: Attempting to start 'ora.asm' on 'oda1'
CRS-5017: The resource action "ora.asm start" encountered the following
error:
ORA-03113: end-of-file on communication channel
Process ID: 14459
Session ID: 143 Serial number: 1
. For details refer to "(:CLSN00107:)" in
"/u01/app/11.2.0.4/grid/log/oda1/agent/ohasd/oraagent_grid/oraagent_grid.log".
CRS-2674: Start of 'ora.asm' on 'oda1' failed
CRS-2679: Attempting to clean 'ora.asm' on 'oda1'
CRS-2681: Clean of 'ora.asm' on 'oda1' succeeded
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'oda1'
CRS-2677: Stop of 'ora.drivers.acfs' on 'oda1' succeeded
CRS-4000: Command Start failed, or completed with errors.
Failed to start Oracle Grid Infrastructure stack
Failed to start ASM at /u01/app/11.2.0.4/grid/crs/install/crsconfig_lib.pm
line 1340.
/u01/app/11.2.0.4/grid/perl/bin/perl -I/u01/app/11.2.0.4/grid/perl/lib
-I/u01/app/11.2.0.4/grid/crs/install
/u01/app/11.2.0.4/grid/crs/install/rootcrs.pl execution failed 

At this point the rootupgrade.sh script could not be completed, and you will not be able to start the GI from the old (i.e. 11.2.0.3) GI home. In order to restore the Grid Infrastructure you need to issue the following command from the new GI home (in this particular case 11.2.0.4):

<new GI home>/crs/install/rootcrs.pl -downgrade -force -oldcrshome <old gi home path> -version <old gi version>

i.e.:
/u01/app/11.2.0.4/grid/crs/install/rootcrs.pl -downgrade -force -oldcrshome /u01/app/11.2.0.3/grid -version 11.2.0.3.0

Check the OCR and inspect the ocrdump key [SYSTEM.version.hostnames] to make sure the software version for the existing GI doesn't change:

ocrdump -stdout -keyname SYSTEM.version.hostnames |grep ORATEXT
ORATEXT : 11.2.0.4.0
ORATEXT : 11.2.0.4.0

If the OCR got changed, it can be restored from the OCR backup files, which are located at <old_GIHOME>/grid/cdata/<cluster_name>:

ocrconfig -restore <old_GIHOME>/grid/cdata/<cluster_name>

ie:
ocrconfig -restore /u01/app/11.2.0.3/grid/cdata/oda1-c/backup00.ocr

(backup00.ocr  day.ocr  week.ocr)

After the "-downgrade" command issued above, the Grid Infrastructure on the failing node is left unconfigured. You should execute the remove-node/add-node procedure from the working node.
These commands should apply the right settings without having to do them manually, and bring the GI up and running:

- From the working node that you are not deleting, run the following command from the Grid_home/bin directory as root to delete the node from the cluster:

crsctl delete node -n node_to_be_deleted

- To add the node back

$ ./addNode.sh "CLUSTER_NEW_NODES={<node_name>}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={<node VIP hostname>}"
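
For example, with hypothetical node names (the node being rebuilt is odanode2 and the surviving node is odanode1; the GI home is the old, still-active one, and in 11.2 addNode.sh typically lives under the GI home's oui/bin directory):

# As root on the surviving node, remove the failed node from the cluster:
/u01/app/11.2.0.3/grid/bin/crsctl delete node -n odanode2

# Then, as the grid user on the surviving node, add it back:
cd /u01/app/11.2.0.3/grid/oui/bin
./addNode.sh "CLUSTER_NEW_NODES={odanode2}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={odanode2-vip}"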

Once you have fixed the reason why ASM was not able to start up, you can proceed with the GI update one more time.

 

6. Successful GI upgrade but ASM is crashing with ORA-600 [kfdJoin3]

In this case study the GI upgrade went fine but ASM is crashing with ORA-600 [kfdJoin3]. As reported in Note 888888.1 this is a known issue and the solution is quite simple. You should update the following three files:

1.) "/opt/oracle/oak/onecmd/asmapplconf_header.txt"
2.) "/opt/oracle/oak/onecmd/asmapplconf_header_V2_J2.txt"
3.) "/opt/oracle/extapi/asmappl.config"

In all the above files, set the value of the attribute max_disk_count from 500 to 100. Then ASM will be able to start up.
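
Before editing, you can check the current setting in all three files. The sed below is only a sketch: the exact attribute syntax in these files may differ, so verify with the grep output (and keep the backups) before applying it.

# Check the current max_disk_count setting in the three files listed above
grep -n max_disk_count /opt/oracle/oak/onecmd/asmapplconf_header.txt \
                       /opt/oracle/oak/onecmd/asmapplconf_header_V2_J2.txt \
                       /opt/oracle/extapi/asmappl.config

# Sketch only: back up each file, then change 500 to 100 in place
for f in /opt/oracle/oak/onecmd/asmapplconf_header.txt \
         /opt/oracle/oak/onecmd/asmapplconf_header_V2_J2.txt \
         /opt/oracle/extapi/asmappl.config; do
    cp -p "$f" "$f.bak"
    sed -i 's/\(max_disk_count[^0-9]*\)500/\1100/' "$f"
done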

 

7. GI upgrade failure after a previous failure

There was a failure while updating the Grid Infrastructure ("--gi"). Trying to run the upgrade one more time, it fails with the following error:

ERROR  : Ran '/usr/bin/ssh -l root odanode1 /opt/oracle/oak/onecmd/tmp/giclonepl.sh' and it returned code(255) and output is:
        rm -f oracle dbv tstshm maxmem orapwd dbfsize cursize genoci extproc extproc32 hsalloci hsots hsdepxa dgmgrl dumpsga mapsga osh sbttest expdp impdp imp exp sqlldr rman   /u01/app/11.2.0.4/grid/rdbms/lib/dg4odbc mkpatch /u01/app/11.2.0.4/grid/rdbms/lib/dg4adbs /u01/app/11.2.0.4/grid/rdbms/lib/dg4db2 /u01/app/11.2.0.4/grid/rdbms/lib/dg4ifmx /u01/app/11.2.0.4/grid/rdbms/lib/dg4ims  /u01/app/11.2.0.4/grid/rdbms/lib/dg4msql   /u01/app/11.2.0.4/grid/rdbms/lib/dg4sybs /u01/app/11.2.0.4/grid/rdbms/lib/dg4tera /u01/app/11.2.0.4/grid/rdbms/lib/dg4vsam nid adrci wrc extjob extjobo jssu genezi kfod amdu kfed grdcscan uidrvci diskmon setasmgid renamedg orion skgxpinfo /u01/app/11.2.0.4/grid/rdbms/lib/ksms.s /u01/app/11.2.0.4/grid/rdbms/lib/ksms.o
(if /u01/app/11.2.0.4/grid/bin/skgxpinfo | grep rds;\
then \
make -f  /u01/app/11.2.0.4/grid/rdbms/lib/ins_rdbms.mk ipc_rds; \
else \
make -f  /u01/app/11.2.0.4/grid/rdbms/lib/ins_rdbms.mk ipc_g; \
fi)
make[1]: Entering directory `/u01/app/11.2.0.4/grid/rdbms/lib'
rm -f /u01/app/11.2.0.4/grid/lib/libskgxp11.so
cp /u01/app/11.2.0.4/grid/lib//libskgxpg.so /u01/app/11.2.0.4/grid/lib/libskgxp11.so
make[1]: Leaving directory `/u01/app/11.2.0.4/grid/rdbms/lib'
 - Use stub SKGXN library
cp /u01/app/11.2.0.4/grid/lib/libskgxns.so /u01/app/11.2.0.4/grid/lib/libskgxn2.so
/usr/bin/ar cr /u01/app/11.2.0.4/grid/rdbms/lib/libknlopt.a /u01/app/11.2.0.4/grid/rdbms/lib/kcsm.o

Background process 3868 (node: odanode2) gets done with the exit code 255
Background process 3845 (node: odanode1) gets done with the exit code 255
ERROR  : Failure in copying /opt/oracle/oak/onecmd/tmp/giclonepl.sh to DB nodes and executing it as root in parallel
Exiting..
 

Due to the previous failure, the new Grid home had already been created, and the new upgrade attempt (cloning the GI home) is failing.

In giclonepl.sh-xxxx.log you can see an error similar to the one below after the failure:

You can find the log of this install session at:
 /u01/app/oraInventory/logs/cloneActions2014-01-11_12-31-24PM.log
OUI-10197:Unable to create a new Oracle Home at /u01/app/11.2.0.4/grid. Oracle Home already exists at this location. Select another location.
SEVERE:OUI-10197:Unable to create a new Oracle Home at /u01/app/11.2.0.4/grid. Oracle Home already exists at this location. Select another location.
tall2014-01-11_12-31-24PM/oui/jlib/ojmisc.jar:/tmp/OraInstall2014-01-11_12-31-24PM/oui/jlib/xml.jar:/tmp/OraInstall2014-01-11_12-31-24PM/oui/jlib/srvm.jar:/tmp/OraInstall2014-01-11_12-31-24PM/oui/jlib/srvmasm.jar: 

To resolve the issue follow the steps given below.

1. Remove the new Grid home on both nodes:

rm -fr /u01/app/<new GI home>

ie
rm -fr /u01/app/11.2.0.4

2. Restore the central inventory ("/u01/app/oraInventory/ContentsXML/inventory.xml") on both nodes from the last backup, or execute the following commands on both nodes (replace nodename1,nodename2 with your ODA node host names):

export OLD_GI_HOME=/u01/app/11.2.0.3/grid
export NEW_GI_HOME=/u01/app/11.2.0.4/grid

export ORACLE_HOME=$NEW_GI_HOME
$OLD_GI_HOME/oui/bin/runInstaller -detachHome -silent -local ORACLE_HOME=$NEW_GI_HOME

export ORACLE_HOME=$OLD_GI_HOME
$OLD_GI_HOME/oui/bin/runInstaller -attachHome -silent -local ORACLE_HOME=$OLD_GI_HOME ORACLE_HOME_NAME=OraGrid11gR3 "CLUSTER_NODES=nodename1,nodename2" CRS=true

3. Rerun the GI update command:

oakcli update -patch <patch_number> --gi
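
After step 2 and before rerunning the update, you can quickly confirm on both nodes that the central inventory again flags the old 11.2.0.3 home as the clusterware home:

grep 'CRS="true"' /u01/app/oraInventory/ContentsXML/inventory.xml
# Expected: the HOME entry with LOC="/u01/app/11.2.0.3/grid"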

 

 

 

References

<NOTE:1374275.1> - Oracle Clusterware (GI or CRS) Related Abbreviations, Acronyms and Procedures
<NOTE:888888.1> - Oracle Database Appliance - 12.1.2 and 2.X Supported ODA Versions & Known Issues
<NOTE:1557502.1> - ODA (Oracle Database Appliance) troubleshooting and solutions for ORA-600 [kfdjoin3] causing ASM startup failure after patching to 2.5 or 2.6
<BUG:18292186> - LNX64-112-CMT: OUT-OF-PLACE PATCHING GI, LOG AND ERROR CHECKING IMPROVEMENT
<BUG:18276205> - ODA: FAILED TO UPGRADE GI HOME.
<BUG:19444164> - LNX64-112-CMT: GI UPGRADE FAILURE DUE TO OC4J
<NOTE:1053393.1> - How to Update Inventory to Set/Unset "CRS=true" Flag for Oracle Clusterware Home
<BUG:18149174> - 2.2.0.0.0 NOT UPGRADED GI_HOME 11.2.0.2.5(13343424, 11.2.0.3.2(13696216,1334344
<NOTE:1466664.1> - ODA (Oracle Database Appliance): GI update is failing with oraInventory corruption
<NOTE:1056322.1> - Troubleshoot Grid Infrastructure/RAC Database installer/runInstaller Issues
<BUG:18456643> - AMDU ERRORS OUT IF A DISK WHICH WAS NOT PART OF A DISKGROUP WAS NOT READABLE
<BUG:19280211> - ODA: OAKCLI UPDATE -PATCH 2.10.0.0.0 --GI FAILED WITH ASM ERRORS
<NOTE:1364947.1> - How to Proceed When Upgrade to 11.2 Grid Infrastructure Cluster Fails
<NOTE:1513912.1> - TFA Collector - Tool for Enhanced Diagnostic Gathering
<BUG:14151562> - LNX64-112-CMT: NEED TO CLEANUP THE INVENTORY ENTRIES IF GI/DB UPGRADE IS FAILED.

Attachments
This solution has no attachment