Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Document 2005694.1: How to Drop and Re-create the /u01 on Exadata Without Performing a Full Bare Metal Restore
Solution Type: Problem Resolution (Sure Solution)
Created from <SR 3-10544010481>

Applies to:
Exadata X5-2 Hardware - Version All Versions and later
Exadata X3-2 Hardware - Version All Versions and later
Oracle Exadata Hardware - Version 11.2.0.3 and later
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Exadata X4-2 Hardware - Version All Versions and later
Information in this document applies to any platform.

Symptoms
An attempt to resize the /u01 filesystem left it corrupted, with clusterware showing as not usable. The output of fsck shows corrupted inode tables.

Cause
The attempt to resize the /u01 file system corrupted it. The damage is not repairable, which requires dropping and re-creating the /u01 file system.

Solution
Drop and re-create the /u01 file system per the process below.

Nodes:
dm01db01 <<< Failing node
dm01db02 <<< Surviving node
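The inode-table corruption can be confirmed non-destructively before deciding to drop the filesystem. The sketch below is a hypothetical demonstration run against a small throwaway ext4 image file so it is safe to execute anywhere; on the actual failing node, the equivalent read-only check would be run against the unmounted logical volume (e.g. `fsck.ext4 -n /dev/mapper/VGExaDb-LVDbOra1` - device name assumed from the standard Exadata layout):

```shell
export PATH="$PATH:/sbin:/usr/sbin"   # e2fsprogs tools often live in sbin

# Build a small ext4 image as a stand-in for the logical volume.
img=$(mktemp)
truncate -s 16M "$img"
mkfs.ext4 -F -q "$img"        # -F: allow a regular file as the target

# Read-only check: -n answers "no" to all repair prompts, -f forces a
# full check even if the filesystem is marked clean.
fsck.ext4 -n -f "$img"
status=$?
echo "fsck exit status: $status"   # 0 means the filesystem is clean

rm -f "$img"
```

A non-zero exit status (4 and above) indicates uncorrected errors; on a clean filesystem, as here, the status is 0.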
If the /u01 file system is damaged but can still be mounted, back up whatever data is recoverable first.

Step 1: Remove the Failed Database Server from the Cluster

1. Disable and stop the listener that runs on the failed database server:

[oracle@surviving]$ srvctl disable listener -n dm01db01
[oracle@surviving]$ srvctl stop listener -n dm01db01
PRCC-1017 : LISTENER was already stopped on dm01db01
2. Delete the Oracle Home from the Oracle inventory:

[oracle@surviving]$ cd ${ORACLE_HOME}/oui/bin
[oracle@surviving]$ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/11.2.0/dbhome_1 "CLUSTER_NODES=dm01db02"
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 16383 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
3. Verify that the failed database server is unpinned:

[oracle@surviving]$ olsnodes -s -t
dm01db01        Inactive        Unpinned
dm01db02        Active          Unpinned
4. Stop and remove the VIP resource for the failed database server:

[root@surviving]# srvctl stop vip -i dm01db01-vip
PRCC-1016 : dm01db01-vip.acme.com was already stopped
[root@surviving]# srvctl remove vip -i dm01db01-vip
Please confirm that you intend to remove the VIPs dm01db01-vip (y/[n]) y
5. Delete the node from the cluster:

[root@surviving]# crsctl delete node -n dm01db01
CRS-4661: Node dm01db01 successfully deleted.
6. Update the Oracle Inventory:

[oracle@surviving]$ cd ${ORACLE_HOME}/oui/bin
[oracle@surviving]$ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/11.2.0/grid "CLUSTER_NODES=dm01db02" CRS=TRUE
Starting Oracle Universal Installer...
Checking swap space: must be greater than 500 MB.   Actual 16383 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oraInventory
'UpdateNodeList' was successful.
7. Verify the node deletion is successful:

[oracle@surviving]$ cluvfy stage -post nodedel -n dm01db01 -verbose
Performing post-checks for node removal
Checking CRS integrity...
The Oracle clusterware is healthy on node "dm01db02"
CRS integrity check passed
Result: Node removal check passed
Post-check for node removal was successful
Step 2: Drop and Re-create /u01

The /u01 file system resides on the following logical volume (sizes elided in the original):

/dev/mapper/VGExaDb-LVDbOra1   *   *   *   *%   /u01
Note: before creating the filesystem, check the filesystem version on a healthy node to determine whether it is ext3 or ext4, then create the same version. More recent factory images ship with ext4.
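The check described in the note can be sketched as follows. This is a hypothetical example run against a throwaway image file so it is safe to execute anywhere; on the real nodes you would probe the logical volume on the healthy node (e.g. `blkid -o value -s TYPE /dev/mapper/VGExaDb-LVDbOra1` - device name assumed) and then run the matching mkfs on the failed node:

```shell
export PATH="$PATH:/sbin:/usr/sbin"   # blkid/mkfs often live in sbin

# Stand-in for the healthy node's /u01 logical volume.
img=$(mktemp)
truncate -s 16M "$img"
mkfs.ext4 -F -q "$img"

# Determine the filesystem type exactly as you would on the good node.
fstype=$(blkid -o value -s TYPE "$img")
echo "healthy-node filesystem type: $fstype"

# On the failed node, re-create /u01 with the SAME type, e.g. (do not
# run these sample lines outside the actual maintenance window):
#   umount /u01                                  # if still mounted
#   mkfs.${fstype} /dev/mapper/VGExaDb-LVDbOra1
#   mount /u01                                   # fstab entry unchanged

rm -f "$img"
```

Matching the type exactly matters because an ext4 /u01 on one node and ext3 on another can behave differently under the same Oracle software stack.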
Step 3: Add the Node Back to the Cluster (Clone Oracle Grid Infrastructure to the Replacement Database Server)

1. Verify the hardware and operating system installations with the Cluster Verification Utility (CVU):

[oracle@surviving]$ cluvfy stage -post hwos -n dm01db01,dm01db02 -verbose
At the end of the report, you should see the text: “Post-check for hardware and operating system setup was successful.”
2. Verify peer compatibility:

[oracle@surviving]$ cluvfy comp peer -refnode dm01db02 -n dm01db01 -orainv oinstall -osdba dba | grep -B 3 -A 2 mismatched

Compatibility check: Available memory [reference node: dm01db02]
Node Name     Status                   Ref. node status         Comment
------------  -----------------------  -----------------------  ----------
dm01db01      31.02GB (3.2527572E7KB)  29.26GB (3.0681252E7KB)  mismatched
Available memory check failed

Compatibility check: Free disk space for "/tmp" [reference node: dm01db02]
Node Name     Status                   Ref. node status         Comment
------------  -----------------------  -----------------------  ----------
dm01db01      55.52GB (5.8217472E7KB)  51.82GB (5.4340608E7KB)  mismatched
Free disk space check failed

If the only components that failed are related to physical memory, swap space, and disk space, then it is safe to continue.
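To quickly see which compatibility checks failed, the cluvfy report can be filtered with standard text tools. This is a hypothetical helper, shown here against an inline copy of the sample output above; normally you would redirect the real cluvfy output into the report file:

```shell
# Save the cluvfy report to a file (sample data inlined here so the
# snippet runs anywhere), then list only the failed checks.
report=$(mktemp)
cat > "$report" <<'EOF'
Compatibility check: Available memory [reference node: dm01db02]
Available memory check failed
Compatibility check: Free disk space for "/tmp" [reference node: dm01db02]
Free disk space check failed
EOF

failed=$(grep 'check failed$' "$report")
echo "$failed"

rm -f "$report"
```

If every line printed relates to memory, swap, or disk space, the rule above says it is safe to continue; any other failed component should be resolved first.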
3. Perform the requisite checks for node addition:

[oracle@surviving]$ cluvfy stage -pre nodeadd -n dm01db01 -fixup -fixupdir /home/oracle/fixup.d

If the only component that fails is related to swap space, then it is safe to continue.
4. Add the replacement database server to the cluster:

NOTE: addnode.sh may fail on files that are readable only by root, with errors similar to those described in MOS note 1526405.1. Apply the workaround for those files and rerun addnode.sh.

[oracle@surviving]$ cd /u01/app/11.2.0/grid/oui/bin/
[oracle@surviving]$ ./addnode.sh -silent "CLUSTER_NEW_NODES={dm01db01}" "CLUSTER_NEW_VIRTUAL_HOSTNAMES={dm01db01-vip}"

This initiates the OUI to copy the clusterware software to the replacement database server.
WARNING: A new inventory has been created on one or more nodes in this session. However, it has not yet been registered as the central inventory of this system. To register the new inventory please run the script at '/u01/app/oraInventory/orainstRoot.sh' with root privileges on nodes 'dm01db01'.
If you do not register the inventory, you may not be able to update or patch the products you installed.
The following configuration scripts need to be executed as the "root" user in each cluster node:
/u01/app/oraInventory/orainstRoot.sh   # On nodes dm01db01
/u01/app/11.2.0/grid/root.sh           # On nodes dm01db01
To execute the configuration scripts:
a) Open a terminal window.
b) Log in as root.
c) Run the scripts on each cluster node.

After the scripts are finished, you should see the following informational messages:
The Cluster Node Addition of /u01/app/11.2.0/grid was successful.
Please check '/tmp/silentInstall.log' for more details.
5. Run the orainstRoot.sh and root.sh scripts on the replacement database server:

NOTE: orainstRoot.sh does not need to be run if only /u01 was re-created and the / filesystem was unchanged or restored, because the oraInst.loc and oratab files still exist.

[root@replacement]# /u01/app/oraInventory/orainstRoot.sh
Creating the Oracle inventory pointer file (/etc/oraInst.loc)
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.

[root@replacement]# /u01/app/11.2.0/grid/root.sh
Check /u01/app/11.2.0/grid/install/root_dm01db01.acme.com_2010-03-10_17-59-15.log for the output of root script

The log file created above will report that the LISTENER resource on the replaced database server failed to start. This is the expected output:

PRCR-1013 : Failed to start resource ora.LISTENER.lsnr
PRCR-1064 : Failed to start resource ora.LISTENER.lsnr on node dm01db01
CRS-2662: Resource 'ora.LISTENER.lsnr' is disabled on server 'dm01db01'
start listener on node=dm01db01 ... failed
6. Re-enable and start the listener that was stopped and disabled in Step 1:

[root@replacement]# /u01/app/11.2.0/grid/bin/srvctl enable listener -l LISTENER -n dm01db01
[root@replacement]# /u01/app/11.2.0/grid/bin/srvctl start listener -l LISTENER -n dm01db01
Step 4: Clone Oracle Database Homes to the Replacement Database Server

1. Add the RDBMS ORACLE_HOME on the replacement database server:

[oracle@surviving]$ cd /u01/app/oracle/product/11.2.0/dbhome_1/oui/bin/
[oracle@surviving]$ ./addnode.sh -silent "CLUSTER_NEW_NODES={dm01db01}"

This initiates the OUI (Oracle Universal Installer) to copy the Oracle Database software to the replacement database server. However, to complete the installation, you must run the root scripts on the replacement database server after the command completes.
WARNING: The following configuration scripts need to be executed as the "root" user in each cluster node:
/u01/app/oracle/product/11.2.0/dbhome_1/root.sh   # On nodes dm01db01
To execute the configuration scripts:
a) Open a terminal window.
b) Log in as root.
c) Run the scripts on each cluster node.

After the scripts are finished, you should see the following informational messages:
The Cluster Node Addition of /u01/app/oracle/product/11.2.0/dbhome_1 was successful.
Please check '/tmp/silentInstall.log' for more details.
2. Run the following script on the replacement database server:

[root@replacement]# /u01/app/oracle/product/11.2.0/dbhome_1/root.sh
Check /u01/app/oracle/product/11.2.0/dbhome_1/install/root_dm01db01.acme.com_2010-03-10_18-27-16.log for the output of root script
3. Validate the initialization parameter files:

Verify that the init<SID>.ora file under $ORACLE_HOME/dbs references the spfile in ASM shared storage. Also note that the password file copied into $ORACLE_HOME/dbs during addnode must be renamed to orapw<SID>.
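These two checks can be sketched in a few lines of shell. This is a hypothetical example run in a scratch directory standing in for $ORACLE_HOME/dbs; the SID (dbm1) and the +DATA spfile path are illustrative assumptions, not values from this environment:

```shell
# Scratch directory standing in for $ORACLE_HOME/dbs on the new node.
dbs=$(mktemp -d)
printf "SPFILE='+DATA/dbm/spfiledbm.ora'\n" > "$dbs/initdbm1.ora"  # assumed SID/path
touch "$dbs/orapw"   # password file name as copied by addnode (assumed)

# Check 1: the init file must reference an spfile in ASM ('+' disk group).
spfile_refs=$(grep -ci "^spfile='+" "$dbs/initdbm1.ora")

# Check 2: rename the copied password file to the orapw<SID> convention.
mv "$dbs/orapw" "$dbs/orapwdbm1"
pw_ok=no
[ -f "$dbs/orapwdbm1" ] && pw_ok=yes

echo "spfile references found: $spfile_refs, password file renamed: $pw_ok"
rm -rf "$dbs"
```

On the real node the same grep and mv would be run directly under $ORACLE_HOME/dbs, substituting the actual SID.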
References

NOTE:1084360.1 - Bare Metal Restore Procedure for Compute Nodes on an Exadata Environment
NOTE:1664897.1 - EXT3 File system Error "EXT3-fs error (device dm-5): ext3_lookup: deleted inode referenced"