Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1570658.1
Update Date: 2013-07-23
Keywords:

Solution Type  Problem Resolution Sure

Solution  1570658.1 :   Exalogic: Recovering from PSU Update Failure During Compute Node Update.  


Related Items
  • Exalogic Elastic Cloud X3-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>Exalogic/OVCA>Oracle Exalogic>MW: Exalogic Core





Created from <SR 3-7552023301>

Applies to:

Exalogic Elastic Cloud X3-2 Hardware - Version All Versions and later
Information in this document applies to any platform.

Symptoms

During a PSU update on an Exalogic compute node, the patch process failed because of external datacenter issues. Afterwards, imageinfo reports that the node is at the desired PSU level, but the Exachk swprofile check fails because multiple RPMs that are part of the PSU being applied are missing, as shown below.

[FAILURE]........Software does not match with the supported profile. See below.

[FAILURE]........Not found: ib-bonding-0.9.0-2.6.32_400.1.3.el5uek.x86_64.rpm, ib-bonding-debuginfo-0.9.0-2.6.32_400.1.3.el5uek.x86_64.rpm, kernel-ib-1.5.5-2.6.32_400.1.3.el5uek.x86_64.rpm, kernel-ib-devel-1.5.5-2.6.32_400.1.3.el5uek.x86_64.rpm, mpi-selector-1.0.3-1.x86_64.rpm, mpitests_mvapich2_gcc-3.2-923.x86_64.rpm, mpitests_mvapich_gcc-3.2-923.x86_64.rpm, mpitests_openmpi_gcc-3.2-923.x86_64.rpm, mvapich2_gcc-1.5.1-2.p1.x86_64.rpm, mvapich_gcc-1.2.0-3635.x86_64.rpm, openmpi_gcc-1.4.2-1.x86_64.rpm
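To confirm directly on the node which of the reported packages are absent, the "Not found:" line can be split into individual package labels and each one queried with rpm -q. The following is a minimal sketch, not part of the original note; the function names list_missing_rpms and check_missing_rpms are hypothetical, and it assumes the Exachk output has been saved to a text file.

```shell
# Print one package label per line from an Exachk "Not found:" line on stdin.
# The trailing ".rpm" is stripped so the label can be passed to rpm -q.
list_missing_rpms() {
    sed -n 's/^.*Not found: *//p' |
        tr ',' '\n' |
        sed -e 's/^ *//' -e 's/ *$//' -e 's/\.rpm$//' -e '/^$/d'
}

# Report which of those packages rpm does not consider installed.
check_missing_rpms() {
    list_missing_rpms | while read -r pkg; do
        rpm -q "$pkg" > /dev/null 2>&1 || echo "still missing: $pkg"
    done
}
```

For example, with the Exachk output saved as exachk_out.txt (a hypothetical file name), running check_missing_rpms < exachk_out.txt prints one "still missing" line for every package that rpm cannot find.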

 

The Exapatch log shows the loss of connectivity and the resulting failure to complete the compute node patching:

Running /bin/rpm -e --allmatches openmpi_gcc-1.4.2-1 > /dev/null 2>&1

Connection to 172.21.254.153 closed by remote host.

2013/07/18 09:10:22 [Thread-1] 172.21.254.153: failed to run cd /opt/ebi_upgrade/`dirname BaseImage/2.0.3.0.2/scripts/ebi_patch.sh`; sh ./`basename BaseImage/2.0.3.0.2/scripts/ebi_patch.sh` --noreboot
2013/07/18 09:10:22 DURATION patch compute nodes: 3 minutes, 21 seconds. 
2013/07/18 09:10:22 DURATION exapatch: 3 minutes, 21 seconds. 
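When the log is long, the two messages that mark an interrupted run can be searched for directly. This is a small sketch under stated assumptions: the function name find_patch_failures is hypothetical, and the log path is passed as an argument because this note does not state where the Exapatch log is written.

```shell
# Print, with line numbers, the lines of an Exapatch log that indicate
# an interrupted compute node patch.
# $1: path to the log file (its location is not given in this note).
find_patch_failures() {
    grep -n -E 'closed by remote host|failed to run' "$1"
}
```

The function returns a non-zero status when neither message is present, so it can also be used in a conditional.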

 

Cause

The failure was caused by loss of connectivity with the node while it was being patched, although loss of connectivity is not the only condition that can interrupt patching in this way. The solution is to recover from the failure and reapply the patch on the compute node.

Solution

To recover from the failure, do the following:

If you took a backup of the node OS before patching, restore the files listed below from that backup. Note that exapatch usually makes a backup copy of these files in a tmp folder during the patching process, and you can use those copies as well.

- Restore .image_history file from backup
- Restore .image_id from backup
- Rerun the exapatch command

Example:

cp /var/tmp/el_conf.bak20x0x/.image_history.20x0x /var/log/init-exalogic-node/.image_history
cp /var/tmp/el_conf.bak20x0x/.image_id.20x0x /usr/lib/init-exalogic-node/.image_id


At this point, run imageinfo and imagehistory and verify that both report the pre-patch version.

Then rerun the exapatch command to apply the PSU to the compute node.
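The restore step above can be wrapped in a small function that first checks that both backup files exist, so a partial restore is avoided. This is only a sketch: the function name restore_image_files is hypothetical, the backup directory and suffix must be supplied by the caller (the 20x0x in the example above is a placeholder), and the destination directories default to the paths shown in the example.

```shell
# Restore the pre-patch image metadata files from a backup directory.
# $1: backup directory, for example /var/tmp/el_conf.bak20x0x
# $2: suffix on the saved files, for example 20x0x
# $3: directory holding .image_history (default: /var/log/init-exalogic-node)
# $4: directory holding .image_id (default: /usr/lib/init-exalogic-node)
restore_image_files() {
    bak_dir=$1
    suffix=$2
    hist_dir=${3:-/var/log/init-exalogic-node}
    id_dir=${4:-/usr/lib/init-exalogic-node}

    # Refuse to copy anything unless both backup files are present.
    for f in .image_history .image_id; do
        if [ ! -f "$bak_dir/$f.$suffix" ]; then
            echo "missing backup file: $bak_dir/$f.$suffix" >&2
            return 1
        fi
    done

    cp "$bak_dir/.image_history.$suffix" "$hist_dir/.image_history" &&
        cp "$bak_dir/.image_id.$suffix" "$id_dir/.image_id"
}
```

After a successful call, run imageinfo and imagehistory to confirm the pre-patch version before rerunning exapatch.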


 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.