Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Predictive Self-Healing Sure Solution

1684188.1: Exalogic Patch Set Update (PSU) Release 2.0.6.0.2 and 2.0.6.1.2 (Linux - Virtual) for July 2014
Applies to:

Exalogic Elastic Cloud X3-2 Hardware - Version X3 to X3 [Release X3]
Oracle Exalogic Elastic Cloud Software - Version 2.0.6.0.0 to 2.0.6.1.2
Oracle Exalogic Elastic Cloud X2-2 Hardware - Version X2 to X2 [Release X2]
Exalogic Elastic Cloud X4-2 Hardware - Version X4 to X4 [Release X4]
Linux x86-64
Oracle Virtual Server (x86-64)

Purpose

Oracle Exalogic is an integrated hardware and software system designed to provide a complete platform for a wide range of application types and widely varied workloads. It combines optimized Oracle Fusion Middleware software such as WebLogic Server, JRockit, and Coherence with industry-standard Sun server and storage hardware and InfiniBand networking. The purpose of this document is to provide specific information about the July 2014 Patch Set Update (PSU) for that system.

Scope

The target audience of this document is engineers and system administrators who plan to apply the Exalogic PSU. This document provides the following:
This document will be kept up to date with updates to errata and known issues.

Details

Patch Download
Released: July 2014
Product Version: 2.0.6.0.2 (on X2-2/X3-2), 2.0.6.1.2 (on X4-2) for Oracle Exalogic Elastic Cloud infrastructure
PATCH 18630693 - EXALOGIC VIRTUAL 2.0.6.0.2 (on X2-2/X3-2), 2.0.6.1.2 (on X4-2) PATCH SET UPDATE (PSU) FOR JULY 2014

Patch Readme Documentation

Refer to the readme documentation in 18630693-Virtual.zip, attached to this document, for instructions on how to upgrade the Exalogic infrastructure:
The readme content layout of 18630693-Virtual.zip:

18630693-Virtual
|- README.txt
|- README.html
|- Infrastructure/
|    - docs/
|        - README.html
|- Middleware/
     - Coherence/
         - 3.7.1.12/
             - README.txt
     - JRockit/
         - 1.6.0_81/
             - README.txt
     - WebLogic/
         - 10.3.6.0.8/
             - README.txt
     - Oracle Traffic Director/
         - 11.1.1.7.0/
             - README.txt
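As a quick illustration only (not part of the patch readme; the extraction directory shown is an arbitrary example), the archive can be unpacked and the top-level readme reviewed as follows:

# unzip 18630693-Virtual.zip -d /u01/psu-jul2014
# less /u01/psu-jul2014/18630693-Virtual/README.txt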
Appendices

Appendix A: Fixed Bugs List

Please review Document ID 1905523.1 - Exalogic Infrastructure July 2014 PSU - Fixed Bugs List.

Appendix B: Patching Known Issues

Upgrading NM2 switch firmware on full-rack Exalogic systems or multi-rack Exa systems causes a network outage
Note: The issue does not occur when Exalogic racks (one eighth, quarter, half) are connected to Exadata racks.

Symptoms

Cause

The NM2 port ID space has increased four times from firmware version 2.0 to 2.1. Therefore, port ID collisions may occur if NM2 switches with firmware versions 2.0 and 2.1 are run on the same fabric. This issue can cause network outages on the InfiniBand fabric.

Solution/Workaround

To prevent this issue, the switch 'GWInstance' value may need to be manipulated to ensure unique port IDs are generated. Perform the following steps:

# showgwconfig
Note the GWInstance "Running Value" on each of the switches.

1. Ensure that all the GWInstance values are even numbers.

[root@nm2gw-ib01 ~]# setgwinstance 16
Stopping Bridge Manager..  [ OK ]
Starting Bridge Manager.   [ OK ]
[root@nm2gw-ib01 ~]#

For example, if the GWInstance values of four NM2 switches are 10, 20, 30 and 40, it will be necessary to change the GWInstance value of 10 to 16.

Guest vServers on X4-2 systems have version 2.0.6.1.2 when patching from the April 2014 PSU

Symptoms

Guest vServers on an X4-2 rack, when patched from the April 2014 PSU to the July 2014 PSU, have version 2.0.6.1.2 instead of 2.0.6.0.2.

Cause

This is caused by unpublished defect 19067873.

Solution/Workaround

No workaround or solution is required. Both 2.0.6.1.2 and 2.0.6.0.2 are valid versions for guest vServers.

Enterprise Manager Ops Center (EMOC) console is sometimes not accessible after patching the EMOC vServer

Symptoms

The EMOC console is sometimes not accessible from a web browser after patching the EMOC vServer through the Exapatch 'ectemplates' action.

Cause

Solution/Workaround

root@computenode# /exalogic-lctools/bin/exapatch -a ecvserversshutdown
root@computenode# /exalogic-lctools/bin/exapatch -a ecvserversstartup
vServers on IPoIB-virt-admin were not accessible after the July PSU BaseTemplate upgrade

Symptoms

It is possible to log in to the otherwise inaccessible vServer via xm console without issue. Inspecting /var/log/egbt*20602.log (on X2-2/X3-2) or /var/log/egbt*20612.log (on X4-2) shows no errors in the patching operation, and the 'imageinfo' image version of the vServer is updated as expected. Yet the vServers are not accessible via the IPoIB-virt-admin address from a compute node.

Cause

Solution/Workaround

The workaround for this problem is to log in to the vServers via 'xm console <guid>' and reboot the vServers. After this, the vServers can be accessed.
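As an illustration only (assuming the standard Xen xm tooling on the compute node; the domain identifier shown is an example), the vServer domain can be located and opened at its console from the hosting compute node:

[root@computenode ~]# xm list | grep 0004fb0000060000d9962b523e042f55
[root@computenode ~]# xm console 0004fb0000060000d9962b523e042f55

Log in at the console prompt and reboot the vServer from inside it (for example, with the 'reboot' command).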
Exalogic Control services upgrade fails during July 2014 PSU patching

Symptoms:

While attempting to patch Exalogic Virtual version 2.0.6.0.x with the July 2014 PSU using the following command:

/exalogic-lctools/bin/exapatch -a runExtension -p Exalogic_Control/emoc_patch_extension.py exapatch_descriptor.py
this message is observed on the command line:

ERROR: EMOC-OVMM-service 10.2.76.11 is not running the expected version: 3.2.8.733
and additional information in the logs shows the following error:

[wldeploy] Caused by: javax.naming.AuthenticationException [Root exception is java.lang.SecurityException: User: weblogic, failed to be authenticated.]
Cause:

The WebLogic password was reset.

Solution:
After July 2014 PSU patching, when attempting to start guest vServers through the EMOC, they hang during start-up

Symptoms

As part of the July 2014 PSU (2.0.6.0.2), OVM is upgraded from 3.2.1 to 3.2.8. When attempting to start any of the guest vServers from the EMOC, they hang during start-up.

####<Aug 23, 2014 7:15:55 AM EST> <Error> <com.oracle.ovm.mgr.api.job.InternalJob> <XX01-elcontrol> <AdminServer> <Odof Tcp Client Thread: /127.0.0.1:54321/98> <<anonymous>> <> <0000KVz^_mCBh4SMyEvX6G1JxvB4000002> <1408742155355> <BEA-000000> <Job: AutoDiscoverTask_1408742154199, Time: 1408742154205, Internal Error (Operation) OVMAPI_6000E Internal Error: OVMAPI_4021E Server discover conflict at IP address: 192.168.23.4. The manager already has a server: XXX01cn04.XXX.XXX.XXX, at this IP address, with SMBIOS UUID: 08:00:20:ff:ff:ff:ff:ff:ff:ff:18:9e:c0:28:21:00.

Cause

(UNPUBLISHED) BUG:16221689
Bug 16221689 - DISCOVER ISSUE DUE TO SERVER SMBIOS UUID CHANGING AFTER UPGRADE

This problem seems to occur during the server upgrade to OVM Server 3.2.1. It has been seen during system reboots if the server hardware does not have a hardware UUID embedded, and in the following scenarios:
Solution/Workaround

Please review the workaround documented in Document ID 1531611.1 - Errors after Oracle VM Server upgrade "IllegalOperationException: OVMAPI_6000E Internal Error:" If the IP address of this server has changed.
During the July 2014 PSU patching, compute nodes on Virtual racks will crash if the IB master subnet manager is not running on an NM2-GW switch on the Exalogic rack

Symptoms

When patching IB switches on a Virtual Exalogic rack, some or all compute nodes lose IB communication capabilities. This causes them to reboot.

Cause

When the IB switch hosting the master subnet manager is being upgraded, it needs to reboot. Therefore, before the upgrade, the subnet manager daemon on that NM2-GW switch is disabled. The other IB components on the rack running the subnet manager daemon negotiate a new master subnet manager. If the new master subnet manager ends up on a device other than an NM2-GW switch on the Exalogic rack, compute nodes will lose their IB connectivity and then crash because they cannot communicate with the cluster.

Solution/Workaround

The workaround is to follow the solution documented in Document ID 1682501.1 - Setting up the subnet manager in a multirack configuration containing Exalogic/BDA and Exadata/SSC/Expansion Rack. Note that the configuration must be correct on ALL IB subnet managers on the IB fabric, and is not limited to the IB switches on the Exalogic rack.
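Before starting the IB switch patching, it may also help to confirm which device currently holds the master subnet manager. A minimal sketch, assuming the subnet manager commands available on the NM2 switch CLI (the hostname shown is an example):

[root@nm2gw-ib01 ~]# getmaster

The output should identify an NM2-GW switch on the Exalogic rack as the master subnet manager; if it does not, apply the configuration from Document ID 1682501.1 before patching.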
On racks upgraded from 2.0.4.x to 2.0.6.x, the July 2014 Virtual PSU patching of compute nodes fails due to missing /proc/xen/xsd_kva

Symptoms

On racks upgraded from 2.0.4.x to 2.0.6.x, the July 2014 PSU compute node patching fails. Patching takes a very long time, eventually leading to a timeout in exapatch. Observations from the corresponding ILOM show that the compute node has dropped to single-user mode. The error message in the patching log file (/var/log/ebi_20602.log) indicates that the Xen kernel is not loaded, as shown in the sample below:

*ERROR: Sat Aug 23 12:47:58 CDT 2014: Xen kernel not loaded**
Cause

This is specific to compute nodes that were upgraded from 2.0.4.x to 2.0.6.x using the Upgrade Kit and does not apply to 2.0.6.x fresh-install racks. The kernel is upgraded in single-user mode; before the kernel is upgraded, a pre-requisite step checks for the presence of /proc/xen/xsd_kva. On compute nodes upgraded from 2.0.4.x to 2.0.6.x, /proc/xen/xsd_kva is not present, leading to a failure of the pre-check, which results in patching failure.

Solution/Workaround

Before applying the July 2014 PSU, append the following line to /etc/fstab on all compute nodes upgraded from 2.0.4.x:

xenfs /proc/xen xenfs defaults 0 0
Note: Do not add the line if it already exists in /etc/fstab.

For this change, no reboot is necessary prior to applying the July 2014 PSU to the compute nodes.

If this solution/workaround was missed, the problem occurred after patching was started, and the compute node is stuck in single-user mode, perform the following steps:
xenfs /proc/xen xenfs defaults 0 0
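As an illustration only (standard Linux commands, not taken from the patch readme), the line can be checked for and appended on a compute node as follows:

[root@computenode ~]# grep -q 'xenfs /proc/xen' /etc/fstab || echo 'xenfs /proc/xen xenfs defaults 0 0' >> /etc/fstab
[root@computenode ~]# grep xenfs /etc/fstab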
After a guest vServer kernel is patched and rebooted, the vServer is not accessible

Symptoms:

After the vServer kernel is patched and the vServer is rebooted, the vServer is not accessible. The OVM console shows a hang at GRUB:

Booting from Hard Disk...
GRUB _

Cause:

Numerous kernel entries in the boot partition.

Solution/Workaround:

Clean up the old kernels that are no longer needed in the boot partition.

1. Take a backup of the guest vServer that is to be patched.

a. Log in to a compute node and find the vServer image location with the command:

find /OVS -name vm.cfg | xargs grep <guest vServer name>
For example:

[root@computenode ~]# find /OVS -name vm.cfg | xargs grep VM-PTS-PVNET
/OVS/Repositories/0004fb0000030000db3286e8f9d42068/VirtualMachines/0004fb0000060000d9962b523e042f55/vm.cfg:OVM_simple_name = 'VM-PTS-PVNET'
[root@computenode ~]#

Here, VM-PTS-PVNET is the name of the guest vServer.

Note down the vm.cfg image location; in this example it is:

/OVS/Repositories/0004fb0000030000db3286e8f9d42068/VirtualMachines/0004fb0000060000d9962b523e042f55/vm.cfg
b. Get the disk image location from the vm.cfg path obtained above, using the command:
grep -i disk /OVS/Repositories/<GUID>/VirtualMachines/<VM Guid>/vm.cfg
For example:

[root@computenode ~]# grep -i disk /OVS/Repositories/0004fb0000030000db3286e8f9d42068/VirtualMachines/0004fb0000060000d9962b523e042f55/vm.cfg
disk = ['file:/OVS/Repositories/0004fb0000030000db3286e8f9d42068/VirtualDisks/0004fb00001200000d2006cdcd8ba4d0.img,hda,w']
[root@computenode ~]#

c. Back up the vServer's vm.cfg and its disk image to a ZFS share that is mounted on the compute node.
# cp /OVS/Repositories/0004fb0000030000db3286e8f9d42068/VirtualMachines/0004fb0000060000d9962b523e042f55/vm.cfg /patches/vm_disk_images/
# cp /OVS/Repositories/0004fb0000030000db3286e8f9d42068/VirtualDisks/0004fb00001200000d2006cdcd8ba4d0.img /patches/vm_disk_images/
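A quick hedged verification of the backup (standard commands, not from the readme; the destination path is the one used above):

# ls -lh /patches/vm_disk_images/

Confirm that both the vm.cfg and the .img file are present and that the disk image size matches the source file.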
2. Check the currently running kernel version:

# uname -r
For example:

# uname -r
2.6.32-400.26.3.el5uek
Also list the first two kernel entries in /boot/grub/grub.conf; these are the entries that will be retained:

# grep kernel /boot/grub/grub.conf | grep -v "#" | head -2
For example:

[root@vserver ~]# grep kernel /boot/grub/grub.conf | grep -v "#" | head -2
kernel /vmlinuz-2.6.32-400.26.3.el5uek ro root=/dev/VolGroup00/LogVol00 rhgb tsc=reliable nohpet nopmtimer hda=noprobe hdb=noprobe ide0=noprobe numa=off console=tty0 console=ttyS0,19200n8
kernel /vmlinuz-2.6.18-308.el5 ro root=/dev/VolGroup00/LogVol00 rhgb tsc=reliable nohpet nopmtimer hda=noprobe hdb=noprobe ide0=noprobe numa=off console=tty0 console=ttyS0,19200n8
[root@vserver ~]#

3. Procedure to remove older kernels

a. List the older kernel versions (the third and later entries in /boot/grub/grub.conf):

# grep kernel /boot/grub/grub.conf | grep -v "#" | tail -n +3 | awk '{ print $2 }' | sed 's/\/vmlinuz-//g'
b. Get the RPMs associated with each such entry:

# rpm -qa | grep kernel | grep <kernel version>
For example:

# rpm -qa | grep kernel | grep 2.6.32-200.21.1.el5uek
c. Remove each of the RPMs as follows:

# rpm -ev --allmatches <kernel rpm>
For example:

# rpm -ev --allmatches kernel-uek-2.6.32-200.21.1.el5uek
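As an illustration only (not from the readme), steps a through c could be combined into a single loop run as root on the vServer. This sketch assumes the same grub.conf layout shown above and removes every kernel RPM matching each older kernel version; review its output carefully before relying on it:

for kver in $(grep kernel /boot/grub/grub.conf | grep -v "#" | tail -n +3 | awk '{ print $2 }' | sed 's/\/vmlinuz-//g'); do
    rpms=$(rpm -qa | grep kernel | grep "$kver")
    [ -n "$rpms" ] && echo "Removing: $rpms" && rpm -ev --allmatches $rpms
done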
4. Now proceed with the exapatch command to patch that vServer.

5. After successfully patching the vServer, exapatch will reboot the vServer.
6. If the vServer hangs during boot at the GRUB prompt (this can be verified using OVMM), stop the vServer using OVMM.
a. Restore the vServer's image from the image backed up in step 1. For example:

cp /patches/vm_disk_images/0004fb0000120000cd49595b1a290cef.img /OVS/Repositories/0004fb0000030000355831c19e634db7/VirtualDisks/0004fb0000120000cd49595b1a290cef.img
b. Start the vServer using OVMM. The old vServer (before patching) will be restored.

c. Please contact Oracle Exalogic Support with a Service Request for the vServer hang.
NM2-GW switches upgrade fails with a misleading space error when /conf/configvalid is not set correctly

For information on this, refer to the known issue in <Note 1571367.1> titled "IB Switches FW upgrade to 2.1.4-1 using exapatch fails with "pre-patch FreeSpaceCheck failed" error".
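As a quick hedged pre-check (the expected contents of the file are documented in Note 1571367.1, not here), the flag file can be inspected on each NM2-GW switch before patching:

[root@nm2gw-ib01 ~]# cat /conf/configvalid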
Assets appear to be managed by both Proxy Controllers in the Exalogic Control BUI

In the Exalogic Control BUI, for a given asset (NM2-GW switches, NM2-36p switch, ZFS storage heads, PDU), only one Proxy Controller (PC) needs to be managing it. If a particular asset appears to be managed by both PC1 and PC2, it must be migrated to one of the proxy controllers. It does not matter whether it is migrated to PC1 or PC2.
NOTE: This does not apply to compute nodes. One or more compute nodes may appear in the Managed Assets list of both proxy controllers. For instance, el01cn01.example.com may appear as being managed by both PC1 and PC2. This is a known bug in Ops Center and can be ignored.

Exapatch Fails Upgrading the Guest vServers Due to GRUB Corruption When Doing a Linux Virtual PSU Upgrade from April 2014 (2.0.6.0.1) to July 2014 (2.0.6.0.2)

Refer to Note 1957315.1 for details on this known issue.

Xen RPM updates applied through patch 19715566 are overwritten by the July 2014 PSU

Symptoms:

With patch 19715566 applied (before applying the July 2014 PSU), the following xen RPM versions are observed on the node:

[root@compute-node ~]# rpm -qa | grep xen-
xen-tools-4.1.3-25.el5.94.1.3
xen-devel-4.1.3-25.el5.94.1.3
xen-debugger-4.1.3-25.el5.94.1.3
xen-pvhvm-devel-4.1.3-25.el5.94.1.3
xen-4.1.3-25.el5.94.1.3

After applying the July 2014 PSU, the following xen RPM versions are observed on the node:

[root@compute-node ~]# rpm -qa | grep xen-
xen-tools-4.1.3-25.el5.94
xen-devel-4.1.3-25.el5.94
xen-debugger-4.1.3-25.el5.94.1.3
xen-pvhvm-devel-4.1.3-25.el5.94.1.3
xen-4.1.3-25.el5.94

Notice that a few RPMs have been overwritten with older versions.

Cause:

In the July 2014 PSU, the rpm command is used with the force option to install the updated RPMs included in the PSU, which does not skip updates when newer versions of the RPMs are already installed on the node.

Solution/Workaround:

To resolve this, after upgrading to the July 2014 PSU, reapply patch 19715566 by following the instructions in its MOS note.
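After reapplying patch 19715566, the installed versions can be re-checked with the same command used in the Symptoms section above; based on that section, the expectation is that the newer .94.1.3 builds are reported again:

[root@compute-node ~]# rpm -qa | grep xen-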
Appendix C: Errata

1. The Oracle ZFS Storage appliance on X4-2 racks is incorrectly referred to as ZFS Storage 7320 in the PSU. The storage appliance on Exalogic X4-2 racks is the Oracle ZFS Storage ZS3-ES.

2. The pre-requisites section of the ZFS Software Upgrade contains an incorrect version of the ZFS ILOM.

Original Text:

3. Migration of assets managed by Proxy Controllers

In the troubleshooting MOS note, section "Problem: components are listed under both the ProxyControllers(PC)":

Original Text

In the Exalogic Control BUI, if it is observed that any of the component asset is listed under both the Proxy Controllers (PC), they need to be migrated to a single PC.

Updated Text

For a given switch (NM2-GW switches, NM2-36p switch), only one proxy controller needs to be managing it. If a particular switch appears to be managed by both PC1 and PC2, it must be migrated to one of the proxy controllers. It does not matter whether it is migrated to PC1 or PC2.

One or more compute nodes may appear in the Managed Assets list of both proxy controllers. For instance, el01cn01.example.com may appear as being managed by both PC1 and PC2. This is expected behavior; no migration is required for the compute nodes.

Other non-switch assets, such as ZFS storage heads and PDUs, may also appear in the Managed Assets list of both proxy controllers. Migration is not required for these assets prior to applying the April 2014 PSU; migration is required only for switches.

Log in to the EMOC BUI, navigate to the "Administration" item in the left panel, and find the entries for the 'PC1' and 'PC2' vServers. Select a Proxy Controller, say "PC1", in the left panel. In the center panel, click the "Managed Assets" tab and set the Asset Type Filter to "Network Switches" to get the list of switches managed by the 'PC1' Proxy Controller. Select the switch that you wish to migrate to the other Proxy Controller, "PC2". Click the icon that provides the "Migrate Assets" option. A confirmation dialog appears; click the 'Migrate' button to proceed. Once the migration finishes, a notification pop-up appears at the bottom right corner of the EMOC BUI, confirming the successful migration.

References

<NOTE:1314535.1> - Exalogic Patch Set Updates (PSU) Master Note
<NOTE:1329262.1> - How to Perform a Healthcheck on Exalogic
<NOTE:1449226.1> - Exachk Health-Check Tool for Exalogic

Attachments

This solution has no attachment.