
Asset ID: 1-79-2242177.1
Update Date: 2018-05-19
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution 2242177.1 : [ PCA ] 2.3.x Upgrade Checklist and Prerequisites


Related Items
  • Private Cloud Appliance
  • Private Cloud Appliance X5-2 Hardware
Related Categories
  • PLA-Support>Infrastructure>Operating Systems and Virtualization>Virtualization>Oracle PCA




In this Document
Purpose
Scope
Details
 Before upgrading
 0. The following chapters should be carefully reviewed:
 1. Raise a Proactive Service Request with Oracle Support.
 2. Run the pre-upgrade verification tools.
 a) all 2.3.X releases
 b) PCA 2.3.2 only
 3. Base PCA release
 4. Number of LUNs
 5. ZFS firmware release
 6. Check for multiple tenant groups
 a) pca-admin database (this is applicable to PCA 2.2.1 and higher)
 b) OVM database
 7. External Storage LUNs
 8. Check for OSWatcher customizations
 9. multipath.conf customizations
 10. Check for customized ILOM and Xsigo NTP settings
 11. Check for customized inet settings on MNs
 12. Check to see whether compute nodes have been reprovisioned or replaced
 13. Verify ILOM versions
 14. Verify all Compute Node names
 15. Verify the number of ethernet cards in the Xsigos
 16. Verify the interface labels on the ZFSSA
 17. Check for resilvering jobs on ZFSSA
 18. Check for hardware faults on all components
 a) Compute Nodes
 b) Management nodes
 c) Internal ZFS
 19. Check tinyproxy configuration
 20. Check for the Pool / o2cb Cluster's health
 21. Collect the number of objects in the database
 22. Verify failover works
 23. Customers making use of EM ONLY - Enterprise Manager 13c
 24. List the server profiles on the PCA
 25. Retrieve the OSA Settings of the BIOS of each Node (Compute Nodes and Management Nodes)
 26. No warning or error signs in OVMM GUI
 27. Supply Rack information
 28. Space requirements on the shared storage
 29. Override Global Server Update Group (PCA 2.3.1+)
 30. Network interface configuration.
 a) On the management nodes:
 b) On the compute nodes:
 31. Check for IB Symbol errors
 32. Internal ZFSSA protocols for NFS
 Before starting the actual install
 1. Disable Backup cron job
 2.  Customers making use of EM ONLY - Move PCA to blackout state in EM.
 Post Upgrade
 1. Management Nodes
 1. Names of the Unmanaged Storage Arrays
 2. Check for errors / warnings in Oracle VM
 3. PCA Dashboard status
 4. Networks check
 2. Compute Nodes
 1. [ All 2.3.x Versions ] Change min_free_kbytes on all compute nodes
 2. [ All 2.3.x Versions] Check that the fm package is installed on all compute nodes.
 3. [only for PCA 2.3.2 - MANDATORY] upgrade vmpinfo-sosreport on all compute nodes
 4. [only for PCA 2.3.2 - MANDATORY] Set dom0_vcpu pinning after upgrading compute nodes
 5. [Only for PCA 2.3.1 - MANDATORY]  Missing parameters
 6. Virtual Machine test
 3. Customers making use of EM ONLY - Recover PCA management node agent and certificate
 4. Reinstall Automatic Service Request (ASR) if it was installed before the upgrade
 Known Issues
 1. [ Applies only to PCA 2.3.1] XMS connection from slave
References


Applies to:

Private Cloud Appliance - Version 2.0.5 and later
Private Cloud Appliance X5-2 Hardware
Linux x86-64

Purpose

This document's purpose is to outline the various checks that need to be performed before and after the upgrade of a Private Cloud Appliance to release 2.3.x.

Scope

This document explains the different checks to perform before upgrading a Private Cloud Appliance rack to 2.3.x. Its purpose is not to document the upgrade process itself, but rather list the different aspects that need to be checked before and after performing the actual upgrade.

Details

Be aware that the Management Node upgrade to 2.3.x can take a significant amount of time, so plan a sufficient maintenance window accordingly: testing shows eight or more hours for the first management node due to the database export and transform.

Before upgrading

0. The following chapters should be carefully reviewed:

   Oracle® Private Cloud Appliance Release Notes for Release 2.3.1
     New Update Process with Proactive Support Integration
     https://docs.oracle.com/cd/E83758_01/E83757/html/index.html

   Oracle® Private Cloud Appliance Release Notes for Release 2.3.2
     New Update Process with Proactive Support Integration
     https://docs.oracle.com/cd/E83758_01/E89780/html/index.html

   Oracle® Private Cloud Appliance Release Notes for Release 2.3.3
     New Update Process with Proactive Support Integration
     https://docs.oracle.com/cd/E83758_01/E92215/html/index.html


   Oracle® Private Cloud Appliance Administrator's Guide for Release 2.3
     Chapter - 7.8 Environment Pre-Upgrade Validation and Software Update to Release 2.3.x
     http://docs.oracle.com/cd/E83758_01/E83754/html/admin-troubleshooting-preupgradescript.html

1. Raise a Proactive Service Request with Oracle Support.

Attach the output files from the script in step 2 and the manual data output from all other steps documented in this note to that Service Request.

Please wait for Oracle Support to review the uploaded data before starting the upgrade.

2. Run the pre-upgrade verification tools.

a) all 2.3.X releases

The pre-upgrade script is provided as part of the Oracle Private Cloud Appliance Release 2.3.X *.iso file. Download the zip files and follow the README instructions to re-assemble the .iso.zip file. Then copy that .iso.zip file to the active management node, unzip it, and mount the included .iso as a loopback device.
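
As an illustration only (the file name below follows the ovca-2.3.1-bxxxx.iso example used in this note, and <master-mn> is a placeholder for the IP or host name of your active management node), the copy and unzip steps could look like :

# scp ovca-2.3.1-bxxxx.iso.zip root@<master-mn>:/tmp/
# ssh root@<master-mn>
# cd /tmp
# unzip ovca-2.3.1-bxxxx.iso.zip

The loopback mount shown next then makes the ISO contents available :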

# mkdir /mnt/pca_2.3.X_iso
# mount -o loop ovca-2.3.1-bxxxx.iso /mnt/pca_2.3.X_iso

The file pre_upgrade_check.sh is located in the scripts subdirectory of the mounted ISO.

Change to the scripts directory and run the script.

# cd /mnt/pca_2.3.X_iso/scripts
# ./pre_upgrade_check.sh
 This will generate an output file located at /tmp/YYYY_MM_DD_HH.mm.SS/pre_upgrade_check.log

Please upload that log file to the Proactive SR created during Step 1. Do not attempt the upgrade if any of the automated pre-checks fail.

b) PCA 2.3.2 only

Once you unzip the p26982346_232_Linux-x86-64_1of2.zip file, you will find the enclosed pca_precheck_mysql.sh script.

Copy pca_precheck_mysql.sh to the PCA master management node :

# scp pca_precheck_mysql.sh root@10.100.1.101:/tmp/pca_precheck_mysql.sh

Then log in to the master management node and run this script prior to the upgrade:

# ssh root@10.100.1.101
# chmod +x /tmp/pca_precheck_mysql.sh
# /tmp/pca_precheck_mysql.sh

The results from this script will include further instructions on any additional steps you need to perform. Refer to Note 2334970.1 as directed in the script results.

NOTE: The 10.100.1.101 IP address used in this procedure is an example. Please substitute the IP address of your master management node.

This step is not applicable to PCA 2.3.3 upgrades.

3. Base PCA release

It is not a supported operation to upgrade a Private Cloud Appliance running any version lower than 2.1.1 to 2.3.x. Please ensure that the starting PCA image is at least 2.1.1 when attempting to upgrade to 2.3.x.

More details are available in Note 2235615.1
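
To confirm the currently installed controller software release, check the PCA Dashboard; as an alternative sketch (this assumes the /etc/ovca-info build information file is present on your release - if it is not, rely on the Dashboard), run on the active management node :

# cat /etc/ovca-info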

4. Number of LUNs

Check that the number of paths and LUNs on each compute node does not exceed the supported maximum, for example using :

# multipathd paths count
Paths: <value>
# multipath -ll | grep 'dm-' | wc -l

 

The first command should return 1024 or fewer paths; the second should return 256 or fewer LUNs.

Please refer to : http://docs.oracle.com/cd/E71897_01/E79050/html/pcarelnotes-maxconfig.html for details.
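
To run the same check from the master management node against every compute node in one pass, a simple loop such as the following can be used (the host names are placeholders - substitute the ovcacnXXr1 names of your rack; this assumes the usual password-less SSH from the management node to the compute nodes) :

# for cn in ovcacn07r1 ovcacn08r1 ovcacn09r1; do echo "== $cn =="; ssh root@$cn "multipath -ll | grep 'dm-' | wc -l"; done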

5. ZFS firmware release

SSH to both ZFS heads :

# ssh 192.168.4.1
# ssh 192.168.4.2

and check that both heads are running the same version :

> maintenance system updates show

6. Check for multiple tenant groups

a) pca-admin database (this is applicable to PCA 2.2.1 and higher)

Connect to the active management node and collect the output of :

# pca-admin list tenant-group

If there is more than one tenant group, please upgrade one tenant group at a time when running the compute node upgrade portion of the complete upgrade.

b) OVM database

Connect to the OVM CLI from the active management node :

# ssh localhost -p 10000 -l admin

Collect the output of :

OVM> list serverpool

7. External Storage LUNs

Please check that the external storage LUNs are not visible from the management nodes.

More details are available in Note 2148589.1

8. Check for OSWatcher customizations

In 2.3.x, OSWatcher is enabled by default on compute nodes and must be disabled prior to the upgrade by following the instructions in Note 2258195.1.

9. multipath.conf customizations

In 2.3.x, the ZFS stanza in the multipath.conf file on compute nodes will be overwritten; any other customizations will be left as is. A backup of the multipath.conf file will be saved as /etc/multipath.conf.pca.<timestamp>.
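
If you want your own reference copy to compare against after the upgrade (in addition to the automatic backup mentioned above), save one on each compute node before starting and diff it afterwards, for example :

# cp -p /etc/multipath.conf /root/multipath.conf.pre-2.3.x
# diff /root/multipath.conf.pre-2.3.x /etc/multipath.conf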

10. Check for customized ILOM and Xsigo NTP settings

In 2.3.x upgrades, NTP is configured by default for all components within the rack. Any custom ILOM and Xsigo NTP settings are discarded by the upgrade and replaced with the PCA settings, so that all rack components are synchronized against the same NTP servers. No particular action is needed for this check; simply verify whether any custom ILOM or Xsigo NTP settings exist, since they will be overwritten regardless of their former values.

11. Check for customized inet settings on MNs

These settings will be wiped out by the upgrade. If changes have been made to the inet settings, make a note of them and reimplement them after the upgrade has completed.

These settings are stored in the following configuration file :

/etc/postfix/main.cf
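
A quick way to record the current values before the upgrade is to keep a copy of the file and list the relevant parameters (the parameter names below are common postfix settings given purely as examples - capture whichever ones were customized on your rack) :

# cp -p /etc/postfix/main.cf /root/main.cf.pre-2.3.x
# grep -E '^(inet_interfaces|inet_protocols|relayhost|mynetworks)' /etc/postfix/main.cf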

12. Check to see whether compute nodes have been reprovisioned or replaced

If a motherboard or complete compute node has been replaced on the PCA rack being upgraded, the pre-upgrade check is likely to fail with :

[05/23/2017 07:39:07 20580] INFO (upgrade_utils:853) [Password Check] FAILED: The check failed on ilom. Failed to verify Compute Node ILOM password on 192.168.4.112.
[05/23/2017 07:39:07 20580] INFO (upgrade_utils:853) [Password Check] FAILED: The check failed on cn. Failed to verify Compute Node host password on 192.168.4.12.

In that case, raise a Service Request with Oracle Support.

Please raise a PCA Software collaboration to get these nodes removed from the inventory of the PCA.

13. Verify ILOM versions

Connect to each compute and management node and run :

# fwupdate list all
# ilomconfig list system-summary

The ILOM version should be 3.2.4.68 or higher for X5-2 Compute nodes

See : http://docs.oracle.com/cd/E83758_01/E83757/html/pcarelnotes-issues-hardware.html#pcarelnotes-x5-ilom-firmware-provisionfail-bug25392805

Please review Doc ID 2350974.1, How to upgrade the ILOM service processor firmware on PCA compute and management nodes, for instructions on how to upgrade the firmware if needed.

All required Compute Node ILOM upgrades should be completed before proceeding with the management node upgrade.

14. Verify all Compute Node names

The Compute Node names are hard-coded and should not be any different from ovcacnXXr1. Please open the Oracle VM Manager graphical user interface and check that no Compute Nodes have been renamed to anything else.
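
The same check can also be done from the OVM CLI on the active management node; every server name returned should follow the ovcacnXXr1 pattern :

# ssh localhost -p 10000 -l admin
OVM> list server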

15. Verify the number of ethernet cards in the Xsigos

SSH to both Xsigos in use :

ssh admin@192.168.4.204
ssh admin@192.168.4.205

Run at the prompt :

admin@ovcasw15r1[xsigo] show ethernet-card

This should return four ethernet cards per Xsigo, or eight cards in total.

16. Verify the interface labels on the ZFSSA

Verify the labels of ipmp1, ipmp2, vnic1, vnic2, ixgbe1, ixgbe2 and update the labels to be spec compliant if they are not.

This can be verified by connecting to both ZFS heads :

# ssh 192.168.4.1
# ssh 192.168.4.2

And run :

configuration net interfaces show

This should match the following output for the respective labels (last column) : 

> configuration net interfaces show
Interfaces:

INTERFACE   STATE  CLASS  LINKS        ADDRS              LABEL
ipmp1       up     ipmp   vnic1        192.168.4.100/24   IPMP_Ethernet_Interface
                          vnic2
ipmp2       up     ipmp   pffff_ibp0   192.168.40.1/24    IPMP_IPoIB_Interface
                          pffff_ibp1
ixgbe2      up     ip     ixgbe2       192.168.4.2/24     ixgbe2
pffff_ibp0  up     ip     pffff_ibp0   0.0.0.0/8          ibp0
pffff_ibp1  up     ip     pffff_ibp1   0.0.0.0/8          ibp1
vnic1       up     ip     vnic1        0.0.0.0/8          vnic1
vnic2       up     ip     vnic2        0.0.0.0/8          vnic2

Please check that ipmp2 is labeled IPMP_IPoIB_Interface; if it has a different label, please follow Note 2337830.1.

ixgbe1 net interface should only be used on ZFS head 1 (192.168.4.1);
ixgbe2 net interface should only be used on ZFS head 2 (192.168.4.2). 

17. Check for resilvering jobs on ZFSSA

Check if there are ongoing resilvering jobs on the ZFS Storage Appliance and wait for them to complete before triggering the upgrade.

This can be verified by connecting to both ZFS heads :

# ssh 192.168.4.1
# ssh 192.168.4.2

And run :

> configuration storage list

This should not return in-progress resilvering jobs.

18. Check for hardware faults on all components

Verify that there are no hardware faults on any of the rack components, for example by checking tech-support on the Xsigos and ILOM snapshots on the compute nodes.

a) Compute Nodes

There is a built-in command on the master management node which can be used to diagnose hardware faults. Please run :

# pca-admin diagnose ilom

b) Management nodes

Connect to the ILOM of each management node and run :

-> show faulty
Target | Property | Value
---------+------------+-------
The show faulty command should not return any entries on a healthy management node.

c) Internal ZFS

Connect to both head 1 and 2 of the internal ZFSSA from the active Management Node :

# ssh 192.168.4.1
# ssh 192.168.4.2

and run :

ovcasn01r1:> status show
ovcasn01r1:> exit
ovcasn02r1:> status show
ovcasn02r1:> exit

Capture the output of "status show".

19. Check tinyproxy configuration

Refer to Note 2148041.1

20. Check for the Pool / o2cb Cluster's health

Before performing the upgrade - check that all the compute nodes and all the pool(s) (Rack1_ServerPool and all the custom tenant groups) do not report any kind of warning (yellow warning sign) or critical errors (red cross) in the Oracle VM Manager GUI.

21. Collect the number of objects in the database

As the root user on the active management node, run the attached script.

This will return the number of objects and the number of jobs in the database. Note down these numbers.

22. Verify failover works

Run a reboot (possibly in a maintenance window) on the master management node to ensure that the slave is able to take over the master role :

# reboot
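
After the reboot, you can confirm from the other management node that it has taken over the master role, for example with the pca-check-master utility if it is present on your release :

# pca-check-master

Once the rebooted node is back online, check the PCA Dashboard to confirm that both management nodes again report a consistent master/slave state.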

23. Customers making use of EM ONLY - Enterprise Manager 13c

Please apply steps from Note 2280818.1.

IMPORTANT NOTE : These steps collect data needed for EM, such as a backup of the oraInventory. That data will NOT be recoverable if these steps are not followed properly before applying the upgrade, because the upgrade wipes the data on the management nodes.

24. List the server profiles on the PCA

Run the following command on the master management node and save its output:

# pca-admin list server-profile

25. Retrieve the OSA Settings of the BIOS of each Node (Compute Nodes and Management Nodes)

Connect to the ILOM of each Management Node and Compute Node using Secure Shell (SSH) and follow the note below (any method from that note is fine; using the XML dump is suggested).

[ PCA ] PCA 2.3.1 Upgrade: How to check the OSA BIOS status on PCA nodes. (Doc ID 2281943.1)

Please note that any OSA status that is not disabled should be reported as part of the pre-checks, for all nodes.

26. No warning or error signs in OVMM GUI

Check that there are no warning or error signs in Oracle VM Manager GUI on VMs, pools, Compute Nodes, LUNs, Repositories, Storage Servers.

27. Supply Rack information

List the number of compute nodes broken down by model (X5-2, X6-2, ...).

28. Space requirements on the shared storage

There should be a minimum of 2 Terabytes available on the /nfs/shared_storage/ mount point.

To check this, run on either management node :

# df -h | grep shared_storage

29. Override Global Server Update Group (PCA 2.3.1+)

For each server pool / tenant group, check that the "Override Global Server Update Group" checkbox is not checked. If it is checked, uncheck it. This may not be applicable depending on the Oracle VM release in use before the upgrade.

30. Network interface configuration.

Check that the correct interfaces are present on both the management nodes and compute nodes by running :

# ifconfig -a

a) On the management nodes:

Configured interfaces will include:

bond0, bond1, eth0, eth4, eth4B, eth5, eth5B, eth6, eth6B, eth7, eth7B, ib0, ib1, and bond2.

The following interfaces are configured by PCA in these subnets:

bond0 - 192.168.140.4 (master node only)
bond1 - 192.168.40.x
eth0 - 192.168.4.x
eth0:1 - 192.168.4.216 (master MN only)
bond2 is a placeholder for an external network connection, so its IP will vary.

b) On the compute nodes:

Configured interfaces will include:

bond0, bond1, eth0, eth4, eth4B, eth5, eth5B, eth6, eth6B, eth7, eth7B, ib0, ib1, bond2 and bond3.

The following interfaces will have IPs set by OVM:

bond0 - 192.168.140.x
bond1 - 192.168.40.x
eth0 - 192.168.4.x
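
As a quick sanity check of the PCA-managed addresses on a node, the expected interfaces can also be inspected individually, for example :

# for i in bond0 bond1 eth0; do echo "== $i =="; ifconfig $i | grep 'inet addr'; done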

31. Check for IB Symbol errors

From current MN master, issue :

# ibqueryerrors

This will report error counts from both IB switches - Please save this output.

32. Internal ZFSSA protocols for NFS

Collect the output of the following command on both management nodes :

# nfsstat -m

Each mounted share should use NFS protocol version 4.
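
A quick way to spot any share that is not mounted with NFS version 4 is to summarize the vers= mount option, for example :

# nfsstat -m | grep -o 'vers=[0-9.]*' | sort | uniq -c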

Before starting the actual install

1. Disable Backup cron job

On each Management node there is a backup cron job for the root user :

[root@ovcamn05r1 ~]# crontab -l
0 9,21 * * * /usr/sbin/pca-backup -q > /dev/null 2>&1

Disable that cron job by using :

# crontab -e 

and adding a # in front of the pca-backup entry :

#0 9,21 * * * /usr/sbin/pca-backup -q > /dev/null 2>&1

2.  Customers making use of EM ONLY - Move PCA to blackout state in EM.

Please apply specific steps from Note 2280818.1

Post Upgrade

Please review :
Note 2288094.1 for a known issue during compute node upgrades.
Note 2324886.1 for missing kernel parameters after Compute Node upgrade.

1. Management Nodes

1. Names of the Unmanaged Storage Arrays

The names of the Unmanaged Storage Arrays may no longer display correctly after the upgrade. There is a simple workaround for this issue.

More details are available in Note 2244130.1

2. Check for errors / warnings in Oracle VM

Connect to the Oracle VM Manager graphical user interface and check for the existence of :

  • Padlock icons against Compute Nodes or Storage Servers
  • Red cross(es) against Compute Nodes, Repositories or Storage Servers
  • Warning sign(s) against Compute Nodes, Repositories or Storage Servers

3. PCA Dashboard status

Verify in the PCA Dashboard that all compute nodes and other hardware elements show a green check mark on the right and that none shows a red cross.

4. Networks check

Check that all networks (custom or factory default) are present and correct.

2. Compute Nodes

1. [ All 2.3.x Versions ] Change min_free_kbytes on all compute nodes

Please apply corresponding steps from Note 2314504.1 and reboot the compute node after the change has been made permanent.

2. [ All 2.3.x Versions] Check that the fm package is installed on all compute nodes.

Check this using a command such as :

# rpm -q fm

If it is not installed, run :

# chkconfig ipmi on; service ipmi start; LFMA_UPDATE=1 /usr/bin/yum install fm -q -y --nogpgcheck

3. [only for PCA 2.3.2 - MANDATORY] upgrade vmpinfo-sosreport on all compute nodes

Please review Note 2339679.1 for a description of the problem and the required steps.

4. [only for PCA 2.3.2 - MANDATORY] Set dom0_vcpu pinning after upgrading compute nodes

Please review Note 2337757.1 for a description of the problem and the required steps.

5. [Only for PCA 2.3.1 - MANDATORY]  Missing parameters

Please review Note 2324886.1 for a description of the problem and the required steps.

6. Virtual Machine test

Start a test virtual machine, verify that its network(s) are functional, and test live migration by migrating it to a different compute node.
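
If you prefer to drive the live migration test from the OVM CLI instead of the GUI, a minimal sketch (test_vm and ovcacn08r1 are placeholder names - use one of your own test VMs and compute nodes) :

# ssh localhost -p 10000 -l admin
OVM> migrate Vm name=test_vm destServer=ovcacn08r1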

3. Customers making use of EM ONLY - Recover PCA management node agent and certificate

Please apply corresponding steps from Note 2280818.1 

4. Reinstall Automatic Service Request (ASR) if it was installed before the upgrade

Known Issues

1. [ Applies only to PCA 2.3.1] XMS connection from slave

This issue is described in Note 2278432.1

References

<NOTE:1940756.1> - The VMPScan Site Review and Cluster Analysis Tool
<NOTE:2244130.1> - [ PCA ] After a 2.3.1 Upgrade , the Unmanaged Storage Arrays Lost their respective Names in the Oracle VM GUI
<NOTE:2288094.1> - [ PCA ] After a 2.3.1 upgrade, a compute node is no longer able to start the o2cb service or mount any ocfs2 file systems (or the pool FS)
<NOTE:1628815.2> - Information Center: Oracle Private Cloud Appliance (PCA) - Patching & Maintenance
http://docs.oracle.com/cd/E71897_01/E79050/html/pcarelnotes-maxconfig.html
<NOTE:2278432.1> - [PCA] Upgrading first Management node to 2.3.1 loops with error “[Errno 111] Connection refused) while executing POST request for xmsSession when trying to Login by slave”
<NOTE:2282101.1> - [ PCA ] PCA 2.3.1 Upgrade: How to update the ILOM firmware on PCA nodes.
<NOTE:2324886.1> - [PCA] Two Required Kernel Parameters are Missing when a Compute Node upgrade to PCA 2.3.1
<NOTE:2334970.1> - [PCA] How to sync Oracle VM database passwords in PCA
<NOTE:2236828.1> - [ PCA ] A management node upgrade fails at " Exception: Attempting to collect OFM DB on remote MN: list index out of range "
<BUG:26198978> - 2.3.1 UPGRADE CN FAILED - OVM SERVER CLUSTER OUT OF SYNC
<NOTE:2337830.1> - [ PCA 2.3.2+ ] Attempting to upgrade fails at pre-checks phase with "[ZFS Label Check] FAILED: The check failed on zfs."
<NOTE:2235615.1> - [ PCA ] Attempting to upgrade a 2.0.5 or lower rack to 2.3.1 leaves both management nodes unresponsive
<NOTE:2281943.1> - Private Cloud Appliance (PCA) 2.3.1 Upgrade: How to check the Oracle System Assistant (OSA) BIOS status on PCA nodes.
<NOTE:2339679.1> - [ PCA 2.3.2 ] running vmpinfo on the Management node or sosreport on the Compute Nodes can cause soft lockups
<BUG:26534360> - CHECK DISK SPACE ON MN AND SHARED STORAGE BEFORE STARTING UPGRADE
<NOTE:2280818.1> - [ PCA ] Specific steps for customers making use of Enterprise Manager 13c when upgrading Private Cloud Appliance to release 2.3.1

Attachments
This solution has no attachment