
Asset ID: 1-79-1569461.1
Update Date:2018-03-20
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution  1569461.1 :   SuperCluster - Patching Best Practices For The Quarterly Full Stack Download Patch  


Related Items
  • Sun ZFS Storage 7320
  • Oracle Exadata Storage Server Software
  • Oracle SuperCluster T5-8 Half Rack
  • Solaris Operating System
  • Solaris Cluster
  • SPARC SuperCluster T4-4 Half Rack
  • Oracle SuperCluster T5-8 Full Rack
  • SPARC SuperCluster T4-4 Full Rack
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>SPARC SuperCluster>DB: SuperCluster_EST
  • _Old GCS Categories>ST>Server>Engineered Systems>SPARC SuperCluster>Install
  • _Old GCS Categories>ST>Server>Engineered Systems>SPARC SuperCluster>Patching


SuperCluster Patching Best Practices

In this Document
Purpose
Scope
  Companion Note Table
  READMEs can also be obtained without downloading the entire patchset by using the attachments at the bottom of the following documents
Details
 Planning
 Assessment
 What type of data should I provide before engaging Oracle? 
 Work to be done no less than 1 week prior to the patching window. 
 Items to do 48 hours before the patching window
 Items to do the day of patching
 Start of patching window
 Post patching obligations
 Generic feedback
References


Applies to:

Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster T5-8 Half Rack - Version All Versions to All Versions [Release All Releases]
Solaris SPARC Operating System - Version 10 1/13 U11 to 11.1 [Release 10.0 to 11.0]
SPARC SuperCluster T4-4 Full Rack - Version All Versions to All Versions [Release All Releases]
Oracle Exadata Storage Server Software - Version 11.2.3.2.1 to 11.2.3.2.1 [Release 11.2]
Linux x86-64
Oracle Solaris on SPARC (64-bit)

Purpose

This note covers the best practices associated with the complete project planning and implementation of the Oracle SuperCluster Quarterly Full Stack Download Patch. It is generic to all QFSDP releases starting with January 2014; however, a table below links you to the specific companion note that documents known issues for each QFSDP.

Scope

This note covers items associated with the planning, assessment, and delivery of the Oracle SuperCluster Quarterly Full Stack Download Patch.

 Companion Note Table

QFSDP Release    Companion Note
October 2014     SuperCluster - OCT 2014 Quarterly Full Stack Download Patch (QFSDP) Companion Document <Document 1955519.1>
July 2014        SuperCluster - July 2014 Quarterly Full Stack Download Patch (QFSDP) Companion Document <Document 1910505.1>
April 2014       SuperCluster - April 2014 Quarterly Full Stack Download Patch (QFSDP) Companion Document <Document 1669716.1>
January 2014     SuperCluster - January 2014 Quarterly Full Stack Download Patch (QFSDP) Companion Document <Document 1630752.1>

READMEs can also be obtained without downloading the entire patchset by using the attachments at the bottom of the following documents

 

Patch Bundle     Readme MOS Note
October 2014     SuperCluster - July 2014 Quarterly Full Stack Download Patch (QFSDP) Companion Document <Document 1910505.1>
July 2014        SuperCluster QFSDP July 2014 SCMU Readme <Document 1910077.1>
April 2014       SuperCluster QFSDP April 2014 SCMU Readme <Document 1682460.1>

Details

Planning

The Oracle SuperCluster Quarterly Full Stack Download Patch, hereinafter referred to as QFSDP, may contain components that require the shutdown and restart of various components of the engineered system. This includes, but is not limited to, the storage cells, zones, logical domains (LDoms), and entire compute nodes or physical domains (PDoms). Taking this into consideration, if you have deployed any software that is not configured for high availability, you will have to plan to fail those items over to another location within or outside of the Oracle SuperCluster, or schedule an outage for them.

Oracle Support requests that you start planning for your patching at least six weeks prior to your desired patching window, so you can ensure all required change management processes are in place and the assessment phase is completed four weeks prior to the patching window. This includes filling out the Oracle SuperCluster Patching Assessment Worksheet attached to this document.

The reason for the four weeks is that any given system may have one-off defect fixes applied that are not part of the standard Oracle SuperCluster software deployment. Your organization will need time to provide this information to Oracle Support so that we can determine whether those fixes are included in your target patching level and, if not, have sufficient lead time to build the necessary additional fixes and have them ready at least two to seven days before the patching window.

 

SuperCluster uses a Jumbo IDR (JIDR) process. The Jumbo IDR may be at a newer revision by the time you decide to patch to a specific QFSDP. If that is the case, then when you apply the QFSDP it is mandatory to upgrade the JIDR beyond the level delivered in the QFSDP. You can find the latest recommended version in SuperCluster - Solaris 11 Support Repository Updates (SRU) and Jumbo Interim Diagnostic Relief (JIDR) Support Matrix <Document 1632521.1>.
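
As a quick reference, the currently installed SRU level and any applied JIDRs on a Solaris 11 LDom can be checked with the same commands used later in the assessment section:

# pkg info entire | grep -i summary

# pkg list | grep idr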

 

Note: If Oracle Platinum Services or Oracle Advanced Customer Services is performing your patching exercise, follow the time frames applicable to each organization for submitting patching requests. However, those windows should not be shorter than four weeks if you have additional defect fixes.

 

Note: Never enter any patching window with defective hardware. Ensure any hardware faults are resolved prior to entering the patching window.


Note: QFSDPs typically support updating systems with component versions up to one year old; see the QFSDP README for details. If the assessment determines the system components are outside the supported scope of the desired QFSDP, you will need to perform a two-or-more-step upgrade: first apply an older QFSDP to bring the system to a level the desired QFSDP supports, then apply that QFSDP. These scenarios require a special patching assessment that can be completed by Oracle Support, or by Oracle Advanced Customer Services under the guidance of Oracle Support. Please allow an additional two weeks of lead time for these situations.

  

Note: To shorten the pre-patching work and the patching windows themselves, we request that strong consideration be given to allowing the creation of an IPMP group on the InfiniBand storage network in database zones, and in turn an NFSv3 share from the integrated ZFS Storage Appliance to each zone. The alternative is to copy the database patch components to each LDom and zone, which lengthens the cleanup time at the end of the patching process and could impact the patching window.

Assessment

If the QFSDP will be applied by Oracle Platinum Services or Oracle Advanced Customer Services, they will assist you in completing the Oracle SuperCluster Patching Assessment Worksheet. However, it will help expedite matters if you complete the General Information tab, the Host Information tab, and the Current Version column of the Component Versions and Counts tab to the best of your ability. If you are patching on your own and targeting a yet-to-be-released QFSDP, you will need to do the aforementioned and then open a Service Request with Oracle Support using your SuperCluster CSI and have a support engineer assist you in completing the assessment. Again, please allow the four to six weeks of lead time.

Note: Do not be put off by the sample times in the Patch Timing Estimates tab. Those numbers are a non-real-world example intended to account for every possible combination of options and virtualization strategies available on Oracle SuperCluster. Set the items you are not using to zero in the Count if Version Changes column of the Component Versions and Counts tab.

  

Note: The Oracle SuperCluster Patching Assessment Worksheet contains tool tips for just about every cell to assist in gathering information. The worksheet also uses simple mathematical macros for the timing calculations, so it expects the user to enable macros.

What type of data should I provide before engaging Oracle? 

Note: Most of how to gather this data is in the tool tips in the Oracle SuperCluster Patching Assessment Worksheet.

Database patch level information – use this information to find out whether your one-off patches are included in the target patching level or require a patch to be delivered on top of it.

$ORACLE_HOME/OPatch/opatch lsinventory

--Provide output for every differing Grid Infrastructure and Database home present on the Oracle SuperCluster. If all of your homes are patched to exactly the same levels with exactly the same one-off patches, then one output from a single Grid Infrastructure home and one output from a single Database home is sufficient. This will have to be a separate attachment, as the information is too verbose to fit in the Oracle SuperCluster Patching Assessment Worksheet.
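
For example, the output for each home can be captured to a file for attachment (the home paths below are illustrative only; substitute your own Grid Infrastructure and Database home paths):

$ /u01/app/11.2.0.4/grid/OPatch/opatch lsinventory > /tmp/grid_lsinventory.txt

$ /u01/app/oracle/product/11.2.0.4/dbhome_1/OPatch/opatch lsinventory > /tmp/db_lsinventory.txt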

Compute Node/ Operating System information - use this information to find out if your IDRs are included or require an IDR to be delivered on the target patching level.  The Solaris 10 information will have to be a separate attachment as it is too verbose to fit in the assessment worksheet.

--Solaris 11 patch information

# pkg list entire

# pkg list |grep idr

--Solaris 10 patch information

# uname -a

# showrev -p

LDom information – use this on the primary LDom to determine the number of LDoms for each compute node and in turn the total for the Oracle SuperCluster

# ldm list
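
If only a count of the LDoms is needed, an illustrative one-liner using the parseable output of ldm can be run on the primary LDom of each compute node:

# ldm list -p | grep -c "^DOMAIN"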

Zone Information – use this on each LDom from above to determine the zone information.

# zoneadm list -cv
Note: Determine whether each zone is an exavm zone or an application zone. Please realize that even if a zone contains a database, it is not an exavm zone unless it was explicitly built with the exavm tools or Java OneCommand and is accessing Exadata storage.
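
Similarly, an illustrative count of the non-global zones configured on an LDom:

# zoneadm list -c | grep -v "^global$" | wc -l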

Infiniband HCA firmware version – use this information to determine if your HCA firmware is at the latest supported revision.

--Solaris 11

# fwflash -l -c IB

--Solaris 10

# fwflash -l -c IB 

Note: For Solaris 10 you will need to obtain and install the SUNWfirmwareflash pkg.

Fault information – use this to determine whether any hardware faults or critical software faults are reported on an LDom.

# fmadm faulty 

Note: If you suspect any faulty hardware, get an ILOM snapshot and open a Service Request with Oracle Support to get it rectified well before your patching window. Never enter a patching window with any defective hardware.

Compute Node firmware version – use this information to determine if your firmware is at the latest supported version for your hardware type.

--Log into ILOM and run:

-> show /System/Firmware

  

Note: Alternatively, instead of running all the commands above, if you are familiar with reading the output you can gather an Explorer for each LDom on the Oracle SuperCluster with the options below.

 

--Solaris 11

# explorer -w default,localzones,smfextended

--Solaris 10

# /opt/SUNWexplo/bin/explorer

Exadata storage cell information – use this information to determine version information and to ensure that there is no defective hardware or other serious fault on the cells. 

Note: If ssh root equivalency is set up and you have the cells in a group file, you may use dcli to gather all the data from all cells at once.

# imageinfo

# cellcli -e list griddisk

# cellcli -e list physicaldisk

# cellcli -e list alerthistory
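
If dcli is configured, a sketch of gathering the same data from every cell at once (assuming a cell_group file listing one cell hostname per line):

# dcli -g cell_group -l root "imageinfo"

# dcli -g cell_group -l root "cellcli -e list physicaldisk"

# dcli -g cell_group -l root "cellcli -e list alerthistory"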

  

Note: If you find or suspect any bad hardware, please gather sundiag.sh output from the suspect cell so it can be remediated well before the patching window. Never enter a patching window with any defective hardware.

Power Distribution Unit Information – use this to obtain version information.

--Connect to the PDU over HTTP using the host name or IP and go to the "Module Info" section.

InfiniBand Switch information – use this to obtain version information and to verify the integrity of the InfiniBand partitioning. If the InfiniBand partitioning is invalid, you would probably already know, because portions of the Oracle SuperCluster software would not be functioning. Never enter a patching window with defective InfiniBand partitioning.

--Connect to the switches with ssh as root and run:

# version

# smpartition list active

 
ZFS Storage Appliance – use this information to determine whether your storage heads are at the correct AK software level.

--From a compute node LDom, substitute the InfiniBand storage IP address of your integrated ZFS Storage Appliance and run the following:
# ssh 192.168.29.1 maintenance system updates show | grep current

Once you have all the information detailed in this section, use the Oracle SuperCluster Patching Assessment Worksheet and the README of the QFSDP you are going to apply to determine which of your components will be patched and how many. Then use the time calculations to determine estimated patching windows.

Note: It is possible to have multiple patching windows if your single Oracle SuperCluster is divided into virtual systems, for example one set of LDoms or zones for test and another set for production. This has a few caveats. If a firmware patch is to be applied to a compute node, all environments on that compute node will have to be rebooted regardless of which LDom they are in. If your environments are virtualized as Solaris 11 local zones on a single LDom, then all of the zones will be patched at the same time as the global zone and will be rebooted when the LDom is rebooted. If you have multiple Oracle homes sharing a common Grid Infrastructure home, they all have to be patched as a single unit; you cannot run a single RAC cluster with different patch levels for Grid Infrastructure.

Gather the assessment worksheet and all other detailed data too large to fit in it, and provide them to the parties performing your patching so they can determine whether there are defect fixes they will have to request on top of the target QFSDP. The request for these items should be made within two days of completing the assessment and should explicitly state the target patching date.

Work to be done no less than 1 week prior to the patching window. 

Note: This is critical because the Oracle SuperCluster QFSDP bundles are quite large, currently anywhere between 7 and 9 GB. If you are in an environment with a slow WAN, it could take an entire day to copy the source over. Once you have downloaded it, you will have to merge the pieces together in accordance with the top-level README file, then extract the main bundle and navigate through the directories uncompressing the underlying archives.

  

Note: The Oracle SuperCluster QFSDP has several README files, each associated with the components it patches. Once you are done uncompressing the patch bundle, ensure you read them all. The top-level README lists what they are. Read through them now for the first time and again a few times over the next couple of weeks so you can leverage them to build your patch plan.

Log into the target SuperCluster and verify the health of the IPS repository via the document SuperCluster - Best Practices for local Image Packaging System (IPS) software package repository <Document 1625719.1>.

Note: The exact names of the shares do not matter unless you are building the repository from scratch. If the repository is not healthy, it is best to rebuild it following that document. If the share names are different but the repository is otherwise healthy, proceed to the next step.
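
A minimal health check, assuming a local repository path similar to the defaults described in <Document 1625719.1> (the path below is illustrative):

# pkg publisher

# pkgrepo info -s /export/repoSolaris11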

If the repository check above shows that the ssc project is already present, skip this code block. If not, create it. Modify the IP address or hostname as required.

# ssh root@192.168.1.15
orlt5sn1:>configuration cluster show
--Make sure the controller state is either AKCS_OWNER or AKCS_CLUSTERED before proceeding.
orlt5sn1:>shares
orlt5sn1:shares>project ssc
orlt5sn1:shares ssc (uncommitted)>set mountpoint=/export/ssc
orlt5sn1:shares ssc (uncommitted)>commit
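
A quick verification, from the shares context of the appliance CLI, that the ssc project now exists (hedged sketch using the example hostname above):

orlt5sn1:shares>select ssc
orlt5sn1:shares ssc>get mountpoint
--The mountpoint should report /export/ssc.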

If you are running databases in zones (exavm), check for connectivity in each zone to the InfiniBand storage network in accordance with SuperCluster- Best Practices for creating 8503 infiniband storage network IPMP interface in Solaris 11 local zones (application or exavm) <Document 1630341.1>.

Note: If this is an Oracle Platinum Services or Oracle Advanced Customer Services patching engagement, ensure you have permission from the customer to do this and that they have approved it via their applicable change management processes.

Create a share on the integrated ZFS Storage Appliance and a mount point on the primary LDom of the first compute node, and mount it with NFSv3. Modify the IP address, hostname, and subnet where applicable.

  

# mkdir /QFSDP
# chmod 777 /QFSDP
# ssh root@192.168.1.15
orlt5sn1:>shares
orlt5sn1:shares>select ssc
orlt5sn1:shares ssc>filesystem qfsdp
orlt5sn1:shares ssc/qfsdp (uncommitted)>set root_permissions=755
orlt5sn1:shares ssc/qfsdp (uncommitted)>set sharenfs="sec=sys,rw=@192.168.1.0/24,root=@192.168.1.0/24"
orlt5sn1:shares ssc/qfsdp (uncommitted)>commit
orlt5sn1:shares ssc/qfsdp (uncommitted)>exit
# vi /etc/vfstab
--Add the following line, without the leading "--", modified for your integrated ZFS Storage Appliance IP address, and then save the file.
--192.168.29.1:/export/ssc/qfsdp - /QFSDP - yes rw,bg,hard,nointr,rsize=131072,wsize=131072,proto=tcp,vers=3
# mount -a
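
A simple check that the share is mounted as expected:

# df -h /QFSDP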

Copy the QFSDP patch pieces, any additional Database one-off patches, and IDRs into this QFSDP directory and follow the instructions for merging the pieces and extracting. Once everything is extracted, change the ownership of all the database bundle patch items, the OPatch archive, and any additional database-related patches (excluding the Exadata storage cell patches) to the Grid Infrastructure home user and group. If for any reason you could not create the storage network in the zones or the NFS mount point, do the same to /u01/patches on the primary LDom of all compute nodes, and then push the database bits into /u01/patches of all exavm database zones.
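
For example, a hedged sketch of the ownership change (the directory and the oracle:oinstall owner are illustrative; use the actual extracted database patch directories and your Grid Infrastructure home owner and group):

# chown -R oracle:oinstall /QFSDP/Database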

Ensure sufficient disk space. Substitute the ZFS pool that your /u01 mount point is on. For T5-8 and later installs this may be a mount point in rpool.

# zfs list <poolname> 

Check for free space in the rpool of each LDom and zone. There is no hard number, but we typically like to see them less than 75% full. If they are over this, speak to the systems administrators and have them clean up what they can.

# zpool list rpool
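
An illustrative one-liner to flag any pool above the 75% guideline:

# zpool list -H -o name,capacity | awk '{cap=$2; sub("%","",cap); if (cap+0 > 75) print $1, "is", $2, "full"}'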

While you are on the system, do another sanity check of the storage cell hardware to make sure nothing has gone bad since the initial assessment. Do the same for the operating system by checking fmadm. If you notice anything, open a Service Request immediately to address the faulty hardware. Never go into a patching window with faulty hardware.

--Solaris LDoms

# fmadm faulty

--Exadata storage cells - get the following output from each cell.

Note: Check all at once with dcli if it is enabled.

# cellcli -e list griddisk

# cellcli -e list physicaldisk

# cellcli -e list alerthistory

Items to do 48 hours before the patching window

Check the status of any database one-off patches or operating system IDRs that you have not yet received. If you cannot get a confirmed ETA before the patching window, escalate the Service Request to a severity 1 if it is not already, and start evaluating the possibility of other patching windows. This will require close coordination with application teams and any business units served.

Do another sanity check of the storage cell hardware to make sure nothing has gone bad since the initial assessment. Do the same for the LDoms by checking fmadm. If you notice anything, open a Service Request immediately to address the faulty hardware. Never go into a patching window with faulty hardware.

--Solaris LDoms

# fmadm faulty

--Exadata storage cells - get the following output from each cell.

Note: Check all at once with dcli if it is enabled.

# cellcli -e list griddisk

# cellcli -e list physicaldisk

# cellcli -e list alerthistory

Check for free space in the rpool of each LDom and zone. There is no hard number, but we typically like to see them less than 75% full. If they are over this, speak to the systems administrators and have them clean up what they can.

# zpool list rpool 

If you still do not have the additional required database one-off patches or operating system IDRs, escalate again to Oracle Support and request a guarantee that you will have them by your patching window.

Items to do the day of patching

Obtain the additional required database one-off patches and/or IDRs. If they are not available, you will have to evaluate the risk of running without them; if that risk is too high, you will have to cancel the patching window.

Do another sanity check of the storage cell hardware to make sure nothing has gone bad since the initial assessment. Do the same for the operating system by checking fmadm. If you notice anything, open a Service Request immediately to address the faulty hardware. Work with the Service Request owner and the assigned field engineer to determine how much delay, if any, this will put on your patching window. Keep your project managers, application teams, and business units informed of the status throughout the day. Never go into a patching window with faulty hardware.

--Solaris LDoms

# fmadm faulty

--Exadata storage cells - get the following output from each cell.

Note: Check all at once with dcli if it is enabled.  

# cellcli -e list griddisk

# cellcli -e list physicaldisk

# cellcli -e list alerthistory

Refresh the SuperCluster Supported Versions note for your hardware type, the SuperCluster Critical Issues note, and most importantly the SuperCluster QFSDP Companion note for your QFSDP, checking for any updates.

Note: Never rely on static copies of MOS documentation. All MOS content is dynamic and can be updated at any time. 

Start of patching window

Connect to the ILOMs of the Exadata storage cells and compute nodes and reset the SP:

-> reset /SP

Confirm with all application teams and database administrators that it is OK to patch the targeted nodes. Patch in accordance with your action plan derived from the assessment, the patching method (rolling or non-rolling), and the QFSDP README files. In case of failure during the window, raise a Service Request immediately at a severity commensurate with the business impact, and provide any log or trace information at the time of filing to expedite resolution.

Post patching obligations

Once you have completed patching and ensured all components are at target levels and are operational, go through the README files one more time and ensure all steps were completed. Receive the ISO images into the local repository if you did not do so during the preparation work. These are typically the Solaris SRU ISO, exafamily ISO, IDRs, and Solaris Cluster ISO; there may be more or fewer depending on what was patched in the particular QFSDP. Instructions are clearly defined in the README files. Ensure that ssctuner is functioning and reporting the output of its work in /var/log/messages, and ensure all database LMS, LGWR, and VKTM processes are at a scheduling class of FX-60 for all databases and ASM instances where applicable. Check that the entries in /etc/system are time stamped earlier than the latest reboot.

# cat /etc/system

# who -b
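
Hedged examples of the ssctuner and scheduling class checks (the service name and process name patterns are as typically seen on SuperCluster; adjust to your environment):

# svcs ssctuner

# grep ssctuner /var/log/messages | tail

# ps -efc | egrep 'ora_lms|ora_lgwr|ora_vktm' | grep -v egrep
--The CLS column should show FX and the PRI column 60 for these processes.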

Unless you did the install from the NFS mount, clean up all installation files transferred to /u01/patches. No matter which method you chose, make sure none of the rpools on any of the LDoms or zones are at or over 85% capacity. If they are, look for areas to clean up such as /var/tmp, $ORACLE_BASE/diag (for those systems where /u01 is a volume under rpool), and extraneous Boot Environments (BEs); what counts as extraneous is up to each shop's best practice. Ensure the integrated ZFS Storage Appliance is left with the same head active as when you entered the patching window; the reason is that some systems have NFS mounts over the management network, and the management network does not fail over with the head. Document all findings during the patching exercise in the applicable Service Requests, and also provide feedback on the companion document and the overall assessment process.
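
For instance, extraneous Boot Environments can be listed and removed as follows (the BE name is illustrative; never remove the active BE or a fallback BE you still rely on):

# beadm list

# beadm destroy solaris-backup-1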

Generic feedback

For general feedback about this note or the QFSDP companion notes, or general positive and negative feedback about the Oracle SuperCluster patching process, please let us know via the Oracle SuperCluster MOS Community QFSDP thread. Do not use that thread for technical troubleshooting; handle all of that via Service Requests. Also, do not post any personal or customer-specific information that you would not want seen by, potentially, all MOS users.

References

<NOTE:1452277.1> - SuperCluster Critical Issues
<NOTE:1567979.1> - Oracle SuperCluster Supported Software Versions - All Hardware Types
<NOTE:2056975.1> - Contents of each SuperCluster Quarterly Full Stack Download Patch (QFSDP)
<NOTE:2090692.1> - SuperCluster QFSDP Installation Checklists

Attachments
This solution has no attachment