Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1452277.1
Update Date:2018-01-23
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution  1452277.1 :   SuperCluster Critical Issues  


Related Items
  • SPARC SuperCluster T4-4 Half Rack
  • Oracle Database - Enterprise Edition
  • Oracle SuperCluster M6-32 Hardware
  • Oracle SuperCluster Specific Software
  • Solaris Operating System
  • Oracle Exadata Storage Server Software
  • Oracle SuperCluster M7 Hardware
  • Oracle SuperCluster T5-8 Hardware
  • SPARC SuperCluster T4-4 Full Rack
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>SPARC SuperCluster>DB: SuperCluster_EST




Applies to:

SPARC SuperCluster T4-4 Full Rack - Version All Versions to All Versions [Release All Releases]
Oracle Database - Enterprise Edition - Version 11.2.0.4 to 12.1.0.2 [Release 11.2 to 12.1]
Oracle SuperCluster M6-32 Hardware - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster Specific Software - Version 1.x to 1.x [Release 1.0]
Oracle Exadata Storage Server Software - Version 11.2.2.3.1 to 11.2.2.3.1 [Release 11.2]
Oracle Solaris on SPARC (64-bit)
Linux x86-64

Purpose

The following tables list fixes and workarounds that are deemed critical for Oracle SuperCluster T4-4, T5-8, M6-32 and M7. If you are running an affected release, it is highly recommended that you either employ the recommended workaround or install the recommended patch.

A new date column has been added per customer request, but it will not be retroactively filled out for existing issues.


Scope

Only issues affecting Oracle SuperCluster T4-4, T5-8, M6-32 and M7  that meet one or more of the following criteria are included in this document:

  • Causes on-disk corruption or data loss
  • Causes failure that impacts system-wide availability
  • Causes intermittent wrong results
  • Is expected to impact a large number of SuperCluster customers

Details

 

There may be some duplication of items from <Document 1270094.1> Exadata Critical Issues; however, you should make sure you review the Exadata document in conjunction with this one.

 

For patching-specific best practices and known issues, refer to SuperCluster - Patching Best Practices For The Quarterly Full Stack Download Patch <Document 1569461.1>.

SuperCluster Tools 

Some of these tools may not be specific to SuperCluster, but the table will address their usage on SuperCluster.

  

# | Applies to | Issue | Fix or Workaround | Date Updated

SCT_1    

All SuperCluster versions running Database in Zones. CAUTION - Java OneCommand can destroy the storage cell disks and griddisks. Take extreme care when you run this utility in an existing environment. In particular, running the Java OneCommand undo option on certain steps (Create Cell disk, for example) can cause complete destruction of all the griddisks on the storage cells. In addition, re-running the griddisk creation step or mistakenly specifying a non-unique diskgroup in OEDA will result in the destruction of existing griddisks. Note, too, that older versions of Java OneCommand also destroy cell disks and griddisks with the "Create Cell Disks" step. IMPORTANT WARNING - make sure you back up all existing databases BEFORE running Java OneCommand.

IMPORTANT - Always use the latest OEDA and Java OneCommand patch.  Refer to the OneCommand section in MOS Note 888828.1 for details.
 

SCT_2

SuperCluster systems using osc-config-backup

This tool has issues in the following versions that require mandatory patching.

The affected tool versions are v1.1 and v1.1.1, installed via SuperCluster platform v2.3.8 and v2.3.13:
v1.1 : pkg://exa-family/system/platform/supercluster/osc-config-backup@0.5.11,5.11-2.3.0.1014:20170112T223654Z
v1.1.1 : pkg://exa-family/system/platform/supercluster/osc-config-backup@0.5.11,5.11-2.3.0.1044:20170130T204306Z

If your version is newer than these, this issue does not apply.

Download exa-family Patch 25993487: ORACLE SOLARIS EXA-FAMILY 2.3.0.1095 REPO ISO IMAGE (SPARC 64-BIT).

Extract and stage the p5p archive, set it as an exa-family repository, run pkg update osc-config-backup, and then unset this temporary exa-family publisher.
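A minimal sketch of that sequence, assuming the p5p archive from the patch has been staged at /var/tmp/exa-family.p5p (the path is illustrative only):

# pkg set-publisher -g file:///var/tmp/exa-family.p5p exa-family
# pkg update osc-config-backup
# pkg unset-publisher exa-family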

5/27/2017

SCT_3

SuperCluster systems using SuperCluster IO domains

SuperCluster IO domains in "Error" State may be Deleted by SVA Health Monitor (Doc ID 2342509.1)

Follow the instructions in the referenced note.

1/17/2018

Oracle Engineered Systems Hardware Manager (OESHM) 

These issues are specific to OESHM on Oracle SuperCluster

  

# | Applies to | Issue | Fix or Workaround | Date Updated

OESHM_1   

All SuperCluster M7 systems with OESHM version 1.0. M7 platforms will see frequent SP resets as well as OESHM errors about SP connectivity. Apply the patch in accordance with SuperCluster OESHM 1.0 causes SP reset due to TLI mismatch <Document 2147363.1>.
 6/8/2016


 

Oracle Solaris 11

 
# | Applies to | Issue | Fix or Workaround | Date Updated

SOL_11_21

Solaris 11.3

A race condition in the RDS socket layer will lead to periodic 2-second operational delays, which will surface as log file parallel write delays that increase the averages reported for log file sync. Operating system Bug 26288397 - LOG FILE SYNC HIGH DUE FREQUENT PERIODIC POSTS TAKING ALMOST 2 SECONDS

Apply the supercluster-solaris custom incorporation that includes the fix for your SRU level. This fix should be considered mandatory for performance reasons.

Solaris 11.3 SRU19 custom incorporation: solaris/supercluster-solaris@0.5.11,5.11-0.175.3.19.0.5.0.11031905.10000105 (or a greater 19.0.5 version)

Solaris 11.3 SRU16 custom incorporation: solaris/supercluster-solaris@0.5.11,5.11-.175.3.16.0.3.0.11031603.10000106 (or a greater 16.0.3 version)

This will be fixed in the OCT 2017 QFSDP custom incorporation
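As a rough check (a sketch, not a required step), you can confirm which supercluster-solaris incorporation is installed in a domain or zone before and after applying the fix:

# pkg list supercluster-solaris
# pkg info supercluster-solaris | grep FMRI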

11/6/2017

SOL_11_20

All SuperCluster systems running Solaris 11.3 JUL 2016, or OCT 2016 QFSDP

RDMA anonymous port exhaustion due to scheduled or unscheduled cell outages while DB / CRS is up and running.

Please note the patches for this issue are considered mandatory.

The fix for this issue needs to be in place before attempting rolling cell maintenance.

If you are scheduled to patch to the OCT 2016 QFSDP, it is strongly advised to go to JAN 2017 instead to save time in your patching window, as item DB_27 is mandatory as well for both the JUL 2016 and OCT 2016 QFSDP patch levels.

Permanent fix in JAN 2017 QFSDP.

<Note 2226961.1 > SuperCluster: Critical Issue: SOL_11_20 Mandatory Patch: rdma anon port exhaustion can occur if communication to Exadata Storage Server is interrupted

2/8/2017

SOL_11_19

All versions. The OES/ID SMF service does not exist in many SuperCluster LDoms or Zones; typically T4-4, but this can apply to any manually or OpsCenter-created zone. It is mandatory to create this service in all LDoms and/or zones that do not have it. Instructions are included in SuperCluster: OES/ID SMF Service does not exist in some SuperCluster LDoms and Zones <Document 2165959.1>. 7/28/2016

SOL_11_18

SuperCluster M7 and T5-8 with V2.0 functionality. Reboot of SuperCluster IO domains can result in PCIe errors on the InfiniBand HCA. Follow the instructions in SuperCluster - Reboot of SuperCluster IO domains can result in PCIE errors on the Infiniband HCA <Document 2150184.1>. 6/16/2016

SOL_11_17

All hardware types, all Solaris versions. Transient threads can lead to instance crashes, node evictions and random database or application performance issues. See SuperCluster - Transient Threads can lead to instance crashes, node evictions and random database or application performance issues <Document 2149887.1>. 6/15/2016

SOL_11_16

All hardware types, Solaris 11.2. An InfiniBand switch reboot may cause database evictions if there are a large number of RDS connections. This should be reviewed prior to applying the JULY 2015 QFSDP, as it moves the OS to 11.2, and should also be reviewed by SuperCluster V2.0 customers.
Please review and follow the recommendations from document 2043654.1. 08/15/2015

SOL_11_14

T4-4

Solaris 11.1 environments that were initially installed at Solaris 11.0

There is a chance that your IPMP groups are missing their companion ports. Please verify with ipmpstat -g; you should see output similar to the example below, with each pair having one active and one inactive port in the bond.
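The sample output was not reproduced in this note; as a minimal verification sketch, the following commands list the IPMP groups and the state of their underlying interfaces:

# ipmpstat -g
# ipmpstat -i

In the per-interface view, each pair should show one active port and one inactive (standby) companion.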

 

 

Contact support for remediation so we can ensure they are set up according to SuperCluster best practices.

 

SOL_11_13

T4-4, T5-8

Solaris 11.0 /11.1

The OpsCenter installation enables svc:/network/dns/multicast:default, which will lead to GPnP issues with RAC installed on the same host. <Bug 17024367> You may see ORA-29783: GPnP attribute SET failed with error [CLSGPNP_NOT_FOUND].

Disable dns/multicast:

# svcadm disable svc:/network/dns/multicast:default
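To confirm the change (a small sketch using the same service FMRI):

# svcs svc:/network/dns/multicast:default

The service should report a state of disabled.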

 

SOL_11_12

T4-4, T5-8

Solaris 11.1

If you have issues importing the zone pools, or your LUNs seem to be missing after a reboot but show fine on your storage appliance, you may have encountered this problem. You may also see NOTICE: iscsi connection(5) unable to connect to target iqn. <unique id> in the messages file while the system is coming up.

You will have to upgrade to Solaris 11.1 SRU 7.5 and obtain IDR 808.x. (See item SOL_11_15 for more information.)

This will expose issue SOL_11_10; if you will be running Oracle databases out of this LDom, or Solaris 11 zones inside the domain, get an IDR for issue SOL_11_10.

 

SOL_11_11

 T4-4, T5-8,M6-32

Solaris 11.0 and 11.1

Memory capping for zones is not supported on SuperCluster. Memory capping can be an effective tool for memory management when configured correctly. However, it is possible for problems to occur when memory capping is misconfigured (for example, the cap is set lower than the resident set size of memory).

Check the export of your zones via zonecfg; if you have capped memory, you are advised to use alternative methods for managing memory on SuperCluster. This applies to both virtual and physical memory capping.
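A minimal sketch of that check (the zone name is illustrative):

# zonecfg -z dbzone1 export | grep capped-memory

Any capped-memory output indicates a memory cap is configured for that zone.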

 

SOL_11_10

T4-4, T5-8

Solaris 11.1 SRU 7.5 only

Solaris 11.1 SRU 7.5 is only supported with Oracle Databases using 11.2.0.3.16, 11.2.0.3.17 and 11.2.0.3.18, respectively with IDR 552.1, due to an issue with diskmon compatibility.

Contact support for this IDR if you are on SRU 7.5 and meet these criteria. The fix is also available in Solaris 11.1 SRU 8.4.

 

SOL_11_9

T4-4

Solaris 11.0 and Solaris 11.1

A Solaris Exadata / SPARC SuperCluster adaptive replacement cache (ARC) issue can lead to slow compute node performance, apparent LDom hangs and possible node eviction.

Due to Solaris bug 15813348; fixed in Solaris 11.1 SRU 3.4 and above.


 

SOL_11_8

T4-4

Solaris 11.1

After installing QFSDP component SSCMU_2013.04.tar.gz to Solaris domains, 'zpool status' will report that a new ZFS pool version is available.

Booting earlier boot environments will not be possible once the ZFS pool version is upgraded. For this reason, the ZFS pool version should *not* be upgraded immediately. Instead, it is recommended to operate the system in production, allowing it to 'soak' at the April QFSDP software level. When sufficient soak time has elapsed to satisfy the operator the system is running nominally, and there has been no reason to revert to the previous boot environment, then the operator may proceed to upgrade the ZFS pool version.

 

To upgrade the zpool format, issue the commands 'zpool upgrade -a', 'zfs mount -a', 'zfs upgrade -a' and 'beadm create SSCMU_2013.04_ZU'.

Creation of the SSCMU_2013.04_ZU boot environment is done purely for the purposes of backup. Note that the ZFS pool upgrade takes place in the live environment; there is no need to reboot the system.
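A sketch of that sequence as run from the affected Solaris 11 domain, using the boot environment name given above:

# zpool upgrade -a
# zfs mount -a
# zfs upgrade -a
# beadm create SSCMU_2013.04_ZU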

 

SOL_11_7

Solaris  11.1

Fix for <BUG 16409079>: threads stuck in fed_baseline are causing Oracle DB timeouts. This can surface as node evictions and slowly degrading performance on those systems with Solaris 11.1 SRU4 and above. This is a mandatory patch for anyone on the QFSDP for APRIL 2013, and also for those that have applied Solaris 11.1 SRU4 or above to address other issues.

Download and apply idr482.7 in accordance with <Document 1547278.1>. The IDR can be obtained via <Patch 17898194>.

Please note the IDR is only for Solaris 11.1 SRU 5.5; if need be, upgrade to Solaris 11.1 SRU 5.5 prior to applying this IDR. The preferred method of upgrading to Solaris 11.1 SRU 5.5 is via the April 2013 QFSDP <Patch 16346054>. If you have the issue on SRU 7.5, contact support, as the IDR for that version is still in testing.

 

SOL_11_6

T4-4,T5-8

Solaris 11.1

The RDS service is disabled by default after upgrading to Solaris 11.1 (any SRU).
This will cause CRS to be unable to start because ASM cannot reach the cells. This applies to both LDoms and Zones.

Enable the service before or after upgrading Solaris 11 LDoms and Zones:

# svcadm enable rds

Please note this should not be an issue with newer install tools, but it may be an issue if manually bringing an older T4-4 system up from scratch.

 

SOL_11_5

 T4-4 , T5-8, M6-32

Solaris 11.0 & Solaris 11.1

RDS issues contributing to
RDS Latency
RAC Node Evictions
Intermittent spikes in cluster waits
ORA-27300 MTU errors

Install and verify that ssctuner is running in all LDoms, and your /etc/system file will be maintained according to best practices.

# svcs -a | grep ssctuner

Please make sure you update ssctuner with each QFSDP and that you have rebooted after it is updated. As of the Jan 2014 QFSDP this is done as part of install_smu.
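To confirm the installed ssctuner package level in a domain (a sketch; the reported version will differ by QFSDP level):

# pkg list ssctuner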

 

 

SOL_11_4

 T4-4 , T5-8

Solaris 11.0 & Solaris 11.1

Solaris 11 and ZFS Storage Appliance Software (ZFSSA)
May Encounter Data Integrity Issues Following an Unclean Shutdown of the System

Solaris 11 and ZFS Storage Appliance Software (ZFSSA) May Encounter Data Integrity Issues Following an Unclean Shutdown of the System <Document 1502451.1>

 

SOL_11_3

T4-4

Oracle Solaris 11.0 pre SRU12

Bug 7174049: ixgbe unplumbed interfaces show errors in /var/adm/messages

This can also lead to performance issues, such as delayed kernel threads, which may trigger some RDS latency.

Plumb up unused 10g interfaces to prevent the polling:

for interface in `dladm show-phys | grep ixgbe | grep down | awk '{print $1}'`; do ipadm create-ip $interface; done

Apply the October QFSDP. See SPARC SuperCluster T-4 with Exadata Storage Server: Supported Versions <Document 1567979.1> for more information on the latest patches.

 

SOL11_2

T4-4 , T5-8,M6-32

Solaris 11.0 & Solaris 11.1

CR 7172851 - System hang, threads blocked in DISM code. See SuperCluster - OSM (Optimized Shared Memory) (12c) is Supported on SuperCluster, DISM (Dynamic Intimate Shared Memory) (11g) is not <Document 1468297.1>.

SOL11_1

T4-4,T5-8

All LDoms running Solaris 11.x

7157525: rds-ping reports spikes of very high latency

1) As root run modinfo and get the module id for nxge
2) As root "modunload -i XXX" (where XXX is the nxge module ID from modinfo).
3) Add the following to /etc/system to prevent it from reloading
#Prevent the unused nxge driver from loading, as a temporary
#workaround for CR 7157525
exclude:nxge

 

  Oracle Solaris 10

# | Applies to | Issue | Fix or Workaround | Date Updated

SOL_10_6

 

Systems running Oracle Virtual Machine (OVM) Server for SPARC are affected by a race condition in virtual switch in Solaris 11.1.9.5.1 through 11.1.13.6.0 and in Solaris 10 with patch 150031-02 through 150031-04. The race condition may cause physical interface hang or a TCP packet corruption for packets originated by Guest domains. The corruption can occur only when 'extended-mapin-space' is set to 'off' and only occurs during periods of high stress on the network interfaces. The only packets affected are those destined for the external network.

Follow the workaround in Solaris 10 and 11 Virtual Network Switch Can Corrupt TCP Packets Or Hang Interface When 'extended-mapin-space' is Off (Doc ID 1593243.1).

 # ldm set-domain extended-mapin-space=on <ldom name>

For all LDoms: the LDoms will have to be rebooted once this modification is made.
 

SOL_10_5

Oracle Solaris 10 LDoms running on SuperCluster

After installing QFSDP component SSCMU_2013.04.tar.gz to Solaris domains, 'zpool status' will report that a new ZFS pool version is available.

Booting earlier boot environments will not be possible once the ZFS pool version is upgraded. For this reason, the ZFS pool version should *not* be upgraded immediately. Instead, it is recommended to operate the system in production, allowing it to 'soak' at the April QFSDP software level. When sufficient soak time has elapsed to satisfy the operator the system is running nominally, and there has been no reason to revert to the previous boot environment, then the operator may proceed to upgrade the ZFS pool version.

 

To upgrade the zpool format, issue the commands 'zpool upgrade -a' and 'lucreate -n SSCMU_2013.04_ZU'.

Creation of the SSCMU_2013.04_ZU boot environment is done purely for the purposes of backup. Note that the ZFS pool upgrade takes place in the live environment, there is no need to reboot the system.

 

SOL_10_4

SolarisCluster3.3u1 on Oracle Solaris 10 running on SuperCluster

The READMEs refer the operator to the 'Oracle Solaris Cluster System Administration Guide' for detailed patching instructions. These instructions indicate to use 'boot -sx' and to perform patching in single-user mode if patching the active boot environment.

*IMPORTANT* Do not use 'boot -sx'; use 'boot -x' instead to boot the system out of cluster mode, and then perform patching while the domain is in multi-user mode.

 

SOL_10_3

Oracle Solaris 10 running on SuperCluster

Bug 7174049: ixgbe unplumbed interfaces show errors in /var/adm/messages

This can also lead to performance issues, such as delayed kernel threads, which may trigger some RDS latency.

Plumb up unused 10g interfaces to prevent the polling:

for interface in `dladm show-dev | grep ixgbe | grep down | awk '{print $1}'`; do touch /etc/hostname.$interface; ifconfig $interface up; done

 

 

SOL_10_2

Oracle Solaris 10 running on SuperCluster. CR 7146107 - ib_sw/ibd: S10 LDoms lose connectivity to the IB fabric on SPARC SuperCluster.

Change the Solaris 10 LDoms, Zones and branded zones to use Unreliable Datagram (UD) instead of Reliable Connected (RC) mode:

1) Edit /kernel/drv/ibd.conf and change the 1s to 0s.
2) Reboot the Zone, LDom, etc.

 

SOL_10_1

Oracle Solaris 10 running on SuperCluster. CR 7157525 - RDS: excessively high rds-ping latency.

1) As root, run modinfo and get the module id for nxge.
2) As root, run "modunload -i XXX" (where XXX is the nxge module ID from modinfo).
3) Add the following to /etc/system to prevent it from reloading:
#Prevent the unused nxge driver from loading, as a temporary
#workaround for CR 7157525
exclude:nxge
 

 Infiniband Switches

# | Applies to | Issue | Fix or Workaround | Date Updated

Exadata Storage Cells

 

PLEASE NOTE: All Cell, Generic Database, Switch and GI issues from the Exadata Critical Issues Note also apply.

 

 
# | Applies to | Issue | Fix or Workaround | Date Updated

ESS_11 (EX_40 in Exadata Critical Issues)

Exadata Storage cells running older versions of 12.2 or 12.1

(EX40) Storage servers with 8TB high capacity disks running Exadata older 12.2 or 12.1 versions require software update to receive replacement drives <Document 2352138.1>

Upgrade the cell software in accordance with:

(EX40) Storage servers with 8TB high capacity disks running Exadata older 12.2 or 12.1 versions require software update to receive replacement drives <Document 2352138.1>

1/23/2018

ESS_10

Exadata Storage cells running 12.1.x and 12.2.x with version 12.1 databases

Byte Swap Optimization can lead to flash card failures that could cascade into data loss and/or corruption.

Mandatory fix: disable Byte Swap Optimization if you are on, or before you upgrade to, Exadata storage cell versions 12.1.x and 12.2.x.

SuperCluster: Critical Issue ESS_10 Mandatory Action to Disable Byte Swap Optimization on All Cells or Risk Data Loss or Corruption <Document 2325475.1>

11/6/2017

ESS_9
(EX_37 in Exadata Critical Issues)

Exadata X6 storage servers with write-back flash cache enabled using the default flash firmware supplied with Exadata versions lower than 12.1.2.3.4.

Bug 25595250 - A flash predictive failure on an Exadata X6 storage server with write-back flash cache enabled may lead to corruption in primary and/or secondary ASM mirror copies, and may propagate to other storage servers during certain ASM rebalance operations.

Fixed in Exadata 12.1.2.3.4. See Document 2242320.1 for details.

NOTE: if going out of band to 12.1.2.3.4 or higher while the database is running 11.2.0.4 (any BP version), then the following one-off must be applied:

13245134

See Exadata/SuperCluster: SQLs fail with ORA-27626: Exadata error: 242 (Doc ID 2250760.1)

 

 04/11/2017

ESS_8

(EX_ in Exadata Critical Issues)

SuperCluster, all versions, with deployments that deploy multiple grid disks per cell disk.

After storage server upgrade from 12.1.2.1.3 or earlier, CREATE or ALTER GRIDDISK may result in cell disk metadata corruption, error ORA-600 [addNewSegmentsToGDisk_2], and loss of cell disk content

Review (EX31) After storage server upgrade from 12.1.2.1.3 or earlier, CREATE or ALTER GRIDDISK may result in cell disk metadata corruption, error ORA-600 [addNewSegmentsToGDisk_2], and loss of cell disk content <Document 2195523.1> to determine if you are at risk and recommended course of action.

10/26/2016

ESS_7

SuperCluster, all versions, with storage cell version 12.1.2.3.0 to 12.1.2.3.2

ORA-600 [RWOROFPRFASTUNPACKROWSETS:OOBP]
ORA-600 [kcfis_dump_app_state]
ORA-600 [kcfis_dump_global_context_all]
ORA-600 [qesrCopyOrigRowsetOpns()+480]
ORA-600 [kxhrHash()+64]

Follow SuperCluster: Critical Issue ORA-600 [RWOROFPRFASTUNPACKROWSETS:OOBP] alone or combined with other ORA-600 errors possible with cell version 12.1.2.3.0 and above <Document 2196717.1>

Even if you are not seeing this error, disabling this optimization should be considered mandatory.

10/24/2016

ESS_6

(EX_24 in Exadata Critical Issues)

Exadata Storage Server 12.1.2.1.0 and 12.1.2.1.1

 

After replacing a failed system disk (disk 0 or disk 1), the new disk is not correctly configured leaving the system vulnerable to the other system disk failing.

 

 Fixed in Exadata 12.1.2.1.2.  See Document 2032402.1 for additional details.  08/15/2015

ESS_5

(EX_23 in Exadata Critical Issues)

Exadata Storage Server 12.1.2.1.0 and 12.1.2.1.1

 

Bug 21174310 - Wrong results, ORA-1438 errors, or other internal errors are possible from smart scan offloaded queries against HCC or OLTP compressed tables stored on Exadata storage for databases upgraded from Oracle Database 11.2 to 12.1.

 

 Fixed in Exadata 12.1.2.1.2.  See Document 2032464.1 for additional details.

This also requires DB <Patch 20881450>

 08/15/2015

ESS_4

Cell version 12.1.2.1.2

Unable to create new databases or start existing databases against > 12.1.2.1.1 cells due to control file errors such as ORA-00200 or ORA-00205. This is due to a new cell check that looks for unique DB_UNIQUE_NAMEs across all instances sharing the cells, regardless of which virtual hosts they are in.

 Please carefully review and follow 2044088.1 08/15/2015

ESS_3

All cell versions before 12.1.2.1.0

An Exadata storage cell metadata corruption can occur after an indeterminate number of create and/or alter griddisk commands. Please carefully read and run the script from: Bug 19695225 - Running Many Create or Alter Griddisk Commands Over Time Causes Cell Disk Metadata Corruption (ORA-600 [addNewSegmentsToGDisk_2]) and Loss of Cell Disk Content (Doc ID 1991445.1)

If you are in the bug condition (the script returns 31), stop everything, take an RMAN backup of every single database, and then open an SR with Support using your SuperCluster CSI. Please note the likelihood of this is rare. For all others, no matter the return value, please obtain <Patch 19695225> for your version or upgrade to 12.1.2.1.
 08/15/2015

ESS_2

11.2.0.4 Database and Grid Infrastructure with 11.2.3.2.1 Exadata Storage Cells

KFOD cannot discover disks. This can be encountered during upgrades as well, and will be indicated by rootupgrade.sh reporting disk-not-found errors.

If upgrading or installing 11.2.0.4 on SuperCluster that will be accessing 11.2.3.2.1 storage cells then you must apply <Patch 16547261> prior to the upgrade/install.

 

Database

 
# | Applies to | Issue | Fix or Workaround | Date Updated

DB_29

GIPSU 12.2.0.1.171003/171017 part of QFSPD Oct 2017

Node eviction stemming from pfiles being run against GI processes, causing their threads to get delayed.

SuperCluster : Node eviction after apply GIPSU 12.2.0.1.171003/171017 part of QFSPD Oct 2017 <Document 2176610.1>

12/19/2017

DB_28

SuperCluster Grid Infrastructure provided with JULY 2017 QFSDP  12.1.0.2.170718

RAC nodes failing to start due to voting disk corruption following patching

Apply OCW <Patch 26512962> to the Grid Infrastructure Home. 

09/13/2017

DB_27

SuperCluster systems running 12.1.0.2.161018 or 12.1.0.2.160719 OCWPSU

A generic bug impacting SuperCluster systems: frequent RAC node evictions that appear to be network heartbeat related. If detected in time, a pstack <ocssd.bin pid> will show several threads in the function clsdadr_bucket_syncb.

This Patch should be considered mandatory.

If you are scheduled to patch to the OCT 2016 QFSDP, it is strongly advised to go to JAN 2017 instead to save time in your patching window, as item SOL_11_20 is mandatory as well for both the JUL 2016 and OCT 2016 QFSDP patch levels.

Permanent fix in JAN 2017 QFSDP.

<Note 2227319.1> SuperCluster Critical issue: DB_27 :Mandatory patch: Bug 25233268 Leading to Frequent Node Evictions with JUL and OCT 2016 QFSDP

 

DB_26

All databases, all versions, regardless of storage used or whether deployed in DB or application domains. Use of the database parameter use_large_pages=false is completely unsupported on SuperCluster systems. Using it can cause unnecessary performance implications at the DB or OS kernel level, especially for larger SGA sizes. Set use_large_pages=true, or unset it completely (true is the default), and restart your databases. The Solaris operating system is optimized by default to take advantage of large pages. 10/24/2016
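A minimal sketch of the check and change described above (run in each database; the parameter is not dynamic, so a restart is still required):

SQL> show parameter use_large_pages
SQL> alter system set use_large_pages=TRUE scope=spfile sid='*';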

DB_25

11.2.0.4 through 12.1.0.2. RAC node evicting and not rejoining the cluster. See SuperCluster : RAC : CRS not able to rejoin the cluster following node eviction or reboot due to CSSD <Document 2166436.1>. 7/28/2016

DB_24

12.1.0.2. ASM XDMG process exiting in a way that can hang zones and/or logical domains. See SuperCluster: 12.1.0.2 ASM XDMG process causes hang in PR_P_LOCK on ORA-15311: process terminated due to fenced I/O <Document 2166445.1>. 7/28/2016

DB_23

CRS, all versions. Storage network improperly listed as a public interface in oifcfg. See SuperCluster: storage network IB interface listed as public in oifcfg getif could result in improper nodeapp VIP failover <Document 2150668.1>. 6/16/2016

DB_22

DB and ASM 11.2.0.3 BP 25 and below. Bug 20116094: ASM/kfod does not discover the griddisks on SPARC systems running 11.2.0.3 BP 25 and below against 12.1.2.1.0 and above storage cells. This issue is typically found during patching if the cells are patched prior to the DB and GI homes.

The bug has been fixed in 11.2.0.3 BP 26 onwards

Apply 11.2.0.3 BP 26 or one-off patch for Bug.20116094

 09/1/2015

DB_21

ASM 12.1.0.2 Bug 21281532 - ASM rebalance interrupted with errors ORA-600 [kfdAtbUpdate_11_02] and ORA-600 [kfdAtUnlock00]. See Document 2031709.1 for additional details. 08/15/2015

DB_20

ASM 12.1.0.2

Bug 20904530 - During disk resync ORA-600 [kfdsBlk_verCb] reported due to corruption in ASM staleness registry.

See Document 2028222.1 for additional details. 08/15/2015

DB_19

11.2.0.4.x running against 12.1.2.x Exadata storage cells

After restoring an RMAN backup in this combination and then running a subsequent backup, or running DBV, data block corruption is detected. The trace file could show:
Bad header found during validation.

Apply <Patch 20952966> to the 11.2.0.4 DB home(s), or redo the initial restore with the workaround of setting _cell_fast_file_restore=FALSE in the database spfile. The patch is the preferred approach and should be considered mandatory for all 11.2.0.4 databases accessing Exadata Storage cell version 12.1.2.1.x. 5/9/2014

DB_18

11.2.0.4 and 12.1.0.2

11.2.0.4: Bug 10194190 - Solaris: Process spin and/or ASM and DB crash if RAC instance up for > 248 days <Document 10194190.8>. 12.1.0.2: Bug 22901797 - LMHB (OSPID: 29160): TERMINATING THE INSTANCE DUE TO ERROR 29770

Fixed in 11.2.0.4.9 and above. Other documentation makes it appear this is fixed in 12.1.0.2.5, but it is not; it is fixed in the April 2016 12.1.0.2.DBBP:160419.
 6/7/2016

DB_17

12.1.0.2.4 (JAN 2015 level)

Bug 20591915 has introduced a regression in DBBP 12.1.0.2.4 (Jan 2015) for Solaris SPARC Exadata machines. Because of this regression, the XDMG process may crash on SuperCluster, causing ASM core files under the GI home.

 

Also can be rediscovered with

 

Bug 20591915 - Grid disk asmmodestatus query hangs when a grid disk is inactive.

 

This issue causes CellCLI command "list griddisk attributes asmmodestatus" to hang, which subsequently causes rolling cell patching to hang when upgrading from Exadata 12.1.1.1.1, or earlier, to any later Exadata version when Grid Infrastructure is version 12.1.0.2.4 (DBBP4) or 12.1.0.2.5 (DBBP5).

 

<Patch 20591915>   to 12.1.0.2.4
Please note this needs to be applied in the ASM home (GI)

 This is also fixed in 12.1.0.2.6 PSU and above.

 

DB_16

11.2.0.3 through 12.1.0.2

Critical  Performance  enhancements for the database on SPARC

19308965 RAW HAZARDS SEEN WITH RDBMS CODE ON SOLARIS T5   
13846337 QESASIMPLEMULTICOLKEYCOMPARE NOT OPTMIZED FOR SOLARIS SPARC64
12660972 CHECKSUM CODE NEEDS REVISTING IN LIGHT OF NEW PROCESSORS

11.2.0.3.21 or later plus <Patch 20097385> and <Patch 12660972>

11.2.0.4.15 and BP  below plus <Patch 19839616> and <Patch 12660972>

11.2.0.4.16 and above plus  <Patch 12660972>

12.1.0.2.6 and below plus <Patch 

 6/23/2015

DB_15

11.2.0.3 and 11.2.0.4 ASM instances

<Bug 17997507> - xdmg process exits without closing skgxp context when ora-15311 is seen.

This can actually show itself as EXAVM database zones getting stuck in the shutdown state, in conjunction with a command like ps -ef hanging in the global zone.

Fixed in 11.2.0.3.24 and 11.2.0.4.7. If at an earlier BP, search MOS for patch 17997507 and SPARC; if one does not exist for your BP level, contact support.

 

DB_14

11.2.0.3 Grid Infrastructure

<Bug 13798847> - add multiple ports to scan_listener fails

Apply the latest SuperCluster 11.2.0.3.9 GI PSU Merge which will be documented in the Supported Versions note for your hardware type. Latest is MLR <Bug 19459715>

 

DB_13

11.2.0.3 to 12.1.0.1 Grid Infrastructure

<Bug 17722664> - clsa crash during client connection cleanup for a large number of changing connections. Fixed in 12.1.0.2.

11.2.0.3

Apply the latest SuperCluster 11.2.0.3.9 GI PSU Merge which will be documented in the Supported Versions note for your hardware type. Latest is MLR <Bug 19459715>

11.2.0.4 contact support for a merge in with <Bug 16429265> and your current level of GI PSU

12.1.0.1 contact support for a merge in with your current level of GI PSU

Fixed in 12.1.0.2

 

DB_12

Systems with one of the following grid infrastructure home versions:

    11.2.0.4 BP1-BP5
    11.2.0.3 BP22

Same as item DB_24 in the Exadata critical issues note

Fixed in BP 23 and above; however, you should get the fix for DB_11 through DB_14 via the latest SuperCluster 11.2.0.3.9 GI PSU Merge, which will be documented in the Supported Versions note for your hardware type. Latest is MLR <Bug 19459715>

 

DB_11

11.2.0.3 and 11.2.0.4 Grid Infrastructure

<Bug 17443419> - chm (ora.crf) can't be online in solaris local zone (solaris sparc64) fixed in 12.1

Apply the latest SuperCluster 11.2.0.3.9 GI PSU Merge which will be documented in the Supported Versions note for your hardware type. Latest is MLR <Bug 19459715>

 

DB_10    

11.2.0.3 to 11.2.0.4 GI / ASM upgrade

<Bug 17837626 >

HAIP failures in the

  

orarootagent_root.log

CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'hostname' failed
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on
'hostname'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'hostname'
succeeded
CRS-4000: Command Start failed, or completed with errors.

  

Workaround

In the 11.2.0.4 home:

cd <GRID_HOME>/crs/install
vi s_crsconfig_lib.pm

Look for the function s_is_sun_ipmp (in vi: /is_sun_ipmp) and make the following change at the end of that function:

  # made it all the way out without finding any IPMP private
  #return FALSE;
  return TRUE;

  

 

DB_9

RMAN incremental backups created with one of the following database patch sets:
  • 12.1.0.1 GIPSU1 or earlier
  • 11.2.0.3 BP21 or earlier
  • any 11.2.0.2
  • any 11.2.0.1

Bug 16057129 - Exadata cell optimized incremental backup can miss some blocks if a database file grows larger while the file is being backed up. A missed block can lead to stuck recovery and ORA-600[3020] errors if the incremental backup is used for media recovery.

See Document 16057129.8 for details.

Existing RMAN incremental backups taken without the bug fix in place should be considered invalid and not usable for database recovery, incrementally updating level 0 backups, or standby database creation.

RMAN full backups, level 0 backups that are not part of an incrementally updated backup strategy, and database recovery using archived redo logs are not affected by this issue.

 Step 1.Set the following parameter in all databases that use Exadata storage:

_disable_cell_optimized_backups=TRUE

SQL> alter system set "_disable_cell_optimized_backups"=TRUE scope=both;

The parameter specified above may be removed after the fix for bug 16057129 is installed by upgrade or by applying an interim patch. See below for fix availability.

Step 2. Create new RMAN backups. Minimally a new RMAN cumulative incremental backup must be taken. In addition, level 0 backups that are part of an incrementally updated backup strategy must be recreated.

Fix availability
Fixed in 12.1.0.1 GIPSU2 (planned January 2014)
Fixed in 11.2.0.4.0
Fixed in 11.2.0.3 BP22 (planned January 2014)
Fixed in Patch 16057129 for 11.2.0.3 BP21
Fixed in Patch 17599908 for 11.2.0.2 BP22

 

DB_8

All DB versions

For RAC databases in LDoms or Zones with more than one IB bond interface, OneCommand does not set all interfaces in oifcfg nor in the cluster_interconnects parameter of the ASM and DB spfiles.

You can check this in the ASM and DB instances with 'show parameter cluster_interconnects' and at the RAC level with 'oifcfg getif'. If you have multiple interfaces available, add them into oifcfg and into cluster_interconnects in each ASM and DB instance. Make sure you assign the right IP addresses in cluster_interconnects to the right SIDs based on the host location of each instance.
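A hedged sketch of those checks and changes (the interface name, subnet and SID values below are illustrative only for your environment; the spfile change requires an instance restart to take effect):

# oifcfg getif
# oifcfg setif -global bondib1/192.168.10.0:cluster_interconnect

SQL> show parameter cluster_interconnects
SQL> alter system set cluster_interconnects='192.168.10.1:192.168.10.5' scope=spfile sid='orcl1';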

 

DB_6

11.2.0.3.x Grid Infrastructure

<Bug 13604285>
ora.net1.network keeps failing over.

Key indicator "Networkagent: check link false" in orarootagent log combined with the network resource constantly failing over around the cluster nodes.

All current exadata bundle patches through BP21 require this fix.

If you have existing one offs on your Grid Infrastructure you will  have to open an SR to Support for a merge.

 

DB_4

 

<Bug 12865682>- byte swapping causing some of the extra overhead. This can lead to a performance degradation in hash join plans on big endian platforms.

Download and apply <Patch 12865682> for Solaris SPARC to all of your 11.2.0.3.x database homes, even if they are not using the Exadata storage. This is now considered a mandatory patch for SPARC SuperCluster. This does not need to be backported to a specific Exadata BP level as it does not conflict with the bundle patch. This is now fixed as part of 11.2.0.4, 12.1.x and the 11.2.0.3.21 Bundle Patch, which means you do not need this patch if you are on 11.2.0.3.21 or beyond.
5/9/2015

DB_2

11.2.0.3.x Grid Infrastructure and DB

Default thread priority of RT (real time) for LMS can cause blocking of kernel threads to the CPU. Also, LGWR being in the TS (timesharing) class can lead to excessive log writer write times, leading to general database performance issues.

The fix for this is often called the Critical Threads fix or the FX-60 fix.

There are multiple ways to correct this. One is to apply a one off patch to all Database and Grid Infrastructure homes. The one off patch can be downloaded using <Patch 12951619>.

The preferred method for systems without databases in zones is to be patched to the OCT 2013 QFSDP and to ensure you have installed and are running the ssctuner service from that exa-family version, ssctuner@0.5.11,5.11-1.5.0.5.

For databases running in exavm zones you need to be at the version of ssctuner provided in the JAN 2014 QFSDP (ssctuner@0.5.11,5.11-1.5.9.237) or above, and ensure your zones are running with the TS scheduling class. See <Document 1618396.1> for more information on how to verify and rectify the scheduling class and how to update ssctuner out of band with the QFSDP.

Please also review and comply with SuperCluster - ssctuner is not adjusting the scheduling class of all of lms, lgwr and vktm processes to FX-60 <Document 1628298.1>. This is an additional step that has to be done under the supervision of an Oracle badged employee.
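A quick, hedged spot check after ssctuner has applied the fix (the process name patterns are illustrative; on Solaris the CLS column printed by ps -c shows the scheduling class, which should read FX for these processes):

# ps -efc | egrep 'lms|lgwr|vktm' | grep -v grep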

 

DB_1         

All Database versions

 CR 7172851 System hang, threads blocked in DISM code

Dynamic Intimate Shared Memory (DISM) is not supported for use on SPARC SuperCluster Solaris environments in instances other than the ASM instance <Document 1468297.1>  

  ZFS Storage Appliance

 
# | Applies to | Issue | Fix or Workaround | Date Updated

ZFS_2

All platforms / all versions. Using ZFS deduplication is not supported on SuperCluster. None; using ZFS deduplication is not supported on SuperCluster. See Oracle SuperCluster ZFS Storage Appliance Best Practices <Document 2002988.1>. 7/11/2016

ZFS_1  

 

2011.1.3  (Version string 2011.04.24.3.0,1-1.19)

2011.1.4  (Version string 2011.04.24.4.0,1-1.21)

2011.1.4.1  (Version string 2011.04.24.4.1,1-1.21)

Solaris 11 and ZFS Storage Appliance Software (ZFSSA) May Encounter Data Integrity Issues Following an Unclean Shutdown of the System Solaris 11 and ZFS Storage Appliance Software (ZFSSA) May Encounter Data Integrity Issues Following an Unclean Shutdown of the System <Document 1502451.1>  

 SuperCluster  Hardware Issues

 
# | Applies to | Issue | Fix or Workaround | Date Updated

HW_2

SPARC M6-32 - Version All Versions and later. Probable fault diagnosis failures; please apply the firmware referenced in the "Fix or Workaround" section. <Patch 22982110>. See SPARC M5-32 and M6-32 Servers With Sun System Firmware 9.4.2.d, 9.4.2.e, 9.5.1.c, or 9.5.3 may Misidentify Faulty Components or Fail to Diagnose Faulty Components (Doc ID 2133737.1). 5/20/2016

HW_1    

 SPARC M6-32 - Version All Versions and later
SPARC T5-8 - Version All Versions and later
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
Oracle Exalytics T5-8 - Version All Versions and later
SPARC T5-2 - Version All Versions and later
Information in this document applies to any platform.
 

PCIEX-8000-J5 and/or PCIEX-8000-KP FMA faults similar to those below will be reported. Systems utilizing InfiniBand fabric, e.g., the T5-8 SSC, are seen to be more susceptible to these faults.

The fmdump -e output will contain ereport.io.pciex.dl.btlp and/or ereport.io.pciex.dl.bdllp events.
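A minimal sketch for checking whether these ereports are being logged on a domain:

# fmdump -e | egrep 'pciex.dl.btlp|pciex.dl.bdllp'
# fmadm faulty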

 Follow I/O SERD threshold values are set too low and may result in PCIEX-8000-J5 and PCIEX-8000-KP faults. <Document 1617956.1>  

 Archive Table

Items in this table are issues that only apply to systems over a year behind on QFSDP and SSCTUNER levels.

 

 

 
# | Issue | Date Archived

DB_3

11.2.0.3.x Grid Infrastructure

Bug 16619733 - "FAILED TO PATCH QOS USERS" during patch installation of BP17 (11.2.0.3.17).

patch /u01/app/oracle/patches/16474946/16315641 apply failed for home /u01/app/11.2.0.3/grid

On some SPARC SuperCluster systems, the file "racgvip" may have been manually modified to work around an issue with the path to the "whoami" utility. Prior to applying the 11gR2 BP17, please ensure that the "racgvip" file is owned by the grid infrastructure user, typically user "oracle".

 08/15/2015

DB_5

11.2.0.3.x Grid Infrastructure

<Bug 16562733>  Instance eviction due to loss of voting file access. CRS and  CSSD logs show CRS-1604:CSSD voting file is offline.

All current exadata bundle patches through BP21 require this fix.

If you have existing one offs on your Grid Infrastructure you will  have to open an SR to Support for a merge.

08/15/2015

DB_7

11.2.0.4 Database and Grid Infrastructure with 11.2.3.2.1 Exadata Storage Cells

KFOD can not discover disks

If upgrading or installing 11.2.0.4 on SuperCluster that will be accessing 11.2.3.2.1 storage cells then you must apply <Patch 16547261> prior to the upgrade/install.

08/15/2015
ESS_1

Aura 1.x flash DOM firmware version D20Y or earlier is supplied with Exadata Storage Server software version 11.2.3.2.0 and earlier.

Recent Aura 1.x flash DOMs in Exadata Database Machine X2 and V2 and SPARC SuperCluster T4-4 storage servers may prematurely fail if using firmware version D20Y or earlier.

Aura 1.x flash DOMs require firmware update in storage servers in Exadata Database Machines X2, V2, and SPARC SuperCluster T4-4 systems <Document 1504776.1>

08/15/2015
IB_1

Sun Datacenter InfiniBand Switch 36

software 1.3.3-2

Unpublished CR 7013467: 1.3.3 - too strict error handling in partitiond.

This can cause the subnet manager to not start properly after a complete power cycle of the switch.

Fixed in Switch software version 2.0.1

Apply October QFSDP . See SPARC SuperCluster T-4 with Exadata Storage Server: Supported Versions <Document 1567979.1> for more information on the latest patches.

Workaround if this occurs:

1) Log in as root on the InfiniBand switch.

2) Issue the following command: # smpartition start; smpartition commit

3) If this does not work, open an SR with Oracle Support and clearly state that you have lost your IB partitioning.

08/15/2015
SOL_11_15

T4-4,T5-8

Solaris  11.1 SRU 7.5

IDR 808.x supersedes IDRs 553.4, 562.1 and 570.1. The specific incarnation of the IDR to be delivered will be the latest offered. This contains additional memory management fixes and iSCSI fixes that were not in the combined IDR set. This could expose itself as node evictions due to apparent network timeouts, or periodic slowdowns with threads stuck in sleeps or waits for memory management functions.

Contact support to get this IDR before patching exercises that are targeting the JUL 2013 or OCT 2013 QFSDP levels, or if you are encountering random node evictions or intermittent performance issues while at the JUL 2013 or OCT 2013 QFSDP levels.

08/15/2015

References

<NOTE:1567979.1> - Oracle SuperCluster Supported Software Versions - All Hardware Types
<NOTE:1632521.1> - SuperCluster- Deprecated Document - Solaris 11 Support Repository Updates (SRU) and SuperCluster specific IDR Support Matrix.
<NOTE:1468297.1> - SuperCluster - OSM ( Optimized Shared Memory ) (12c) is Supported on SuperCluster DISM ( Dynamic Intimate Shared Memory )(11g) is not
<NOTE:1628298.1> - SuperCluster - ssctuner is not adjusting the scheduling class of all of lms , lgwr and vktm processes to FX-60
<NOTE:2325475.1> - SuperCluster: Critical Issue ESS_10 Mandatory Action to Disable Byte Swap Optimization on All Cells or Risk Data Loss or Corruption
<NOTE:1569461.1> - SuperCluster - Patching Best Practices For The Quarterly Full Stack Download Patch

Attachments
This solution has no attachment