Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1452277.1
Update Date:2018-01-23
Keywords:

Solution Type  Predictive Self-Healing Sure

Solution  1452277.1 :   SuperCluster Critical Issues  


Related Items
  • SPARC SuperCluster T4-4 Half Rack
  • Oracle Database - Enterprise Edition
  • Oracle SuperCluster M6-32 Hardware
  • Oracle SuperCluster Specific Software
  • Solaris Operating System
  • Oracle Exadata Storage Server Software
  • Oracle SuperCluster M7 Hardware
  • Oracle SuperCluster T5-8 Hardware
  • SPARC SuperCluster T4-4 Full Rack
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>SPARC SuperCluster>DB: SuperCluster_EST




Applies to:

SPARC SuperCluster T4-4 Full Rack - Version All Versions to All Versions [Release All Releases]
Oracle Database - Enterprise Edition - Version 11.2.0.4 to 12.1.0.2 [Release 11.2 to 12.1]
Oracle SuperCluster M6-32 Hardware - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster Specific Software - Version 1.x to 1.x [Release 1.0]
Oracle Exadata Storage Server Software - Version 11.2.2.3.1 to 11.2.2.3.1 [Release 11.2]
Oracle Solaris on SPARC (64-bit)
Linux x86-64

Purpose

The following tables list fixes and workarounds that are deemed critical for Oracle SuperCluster T4-4, T5-8, M6-32 and M7. If you are running an affected release, it is highly recommended that you either employ the recommended workaround or install the recommended patch.

A new date column has been added per customer request, but it will not be retroactively filled out for existing issues.


Scope

Only issues affecting Oracle SuperCluster T4-4, T5-8, M6-32 and M7  that meet one or more of the following criteria are included in this document:

  • Causes on-disk corruption or data loss
  • Causes failure that impacts system-wide availability
  • Causes intermittent wrong results
  • Is expected to impact a large number of SuperCluster customers

Details

 

There may be some duplication of items from <Document 1270094.1> Exadata Critical Issues; however, you should make sure you review the Exadata document in conjunction with this one.

 

For patching-specific best practices and known issues, refer to SuperCluster - Patching Best Practices For The Quarterly Full Stack Download Patch <Document 1569461.1>.

SuperCluster Tools 

Some of these tools may not be specific to SuperCluster, but the table will address their usage on SuperCluster.

  

# | Applies to | Issue | Fix or Workaround | Date Updated

SCT_1    

All SuperCluster versions running Database in Zones. CAUTION - Java OneCommand can destroy the storage cell disks and griddisks. Take extreme care when you run this utility in an existing environment. In particular, running the Java OneCommand undo option on certain steps (Create Cell disk, for example) can cause complete destruction of all the griddisks on the storage cells. In addition, re-running the griddisk creation step or mistakenly specifying a non-unique diskgroup in OEDA will result in the destruction of existing griddisks. Note, too, that older versions of Java OneCommand also destroy cell disks and griddisks with the "Create Cell Disks" step. IMPORTANT WARNING - make sure you back up all existing databases BEFORE running Java OneCommand.

IMPORTANT - Always use the latest OEDA and Java OneCommand patch.  Refer to the OneCommand section in MOS Note 888828.1 for details.
 

SCT_2

SuperCluster systems using osc-config-backup

This tool has issues in the following versions that require mandatory patching.

The affected tool versions are v1.1 and v1.1.1, installed via SuperCluster platform v2.3.8 and v2.3.13:
v1.1 : pkg://exa-family/system/platform/supercluster/osc-config-backup@0.5.11,5.11-2.3.0.1014:20170112T223654Z
v1.1.1 : pkg://exa-family/system/platform/supercluster/osc-config-backup@0.5.11,5.11-2.3.0.1044:20170130T204306Z

If your version is newer than these, this issue does not apply.

Download exa-family Patch 25993487: ORACLE SOLARIS EXA-FAMILY 2.3.0.1095 REPO ISO IMAGE (SPARC 64-BIT).

Extract and stage the p5p archive, set it as an exa-family repository, run pkg update osc-config-backup, and then unset this temporary exa-family publisher.
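A minimal sketch of that sequence, assuming the p5p archive from the patch has been staged at /var/tmp/exa-family.p5p (the path is illustrative only):

# pkg set-publisher -g file:///var/tmp/exa-family.p5p exa-family
# pkg update osc-config-backup
# pkg unset-publisher exa-family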

5/27/2017

SCT_3

SuperCluster systems using SuperCluster IO domains

SuperCluster IO domains in "Error" State may be Deleted by SVA Health Monitor (Doc ID 2342509.1)

Follow the instructions in the referenced note.

1/17/2018

Oracle Engineered Systems Hardware Manager (OESHM) 

These issues are specific to OESHM on Oracle SuperCluster

  

# | Applies to | Issue | Fix or Workaround | Date Updated

OESHM_1   

All SuperCluster M7 systems with OESHM version 1.0. M7 platforms will see frequent SP resets as well as OESHM errors about SP connectivity. Apply the patch in accordance with SuperCluster OESHM 1.0 causes SP reset due to TLI mismatch <Document 2147363.1>.
 6/8/2016


 

Oracle Solaris 11

 
# | Applies to | Issue | Fix or Workaround | Date Updated

SOL_11_21

Solaris 11.3

A race condition in the RDS socket layer will lead to periodic 2-second operational delays, which will surface as log file parallel write delays that increase the averages reported for log file sync. Operating system Bug 26288397 - LOG FILE SYNC HIGH DUE FREQUENT PERIODIC POSTS TAKING ALMOST 2 SECONDS

Apply the supercluster-solaris custom incorporation that includes the fix for your SRU level. This fix should be considered mandatory for performance reasons.

Solaris 11.3 SRU19 custom incorporation: solaris/supercluster-solaris@0.5.11,5.11-0.175.3.19.0.5.0.11031905.10000105 (or a greater 19.0.5 version)

Solaris 11.3 SRU16 custom incorporation: solaris/supercluster-solaris@0.5.11,5.11-.175.3.16.0.3.0.11031603.10000106 (or a greater 16.0.3 version)

This will be fixed in the OCT 2017 QFSDP custom incorporation
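As a rough check (a sketch, not a required step), you can confirm which supercluster-solaris incorporation is installed in a domain or zone before and after applying the fix:

# pkg list supercluster-solaris
# pkg info supercluster-solaris | grep FMRI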

11/6/2017

SOL_11_20

All SuperCluster systems running Solaris 11.3 JUL 2016, or OCT 2016 QFSDP

RDMA anonymous port exhaustion due to scheduled or unscheduled cell outages while DB / CRS is up and running.

Please note the patches for this issue are considered mandatory.

The fix for this issue needs to be in place before attempting rolling cell maintenance.

If you are scheduled to patch to the OCT 2016 QFSDP, it is strongly advised to go to JAN 2017 instead to save time in your patching window, as item DB_27 is mandatory as well for both the JUL 2016 and OCT 2016 QFSDP patch levels.

Permanent fix in JAN 2017 QFSDP.

<Note 2226961.1 > SuperCluster: Critical Issue: SOL_11_20 Mandatory Patch: rdma anon port exhaustion can occur if communication to Exadata Storage Server is interrupted

2/8/2017

SOL_11_19

All versions. The OES/ID SMF service does not exist in many SuperCluster LDoms or Zones; typically T4-4, but this can apply to any manually or OpsCenter-created zone. It is mandatory to create this service in all LDoms and/or zones that do not have it. Instructions are included in SuperCluster: OES/ID SMF Service does not exist in some SuperCluster LDoms and Zones <Document 2165959.1>. 7/28/2016

SOL_11_18

SuperCluster M7 and T5-8 with V2.0 functionality. Reboot of SuperCluster IO domains can result in PCIe errors on the InfiniBand HCA. Follow the instructions in SuperCluster - Reboot of SuperCluster IO domains can result in PCIE errors on the Infiniband HCA <Document 2150184.1>. 6/16/2016

SOL_11_17

All hardware types, all Solaris versions. Transient threads can lead to instance crashes, node evictions and random database or application performance issues. See SuperCluster - Transient Threads can lead to instance crashes, node evictions and random database or application performance issues <Document 2149887.1>. 6/15/2016

SOL_11_16

All hardware types, Solaris 11.2. An InfiniBand switch reboot may cause database evictions if there are a large number of RDS connections. This should be reviewed prior to applying the JULY 2015 QFSDP, as it moves the OS to 11.2, and should also be reviewed by SuperCluster V2.0 customers.
Please review and follow the recommendations from document 2043654.1. 08/15/2015

SOL_11_14

T4-4

Solaris 11.1 environments that were initially installed at Solaris 11.0

There is a chance that your IPMP groups are missing their companion ports. Please verify with ipmpstat -g; you should see output similar to the example below, with each pair having one active and one inactive port in the bond.
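The sample output was not reproduced in this note; as a minimal verification sketch, the following commands list the IPMP groups and the state of their underlying interfaces:

# ipmpstat -g
# ipmpstat -i

In the per-interface view, each pair should show one active port and one inactive (standby) companion.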

 

 

Contact support for remediation so we can ensure they are set up according to SuperCluster best practices.

 

SOL_11_13

T4-4, T5-8

Solaris 11.0 /11.1

The OpsCenter installation enables svc:/network/dns/multicast:default, which will lead to GPnP issues with RAC installed on the same host. <Bug 17024367> You may see ORA-29783: GPnP attribute SET failed with error [CLSGPNP_NOT_FOUND].

Disable dns/multicast:

# svcadm disable svc:/network/dns/multicast:default
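To confirm the change (a small sketch using the same service FMRI):

# svcs svc:/network/dns/multicast:default

The service should report a state of disabled.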

 

SOL_11_12

T4-4, T5-8

Solaris 11.1

If you have issues importing the zone pools, or your LUNs seem to be missing after a reboot but show fine on your storage appliance, you may have encountered this problem. You may also see NOTICE: iscsi connection(5) unable to connect to target iqn. <unique id> in the messages file while the system is coming up.

You will have to upgrade to Solaris 11.1 SRU 7.5 and obtain IDR 808.x. (See item SOL_11_15 for more information.)

This will expose issue SOL_11_10; if you will be running Oracle databases out of this LDom, or Solaris 11 zones inside the domain, get an IDR for issue SOL_11_10.

 

SOL_11_11

 T4-4, T5-8,M6-32

Solaris 11.0 and 11.1

Memory capping for zones is not supported on SuperCluster. Memory capping can be an effective tool for memory management when configured correctly. However, it is possible for problems to occur when memory capping is misconfigured (for example, the cap is set lower than the resident set size of memory).

Check the export of your zones via zonecfg; if you have capped memory, you are advised to use alternative methods for managing memory on SuperCluster. This applies to both virtual and physical memory capping.
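A minimal sketch of that check (the zone name is illustrative):

# zonecfg -z dbzone1 export | grep capped-memory

Any capped-memory output indicates a memory cap is configured for that zone.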

 

SOL_11_10

T4-4, T5-8

Solaris 11.1 SRU 7.5 only

Solaris 11.1 SRU 7.5 is only supported with Oracle Databases using 11.2.0.3.16, 11.2.0.3.17 and 11.2.0.3.18, respectively with IDR 552.1, due to an issue with diskmon compatibility.

Contact support for this IDR if you are on SRU 7.5 and meet these criteria. The fix is also available in Solaris 11.1 SRU 8.4.

 

SOL_11_9

T4-4

Solaris 11.0 and Solaris 11.1

A Solaris Exadata / SPARC SuperCluster adaptive replacement cache (ARC) issue can lead to slow compute node performance, apparent LDom hangs and possible node eviction.

Due to Solaris bug 15813348; fixed in Solaris 11.1 SRU 3.4 and above.


 

SOL_11_8

T4-4

Solaris 11.1

After installing QFSDP component SSCMU_2013.04.tar.gz to Solaris domains, 'zpool status' will report that a new ZFS pool version is available.

Booting earlier boot environments will not be possible once the ZFS pool version is upgraded. For this reason, the ZFS pool version should *not* be upgraded immediately. Instead, it is recommended to operate the system in production, allowing it to 'soak' at the April QFSDP software level. When sufficient soak time has elapsed to satisfy the operator the system is running nominally, and there has been no reason to revert to the previous boot environment, then the operator may proceed to upgrade the ZFS pool version.

 

To upgrade the zpool format, issue the commands 'zpool upgrade -a', 'zfs mount -a', 'zfs upgrade -a' and 'beadm create SSCMU_2013.04_ZU'.

Creation of the SSCMU_2013.04_ZU boot environment is done purely for the purposes of backup. Note that the ZFS pool upgrade takes place in the live environment; there is no need to reboot the system.
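A sketch of that sequence as run from the affected Solaris 11 domain, using the boot environment name given above:

# zpool upgrade -a
# zfs mount -a
# zfs upgrade -a
# beadm create SSCMU_2013.04_ZU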

 

SOL_11_7

Solaris  11.1

Fix for <BUG 16409079>: threads stuck in fed_baseline are causing Oracle DB timeouts. This can surface as node evictions and slowly degrading performance on those systems with Solaris 11.1 SRU4 and above. This is a mandatory patch for anyone on the QFSDP for APRIL 2013, and also for those that have applied Solaris 11.1 SRU4 or above to address other issues.

Download and apply idr482.7 in accordance with <Document 1547278.1>. The IDR can be obtained via <Patch 17898194>.

Please note the IDR is only for Solaris 11.1 SRU 5.5; if need be, upgrade to Solaris 11.1 SRU 5.5 prior to applying this IDR. The preferred method of upgrading to Solaris 11.1 SRU 5.5 is via the April 2013 QFSDP <Patch 16346054>. If you have the issue on SRU 7.5, contact support, as the IDR for that version is still in testing.

 

SOL_11_6

T4-4,T5-8

Solaris 11.1

The RDS service is disabled by default after upgrading to Solaris 11.1 (any SRU).
This will cause CRS to be unable to start because ASM cannot reach the cells. This applies to both LDoms and Zones.

Enable the service before or after upgrading Solaris 11 LDoms and Zones:

# svcadm enable rds

Please note this should not be an issue with newer install tools, but it may be an issue if manually bringing an older T4-4 system up from scratch.

 

SOL_11_5

 T4-4 , T5-8, M6-32

Solaris 11.0 & Solaris 11.1

RDS issues contributing to
RDS Latency
RAC Node Evictions
Intermittent spikes in cluster waits
ORA-27300 MTU errors

Install and verify that ssctuner is running in all LDoms, and your /etc/system file will be maintained according to best practices.

# svcs -a | grep ssctuner

Please make sure you update ssctuner with each QFSDP and that you have rebooted after it is updated. As of the Jan 2014 QFSDP this is done as part of install_smu.
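To confirm the installed ssctuner package level in a domain (a sketch; the reported version will differ by QFSDP level):

# pkg list ssctuner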

 

 

SOL_11_4

 T4-4 , T5-8

Solaris 11.0 & Solaris 11.1

Solaris 11 and ZFS Storage Appliance Software (ZFSSA)
May Encounter Data Integrity Issues Following an Unclean Shutdown of the System

Solaris 11 and ZFS Storage Appliance Software (ZFSSA) May Encounter Data Integrity Issues Following an Unclean Shutdown of the System <Document 1502451.1>

 

SOL_11_3

T4-4

Oracle Solaris 11.0 pre SRU12

Bug 7174049: ixgbe unplumbed interfaces show errors in /var/adm/messages

This can also lead to performance issues, such as delayed kernel threads, which may trigger some RDS latency.

Plumb up unused 10g interfaces to prevent the polling:

for interface in `dladm show-phys | grep ixgbe | grep down | awk '{print $1}'`; do ipadm create-ip $interface; done

Apply the October QFSDP. See SPARC SuperCluster T-4 with Exadata Storage Server: Supported Versions <Document 1567979.1> for more information on the latest patches.

 

SOL11_2

T4-4 , T5-8,M6-32

Solaris 11.0 & Solaris 11.1

CR 7172851 - System hang, threads blocked in DISM code. See SuperCluster - OSM (Optimized Shared Memory) (12c) is Supported on SuperCluster, DISM (Dynamic Intimate Shared Memory) (11g) is not <Document 1468297.1>.

SOL11_1

T4-4,T5-8

All LDoms running Solaris 11.x

7157525: rds-ping reports spikes of very high latency

1) As root run modinfo and get the module id for nxge
2) As root "modunload -i XXX" (where XXX is the nxge module ID from modinfo).
3) Add the following to /etc/system to prevent it from reloading
#Prevent the unused nxge driver from loading, as a temporary
#workaround for CR 7157525
exclude:nxge

 

  Oracle Solaris 10

# | Applies to | Issue | Fix or Workaround | Date Updated

SOL_10_6

 

Systems running Oracle Virtual Machine (OVM) Server for SPARC are affected by a race condition in virtual switch in Solaris 11.1.9.5.1 through 11.1.13.6.0 and in Solaris 10 with patch 150031-02 through 150031-04. The race condition may cause physical interface hang or a TCP packet corruption for packets originated by Guest domains. The corruption can occur only when 'extended-mapin-space' is set to 'off' and only occurs during periods of high stress on the network interfaces. The only packets affected are those destined for the external network.

Follow the workaround in Solaris 10 and 11 Virtual Network Switch Can Corrupt TCP Packets Or Hang Interface When 'extended-mapin-space' is Off (Doc ID 1593243.1).

 # ldm set-domain extended-mapin-space=on <ldom name>

For all LDoms: the LDoms will have to be rebooted once this modification is made.
 

SOL_10_5

Oracle Solaris 10 LDoms running on SuperCluster

After installing QFSDP component SSCMU_2013.04.tar.gz to Solaris domains, 'zpool status' will report that a new ZFS pool version is available.

Booting earlier boot environments will not be possible once the ZFS pool version is upgraded. For this reason, the ZFS pool version should *not* be upgraded immediately. Instead, it is recommended to operate the system in production, allowing it to 'soak' at the April QFSDP software level. When sufficient soak time has elapsed to satisfy the operator the system is running nominally, and there has been no reason to revert to the previous boot environment, then the operator may proceed to upgrade the ZFS pool version.

 

To upgrade the zpool format, issue the commands 'zpool upgrade -a' and 'lucreate -n SSCMU_2013.04_ZU'.

Creation of the SSCMU_2013.04_ZU boot environment is done purely for the purposes of backup. Note that the ZFS pool upgrade takes place in the live environment, there is no need to reboot the system.

 

SOL_10_4

SolarisCluster3.3u1 on Oracle Solaris 10 running on SuperCluster

The READMEs refer the operator to the 'Oracle Solaris Cluster System Administration Guide' for detailed patching instructions. These instructions indicate to use 'boot -sx' and to perform patching in single-user mode if patching the active boot environment.

*IMPORTANT* Do not use 'boot -sx'; use 'boot -x' instead to boot the system out of cluster mode, and then perform patching while the domain is in multi-user mode.

 

SOL_10_3

Oracle Solaris 10 running on SuperCluster

Bug 7174049: ixgbe unplumbed interfaces show errors in /var/adm/messages

This can also lead to performance issues, such as delayed kernel threads, which may trigger some RDS latency.

Plumb up unused 10g interfaces to prevent the polling:

for interface in `dladm show-dev | grep ixgbe | grep down | awk '{print $1}'`; do touch /etc/hostname.$interface; ifconfig $interface up; done

 

 

SOL_10_2

Oracle Solaris 10 running on SuperCluster. CR 7146107 - ib_sw/ibd: S10 LDoms lose connectivity to the IB fabric on SPARC SuperCluster.

Change the Solaris 10 LDoms, Zones and branded zones to use Unreliable Datagram (UD) instead of Reliable Connected (RC) mode:

1) Edit /kernel/drv/ibd.conf and change the 1s to 0s.
2) Reboot the Zone, LDom, etc.

 

SOL_10_1

Oracle Solaris 10 running on SuperCluster. CR 7157525 - RDS: excessively high rds-ping latency.

1) As root, run modinfo and get the module id for nxge.
2) As root, run "modunload -i XXX" (where XXX is the nxge module ID from modinfo).
3) Add the following to /etc/system to prevent it from reloading:
#Prevent the unused nxge driver from loading, as a temporary
#workaround for CR 7157525
exclude:nxge
 

 Infiniband Switches

# | Applies to | Issue | Fix or Workaround | Date Updated

Exadata Storage Cells

 

PLEASE NOTE: All Cell, Generic Database, Switch and GI issues from the Exadata Critical Issues Note also apply.

 

 
# | Applies to | Issue | Fix or Workaround | Date Updated

ESS_11 (EX_40 in Exadata Critical Issues)

Exadata Storage cells running older versions of 12.2 or 12.1

(EX40) Storage servers with 8TB high capacity disks running Exadata older 12.2 or 12.1 versions require software update to receive replacement drives <Document 2352138.1>

Upgrade the cell software in accordance with:

(EX40) Storage servers with 8TB high capacity disks running Exadata older 12.2 or 12.1 versions require software update to receive replacement drives <Document 2352138.1>

1/23/2018

ESS_10

Exadata Storage cells running 12.1.x and 12.2.x with version 12.1 databases

Byte Swap Optimization can lead to flash card failures that could cascade into data loss and/or corruption.

Mandatory fix: disable Byte Swap Optimization if you are on, or before you upgrade to, Exadata storage cell versions 12.1.x and 12.2.x.

SuperCluster: Critical Issue ESS_10 Mandatory Action to Disable Byte Swap Optimization on All Cells or Risk Data Loss or Corruption <Document 2325475.1>

11/6/2017

ESS_9
(EX_37 in Exadata Critical Issues)

Exadata X6 storage servers with write-back flash cache enabled using the default flash firmware supplied with Exadata versions lower than 12.1.2.3.4.

Bug 25595250 - A flash predictive failure on an Exadata X6 storage server with write-back flash cache enabled may lead to corruption in primary and/or secondary ASM mirror copies, and may propagate to other storage servers during certain ASM rebalance operations.

Fixed in Exadata 12.1.2.3.4. See Document 2242320.1 for details.

NOTE: if going out of band to 12.1.2.3.4 or higher while the database is running 11.2.0.4 (any BP version), then the following one-off must be applied:

13245134

See Exadata/SuperCluster: SQLs fail with ORA-27626: Exadata error: 242 (Doc ID 2250760.1)

 

 04/11/2017

ESS_8

(EX_ in Exadata Critical Issues)

SuperCluster, all versions, with deployments that deploy multiple grid disks per cell disk.

After storage server upgrade from 12.1.2.1.3 or earlier, CREATE or ALTER GRIDDISK may result in cell disk metadata corruption, error ORA-600 [addNewSegmentsToGDisk_2], and loss of cell disk content

Review (EX31) After storage server upgrade from 12.1.2.1.3 or earlier, CREATE or ALTER GRIDDISK may result in cell disk metadata corruption, error ORA-600 [addNewSegmentsToGDisk_2], and loss of cell disk content <Document 2195523.1> to determine if you are at risk and recommended course of action.

10/26/2016

ESS_7

SuperCluster, all versions, with storage cell version 12.1.2.3.0 to 12.1.2.3.2

ORA-600 [RWOROFPRFASTUNPACKROWSETS:OOBP]
ORA-600 [kcfis_dump_app_state]
ORA-600 [kcfis_dump_global_context_all]
ORA-600 [qesrCopyOrigRowsetOpns()+480]
ORA-600 [kxhrHash()+64]

Follow SuperCluster: Critical Issue ORA-600 [RWOROFPRFASTUNPACKROWSETS:OOBP] alone or combined with other ORA-600 errors possible with cell version 12.1.2.3.0 and above <Document 2196717.1>

Even if you are not seeing this error, disabling this optimization should be considered mandatory.

10/24/2016

ESS_6

(EX_24 in Exadata Critical Issues)

Exadata Storage Server 12.1.2.1.0 and 12.1.2.1.1

 

After replacing a failed system disk (disk 0 or disk 1), the new disk is not correctly configured leaving the system vulnerable to the other system disk failing.

 

 Fixed in Exadata 12.1.2.1.2.  See Document 2032402.1 for additional details.  08/15/2015

ESS_5

(EX_23 in Exadata Critical Issues)

Exadata Storage Server 12.1.2.1.0 and 12.1.2.1.1

 

Bug 21174310 - Wrong results, ORA-1438 errors, or other internal errors are possible from smart scan offloaded queries against HCC or OLTP compressed tables stored on Exadata storage for databases upgraded from Oracle Database 11.2 to 12.1.

 

 Fixed in Exadata 12.1.2.1.2.  See Document 2032464.1 for additional details.

This also requires DB <Patch 20881450>

 08/15/2015

ESS_4

Cell version 12.1.2.1.2

Unable to create new databases or start existing databases against > 12.1.2.1.1 cells due to control file errors such as ORA-00200 or ORA-00205. This is due to a new cell check that looks for unique DB_UNIQUE_NAMEs across all instances sharing the cells, regardless of which virtual hosts they are in.

 Please carefully review and follow 2044088.1 08/15/2015

ESS_3

All cell versions before 12.1.2.1.0

An Exadata storage cell metadata corruption can occur after an indeterminate number of create and/or alter griddisk commands. Please carefully read and run the script from: Bug 19695225 - Running Many Create or Alter Griddisk Commands Over Time Causes Cell Disk Metadata Corruption (ORA-600 [addNewSegmentsToGDisk_2]) and Loss of Cell Disk Content (Doc ID 1991445.1)

If you are in the bug condition (the script returns 31), stop everything, take an RMAN backup of every single database, and then open an SR with Support using your SuperCluster CSI. Please note the likelihood of this is rare. For all others, no matter the return value, please obtain <Patch 19695225> for your version or upgrade to 12.1.2.1.
 08/15/2015

ESS_2

11.2.0.4 Database and Grid Infrastructure with 11.2.3.2.1 Exadata Storage Cells

KFOD cannot discover disks. This can be encountered during upgrades as well, and will be indicated by rootupgrade.sh reporting disk-not-found errors.

If upgrading or installing 11.2.0.4 on SuperCluster that will be accessing 11.2.3.2.1 storage cells then you must apply <Patch 16547261> prior to the upgrade/install.

 

Database

 
# | Applies to | Issue | Fix or Workaround | Date Updated

DB_29

GIPSU 12.2.0.1.171003/171017 part of QFSPD Oct 2017

Node eviction stemming from pfiles being run against GI processes, causing their threads to get delayed.

SuperCluster : Node eviction after apply GIPSU 12.2.0.1.171003/171017 part of QFSPD Oct 2017 <Document 2176610.1>

12/19/2017

DB_28

SuperCluster Grid Infrastructure provided with JULY 2017 QFSDP  12.1.0.2.170718

RAC nodes failing to start due to voting disk corruption following patching

Apply OCW <Patch 26512962> to the Grid Infrastructure Home. 

09/13/2017

DB_27

SuperCluster systems running 12.1.0.2.161018 or 12.1.0.2.160719 OCWPSU

A generic bug impacting SuperCluster systems: frequent RAC node evictions that appear to be network heartbeat related. If detected in time, a pstack <ocssd.bin pid> will show several threads in the function clsdadr_bucket_syncb.

This Patch should be considered mandatory.

If you are scheduled to patch to the OCT 2016 QFSDP, it is strongly advised to go to JAN 2017 instead to save time in your patching window, as item SOL_11_20 is mandatory as well for both the JUL 2016 and OCT 2016 QFSDP patch levels.

Permanent fix in JAN 2017 QFSDP.

<Note 2227319.1> SuperCluster Critical issue: DB_27 :Mandatory patch: Bug 25233268 Leading to Frequent Node Evictions with JUL and OCT 2016 QFSDP

 

DB_26

All databases, all versions, regardless of storage used or whether deployed in DB or application domains. Use of the database parameter use_large_pages=false is completely unsupported on SuperCluster systems. Using it can cause unnecessary performance implications at the DB or OS kernel level, especially for larger SGA sizes. Set use_large_pages=true, or unset it completely (true is the default), and restart your databases. The Solaris operating system is optimized by default to take advantage of large pages. 10/24/2016
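A minimal sketch of the check and change described above (run in each database; the parameter is not dynamic, so a restart is still required):

SQL> show parameter use_large_pages
SQL> alter system set use_large_pages=TRUE scope=spfile sid='*';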

DB_25

11.2.0.4 through 12.1.0.2. RAC node evicting and not rejoining the cluster. See SuperCluster : RAC : CRS not able to rejoin the cluster following node eviction or reboot due to CSSD <Document 2166436.1>. 7/28/2016

DB_24

12.1.0.2. ASM XDMG process exiting in a way that can hang zones and/or logical domains. See SuperCluster: 12.1.0.2 ASM XDMG process causes hang in PR_P_LOCK on ORA-15311: process terminated due to fenced I/O <Document 2166445.1>. 7/28/2016

DB_23

CRS, all versions. Storage network improperly listed as a public interface in oifcfg. See SuperCluster: storage network IB interface listed as public in oifcfg getif could result in improper nodeapp VIP failover <Document 2150668.1>. 6/16/2016

DB_22

DB and ASM 11.2.0.3 BP 25 and below. Bug 20116094: ASM/kfod does not discover the griddisks on SPARC systems running 11.2.0.3 BP 25 and below against 12.1.2.1.0 and above storage cells. This issue is typically found during patching if the cells are patched prior to the DB and GI homes.

The bug has been fixed in 11.2.0.3 BP 26 onwards

Apply 11.2.0.3 BP 26 or one-off patch for Bug.20116094

 09/1/2015

DB_21

ASM 12.1.0.2 Bug 21281532 - ASM rebalance interrupted with errors ORA-600 [kfdAtbUpdate_11_02] and ORA-600 [kfdAtUnlock00]. See Document 2031709.1 for additional details. 08/15/2015

DB_20

ASM 12.1.0.2

Bug 20904530 - During disk resync ORA-600 [kfdsBlk_verCb] reported due to corruption in ASM staleness registry.

See Document 2028222.1 for additional details. 08/15/2015

DB_19

11.2.0.4.x running against 12.1.2.x Exadata storage cells

After restoring an RMAN backup in this combination and then running a subsequent backup, or running DBV, data block corruption is detected. The trace file could show:
Bad header found during validation.

Apply <Patch 20952966> to the 11.2.0.4 DB home(s), or redo the initial restore with the workaround of setting _cell_fast_file_restore=FALSE in the database spfile. The patch is the preferred approach and should be considered mandatory for all 11.2.0.4 databases accessing Exadata Storage cell version 12.1.2.1.x. 5/9/2014

DB_18

11.2.0.4 and 12.1.0.2

11.2.0.4: Bug 10194190 - Solaris: Process spin and/or ASM and DB crash if RAC instance up for > 248 days <Document 10194190.8>. 12.1.0.2: Bug 22901797 - LMHB (OSPID: 29160): TERMINATING THE INSTANCE DUE TO ERROR 29770

Fixed in 11.2.0.4.9 and above. Other documentation makes it appear this is fixed in 12.1.0.2.5, but it is not; it is fixed in the April 2016 12.1.0.2.DBBP:160419.
 6/7/2016

DB_17

12.1.0.2.4 (JAN 2015 level)

Bug 20591915 has introduced a regression in DBBP 12.1.0.2.4 (Jan 2015) for Solaris SPARC Exadata machines. Because of this regression, the XDMG process may crash on SuperCluster, causing ASM core files under the GI home.

 

Also can be rediscovered with

 

Bug 20591915 - Grid disk asmmodestatus query hangs when a grid disk is inactive.

 

This issue causes CellCLI command "list griddisk attributes asmmodestatus" to hang, which subsequently causes rolling cell patching to hang when upgrading from Exadata 12.1.1.1.1, or earlier, to any later Exadata version when Grid Infrastructure is version 12.1.0.2.4 (DBBP4) or 12.1.0.2.5 (DBBP5).

 

<Patch 20591915>   to 12.1.0.2.4
Please note this needs to be applied in the ASM home (GI)

 This is also fixed in 12.1.0.2.6 PSU and above.

 

DB_16

11.2.0.3 through 12.1.0.2

Critical  Performance  enhancements for the database on SPARC

19308965 RAW HAZARDS SEEN WITH RDBMS CODE ON SOLARIS T5   
13846337 QESASIMPLEMULTICOLKEYCOMPARE NOT OPTMIZED FOR SOLARIS SPARC64
12660972 CHECKSUM CODE NEEDS REVISTING IN LIGHT OF NEW PROCESSORS

11.2.0.3.21 or later plus <Patch 20097385> and <Patch 12660972>

11.2.0.4.15 and BP  below plus <Patch 19839616> and <Patch 12660972>

11.2.0.4.16 and above plus  <Patch 12660972>

12.1.0.2.6 and below plus <Patch 

 6/23/2015

DB_15

11.2.0.3 and 11.2.0.4 ASM instances

<Bug 17997507> - xdmg process exits without closing skgxp context when ora-15311 is seen.

This can actually show itself as EXAVM database zones getting stuck in the shutdown state, in conjunction with a command like ps -ef hanging in the global zone.

Fixed in 11.2.0.3.24 and 11.2.0.4.7. If at an earlier BP, search MOS for patch 17997507 and SPARC; if one does not exist for your BP level, contact support.

 

DB_14

11.2.0.3 Grid Infrastructure

<Bug 13798847> - add multiple ports to scan_listener fails

Apply the latest SuperCluster 11.2.0.3.9 GI PSU Merge which will be documented in the Supported Versions note for your hardware type. Latest is MLR <Bug 19459715>

 

DB_13

11.2.0.3 to 12.1.0.1 Grid Infrastructure

<Bug 17722664> - clsa crash during client connection cleanup for a large number of changing connections. Fixed in 12.1.0.2.

11.2.0.3

Apply the latest SuperCluster 11.2.0.3.9 GI PSU Merge which will be documented in the Supported Versions note for your hardware type. Latest is MLR <Bug 19459715>

11.2.0.4 contact support for a merge in with <Bug 16429265> and your current level of GI PSU

12.1.0.1 contact support for a merge in with your current level of GI PSU

Fixed in 12.1.0.2

 

DB_12

Systems with one of the following grid infrastructure home versions:

    11.2.0.4 BP1-BP5
    11.2.0.3 BP22

Same as item DB_24 in the Exadata critical issues note

Fixed in BP 23 and above; however, you should get the fix for DB_11 through DB_14 via the latest SuperCluster 11.2.0.3.9 GI PSU Merge, which will be documented in the Supported Versions note for your hardware type. Latest is MLR <Bug 19459715>

 

DB_11

11.2.0.3 and 11.2.0.4 Grid Infrastructure

<Bug 17443419> - chm (ora.crf) can't be online in solaris local zone (solaris sparc64) fixed in 12.1

Apply the latest SuperCluster 11.2.0.3.9 GI PSU Merge which will be documented in the Supported Versions note for your hardware type. Latest is MLR <Bug 19459715>

 

DB_10    

11.2.0.3 to 11.2.0.4 GI / ASM upgrade

<Bug 17837626 >

HAIP failures in the

  

orarootagent_root.log

CRS-2674: Start of 'ora.cluster_interconnect.haip' on 'hostname' failed
CRS-2679: Attempting to clean 'ora.cluster_interconnect.haip' on
'hostname'
CRS-2681: Clean of 'ora.cluster_interconnect.haip' on 'hostname'
succeeded
CRS-4000: Command Start failed, or completed with errors.

  

Workaround

In the 11.2.0.4 home:

cd <GRID_HOME>/crs/install
vi s_crsconfig_lib.pm

Look for the function s_is_sun_ipmp (in vi: /is_sun_ipmp) and make the following change at the end of that function:

  # made it all the way out without finding any IPMP private
  #return FALSE;
  return TRUE;

  

 

DB_9

RMAN incremental backups created with one of the following database patch sets:
  • 12.1.0.1 GIPSU1 or earlier
  • 11.2.0.3 BP21 or earlier
  • any 11.2.0.2
  • any 11.2.0.1

Bug 16057129 - Exadata cell optimized incremental backup can miss some blocks if a database file grows larger while the file is being backed up. A missed block can lead to stuck recovery and ORA-600[3020] errors if the incremental backup is used for media recovery.

See Document 16057129.8 for details.

Existing RMAN incremental backups taken without the bug fix in place should be considered invalid and not usable for database recovery, incrementally updating level 0 backups, or standby database creation.

RMAN full backups, level 0 backups that are not part of an incrementally updated backup strategy, and database recovery using archived redo logs are not affected by this issue.

 Step 1.Set the following parameter in all databases that use Exadata storage:

_disable_cell_optimized_backups=TRUE

SQL> alter system set "_disable_cell_optimized_backups"=TRUE scope=both;

The parameter specified above may be removed after the fix for bug 16057129 is installed by upgrade or by applying an interim patch. See below for fix availability.

Step 2. Create new RMAN backups. Minimally a new RMAN cumulative incremental backup must be taken. In addition, level 0 backups that are part of an incrementally updated backup strategy must be recreated.

Fix availability
Fixed in 12.1.0.1 GIPSU2 (planned January 2014)
Fixed in 11.2.0.4.0
Fixed in 11.2.0.3 BP22 (planned January 2014)
Fixed in Patch 16057129 for 11.2.0.3 BP21
Fixed in Patch 17599908 for 11.2.0.2 BP22

 

DB_8

All DB versions

For RAC databases in LDoms or Zones with more than one IB bond interface, OneCommand does not set all interfaces in oifcfg nor in the cluster_interconnects parameter of the ASM and DB spfiles.

You can check this in the ASM and DB instances with 'show parameter cluster_interconnects' and at the RAC level with 'oifcfg getif'. If you have multiple interfaces available, add them into oifcfg and into cluster_interconnects in each ASM and DB instance. Make sure you assign the right IP addresses in cluster_interconnects to the right SIDs based on the host location of each instance.
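A hedged sketch of those checks and changes (the interface name, subnet and SID values below are illustrative only for your environment; the spfile change requires an instance restart to take effect):

# oifcfg getif
# oifcfg setif -global bondib1/192.168.10.0:cluster_interconnect

SQL> show parameter cluster_interconnects
SQL> alter system set cluster_interconnects='192.168.10.1:192.168.10.5' scope=spfile sid='orcl1';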

 

DB_6

11.2.0.3.x Grid Infrastructure

<Bug 13604285>
ora.net1.network keeps failing over.

Key indicator "Networkagent: check link false" in orarootagent log combined with the network resource constantly failing over around the cluster nodes.

All current exadata bundle patches through BP21 require this fix.

If you have existing one offs on your Grid Infrastructure you will  have to open an SR to Support for a merge.

 

DB_4

 

<Bug 12865682>- byte swapping causing some of the extra overhead. This can lead to a performance degradation in hash join plans on big endian platforms.

Download and apply <Patch 12865682> for Solaris SPARC to all of your 11.2.0.3.x database homes, even if they are not using the Exadata storage. This is now considered a mandatory patch for SPARC SuperCluster. This does not need to be backported to a specific Exadata BP level as it does not conflict with the bundle patch. This is now fixed as part of 11.2.0.4, 12.1.x and the 11.2.0.3.21 Bundle Patch, which means you do not need this patch if you are on 11.2.0.3.21 or beyond.
5/9/2015

DB_2

11.2.0.3.x Grid Infrastructure and DB

Default thread priority of RT (real time) for LMS can cause blocking of kernel threads to the CPU. Also, LGWR being in the TS (timesharing) class can lead to excessive log writer write times, leading to general database performance issues.

The fix for this is often called the Critical Threads fix or the FX-60 fix.

There are multiple ways to correct this. One is to apply a one off patch to all Database and Grid Infrastructure homes. The one off patch can be downloaded using <Patch 12951619>.

The preferred method for systems without databases in zones is to be patched to the OCT 2013 QFSDP and to ensure you have installed and are running the ssctuner service from that exa-family version, ssctuner@0.5.11,5.11-1.5.0.5.

For databases running in exavm zones you need to be at the version of ssctuner provided in the JAN 2014 QFSDP (ssctuner@0.5.11,5.11-1.5.9.237) or above, and ensure your zones are running with the TS scheduling class. See <Document 1618396.1> for more information on how to verify and rectify the scheduling class and how to update ssctuner out of band with the QFSDP.

Please also review and comply with SuperCluster - ssctuner is not adjusting the scheduling class of all of lms, lgwr and vktm processes to FX-60 <Document 1628298.1>. This is an additional step that has to be done under the supervision of an Oracle badged employee.
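A quick, hedged spot check after ssctuner has applied the fix (the process name patterns are illustrative; on Solaris the CLS column printed by ps -c shows the scheduling class, which should read FX for these processes):

# ps -efc | egrep 'lms|lgwr|vktm' | grep -v grep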

 

DB_1         

All Database versions

 CR 7172851 System hang, threads blocked in DISM code

Dynamic Intimate Shared Memory (DISM) is not supported for use on SPARC SuperCluster Solaris environments in instances other than the ASM instance <Document 1468297.1>  

  ZFS Storage Appliance

 
# | Applies to | Issue | Fix or Workaround | Date Updated

ZFS_2

All platforms / all versions. Using ZFS deduplication is not supported on SuperCluster. None; using ZFS deduplication is not supported on SuperCluster. See Oracle SuperCluster ZFS Storage Appliance Best Practices <Document 2002988.1>. 7/11/2016

ZFS_1  

 

2011.1.3  (Version string 2011.04.24.3.0,1-1.19)

2011.1.4  (Version string 2011.04.24.4.0,1-1.21)

2011.1.4.1  (Version string 2011.04.24.4.1,1-1.21)

Solaris 11 and ZFS Storage Appliance Software (ZFSSA) May Encounter Data Integrity Issues Following an Unclean Shutdown of the System Solaris 11 and ZFS Storage Appliance Software (ZFSSA) May Encounter Data Integrity Issues Following an Unclean Shutdown of the System <Document 1502451.1>  

 SuperCluster  Hardware Issues

 
# | Applies to | Issue | Fix or Workaround | Date Updated

HW_2

SPARC M6-32 - Version All Versions and later. Probable fault diagnosis failures; please apply the firmware referenced in the "Fix or Workaround" section. <Patch 22982110>. See SPARC M5-32 and M6-32 Servers With Sun System Firmware 9.4.2.d, 9.4.2.e, 9.5.1.c, or 9.5.3 may Misidentify Faulty Components or Fail to Diagnose Faulty Components (Doc ID 2133737.1). 5/20/2016

HW_1    

 SPARC M6-32 - Version All Versions and later
SPARC T5-8 - Version All Versions and later
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
Oracle Exalytics T5-8 - Version All Versions and later
SPARC T5-2 - Version All Versions and later
Information in this document applies to any platform.
 

PCIEX-8000-J5 and/or PCIEX-8000-KP FMA faults similar to those below will be reported. Systems utilizing InfiniBand fabric, e.g., the T5-8 SSC, are seen to be more susceptible to these faults.

The fmdump -e output will contain ereport.io.pciex.dl.btlp and/or ereport.io.pciex.dl.bdllp events.
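A minimal sketch for checking whether these ereports are being logged on a domain:

# fmdump -e | egrep 'pciex.dl.btlp|pciex.dl.bdllp'
# fmadm faulty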

 Follow I/O SERD threshold values are set too low and may result in PCIEX-8000-J5 and PCIEX-8000-KP faults. <Document 1617956.1>  

 Archive Table

Items in this table are issues that only apply to systems over a year behind on QFSDP and SSCTUNER levels.

 

 

 
# | Issue | Date Archived

DB_3

11.2.0.3.x Grid Infrastructure

Bug 16619733 - "FAILED TO PATCH QOS USERS" during patch installation of BP17 (11.2.0.3.17).

patch /u01/app/oracle/patches/16474946/16315641 apply failed for home /u01/app/11.2.0.3/grid

On some SPARC SuperCluster systems, the file "racgvip" may have been manually modified to work around an issue with the path to the "whoami" utility. Prior to applying the 11gR2 BP17, please ensure that the "racgvip" file is owned by the grid infrastructure user, typically user "oracle".

 08/15/2015

DB_5

11.2.0.3.x Grid Infrastructure

<Bug 16562733>  Instance eviction due to loss of voting file access. CRS and  CSSD logs show CRS-1604:CSSD voting file is offline.

All current exadata bundle patches through BP21 require this fix.

If you have existing one offs on your Grid Infrastructure you will  have to open an SR to Support for a merge.

08/15/2015

DB_7

11.2.0.4 Database and Grid Infrastructure with 11.2.3.2.1 Exadata Storage Cells

KFOD can not discover disks

If upgrading or installing 11.2.0.4 on SuperCluster that will be accessing 11.2.3.2.1 storage cells then you must apply <Patch 16547261> prior to the upgrade/install.

08/15/2015
ESS_1

Aura 1.x flash DOM firmware version D20Y or earlier is supplied with Exadata Storage Server software version 11.2.3.2.0 and earlier.

Recent Aura 1.x flash DOMs in Exadata Database Machine X2 and V2 and SPARC SuperCluster T4-4 storage servers may prematurely fail if using firmware version D20Y or earlier.

Aura 1.x flash DOMs require firmware update in storage servers in Exadata Database Machines X2, V2, and SPARC SuperCluster T4-4 systems <Document 1504776.1>

08/15/2015
IB_1

Sun Datacenter InfiniBand Switch 36

software 1.3.3-2

Unpublished CR 7013467: 1.3.3 - too strict error handling in partitiond.

This can cause the subnet manager to not start properly after a complete power cycle of the switch.

Fixed in Switch software version 2.0.1

Apply October QFSDP . See SPARC SuperCluster T-4 with Exadata Storage Server: Supported Versions <Document 1567979.1> for more information on the latest patches.

Workaround if this occurs:

1) Log in as root on the InfiniBand switch.

2) Issue the following command: # smpartition start; smpartition commit

3) If this does not work, open an SR with Oracle Support and clearly state that you have lost your IB partitioning.

08/15/2015
SOL_11_15

T4-4,T5-8

Solaris  11.1 SRU 7.5

IDR 808.x supersedes IDRs 553.4, 562.1 and 570.1. The specific incarnation of the IDR to be delivered will be the latest offered. This contains additional memory management fixes and iSCSI fixes that were not in the combined IDR set. This could expose itself as node evictions due to apparent network timeouts, or periodic slowdowns with threads stuck in sleeps or waits for memory management functions.

Contact support to get this IDR before patching exercises that are targeting the JUL 2013 or OCT 2013 QFSDP levels, or if you are encountering random node evictions or intermittent performance issues while at the JUL 2013 or OCT 2013 QFSDP levels.

08/15/2015

References

<NOTE:1567979.1> - Oracle SuperCluster Supported Software Versions - All Hardware Types
<NOTE:1632521.1> - SuperCluster- Deprecated Document - Solaris 11 Support Repository Updates (SRU) and SuperCluster specific IDR Support Matrix.
<NOTE:1468297.1> - SuperCluster - OSM ( Optimized Shared Memory ) (12c) is Supported on SuperCluster DISM ( Dynamic Intimate Shared Memory )(11g) is not
<NOTE:1628298.1> - SuperCluster - ssctuner is not adjusting the scheduling class of all of lms , lgwr and vktm processes to FX-60
<NOTE:2325475.1> - SuperCluster: Critical Issue ESS_10 Mandatory Action to Disable Byte Swap Optimization on All Cells or Risk Data Loss or Corruption
<NOTE:1569461.1> - SuperCluster - Patching Best Practices For The Quarterly Full Stack Download Patch

Attachments
This solution has no attachment