Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2008286.1
Update Date:2016-02-09
Keywords:

Solution Type  Problem Resolution Sure

Solution  2008286.1 :   Oracle ZFS Storage Appliance: Replication Performance is Slower after Upgrade to a 2013.1.x Release  


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS4-4
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7120
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution


Created from <SR 3-10274818991>

Applies to:

Oracle ZFS Storage ZS4-4 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
Oracle ZFS Storage ZS3-BA - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

The 2013.06.05.0.1.x code release added support for replication analytics. Prior to this release, it was very difficult to monitor the rate or amount of data being replicated from a source to a target.

The new features included the following.

  • Replication bytes, as a raw statistic or broken down by direction, type of operation, peer, pool, project, or dataset.
  • Replication latency, as a raw statistic or broken down by direction, type of operation, peer, pool, project, or dataset.
  • Replication operations, as a raw statistic or broken down by direction, type of operation, peer, pool name, project, dataset, latency, offset, or size.

The natural inclination is to go into Analytics and enable all of these statistics to get an accurate measurement of throughput, bandwidth, and time to completion.

The problem is that replication then runs significantly slower than expected, and well below what the connection can sustain.

In this example, an initial replication over a dedicated 10 Gb connection is running at only about 50 MB/s (output abbreviated):

ZFS-Source> shares select Project replication select action-xxx show
        Properties:                                                                                                                        
                          id = 75280bec-ef95-6f05-854d-d0637209d520                                                              
                      target = ZFS-Target                                                                                 
                     enabled = true                                                                                              
                  continuous = false 
               include_snaps = true                                                                                              
               max_bandwidth = unlimited                                                                                          
                  bytes_sent = 5T                                                                                              
              estimated_size = 864.0T                                                                                                                                                                                  
          average_throughput = 51 MB/s                                                                                            
                     use_ssl = false                                                                                              
                       state = sending                                                                                            
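
To put the 51 MB/s figure in perspective, the properties above can be turned into a rough time-to-completion estimate. The following is a minimal sketch, not an appliance tool: the helper names are hypothetical, and it assumes that the "T" suffix means binary terabytes (2^40 bytes) and that "MB/s" means 10^6 bytes per second, neither of which the output itself confirms.

```python
import re

# Assumption: size suffixes are binary multiples (K = 2^10 ... T = 2^40).
SIZE_UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_properties(text):
    """Collect 'name = value' pairs from a pasted 'show' listing."""
    props = {}
    for line in text.splitlines():
        match = re.match(r"\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if match:
            props[match.group(1)] = match.group(2)
    return props


def size_to_bytes(value):
    """Convert a size such as '864.0T' to bytes."""
    match = re.fullmatch(r"([\d.]+)([KMGT])", value)
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    return float(match.group(1)) * SIZE_UNITS[match.group(2)]


def days_remaining(props):
    """Estimate days left at the reported average throughput."""
    remaining = (size_to_bytes(props["estimated_size"])
                 - size_to_bytes(props["bytes_sent"]))
    # Assumption: 'MB/s' means 10^6 bytes per second.
    rate = float(props["average_throughput"].split()[0]) * 1e6
    return remaining / rate / 86400


# Values copied from the listing above.
sample = """
    bytes_sent = 5T
    estimated_size = 864.0T
    average_throughput = 51 MB/s
"""
print(f"{days_remaining(parse_properties(sample)):.0f} days remaining")
```

Under these assumptions the estimate comes out on the order of 200 days, which is far longer than a dedicated 10 Gb link should need.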
        

 

Cause

The cause is the number of datasets actively running in Analytics. All of the new replication analytics are implemented as DTrace scripts.

Any dataset that relies on DTrace scripting (instead of 'kstat', that is, kernel statistics) can affect the operation being monitored. With all of these replication datasets enabled, throughput was severely throttled.

You can see if an analytic uses dtrace by pressing the "Shift" key and selecting the "Download Arrow" on the analytic. If you get a popup like the one shown below, then this lets you know a dtrace script is running this analytic.

 ANALYTIC_WITH_DTRACE

Solution

The obvious solution is to minimize the number of analytics running. In this example, all analytics datasets are suspended, and the replication action is then monitored.

ZFS-Source> analytics datasets suspend
    This will suspend all datasets. Are you sure? (Y/N) Y

 

ZFS-Source> shares select Project replication select action-xxx show
        Properties:                                                                                                                        
                          id = 75280bec-ef95-6f05-854d-d0637209d520                                                              
                      target = ZFS-Target                                                                                 
                     enabled = true                                                                                              
                  continuous = false 
               include_snaps = true                                                                                              
               max_bandwidth = unlimited                                                                                          
                  bytes_sent = 171T                                                                                              
              estimated_size = 864.0T                                                                                                                                                                                  
          average_throughput = 251 MB/s                                                                                            
                     use_ssl = false                                                                                              
                       state = sending  


It's important to note that the average_throughput of an initial replication is calculated as Total Data Replicated / Total Time.

Therefore, this value will not increase immediately. It rises only gradually, as a growing share of the total elapsed time is spent with analytics suspended.
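
The lag in this running average can be illustrated with a short calculation. This is a sketch with illustrative figures only: it assumes 5e12 bytes were sent at an average of 51e6 bytes per second before analytics were suspended, and a hypothetical steady 250e6 bytes per second afterwards. The function names are not part of the appliance.

```python
def cumulative_average(bytes_sent, elapsed_seconds):
    """average_throughput as described above: total data / total time."""
    return bytes_sent / elapsed_seconds


# Illustrative starting point (assumed values, in bytes and bytes/second).
SENT_BEFORE = 5e12
TIME_BEFORE = SENT_BEFORE / 51e6   # seconds needed to send it at 51e6 B/s
NEW_RATE = 250e6                   # hypothetical rate after suspending analytics


def average_after(hours):
    """Cumulative average once `hours` have passed at NEW_RATE."""
    extra_seconds = hours * 3600
    return cumulative_average(SENT_BEFORE + NEW_RATE * extra_seconds,
                              TIME_BEFORE + extra_seconds)


for hours in (1, 24, 168):
    print(f"after {hours:>3} h: {average_after(hours) / 1e6:.0f} MB/s average")
```

With these figures the average stays below 60 MB/s after the first hour and is still short of the new rate after a week, even though the instantaneous rate changed at once.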

To verify an immediate increase, it may be best to temporarily enable a single dataset and review its current readings.

ZFS-Source:> analytics datasets select dataset-0XX
ZFS-Source:> analytics dataset-038> set suspended=false
                    suspended = false (uncommitted)
ZFS-Source:> analytics dataset-038> commit
ZFS-Source:> analytics dataset-038> read 5
  DATE/TIME                KB/SEC     KB/SEC  BREAKDOWN
  2015-5-26 13:32:47       251040     251040  Project-Name
  2015-5-26 13:32:48       248967     248967  Project-Name
  2015-5-26 13:32:49       259116     259116  Project-Name
  2015-5-26 13:32:50       242932     242932  Project-Name
  2015-5-26 13:32:51       255009     255009  Project-Name
ZFS-Source:> analytics dataset-038> set suspended=true
                     suspended = true (uncommitted)
ZFS-Source:> analytics dataset-038> commit
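
A handful of one-second samples like these can be summarized with a short script. The sketch below is an illustration only: it assumes the 'read' output has been pasted into a string and that the first KB/SEC column is the value of interest. The function name is hypothetical.

```python
import re
from statistics import mean

# Matches a sample row: date, time, then the first KB/SEC value.
SAMPLE_LINE = re.compile(r"\s*\d{4}-\d{1,2}-\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+(\d+)")


def kb_per_sec_samples(text):
    """Return the first KB/SEC column from pasted 'read' output."""
    samples = []
    for line in text.splitlines():
        match = SAMPLE_LINE.match(line)
        if match:
            samples.append(int(match.group(1)))
    return samples


# Rows copied from the transcript above.
output = """
  DATE/TIME                KB/SEC     KB/SEC  BREAKDOWN
  2015-5-26 13:32:47       251040     251040  Project-Name
  2015-5-26 13:32:48       248967     248967  Project-Name
  2015-5-26 13:32:49       259116     259116  Project-Name
  2015-5-26 13:32:50       242932     242932  Project-Name
  2015-5-26 13:32:51       255009     255009  Project-Name
"""
samples = kb_per_sec_samples(output)
print(f"{len(samples)} samples, mean {mean(samples):.0f} KB/s")
```

The header row is skipped because it does not start with a date, so only the five sample rows contribute to the mean.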

Datasets may be turned back on after a new baseline is established.

The impact on replication performance should be monitored as datasets are added back in.
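
One way to track that impact is to compare each new reading against the baseline and flag any dataset whose re-enablement coincides with a noticeable drop. This is a minimal sketch under stated assumptions: the dataset names, the readings, and the 10% threshold are all hypothetical, and the throughput values would have to be collected by hand from the appliance.

```python
def throughput_drop(baseline_kb_s, current_kb_s):
    """Fractional drop relative to the baseline (0.10 means 10% slower)."""
    if baseline_kb_s <= 0:
        raise ValueError("baseline must be positive")
    return (baseline_kb_s - current_kb_s) / baseline_kb_s


def datasets_to_review(baseline_kb_s, readings, max_drop=0.10):
    """Return names whose reading fell more than max_drop below the baseline.

    readings: list of (dataset_name, KB/s measured after resuming it).
    """
    return [name for name, kb_s in readings
            if throughput_drop(baseline_kb_s, kb_s) > max_drop]


# Hypothetical example values, not taken from a real system.
baseline = 251000
readings = [("dataset-A", 249000), ("dataset-B", 180000)]
print(datasets_to_review(baseline, readings))
```

A fixed percentage threshold is a simple starting point. Normal second-to-second variation, as seen in the samples earlier in this document, should be taken into account when choosing it.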


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.