Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2326695.1
Update Date: 2017-11-14
Keywords:

Solution Type  Problem Resolution Sure

Solution  2326695.1 :   Oracle ZFS Storage Appliance: Resumed Replication Does Not Correctly Report Bytes_Sent, Estimated_Size and Estimated_Time_Left  


Related Items
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS5-4
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS5-2
  • Oracle ZFS Storage ZS4-4
  • Sun ZFS Storage 7120
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution


Created from <SR 3-16083948771>

Applies to:

Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS4-4 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS5-2 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

One of the new features available in OS8.7.0 is Resumable Replication.

In previous versions of the code, whenever an initial (or seed) replication failed (for example, because a node panicked or the network between source and target dropped), all replicated data was lost.

The replication action would have to be destroyed and a new replication action created.

With Resumable Replication, an initial replication is checkpointed, allowing the replication action to pick up where it left off should that initial replication fail.

The problem with Resumable Replication is that, once a replication has been resumed, the bytes_sent, estimated_size and estimated_time_left properties no longer report accurate information.

As a result, you cannot tell how long the seed replication will take to complete.

 

Cause

Resumable Replication is always on. You will see this reporting issue whenever the initial replication fails for any reason and the replication action is then restarted.

 

Solution

Consider the following example.  A 1TB initial replication is set up between a source and target appliance. The replication is successfully started. The alert logs record the activity.

Began replicating 'Dbase' to appliance 'zs5'.
Action - 30812650-7933-47e0-9904-b81ec62c8225.

 

The replication action is monitored. Note the accurate reporting of bytes_sent, estimated_size and estimated_time_left:

zs5:shares Dbase action-000> show
            Properties:
            id = 30812650-7933-47e0-9904-b81ec62c8225
            target_id = d1bbf791-71bd-4fb1-b04c-c78eb5b3b4bb
            target = xx.xxx.xxx.xxx
            target_pool = target-pool
            enabled = true
            continuous = false
            include_snaps = true
            retain_user_snaps_on_target = false
            dedup = false
            include_clone_origin_as_data = false
            max_bandwidth = unlimited
            bytes_sent = 461G
            estimated_size = 1.0T
            estimated_time_left = 01:31:23
            average_throughput = 112MB/s
            use_ssl = false
            compression = on
            export_path =
            state = sending
            state_description = Sending update
            .....

On the target node, we can see the dataset accurately reflects the amount of data replicated. 

zs5# zfs list -t all | egrep 'USED|30812650-7933-47e0-9904-b81ec62c8225'
   NAME                                                                                                     USED  AVAIL REFER MOUNTPOINT
   target-pool/nas-rr-30812650-7933-47e0-9904-b81ec62c8225                                                  461G  18.9T 87.5K   none
   target-pool/nas-rr-30812650-7933-47e0-9904-b81ec62c8225/Dbase                                            461G  18.9T 87.5K   /export
   target-pool/nas-rr-30812650-7933-47e0-9904-b81ec62c8225/Dbase@.rr-30812650-7933-47e0-9904-b81ec62c8225-1    0      - 87.5K   -

For whatever reason, the replication is interrupted. In this particular case, the target node panicked. The alert logs identify the break in replication.

Replication of 'Dbase' to 'zs5' failed after sending '461G' out of '1.03T' because
the appliance could not contact the replication target. Action - 30812650-7933-47e0-9904-b81ec62c8225.

 

Upon recovery of the target node, the initial replication action is resumed and the activity is recorded in the logs:

zs5:shares Dbase action-000> sendupdate

Began replicating 'Dbase' to appliance 'zs5'.
Action - 30812650-7933-47e0-9904-b81ec62c8225.

 

However, when you look at the action, you will see that no meaningful data is reported for bytes_sent, estimated_size and estimated_time_left:

zs5:shares Dbase action-000> show
       Properties:
       id = 30812650-7933-47e0-9904-b81ec62c8225
       target_id = d1bbf791-71bd-4fb1-b04c-c78eb5b3b4bb
       target = xx.xxx.xxx.xxx
       target_pool = target-pool
       enabled = true
       continuous = false
       include_snaps = true
       retain_user_snaps_on_target = false
       dedup = false
       include_clone_origin_as_data = false
       max_bandwidth = unlimited
       bytes_sent = 12.6G
       estimated_size = 11.5K
       estimated_time_left = 00:00:00
       average_throughput = 110MB/s
       use_ssl = false
       compression = on
       export_path =
       state = sending
       state_description = Sending update

It has now become impossible to determine how much time is left before the replication completes.  The BUI also incorrectly reports these values. 
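
One way to recognise the symptom is to compare the reported values with each other. In the output above, bytes_sent (12.6G) is far larger than estimated_size (11.5K), which cannot be true of a valid estimate. The following Python sketch applies that check to values copied by hand from the `show` output. The helper names `parse_size` and `estimate_is_plausible` are hypothetical and not part of the appliance software, and the sketch assumes the K/M/G/T suffixes are binary multiples.

```python
# Hypothetical helpers for illustration; not part of the appliance CLI.
# Assumption: size suffixes are binary multiples (K = 1024 bytes, and so on).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '12.6G' or '11.5K' to a number of bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return float(text[:-1]) * UNITS[text[-1].upper()]
    return float(text)  # plain byte count with no suffix


def estimate_is_plausible(bytes_sent, estimated_size):
    """Return False when more data has been sent than the estimated total."""
    return parse_size(bytes_sent) <= parse_size(estimated_size)


print(estimate_is_plausible("461G", "1.0T"))    # before the interruption: True
print(estimate_is_plausible("12.6G", "11.5K"))  # after the resume: False
```

A result of False only indicates that the reported estimate cannot be trusted; it does not say how much data remains.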

 

[Image: Replication Action]

 

Unfortunately, there is no accurate way to determine the bytes_sent, estimated_size and estimated_time_left from the BUI or CLI.

Please open an SR with Oracle Support. Oracle Support can determine these values for you using internal tools.

 

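You can also look at the dataset on the target. The USED column shows how much data has been replicated so far. You can then estimate the time left, for example by subtracting USED from the total size reported in the failure alert and dividing the remainder by the average throughput.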

# zfs list -t all | egrep 'USED|30812650-7933-47e0-9904-b81ec62c8225'
  NAME                                                                                                    USED  AVAIL  REFER  MOUNTPOINT
  target-pool/nas-rr-30812650-7933-47e0-9904-b81ec62c8225                                                 477G  18.9T  87.5K   none
  target-pool/nas-rr-30812650-7933-47e0-9904-b81ec62c8225/Dbase                                           477G  18.9T  87.5K   /export
  target-pool/nas-rr-30812650-7933-47e0-9904-b81ec62c8225/Dbase@.rr-30812650-7933-47e0-9904-b81ec62c8225-1   0      -  87.5K   -
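
The calculation mentioned above can be sketched in Python as follows. This is a rough estimate under stated assumptions, not the method Oracle Support uses. It assumes that the total size from the failure alert (1.03T) is still valid, that USED on the target (477G) tracks the data received, that the average_throughput shown after the resume (110MB/s) stays constant, and that the size suffixes are binary multiples. The helper names `parse_size` and `estimate_time_left` are hypothetical.

```python
# Hypothetical helpers for illustration; not part of the appliance CLI.
# Assumption: size suffixes are binary multiples (K = 1024 bytes, and so on).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a size such as '477G' or '1.03T' to a number of bytes."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return float(text[:-1]) * UNITS[text[-1].upper()]
    return float(text)  # plain byte count with no suffix


def estimate_time_left(total, used, throughput_per_sec):
    """Return the remaining time as hh:mm:ss, assuming constant throughput."""
    remaining = max(parse_size(total) - parse_size(used), 0.0)
    seconds = int(remaining / parse_size(throughput_per_sec))
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# Values taken from the alert log, the target dataset and the show output.
print(estimate_time_left("1.03T", "477G", "110M"))  # 01:29:38
```

Treat the result as an approximation only, since throughput varies over the course of a replication.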

 

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.