
Asset ID: 1-79-1595549.1
Update Date:2018-05-29
Keywords:

Solution Type: Predictive Self-Healing Sure Solution

Solution 1595549.1: Oracle ZFS Storage Appliance: How to set up Replication


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS5-2
  • Sun Storage 7110 Unified Storage System
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7210 Unified Storage System
  • Sun Storage 7410 Unified Storage System
  • Oracle ZFS Storage ZS5-4
  • Sun Storage 7310 Unified Storage System
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7120
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS



In this Document
Purpose
Scope
Details
 REPLICATION SETUP
 1.  Set up routing correctly on the source first, so that traffic from source to target goes out through the correct (cluster or non-cluster) interface tied to the source zpool.
 2.  Set up routing correctly on the target system, so that traffic back to the source is routed through the correct (cluster or non-cluster) interface.
 3.  Set up the replication target
 4.  Set up the replication action
 5.  Perform the initial replication update
 6.  Select the project to replicate, set up an appropriate schedule (such as 'daily'), then enable the action.
References


Applies to:

Oracle ZFS Storage ZS4-4 - Version All Versions and later
Oracle ZFS Storage Appliance Racked System ZS4-4 - Version All Versions and later
Sun Storage 7210 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
7000 Appliance OS (Fishworks)

Purpose

 This document provides configuration details, suggestions and recommendations when setting up replication.

 

Scope

To provide configuration assistance when setting up replication on the Series 7000 NAS/ZFS Storage Appliance for the first time.

 

Details

Sun ZFS Storage Appliances support snapshot-based replication of projects and shares from a source appliance to any number of target appliances manually, on a schedule, or continuously.

The replication includes both data and metadata. Remote replication (or just "replication") is a general-purpose feature optimized for the following use cases:

  • Disaster recovery - Replication can be used to mirror an appliance for disaster recovery. In the event of a disaster that impacts service of the primary appliance (or even an entire datacenter), administrators activate service at the disaster recovery site, which takes over using the most recently replicated data. When the primary site has been restored, data changed while the disaster recovery site was in service can be migrated back to the primary site and normal service restored. These scenarios can be fully tested before a disaster occurs.
  • Data distribution - Replication can be used to distribute data (such as virtual machine images or media) to remote systems across the world in situations where clients of the target appliance wouldn't ordinarily be able to reach the source appliance directly, or such a setup would have prohibitively high latency. One example uses this scheme for local caching to improve latency of read-only data (like documents).
  • Disk-to-disk backup - Replication can be used as a backup solution for environments in which tape backups are not feasible. Tape backup might not be feasible, for example, because the available bandwidth is insufficient or because the latency for recovery is too high.
  • Data migration - Replication can be used to migrate data and configuration between Sun ZFS Storage appliances when upgrading hardware or rebalancing storage. Shadow migration can also be used for this purpose.


The remote replication feature has several important properties:

  • Snapshot-based. The replication subsystem takes a snapshot as part of each update operation and sends either the entire project contents up to that snapshot (a full update) or only the changes since the last replication snapshot for the same action (an incremental update).
  • Block-level. Each update operation traverses the filesystem at the block level and sends the appropriate filesystem data and metadata to the target.
  • Asynchronous. Because replication takes snapshots and then sends them, data is necessarily committed to stable storage before replication even begins sending it. Continuous replication effectively sends continuous streams of filesystem changes, but it's still asynchronous with respect to NAS and SAN clients.
  • Includes metadata. The underlying replication stream serializes both user data and ZFS metadata, including most properties configured on the Shares screen. These properties can be modified on the target after the first replication update completes, though not all take effect until the replication connection is severed (for example, to allow sharing over NFS to a different set of hosts than on the source). See Managing Replication Targets for details.
  • Secure. The replication control protocol used among Sun ZFS Storage Appliances is secured with SSL. Data can optionally be protected with SSL as well. Appliances can only replicate to/from other appliances after an initial manual authentication process; see Creating and Editing Targets below.

 

Useful terms and acronyms concerned with replication

TERM                 EXPLANATION
Replication peer     A Sun Storage 7000 appliance that has been configured as a replication source or target.
Source               An appliance peer containing data to be replicated to another appliance peer (the target). Individual appliances can act as both a source and a target, but are only one of these in the context of a particular replication action.
Target               An appliance peer that will receive and store data replicated from another appliance peer (the source). This term also refers to a configuration object on the appliance that enables it to replicate to another appliance.
Group                A collection of shares that are replicated as a group.
Action               A configuration object on a source appliance specifying a project or share, a target appliance, and policy options (including how often to send updates, whether to use encryption, etc.).
Package              The object on the target side that maps to a source action. Each package on a target is associated with exactly one action from a source. Loss of either object requires you to create the whole replication again.
Full update          An operation that sends the entire contents of an action.
Incremental update   A replication that sends only the data changed since the last replication.
Continuous           Replication that starts again immediately after the previous update has been sent. This is NOT a synchronous update; think of it as an incremental update that begins as soon as the previous incremental update finishes.
Scheduled            A replication that runs on a scheduled basis according to the rules listed in the action.
Manual               A replication that is started by administrative action.

 

 

Considerations when planning a replication configuration

Are you intending to provide a dedicated (private) network for replication traffic?

Are you intending to provide dedicated network interfaces for replication traffic?

Replication works best when it is configured at the PROJECT level rather than at the SHARE level.

Ensure that the network interfaces and routing configuration are set up (and working) correctly BEFORE starting to set up the replication configuration on the source and target systems.
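As a quick sanity check, both can be reviewed from the CLI before any replication configuration is attempted. A minimal sketch; the interface and route names in the output are site-specific:

        NAS_src:> configuration net interfaces show        (verify the replication interface is 'up')
        NAS_src:> configuration services routing show      (verify the expected routes are present, and not 'inactive')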

 

Cluster considerations for replication

On cluster systems, ensure that the network interfaces used for replication and the associated zpool are 'tied' together (i.e. 'active' on the same cluster head at all times), so that on cluster takeover or failback, ownership of the network interfaces always moves with ownership of the zpool.

When configuring replication from both heads in a 'source' cluster system, you must configure the replication targets to use different IP addresses on the 'target' system.

If replicating to or from a cluster, the routing configuration is CRITICAL and should be done BEFORE starting the replication setup.

 

Special considerations for setting up replication between two clusters

DO NOT use any private cluster interfaces for replication.

To avoid any future problems, configure replication with both heads in the CLUSTERED state (i.e. all cluster resources owned by their 'assigned' owner) and use SEPARATE replication targets for the pools owned by each head.
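For example, a minimal sketch of creating a separate target from one source head; all labels and addresses here are hypothetical (192.0.2.11 and 192.0.2.12 stand for interfaces tied to the pools owned by each head of the target cluster):

        NAS_srcA:configuration services replication targets> target
        NAS_srcA:configuration services replication target (uncommitted)> set hostname=192.0.2.11
        NAS_srcA:configuration services replication target (uncommitted)> set root_password=rootpassword
        NAS_srcA:configuration services replication target (uncommitted)> set label=repl_headA_pool
        NAS_srcA:configuration services replication target (uncommitted)> commit

Repeat on the other source head, using 192.0.2.12 and a different label, for the pool that head owns.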

Be aware that when the above considerations have NOT been followed, everything can appear to work initially, but may break later when ownership of resources changes.

Here are some examples of what can happen when the above considerations have not been followed:

Suppose you are configuring a replication target on (system) A for replication to (system) C, and A's cluster peer is (system) B.

If you configure the target with A in the OWNER state (i.e. having imported IP addresses owned by B), any of the following may happen:

  • The system may use one of A's addresses as the client for the peer connection.  If this happens, replication of data in storage pools owned by A will always work (even from B), but replication of data in storage pools owned by B will only work when A owns them.
  • The system may use one of B's addresses as the client for the peer connection.  If this happens, replication of data in storage pools owned by B will always work (even from A), but replication of data in storage pools owned by A will only work when A owns them.
  • If you configure the target when A has private network addresses, the system may use one of these addresses as the client for the peer connection.  If this happens, replication of data in storage pools owned by either A or B will only work when that pool is owned by A.
  • If you configure a replication target on A, you MUST NOT use that replication target to configure replication actions on data in storage pools owned by both A and B.  This is essentially the same as the first case above: replicating data from the pools owned by one head will always work, while replicating data from the pools owned by the other head will only work when the first head is in the OWNER state.

 

Project-level vs Share-level Replication

The appliance allows administrators to configure remote replication at either the project or the share level. Like other properties configurable on the Shares screen, each share can either inherit or override the configuration of its parent project.

Inheriting the configuration means not only that the share is replicated on the same schedule to the same target with the same options as its parent project, but also that the share will be replicated in the same stream using the same project-level snapshots as other shares inheriting the project's configuration. This may be important for applications that require consistency between data stored on multiple shares. Overriding the configuration means that the share will not be replicated with any project-level actions, though it may be replicated with its own share-level actions that include the project. It is not possible to override part of the project's replication configuration and inherit the rest.

More precisely, the replication configuration of a project and its shares define some number of replication groups, each of which is replicated with a single stream using snapshots taken simultaneously.  All groups contain the project itself (which essentially just includes its properties).  One project-level group includes all shares inheriting the replication configuration of the parent project.  Any shares which override the project's configuration form a new group consisting of only the project and share themselves.

It is strongly recommended that mixing project-level and share-level replication within the same project be avoided, because it can lead to surprising results (particularly when reversing the direction of replication).

See the documentation for Managing Replication Packages for more details:

        http://docs.oracle.com/cd/E28317_01/html/E38246/shares__projects__replication.html#shares__projects__replication___managing_replication_packages_
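As an illustration of the recommended project-level configuration, here is a minimal CLI sketch; the project name PROJECT1, target label repl_sys and pool name pool-0 are hypothetical, chosen to match the examples later in this document:

        NAS_src:> shares select PROJECT1 replication
        NAS_src:shares PROJECT1 replication> action
        NAS_src:shares PROJECT1 action (uncommitted)> set target=repl_sys
        NAS_src:shares PROJECT1 action (uncommitted)> set pool=pool-0
        NAS_src:shares PROJECT1 action (uncommitted)> commit

All shares that inherit the project's replication configuration are then replicated in a single stream by this one action.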

 

Replication Implementation Notes

Replication source and target systems must be running the same 'major' Appliance Firmware Release.

The replication target system must be running the same or a later 'minor' Appliance Firmware Release. (Note: Replication between systems running Appliance Firmware Release 2013.1.0.x and 2011.1.8.x is supported. Replication between systems running 2013.1.0.x and earlier versions of 2011.1.x IS supported provided certain software features are NOT being used, e.g. "Multiple Initiator Groups per LUN" and/or "LUNs or Datasets with 1M record sizes".) See also Document ID 1958039.1 - Remote Replication Compatibility.

Replication runs on TCP/IP port 216 (and for 2013.1.4.0/AK8.4.0 and later, port 217 is also required).

Encryption (SSL) of the replication stream is a performance bottleneck.

Replication actions are based on IP address.

You cannot change the target IP address after the configuration has been set (this restriction was removed in the 2013.1.x (AK-8) release).

Replication uses ZFS features; this means you need to apply deferred updates after code upgrades are completed.

Replication was designed to run at the project level. This means that it takes a snapshot of ALL the shares in a project on the source system, even if you are only replicating one share. It also means you will have to remove all these extra snapshots after each successful replication update completes.
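A minimal sketch of removing one such leftover snapshot from a share that is not itself being replicated; the share path and snapshot name are hypothetical, and 'destroy' prompts for confirmation:

        NAS_src:> shares select PROJECT1 select SHARE2 snapshots
        NAS_src:shares PROJECT1/SHARE2 snapshots> list
        NAS_src:shares PROJECT1/SHARE2 snapshots> select <unwanted-replication-snapshot>
        NAS_src:shares PROJECT1/SHARE2@<unwanted-replication-snapshot>> destroy

Do NOT destroy the most recent replication snapshot of any share that IS being replicated; it is the base for the next incremental update.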



To troubleshoot replication, look at:

    /var/ak/logs/replication.ak
    /var/ak/logs/alert.ak
    /var/ak/logs/akd.ak
    'zpool history -il'

on both the source and target systems.

 

REPLICATION SETUP

Ideally, you should configure a dedicated (private) network and use dedicated network interfaces for replication traffic.

 

1.  Set up routing correctly on the source first, so that traffic from source to target goes out through the correct (cluster or non-cluster) interface tied to the source zpool.

It may be a good idea to add a host-specific route to the target system IP address via the dedicated network interface:

    e.g.  Adding a route to target IP address '12.34.56.78' via the 'nge3' network interface

        NAS_src:configuration services routing> create

        NAS_src:configuration services route (uncommitted)> get
                                family = (unset)
                           destination = (unset)
                                  mask = (unset)
                               gateway = (unset)
                             interface = (unset)

        NAS_src:configuration services route (uncommitted)> set family=IPv4
        NAS_src:configuration services route (uncommitted)> set destination=12.34.56.78
        NAS_src:configuration services route (uncommitted)> set mask=32
        NAS_src:configuration services route (uncommitted)> set gateway=12.34.56.254
        NAS_src:configuration services route (uncommitted)> set interface=nge3
        NAS_src:configuration services route (uncommitted)> commit

 

            (mask=32  means this is a host-specific route)

 

        NAS_src:configuration services routing> show
        route-000  0.0.0.0/0                        123.24.30.254   nge0      static
        route-001  123.24.30.0/24                   123.24.30.28    nge0      dynamic
        route-002  123.24.150.0/24                  123.24.150.10   ibd0      dynamic
        route-003  123.24.101.65/32                 123.24.30.254   nge1      inactive
        route-005  12.34.56.78/32                   12.34.56.254    nge3      static

 

 

2.  Set up routing correctly on the target system, so that traffic back to the source is routed through the correct (cluster or non-cluster) interface.

 

    NOTE: If replicating to or from a cluster, it is CRITICAL that the routing configuration be completed BEFORE starting the replication setup.
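For example, mirroring the step 1 example on the target; the source replication address (12.34.56.10), gateway and interface name are all hypothetical:

        NAS_tgt:configuration services routing> create
        NAS_tgt:configuration services route (uncommitted)> set family=IPv4
        NAS_tgt:configuration services route (uncommitted)> set destination=12.34.56.10
        NAS_tgt:configuration services route (uncommitted)> set mask=32
        NAS_tgt:configuration services route (uncommitted)> set gateway=12.34.56.254
        NAS_tgt:configuration services route (uncommitted)> set interface=nge3
        NAS_tgt:configuration services route (uncommitted)> commit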

 

 

3.  Setup the replication target

Before a source appliance can replicate to a target, the two systems must set up a replication peer connection that enables the appliances to identify each other securely for future communications.

Administrators set up this connection by creating a new replication target on the Configuration > Services > Remote Replication screen on the source appliance.

To create a new target, administrators specify three fields:

    - a name (used only to identify the target in the source appliance's BUI and CLI)

    - a network address or hostname (to contact the target appliance)

    - the target appliance's root password (to authorize the administrator to set up the connection on the target appliance)

The appliances then exchange keys used to securely identify each other in subsequent communications. These keys are stored persistently as part of the appliance's configuration and persist across reboots and upgrades. They will be lost if the appliance is factory reset or reinstalled. The root password is never stored persistently, so changing the root password on either appliance does not require any changes to the replication configuration. The password is never transmitted in the clear either because this initial identity exchange (like all replication control operations) is protected with SSL.

By default, the replication target connection is not bidirectional. If an administrator configures replication from a source A to a target B, B cannot automatically use A as a target. However, the system supports reversing the direction of replication, which automatically creates a target for A on B (if it does not already exist) so that B can replicate back to A.
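Reversal is performed on the target system, from the package that corresponds to the source action. As a rough, hypothetical sketch of where this lives in the CLI (the node names source-000 and package-000 are illustrative; see the Managing Replication Packages documentation for the full procedure and its caveats):

        NAS_tgt:> shares replication sources
        NAS_tgt:shares replication sources> select source-000
        NAS_tgt:shares replication source-000> select package-000
        NAS_tgt:shares replication source-000 package-000> reverse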

To configure replication targets, see Creating and Editing Targets below:

        http://docs.oracle.com/cd/E28317_01/html/E38246/shares__projects__replication.html#shares__projects__replication___creating_and_editing_targets_

 

        NAS_src:configuration services replication targets> target
        NAS_src:configuration services replication target (uncommitted)>
        NAS_src:configuration services replication target (uncommitted)> set hostname=10.123.225.201
        NAS_src:configuration services replication target (uncommitted)> set root_password=rootpassword
        NAS_src:configuration services replication target (uncommitted)> set label=repl_1
        NAS_src:configuration services replication target (uncommitted)> commit
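To confirm the new target is in place, list the configured targets; the target created above should appear, identified by its label 'repl_1':

        NAS_src:configuration services replication targets> show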

 

NOTE: When setting up a replication target, it may be better to set the 'hostname' field to a specific IP address, to ensure that the replication traffic is forced over a specific network interface (and route).

 

 

4.  Set up the replication action

After at least one replication target has been configured, administrators can configure actions on a local project or share by navigating to it in the BUI and clicking the Replication tab or navigating to it in the CLI and selecting the "replication" node.

These interfaces show the status of existing actions configured on the project or share and allow administrators to create new actions.

Replication actions have the following properties, which are presented slightly differently in the BUI and CLI:

PROPERTY (CLI name)                  DESCRIPTION
Target                               Unique identifier for the replication target system. This property is specified when an action is initially configured and is immutable thereafter.
Pool                                 Storage pool on the target where this project will be replicated. This property is specified when an action is initially configured and is not shown thereafter.
Enabled                              Whether the system will send updates for this action.
Mode (CLI: continuous) and schedule  Whether this action is replicated continuously or at manual or scheduled intervals. See below for details.
Include Snapshots (include_snaps)    Whether replication updates include non-replication snapshots. See below for details.
Limit bandwidth (max_bandwidth)      Specifies a maximum speed for this replication update (in terms of data transferred over the network per second).
Use SSL (use_ssl)                    Whether to encrypt data on the wire using SSL. Using this feature can have a significant impact on per-action replication performance.
State                                Read-only property describing whether the action is currently idle, sending an update, or cancelling an update.
Last sync                            Read-only property describing the last time an update was successfully sent. This value may be unknown if the system has not sent a successful update since boot.
Last attempt                         Read-only property describing the last time an update was attempted. This value may be unknown if the system has not attempted to send an update since boot.
Next update                          Read-only property describing when the next attempt will be made. This value could be a date (for a scheduled update), "manual," or "continuous."

 

Modes: Manual, Scheduled, or Continuous

Replication actions can be configured to send updates manually, on a schedule, or continuously.  The replication update process itself is the same in all cases. This property only controls the interval.

Because continuous replication actions send updates as frequently as possible, they essentially result in sending a constant stream of all filesystem changes to the target system. For filesystems with a lot of 'churn' (many files created and destroyed in short intervals), this can result in replicating much more data than actually necessary.  However, as long as replication can keep up with data changes, this results in the minimum data lost in the event of a data-loss disaster on the source system.

Note that continuous replication is still asynchronous (it schedules the "next" replication iteration as soon as the current one is finished). Sun Storage appliances do not currently support synchronous replication, which does not consider data committed to stable storage until it's committed to stable storage on both the primary and secondary storage systems.
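For example, a minimal sketch of switching an existing action to continuous mode (action-000 is the hypothetical action used in the transcripts below):

        NAS_src:shares PROJECT1/SHARE1 replication> select action-000
        NAS_src:shares PROJECT1/SHARE1 action-000> set continuous=true
                            continuous = true (uncommitted)
        NAS_src:shares PROJECT1/SHARE1 action-000> commit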

 

To configure replication actions, see Creating and Editing Actions below:

        http://docs.oracle.com/cd/E28317_01/html/E38246/shares__projects__replication.html#shares__projects__replication___creating_and_editing_actions_

 

        NAS_src:shares PROJECT1/SHARE1 replication> action

        NAS_src:shares PROJECT1/SHARE1 action (uncommitted)> get
        Properties:
                                target = (unset)
                                  pool = (unset)
                               enabled = true
                            continuous = false
                         include_snaps = true
                         max_bandwidth = unlimited
                               use_ssl = true

        NAS_src:shares PROJECT1/SHARE1 action (uncommitted)> set target=repl_sys
                                target = repl_sys (uncommitted)
        NAS_src:shares PROJECT1/SHARE1 action (uncommitted)> set pool=pool-0
                                  pool = pool-0 (uncommitted)
        NAS_src:shares PROJECT1/SHARE1 action (uncommitted)> set include_snaps=false
                         include_snaps = false (uncommitted)
        NAS_src:shares PROJECT1/SHARE1 action (uncommitted)> set use_ssl=false
                               use_ssl = false (uncommitted)
        NAS_src:shares PROJECT1/SHARE1 action (uncommitted)> commit

 

        NAS_src:shares PROJECT1/SHARE1 replication> ls
        Properties:
                             inherited = false

        Actions:

                    TARGET          STATUS     NEXT
        action-000  repl_sys        idle       manual

 

        NAS_src:shares PROJECT1/SHARE1 replication> select action-000

        NAS_src:shares PROJECT1/SHARE1 action-000> ls
        Properties:
                                    id = a751dc0f-abcd-1234-6789-f5e8315eaffa
                                target = repl_sys
                               enabled = true
                            continuous = false
                         include_snaps = false
                         max_bandwidth = unlimited
                               use_ssl = false
                                 state = idle
                     state_description = Idle (no update pending)
                             last_sync = <unknown>
                              last_try = <unknown>
                           next_update = manual

 

 

5.  Perform the initial replication update

The initial/first replication update MUST complete successfully; if it does not, you must clean up any remnants ('old' actions/snapshots, etc.) and start it again.

Because the first replication is so critical, it may be useful to replicate as little data as possible on the initial pass, by either replicating an empty project or at least not electing to replicate the snapshots in the project/shares. These can be added later if you really want them to be replicated.

If you want the data on the target system to be compressed while it is being written on the target system, enable compression at the project level on the SOURCE system.

NOTE: Encryption can be enabled or disabled on the replication stream, and can be changed for the "next" replication iteration.
          Encryption introduces an artificial bandwidth bottleneck due to the extra processing on the stream.

 

        NAS_src:shares PROJECT1/SHARE1 action-000> sendupdate
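While the update is running, 'ls' on the action shows the read-only 'state' property as 'sending'; once it completes, 'last_sync', 'last_try' and 'last_result' are updated (see the example output in step 6 below):

        NAS_src:shares PROJECT1/SHARE1 action-000> ls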

 

 

6.  Select the project to replicate, set up an appropriate schedule (such as 'daily'), then enable the action.

    (The frequency can be selected from one of "halfhour", "hour", "day", "week" or "month")

        NAS_src:shares PROJECT1/SHARE1 action-000> schedule
        NAS_src:shares PROJECT1/SHARE1 action-000 schedule (uncommitted)> set frequency=day
                             frequency = day (uncommitted)
        NAS_src:shares PROJECT1/SHARE1 action-000 schedule (uncommitted)> set hour=23
                                  hour = 23 (uncommitted)
        NAS_src:shares PROJECT1/SHARE1 action-000 schedule (uncommitted)> set minute=05
                                minute = 05 (uncommitted)

        NAS_src:shares PROJECT1/SHARE1 action-000 schedule (uncommitted)> commit
        NAS_src:shares PROJECT1/SHARE1 action-000>

 

The above (example) replication schedule is: daily at 23:05.

        NAS_src:shares PROJECT1/SHARE1 action-000> ls
        Properties:
                                    id = a751dc0f-abcd-1234-6789-f5e8315eaffa
                                target = repl_sys
                               enabled = true
                            continuous = false
                         include_snaps = false
                         max_bandwidth = unlimited
                               use_ssl = false
                                 state = idle
                     state_description = Idle (no update pending)
                           next_update = Wed Sep 01 2013 23:05:00 GMT+0000 (UTC)
                             last_sync = Wed Sep 01 2013 10:24:05 GMT+0000 (UTC)
                              last_try = Wed Sep 01 2013 10:24:05 GMT+0000 (UTC)
                           last_result = success

        Schedules:

        NAME                 FREQUENCY            DAY                  HH:MM
        schedule-000         day                  -                    23:05
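Once updates are completing successfully, the replicated package can also be inspected from the target system. A rough sketch (the node names source-000 and package-000 are illustrative):

        NAS_tgt:> shares replication sources
        NAS_tgt:shares replication sources> select source-000
        NAS_tgt:shares replication source-000> select package-000
        NAS_tgt:shares replication source-000 package-000> ls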

 

 

 

***Checked for relevance on 25-MAY-2018***

References

<NOTE:1958039.1> - Oracle ZFS Storage Appliance: Remote Replication Compatibility
<NOTE:2196548.1> - Oracle ZFS Storage Appliance: How to configure dedicated management interfaces
<BUG:24388115> - REPLICATION ACTION STUCK IN 'SENDING' STATE WAITING FOR NOTIFICATION FROM REPLD

Attachments
This solution has no attachment