Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-1399581.1
Update Date: 2018-05-14
Keywords:

Solution Type  Troubleshooting Sure

Solution  1399581.1 :   Sun Storage 7000 Unified Storage System: Replication starts to a remote appliance but does not complete  


Related Items
  • Sun ZFS Storage 7420
  • Sun Storage 7110 Unified Storage System
  • Oracle ZFS Storage ZS3-2
  • Sun Storage 7210 Unified Storage System
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Sun ZFS Storage 7120
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS
  • _Old GCS Categories>Sun Microsystems>Storage - Disk>Unified Storage


Before reading this document, review Sun Storage 7000 Unified Storage System: Resolution Path for Replication Issues [Document 1397959.1].

In this Document
Purpose
Troubleshooting Steps
References


Applies to:

Sun ZFS Storage 7320 - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun Storage 7210 Unified Storage System - Version All Versions and later
Sun Storage 7110 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Purpose

To assist in troubleshooting issues with replication between Sun ZFS 7000 Storage appliances where replication starts to a remote appliance but does not complete.

To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Disk Storage ZFS Storage Appliance Community.

Troubleshooting Steps

Examine Maintenance -> Logs -> Alerts in the BUI of the source system. There should be alerts similar to the following:

    2011-9-10 11:30:01  Began replicating 'HOMEtest' to appliance 's7120-ftlauder-a'.
    2011-9-10 11:30     Replication of 'HOMEtest' to 's7120-ftlauder-a' failed.


Click the "information" symbol to the right of the date of the "failed" message. Does it specify a reason why the replication failed?

Examine the alert log on the target system for entries at the date and time of the failure.

If no alert was generated on the target, the source may never have made contact with the target.

In this example, the alert log on the target shows that a successful replication took place the day before:

2011-9-9 09:33:25    Finished replicating project 'HOMEtest' from appliance 's7420-ftlauder-a'.    Minor Alert
2011-9-9 09:33:08    Began replicating project 'HOMEtest' from appliance 's7420-ftlauder-a'.    Minor Alert
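
When the alert log is long, it can help to filter it programmatically. The following is a minimal sketch, assuming the alerts have been copied out of the BUI as plain text in the same layout as the samples above. The function names `parse_alert` and `last_success` are hypothetical helpers, not part of the appliance software, and the line layout is inferred only from the samples shown in this document.

```python
import re
from datetime import datetime

# Assumed layouts (inferred from the sample alerts in this document):
#   <date> <time>  Began|Finished replicating [project] '<name>' ...
#   <date> <time>  Replication of '<name>' to '<peer>' failed.
ALERT_RE = re.compile(
    r"^\s*(?P<date>\d{4}-\d{1,2}-\d{1,2})\s+"
    r"(?P<time>\d{1,2}:\d{2}(?::\d{2})?)\s+"
    r"(?:(?P<verb>Began|Finished) replicating (?:project )?'(?P<name1>[^']+)'"
    r"|Replication of '(?P<name2>[^']+)' .*?(?P<failed>failed))"
)


def parse_alert(line):
    """Return (timestamp, event, project) or None if the line is not recognized."""
    m = ALERT_RE.match(line)
    if m is None:
        return None
    year, month, day = (int(p) for p in m.group("date").split("-"))
    parts = [int(p) for p in m.group("time").split(":")]
    seconds = parts[2] if len(parts) > 2 else 0  # some alerts omit seconds
    when = datetime(year, month, day, parts[0], parts[1], seconds)
    if m.group("failed"):
        return when, "failed", m.group("name2")
    return when, m.group("verb").lower(), m.group("name1")


def last_success(lines, project):
    """Return the timestamp of the most recent 'Finished' alert for a project."""
    finished = [
        when
        for when, event, name in filter(None, map(parse_alert, lines))
        if event == "finished" and name == project
    ]
    return max(finished, default=None)
```

Comparing the result of `last_success` on the target's alerts with the time of the first failure on the source narrows down the window in which something changed.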


Was work being performed on the network when the replication failed? If that work has been completed, attempt a manual replication to the target system.

Has the IP address of either the source or the target system been changed since the last successful replication? The appliance IP addresses must not change. If an address has changed, the old replication package must be destroyed and a new one created.

Check the system log for messages indicating "no route to host". If such messages exist, enter "traceroute <target ip address or hostname>" from the CLI to see where the path to the target breaks.
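
If the system log has been exported as text, the search can be sketched as follows. This is a minimal illustration only: the exact wording and layout of the log entries are assumptions, so the match is a simple case-insensitive substring test, and `find_no_route` is a hypothetical helper name.

```python
def find_no_route(lines):
    """Return (line_number, text) for every line mentioning 'no route to host'.

    The match is case-insensitive because the exact capitalization used in
    the system log is an assumption here.
    """
    needle = "no route to host"
    return [
        (number, line.rstrip())
        for number, line in enumerate(lines, start=1)
        if needle in line.lower()
    ]
```

An empty result suggests that routing is not the problem and that the remaining checks in this document are the better next step.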

Review Document ID 1335245.1, "Sun Storage 7000 Unified Storage System: Replication failure due quotas in use on the source machine".

Replication requires enough appliance kit daemon (akd) memory on both the source and target system to complete.

On the target system, open the statistics in the BUI and check memory usage. If more than 3 GB is in use, this could be the cause of the failure. Contact Oracle support to determine the cause of the excessive memory usage.

The remote (target) appliance also must be able to connect to the local (source) appliance using port 216.

 

If an alert message like the following is reported on the source, contains the source's own IP address, and has no accompanying message on the destination, then the remote appliance may not be able to connect to the source on port 216:

       "errmsg = stage 'stream_setup' failed: failed to invoke receive() XDR: rpc com.sun.ak.nas_repl.receive:1 failed on remote peer: failed to connect to xx.xx.xx.xx:216: Connection timed out"

 

From a system on the destination subnet that supports telnet, attempt to connect to the source system on port 216.

telnet local_appliance 22      # connects: ssh works
telnet local_appliance 2049    # connects: NFS works
telnet local_appliance 216     # times out: replication fails

 

In this case, a firewall in the path back to the source system was blocking port 216.
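
Where telnet is not available on the destination subnet, the same test can be approximated with a short script. The sketch below first extracts the address and port from an alert message of the form quoted above, then attempts a plain TCP connection. The function names are hypothetical, the regular expression is based only on the single sample message in this document, and IPv6 addresses are not handled.

```python
import re
import socket

# Based on the one sample errmsg in this document; other variants may exist.
CONNECT_FAILURE_RE = re.compile(
    r"stage '(?P<stage>[^']+)' failed:.*"
    r"failed to connect to (?P<host>[^\s:]+):(?P<port>\d+): (?P<reason>[^\"]+)"
)


def parse_connect_failure(errmsg):
    """Extract stage, host, port and reason from a 'failed to connect' alert."""
    m = CONNECT_FAILURE_RE.search(errmsg)
    if m is None:
        return None
    return {
        "stage": m.group("stage"),
        "host": m.group("host"),
        "port": int(m.group("port")),
        "reason": m.group("reason").strip(),
    }


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refusals and unreachable hosts
        return False


def check_ports(host, ports=((22, "ssh"), (2049, "NFS"), (216, "replication")),
                timeout=5.0):
    """Probe the same three ports as the telnet test; returns {label: bool}."""
    return {label: port_open(host, port, timeout) for port, label in ports}


# Example (run from a system on the destination subnet):
#   print(check_ports("local_appliance"))
```

A result in which ssh and NFS are reachable but replication is not matches the firewall case described above. A successful TCP connection only shows that the port is reachable; it does not prove that replication itself will succeed.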

 


  Copyright © 2018 Oracle, Inc.  All rights reserved.