Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2232013.1
Update Date: 2017-02-16
Keywords:

Solution Type  Problem Resolution Sure

Solution  2232013.1 :   Oracle ZFS Storage Appliance: Replication failing due to "Connection timed out" and/or "stage 'init' failed: failed to contact target"  


Related Items
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7120
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: ZS

In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-14247700221>

Applies to:

Oracle ZFS Storage ZS3-4 - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
Oracle ZFS Storage ZS3-BA - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

Replication on a ZS3-4 ZFS Storage Appliance is stuck.

Both the source and target nodes are running 2013.06.05.6.3,1-1.

The earliest error found in the replication.ak.txt log on the replication source:

Sun Jan 29 12:06:50 2017
nvlist version: 0
time = 0x588ddada
hrtime = 0x4fea5d56dbc19a
action = (embedded nvlist)
nvlist version: 0
target_label = zfs-1-nfs
target_uuid = c0045870-74e9-cb6a-cead-81cc7db2c7eb
uuid = 4f010aed-e621-6a13-d166-99f790de9b6f
state = sending
dataset = poola/local/test1
(end action)

event = update done
result = failure
errmsg = stage 'compat' failed: compat failed: failed to connect to [destination IP address]:216: Connection timed out
remote_status =
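
The `time` field in each entry is hexadecimal. Read as seconds since the Unix epoch, it can be cross-checked against the entry's header line. The following is a minimal Python sketch, intended to be run on a workstation against a saved copy of the log rather than on the appliance. The helper names `parse_entry` and `decode_time` are illustrative, and treating `time` as epoch seconds in UTC is an assumption that agrees with the header of the entry above.

```python
from datetime import datetime, timezone


def parse_entry(text):
    """Collect 'key = value' pairs from one log entry into a flat dict.

    Lines without ' =' (such as 'nvlist version: 0' or '(end action)')
    are skipped, and fields of the embedded 'action' nvlist are merged
    into the same dict.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" =")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def decode_time(hex_seconds):
    """Interpret the hex 'time' value as seconds since the Unix epoch (UTC)."""
    return datetime.fromtimestamp(int(hex_seconds, 16), tz=timezone.utc)


# Trimmed copy of the entry shown above.
ENTRY = """\
time = 0x588ddada
hrtime = 0x4fea5d56dbc19a
state = sending
dataset = poola/local/test1
event = update done
result = failure
errmsg = stage 'compat' failed: compat failed: failed to connect to [destination IP address]:216: Connection timed out
remote_status =
"""

fields = parse_entry(ENTRY)
print(decode_time(fields["time"]).isoformat())  # 2017-01-29T12:06:50+00:00
print(fields["result"], "-", fields["errmsg"])
```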

 

The most recent error entry in replication.ak.txt on the replication source:

Mon Feb 6 00:03:05 2017
nvlist version: 0
time = 0x5897bd39
hrtime = 0x12b8c5dfa1f9a
action = (embedded nvlist)
nvlist version: 0
target_label = zfs-1-nfs
target_uuid = c0045870-74e9-cb6a-cead-81cc7db2c7eb
uuid = e0f65a6b-e08f-6a23-e59d-80f9365ed1eb
state = sending
dataset = poola/local/test2
(end action)

event = update done
result = failure
errmsg = stage 'init' failed: failed to contact target
remote_status =

The "failed to contact target" errors first appeared on Thu Feb 2 08:14:40 2017.
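
Because the failure moved from the 'compat' stage to the 'init' stage over time, it can help to find when each stage first failed. The sketch below scans log text for `time` and `errmsg` lines and records the first timestamp seen for each failing stage. It is illustrative only: `first_failure_per_stage` is a hypothetical helper, the sample text is trimmed from the two entries quoted in this note, and `time` is assumed to hold epoch seconds in UTC. Run against the full log, it would report the earliest occurrence of each stage rather than the dates in this small sample.

```python
import re
from datetime import datetime, timezone

# Matches the stage name in messages such as "stage 'init' failed: ...".
STAGE_RE = re.compile(r"stage '([^']+)' failed")


def first_failure_per_stage(log_text):
    """Return {stage: datetime of the first entry that failed in that stage}."""
    first_seen = {}
    current_time = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("time = "):
            # 'hrtime = ...' lines do not match this prefix and are ignored.
            seconds = int(line.split("=", 1)[1].strip(), 16)
            current_time = datetime.fromtimestamp(seconds, tz=timezone.utc)
        elif line.startswith("errmsg = "):
            match = STAGE_RE.search(line)
            if match and current_time is not None:
                first_seen.setdefault(match.group(1), current_time)
    return first_seen


# Trimmed from the two entries quoted in this note.
SAMPLE_LOG = """\
time = 0x588ddada
hrtime = 0x4fea5d56dbc19a
result = failure
errmsg = stage 'compat' failed: compat failed: failed to connect to [destination IP address]:216: Connection timed out

time = 0x5897bd39
hrtime = 0x12b8c5dfa1f9a
result = failure
errmsg = stage 'init' failed: failed to contact target
"""

for stage, when in first_failure_per_stage(SAMPLE_LOG).items():
    print(stage, when.isoformat())
```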

 

Changes

Initially, the replication target address was on an IPMP interface; this is when the "Connection timed out" errors occurred.

Later, the "stage 'init' failed" events began to appear.

After that, the destination was changed to a different (VNIC) interface on the same target, but this did not resolve the problem.
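
When comparing candidate target addresses, a basic TCP reachability test of port 216 (the port named in the 'compat' error message) can help rule out plain network problems. The sketch below is a hedged example meant to be run from an ordinary host on the replication network, not on the appliance itself; `can_connect` is a hypothetical helper. A successful connection from such a host does not prove that the source appliance can reach the target, and it says nothing about the peer token address issue described under Cause.

```python
import socket


def can_connect(host, port=216, timeout=5.0):
    """Return True if a TCP connection to host:port opens within the timeout.

    Port 216 is taken from the error message in the replication log.
    Timeouts, refusals, and unreachable-network errors all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("192.0.2.10")` and `can_connect("192.0.2.20")` could be compared for the IPMP and VNIC addresses of the target; both addresses here are placeholders from the documentation range, not values from this case.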

 

In this particular case, replication eventually resumed after AKD was restarted on the source.

 

Cause

Related bugs:

Bug 24486226 Replication failing at 'init' stage (base bug: 24600417 replication source should use peer token address to connect to target)

Bug 17173143 ak_peer_token_hostname() confuses the notion of nodename with hostname

Both bugs are resolved in Appliance Firmware Release 2013.1.6.7.

Solution

Upgrade to Appliance Firmware Release 2013.1.6.7 (or later).
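
To check whether a given system already has the fix, its version can be compared against 2013.1.6.7. The sketch below is a rough illustration, not an Oracle tool. It assumes that a build string of the form shown under Symptoms (2013.06.05.6.3,1-1) corresponds to the dotted release 2013.1.6.3, that is, that the `2013.06.05` prefix denotes the 2013.1 release family and the text after the comma can be ignored. The names `normalize`, `release_tuple`, and `has_fix` are hypothetical.

```python
def normalize(version):
    """Reduce a version string to dotted release form, e.g. '2013.1.6.3'."""
    # Drop any build suffix after the comma, e.g. ',1-1'.
    base = version.split(",", 1)[0].strip()
    # Assumption: the '2013.06.05' build prefix denotes the 2013.1 release family.
    prefix = "2013.06.05."
    if base.startswith(prefix):
        base = "2013.1." + base[len(prefix):]
    return base


def release_tuple(version):
    """Convert a version string to a tuple of ints for ordered comparison."""
    return tuple(int(part) for part in normalize(version).split("."))


def has_fix(version, fixed_in="2013.1.6.7"):
    """Return True if 'version' is at or above the release containing the fix."""
    return release_tuple(version) >= release_tuple(fixed_in)


print(has_fix("2013.06.05.6.3,1-1"))  # False: the version from the Symptoms section
print(has_fix("2013.1.6.7"))          # True
```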

 

 

References

<BUG:24486226> - REPLICATION FAILING AT 'INIT' STAGE
<NOTE:1397959.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot Replication Issues

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.