Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1624644.1
Update Date: 2018-05-25
Keywords:

Solution Type: Problem Resolution Sure Solution

Solution 1624644.1: Sun Storage 7000 Unified Storage System: Replication fails due to missing replication snapshots


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS5-2
  • Oracle ZFS Storage ZS3-2
  • Sun Storage 7110 Unified Storage System
  • Sun Storage 7210 Unified Storage System
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7410 Unified Storage System
  • Oracle ZFS Storage ZS5-4
  • Sun Storage 7310 Unified Storage System
  • Sun ZFS Storage 7120
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Oracle ZFS Storage ZS3-BA

Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-6460495191>

Applies to:

Oracle ZFS Storage ZS3-2 - Version All Versions and later
Sun Storage 7110 Unified Storage System - Version All Versions and later
Sun Storage 7210 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

Source : 7320 : ak.prablstorage1-1137FMM00E-maguro_plus-2013-01-31.06.37.52
Destination : 7120 : ak.drablstorage-1137FMM0TM-iwashi_plus-2013-01-31.06.40.02

Failing replications:

Source

Thu Jan 31 03:30:22 2013
nvlist version: 0
time = 0x5109e54e
hrtime = 0x7263fed08b69
action = (embedded nvlist)
nvlist version: 0
target_label = ablreplication
target_uuid = 8a619b03-00cf-c0e2-cf67-bda6b6596924
uuid = 4acdcfff-2513-4fe9-e073-b8bbfbd81300
state = sending
dataset = prablstrpool/local/prdbproj-p01/prabldb01p-fccdata
(end action)

event = update done
result = failure
errmsg = stage 'wait' failed: failed on remote side (code -1)
remote_status = ok

Thu Jan 31 04:00:29 2013
nvlist version: 0
time = 0x5109ec5d
hrtime = 0x7408a56bcb3c
action = (embedded nvlist)
nvlist version: 0
target_label = ablreplication
target_uuid = 8a619b03-00cf-c0e2-cf67-bda6b6596924
uuid = 4acdcfff-2513-4fe9-e073-b8bbfbd81300
state = sending
dataset = prablstrpool/local/prdbproj-p01/prabldb01p-fccdata
(end action)

event = update done
result = failure
errmsg = stage 'stream_send' failed: zfs_send: cannot send 'prablstrpool/local/prdbproj-p01': Broken pipe
remote_status = ok


Target

Thu Jan 31 06:29:16 2013
nvlist version: 0
time = 0x510a0f3c
hrtime = 0xfdde6c16ae132
pkg = (embedded nvlist)
nvlist version: 0
source_asn = 1372fe2a-2884-455f-dbe3-c192dda49477
source_name = prablstorage1
uuid = 4acdcfff-2513-4fe9-e073-b8bbfbd81300
state = receiving
(end pkg)

event = recv_done
result = failed
error = zfs_receive failed: cannot receive incremental stream: most recent snapshot of drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01/prabldb01p-fccdata does not match incremental source

 

The main error reported on the replication target is:

# grep zfs_receive replication.ak.txt
error = zfs_receive failed: cannot receive incremental stream: most recent snapshot of drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01/prabldb01p-fccdata does not match incremental source
error = zfs_receive failed: cannot receive incremental stream: most recent snapshot of drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01/prabldb01p-fccdata does not match incremental source
error = zfs_receive failed: cannot receive incremental stream: most recent snapshot of drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01/prabldb01p-fccdata does not match incremental source
error = zfs_receive failed: cannot receive incremental stream: most recent snapshot of drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01/prabldb01p-fccdata does not match incremental source
error = zfs_receive failed: cannot receive incremental stream: most recent snapshot of drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01/prabldb01p-fccdata does not match incremental source
error = zfs_receive failed: cannot receive incremental stream: most recent snapshot of drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01/prabldb01p-fccdata does not match incremental source

 

From the source node, we see these replication snapshots:

prablstrpool/local/prdbproj-p01@.rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300-1ee
prablstrpool/local/prdbproj-p01@.rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300-1f1
prablstrpool/local/prdbproj-p01/prabldb01p-bkp@.rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300-1f1
prablstrpool/local/prdbproj-p01/prabldb01p-fccdata@.rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300-1ee <<< prabldb01p-fccdata
prablstrpool/local/prdbproj-p01/prabldb01p-fccdata@.rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300-1f1 <<< prabldb01p-fccdata
prablstrpool/local/prdbproj-p01/prabldb01p-orclsw@.rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300-1ee
prablstrpool/local/prdbproj-p01/prabldb01p-orclsw@.rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300-1f1

 

From the target node, we see these snapshots. Note that there is one 'recv' snapshot for the 'prabldb01p-fccdata' share, while there are none for the other shares in the 'prdbproj-p01' project.

drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01
drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01@.auto-1359604800
drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01@.rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300-1ee
drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01@.auto-1359612000
drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01@.rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300-1f1
drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01/prabldb01p-fccdata
drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300/prdbproj-p01/prabldb01p-fccdata@recv-1946-1 <<< here
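
To confirm that source and target have diverged, the replication snapshots belonging to this package can be listed on both heads and compared. The commands below are only a minimal sketch from the appliance shell, using the pool, project and package UUID from this example; each share taking part in the replication should show the same most recent '.rr-<uuid>' snapshot on both sides.

On the replication source:

prablstorage1# zfs list -H -t snapshot -o name -r prablstrpool/local/prdbproj-p01 | grep '\.rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300'

On the replication target (the received copies live under the 'nas-rr-<uuid>' dataset):

drablstorage# zfs list -H -t snapshot -o name -r drablstrpool/nas-rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300 | grep '\.rr-4acdcfff-2513-4fe9-e073-b8bbfbd81300'

Any share whose latest '.rr' snapshot exists on only one side (as is the case for 'prabldb01p-fccdata' above) can no longer receive incremental updates.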

 

----------------------------------------------------------------------------------------------------------

ak-nas-2011.04.24.5.0_1-1.33

Source        : 7320 : /cores/3-6460495191/Feb-8/source
Destination : 7120 : /cores/3-6460495191/Feb-8/destination

Failing replication

Source

  Wed Feb 13 11:30:54 2013
  nvlist version: 0
        time = 0x511b796e
        hrtime = 0x354770724891
        action = (embedded nvlist)
        nvlist version: 0
                target_label = ablreplication
                target_uuid = 8a619b03-00cf-c0e2-cf67-bda6b6596924
                uuid = b180dbc6-3763-49dc-de7a-c115c4cd4820
                state = sending
                dataset = prablstrpool/local/prdbproj-p01/prabldb01p-orclsw
        (end action)

        event = update done
        result = failure
        errmsg = stage 'stream_send' failed: zfs_send: cannot send 'prablstrpool/local/prdbproj-p01': Broken pipe
        remote_status = ok

 

prablstorage1# zfs list -t all | grep b180dbc6-3763-49dc-de7a-c115c4cd4820
prablstrpool/local/prdbproj-p01@.rr-b180dbc6-3763-49dc-de7a-c115c4cd4820-1e1                                     0      -  65.1K  -
prablstrpool/local/prdbproj-p01@.rr-b180dbc6-3763-49dc-de7a-c115c4cd4820-1e7                                     0      -  65.1K  -
prablstrpool/local/prdbproj-p01/prabldb01p-bkp@.rr-b180dbc6-3763-49dc-de7a-c115c4cd4820-1e1                   140M      -   119G  -
prablstrpool/local/prdbproj-p01/prabldb01p-bkp@.rr-b180dbc6-3763-49dc-de7a-c115c4cd4820-1e7                      0      -   119G  -
prablstrpool/local/prdbproj-p01/prabldb01p-fccdata@.rr-b180dbc6-3763-49dc-de7a-c115c4cd4820-1e1               322M      -   479G  -
prablstrpool/local/prdbproj-p01/prabldb01p-fccdata@.rr-b180dbc6-3763-49dc-de7a-c115c4cd4820-1e7               359K      -   480G  -  
prablstrpool/local/prdbproj-p01/prabldb01p-orclsw@.rr-b180dbc6-3763-49dc-de7a-c115c4cd4820-1e1                994M      -   999G  -
prablstrpool/local/prdbproj-p01/prabldb01p-orclsw@.rr-b180dbc6-3763-49dc-de7a-c115c4cd4820-1e7                414K      -   999G  -

 

Destination

  Wed Feb 13 12:30:15 2013
  nvlist version: 0
        time = 0x511b8757
        hrtime = 0x209c7fb5a3d37
        pkg = (embedded nvlist)
        nvlist version: 0
                source_asn = 1372fe2a-2884-455f-dbe3-c192dda49477
                source_name = prablstorage1
                uuid = b180dbc6-3763-49dc-de7a-c115c4cd4820
                state = receiving
        (end pkg)

        event = recv_done
        result = failed
        error = zfs_receive failed: cannot receive incremental stream: most recent snapshot of drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw does not match incremental source

 

drablstorage# zfs list -t all | grep b180dbc6-3763-49dc-de7a-c115c4cd4820  | nawk '{print $1,$2,$3,$4,$5}'
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820 2.19T 9.54T 68.2K none
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01 2.19T 9.54T 68.2K /export
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01@.rr-b180dbc6-3763-49dc-de7a-c115c4cd4820-1e1 2.44K - 68.2K -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01@.auto-1360753200 2.44K - 68.2K -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01@.auto-1360756800 2.44K - 68.2K -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01@.auto-1360760400 2.44K - 68.2K -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01@.rr-b180dbc6-3763-49dc-de7a-c115c4cd4820-1e7 0 - 68.2K -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw 2.19T 10.5T 916G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@fccdata_eod_26012013 41.7G - 915G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@fccdata_eod_28012013 5.82G - 915G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@fccdata_eod_29012013 8.63G - 915G -  
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@fccdata_eod_30012013 21.5G - 915G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@orclsw_eod_31012013 1.34G - 915G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@orclsw_eod_01022013 5.37G - 915G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@orclsw_eod_02022013 5.84G - 915G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@orclsw_eod_04022013 4.82G - 915G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@orclsw_eod_05022013 2.83G - 915G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@orclsw_eod_06022013 4.46G - 915G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@orclsw_eod_07022013 6.61G - 916G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@orclsw_eod_08022013 904M - 916G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@orclsw_eod_09022013 2.73G - 916G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@orclsw_eod_11022013 6.00G - 916G -
drablstrpool/nas-rr-b180dbc6-3763-49dc-de7a-c115c4cd4820/prdbproj-p01/prabldb01p-orclsw@orclsw_eod_12022013 3.33G - 916G -


Note that there is no ".rr-b180dbc6-3763-49dc-de7a-c115c4cd4820" replication snapshot for the prabldb01p-orclsw share on the target.

 

Cause

Bug 16322832 - replication failed due to missing replication snapshots

Bug 16933963 - zfs_send() return code inconsistent


At this stage, we understand the circumstances under which the replication fails.

There is an internal libzfs 'zfs send' interaction that causes a low-level failure code to be lost.
This means that by the time the replication code in AK sees the result, it believes the update succeeded and removes the oldest replication snapshot on the source.
As a result, source and target no longer share a common snapshot, and every subsequent incremental update fails.

The code in question uses a logical "or" to combine and return error codes; the fix at this point is to switch to an if statement so that a return code of "-1" is not collapsed into "1".
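
The effect of that logical "or" can be illustrated with a small sketch. The fragment below is shell arithmetic only, with hypothetical variable names (the actual fix is in the appliance's C code): any non-zero value fed through "||" collapses to 1, so a low-level return code of -1 from zfs_send() is no longer visible to the caller, while an explicit test preserves it.

rc_send=-1        # hypothetical low-level return code from zfs_send()
rc_other=0        # hypothetical return code from a second operation

echo $(( rc_send || rc_other ))     # prints 1  -- the original -1 is lost

if [ "$rc_send" -ne 0 ]; then       # explicit test instead of "||"
    echo "$rc_send"                 # prints -1 -- the real code is preserved
fi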

Understanding the underlying failure that triggers this is a separate exercise; at this point the priority is to get the IDR to the customer so they have some stability in their environment.

Bug 16322832 was closed as a duplicate of:

  Bug 16933963 - zfs_send() return code inconsistent

which is where the fix was integrated. The official fix is in release 2013.1 (formerly ak8), and also in 2011.1 update 8.0.

Because this bug describes the state after the incident has already occurred, the only option is to upgrade to one of the fixed releases above and then recreate the replication, since the common snapshot between the replication source and target has already been lost by the time the symptoms described here are seen.

 

Solution

Upgrade the replication source and target systems to Appliance Firmware Release 2011.1.8.1 (or later) or 2013.1.1.1 (or later), then recreate the replication.
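
Before recreating the replication action, the release running on each head can be confirmed from the appliance CLI. The command below is a sketch assuming the standard 'configuration version' CLI context; the reported 'Appliance Version' should correspond to 2011.1.8.1 / 2013.1.1.1 or later on both the replication source and the target.

drablstorage:> configuration version show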

 

 

 

***Checked for relevance on 25-MAY-2018***

References

<NOTE:1434184.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot Fibre-Channel Problems
<BUG:16322832> - REPLICATION FAILED DUE TO MISSING REPLICATION SNAPSHOTS
<NOTE:1213725.1> - Sun Storage 7000 Unified Storage System: Configuration and tuning for NFS performance
<NOTE:1315536.1> - Sun Storage 7000 Unified Storage System: RAIDZ2 Performance Issues With High I/O Wait Queues
<NOTE:1331769.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot Performance Issues
<NOTE:1213714.1> - Sun ZFS Storage Appliance: Performance clues and considerations

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.