Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1582913.1
Update Date: 2018-05-25

Solution Type  Problem Resolution Sure

Solution  1582913.1 :   Sun Storage 7000 Unified Storage System: Limitation on the number of simultaneous replication jobs  


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS5-2
  • Sun Storage 7110 Unified Storage System
  • Oracle ZFS Storage ZS3-2
  • Sun Storage 7210 Unified Storage System
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7410 Unified Storage System
  • Oracle ZFS Storage ZS5-4
  • Sun ZFS Storage 7120
  • Sun Storage 7310 Unified Storage System
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-7155651351>

Applies to:

Sun ZFS Storage 7320 - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun Storage 7210 Unified Storage System - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

Frequent replication failures:

sun7420-1:maintenance logs alert entry-065> show
Properties:
timestamp = 2013-5-3 14:48:12
uuid = c0d4ddf7-d8c9-e67e-ebbb-cda39ad95e7a
description = Replication of 'pSCM/sSVNnew' to 'sun7310' failed.
type = Minor alert

sun7420-1:maintenance logs alert entry-095> show
Properties:
timestamp = 2013-5-3 18:11:04
uuid = 1ddd317b-cf3b-45a9-9bf9-ea6256c946b5
description = Replication of 'pSCM/sCVSHome' to 'sun7310' failed.
type = Minor alert

sun7420-2:maintenance logs alert entry-097> show
Properties:
timestamp = 2013-5-3 18:30:18
uuid = 7a4e9aa4-d990-6b24-af72-aaa1c2322787
description = Replication of 'pBuild/sJenkins' to 'sun7310' failed.
type = Minor alert

sun7420-2:maintenance logs alert entry-099> show
Properties:
timestamp = 2013-5-3 18:30:50
uuid = 5a290111-7d95-6018-a6ee-af5a90410f81
description = Replication of 'pBuild/sJenkins' to 'sun7310' failed.
type = Minor alert

 

Fri May 3 17:35:22 2013
nvlist version: 0
project = pabc/sCVS
target_host = sun7310
source = appliance/kit/akd:default
link = 55259430-da5e-c342-c84b-8213598865d2
class = alert.ak.appliance.nas.project.replication.send.fail.misc
result = failure
ak_errmsg = stage 'stream_setup' failed: failed to invoke receive() XDR: bad status: 'norsrcs'
uuid = 58c6f9d7-8e50-e378-c0bc-f19353a68b00

Fri May 3 18:15:26 2013
nvlist version: 0
project = pabc/sCVSHome
target_host = sun7310
source = appliance/kit/akd:default
link = d6ae4e23-2306-427f-ebab-e61c54fa029c
class = alert.ak.appliance.nas.project.replication.send.fail.misc
result = failure
ak_errmsg = stage 'stream_setup' failed: failed to invoke receive() XDR: bad status: 'norsrcs'
uuid = a3b436ce-106b-6856-d492-c9a605d989af
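
To gauge how often the 'norsrcs' failure occurs for each project, log entries like the ones above can be tallied with a short script. This is a minimal sketch, assuming the entries have been saved to a text file in the same 'key = value' layout; the function name count_norsrcs and the file name alerts.txt are hypothetical.

```shell
# Tally replication failures whose ak_errmsg mentions 'norsrcs', per project.
# Reads nvlist-style "key = value" entries on stdin.
count_norsrcs() {
    awk -F ' = ' '
        { sub(/^[ \t]+/, "") }                      # tolerate indented entries
        $1 == "project" { project = $2 }            # remember the current project
        $1 == "ak_errmsg" && index($0, "norsrcs") { count[project]++ }
        END { for (p in count) print p, count[p] }
    ' | LC_ALL=C sort
}
```

Usage, with alerts.txt standing in for wherever the entries were saved: count_norsrcs < alerts.txt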



Cause

There is a hard limit on the number of threads available for replication processing.

The limit is 30 threads.

 

NAS# svccfg -s akd listprop | egrep 'repl.*max'
nas/repl_maxthreads_max astring 30
nas/repl_maxthreads_min astring 30
replication/maxthreads integer 30                                         <<<<<<<<
replication/contwait_max integer 1800000000000


NOTE: Replication uses port 216

NAS# netstat -an | grep 216 | wc -l
38


There are 38 connections attempting to replicate. The last 8 requests are reported as failed, but they were in fact being refused by the target until one of the 30 running threads completed and freed a slot. At most 30 threads can run at once.
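
Because 'grep 216' also matches any line that merely contains the digits 216 (for example an address such as 10.216.0.5, or port 2160), the count can be tightened. The sketch below is illustrative only: it assumes Solaris-style 'netstat -an' output in which the port is the final dot-separated component of each address and the state is the last column, and the function names are hypothetical. The result remains a rough indicator, since one connection may carry several replication actions.

```shell
# Count established TCP connections on the replication port and report
# how many exceed the 30-thread limit.
REPL_PORT=216
REPL_LIMIT=30

count_repl_connections() {
    # Reads "netstat -an" output on stdin. Assumed format: local address in
    # column 1, remote address in column 2, connection state in the last column.
    awk -v port="$REPL_PORT" '
        BEGIN { re = "[.:]" port "$" }              # port must end the address
        $NF == "ESTABLISHED" && ($1 ~ re || $2 ~ re) { c++ }
        END { print c + 0 }
    '
}

excess_over_limit() {
    # $1 = observed count; prints how many requests exceed the limit (0 if none)
    if [ "$1" -gt "$REPL_LIMIT" ]; then
        echo $(( $1 - REPL_LIMIT ))
    else
        echo 0
    fi
}
```

Usage: netstat -an | count_repl_connections, then pass the resulting number to excess_over_limit. For the 38 connections in the example, excess_over_limit 38 prints 8.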

 

Alternative troubleshooting technique:

At most 30 threads can run in parallel, so at most 30 replication actions can be 'sending' in parallel.

'netstat -an | grep 216' may show only one connection per (source, target) pair, even though several replication actions may be running over that one connection.

A better way to see whether the maximum number of threads has been reached is:

    # echo " ::walk nas_repl_action | ::nas_repl_action  ! grep SENDING | wc -l " | mdb -p `pgrep -ox akd`
    57

 

 

Solution

The problem is with the number of replications running at the same time. The maximum number of replications that can run simultaneously is 30.

The number of client connections is not an issue.

No faults on the target nodes are reported because the replication is running as programmed.

The threshold exists to bound CPU and disk utilization on both the target and the source (client), so that neither system crashes or 'seizes' in the middle of a replication of data.

Currently, there is no ability to set priority on projects or shares for replication.  In this case, it's best to replicate at the project level.

For replications that fail, it's best to check 'shares replication sources' on the target node to see how many actual replications are running.

 

Resolution: Schedule the timing and frequency of replication updates so that no more than 30 are 'active' at any one time.
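
One way to plan such a schedule is to split the replication actions into batches of at most 30 and give each batch its own start window. The following is a sketch of that arithmetic only, not an appliance feature: the function name stagger_batches, the input list, and the window length are assumptions, and the plan holds only if every batch finishes within its window. The resulting start times would still have to be applied to each action's schedule on the appliance.

```shell
# Assign replication actions to batches of at most 30 and space the batch
# start times by a fixed window.
# stdin: one action name per line (blank lines are skipped)
# $1:    window length in minutes (assumed value, chosen by the administrator)
stagger_batches() {
    awk -v limit=30 -v window="$1" '
        NF {
            batch = int(n / limit)                  # 0 for the first 30 actions
            printf "%s batch=%d start=+%dmin\n", $0, batch + 1, batch * window
            n++
        }
    '
}
```

Usage, with actions.txt as a hypothetical list of action names and a 60-minute window: stagger_batches 60 < actions.txt. The first 30 names are placed in batch 1 at offset 0, the next 30 in batch 2 at +60 minutes, and so on.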

 

 

 

***Checked for relevance on 25-MAY-2018***

References

<NOTE:1397959.1> - Sun Storage 7000 Unified Storage System: How to Troubleshoot Replication Issues

  Copyright © 2018 Oracle, Inc.  All rights reserved.