
Asset ID: 1-72-1538731.1
Update Date: 2017-10-05

Solution Type: Problem Resolution Sure

Solution 1538731.1: Sun Storage 7000 Unified Storage System: Replication fails with "space quota exceeded"


Related Items
  • Sun ZFS Storage 7320
  • Sun Storage 7210 Unified Storage System
  • Oracle ZFS Storage ZS3-BA
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Oracle ZFS Storage ZS3-2
  • Sun Storage 7410 Unified Storage System
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7420
  • Sun Storage 7310 Unified Storage System
  • Oracle ZFS Storage ZS4-4
  • Sun Storage 7110 Unified Storage System
  • Sun ZFS Storage 7120
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS

In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-6781443331>

Applies to:

Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Sun Storage 7110 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
Sun Storage 7410 Unified Storage System - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

From the replication SOURCE system:

Fri Feb 8 04:54:21 2013
nvlist version: 0
time = 0x511484fd
hrtime = 0xa69b652282c50
action = (embedded nvlist)
nvlist version: 0
target_label = u1eis04nas17-bkp
target_uuid = 3ca5516f-0ac0-e587-b8ea-dfdad7c90231
uuid = 1a52ef6e-064b-eb62-d0b0-d2f9e5e1d68a
state = sending
dataset = exalogic/local/NODE_8/general
(end action)

event = update done
result = failure
errmsg = stage 'stream_send' failed: zfs_send: cannot send 'exalogic/local/NODE_8': Broken pipe
remote_status = ok


From the replication TARGET system:

Fri Feb 8 04:50:50 2013
nvlist version: 0
time = 0x5114842a
hrtime = 0x117cd408d57f89
pkg = (embedded nvlist)
nvlist version: 0
source_asn = bd8f7331-c1bb-c49e-8186-e23ff5c3f597
source_name = aueis12nasx03
uuid = 1a52ef6e-064b-eb62-d0b0-d2f9e5e1d68a
state = receiving
(end pkg)

event = recv_done
result = failed
error = zfs_receive failed: cannot receive new filesystem stream: destination pool17a/nas-rr-1a52ef6e-064b-eb62-d0b0-d2f9e5e1d68a/NODE_8 space quota exceeded


From the replication TARGET system ALERTS:

Fri Feb 8 04:38:51 2013
nvlist version: 0
class = alert.ak.appliance.nas.project.replication.receive.start
source = svc:/appliance/kit/akd:default
project = NODE_8
source_host = aueis12nasx03
uuid = 788e8679-0936-6d60-8472-9cc60ad6b438
link =

Fri Feb 8 04:50:50 2013
nvlist version: 0
class = alert.ak.appliance.nas.project.replication.receive.fail.misc
source = svc:/appliance/kit/akd:default
link = 788e8679-0936-6d60-8472-9cc60ad6b438
project = NODE_8
source_host = aueis12nasx03
ak_errmsg = zfs_receive failed: cannot receive new filesystem stream: destination pool17a/nas-rr-1a52ef6e-064b-eb62-d0b0-d2f9e5e1d68a/NODE_8 space quota exceeded
uuid = 501adb71-c35f-c420-e929-b4899247620b

 

Cause

The "space quota exceeded" error message for the 'NODE_8' replications is related to the 'quota_snap' property setting.

Data quotas

A data quota enforces a limit on the amount of space a filesystem or project can use. By default, it will include the data in the filesystem and all snapshots. Clients attempting to write new data will get an error when the filesystem is full, either because of a quota or because the storage pool is out of space. As described in the snapshot section, this behavior may not be intuitive in all situations, particularly when snapshots are present. Removing a file may cause the filesystem to write new data if the data blocks are referenced by a snapshot, so it may be the case that the only way to decrease space usage is to destroy existing snapshots.

If the 'include snapshots' property is unset, then the quota applies only to the immediate data referenced by the filesystem, not any snapshots. The space used by snapshots is enforced by the project-level quota but is otherwise not enforced. In this situation, removing a file referenced by a snapshot will cause the filesystem's referenced data to decrease, even though the system as a whole is using more space. If the storage pool is full (as opposed to the filesystem reaching a preset quota), then the only way to free up space may be to destroy snapshots.
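
At the underlying ZFS level this distinction is usually described with the quota and refquota properties: quota counts the data plus snapshots and descendants, while refquota counts only the data directly referenced by the filesystem. A minimal sketch of the two forms, shown with standard zfs commands for illustration only (on the appliance these limits are normally set through the BUI or the appliance CLI, as shown further below); the dataset name and the 10G size are taken from and modeled on the example later in this document:

Limit total space including snapshots (comparable to a quota with 'include snapshots' set):
# zfs set quota=10G pool-570/local/FMW_dr_clusters/IDMMS1

Limit only the data directly referenced by the filesystem (comparable to 'include snapshots' unset, i.e. quota_snap=false):
# zfs set refquota=10G pool-570/local/FMW_dr_clusters/IDMMS1

Show how much space each form would count against the limit:
# zfs get used,referenced,usedbysnapshots pool-570/local/FMW_dr_clusters/IDMMS1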

Data quotas are strictly enforced, which means that as space usage nears the limit, the amount of data that can be written must be throttled as the precise amount of data to be written is not known until after writes have been acknowledged. This can affect performance when operating at or near the quota. Because of this, it is generally advisable to remain below the quota during normal operating procedures.

Quotas are managed through the BUI under Shares -> General -> Space Usage -> Data.

They are managed in the CLI as the quota and quota_snap properties.

To set the 'quota_snap' property for a share via the CLI (example):

clownfish:shares default/foo > get
aclinherit = restricted (inherited)
atime = true (inherited)
checksum = fletcher4 (inherited)
compression = off (inherited)
copies = 1 (inherited)
mountpoint = /export/foo (inherited)
quota = 0 (inherited)
readonly = false (inherited)
recordsize = 128K (inherited)
reservation = 0 (inherited)
secondarycache = all (inherited)
nbmand = false (inherited)
sharesmb = off (inherited)
sharenfs = on (inherited)
snapdir = hidden (inherited)
vscan = false (inherited)
sharedav = off (inherited)
shareftp = off (inherited)
root_group = other (default)
root_permissions = 700 (default)
root_user = nobody (default)
casesensitivity = (default)
normalization = (default)
utf8only = (default)
quota_snap = (default)
reservation_snap = (default)
custom:int = (default)
custom:string = (default)
custom:email = (default)
clownfish:shares default/foo > set quota_snap=false
quota_snap = false(uncommitted)


clownfish:shares default/foo > commit
clownfish:shares default>

Solution

Recommendation: Set the 'quota_snap' property on the project/share to FALSE.
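
For a project, a similar sequence applies in the appliance CLI. The sketch below uses a hypothetical project name ('myproject') and assumes the project exposes the same quota_snap property as the share example above; shares that inherit the property from the project will typically pick up the new value, while shares with a local override need the same change applied individually:

clownfish:> shares select myproject
clownfish:shares myproject> set quota_snap=false
                    quota_snap = false (uncommitted)
clownfish:shares myproject> commit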

The "space quota exceeded" can happen when there is no available space on shares as well (on source node):

# zfs list
NAME                                                                                   USED  AVAIL  REFER  MOUNTPOINT
pool-570/local/FMW                                                                      31K  55.4T    31K  /export
pool-570/local/FMW_dr_clusters                                                        1.18T  22.9T    31K  /export
pool-570/local/FMW_dr_clusters/IDMDOMAIN                                              10.3M  4.99G  9.96M  /export/IDMDOMAIN
pool-570/local/FMW_dr_clusters/IDMMS1                                                 20.0G      0  20.0G  /export/IDMMS1   <<< full
pool-570/local/FMW_dr_clusters/IDMMS2                                                 31.5K  5.00G  31.5K  /export/IDMMS2
pool-570/local/FMW_dr_clusters/IDM_BIN                                                4.39G  5.61G  4.38G  /export/IDM_BIN
pool-570/local/FMW_dr_clusters/IDM_DOMAIN                                             31.5K  5.00G  31.5K  /export/IDM_DOMAIN
pool-570/local/FMW_dr_clusters/IDM_P1                                                 30.0G      0  30.0G  /export/IDM_P1   <<< full
pool-570/local/FMW_dr_clusters/OCR2                                                   31.5K  10.0G  31.5K  /export/OCR2
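
To spot shares that have run out of space quickly, the same listing can be sorted by available space and restricted to the affected project, for example (standard zfs list options, shown here only as an illustration):

# zfs list -o name,avail,used,quota,refquota -s avail -r pool-570/local/FMW_dr_clusters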

The resolution was to increase the quota and reservation for the /export/IDMMS1 and /export/IDM_P1 shares (a CLI sketch follows the listing below):

# zfs list -o space | egrep "IDMMS1|IDM_P1"
NAME                                                              AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
pool-570/local/FMW_dr_clusters                                    22.8T   1.20T        0     31K              0      1.20T
pool-570/local/FMW_dr_clusters/IDMMS1                             10.0G   20.0G      64K   20.0G              0          0
pool-570/local/FMW_dr_clusters/IDM_P1                             15.0G   30.0G     340K   30.0G              0          0
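
A sketch of that change in the appliance CLI, using the share names from the listing above and illustrative sizes (the appropriate values depend on the expected data growth); the same change is then repeated for IDM_P1:

slcnas570:> shares select FMW_dr_clusters
slcnas570:shares FMW_dr_clusters> select IDMMS1
slcnas570:shares FMW_dr_clusters/IDMMS1> set quota=40G
                         quota = 40G (uncommitted)
slcnas570:shares FMW_dr_clusters/IDMMS1> set reservation=40G
                   reservation = 40G (uncommitted)
slcnas570:shares FMW_dr_clusters/IDMMS1> commit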

Finally, the replication succeeded:

slcnas570:shares FMW_dr_clusters replication> select action-000
cli:shares FMW_dr_clusters action-000> get
                           id = caf1f326-1e38-cebb-f3d5-bd69d1531419
                       target = slcnas505
                      enabled = true
                   continuous = false
                include_snaps = false
                max_bandwidth = unlimited
                   bytes_sent = 0
               estimated_size = 0
          estimated_time_left = 00:00:00
           average_throughput = 0B/s
                      use_ssl = true
                        state = idle
            state_description = Idle (no update pending)
                  next_update = Tue May 13 2014 09:00:00 GMT+0000 (UTC)
                    last_sync = Tue May 13 2014 08:47:57 GMT+0000 (UTC)
                     last_try = Tue May 13 2014 08:47:57 GMT+0000 (UTC)
                  last_result = success
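
Once the quota or quota_snap settings have been corrected, the next update can be triggered manually from the source instead of waiting for the scheduled run; a sketch using the same replication action context as above (sendupdate starts a manual update):

slcnas570:shares FMW_dr_clusters replication> select action-000
slcnas570:shares FMW_dr_clusters action-000> sendupdate
slcnas570:shares FMW_dr_clusters action-000> get last_result

When the update completes, last_result should again report success, as in the output above.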

It is worth mentioning a few quota-related bugs that have been fixed in release 2013.1.1.9:

   15505861     SUNBT6744280 write performance degrades when ZFS filesystem is near quota (18524586)
   15758542     SUNBT7117263 Replications can fail with 'space quota exceeded' after compression (18531721)
   16849863     clone of a share fails if the share run out of space due to quota (18531713)
   17192457     replication reverse fails due to quota validation (18531709)
   17529610     allow zfs_inherit to reset quota even when it is less than used space (18524588)
   17563136     Exceeding quota in replica package makes it completely useless (18531723)
   18110996     clone of a share fails if the share run out of space due to quota (18531731)
   18245698     clone into project that exceeded the space quota produce inconsistent results (18531733)
 

References

<NOTE:1503867.1> - Configure and Mount NFS shares from SUN ZFS Storage 7320 for SPARC SuperCluster
<BUG:15505861> - SUNBT6744280 WRITE PERFORMANCE DEGRADES WHEN ZFS FILESYSTEM IS NEAR QUOTA
<BUG:16849863> - CLONE OF A SHARE FAILS IF THE SHARE RUN OUT OF SPACE DUE TO QUOTA
<BUG:17192457> - REPLICATION REVERSE FAILS DUE TO QUOTA VALIDATION
<BUG:17529610> - ALLOW ZFS_INHERIT TO RESET QUOTA EVEN WHEN IT IS LESS THAN USED SPACE
<BUG:17563136> - EXCEEDING QUOTA IN REPLICA PACKAGE MAKES IT COMPLETELY USELESS
<BUG:18110996> - CLONE OF A SHARE FAILS IF THE SHARE RUN OUT OF SPACE DUE TO QUOTA
<BUG:18245698> - CLONE INTO PROJECT THAT EXCEEDED THE SPACE QUOTA PRODUCE INCONSISTENT RESULTS

Attachments
This solution has no attachment