Asset ID: 1-72-1507737.1
Update Date: 2018-01-05
Keywords:
Solution Type: Problem Resolution Sure Solution
1507737.1: Sun Storage 7000 Unified Storage System: Tuning Solaris hosts using ZFS filesystem (exported from 7000 Series NAS)
Related Items
- Sun ZFS Storage 7420
- Sun Storage 7410 Unified Storage System
- Sun ZFS Storage 7120
- Sun Storage 7310 Unified Storage System
- Sun ZFS Storage 7320
Related Categories
- PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS
Created from <SR 3-6337737318>
Applies to:
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
7000 Appliance OS (Fishworks)
Symptoms
A Solaris host uses a ZFS filesystem built on a single LUN exported by the ZFS appliance. The LUN has a volblocksize of 8KB (defined on the ZFS appliance), and the 'data' filesystem on the host has a recordsize of 128KB.
host # zpool status
pool: data
state: ONLINE
scan: none requested
config:
NAME STATE READ WRITE CKSUM
data ONLINE 0 0 0
c5t21000024FF2D607Fd2 ONLINE 0 0 0 <<< LUN exported by a 7120 (zfs appliance)
Performance is not as expected: the host cannot sustain more than 50 MB/s of write throughput.
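For reference, the client-side recordsize can be confirmed directly with 'zfs get' (an illustrative check; 'data' is the pool/filesystem name from the output above, while the 8KB volblocksize has to be read on the appliance itself):
host # zfs get recordsize data
NAME  PROPERTY    VALUE    SOURCE
data  recordsize  128K     default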
Cause
- Partition alignment is critical for good performance and has to be checked on the client side. Solaris hosts using ZFS have to force alignment through modifications to /kernel/drv/sd.conf and/or ssd.conf.
- The maximum number of outstanding I/Os per LUN is limited. This limit can be changed dynamically with kernel tuning.
Solution
Here are some recommendations to obtain good performance. A filebench result is provided as well.
- Use only two pools on the 7120, so that each pool can use two logzillas (write-optimized SSDs) in a striped layout. Each pool can then sustain up to 200 MB/s of write throughput.
- On the client side, always check for correct alignment.
- For LUNs with an EFI label, the partition layout should look like this:
*                            First        Sector        Last
* Partition  Tag  Flags      Sector       Count         Sector       Mount Directory
        0     4    00           256       7549730527    7549730782   <<< first sector is a multiple of 16; 256 works for any LUN blocksize defined on the appliance
        8    11    00    7549730783            16384    7549747166
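This layout can be displayed on the client with prtvtoc (an illustrative command; the device path is the example LUN from the Symptoms section and the slice suffix may differ on other hosts). The key point is that the first sector of the data slice is a multiple of 16:
host # prtvtoc /dev/rdsk/c5t21000024FF2D607Fd2s0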
- The /kernel/drv/sd.conf and ssd.conf files should contain a block-size value matching the ZFS appliance LUN volblocksize (here 8KB). This workaround is available starting with S10u11 (142910-17: SunOS 5.10_x86, 142909-17: SunOS 5.10 sparc) and S11. A reboot is needed for it to take effect.
This change allows sd/ssd to override the disk's advertised physical block size for a given VID/PID tuple. The overridden value can then be used by other consumers, such as ZFS, to determine the appropriate ashift value.
sd.conf  : sd-config-list="SUN     ZFS Storage 7120", "physical-block-size:8192";
                           Vendor  Product             <<< the Vendor field is 8 characters wide ("SUN" padded with spaces); values captured with 'format -e' + inq
                           01234567
ssd.conf : ssd-config-list="SUN     ZFS Storage 7120", "physical-block-size:8192";
                            Vendor  Product            <<< the Vendor field is 8 characters wide ("SUN" padded with spaces); values captured with 'format -e' + inq
                            01234567
If entries for different devices are needed, separate them with commas as follows:
sd.conf  : sd-config-list="SUN     ZFS Storage 7120", "physical-block-size:8192", "SUN     ZFS Storage 7320", "physical-block-size:4096";
ssd.conf : ssd-config-list="SUN     ZFS Storage 7120", "physical-block-size:8192", "SUN     ZFS Storage 7320", "physical-block-size:4096";
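As a minimal sketch of how such entries might be laid out inside the configuration file itself (driver.conf syntax lets one property span several lines; the whole list is a single property terminated by a semicolon):
# /kernel/drv/sd.conf (excerpt, illustrative)
sd-config-list =
    "SUN     ZFS Storage 7120", "physical-block-size:8192",
    "SUN     ZFS Storage 7320", "physical-block-size:4096";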
- From the client side, the VID/PID tuple can be retrieved with the 'format -e' command:
# format -e
[select the LUN]
format> inq
Vendor: SUN
Product: ZFS Storage 7120
Revision: 1.0
Important notes :
- the client must be rebooted to make the sd/ssd driver aware of the change
- the pool must be recreated for this tuning to take effect (see the verification sketch after these notes)
- several LUNs from the ZFSSA may be needed in order to create filesystems with different recordsizes on top of them. This is typical for databases, where redo logs may be configured with a recordsize of 1MB and datafiles with 8KB. In such a case, it is recommended to use a volblocksize of 8KB for the ZFSSA LUNs and to set the needed recordsize on the client-side ZFS filesystems.
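As a quick sanity check (an illustrative sketch, not part of the original procedure), the ashift chosen by ZFS for the recreated pool can be inspected with zdb; an 8KB physical block size should result in ashift 13:
host # zdb -C data | grep ashift
                ashift: 13        <<< 2^13 = 8192 bytes, matching the 8KB LUN volblocksize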
For more details, see
- Bug 15623912 : For 4k sector support, ZFS needs to use DKIOCGMEDIAINFOEXT
- Bug 15619680 : comstar exported lun geometry should have cylinders aligned on exported block device (still not fixed as of 30-Sep-2015)
- PSARC-2008-769 : http://psarc.us.oracle.com/arc/PSARC/2008/769/final_spec.txt
- As per http://www.c0t0d0s0.org/archives/7370-A-little-change-of-queues.html, we are limited to 10 outstanding I/Os in the pending queue of the single LUN used for the 'data' pool.
We could add a LUN to get a maximum of 20 outstanding I/Os, but we can also raise the per-vdev limit in ZFS itself. Note that increasing the maximum number of outstanding I/Os to 35 is good for a throughput workload but may not help if latency is the concern.
host # echo zfs_vdev_max_pending/W0t35 | mdb -kw                   <<< applies the new value to the running kernel
host # echo "set zfs:zfs_vdev_max_pending=35" >> /etc/system       <<< makes the change persistent across reboots
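To see whether the deeper queue is actually being used, the average number of active (outstanding) commands on the LUN can be watched in the actv column of iostat (an illustrative check; the device name is the example LUN from the Symptoms section):
host # iostat -xn 5 | egrep 'device|c5t21000024FF2D607Fd2'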
- With these changes in place, we are able to sustain good throughput with excellent latencies. Instead of running 'dd' in parallel, use a benchmark tool such as filebench.
See http://sourceforge.net/apps/mediawiki/filebench/index.php?title=Filebench.
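A minimal filebench session along these lines could reproduce the run below (the mount point and the $iosize tunable are assumptions; check the fileserver workload file for the exact variable names and defaults):
host # filebench
filebench> load fileserver
filebench> set $dir=/data
filebench> set $iosize=8k
filebench> run 60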
Results for a 'fileserver' workload with an 8KB blocksize, run against the 'data' filesystem: 286 MB/s of combined read and write throughput, with 14.8ms latency.
statfile1 1051ops/s 0.0mb/s 0.5ms/op 385us/op-cpu
deletefile1 1051ops/s 0.0mb/s 4.1ms/op 880us/op-cpu
closefile3 1051ops/s 0.0mb/s 0.0ms/op 12us/op-cpu
readfile1 1051ops/s 142.1mb/s 1.3ms/op 819us/op-cpu
openfile2 1051ops/s 0.0mb/s 0.5ms/op 410us/op-cpu
closefile2 1051ops/s 0.0mb/s 0.0ms/op 12us/op-cpu
appendfilerand1 1051ops/s 8.3mb/s 2.4ms/op 446us/op-cpu
openfile1 1052ops/s 0.0mb/s 0.6ms/op 420us/op-cpu
closefile1 1052ops/s 0.0mb/s 0.0ms/op 13us/op-cpu
wrtfile1 1052ops/s 136.1mb/s 30.5ms/op 3910us/op-cpu
createfile1 1052ops/s 0.0mb/s 4.6ms/op 1493us/op-cpu
2492: 109.065: IO Summary: 694029 ops, 11566.6 ops/s, (1051/2103 r/w) 286.5mb/s, 3379us cpu/op, 14.8ms latency
References
<NOTE:1392492.1> - Oracle ZFS Storage Appliance: Performance Issue when Pool is almost Full
<NOTE:1213714.1> - Sun ZFS Storage Appliance: Performance clues and considerations
Attachments
This solution has no attachment