
Asset ID: 1-72-1507737.1
Update Date: 2018-01-05
Keywords:

Solution Type: Problem Resolution Sure

Solution  1507737.1 :   Sun Storage 7000 Unified Storage System: Tuning Solaris hosts using ZFS filesystem (exported from 7000 Series NAS)  


Related Items
  • Sun ZFS Storage 7420
  • Sun Storage 7410 Unified Storage System
  • Sun ZFS Storage 7120
  • Sun Storage 7310 Unified Storage System
  • Sun ZFS Storage 7320
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-6337737318>

Applies to:

Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
Sun Storage 7410 Unified Storage System - Version All Versions and later
Sun Storage 7310 Unified Storage System - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

A Solaris host uses a ZFS filesystem. Only one LUN from the ZFS appliance is used. This LUN has a volblocksize of 8KB (defined on the ZFS appliance). The 'data' filesystem has a recordsize of 128KB.

 host # zpool status
   pool: data
  state: ONLINE
   scan: none requested
 config:

       NAME                     STATE     READ WRITE CKSUM
       data                     ONLINE       0     0     0
         c5t21000024FF2D607Fd2  ONLINE       0     0     0 <<< LUN exported by a 7120 (zfs appliance)

Performance is not as expected: the host cannot sustain more than 50 MB/s of write throughput.

Cause

  • Partition alignment is critical for good performance and must be checked on the client side. Solaris hosts using ZFS have to force alignment through modifications to /kernel/drv/sd.conf and/or ssd.conf (a quick alignment check is sketched after this list).
  • The maximum number of outstanding I/Os per LUN is limited. This limit can be changed dynamically with kernel tuning.
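
As a quick alignment check from the client (a sketch; the device path below is the LUN shown in the zpool status output further down, so adapt it to your configuration), prtvtoc prints the partition table so the first sector of the data slice can be verified:

 host # prtvtoc /dev/rdsk/c5t21000024FF2D607Fd2s0   <<< the 'First Sector' of slice 0 should be a multiple of 16 (e.g. 256)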

Solution

Here are some recommendations to get good performance. A filebench result is provided as well.

  • Use only 2 pools on the 7120, so that each pool can use 2 logzillas (write-optimized SSDs) in a striped layout. Each pool can then sustain up to 200MB/s of write throughput.
  • On the client side, always check for good alignment.
    • For LUNs with an EFI label, the partition table should look like this:
  * First                          Sector     Last
  * Partition Tag Flags     Sector Count      Sector     Mount Directory
    0           4    00        256 7549730527 7549730782 << first sector is a multiple of 16, 256 works for any LUN blocksize defined on the appliance
    8          11    00 7549730783 16384      7549747166 
    • /kernel/drv/sd.conf and ssd.conf should contain a blocksize value matching the ZFS appliance LUN blocksize (here 8KB). This workaround can be implemented from S10u11 (142910-17: SunOS 5.10_x86, 142909-17: SunOS 5.10 SPARC) and S11 onwards. A reboot is needed for it to take effect.
      This change allows sd/ssd to override the disk's advertised physical block size for a given VID/PID tuple. The overridden value can then be used by other consumers, such as ZFS, to determine the appropriate ashift value.
  sd.conf  :  sd-config-list="SUN     ZFS Storage 7120", "physical-block-size:8192";
                              Vendor  Product          <<< the Vendor field must be padded to 8 characters; values taken from 'format -e' + 'inq'
                              01234567
  ssd.conf : ssd-config-list="SUN     ZFS Storage 7120", "physical-block-size:8192";
                              Vendor  Product          <<< the Vendor field must be padded to 8 characters; values taken from 'format -e' + 'inq'
                              01234567
If different devices have to be used, list several entries separated by commas:

  sd.conf  :  sd-config-list="SUN     ZFS Storage 7120", "physical-block-size:8192", "SUN     ZFS Storage 7320", "physical-block-size:4096";
  ssd.conf : ssd-config-list="SUN     ZFS Storage 7120", "physical-block-size:8192", "SUN     ZFS Storage 7320", "physical-block-size:4096";
    • From the client side, the VID/PID tuple can be retrieved with the 'format -e' command:
 # format -e
 [select the LUN]
 format> inq
   Vendor: SUN
   Product: ZFS Storage 7120
   Revision: 1.0
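
Alternatively (a non-interactive sketch), 'iostat -En' reports the same Vendor/Product/Revision strings for every disk in one pass, which is convenient when several LUN types are attached:

 host # iostat -En   <<< lists Vendor, Product and Revision for every disk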


Important notes :
- The client must be rebooted to make the sd/ssd driver aware of the change.
- The pool must be recreated for this tuning to take effect.
- One may need to use several LUNs from the ZFSSA and create filesystems on top of them with different
  recordsize values. This is typical for databases, where redo logs may be configured with a recordsize of 1MB
  and datafiles with 8KB. In such a condition, it is recommended to use a volblocksize of 8KB for the ZFSSA LUNs
  and to set the needed recordsize on the ZFS filesystems on the client side.
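
Once the pool has been recreated, the resulting sector shift can be confirmed from the pool configuration (a sketch, assuming the pool is still named 'data'; an 8KB physical block size corresponds to ashift=13):

 host # zdb -C data | grep ashift
                ashift: 13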

 

For more details, see
- Bug 15623912 : For 4k sector support, ZFS needs to use DKIOCGMEDIAINFOEXT
- Bug 15619680 : COMSTAR exported LUN geometry should have cylinders aligned on exported block device (still not fixed as of 30-Sep-2015).
- PSARC-2008-769 : http://psarc.us.oracle.com/arc/PSARC/2008/769/final_spec.txt

  • As per http://www.c0t0d0s0.org/archives/7370-A-little-change-of-queues.html, we are limited to 10 outstanding I/Os in the pending queue for the single LUN backing the 'data' pool.
    We could add a LUN to get a maximum of 20 outstanding I/Os, but we can also raise the per-vdev limit in ZFS. Note that increasing the number of possible outstanding I/Os to 35 is good for a throughput workload but may not help if latency is a concern.
 host # echo zfs_vdev_max_pending/W0t35 | mdb -kw
 host # echo "set zfs:zfs_vdev_max_pending=35" >> /etc/system
  • With these changes in place, we are able to sustain good throughput with excellent latencies. Instead of running 'dd' in parallel, use a benchmark tool such as filebench (see the invocation sketch below).
    See http://sourceforge.net/apps/mediawiki/filebench/index.php?title=Filebench.
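
A minimal filebench run looks like the following (a sketch; /data is assumed to be the mountpoint of the test filesystem, and variable names such as $dir and $iosize are those used by the stock 'fileserver' personality and may vary between filebench versions):

 host # filebench
 filebench> load fileserver
 filebench> set $dir=/data
 filebench> set $iosize=8k
 filebench> run 60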

    Results for a 'fileserver' workload with an 8KB blocksize run against the 'data' filesystem : 286MB/s combined read and write throughput, with 14.8ms latency

       statfile1                1051ops/s   0.0mb/s      0.5ms/op      385us/op-cpu
       deletefile1              1051ops/s   0.0mb/s      4.1ms/op      880us/op-cpu
       closefile3               1051ops/s   0.0mb/s      0.0ms/op       12us/op-cpu
       readfile1                1051ops/s 142.1mb/s      1.3ms/op      819us/op-cpu
       openfile2                1051ops/s   0.0mb/s      0.5ms/op      410us/op-cpu
       closefile2               1051ops/s   0.0mb/s      0.0ms/op       12us/op-cpu
       appendfilerand1          1051ops/s   8.3mb/s      2.4ms/op      446us/op-cpu
       openfile1                1052ops/s   0.0mb/s      0.6ms/op      420us/op-cpu
       closefile1               1052ops/s   0.0mb/s      0.0ms/op       13us/op-cpu
       wrtfile1                 1052ops/s 136.1mb/s     30.5ms/op     3910us/op-cpu
       createfile1              1052ops/s   0.0mb/s      4.6ms/op     1493us/op-cpu 

       2492: 109.065:
       IO Summary:      694029 ops, 11566.6 ops/s, (1051/2103 r/w) 286.5mb/s,   3379us cpu/op,  14.8ms latency

References

<NOTE:1392492.1> - Oracle ZFS Storage Appliance: Performance Issue when Pool is almost Full
<NOTE:1213714.1> - Sun ZFS Storage Appliance: Performance clues and considerations

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.