Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2079993.1
Update Date:2018-04-27

Solution Type: Problem Resolution (Sure Solution)

Solution 2079993.1: Oracle ZFS Storage Appliance: Best Practices when using the ZFS-SA for Databases


Related Items
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage ZS3-BA
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS4-4
  • Sun ZFS Storage 7120
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: ZS




In this Document
Symptoms
Changes
Cause
Solution
 Pool layout
 Datasets recordsize and logbias settings
 Possible mis-alignments
References


Applies to:

Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS4-4 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-BA - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-4 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

The ZFS appliance is suited to many kinds of workloads, such as video streaming, backups, virtual machines, or databases. This document describes the rules to follow to get the best possible performance when running databases on shares or volumes exported by the ZFS appliance.

Changes

Performance issues may be reported for databases using shares or volumes exported by the ZFS appliance. This can appear some weeks after initial deployment, simply because the load has increased.

Cause

To achieve good performance when running databases on shares or volumes exported by the ZFS appliance, some rules must be followed.

The dataset logbias and recordsize/volblocksize settings must be configured correctly; mis-alignment can cost extra I/O reads and writes, causing the ZFS-SA SAS-2 drives to become overloaded.

To absorb redo log activity, SSD log devices (logzillas) have to be added to the ZFS-SA pool.

For OLTP databases, SSD cache devices (readzillas) are of great benefit.

 

Solution

Pool layout

For optimal latencies, a mirrored pool layout should be used. This provides a high number of spindles, so I/O can be parallelized. The caveat is the pool size, which is half the raw disk capacity. If pool size is a problem, a possible option is raidz1 with narrow stripes (4 disks per vdev). The big advantage of the mirrored layout is that reads of different data can be serviced in parallel by each side of a mirror; this is not possible with raidz1. Unless you have a large number of disks, do not use raidz2.
To absorb redo log activity, at least one SSD log device must be used in a pool. For redundancy, use two SSD log devices per pool.
For OLTP databases, SSD cache devices are of great benefit. Since the 2013.1.2 release, data is evicted from DRAM (the ARC, or L1ARC) to the SSD cache devices (the L2ARC) only if ZFS detects random I/O with a block size <= 32K. Because OLTP datafiles are often set to an 8 KB block size and most of the I/Os are random, SSD cache devices help reduce the SAS-2 drive activity. Two SSD cache devices should be used per pool.

 

Datasets recordsize and logbias settings

The logbias and recordsize settings are very important, as they need to match the database workload and avoid extra I/Os.


For OLTP workloads, the general recommendation is as follows:

Oracle Files     Record Size   Sync Write Bias   Read Cache   Compression   Example Share Name   Storage Profile
Datafiles        32K           latency           all          LZ4           data                 Mirrored
Temp             128K          latency           none         LZ4           temp                 Mirrored
Archive Logs     1M            throughput        none         LZ4           archive              Mirrored
Undo             128K          throughput        none         LZ4           undo                 Mirrored
Index            32K           latency           all          LZ4           index                Mirrored
Redo Logs        128K          latency           none         off           reco                 Mirrored
Control Files    32K           latency           all          off           control              Mirrored

Note: The above table is for a generic database workload. Custom settings provided by the database vendor can override these values.
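As a sketch, the table's settings can be applied per share from the appliance CLI. The project name (default) and share name (data) below are examples taken from the table; the exact prompt text and property availability may vary between appliance software releases:

```
zfs-sa:> shares select default
zfs-sa:shares default> select data
zfs-sa:shares default/data> set recordsize=32K
zfs-sa:shares default/data> set logbias=latency
zfs-sa:shares default/data> set secondarycache=all
zfs-sa:shares default/data> set compression=lz4
zfs-sa:shares default/data> commit
```

Repeat for each share in the table, matching Record Size, Sync Write Bias (logbias), Read Cache (secondarycache), and Compression to the recommended values.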



The ZFS record size for the datafile shares should be set to match the average network I/O size as closely as possible.
This may be significantly larger than db_block_size.
For OLTP workloads the general recommendation is 32K (see Document 2087231.1: Guidelines When Using ZFS Storage in an Exadata Environment).

Since Oracle Database 12c, OISP (Oracle Intelligent Storage Protocol) can be used over dNFS to take care of these settings automatically.
The only remaining (but strong) recommendation is to have a separate share for the redo log activity.
Make sure the OISP configuration is done before creating the database, or else OISP will not work correctly.
See Document 1943618.1: Oracle ZFS Storage Appliance: How to Enable Oracle Intelligent Storage Protocol (OISP).

Possible mis-alignments

As introduced in Document 1213714.1: Sun ZFS Storage Appliance: Performance clues and considerations, mis-alignment can be a big penalty. Say the LUN volblocksize has been set to 8 KB, but the I/O is mis-aligned by 512 bytes (a very common case). An 8 KB write coming from the database then straddles two 8 KB blocks, so it triggers 2 I/O reads plus 2 I/O writes (a read-modify-write of both blocks). The cost is high and can lead to busy disks.
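The read-modify-write penalty above can be demonstrated with simple block arithmetic, using the example values from the text (8 KB volblocksize, 512-byte offset):

```shell
#!/bin/sh
# How many 8 KB blocks does an 8 KB write touch, depending on alignment?
recordsize=8192

blocks_touched() {
    offset=$1; size=$2
    first=$(( offset / recordsize ))
    last=$(( (offset + size - 1) / recordsize ))
    echo $(( last - first + 1 ))
}

aligned=$(blocks_touched 0 8192)       # write starts on a block boundary
misaligned=$(blocks_touched 512 8192)  # write shifted by 512 bytes

echo "aligned write touches:    ${aligned} block(s)"
echo "misaligned write touches: ${misaligned} block(s)"
```

The aligned write touches exactly one block, while the mis-aligned write straddles two, so each of the two partially-written blocks must first be read and then rewritten.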

The following link gives much more detail: https://blogs.oracle.com/dlutz/entry/partition_alignment_guidelines_for_unified. The following MOS documents can be of interest, especially when the host is running Solaris:

Document 1507737.1 : Tuning Solaris hosts using ZFS filesystem
Document 2036559.1 : How to avoid mis-alignment when using ASM on Solaris clients
Document 2157669.1 : How to avoid mis-alignment when using ZFS on LDOM clients

 

If you want to determine whether you suffer from a mis-alignment issue, please raise a Service Request: the support engineer needs to run some DTrace scripts from inside the Solaris shell of the ZFS appliance.

Use the volalign_ak8.d script available from the PAE web site : https://pae.us.oracle.com/twiki/bin/view/Public/ZFSPerformanceDiagnosis


Back to Document 1213714.1 : Sun ZFS Storage Appliance: Performance clues and considerations

References

<NOTE:1213714.1> - Sun ZFS Storage Appliance: Performance clues and considerations

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.