Asset ID: 1-71-2157669.1
Update Date: 2018-01-05
Keywords:
Solution Type: Technical Instruction
Solution 2157669.1: Oracle ZFS Storage Appliance: How to avoid misalignment when using ZFS on LDOM clients
Related Items
- Sun ZFS Storage 7320
- Oracle ZFS Storage Appliance Racked System ZS4-4
- Oracle ZFS Storage ZS3-BA
- Oracle ZFS Storage ZS5-4
- Oracle ZFS Storage ZS3-2
- Oracle ZFS Storage ZS3-4
- Oracle ZFS Storage ZS5-2
- Sun ZFS Storage 7420
- Oracle ZFS Storage ZS4-4
- Sun ZFS Storage 7120
Related Categories
- PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS
- Tools>Primary Use>Performance
In this Document
Applies to:
Oracle ZFS Storage ZS4-4 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)
Goal
This document provides the steps required to ensure there is no misalignment when LDOMs use ZFS on top of ZFSSA LUNs.
The procedure overview is as follows:
- ZFSSA: Create a LUN with a volblocksize of 8KB (default)
- Control domain: Ensure the LUN is recognized
- Control domain: Use format to configure a simple disk layout
- Control domain: Assign the c0txd0s2 partition to an LDOM
- LDOM: Log in to the LDOM and list the disks via format
- LDOM: Retrieve the vdc instance of the device(s) you want to use for a ZFS pool
- LDOM: Edit /platform/sun4v/kernel/drv/vdc.conf
- LDOM: Reboot the LDOM
- LDOM: Create the pool
- LDOM: Create a ZFS filesystem and confirm correct filesystem operation
Misalignment can be expensive in terms of extra I/Os performed on the ZFSSA side. For more details, see http://blogs.oracle.com/dlutz/entry/partition_alignment_guidelines_for_unified.
Since Solaris 11.1.11.4.0, it has become easier to avoid misalignment issues when LDOMs use ZFS on top of LUNs exported by a ZFSSA.
The enhancement comes from ER 15824910 - Add support to configure vdisk physical block size in vdc.conf.
Previously, the workaround was to create the zpool on the control domain, using the original ssd-config-list entry for the ZFSSA in ssd.conf, and to export a ZFS volume dataset to the guest domain.
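For reference, that older workaround relied on an ssd-config-list entry in the control domain's ssd.conf along the lines of the sketch below; the exact vendor/product inquiry string (vendor padded to 8 characters) depends on the appliance model and is an assumption here, not taken from this note:
# /kernel/drv/ssd.conf on the control domain (older workaround, superseded by vdc.conf)
ssd-config-list = "SUN     ZFS Storage 7330", "physical-block-size:8192";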
Solution
- ZFSSA: Create a LUN with a volblocksize of 8KB (default):
ZFSSA:shares (pool1) test/TEST_LUN> show
Properties:
checksum = fletcher4 (inherited)
compression = off (inherited)
dedup = false (inherited)
compressratio = 100
copies = 1 (inherited)
creation = Mon Jul 04 2016 14:04:12 GMT+0000 (UTC)
logbias = latency (inherited)
volblocksize = 8K
lunguid = 600144F087021A700000577A6CE20001
volsize = 50G
encryption = off
canonical_name = pool1/local/test/TEST_LUN
[..]
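For completeness, such a LUN can be created from the ZFSSA CLI roughly as follows; this is a sketch reusing the project (test) and LUN (TEST_LUN) names from the example above, and the exact prompts may differ between software releases:
ZFSSA:> shares select test
ZFSSA:shares (pool1) test> lun TEST_LUN
ZFSSA:shares (pool1) test/TEST_LUN (uncommitted)> set volsize=50G
ZFSSA:shares (pool1) test/TEST_LUN (uncommitted)> set volblocksize=8K
ZFSSA:shares (pool1) test/TEST_LUN (uncommitted)> commit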
- Control domain: Ensure the LUN is recognized:
root@T5-4 # echo | format
63. c0t600144F087021A700000577A6CE20001d0 <SUN-ZFS Storage 7330-1.0 cyl 1623 alt 2 hd 254 sec 254>
/scsi_vhci/ssd@g600144f087021a700000577a6ce20001
64. c0t600144F087021A700000555724AA003Cd0 <SUN-ZFS Storage 7330-1.0-300.00GB>
/scsi_vhci/ssd@g600144f087021a700000555724aa003c
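If a newly created LUN does not show up in the format output, rescanning the devices from the control domain usually helps; this is a sketch using standard Solaris commands, not a step taken from the original procedure:
root@T5-4 # devfsadm -Cv     << rebuild /dev device links and remove stale entries
root@T5-4 # echo | format    << list the disks again and note the new c0t...d0 entry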
- Control domain: Use format to configure a simple disk layout:
root@T5-4 # format
[..]
partition> p
Current partition table (unnamed):
Total disk cylinders available: 1623 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 unassigned wm 0 0 (0/0/0) 0
1 unassigned wu 0 0 (0/0/0) 0
2 backup wu 0 - 1622 49.93GB (1623/0/0) 104709468
3 unassigned wm 0 0 (0/0/0) 0
4 unassigned wm 0 0 (0/0/0) 0
5 unassigned wm 0 0 (0/0/0) 0
6 unassigned wm 0 0 (0/0/0) 0
7 unassigned wm 0 0 (0/0/0) 0
partition> label
Ready to label disk, continue? y
Nothing needs to be changed in the sd.conf or ssd.conf files on the control domain.
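Optionally, the label written by format can be double-checked from the control domain before exporting the device; a minimal sketch using the device name from this example:
root@T5-4 # prtvtoc /dev/rdsk/c0t600144F087021A700000577A6CE20001d0s2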
- Control domain: Assign the c0txd0s2 partition to an LDOM:
root@T5-4 # ldm add-vdsdev /dev/dsk/c0t600144F087021A700000577A6CE20001d0s2 TEST_LUN@primary-vds0
root@T5-4 # ldm list
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
primary active -n-cv- UART 64 64G 11% 10% 1d 17h 59m
LDOM1 active -n---- 5005 32 32G 0.0% 0.0% 1d 17h 59m
root@T5-4 # ldm add-vdisk TEST_LUN TEST_LUN@primary-vds0 LDOM1
root@T5-4 # ldm list-bindings LDOM1
NAME STATE FLAGS CONS VCPU MEMORY UTIL NORM UPTIME
LDOM1 active -n---- 5005 32 32G 0.3% 0.3% 1d 18h 1m
[..]
DISK
NAME VOLUME TOUT ID DEVICE SERVER MPGROUP
vdisk0 electra_auth_boot_disk@primary-vds0 0 disk@0 primary
TEST_LUN TEST_LUN@primary-vds0 1 disk@1 primary
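The disk assignment can also be confirmed with the disk-only view of the domain configuration; a small sketch (output omitted, since it repeats the DISK section shown above):
root@T5-4 # ldm list -o disk LDOM1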
- LDOM: Log in to the LDOM and list the disks via format:
root@T5-4 # telnet 0 5005
Trying 0.0.0.0...
Connected to 0.
Escape character is '^]'.
root@LDOM1 # echo | format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c2d0 <Unknown-Unknown-0001-100.00GB>
/virtual-devices@100/channel-devices@200/disk@0
1. c2d1 <SUN-ZFS Storage 7330-1.0 cyl 1623 alt 2 hd 254 sec 254>
/virtual-devices@100/channel-devices@200/disk@1
- LDOM: Retrieve the vdc instance of the device(s) you want to use for a ZFS pool:
root@LDOM1 # cat /etc/path_to_inst
#
# Caution! This file contains critical kernel state
#
"/fcoe" 0 "fcoe"
"/iscsi" 0 "iscsi"
"/pseudo" 0 "pseudo"
"/scsi_vhci" 0 "scsi_vhci"
"/options" 0 "options"
"/virtual-devices@100" 0 "vnex"
"/virtual-devices@100/channel-devices@200" 0 "cnex"
"/virtual-devices@100/channel-devices@200/disk@0" 0 "vdc"
"/virtual-devices@100/channel-devices@200/pciv-communication@0" 0 "vpci"
"/virtual-devices@100/channel-devices@200/network@0" 0 "vnet"
"/virtual-devices@100/channel-devices@200/network@1" 1 "vnet"
"/virtual-devices@100/channel-devices@200/network@2" 2 "vnet"
"/virtual-devices@100/channel-devices@200/network@3" 3 "vnet"
"/virtual-devices@100/channel-devices@200/disk@1" 1 "vdc" << We want this one
- LDOM: Edit /platform/sun4v/kernel/drv/vdc.conf, adding a block-size-list entry that maps the vdc instance to the LUN volblocksize (here instance 1, 8192 bytes):
block-size-list="1:8192";
Multiple instances can be included if needed. For example, suppose /etc/path_to_inst contains the following entries and we want to configure instances 1, 2 and 5:
"/virtual-devices@100/channel-devices@200/disk@1" 1 "vdc"
"/virtual-devices@100/channel-devices@200/disk@2" 2 "vdc"
"/virtual-devices@100/channel-devices@200/disk@3" 3 "vdc"
"/virtual-devices@100/channel-devices@200/disk@4" 4 "vdc"
"/virtual-devices@100/channel-devices@200/disk@5" 5 "vdc"
"/virtual-devices@100/channel-devices@200/disk@6" 6 "vdc"
The final vdc.conf file should then contain:
block-size-list="1:8192","2:8192","5:8192";
- LDOM: Reboot the LDOM.
- LDOM: Create the pool:
root@LDOM1 # zpool create mypool c2d1
Note that the zpool command makes slice 0 start at sector 256:
root@LDOM1 # prtvtoc /dev/rdsk/c2d1s2
* /dev/rdsk/c2d1s2 partition map
*
* Dimensions:
* 512 bytes/sector
* 104857600 sectors
* 104857533 accessible sectors
*
* Flags:
* 1: unmountable
* 10: read-only
*
* Unallocated space:
* First Sector Last
* Sector Count Sector
* 34 222 255
*
* First Sector Last
* Partition Tag Flags Sector Count Sector Mount Directory
0 4 00 256 104840927 104841182
8 11 00 104841183 16384 104857566
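To confirm that the pool picked up the 8 KB block size, the pool's ashift can be checked; a sketch, assuming the zdb cached-configuration output of this Solaris release (an ashift of 13 corresponds to 2^13 = 8192 bytes):
root@LDOM1 # zdb -C mypool | grep ashift
                ashift: 13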
- LDOM: Create a ZFS filesystem and confirm correct filesystem operation:
root@LDOM1 # zfs create mypool/test1
root@LDOM1 # cd /mypool/test1
root@LDOM1 # df -h .
Filesystem Size Used Available Capacity Mounted on
mypool/test1 49G 288K 49G 1% /mypool/test1
root@LDOM1 # dd if=/dev/zero of=bigfile bs=8k count=100000
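While the dd is running, I/O to the pool can also be watched from inside the LDOM; a small sketch (the ZFSSA-side check described next remains the authoritative way to confirm alignment):
root@LDOM1 # zpool iostat -v mypool 5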
An extra step would be to use "volalign_ak8" on the ZFSSA side to confirm there is no misalignment anymore (see https://pae.us.oracle.com/twiki/bin/view/Public/ZFSPerformanceDiagnosis). The test/TEST_LUN LUN should no longer be reported as misaligned.
Note:
Mixing an ssd-config-list entry in the ssd.conf file (control domain) with a vdc.conf entry (LDOM) that does not match the ZFSSA LUN volblocksize may lead to problems creating pools on the LDOM side.
If the ssd.conf of the control domain specifies a block size (for example 32k) larger than the volblocksize of the ZFSSA LUN (for example 8k), and the LDOM does not have the corresponding line in vdc.conf for that vdisk, we end up with corruption: we have seen the uberblock array get corrupted, because writes are done in larger blocks that each contain only one valid uberblock. In this example, only 4 valid uberblocks remain (128k/32k).
When using vdc.conf, avoid using the ssd-config-list parameter in the ssd.conf file on the control domain.
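A quick way to spot such a mixed configuration is to check both files; a sketch using the standard file locations:
root@T5-4 # grep ssd-config-list /kernel/drv/ssd.conf                   << control domain
root@LDOM1 # grep block-size-list /platform/sun4v/kernel/drv/vdc.conf   << LDOM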
References
<NOTE:2036559.1> - Oracle ZFS Storage Appliance: How to avoid misalignment when using ASM on Solaris clients
<NOTE:1507737.1> - Sun Storage 7000 Unified Storage System: Tuning Solaris hosts using ZFS filesystem (exported from 7000 Series NAS)
Attachments
This solution has no attachment