Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2157669.1
Update Date: 2018-01-05
Keywords:

Solution Type: Technical Instruction

Solution 2157669.1: Oracle ZFS Storage Appliance: How to avoid mis-alignment when using ZFS on LDOM clients


Related Items
  • Sun ZFS Storage 7320
  • Oracle ZFS Storage Appliance Racked System ZS4-4
  • Oracle ZFS Storage ZS3-BA
  • Oracle ZFS Storage ZS5-4
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS3-4
  • Oracle ZFS Storage ZS5-2
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS4-4
  • Sun ZFS Storage 7120
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS
  • Tools>Primary Use>Performance




In this Document
Goal
Solution
References


Applies to:

Oracle ZFS Storage ZS4-4 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Goal

This document provides the steps required to ensure there is no mis-alignment when LDOMs use ZFS on top of ZFSSA LUNs.

The procedure overview is as follows:

  1.  ZFSSA: Create a LUN with a volblocksize of 8KB (default)
  2.  Control domain: Make the LUN visible
  3.  Control domain: Use format to configure a simple disk layout
  4.  Control domain: Assign the c0txd0s2 partition to an LDOM
  5.  LDOM: Log in to the LDOM and list the disks via format
  6.  LDOM: Retrieve the vdc instance of the device(s) you want to use for a ZFS pool
  7.  LDOM: Edit /platform/sun4v/kernel/drv/vdc.conf
  8.  LDOM: Reboot the LDOM
  9.  LDOM: Create the pool
  10.  LDOM: Create a ZFS filesystem and confirm correct filesystem operation

 

Misalignment can be expensive in terms of extra I/Os done on the ZFSSA side. For more details, see http://blogs.oracle.com/dlutz/entry/partition_alignment_guidelines_for_unified.

Since Solaris 11.1.11.4.0, it has become easier to avoid mis-alignment issues when LDOMs use ZFS on top of LUNs exported by a ZFSSA.

The enhancement comes from ER 15824910 - Add support to configure vdisk physical block size in vdc.conf.
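To check whether a guest domain is already at that level or later, the version of the "entire" package can be queried. The output below is illustrative only (Solaris 11.1 SRU 11.4 corresponds to package version 0.5.11-0.175.1.11.0.4.0):

    root@LDOM1 # pkg list entire
    NAME (PUBLISHER)    VERSION                     IFO
    entire              0.5.11-0.175.1.11.0.4.0     i--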

Previously, the workaround was to create the zpool on the control domain (with the usual ssd-config-list entry for the ZFSSA in ssd.conf) and to export a ZFS volume dataset to the guest domain.
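For reference, that older workaround relied on an ssd-config-list entry on the control domain along these lines. This is an illustrative sketch only; the vendor/product string must match the inquiry data of your ZFSSA LUNs (see <NOTE:1507737.1>):

    # /kernel/drv/ssd.conf (control domain) - pre-vdc.conf workaround, illustrative
    ssd-config-list = "SUN     ZFS Storage 7330", "physical-block-size:8192";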

 

Solution

  1.  ZFSSA: Create a LUN with a volblocksize of 8KB (default):

    ZFSSA:shares (pool1) test/TEST_LUN> show
    Properties:
          checksum = fletcher4 (inherited)
       compression = off (inherited)
             dedup = false (inherited)
     compressratio = 100
            copies = 1 (inherited)
          creation = Mon Jul 04 2016 14:04:12 GMT+0000 (UTC)
           logbias = latency (inherited)
      volblocksize = 8K
           lunguid = 600144F087021A700000577A6CE20001
           volsize = 50G
        encryption = off
    canonical_name = pool1/local/test/TEST_LUN
    [..]
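    If the LUN does not exist yet, it can be created from the ZFSSA CLI, keeping the default 8K volblocksize. A minimal sketch (prompts abbreviated; the project "test" is assumed to already exist):

    ZFSSA:> shares
    ZFSSA:shares (pool1)> select test
    ZFSSA:shares (pool1) test> lun TEST_LUN
    ZFSSA:shares (pool1) test/TEST_LUN (uncommitted)> set volsize=50G
    ZFSSA:shares (pool1) test/TEST_LUN (uncommitted)> set volblocksize=8K
    ZFSSA:shares (pool1) test/TEST_LUN (uncommitted)> commit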
     

  2.  Control domain: Make the LUN visible:

    root@T5-4 # echo | format
        63. c0t600144F087021A700000577A6CE20001d0 <SUN-ZFS Storage 7330-1.0 cyl 1623 alt 2 hd 254 sec 254>
            /scsi_vhci/ssd@g600144f087021a700000577a6ce20001
        64. c0t600144F087021A700000555724AA003Cd0 <SUN-ZFS Storage 7330-1.0-300.00GB>
            /scsi_vhci/ssd@g600144f087021a700000555724aa003c
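    If the new LUN does not show up in the format output, the device tree can be refreshed first, for example:

    root@T5-4 # devfsadm -Cv      # rebuild /dev links and clean up stale entries
    root@T5-4 # echo | format | grep -i 577a6ce2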
     

  3.  Control domain: Use format to configure a simple disk layout:

    root@T5-4 # format
    [..]
    partition> p
    Current partition table (unnamed):
    Total disk cylinders available: 1623 + 2 (reserved cylinders)

    Part Tag        Flag Cylinders Size      Blocks
       0 unassigned wm   0         0         (0/0/0) 0
       1 unassigned wu   0         0         (0/0/0) 0
       2 backup     wu   0 - 1622 49.93GB    (1623/0/0) 104709468
       3 unassigned wm   0         0         (0/0/0) 0
       4 unassigned wm   0         0         (0/0/0) 0
       5 unassigned wm   0         0         (0/0/0) 0
       6 unassigned wm   0         0         (0/0/0) 0
       7 unassigned wm   0         0         (0/0/0) 0

    partition> label
    Ready to label disk, continue? y


    Nothing needs to be changed in the sd.conf or ssd.conf files.
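    Optionally, verify the resulting label from the control domain with prtvtoc before handing the slice to the LDOM:

    root@T5-4 # prtvtoc /dev/rdsk/c0t600144F087021A700000577A6CE20001d0s2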

  4.  Control domain: Assign the c0txd0s2 partition to an LDOM:

    root@T5-4 # ldm add-vdsdev /dev/dsk/c0t600144F087021A700000577A6CE20001d0s2 TEST_LUN@primary-vds0
    root@T5-4 # ldm list
    NAME    STATE  FLAGS  CONS VCPU MEMORY UTIL NORM UPTIME
    primary active -n-cv- UART 64   64G    11%   10% 1d 17h 59m
    LDOM1   active -n---- 5005 32   32G    0.0% 0.0% 1d 17h 59m

    root@T5-4 # ldm add-vdisk TEST_LUN TEST_LUN@primary-vds0 LDOM1

    root@T5-4 # ldm list-bindings LDOM1
    NAME  STATE  FLAGS  CONS VCPU MEMORY UTIL NORM UPTIME
    LDOM1 active -n---- 5005 32   32G    0.3% 0.3% 1d 18h 1m
    [..]
    DISK
    NAME     VOLUME                              TOUT ID DEVICE SERVER MPGROUP
    vdisk0   electra_auth_boot_disk@primary-vds0 0       disk@0 primary
    TEST_LUN TEST_LUN@primary-vds0               1       disk@1 primary



  5.  LDOM: Log in to the LDOM and list the disks via format:

    root@T5-4 # telnet 0 5005
    Trying 0.0.0.0...
    Connected to 0.
    Escape character is '^]'.

    root@LDOM1 # echo | format
    Searching for disks...done

    AVAILABLE DISK SELECTIONS:
      0. c2d0 <Unknown-Unknown-0001-100.00GB>
         /virtual-devices@100/channel-devices@200/disk@0
      1. c2d1 <SUN-ZFS Storage 7330-1.0 cyl 1623 alt 2 hd 254 sec 254>
         /virtual-devices@100/channel-devices@200/disk@1

     

  6.  LDOM: Retrieve the vdc instance of the device(s) you want to use for a ZFS pool:

    root@LDOM1 # cat /etc/path_to_inst
    #
    # Caution! This file contains critical kernel state
    #
    "/fcoe" 0 "fcoe"
    "/iscsi" 0 "iscsi"
    "/pseudo" 0 "pseudo"
    "/scsi_vhci" 0 "scsi_vhci"
    "/options" 0 "options"
    "/virtual-devices@100" 0 "vnex"
    "/virtual-devices@100/channel-devices@200" 0 "cnex"
    "/virtual-devices@100/channel-devices@200/disk@0" 0 "vdc"
    "/virtual-devices@100/channel-devices@200/pciv-communication@0" 0 "vpci"
    "/virtual-devices@100/channel-devices@200/network@0" 0 "vnet"
    "/virtual-devices@100/channel-devices@200/network@1" 1 "vnet"
    "/virtual-devices@100/channel-devices@200/network@2" 2 "vnet"
    "/virtual-devices@100/channel-devices@200/network@3" 3 "vnet"
    "/virtual-devices@100/channel-devices@200/disk@1" 1 "vdc" << We want this one
     

  7.  LDOM: Edit /platform/sun4v/kernel/drv/vdc.conf:

    block-size-list="1:8192";

    Multiple instances can be included if needed. Say we have an /etc/path_to_inst as follows and we want to configure instances 1, 2 and 5:

    "/virtual-devices@100/channel-devices@200/disk@1" 1 "vdc"
    "/virtual-devices@100/channel-devices@200/disk@2" 2 "vdc"
    "/virtual-devices@100/channel-devices@200/disk@3" 3 "vdc"
    "/virtual-devices@100/channel-devices@200/disk@4" 4 "vdc"
    "/virtual-devices@100/channel-devices@200/disk@5" 5 "vdc"
    "/virtual-devices@100/channel-devices@200/disk@6" 6 "vdc"

    The final vdc.conf file should then contain:

    block-size-list="1:8192","2:8192","5:8192";
     

  8.  LDOM: Reboot the LDOM.

  9.  LDOM: Create the pool:

    root@LDOM1 # zpool create mypool c2d1

    Note that the zpool command makes slice 0 start at sector 256:

    root@LDOM1 # prtvtoc /dev/rdsk/c2d1s2

    * /dev/rdsk/c2d1s2 partition map
    *
    * Dimensions:
    * 512 bytes/sector
    * 104857600 sectors
    * 104857533 accessible sectors
    *
    * Flags:
    * 1: unmountable
    * 10: read-only
    *
    * Unallocated space:
    *       First     Sector    Last
    *       Sector    Count     Sector
    *          34       222       255
    *
    *                          First      Sector     Last
    * Partition  Tag  Flags    Sector     Count      Sector     Mount Directory
           0      4    00        256      104840927  104841182
           8     11    00        104841183    16384  104857566
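    Slice 0 starting at sector 256 means a byte offset of 256 * 512 = 131072 bytes (128 KB), an exact multiple of the 8K volblocksize, so the guest's ZFS I/Os line up with the ZFSSA blocks.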



  10.  LDOM: Create a ZFS filesystem and confirm correct filesystem operation:

    root@LDOM1 # zfs create mypool/test1
    root@LDOM1 # cd /mypool/test1
    root@LDOM1 # df -h .
    Filesystem      Size   Used  Available  Capacity  Mounted on
    mypool/test1     49G   288K        49G        1%  /mypool/test1

    root@LDOM1 # dd if=/dev/zero of=bigfile bs=8k count=100000
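    As a quick sanity check after the write test, confirm the pool is still healthy:

    root@LDOM1 # zpool status -x mypool
    pool 'mypool' is healthy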

     

An extra step would be to use "volalign_ak8" on the ZFSSA side to confirm there is no mis-alignment anymore (see https://pae.us.oracle.com/twiki/bin/view/Public/ZFSPerformanceDiagnosis). The test/TEST_LUN LUN should no longer be reported.

 

Note:
Mixing an ssd-config-list entry in the ssd.conf file (control domain) with vdc.conf settings (LDOM) that do not match the ZFSSA LUN volblocksize may make it impossible to create pools on the LDOM side.
If ssd.conf on the control domain specifies a block size (say 32k) bigger than the volblocksize of a ZFSSA LUN (say 8k), and the LDOM has no corresponding line in vdc.conf for that vdisk, the result is corruption: the uberblock array gets corrupted, because writes are done in larger blocks that each contain only one valid uberblock. In that example, only 4 valid uberblocks remain (128k / 32k = 4).
When using vdc.conf, avoid using the ssd-config-list parameter in the ssd.conf file on the control domain.

 

References

<NOTE:2036559.1> - Oracle ZFS Storage Appliance: How to avoid misalignment when using ASM on Solaris clients
<NOTE:1507737.1> - Sun Storage 7000 Unified Storage System: Tuning Solaris hosts using ZFS filesystem (exported from 7000 Series NAS)

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.