
Asset ID: 1-71-1628999.1
Update Date: 2018-02-08
Solution Type: Technical Instruction

Solution 1628999.1: Oracle ZFS Storage Appliance: How to set up Client Multipathing


Related Items
  • Sun ZFS Storage 7420
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS3-4
  • Sun ZFS Storage 7320
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: 7xxx NAS




In this Document
Goal
Solution
 Configuring FC Client Multipathing
 Tasks
 Configuring Solaris Initiators
 Configuring Windows Initiators
 Windows Tunables - Microsoft DSM Details
 Configuring Linux Initiators
 Configuring Mac Initiators
 Configuring OVM
 Configuring VMware ESX Initiators
 Troubleshooting
 See Also
 Configuring iSCSI/iSER Client Multipathing
 Solaris iSCSI/iSER and MPxIO Considerations
 Configuring SRP Client Multipathing
 VMWare 4.0
 Path Selection Plugin (psp)
 Storage Array Type Plugin (satp)
 VMWare ESX 4.0 Issues


Applies to:

Oracle ZFS Storage ZS3-2 - Version All Versions and later
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
7000 Appliance OS (Fishworks)

Goal

This document provides the reader with instructions for configuring client multipathing for the Oracle ZFS Storage Appliance environment.

Solution

Configuring FC Client Multipathing

The Oracle ZFS Storage Appliance uses Asymmetric Logical Unit Access (ALUA) to provide FC target multipathing support. For more information, refer to the SCSI Primary Commands (SPC) definition published by the T10 committee at http://www.t10.org.

 

The following instructions provide a guide for setting up Fibre Channel (FC) host clients that are connected to an FC target-enabled clustered appliance.

Tasks

Configuring Solaris Initiators

By default, MPxIO is enabled on Solaris x86 platforms but disabled on SPARC. The mpathadm show LU command shows the path state changing from active to standby or from standby to active. Alternatively, you can use luxadm display to show the path state.


The stmsboot utility enables and disables MPxIO, for example:

  • To enable MPxIO, run  stmsboot -D fp -e
  • To disable MPxIO, run  stmsboot -D fp -d
  • To verify the state, run  mpathadm show LU
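For example, after enabling MPxIO and rebooting, the per-path and target-port-group states can be inspected with mpathadm. The output below is an illustrative sketch only; device names and WWNs are hypothetical, and the exact layout varies by Solaris release:

# mpathadm list lu
        /dev/rdsk/c0t600144F0AABBCCDD00004B31C88F0001d0s2
                Total Path Count: 2
                Operational Path Count: 2
# mpathadm show lu /dev/rdsk/c0t600144F0AABBCCDD00004B31C88F0001d0s2
        ...
        Paths:
                Initiator Port Name:  10000000c9XXXXXX
                Target Port Name:     210100e08bXXXXXX
                Path State:           OK
        ...
        Target Port Groups:
                ID:  0
                Access State:  active optimized
                ...
                ID:  1
                Access State:  standby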

Configuring Windows Initiators

ALUA multipathing is supported only by the native Windows Server 2008/2008 R2 and 2012/2012 R2 MPIO. Windows Server 2008 R2 is required to avoid issues during cluster failover and takeover; Windows Server 2008 SP2 is not supported.

  1. Verify that the FC HBA Windows driver is installed and the HBA is operational.
  2. Install or verify installation of the Windows Server 2008/2012 MPIO Optional Component. Configure multipath support for the appliance by issuing the mpclaim.exe -r -i -a "" command at a Windows Command Prompt. This will force a system reboot and is necessary to complete MPIO setup and ensure proper path/LUN discovery.
  3. Once the client has rebooted, verify that Windows Client can discover and access appliance LUN(s) and the correct number of paths and path states are displayed. This can be verified using the Windows Disk Management utility. For each LUN on the appliance there should be only one corresponding disk available in the Disk Management GUI.
  4. In the event of an appliance node failure, the default Microsoft DSM timer counters may be insufficient to ensure I/O continues uninterrupted. To alleviate this, we recommend setting the following Timer Counter values in the DSM details section of each disk's Multi-Path Disk Device properties.
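As an alternative to the Disk Management check in step 3 above, MPIO disks and their path states can also be listed from a Command Prompt with mpclaim.exe. This is a hedged sketch; the exact output layout varies by Windows release:

C:\> mpclaim -s -d
        (lists all MPIO disks and their load-balance policy)
C:\> mpclaim -s -d 1
        (shows the individual paths and path states for MPIO Disk 1)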

Windows Tunables - Microsoft DSM Details

Windows Tunable        | Description                                                                                                                      | Default Value | Recommended Value
PathVerifyEnabled      | Enables path verification by MPIO on all paths every N seconds, where N is the value set in PathVerificationPeriod.             | Disabled      | Enabled
PathVerificationPeriod | Indicates the periodicity (in seconds) with which MPIO performs path verification. Only used if PathVerifyEnabled = TRUE.       | 30 seconds    | 5 seconds
RetryInterval          | Specifies the interval of time (in seconds) after which a failed request is retried (once the DSM has decided to retry it, and assuming the I/O has been retried fewer times than RetryCount). | 1 second      | 5 seconds
RetryCount             | Specifies the number of times a failed I/O is retried when the DSM determines that the failing request must be retried.         | 3             | 300
PDORemovePeriod        | Controls the amount of time (in seconds) that the multipath pseudo-LUN remains in system memory after losing all paths to the device. | 20 seconds    | 1500 seconds
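These counters are also exposed as registry values for the Microsoft DSM, typically under HKLM\SYSTEM\CurrentControlSet\Services\MSDSM\Parameters, so they can be set from an elevated Command Prompt instead of the GUI. The following is a hedged sketch based on Microsoft's MPIO timer documentation; verify the key path and value names for your Windows release, and reboot for the changes to take effect:

C:\> reg add HKLM\SYSTEM\CurrentControlSet\Services\MSDSM\Parameters /v PathVerifyEnabled /t REG_DWORD /d 1 /f
C:\> reg add HKLM\SYSTEM\CurrentControlSet\Services\MSDSM\Parameters /v PathVerificationPeriod /t REG_DWORD /d 5 /f
C:\> reg add HKLM\SYSTEM\CurrentControlSet\Services\MSDSM\Parameters /v RetryInterval /t REG_DWORD /d 5 /f
C:\> reg add HKLM\SYSTEM\CurrentControlSet\Services\MSDSM\Parameters /v RetryCount /t REG_DWORD /d 300 /f
C:\> reg add HKLM\SYSTEM\CurrentControlSet\Services\MSDSM\Parameters /v PDORemovePeriod /t REG_DWORD /d 1500 /f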


Errata: Emulex HBAs and Windows Server 2008/2012: When using a Windows Server 2008/2012 client equipped with Emulex HBAs, a change to an HBA driver parameter is required. To ensure uninterrupted I/O during a cluster failover/failback operation, you must set the Emulex HBA NodeTimeout value to 0. Use the Emulex OCManager utility, available from http://www.emulex.com, to adjust this parameter.

Configuring Linux Initiators

The following instructions apply to these clients:

  • Oracle Enterprise Linux/Red Hat Enterprise Linux 5.4 (OEL 5.4)
  • Oracle Linux/Red Hat Enterprise Linux 5.5 (OL 5.5) and later
  • Oracle Linux/Red Hat Enterprise Linux 6.0 (OL 6.0) and later
  • Oracle Linux/Red Hat Enterprise Linux 7.0 (OL 7.0) and later
  • SUSE Linux 11 SP 1
  • SUSE Linux 12 SP 3


1. Ensure the correct device-mapper multipath packages are installed.
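For example, on OL/RHEL clients the required package can be checked for and installed as follows (a hedged sketch; package names differ slightly on SUSE):

# rpm -q device-mapper-multipath
# yum install device-mapper-multipath
        (only if the package is not already installed)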

2. Stop the multipathd service.

# service multipathd stop
Stopping multipathd daemon: [ OK ]

3. Add the appropriate /etc/multipath.conf stanza shown below, replacing PRODUCT with the output of the following CLI script:

run('cd /');
run('configuration version');
printf(get('product').replace(/^Sun /,"") + '\n');

For example:

spadefish:> script
("." to run)> run('cd /');
("." to run)> run('configuration version');
("." to run)> printf(get('product').replace(/^Sun /,"") + '\n');
("." to run)> .
ZFS Storage 7420
or
ZFS Storage 7320
  • For OEL/RHEL 5.4:
 device 
 { 
    vendor                     "SUN" 
    product                    "ZFS Storage.*" 
    getuid_callout             "/sbin/scsi_id -g -u -s /block/%n" 
    prio_callout               "/sbin/mpath_prio_alua /dev/%n" 
    hardware_handler           "0" 
    path_grouping_policy       group_by_prio 
    failback                   immediate 
    no_path_retry              queue 
    rr_min_io                  100 
    path_checker               tur 
    rr_weight                  uniform 
 } 

  • For OL/RHEL 5.5 and later (dmmp 0.4.7):
 defaults {  
   find_multipaths             yes  
   user_friendly_names         yes  
 }  
 
 devices {  
   device {  
     vendor                    "SUN"  
     product                   "ZFS Storage.*"  
     getuid_callout            "/sbin/scsi_id -g -u -p 0x83 -s /block/%n"  
     prio_callout              "/sbin/mpath_prio_alua /dev/%n"  
     hardware_handler          "1 alua"  
     path_grouping_policy      group_by_prio  
     failback                  immediate  
     no_path_retry             600  
     rr_min_io                 100  
     path_checker              tur  
     rr_weight                 uniform
     features 		       "0"  
   }  
 }  
  • For OL/RHEL 5.5 and later (dmmp 0.4.9):
 defaults {  
   find_multipaths             yes  
   user_friendly_names         yes  
 }  
 
 devices {  
   device {  
     vendor                    "SUN"  
     product                   "ZFS Storage.*"  
     getuid_callout            "/sbin/scsi_id -g -u -p 0x83 -s /block/%n"  
     prio	               alua  
     hardware_handler          "1 alua"  
     path_grouping_policy      group_by_prio  
     failback                  immediate  
     no_path_retry             600  
     rr_min_io                 100  
     path_checker              tur  
     rr_weight                 uniform
     features 		       "0"  
   }  
 }
  • For OL 6.0/RHEL 6.0 and later:
defaults {  
   find_multipaths             yes  
   user_friendly_names         yes  
 }

devices {  
   device {  
     vendor                    "SUN"  
     product                   "ZFS Storage.*"  
     getuid_callout            "/lib/udev/scsi_id --page=0x83 --whitelisted --device=/dev/%n"  
     prio                      alua  
     hardware_handler          "1 alua"  
     path_grouping_policy      group_by_prio  
     failback                  immediate  
     no_path_retry             600  
     rr_min_io_rq              100  
     path_checker              tur  
     rr_weight                 uniform
     features 		       "0"  
   }  
 }  

  • For OL/RHEL 7.0 and later:
defaults {  
   find_multipaths             yes  
   user_friendly_names         yes  
 }

devices {  
   device {  
     vendor                    "SUN"  
     product                   "ZFS Storage.*"        
     prio                      alua  
     hardware_handler          "1 alua"  
     path_grouping_policy      group_by_prio  
     path_selector             "round-robin 0" 
     failback                  immediate
     no_path_retry             600  
     rr_min_io_rq              100  
     path_checker              tur  
     rr_weight                 uniform
     features 		       "0"  
   }  
 }
  • For SUSE Linux 11 SP 1:
 defaults { 
   multipath_tool              "/sbin/multipath -v0" 
   udev_dir                    /dev 
   polling_interval            100 
   default_selector            "round-robin 0" 
   default_path_grouping_policy  group_by_prio 
   default_getuid_callout      "/lib/udev/scsi_id -g -u -d /dev/%n" 
   default_prio_callout        "/bin/true" 
   prio                        "alua" 
   default_features            "0" 
   rr_min_io                   100 
   failback                    immediate 
   user_friendly_names         yes 
   path_checker                tur 
   no_path_retry               1000 
 } 

 device { 
   vendor                      "SUN" 
   product                    "ZFS Storage.*" 
 }
  • For SUSE Linux 12 SP 3:
defaults {
   find_multipaths             yes
   user_friendly_names         yes
 }
devices {
   device {
     vendor                    "Sun"
     product                   "ZFS Storage ####"
     getuid_callout            "/lib/udev/scsi_id -g -u --device=/dev/%n"
     prio                      alua
     features                  "1 queue_if_no_path"
     hardware_handler          "1 alua"
     path_grouping_policy      group_by_prio
     failback                  immediate
     no_path_retry             600
     path_selector             "round-robin 0"
     rr_weight                 uniform
     rr_min_io_rq              100
     path_checker              tur
   }
}

4. Start the multipathd service to enable multipathing, and verify that it starts cleanly.

# service multipathd start
Starting multipathd daemon:                                [  OK  ]
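To have multipathd start automatically at boot, also enable the service. A hedged note, since the exact command depends on the distribution's init system:

# chkconfig multipathd on
        (OL/RHEL 5 and 6, SUSE 11)
# systemctl enable multipathd
        (OL/RHEL 7 and later, SUSE 12)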

5. Run the multipath command after the SCSI bus rescan is finished to verify multipath I/O is enabled. Note that standby paths will be shown as [failed][faulty] due to a known Linux bug. For this reason, it is recommended that users verify the paths are actually operational before putting the system into production.

For more details, refer to the Troubleshooting section below and/or the Appliance Online Help wiki (https://<NAS_IP_ADDRESS>:215/wiki/index.php/Configuration:SAN:FC#Troubleshooting).

# multipath -ll
sdd: checker msg is "tur checker reports path is down"
mpath1 (3600144f094f0bd0300004b31c88f0001) dm-2 SUN,Sun Storage 7420
[size=20G][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=50][active]
\_ 2:0:0:0 sdb 8:16 [active][ready]
\_ round-robin 0 [prio=0][enabled]
\_ 2:0:1:0 sdd 8:48 [failed][faulty]
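Because a standby target port rejects Test Unit Ready, one way to confirm that a path reported as [failed][faulty] is really an operational standby path is to send it a SCSI INQUIRY, which standby ports still answer. A hedged sketch, assuming the sg3_utils package is installed:

# sg_inq /dev/sdd
        (a healthy standby path answers the INQUIRY and reports the SUN ZFS Storage product string;
         a genuinely dead path returns an I/O error)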

Linux SCSI bus scanning

In current versions of Oracle Linux and Red Hat Enterprise Linux, getting the system to recognize additions, modifications, and removals of logical units attached over a SAN can be tricky.

The tool most widely used and recommended for doing this is /usr/bin/rescan-scsi-bus.sh. When adding new Fibre Channel logical units or modifying existing logical units (e.g., resizing), the following usage purports to apply the changes within the host's SCSI subsystem:

    # rescan-scsi-bus.sh -i

And when removing Fibre Channel logical units, the following usage purports to eject the device from the host's SCSI subsystem:

    # rescan-scsi-bus.sh -ir

However we have found these commands to be finicky at best, and at times they appear not to work at all. Red Hat's documentation states the following about the rescan-scsi-bus.sh utility at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/logical-unit-add-remove.html:


When using the rescan-scsi-bus.sh script, take note of the following known issues:

  • In order for rescan-scsi-bus.sh to work properly, LUN0 must be the first mapped logical unit. The rescan-scsi-bus.sh can only detect the first mapped logical unit if it is LUN0. The rescan-scsi-bus.sh will not be able to scan any other logical unit unless it detects the first mapped logical unit even if you use the --nooptscan option.
  • A race condition requires that rescan-scsi-bus.sh be run twice if logical units are mapped for the first time. During the first scan, rescan-scsi-bus.sh only adds LUN0; all other logical units are added in the second scan.
  • A bug in the rescan-scsi-bus.sh script incorrectly executes the functionality for recognizing a change in logical unit size when the --remove option is used.
  • The rescan-scsi-bus.sh script does not recognize ISCSI logical unit removals.

In our testing we have found it necessary to run rescan-scsi-bus.sh twice as a matter of routine, not just when logical units are mapped for the first time.

It's very important to wait until the utility has finished working before repeating the scan, or doing anything else with the devices. You can observe the utility's progress by tailing the kernel messages buffer. Thus, our procedure for rescanning looks like this:

    # rescan-scsi-bus.sh -i
    [wait until scanning finishes]
    # rescan-scsi-bus.sh -i
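To watch the scan progress mentioned above, follow the kernel messages while the utility runs. A hedged example (dmesg -w requires a newer util-linux; older releases can tail the syslog file instead):

    # tail -f /var/log/messages
    (or, on newer distributions: dmesg -w)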

As noted above, the -r switch is supposed to make the utility remove the devices, but we've found that it doesn't work with either iSCSI or Fibre Channel. The solution for device removal is to remove the devices manually using the sysfs "delete" controls. For example, if we want to remove the Fibre Channel device whose node is /dev/sdc, we'd use the following command:

    # echo '1' > /sys/block/sdc/device/delete

You must do this for the device node associated with every path to the logical unit that is to be removed.
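For a multipathed logical unit this means deleting every underlying SCSI device that multipath -ll lists for the map being removed. A hedged sketch; the sdX names below are hypothetical, so confirm them against your own multipath -ll output before deleting anything:

    # multipath -ll mpath1 | grep sd
    (identify the path devices, for example sdb and sdd)
    # multipath -f mpath1
    (optionally flush the multipath map first)
    # for dev in sdb sdd; do echo 1 > /sys/block/$dev/device/delete; done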

Also note the importance of following the proper procedures for removing devices. For Red Hat based systems, these procedures can be found in Red Hat's online Storage Administration guide. The page for RH 7 is at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Storage_Administration_Guide/removing_devices.html.

Configuring Mac Initiators

Sample output. NOTE: mpioutil must be run as root. If you are not logged in as root, switch to root before running the commands below.

sh-3.2# mpioutil

mpioutil

Utility to manage multiple paths to logical units

Usage:  mpioutil <verb> <direct object> <options>, where <verb> is as follows:
  
verb:

       list

       info

           required arguments:

               --id <lun identifier> | -i <lun identifier>

       modify

           direct objects:

               lun

                  --id <lun identifier> | -i <lun identifier>

                  --algorithm <load balancing algorithm> | -a <load balancing algorithm>

                    RoundRobin, LeastIO, LeastBytes

                  optional arguments:

                         --batchCount <round robin batch count> | -b <round robin batch count>

               path

                  --lun <lun identifier> | -l <lun identifier>

                  --id <path identifier> | -i <path identifier>

                  --enable <enable path> | -e <enable path>

                  --disable <disable path> | -d <disable path>

Display list of attached LUNs:

sh-3.2# mpioutil list

Alias            Vendor           Product          LUN Identifier                      Algorithm              

1 No Alias       SUN            ZFS Storage 7420   600144F08E65154A00005390C7A80002    Least Bytes 

Display multipath information for a specific LUN, based on its LUN identifier # from the previous command:

sh-3.2# mpioutil info --id 600144F08E65154A00005390C7A80002   

Alias            Vendor           Product               LUN Identifier                      Algorithm              

1 No Alias         SUN            ZFS Storage 7420      600144F08E65154A00005390C7A80002    Least Bytes              

Access State         Path Identifier                Interface      
    
Standby              020100000002                   Fibre Channel    <---(these are the individual paths. Use these to determine what path is mapped to which head. Useful when failovers occur to verify pathing).   
    
Standby              030100000001                   Fibre Channel   
    
Active Optimized     010000000001                   Fibre Channel   
    
Active Optimized     020000000002                   Fibre Channel      

Volumes:   

BSD Name        Name                                     Size                               

disk0s2         MAC_Test_1                               53.3 GB (53343117312 Bytes) 

Other Examples:

Lists all multipathed logical units.

#mpioutil list
              
Get information for a multipathed lun with id 22F2000155A508ED.

#mpioutil info --id 22F2000155A508ED
              
Modify a multipathed lun's algorithm to be RoundRobin with a batch count of 16.

#mpioutil modify lun --id 22F2___155A5_8ED --algorithm RoundRobin --batchCount 16
              
Modify a multipathed lun's algorithm to be LeastIO.

#mpioutil modify lun --id 22F2___155A5_8ED --algorithm LeastIO
              
Disable a path that belongs to a multipathed logical unit.

#mpioutil modify path --lun 22F2___155A5_8ED --id 26_1___155356_BF_______1 --disable

Configuring OVM

  • For OVM based off of OL 5 (UEK prior to 2.6.32):

devices {
    device {
        vendor "SUN"
        product "ZFS Storage.*"
        getuid_callout "/sbin/scsi_id -g -u -p 0x83 -s /block/%n"
        prio alua
        hardware_handler "1 alua"
        path_grouping_policy group_by_prio
        failback immediate
        no_path_retry 600
        rr_min_io 100
        path_checker tur
        rr_weight uniform
        features "0"
    }
}

  • For OVM based off of OL 5 (UEK 2.6.32 and later):

devices {
    device {
        vendor "SUN"
        product "ZFS Storage.*"
        getuid_callout "/sbin/scsi_id -g -u -p 0x83 -s /block/%n"
        prio alua
        hardware_handler "1 alua"
        path_grouping_policy group_by_prio
        failback immediate
        no_path_retry 600
        rr_min_io_rq 100
        path_checker tur
        rr_weight uniform
        features "0"
    }
}

 

  • For OVM based off of OL 6.0 and later:

devices {
    device {
        vendor "SUN"
        product "ZFS Storage.*"
        getuid_callout "/lib/udev/scsi_id --page=0x83 --whitelisted --device=/dev/%n"
        prio alua
        hardware_handler "1 alua"
        path_grouping_policy group_by_prio
        failback immediate
        no_path_retry 600
        rr_min_io_rq 100
        path_checker tur
        rr_weight uniform
        features "0"
    }
}

Configuring VMware ESX Initiators

VMware ESX HCL

Refer to the VMware Compatibility Guide for supported Oracle products.

http://www.vmware.com/resources/compatibility/search.php?action=search&deviceCategory=san&key=Oracle

For vSphere 5:

http://www.oracle.com/technetwork/server-storage/sun-unified-storage/documentation/bestprac-zfssa-vsphere5-1940129.pdf

For ESX 4.1 servers:

No additional rules need to be added because the default driver is ALUA aware.

For ESX 4.0 servers:

1. Verify the current SATP plugin that is in use by issuing the esxcli nmp device list command:

# esxcli nmp device list
naa.600144f0ed81720500004bb3c1f60002
   Device Display Name: SUN Fibre Channel Disk (naa.600144f0ed81720500004bb3c1f60002)
   Storage Array Type: VMW_SATP_DEFAULT_AA
   Storage Array Type Device Config:
   Path Selection Policy: VMW_PSP_FIXED
   Path Selection Policy Device Config: {preferred=vmhba0:C0:T1:L0;current=vmhba0:C0:T1:L0}
   Working Paths: vmhba0:C0:T1:L0

VMW_SATP_DEFAULT_AA is the default plugin. This plugin is not ALUA-capable.

2. Add rules to enable the ALUA plugin for the appliances by using the esxcli nmp satp addrule command.

# esxcli nmp satp addrule -s VMW_SATP_ALUA -e "ZFS Storage 7000" -V "SUN" -M "ZFS Storage 7120" -c "tpgs_on"
# esxcli nmp satp addrule -s VMW_SATP_ALUA -e "ZFS Storage 7000" -V "SUN" -M "ZFS Storage 7320" -c "tpgs_on"
# esxcli nmp satp addrule -s VMW_SATP_ALUA -e "ZFS Storage 7000" -V "SUN" -M "ZFS Storage 7420" -c "tpgs_on"

3. Verify the rule was correctly added.

# esxcli nmp satp listrules | grep SUN                     

VMW_SATP_ALUA   SUN   ZFS Storage 7120   tpgs_on   ZFS Storage 7000
VMW_SATP_ALUA   SUN   ZFS Storage 7320   tpgs_on   ZFS Storage 7000
VMW_SATP_ALUA   SUN   ZFS Storage 7420   tpgs_on   ZFS Storage 7000

4. Reboot the VMware ESX server. When the server has rebooted, check that the correct plugin is now in effect with the esxcli nmp device list command.

# esxcli nmp device list
naa.600144f0ed81720500004bb3c1f60002
   Device Display Name: SUN Fibre Channel Disk (naa.600144f0ed81720500004bb3c1f60002)
   Storage Array Type: VMW_SATP_ALUA
   Storage Array Type Device Config: {implicit_support=on;explicit_support=off;
   explicit_allow=on;alua_followover=on;{TPG_id=0,TPG_state=STBY}{TPG_id=1,TPG_state=AO}}
   Path Selection Policy: VMW_PSP_MRU
   Path Selection Policy Device Config: Current Path=vmhba1:C0:T1:L0
   Working Paths: vmhba1:C0:T1:L0

Troubleshooting

This section describes troubleshooting known issues.

Multipath-tools version 0.4.7 bundled in OEL 5.4 is unable to recognize paths in ALUA standby access state

Per the SCSI specification, a target port in the standby state does not respond to the Test Unit Ready command, so standby paths are shown as [failed] in multipath command output.

The fix for this problem was committed to the multipath-tools source tree on 2009-04-21 (later than the 0.4.8 official release). Users have to obtain the latest version of the multipath-tools source code from: http://christophe.varoqui.free.fr/

Users should get the latest source code from its git repository; the multipath-tools-0.4.8.tar.bz2 tarball does not contain the fix.
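To check which multipath-tools package version is currently installed on an RPM-based client (a hedged example):

# rpm -q device-mapper-multipath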

Finally, the status shown in multipath command output does not impact functionality like I/O and failover/failback, so updating the package is not mandatory.

See Also

  • fcinfo man page

http://www.oracle.com/technetwork/documentation/oracle-unified-ss-193371.html

  • Solaris Fibre Channel and Storage Multipathing Administration Guide

http://www.oracle.com/technetwork/documentation/solaris-11-192991.html

  • Windows Server High Availability with Microsoft MPIO

http://www.microsoft.com/downloads/details.aspx?FamilyID=CBD27A84-23A1-4E88-B198-6233623582F3&displaylang=en

  • Using Device-Mapper Multipath - Red Hat

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/DM_Multipath/

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html-single/DM_Multipath/

Configuring iSCSI/iSER Client Multipathing

Solaris iSCSI/iSER and MPxIO Considerations

MPxIO supports target port aggregation and availability in Solaris iSCSI configurations that configure multiple sessions per target (MS/T) on the iSCSI initiator.

  • Use IPMP for aggregation and failover of two or more NICs.
  • A basic configuration for an iSCSI host is a server with two NICs that are dedicated to iSCSI traffic. The NICs are configured by using IPMP. Additional NICs are provided for non-iSCSI traffic to optimize performance.
  • Active multipathing can be achieved by using the Solaris iSCSI MS/T feature, and the failover and redundancy of an IPMP configuration.
    • If one NIC fails in an IPMP configuration, IPMP handles the failover. The MPxIO driver does not notice the failure. In a non-IPMP configuration, the MPxIO driver fails and offlines the path.
    • If one target port fails in an IPMP configuration, the MPxIO driver notices the failure and provides the failover. In a non-IPMP configuration, the MPxIO driver notices the failure and provides the failover.
  • For more information about using the Solaris iSCSI MS/T feature with IPMP and multipathing, see MOS Knowledge Article 1005479.1, Understanding an iSCSI MS/T multi-path configuration.
  • For information about configuring multiple sessions per target, see How to Enable Multiple iSCSI Sessions for a Target in the following document: http://download.oracle.com/docs/cd/E19253-01/817-5093/gcawf
  • For information about configuring IPMP, see Part VI, IPMP, in System Administration Guide: IP Services, in the following document: http://download.oracle.com/docs/cd/E19253-01/816-4554/ipmptm-1
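Multiple sessions per target are configured on the Solaris initiator with iscsiadm, as covered in the documents referenced above. The following is a hedged sketch; the flags and example IQN are illustrative, so see the referenced "How to Enable Multiple iSCSI Sessions for a Target" document for the authoritative procedure:

# iscsiadm modify initiator-node -c 2
        (sets the number of configured sessions globally for all targets)
# iscsiadm modify target-param -c 2 iqn.1986-03.com.sun:02:example-target
        (sets configured sessions for a single target; the IQN is hypothetical)
# iscsiadm list target-param -v iqn.1986-03.com.sun:02:example-target
        (verify the configured session count)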

Configuring SRP Client Multipathing

VMWare 4.0

The VMware Native MultiPath Plugin (nmp) has two components that can be changed on a per-device, per-path, or per-array basis.

Path Selection Plugin (psp)

This plugin controls which physical path is used for I/O:

     # esxcli nmp psp list
     Name           Description                        
     VMW_PSP_MRU    Most Recently Used Path Selection  
     VMW_PSP_RR     Round Robin Path Selection         
     VMW_PSP_FIXED  Fixed Path Selection  

Storage Array Type Plugin (satp)

This plugin controls how failover works:

The SATP has to be configured to recognize the array vendor or model string in order to change the basic failover mode from a default Active/Active type array to ALUA.

By default, the appliance cluster comes up as an Active/Active array only.

Use the ESX CLI to add rules to have the ALUA plugin claim the 7000 luns.

   # esxcli nmp satp addrule -s VMW_SATP_ALUA -e "ZFS Storage 7000" -V "SUN" -M "ZFS Storage 7120" -c "tpgs_on"
   # esxcli nmp satp addrule -s VMW_SATP_ALUA -e "ZFS Storage 7000" -V "SUN" -M "ZFS Storage 7320" -c "tpgs_on"
   # esxcli nmp satp addrule -s VMW_SATP_ALUA -e "ZFS Storage 7000" -V "SUN" -M "ZFS Storage 7420" -c "tpgs_on"

 

    options are:
    -s VMW_SATP_ALUA - for the ALUA SATP
    -e description of the rule
    -V Vendor 
    -M Model
    -c claim option for Target Portal Group (7000 seems to support implicit)

If no luns have been scanned/discovered, you can simply rescan the adapter to find new luns. The luns will be claimed by the ALUA plugin. If luns are already present, reboot the ESX host.
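A hedged example of rescanning a single adapter from the ESX 4.0 service console (the vmhba name below is taken from the sample outputs in this section and may differ on your host):

   # esxcfg-rescan vmhba_mlx4_1.1.1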

After the reboot, you will see the luns being listed under the VMW_SATP_ALUA array type.

   # esxcli nmp device list
   naa.600144f096bb823800004b707f2d0001
   Device Display Name: Local SUN Disk (naa.600144f096bb823800004b707f2d0001)
   Storage Array Type: VMW_SATP_ALUA
   Storage Array Type Device Config:
      {implicit_support=on;explicit_support=off;explicit_allow=on;
       alua_followover=on; {TPG_id=0,TPG_state=AO}{TPG_id=1,TPG_state=STBY}}
   Path Selection Policy: VMW_PSP_MRU
   Path Selection Policy Device Config: Current Path=vmhba_mlx4_1.1.1:C0:T1:L0
   Working Paths: vmhba_mlx4_1.1.1:C0:T1:L0

Relevant lun path lists will show an Active and a Standby path:

   # esxcli nmp path list
   gsan.80fe53553e0100282100-gsan.80fe8f583e0100282100
   -naa.600144f096bb823800004b707f2d0001
   Runtime Name: vmhba_mlx4_1.1.1:C0:T2:L0
   Device: naa.600144f096bb823800004b707f2d0001
   Device Display Name: Local SUN Disk (naa.600144f096bb823800004b707f2d0001)
   Group State: standby
   Storage Array Type Path Config: 
   {TPG_id=1,TPG_state=STBY,RTP_id=256,RTP_health=UP}
   Path Selection Policy Path Config: {non-current path}
   gsan.80fe53553e0100282100-gsan.80fe73583e0100282100
   -naa.600144f096bb823800004b707f2d0001
   Runtime Name: vmhba_mlx4_1.1.1:C0:T1:L0
   Device: naa.600144f096bb823800004b707f2d0001
   Device Display Name: Local SUN Disk (naa.600144f096bb823800004b707f2d0001)
   Group State: active
   Storage Array Type Path Config: 
   {TPG_id=0,TPG_state=AO,RTP_id=2,RTP_health=UP}
   Path Selection Policy Path Config: {current path}

VMWare ESX 4.0 Issues

  • Standby and active paths may not be found

The esxcli nmp path list command will report an active and a standby path, one of each, for the SRP targets in a cluster configuration.

[root@ib-client-5 vmware]# esxcli nmp path list
gsan.80fe53553e0100282100-gsan.80fe8f583e0100282100-  
naa.600144f096bb823800004b707f2d0001
   Runtime Name: vmhba_mlx4_1.1.1:C0:T2:L0
   Device: naa.600144f096bb823800004b707f2d0001
   Device Display Name: Local SUN Disk 
(naa.600144f096bb823800004b707f2d0001)
   Group State: standby
   Storage Array Type Path Config: 
{TPG_id=1,TPG_state=STBY,RTP_id=256,RTP_health=UP}
   Path Selection Policy Path Config: {non-current path}
gsan.80fe53553e0100282100-gsan.80fe73583e0100282100-
   naa.600144f096bb823800004b707f2d0001
   Runtime Name: vmhba_mlx4_1.1.1:C0:T1:L0
   Device: naa.600144f096bb823800004b707f2d0001
   Device Display Name: Local SUN Disk
(naa.600144f096bb823800004b707f2d0001)
   Group State: active
   Storage Array Type Path Config:
{TPG_id=0,TPG_state=AO,RTP_id=2,RTP_health=UP}
   Path Selection Policy Path Config: {current path}

When this problem occurs, the active or standby path may not be shown in the output of esxcli nmp path list.

Workaround: None

 

  • VMWare VM Linux guest may hang during cluster takeover

When this problem happens, the Linux guest will report the following in its /var/log/messages system log:

Feb 10 16:10:00 ib-client-5 vmkernel: 1:21:41:36.385 cpu3:4421)<3>ib_srp:
Send tsk_mgmt target[vmhba_mlx4_1.1.1:2] out of TX_IU head 769313 tail 769313 lim 0

Workaround: Reboot guest VM

 


Attachments
This solution has no attachment