Document 1638517.1 : Oracle ZFS Storage Appliance: Oracle Linux / Red Hat Enterprise Linux - Client I/O Error During FC LUN Failover

Solution Type: Problem Resolution (Sure Solution)
In this Document
  Symptoms
  Changes
  Cause
  Solution
  References
Created from <SR 3-8466918751>

Applies to:

Sun ZFS Storage 7120 - Version All Versions to All Versions [Release All Releases]
Sun Storage 7410 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7320 - Version All Versions to All Versions [Release All Releases]
Sun Storage 7310 Unified Storage System - Version All Versions to All Versions [Release All Releases]
Sun ZFS Storage 7420 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)

Symptoms

Appliance version: 7420

When testing takeover and failback on a ZFS Storage Appliance 7420 cluster, I/O errors appear in the Red Hat client logs for a short period. The same errors also occur when one of the heads in the cluster is rebooted. The SAN switch provides four FC paths to the attached ZFS appliance, and the clients use the Red Hat 6.2 multipath daemon. In every scenario, two paths remain active, and writes resume after 30-40 seconds.

The two configured FC LUNs:

Views for 600144F09F6B2DCA0000528C677C000C:
    Data File    : /dev/zvol/rdsk/db_pool_02/local/Composition/Oracle
    Host group   : dc1plogdb01
    Target Group : tgt-dc101
    LUN          : 0

Views for 600144F09F6B2DCA000052DD19DB0001:
    Data File    : /dev/zvol/rdsk/db_pool_02/local/Composition/Archive
    Host group   : dc1plogdb01
    Target Group : tgt-dc101
    LUN          : 2

The logs captured during the Red Hat client I/O offline messages:

### SCENARIO 1 - Red Hat client messages after a reboot of the appliance:

Jan 29 17:29:46 dc1plogdb02 kernel: rport-6:0-0: blocked FC remote port time out: removing target and saving binding
Jan 29 17:29:46 dc1plogdb02 kernel: sd 6:0:0:0: alua: Detached
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 6:0:0:0: rejecting I/O to offline device
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 6:0:0:0: rejecting I/O to offline device
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 6:0:0:0: rejecting I/O to offline device
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 6:0:0:0: [sdc] Unhandled error code
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 6:0:0:0: [sdc] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 6:0:0:0: [sdc] CDB: Write(10): 2a 00 04 77 cc 88 00 00 08 00
Jan 29 17:29:46 dc1plogdb02 kernel: end_request: I/O error, dev sdc, sector 74960008
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 6:0:0:0: [sdc] CDB: Write(10): 2a 00 04 77 ce 90 00 02 00 00
Jan 29 17:29:46 dc1plogdb02 kernel: end_request: I/O error, dev sdc, sector 74960528
Jan 29 17:29:46 dc1plogdb02 kernel: device-mapper: multipath: Failing path 8:32.
Jan 29 17:29:46 dc1plogdb02 kernel: rport-3:0-4: blocked FC remote port time out: removing target and saving binding
Jan 29 17:29:46 dc1plogdb02 kernel: lpfc 0000:0d:00.0: 0:(0):0203 Devloss timeout on WWPN 21:00:00:24:ff:35:93:f6 NPort x010e00 Data: x0 x7 x0
Jan 29 17:29:46 dc1plogdb02 kernel: sd 3:0:1:0: alua: Detached
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 3:0:1:0: rejecting I/O to offline device
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 3:0:1:0: rejecting I/O to dead device
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 3:0:1:0: rejecting I/O to dead device
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 3:0:1:0: [sdb] Unhandled error code
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 3:0:1:0: [sdb] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jan 29 17:29:46 dc1plogdb02 kernel: scsi 3:0:1:0: [sdb] CDB: Write(10): 2a 00 04 76 df 40 00 00 08 00
Jan 29 17:29:46 dc1plogdb02 kernel: end_request: I/O error, dev sdb, sector 74899264
Jan 29 17:29:47 dc1plogdb02 multipathd: mpathc: load table [0 262144000 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 2 1 8:48 1 8:64 1]
Jan 29 17:29:47 dc1plogdb02 kernel: sd 3:0:0:0: alua: port group 01 state A supports toluSnA
Jan 29 17:29:47 dc1plogdb02 kernel: sd 6:0:1:0: alua: port group 01 state A supports toluSnA
Jan 29 17:29:47 dc1plogdb02 multipathd: sdb [8:16]: path removed from map mpathc

### See http://www.sourceware.org/lvm2/wiki/MultipathUsageGuide for details.
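The "blocked FC remote port time out" and lpfc "Devloss timeout" messages above are emitted when the FC transport layer's dev_loss_tmo expires for a remote port. As a minimal sketch (the rport name rport-6:0-0 is taken from the log above and will differ on other hosts), the timers can be inspected and, if needed, raised through sysfs:

    # Current timers for the remote port seen in the log
    cat /sys/class/fc_remote_ports/rport-6:0-0/dev_loss_tmo
    cat /sys/class/fc_remote_ports/rport-6:0-0/fast_io_fail_tmo

    # Allow a longer window (in seconds) before the rport is torn down
    echo 60 > /sys/class/fc_remote_ports/rport-6:0-0/dev_loss_tmo

Note that fast_io_fail_tmo, when set, fails I/O back to DM-multipath early so the remaining paths can take over before dev_loss_tmo removes the target.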
### SCENARIO 1 - multipath -ll output:

mpathc (3600144f09164a66c000052d4fa290003) dm-2 SUN,ZFS Storage 7420
size=125G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- 3:0:1:0 sdb 8:16 failed faulty running
| `- #:#:#:#  -  #:#  failed faulty running   <<<< failed
`-+- policy='round-robin 0' prio=130 status=active
  |- 3:0:0:0 sdd 8:48 active ready running
  `- 6:0:1:0 sde 8:64 active ready running

mpathc (3600144f09164a66c000052d4fa290003) dm-2 ,
size=125G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- #:#:#:#  -  #:#  failed faulty running
| `- #:#:#:#  -  #:#  failed faulty running
`-+- policy='round-robin 0' prio=130 status=active
  |- 3:0:0:0 sdd 8:48 active ready running
  `- 6:0:1:0 sde 8:64 active ready running

mpathc (3600144f09164a66c000052d4fa290003) dm-2 ,
size=125G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- #:#:#:#  -  #:#  failed faulty running
| `- #:#:#:#  -  #:#  failed faulty running
`-+- policy='round-robin 0' prio=130 status=active
  |- 3:0:0:0 sdd 8:48 active ready running
  `- 6:0:1:0 sde 8:64 active ready running
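Note that the first capture reports features='1 queue_if_no_path' (I/O is queued while all paths in a group are down), while the later captures report features='0' (I/O is failed upwards immediately). Whether a map is currently queueing can be confirmed on the client; a short sketch, using the mpathc map and the table line logged by multipathd above:

    # The 'features' column of the live table shows queue_if_no_path if queueing is on
    dmsetup table mpathc
    # expected along the lines of (from the multipathd log above):
    # 0 262144000 multipath 1 queue_if_no_path 1 alua 1 1 round-robin 0 2 1 8:48 1 8:64 1

    # Effective multipathd settings (RHEL 6 interactive CLI syntax)
    multipathd -k"show config" | grep -i -e no_path_retry -e queue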
Changes
Cause

This is expected behavior: I/O to a single LUN is interrupted for about 30 seconds during a LUN failover. There is always a delay when failing over LUNs.

While the failover is in progress, multipath -ll shows the failed paths:

mpathc (3600144f09164a66c000052d4fa290003) dm-2 ,
size=125G features='0' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- #:#:#:#  -  #:#  failed faulty running
| `- #:#:#:#  -  #:#  failed faulty running
`-+- policy='round-robin 0' prio=130 status=active
  |- 3:0:0:0 sdd 8:48 active ready running
  `- 6:0:1:0 sde 8:64 active ready running
Once the failover completes, all four paths return to the active and enabled groups:

mpathc (3600144f09164a66c000052d4fa290003) dm-2 SUN,ZFS Storage 7420
size=125G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=130 status=active
| |- 3:0:0:0 sdd 8:48 active ready running
| `- 6:0:1:0 sde 8:64 active ready running
`-+- policy='round-robin 0' prio=1 status=enabled
  |- 3:0:1:0 sdb 8:16 active ready running
  `- 6:0:0:0 sdc 8:32 active ready running
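One way to measure the interruption window from the client is a timestamped direct-I/O read probe against the multipath device (a sketch only; mpathc is the map name from the output above, and reads are non-destructive):

    while true; do
        printf '%s ' "$(date +%T)"
        # A single 1 MiB direct read bypasses the page cache, so each
        # iteration exercises the paths; failure means no path was usable.
        if dd if=/dev/mapper/mpathc of=/dev/null bs=1M count=1 iflag=direct 2>/dev/null; then
            echo OK
        else
            echo FAILED
        fi
        sleep 1
    done

During a takeover the probe either stalls (with queue_if_no_path set) or prints FAILED (with features='0') for roughly 30 seconds, matching the client log entries above.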
Solution

### Please refer to the Oracle Technical White Paper (January 2014), "Understanding the Use of Fibre Channel in the Oracle ZFS Storage Appliance":
When a node failover is initiated, data traffic fails over to the node holding the standby paths.
The failover of a single LUN takes about 30 seconds.
===================================================================================

### For Oracle Linux, please refer to these documents for guidance:

White Paper: Configuring Multipathing for Oracle Linux and the Oracle ZFS Storage Appliance

Sun Storage J4500 Array System Overview - Enabling and Disabling Multipathing in the Linux Operating System
http://docs.oracle.com/cd/E19122-01/j4500.array/820-3163/bcghjife.html

### Red Hat 6.2 client settings ###

The I/O timeout can be changed by tuning the file multipath.conf:
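A minimal sketch of the relevant /etc/multipath.conf stanza follows. The setting that governs how long I/O is queued when every path to a LUN is down is no_path_retry: "queue" queues indefinitely (equivalent to queue_if_no_path), a number N queues for N polling intervals before failing, and "fail" errors out immediately. The values below are illustrative, not the white paper's verbatim recommendation:

    devices {
        device {
            vendor                "SUN"
            product               "ZFS Storage 7420"
            hardware_handler      "1 alua"
            prio                  alua
            path_grouping_policy  group_by_prio
            path_checker          tur
            failback              immediate
            # Queue I/O for 12 polling intervals (12 x 5 s = 60 s with the
            # default polling_interval of 5), enough to ride out the ~30 s
            # LUN failover described above, before failing I/O to the client.
            no_path_retry         12
        }
    }

After editing the file, reload the configuration with: service multipathd reload (or multipathd -k"reconfigure").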
Red Hat Enterprise Linux 6.2 introduced documentation and feature updates for DM Multipath; see the Red Hat references below for the 6.2 changes and the configuration file defaults.

====================================================================================
Checked for relevancy - 10-May-2018

References

<NOTE:1628999.1> - Oracle ZFS Storage Appliance: How to set up Client Multipathing
http://www.sourceware.org/lvm2/wiki/MultipathUsageGuide
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/DM_Multipath/MPIO_Overview.html#s1-ov-newfeatures-6.2-dmmultipath
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/pdf/DM_Multipath/Red_Hat_Enterprise_Linux-6-DM_Multipath-en-US.pdf
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/DM_Multipath/config_file_defaults.html

Attachments

This solution has no attachment