Asset ID: 1-72-1472932.1
Update Date: 2017-09-26
Solution Type: Problem Resolution Sure
Solution 1472932.1: StorageTek FC Devices - How to Troubleshoot Fibre Channel Device Installation and Configuration Issues
Related Items:
- Sun StorageTek SL500 Modular Library System
Related Categories:
- PLA-Support>Sun Systems>TAPE>Tape Hardware>SN-TP: SL500 Library
Applies to:
Sun StorageTek SL500 Modular Library System - Version All Versions and later
Information in this document applies to any platform.
Symptoms
The host server cannot see the tape device.
Changes
Any one of these events could have happened:
1. Added a new tape device to the configuration
2. Replaced a tape device
3. Upgraded the switch
4. Upgraded the OS or application software
Cause
Any one of these events could have happened:
1. Added a new tape device to the configuration
2. The tape device disappeared from the fabric
3. Switch issues
4. HBA issues
Solution
These troubleshooting steps are designed for resolving installation and configuration issues involving Solaris host servers connected to
FC/SCSI tape storage devices. Most of the information derived from running the commands specified below can also be obtained from the
Sun Explorer output. Refer to Document 1359037.1, Tape - How to Pull Solaris Patch Information From the Explorer Application.
A. Verify HBA installation and configuration state
A.1. Run the fcinfo command to display information about the HBA ports on the host server. For example, from an x86 system:
#fcinfo hba-port -l
HBA Port WWN: 210000e08b138e17 * Local HBA#1 port WWN which should be mapped to the remote tape device WWN in the FC switch
OS Device Name: /dev/cfg/c2 * This HBA port is controller c2 (its cfgadm attachment point)
Manufacturer: QLogic Corp. * This is a QLogic HBA
Model: QLA2342
Firmware Version: 03.03.28
FCode/BIOS Version: BIOS: 1.34;
Serial Number: not available
Driver Name: qlc
Driver Version: 20100301-3.00
Type: N-port *Node port indicates connection to a FC switch
State: online *HBA is online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 200000e08b138e17
Link Error Statistics: *These stats indicate the quality of the communication channel
Link Failure Count: 0
Loss of Sync Count: 1
Loss of Signal Count: 1
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 0
Invalid CRC Count: 0
...
...
Tech Note:
- For currently supported HBA cards and drivers, check the Fibre Channel HBA Support Matrix -
http://twiki.us.oracle.com/bin/view/Main/HBASupportMatrix
- The command 'fcinfo remote-port -slp <local HBA Port WWN>' can also be used to list the remote-port
information for remote ports visible to the HBA
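The Link Error Statistics in the fcinfo output can be scanned with a small filter. The following is a minimal sketch, not a supported tool: the function name link_errors_above and its threshold argument are hypothetical, and the field labels are assumed to match the sample output above. This document does not define what counts as a "high" counter value, so the threshold is left to the reader.

```shell
# Sketch (hypothetical helper): read `fcinfo hba-port -l` output on stdin and
# print every link error counter greater than the given threshold (default 0),
# prefixed by the HBA port WWN it belongs to.
link_errors_above() {
    awk -v min="${1:-0}" '
        /HBA Port WWN:/ { wwn = $4 }          # remember the current HBA port
        / Count:/ {
            split($0, part, ":")              # part[1] = label, part[2] = value
            label = part[1]
            sub(/^[ \t]+/, "", label)         # trim leading indentation
            value = part[2] + 0               # numeric value of the counter
            if (value > min) printf "%s %s=%d\n", wwn, label, value
        }
    '
}
```

On a Solaris host, this could be used as: fcinfo hba-port -l | link_errors_above 0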
A.2. Run the prtdiag command to identify the PCI slots occupied by the HBA devices. The prtdiag information presented
on a SPARC server is different from that of an x86 server.
X86 example:
#prtdiag -v
==== Upgradeable Slots ============
ID Status Type Description
--- --------- ---------- ----------------------------
1 available PCI PCIX1
2 in use PCI PCIX2 *associated with /dev/cfg/c2 in the fcinfo example above
3 available PCI PCIX3
4 in use PCI PCIX4
5 in use PCI PCIX5
6 available PCI PCIX6
SPARC example:
#prtdiag -v
================================ IO Devices ================================
Slot + Bus Name + Model
Status Type Path
----------------------------------------------------------------------------
...
MB/PCI_MEZZ/PCIX4 PCIX scsi-pci1000,30 LSI,1030
/pci@0/pci@0/pci@8/pci@0/pci@2/pci@0/scsi@1
MB/PCI_MEZZ/PCIX4 PCIX scsi-pci1000,30 LSI,1030
/pci@0/pci@0/pci@8/pci@0/pci@2/pci@0/scsi@1,1
MB/PCI_MEZZ/PCIX3 PCIX SUNW,emlxs-pci10df,fc10 LP11002-S
/pci@0/pci@0/pci@8/pci@0/pci@2/pci@0,2/SUNW,emlxs@1
MB/PCI_MEZZ/PCIX3 PCIX SUNW,emlxs-pci10df,fc10 LP11002-S
/pci@0/pci@0/pci@8/pci@0/pci@2/pci@0,2/SUNW,emlxs@1,1
Action Note: A problem at this point indicates an HBA or internal PCI problem. Contact the SAN,
SPARC or x86 support group.
A.3. Verify the path to the FC devices using the luxadm command. For example:
# luxadm -e port
/devices/pci@3,0/pci1077,101@2/fp@0,0:devctl CONNECTED
/devices/pci@3,0/pci1077,101@2,1/fp@0,0:devctl NOT CONNECTED
Tech Note:
"CONNECTED" indicates that the HBA port has established communication with some node (or FC device)
"NOT CONNECTED" may indicate that the HBA port is not zoned to a target node in the switch
A.4. Verify the device drivers that manage the tape devices. For example:
# prtconf -D
...
pci, instance #3 (driver name: pci) * this shows the PCI bus driver
pci1077,101, instance #0 (driver name: qlc) * this shows the device driver of the HBA connected to the tape drives
this also shows that the Sun-branded Qlogic "qlc" driver is installed
fp, instance #0 (driver name: fp) * this shows the fibre channel port driver
tape, instance #6 (driver name: st) * these show the device driver of the tape drives connected to the HBA
tape, instance #8 (driver name: st) * ---
tape, instance #18 (driver name: st) * ---
tape, instance #11 (driver name: st) * ---
tape, instance #17 (driver name: st) * ---
tape, instance #12 (driver name: st) * ---
pci1077,101, instance #1 (driver name: qlc) * this shows the device driver of the HBA connected to the FC/SCSI tape library
fp, instance #1 (driver name: fp) * this shows the fibre channel port driver
medium-changer, instance #4 (driver name: sgen) * this shows a generic SCSI device driver
mchanger, instance #1 (driver name: mchanger) * this shows the device driver that controls the mchanger (STK tape library ) device
...
Tech Note:
1. If the driver is not attached to the device, the device will not appear in the prtconf -D output.
2. The Sun-branded Qlogic "qlc" driver is not equivalent to the Qlogic "ql" driver. Oracle only supports the "qlc" driver.
3. For more detailed information, run 'prtconf -Dv'
Action Note:
1. If the OS does not see any HBA associated with the tape library; or, if no device driver is attached to the tape device, please open a service request
with the SAN DISK group to assist with the troubleshooting process.
2. If the OS sees the HBA and the device drivers associated with the HBA and devices attached to it, continue with the troubleshooting steps below.
B. Check the device driver packages and patches installed
B.1. Verify the driver packages installed. For example:
# pkginfo | grep ql
system SUNWqlc Qlogic ISP Fibre Channel Device Driver and GLDv3 NIC driver
system SUNWqlcu Qlogic Fibre Channel Adapter Utilities (usr)
# pkginfo | grep emlx
system SUNWemlxs Emulex-Sun driver kit for Fibre Channel and Converged Network Adapters (root)
system SUNWemlxu Emulex-Sun LightPulse Fibre Channel Adapter Utilties (usr)
Tech Note:
1. For a Qlogic HBA, at least the SUNWqlc package should be installed
2. For an Emulex HBA, at least the SUNWemlxs package should be installed
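The package check in the Tech Note can be scripted. The following is a minimal sketch that reads pkginfo output and reports whether each of the two minimum packages is listed. The function name check_hba_packages is hypothetical, and the sketch assumes the package name is the second column, as in the sample output above.

```shell
# Sketch (hypothetical helper): read `pkginfo` output on stdin and report
# whether the minimum driver package for each HBA vendor is listed.
check_hba_packages() {
    awk '
        $2 == "SUNWqlc"   { qlc = 1 }      # Qlogic driver package
        $2 == "SUNWemlxs" { emlxs = 1 }    # Emulex driver package
        END {
            print "SUNWqlc "   (qlc   ? "installed" : "missing")
            print "SUNWemlxs " (emlxs ? "installed" : "missing")
        }
    '
}
```

On a Solaris host, this could be used as: pkginfo | check_hba_packages
Only the package matching the installed HBA vendor needs to be present.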
B.2. Verify the driver version level. For example:
# pkginfo -l SUNWqlc
PKGINST: SUNWqlc
NAME: Qlogic ISP Fibre Channel Device Driver and GLDv3 NIC driver
CATEGORY: system
ARCH: i386
VERSION: 11.10.0,REV=2005.01.04.14.30
BASEDIR: /
VENDOR: Sun Microsystems, Inc.
DESC: Qlogic ISP Fibre Channel Device Driver and GLDv3 NIC driver
PSTAMP: on10ptchfeatx20100714003410
INSTDATE: May 09 2011 12:50
HOTLINE: Please contact your local service provider
STATUS: completely installed
FILES: 9 installed pathnames
3 shared pathnames
3 directories
4 executables
7172 blocks used (approx)
Tech Note:
1. For currently supported HBA cards and drivers, check the Fibre Channel HBA Support Matrix -
http://twiki.us.oracle.com/bin/view/Main/HBASupportMatrix
C. Verify the devices that are connected to the HBA
C.1 Display the devices that are currently seen by the HBA. For example:
# fcinfo hba-port | grep 'Port WWN'
HBA Port WWN: 210000e08b138e17
HBA Port WWN: 210100e08b338e17
For each HBA Port WWN in the list above, run:
# fcinfo remote-port -slp 210000e08b138e17
HBA 210000e08b138e17
Remote Port WWN: 500104f0009e92ab
Active FC4 Types: SCSI
SCSI Target: yes
Node WWN: 500104f0009e92aa
Link Error Statistics:
Link Failure Count: 0
Loss of Sync Count: 148
Loss of Signal Count: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 0
Invalid CRC Count: 0
LUN: 0
Vendor: HP
Product: Ultrium 3-SCSI
OS Device Name: Unknown
# fcinfo remote-port -slp 210100e08b338e17
HBA 210100e08b338e17
Remote Port WWN: 500104f0009e929e
Active FC4 Types:
SCSI Target: yes
Node WWN: 500104f0009e929d
Link Error Statistics:
Link Failure Count: 0
Loss of Sync Count: 0
Loss of Signal Count: 0
Primitive Seq Protocol Error Count: 0
Invalid Tx Word Count: 0
Invalid CRC Count: 0
LUN: 0
Vendor: STK
Product: SL500
OS Device Name: /devices/pci@3,0/pci1077,101@2,1/fp@0,0/medium-changer@w500104f0009e929e,0
Tech Note:
1. The information generated above can also be obtained by running this command:
# fcinfo hba-port | grep 'Port WWN' | awk '{print$4}'|xargs -i sh -c "echo HBA {}; fcinfo remote-port -slp {}"
Action Note:
1. If Link Error Statistics show high count values, there could be connectivity issues concerning the FC cable, the switch or the HBA.
Open a collaboration service request with the Disk or SAN support group.
2. Make sure that the connected device is online and available.
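In the first sample above, the tape drive is visible to the HBA but its OS Device Name is "Unknown". The following is a minimal sketch for finding such remote ports. The function name unmapped_remote_ports is hypothetical, and the sketch assumes the field labels and their order match the sample fcinfo output.

```shell
# Sketch (hypothetical helper): read `fcinfo remote-port -slp <WWN>` output on
# stdin and print the remote port WWN and product of every LUN whose
# OS Device Name is reported as Unknown.
unmapped_remote_ports() {
    awk '
        /Remote Port WWN:/ { wwn = $4 }
        /Product:/ {
            product = $0
            sub(/.*Product:[ \t]*/, "", product)   # keep only the product string
        }
        /OS Device Name:/ {
            if ($NF == "Unknown") print wwn, product
        }
    '
}
```

On a Solaris host, this could be used as: fcinfo remote-port -slp 210000e08b138e17 | unmapped_remote_ports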
C.2 Verify if the device drivers are added correctly and attached to the associated devices. For example:
# prtconf -D
...
pci, instance #3 (driver name: pci)
pci1077,101, instance #0 (driver name: qlc)
fp, instance #0 (driver name: fp)
tape, instance #6 (driver name: st)
tape, instance #17 (driver name: st)
tape, instance #18 (driver name: st)
tape, instance #11 (driver name: st)
tape, instance #12 (driver name: st)
tape, instance #8 (driver name: st)
pci1077,101, instance #1 (driver name: qlc)
fp, instance #1 (driver name: fp)
medium-changer, instance #4 (driver name: sgen)
mchanger, instance #1 (driver name: mchanger)
...
Tech Note:
1. The example above shows a single HBA with one port communicating with 6 tape drives and another
port communicating with an mchanger device (i.e., a FC/SCSI tape library)
2. sgen is the generic SCSI device driver installed with Solaris. For details, see this link:
http://docs.oracle.com/cd/E19082-01/819-2254/sgen-7d/index.html
3. mchanger is the device driver for StorageTek FC/SCSI libraries installed with the ACSLS software.
4. pci controls the PCI bus
5. qlc controls the Qlogic HBA
6. fp is the FC port driver
7. st is the tape device driver
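To compare the number of attached devices against the expected configuration, the prtconf -D output can be summarized per driver. The following is a minimal sketch. The function name driver_instance_counts is hypothetical, and the sketch assumes each attached device line contains "(driver name: <driver>)" as in the sample above.

```shell
# Sketch (hypothetical helper): read `prtconf -D` output on stdin and print
# how many device nodes are attached to each driver, sorted by driver name.
driver_instance_counts() {
    awk '
        /\(driver name: / {
            drv = $0
            sub(/.*\(driver name: /, "", drv)   # drop text before the driver name
            sub(/\).*/, "", drv)                # drop the closing parenthesis and trailing text
            count[drv]++
        }
        END { for (d in count) print d, count[d] }
    ' | LC_ALL=C sort
}
```

On a Solaris host, this could be used as: prtconf -D | driver_instance_counts
For the example above, the st driver count would be 6, matching the six tape drives.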
C.3. Check the system messages file to verify that the OS recognized the attached devices at boot time. The system messages
file also shows error messages about communication failures between the host server and the connected devices.
For example:
# dmesg
Jun 27 16:10:05 cl-lib04 genunix: [ID 408114 kern.info] /pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92ab,0 (st6) offline
Jun 27 16:10:05 cl-lib04 genunix: [ID 408114 kern.info] /pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92a8,0 (st8) offline
Jun 27 16:10:35 cl-lib04 fctl: [ID 517869 kern.warning] WARNING: fp(1)::OFFLINE timeout
Jun 27 16:10:54 cl-lib04 scsi: [ID 243001 kern.info] /pci@3,0/pci1077,101@2,1/fp@0,0 (fcp1):
Jun 27 16:10:54 cl-lib04 offlining lun=0 (trace=0), target=ef (trace=2800004)
Jun 27 16:10:54 cl-lib04 genunix: [ID 408114 kern.info] /pci@3,0/pci1077,101@2,1/fp@0,0/medium-changer@w500104f0009e929e,0 (sgen4) offline
Jun 27 16:13:45 cl-lib04 fctl: [ID 517869 kern.warning] WARNING: fp(0)::N_x Port with D_ID=10000, PWWN=500104f0009e92a8 reappeared in fabric
Jun 27 16:13:45 cl-lib04 scsi: [ID 799468 kern.info] st8 at fp0: name w500104f0009e92a8,0, bus address 10000
Jun 27 16:13:45 cl-lib04 genunix: [ID 936769 kern.info] st8 is /pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92a8,0
Jun 27 16:13:46 cl-lib04 scsi: [ID 365881 kern.info] /pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92a8,0 (st8):
Jun 27 16:13:46 cl-lib04 <HP Ultrium LTO 3>
Jun 27 16:13:46 cl-lib04 genunix: [ID 408114 kern.info] /pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92a8,0 (st8) online
Jun 27 16:13:47 cl-lib04 fctl: [ID 517869 kern.warning] WARNING: fp(0)::N_x Port with D_ID=10100, PWWN=500104f0009e92ab reappeared in fabric
Jun 27 16:13:47 cl-lib04 scsi: [ID 799468 kern.info] st6 at fp0: name w500104f0009e92ab,0, bus address 10100
Jun 27 16:13:47 cl-lib04 genunix: [ID 936769 kern.info] st6 is /pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92ab,0
Jun 27 16:13:48 cl-lib04 scsi: [ID 365881 kern.info] /pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92ab,0 (st6):
Jun 27 16:13:48 cl-lib04 <HP Ultrium LTO 3>
Jun 27 16:13:48 cl-lib04 genunix: [ID 408114 kern.info] /pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92ab,0 (st6) online
Jun 27 16:15:51 cl-lib04 qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(1): Loop ONLINE
Jun 27 16:15:51 cl-lib04 fctl: [ID 517869 kern.warning] WARNING: fp(1)::fp_plogi_intr: fp 1 pd ef
Jun 27 16:15:53 cl-lib04 scsi: [ID 799468 kern.info] sgen4 at fp1: name w500104f0009e929e,0, bus address ef
Jun 27 16:15:53 cl-lib04 genunix: [ID 936769 kern.info] sgen4 is /pci@3,0/pci1077,101@2,1/fp@0,0/medium-changer@w500104f0009e929e,0
Jun 27 16:15:53 cl-lib04 genunix: [ID 408114 kern.info] /pci@3,0/pci1077,101@2,1/fp@0,0/medium-changer@w500104f0009e929e,0 (sgen4) online
Jul 3 17:21:14 cl-lib04 genunix: [ID 935449 kern.info] ATA DMA off: disabled. Control with "atapi-cd-dma-enabled" property
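In a long messages file, it can be hard to see whether a device that went offline later came back. The following is a minimal sketch that prints the last reported state of each tape (st) or medium-changer (sgen) instance. The function name last_device_state is hypothetical, and the sketch assumes state messages end with "(<instance>) online" or "(<instance>) offline" as in the sample above.

```shell
# Sketch (hypothetical helper): read system messages on stdin and print the
# last reported state (online/offline) of each st or sgen instance.
last_device_state() {
    awk '
        $NF == "online" || $NF == "offline" {
            inst = $(NF-1)                  # e.g. "(st6)"
            gsub(/[()]/, "", inst)          # strip the parentheses
            if (inst ~ /^(st|sgen)[0-9]+$/) state[inst] = $NF
        }
        END { for (i in state) print i, state[i] }
    ' | LC_ALL=C sort
}
```

On a Solaris host, this could be used as: dmesg | last_device_state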
C.4. Verify if the physical device is recognized by the OS and is mapped by the kernel to an instance name. The OS uses the instance name to reference the
physical device in the system messages file, prtconf and sysdef commands. For example:
# cat /etc/path_to_inst
#
# Caution! This file contains critical kernel state
#
...
"/scsi_vhci" 0 "scsi_vhci"
"/scsi_vhci/medium-changer@g500104f0009e929d" 0 "sgen"
...
"/pci@0,0" 0 "pci"
"/pci@0,0/display@2" 0 "vgatext"
...
"/pci@0,0/pci8086,4e@4" 0 "e1000g"
"/pci@0,0/pci-ide@f,1" 0 "pci-ide"
"/pci@0,0/pci-ide@f,1/ide@0" 0 "ata"
"/pci@0,0/pci-ide@f,1/ide@0/cmdk@0,0" 0 "cmdk"
...
"/pci@3,0" 3 "pci"
"/pci@3,0/pci1077,101@2" 0 "qlc"
"/pci@3,0/pci1077,101@2/fp@0,0" 0 "fp"
"/pci@3,0/pci1077,101@2/fp@0,0/medium-changer@w500110a0008c35ba,1" 1 "sgen"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500110a0008c35ba,0" 4 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795ed9,0" 5 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92ab,0" 6 "st" <<<- referred to as "st6" in dmesg example above
"/pci@3,0/pci1077,101@2/fp@0,0/medium-changer@w500104f0009e929e,0" 2 "sgen"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w50060b00006287e0,0" 7 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f0009e92a8,0" 8 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795f3c,0" 9 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795ee2,0" 10 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795edf,0" 12 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795f51,0" 11 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795f48,0" 13 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500110a001446148,0" 15 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w50060b00059dc184,0" 16 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795f6c,0" 17 "st"
"/pci@3,0/pci1077,101@2/fp@0,0/tape@w500104f000795f2d,0" 18 "st"
"/pci@3,0/pci1077,101@2,1" 1 "qlc"
"/pci@3,0/pci1077,101@2,1/fp@0,0" 1 "fp"
"/pci@3,0/pci1077,101@2,1/fp@0,0/tape@w500104f0009e92ab,0" 1 "st"
"/pci@3,0/pci1077,101@2,1/fp@0,0/tape@w500104f000795ee5,0" 0 "st"
"/pci@3,0/pci1077,101@2,1/fp@0,0/tape@w500104f000795ed9,0" 2 "st"
"/pci@3,0/pci1077,101@2,1/fp@0,0/tape@w500104f000795ef1,0" 3 "st"
"/pci@3,0/pci1077,101@2,1/fp@0,0/tape@w500104f0009e92a8,0" 14 "st"
"/pci@3,0/pci1077,101@2,1/fp@0,0/medium-changer@w100000e002231482,0" 3 "sgen"
"/pci@3,0/pci1077,101@2,1/fp@0,0/mchanger@w100000e002231482,0" 0 "mchanger"
"/pci@3,0/pci1077,101@2,1/fp@0,0/medium-changer@w500104f0009e929e,0" 4 "sgen" <<<- referred to as "sgen4" in dmesg example above
"/pci@3,0/pci1077,101@2,1/fp@0,0/mchanger@w500104f0009e929e,0" 1 "mchanger"
"/pci@1,0" 1 "pci"
"/pci@1,0/pci14e4,9@3" 0 "bge"
"/pci@2,0" 2 "pci"
"/pci@2,0/pci1000,5110@2" 0 "mpt"
"/pci@2,0/pci1000,5110@2/st@4,0" 19 "st"
"/pci@2,0/pci1000,5110@2/sgen@4,1" 5 "sgen"
"/pci@2,0/pci8086,1179@3" 1 "e1000g"
"/pci@2,0/pci8086,1179@3,1" 2 "e1000g"
"/pci@4,0" 4 "pci"
"/agpgart" 0 "agpgart"
Tech Note:
1. Do not edit the /etc/path_to_inst file. The OS uses this file to keep instance numbers persistent across reboots.
The OS reads the file only at boot time; the add_drv and devfsadm commands update it.
2. If the FC device being configured is not in this file, the OS did not recognize that device during start-up.
Action Note:
If fcinfo, dmesg and cfgadm do not see the device but path_to_inst includes the device, verify HBA to device connection:
- Switch Zoning / SAN configuration issues
- Stuck (or hung) End Device
- Switch port issues
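To check whether a given device WWN has an entry in path_to_inst, and under which instance names, a lookup such as the following can help. This is a minimal sketch: the function name instances_for_wwn is hypothetical, and it assumes FC device paths embed the port WWN as "@w<WWN>,<LUN>", as in the sample file above.

```shell
# Sketch (hypothetical helper): read a path_to_inst file on stdin and print
# the instance name (driver + instance number) and physical path of every
# entry whose path contains the given port WWN.
# usage: instances_for_wwn WWN < /etc/path_to_inst
instances_for_wwn() {
    awk -v wwn="$1" '
        index($1, "@w" wwn ",") {
            drv = $3;  gsub(/"/, "", drv)     # driver name without quotes
            path = $1; gsub(/"/, "", path)    # physical path without quotes
            print drv $2, path                # e.g. "st6 /pci@3,0/..."
        }
    '
}
```

In the sample file, WWN 500104f0009e92ab appears under both HBA ports, so the lookup would report two instances (st6 and st1). Only st6 appears in the dmesg example.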
C.5. Verify if logical to physical device links for the FC tape devices are created in /dev directory. For tape devices, the link
files are created as /dev/rmt/* filenames. For FC/SCSI tape library devices, the link files are created as /dev/mchanger* filenames.
For example:
# ls -l /dev/rmt/*
# ls -l /dev/mchanger*
Tech Note:
If no device links are created, verify the /etc/devlink.tab file. This file specifies rules that devfsadm and add_drv use
to create links in the /dev directory. For example:
# cat /etc/devlink.tab
type=ddi_pseudo;minor1=cpqary3 cpqary3\M2
#START ADD SUNWxsvc Mon May 9 12:41:38 CLT 2011
type=ddi_pseudo;name=xsvc \M0
#END ADD SUNWxsvc Mon May 9 12:41:38 CLT 2011
#START ADD SYMhisl Mon May 9 12:47:12 CLT 2011
type=symsl_ctl;minor=symhislctl symhislctl\N0
#END ADD SYMhisl Mon May 9 12:47:12 CLT 2011
type=pcmcia:event;name=pem pem
type=ddi_pseudo;name=vboxdrv \D
type=mc_driver;name=mchanger;minor=character mchanger\N0
type=ddi_byte:tape;addr=w500104f0009e92ab,0; rmt/40\M0
Tech Note:
1. In the example above, /dev/rmt/40 is permanently bound to a tape drive with WWN w500104f0009e92ab.
Also, /dev/mchanger[0-9] is bound to the tape library mchanger device(s)
2. The mchanger entry in /etc/devlink.tab is added by the ACSLS install_scsi_sol.sh script
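Persistent tape bindings such as the rmt/40 entry can be listed with a small filter. The following is a minimal sketch: the function name tape_bindings is hypothetical, and it assumes binding entries follow the "type=ddi_byte:tape;addr=<address>; <link>\M0" layout shown in the example above.

```shell
# Sketch (hypothetical helper): read a devlink.tab file on stdin and print the
# device address and /dev link name prefix of every tape binding entry.
tape_bindings() {
    awk '
        /^type=ddi_byte:tape;/ && /addr=/ {
            addr = $1
            sub(/.*addr=/, "", addr)        # keep the text after "addr="
            sub(/;.*/, "", addr)            # stop at the next semicolon
            link = $2
            sub(/\\M[0-9]+$/, "", link)     # strip the minor-name suffix such as \M0
            print addr, "/dev/" link
        }
    '
}
```

To use it, redirect the devlink.tab file into the function. For the example above, it would list address w500104f0009e92ab,0 against /dev/rmt/40.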
D. The following notes summarize hardware troubleshooting and diagnostics for device, switch and HBA (direct connect)
issues. (For details on a specific device, search the tech notes for that device.)
D.1 See if the device has an active FC configuration.
- SL3000 & SL500
For tape drives, verify from this SL Console info - (System Details->Select the drive -> Open Property Tab)
The PORT A Link Status should show "Initialized".
For tape libraries, verify from this SL Console info - (System Details->Select the library-> Open Tab Properties-> Open Tab SCSI)
The State should be "Health Available; Access State Active Optimized" and there should be a Port Speed of 1, 2 or 4 Gbit.
D.2 - SL24 & SL48
For tape drives, log in to the Web interface and verify the FC info - (Identity -> Drive)
Verify that a transfer speed is set (for example: Port Type Automatic, Speed 2 Gb/s). If a speed is set, the FC link has been established and is working.
E. At this point, the problem may be attributed to the device hardware or the FC connection to the switch port or host port. To troubleshoot, try these
steps:
- Disconnect the FC connector from the end device (i.e., tape drive or library). Wait about a minute, then reconnect it. This should reinitialize the
communication link to the end device
- To rule out a problem with the communication port, connect the end device to another port on the switch; or, if directly connected to the host, connect to
another HBA port.
If the problem persists, rule out a problem with the end device. Try these steps:
- Reboot the switch port to reinitialize the link and clear stuck switch port issues
- Reboot the end device
Tech Note:
If the HBA is automatically setting the transport speed to a lower value (e.g., a 4Gb device is working at 2 Gb), try setting the port speed to a fixed higher
value (e.g., to 4Gb).
F. Considerations for zoning issues
If the end device, communication ports and communication links are all working fine, the customer should verify if the devices are zoned correctly to the
hosts in the SAN configuration.
References
<NOTE:1371222.1> - Tape - How To Diagnose a Fibre Channel Tape Drive Configuration Issue on Solaris 10