Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution
1370677.1 : FC HBA (Invalid Tx Word Count Errors Are Increasing)
How to troubleshoot suspected FC HBA problems where a link is bouncing, timeouts are seen, and the Invalid Tx Word counter is increasing.
Created from <SR 3-4817403391>

Applies to:
Qlogic FC HBA - Version All Versions to All Versions [Release All Releases]
Emulex FC HBA - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise M5000 Server - Version All Versions and later
Sun Storage FC HBA - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms
You may see one or more of the following symptoms:

scsi: [ID 243001 kern.warning] WARNING: /scsi_vhci (scsi_vhci0):
    /scsi_vhci/ssd@g6006016077a0290080b06a0303eae011 (ssd45): Command Timeout on path /pci@3,700000/SUNW,emlxs@0/fp@0,0 (fp2)
    /scsi_vhci/ssd@g600a0b80005b8bcc00000c434a8d400d (ssd75): Command Timeout on path fp1/ssd@w202300a0b85b8bda,7
emlxs: [ID 349649 kern.info] [ 5.031F]emlxs0: NOTICE: 710: Link down.
scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g6006016074a02900b69d3895b1b3e011 (ssd31): SCSI transport failed: reason 'tran_err': retrying command
scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g6006016077a0290080b06a0303eae011 (ssd45): SCSI transport failed: reason 'tran_err': retrying command
scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g6006016074a02900965ecbcb92b8e011 (ssd40): SCSI transport failed: reason 'tran_err': retrying command
emlxs: [ID 349649 kern.info] [ 5.0549]emlxs0: NOTICE: 720: Link up. (4Gb, fabric, initiator)
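To see how often these symptoms occur, the matching lines can be counted in the system log. The following is a minimal sketch, assuming the default Solaris log file /var/adm/messages; the function name count_fc_symptoms is hypothetical, and the patterns are literal substrings copied from the sample messages above.

```sh
#!/bin/sh
# count_fc_symptoms (hypothetical helper): count the log lines that match each
# symptom shown above. The patterns are literal substrings of the sample
# messages and contain no regular-expression metacharacters.
# Usage: count_fc_symptoms [LOGFILE]
count_fc_symptoms() {
    log=${1:-/var/adm/messages}   # Solaris system log by default
    for pattern in 'Command Timeout on path' \
                   'NOTICE: 710: Link down' \
                   'NOTICE: 720: Link up' \
                   "reason 'tran_err'"; do
        # grep -c prints the number of matching lines (0 when none match).
        n=$(grep -c "$pattern" "$log")
        printf '%s: %s\n' "$pattern" "$n"
    done
}
```

Frequent Link down / Link up pairs together with tran_err retries are consistent with the bouncing link described in this document.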
Successive samples of the link error counters show the Invalid Tx Word Count increasing:
# fcinfo hba-port -l
HBA Port WWN: 10000000c991afba
    OS Device Name: /dev/cfg/c1
    Manufacturer: Emulex
    Model: LPe11002-S
    Firmware Version: 2.82a4 (Z3F2.82A4)
    FCode/BIOS Version: Boot:5.02a1 Fcode:1.50a9
    Serial Number: 0999BT0-094200059G
    Driver Name: emlxs
    Driver Version: 2.60h (2010.10.22.16.55)
    Type: N-port
    State: online
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: 4Gb
    Node WWN: 20000000c991afba
    Link Error Statistics:
        Link Failure Count: 0
        Loss of Sync Count: 177
        Loss of Signal Count: 0
        Primitive Seq Protocol Error Count: 0
        Invalid Tx Word Count: 1337580037
        Invalid CRC Count: 0
# fcinfo hba-port -l
HBA Port WWN: 10000000c991afba
    OS Device Name: /dev/cfg/c1
    Manufacturer: Emulex
    Model: LPe11002-S
    Firmware Version: 2.82a4 (Z3F2.82A4)
    FCode/BIOS Version: Boot:5.02a1 Fcode:1.50a9
    Serial Number: 0999BT0-094200059G
    Driver Name: emlxs
    Driver Version: 2.60h (2010.10.22.16.55)
    Type: N-port
    State: online
    Supported Speeds: 1Gb 2Gb 4Gb
    Current Speed: 4Gb
    Node WWN: 20000000c991afba
    Link Error Statistics:
        Link Failure Count: 0
        Loss of Sync Count: 177
        Loss of Signal Count: 0
        Primitive Seq Protocol Error Count: 0
        Invalid Tx Word Count: 1337583533
        Invalid CRC Count: 0
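With counters this large, comparing two samples by eye is error prone. The sketch below assumes each sample was saved to a file (for example `fcinfo hba-port -l > before.txt`, then later `> after.txt`); tx_word_delta is a hypothetical helper that matches ports by their HBA Port WWN. On Solaris 10, nawk may be needed in place of the legacy /usr/bin/awk.

```sh
#!/bin/sh
# tx_word_delta (hypothetical helper): report, per HBA port WWN, how much the
# "Invalid Tx Word Count" grew between two saved "fcinfo hba-port -l" outputs.
# Usage: tx_word_delta BEFORE_FILE AFTER_FILE
tx_word_delta() {
    awk '
        # Remember which port the following statistics belong to.
        /HBA Port WWN:/ { wwn = $NF }
        /Invalid Tx Word Count:/ {
            if (FILENAME == ARGV[1]) {
                before[wwn] = $NF                  # value from the first sample
            } else if (wwn in before) {
                printf "%s: %s -> %s (delta %d)\n", wwn, before[wwn], $NF, $NF - before[wwn]
            }
        }' "$1" "$2"
}
```

Applied to the two samples above, it reports a growth of 3496 invalid words for port 10000000c991afba (1337583533 - 1337580037).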
# luxadm -e rdls /dev/cfg/c1
Link Error Status information for loop:
al_pa   lnk fail  sync loss  signal loss  sequence err  invalid word  CRC
20000   4         9          5            0             1020          0
20800   5         95         5            0             1275          0
31700   0         0          0            0             1             0
31c00   0         0          0            0             2             0
21200   0         177        0            0             1337583533    0
# luxadm -e rdls /dev/cfg/c1
Link Error Status information for loop:
al_pa   lnk fail  sync loss  signal loss  sequence err  invalid word  CRC
20000   4         9          5            0             1020          0
20800   5         95         5            0             1275          0
31700   0         0          0            0             1             0
31c00   0         0          0            0             2             0
21200   0         177        0            0             1337625025    0
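The same comparison works for `luxadm -e rdls` output, where each data row is one al_pa followed by the six counters in the header order shown above. This is a minimal sketch, assuming both outputs were saved to files; rdls_delta is hypothetical, it treats any line made of a hexadecimal al_pa plus six numbers as a data row, and on Solaris 10 nawk may be needed instead of /usr/bin/awk.

```sh
#!/bin/sh
# rdls_delta (hypothetical helper): compare two saved "luxadm -e rdls" outputs
# and print the per-column counter growth for every al_pa whose counters changed.
# Output columns: al_pa lnk_fail sync_loss signal_loss sequence_err invalid_word CRC
# Usage: rdls_delta BEFORE_FILE AFTER_FILE
rdls_delta() {
    awk '
        # Data rows: hexadecimal al_pa followed by six counters (7 fields).
        NF == 7 && $1 ~ /^[0-9a-fA-F]+$/ {
            if (FILENAME == ARGV[1]) {
                for (i = 2; i <= 7; i++) base[$1, i] = $i
                seen[$1] = 1
            } else if ($1 in seen) {
                row = $1; changed = 0
                for (i = 2; i <= 7; i++) {
                    d = $i - base[$1, i]
                    if (d != 0) changed = 1
                    row = row " " d
                }
                if (changed) print row
            }
        }' "$1" "$2"
}
```

For the two rdls samples above, only al_pa 21200 changes, by 41492 invalid words. Its sync loss (177) and invalid word values also match the Loss of Sync Count and Invalid Tx Word Count that fcinfo reports for the HBA port.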
Cause
The increasing counters record decoding violations detected in the signal that the HBA receives. ("Tx" in Invalid Tx Word refers to the Fibre Channel transmission word, not to the HBA's transmitter.) When these counters are seen increasing on the host side, the most likely cause of the errors is therefore a component outside the server that sends traffic INTO the HBA. Examine the switch-side SFP and/or the cable first, and check that the connections are properly secured.

Please note: If the HBA is QLogic and the only indication of a fault is the Invalid Word count reported by luxadm and fcinfo, also check Incorrect Invalid Tx Word Counts may be reported against QLogic HBAs (Doc ID 1594320.1).
Solution
1. Log on to the FC switch and check the FC switch port error counters to see whether they indicate an issue on the switch side.
Note: There have been cases where the switch port error counters do not increase and the errors are only seen increasing from the server side with luxadm and fcinfo. In these cases, check the Tx and Rx values reported by sfpshow (on Brocade switches); a low Rx value may indicate that the wrong type of FC cable is in use. See: Brocade FC Switch Port RX Power Shows Low Value - FC Cable Types - SFP Types (Doc ID 2306903.1).
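Optical power readings such as the Tx and Rx values mentioned above are usually given in microwatts (µW), in dBm, or both, where P(dBm) = 10 * log10(P / 1 mW). When a reading and the limit it is compared against use different units, a conversion such as the following sketch can help. The function names are hypothetical, and the acceptable Rx range depends on the SFP in use.

```sh
#!/bin/sh
# Hypothetical helpers for converting optical power between microwatts and dBm.
#   dBm = 10 * log10(P_mW),  with P_mW = P_uW / 1000
# awk only provides natural log/exp, so log10(x) = log(x) / log(10).
uw_to_dbm() {
    awk -v uw="$1" 'BEGIN {
        if (uw + 0 <= 0) exit 1          # dBm is undefined for zero/negative power
        printf "%.2f\n", 10 * log((uw + 0) / 1000) / log(10)
    }'
}

dbm_to_uw() {
    # P_uW = 1000 * 10^(dBm / 10), computed as 1000 * exp((dBm / 10) * ln 10)
    awk -v dbm="$1" 'BEGIN { printf "%.1f\n", 1000 * exp((dbm / 10) * log(10)) }'
}
```

For example, `uw_to_dbm 500` prints -3.01, so halving the received power costs about 3 dB.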
Note: To help isolate the issue faster, if possible, use a known good FC cable to connect the FC HBA port directly to the FC switch port, bypassing all patch panels, splices, etc. between them, and then monitor for a few days. If the issue no longer occurs, that points to the cabling, patch panels, etc.
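For multi-day monitoring it helps to record the counter at regular intervals, so the growth rate before and after the cable change can be compared. This is a minimal sketch, assuming a POSIX shell (ksh or /usr/xpg4/bin/sh on Solaris 10, since the legacy Bourne /bin/sh does not support $(( )) arithmetic) and a nawk-compatible awk. sample_tx_words is hypothetical, and so is the FCINFO_CMD override, which exists only so the loop can be exercised without FC hardware.

```sh
#!/bin/sh
# sample_tx_words (hypothetical helper): append one line per HBA port,
#   "<date> <time> <HBA Port WWN> <Invalid Tx Word Count>",
# to LOGFILE, COUNT times, sleeping INTERVAL seconds between samples.
# FCINFO_CMD (hypothetical) overrides the command; default: fcinfo hba-port -l
# Usage: sample_tx_words LOGFILE COUNT INTERVAL
sample_tx_words() {
    logfile=$1 count=$2 interval=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        ts=$(date '+%Y-%m-%d %H:%M:%S')
        # Left unquoted on purpose so the default expands to a command plus arguments.
        ${FCINFO_CMD:-fcinfo hba-port -l} 2>/dev/null |
            awk -v ts="$ts" '
                /HBA Port WWN:/          { wwn = $NF }
                /Invalid Tx Word Count:/ { print ts, wwn, $NF }' >> "$logfile"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_tx_words /var/tmp/txwords.log 432 600` takes a sample every 10 minutes for 3 days (432 x 600 s = 259200 s).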
2. If the server has QLogic FC HBA cards, also check: Incorrect Invalid Tx Word Counts may be reported against QLogic HBAs (Doc ID 1594320.1).
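If the counters keep increasing, the steps below ask for error count samples to attach to the SR. The host-side part (fcinfo output plus `luxadm -e rdls` for every HBA port, taken twice so the growth is visible) could be gathered with a sketch like the following. The function names, the output directory and the default 300-second gap are assumptions; switch-side counters, SFP power levels and the Explorer output still have to be collected separately.

```sh
#!/bin/sh
# hba_devices (hypothetical): print each "OS Device Name" (e.g. /dev/cfg/c1)
# found in "fcinfo hba-port -l" output read from stdin.
hba_devices() {
    awk '/OS Device Name:/ { print $NF }'
}

# collect_fc_samples (hypothetical): take two rounds of host-side link error
# samples, WAIT seconds apart, store them under OUTDIR and print OUTDIR.
# Usage: collect_fc_samples [OUTDIR] [WAIT]
collect_fc_samples() {
    outdir=${1:-/var/tmp/fc_samples.$(date '+%Y%m%d%H%M%S')}
    wait_s=${2:-300}
    mkdir -p "$outdir" || return 1
    for pass in 1 2; do
        fcinfo hba-port -l > "$outdir/fcinfo.$pass.txt" 2>&1
        for dev in $(hba_devices < "$outdir/fcinfo.$pass.txt"); do
            # One rdls file per controller and pass, e.g. rdls.c1.1.txt
            luxadm -e rdls "$dev" > "$outdir/rdls.$(basename "$dev").$pass.txt" 2>&1
        done
        if [ "$pass" -eq 1 ]; then
            sleep "$wait_s"
        fi
    done
    echo "$outdir"
}
```

The two fcinfo files can then be compared per port, and each pair of rdls files per al_pa, before uploading the directory to the SR.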
If the issue persists:
1. Verify that the FC HBA is Oracle branded; see doc:
2. Open an Oracle Service Request (SR) and provide the error count samples from each step and a new Explorer output:
3. Collect and upload the FC switch port error counters and the port/SFP light power levels.
4. Collect and upload the FC HBA port light power levels; see How to check Fibre Channel (FC) HBA port Light Tx and Rx Power Levels (Doc ID 2345039.1).
5. Describe the connection between the FC HBA port and the FC switch port: is there just a single cable, or are patch panels, splices, etc. involved?
6. Verify and provide the server address/location and the site contact person's information.

Attachments
This solution has no attachment.