Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1667248.1
Update Date:2018-03-07
Keywords:

Solution Type  Problem Resolution Sure

Solution  1667248.1 :   FC HBA qlc Driver - isr, Internal Parity/Pause Error  


Related Items
  • Solaris Operating System
  • Sun SPARC Enterprise T5240 Server
  • Qlogic FC HBA
Related Categories
  • PLA-Support>Sun Systems>DISK>HBA>SN-DK: FC HBA




In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-8779044281>

Applies to:

Sun SPARC Enterprise T5240 Server - Version All Versions and later
Qlogic FC HBA - Version All Versions and later
Solaris Operating System - Version 10 3/05 to 10 3/05 [Release 10.0]
Information in this document applies to any platform.

Symptoms

The system is a T5240 server running Solaris 10 with two dual-port Oracle 4Gb FC HBAs. All four ports are directly connected to an SE6180 storage array:

C#  INST#  PORT WWN          MODEL            FCODE  STATUS     DEVICE PATH
--  -----  ----------------  ---------------  -----  ---------  -----------
c2  qlc0   2100001b328e49bb  SG-XPCIE2FC-QF4  2.01   CONNECTED  /pci@500/pci@0/pci@c/SUNW,qlc@0
c3  qlc1   2101001b32ae49bb  SG-XPCIE2FC-QF4  2.01   CONNECTED  /pci@500/pci@0/pci@c/SUNW,qlc@0,1
c4  qlc2   2100001b328e6dc4  SG-XPCIE2FC-QF4  2.01   CONNECTED  /pci@500/pci@0/pci@d/SUNW,qlc@0
c5  qlc3   2101001b32ae6dc4  SG-XPCIE2FC-QF4  2.01   CONNECTED  /pci@500/pci@0/pci@d/SUNW,qlc@0,1

c2 = qlc0 (fp4) -> /devices/pci@500/pci@0/pci@c/SUNW,qlc@0/fp@0,0:devctl
================================================================================
Pos  AL_PA  ID  Hard_Addr  Port WWN          Node WWN          Type
0    e1     4   e1         20150080e51853b8  20040080e51853b8  0x0 (Disk device)
1    1      7d  0          2100001b328e49bb  2000001b328e49bb  0x1f (Unknown Type,Host Bus Adapter)


c3 = qlc1 (fp0) -> /devices/pci@500/pci@0/pci@c/SUNW,qlc@0,1/fp@0,0:devctl
================================================================================
Pos  AL_PA  ID  Hard_Addr  Port WWN          Node WWN          Type
0    e0     5   e0         20250080e51853b8  20040080e51853b8  0x0 (Disk device)
1    1      7d  0          2101001b32ae49bb  2001001b32ae49bb  0x1f (Unknown Type,Host Bus Adapter)


c4 = qlc2 (fp2) -> /devices/pci@500/pci@0/pci@d/SUNW,qlc@0/fp@0,0:devctl
================================================================================
Pos  AL_PA  ID  Hard_Addr  Port WWN          Node WWN          Type
0    ef     0   ef         20140080e51853b8  20040080e51853b8  0x0 (Disk device)
1    1      7d  0          2100001b328e6dc4  2000001b328e6dc4  0x1f (Unknown Type,Host Bus Adapter)


c5 = qlc3 (fp1) -> /devices/pci@500/pci@0/pci@d/SUNW,qlc@0,1/fp@0,0:devctl
================================================================================
Pos  AL_PA  ID  Hard_Addr  Port WWN          Node WWN          Type
0    e8     1   e8         20240080e51853b8  20040080e51853b8  0x0 (Disk device)
1    1      7d  0          2101001b32ae6dc4  2001001b32ae6dc4  0x1f (Unknown Type,Host Bus Adapter)

 

No significant FC link errors are reported by luxadm -e rdls:

luxadm_-e_rdls-devices-pci@500-pci@0-pci@c-SUNW,qlc@0-fp@0,0:devctl.out
::::::::::::::
Link Error Status information for loop:
al_pa  lnk fail  sync loss  signal loss  sequence err  invalid word  CRC
e1     5         17         10           0             0             0
1      0         0          0            0             67108864      0    <<--
NOTE: These LESB counts are not cleared by a reset, only power cycles.
These counts must be compared to previously read counts.


The invalid word count of 67108864 (decimal) is 100000000000000000000000000 in binary, a single set bit (2^26), not a genuine error count.
This is explained by the document: Incorrect Invalid Tx Word Counts may be reported against QLogic HBAs (Doc ID 1594320.1)
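
The single-bit nature of this counter value can be confirmed with a short check. The following is a minimal sketch in Python; the helper name describe_count is hypothetical and is not part of luxadm or the qlc driver.

```python
def describe_count(value):
    """Return (binary string, is_single_bit, bit_position) for a counter value."""
    binary = bin(value)[2:]
    # A positive integer has exactly one bit set when value & (value - 1) == 0.
    single_bit = value > 0 and (value & (value - 1)) == 0
    position = value.bit_length() - 1 if single_bit else None
    return binary, single_bit, position


binary, single_bit, position = describe_count(67108864)
print(binary, single_bit, position)
# prints: 100000000000000000000000000 True 26
```

A counter that reads as an exact power of two, with all lower bits clear, is more consistent with a reporting artifact than with an accumulated error count.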

 


luxadm_-e_rdls-devices-pci@500-pci@0-pci@c-SUNW,qlc@0,1-fp@0,0:devctl.out
::::::::::::::
Link Error Status information for loop:
al_pa  lnk fail  sync loss  signal loss  sequence err  invalid word  CRC
e0     4         13         9            0             0             0
1      0         0          0            0             0             0
NOTE: These LESB counts are not cleared by a reset, only power cycles.
These counts must be compared to previously read counts.

luxadm_-e_rdls-devices-pci@500-pci@0-pci@d-SUNW,qlc@0-fp@0,0:devctl.out
::::::::::::::
Link Error Status information for loop:
al_pa  lnk fail  sync loss  signal loss  sequence err  invalid word  CRC
ef     4         14         9            0             0             0
1      0         0          0            0             0             0
NOTE: These LESB counts are not cleared by a reset, only power cycles.
These counts must be compared to previously read counts.

luxadm_-e_rdls-devices-pci@500-pci@0-pci@d-SUNW,qlc@0,1-fp@0,0:devctl.out
::::::::::::::
Link Error Status information for loop:
al_pa  lnk fail  sync loss  signal loss  sequence err  invalid word  CRC
e8     4         14         10           0             0             0
1      0         0          0            0             0             0
NOTE: These LESB counts are not cleared by a reset, only power cycles.
These counts must be compared to previously read counts.
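
Because the LESB counts are cumulative, only the difference between two reads is meaningful. The sketch below, in Python, parses saved luxadm -e rdls output and reports which counters changed. The function names are hypothetical, the EARLIER sample is the qlc@0,1 output shown above, and the LATER sample is invented purely for illustration.

```python
FIELDS = ("lnk_fail", "sync_loss", "signal_loss",
          "sequence_err", "invalid_word", "crc")


def parse_rdls(text):
    """Map each AL_PA to its six LESB counters from saved rdls output."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 7:
            continue
        try:
            int(parts[0], 16)                      # AL_PA is hexadecimal
            values = [int(p) for p in parts[1:7]]  # six decimal counters
        except ValueError:
            continue                               # header or NOTE line
        counters[parts[0]] = dict(zip(FIELDS, values))
    return counters


def lesb_delta(previous, current):
    """Return only the counters whose value changed between two reads."""
    changes = {}
    for al_pa, now in current.items():
        before = previous.get(al_pa, {})
        diff = {k: v - before.get(k, 0)
                for k, v in now.items() if v != before.get(k, 0)}
        if diff:
            changes[al_pa] = diff
    return changes


EARLIER = """al_pa lnk fail sync loss signal loss sequence err invalid word CRC
e0 4 13 9 0 0 0
1 0 0 0 0 0 0
"""

# Hypothetical later read, invented for illustration only.
LATER = """al_pa lnk fail sync loss signal loss sequence err invalid word CRC
e0 4 14 9 0 0 0
1 0 0 0 0 0 0
"""

print(lesb_delta(parse_rdls(EARLIER), parse_rdls(LATER)))
```

An empty result means no counter moved between the two reads, which is the expected state on a healthy link.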



Two LUNs are presented from the 6180 array:

  10. c6t60080E50001853B8000007325088A09Bd0
  /scsi_vhci/ssd@g60080e50001853b8000007325088a09b
  11. c6t60080E50001871EC000007ED5088A3BBd0
  /scsi_vhci/ssd@g60080e50001871ec000007ed5088a3bb



The problem was observed in the messages below. It was a single, isolated error, with no other SCSI or FC errors
and no other apparent cause:

Mar 28 22:56:35 server1 qlc: [ID 262021 kern.warning] WARNING: qlc(0): isr, Internal Parity/Pause Error - hccr=0h, stat=e38113h, count=0
Mar 28 22:56:41 server1 qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(0): Loop OFFLINE
Mar 28 22:56:43 server1 qlc: [ID 630585 kern.info] NOTICE: Qlogic qlc(0): Loop ONLINE
Mar 28 22:56:43 server1 scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g60080e50001871ec000007ed5088a3bb (ssd14):
Mar 28 22:56:43 server1 Error for Command: read(10) Error Level: Retryable
Mar 28 22:56:43 server1 scsi: [ID 107833 kern.notice] Requested Block: 235590744 Error Block: 235590744
Mar 28 22:56:43 server1 scsi: [ID 107833 kern.notice] Vendor: SUN Serial Number:
Mar 28 22:56:43 server1 scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Mar 28 22:56:43 server1 scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0

After that, the FC HBA continued to work without issue.

Cause

This behavior has been seen before with old qlc driver versions, and it can be caused by a driver software problem. For an example, see:
Solaris Systems Running VERITAS (VxVM) 4.1/5.0 With Certain HBAs Installed Are Unable to Complete I/O Operations and May Become Unresponsive With Certain Targets (Doc ID 1000428.1)

In this case VxVM is not installed, but that document is an example of how this type of error is not necessarily a hardware issue.

In this particular case the error was isolated (it occurred only at that point in time).
There was no other indication of a problem, no other errors in the messages files, and the FC HBA is working fine,
so the most probable cause is the old qlc driver version the customer has installed: qlc patch 143957-03, from August 2010.
 

 

Solution

The recommendation is to apply the latest qlc driver patch, 149175 (SPARC) or 149176 (x86), which requires a reboot.

Monitor the server over the following days for further occurrences of the qlc "isr, Internal Parity/Pause Error" message.
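
The monitoring step can be scripted. The following Python sketch scans syslog lines for the warning and extracts its fields. The function name and the regular expression are assumptions modelled on the single message shown in the Symptoms section, so other driver versions may word the message differently.

```python
import re

# Pattern modelled on the one warning line captured in this document.
PATTERN = re.compile(
    r"qlc\((?P<inst>\d+)\): isr, Internal Parity/Pause Error - "
    r"hccr=(?P<hccr>[0-9a-f]+)h, stat=(?P<stat>[0-9a-f]+)h, "
    r"count=(?P<count>\d+)"
)


def find_parity_errors(lines):
    """Return one dict per matching qlc parity/pause warning line."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append({
                "timestamp": line[:15],          # e.g. "Mar 28 22:56:35"
                "instance": int(match.group("inst")),
                "hccr": match.group("hccr"),
                "stat": match.group("stat"),
                "count": int(match.group("count")),
            })
    return hits


SAMPLE = [
    "Mar 28 22:56:35 server1 qlc: [ID 262021 kern.warning] WARNING: qlc(0): "
    "isr, Internal Parity/Pause Error - hccr=0h, stat=e38113h, count=0",
    "Mar 28 22:56:41 server1 qlc: [ID 630585 kern.info] NOTICE: "
    "Qlogic qlc(0): Loop OFFLINE",
]

for hit in find_parity_errors(SAMPLE):
    print(hit)
```

To check a live system, the same function can be fed the lines of the messages file, for example find_parity_errors(open("/var/adm/messages")), assuming the default Solaris log location.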


It might also be helpful to enable qlc driver extended logging. Reference:
How to Enable the Logging of Extended (debug) Troubleshooting Messages for Oracle Qlogic Fibre Channel (FC) HBA card(s) on a Solaris Server (Doc ID 1587828.1)


If the error persists, replace the FC HBA card in the server. Because this is not necessarily a hardware problem, also install the latest qlc driver patch, 149175 (SPARC) or 149176 (x86), which requires a reboot.
 

References

<BUG:15351223> - SUNBT6472115-SOLARIS_11 MANY THOUSANDS OF 'HBA16: ISR, INTERNAL PARITY/PAUSE ERR
<NOTE:1000428.1> - Solaris Systems Running VERITAS (VxVM) 4.1/5.0 With Certain HBAs Installed Are Unable to Complete I/O Operations and May Become Unresponsive With Certain Targets
<BUG:15635934> - SUNBT6943063 SYSTEM IS SEEING "QLC: ISR, INTERNAL PARITY/PAUSE ERROR" EVEN AF

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.