![]() | Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition | ||
|
|
![]() |
||||||||||||||||||
Solution Type Problem Resolution Sure Solution 1645310.1 : Red Hat Linux - FC HBA Qla2xxx Abort Command Issued, qla2xxx_eh_bus_reset: reset succeeded To Sun ZFS Storage 7320 Through Brocade 300
In this Document
Created from <SR 3-8386161861> Applies to:Qlogic FC HBA - Version All Versions to All Versions [Release All Releases]Emulex FC HBA - Version All Versions and later Sun Fire X4270 M2 Server - Version All Versions and later Information in this document applies to any platform. SymptomsRed Hat Enterprise Linux Server release 5.9 , on Oracle SUN FIRE X4270 M2 SERVER with Oracle FC HBAs, accessing Oracle Storage 7320 servers object of the problem are server01 and server02 /var/log/messages:
Jan 22 08:05:11 server02 auditd[17338]: Audit daemon rotating log files Jan 22 09:36:03 server02 kernel: qla2xxx 0000:21:00.0: scsi(7:1:12): Abort command issued -- 1 1370fe4b 2002. Jan 22 09:36:14 server02 kernel: qla2xxx 0000:21:00.0: scsi(7:1:12): Abort command issued -- 1 1370fe4b 2002. Jan 22 09:36:14 server02 kernel: qla2xxx 0000:21:00.0: scsi(7:1:12): DEVICE RESET ISSUED. Jan 22 09:36:14 server02 kernel: qla2xxx 0000:21:00.0: scsi(7:1:12): DEVICE RESET SUCCEEDED. Jan 22 09:36:25 server02 kernel: qla2xxx 0000:21:00.0: scsi(7:1:12): Abort command issued -- 1 1370fe4b 2002. Jan 22 09:36:25 server02 kernel: qla2xxx 0000:21:00.0: scsi(7:1:12): LOOP RESET ISSUED. Jan 22 09:36:25 server02 kernel: qla2xxx 0000:21:00.0: qla2xxx_eh_bus_reset: reset succeeded Jan 22 09:36:46 server02 kernel: qla2xxx 0000:21:00.0: scsi(7:1:12): Abort command issued -- 1 1370fe4b 2002. Jan 22 09:36:46 server02 kernel: qla2xxx 0000:21:00.0: scsi(7:1:12): ADAPTER RESET ISSUED. Jan 22 09:36:46 server02 kernel: qla2xxx 0000:21:00.0: Performing ISP error recovery - ha= ffff81047f78c4f8. Jan 22 09:36:47 server02 kernel: qla2xxx 0000:21:00.0: LIP reset occured (f700). Jan 22 09:36:47 server02 kernel: qla2xxx 0000:21:00.0: LOOP UP detected (4 Gbps). Jan 22 09:36:47 server02 kernel: qla2xxx 0000:21:00.0: qla2xxx_eh_host_reset: reset succeeded Jan 22 09:37:08 server02 kernel: qla2xxx 0000:21:00.0: scsi(7:1:12): Abort command issued -- 1 1370fe4b 2002. Jan 22 09:37:08 server02 kernel: sd 7:0:1:12: scsi: Device offlined - not ready after error recovery Jan 22 09:37:08 server02 kernel: sd 7:0:1:12: rejecting I/O to offline device Jan 22 09:37:08 server02 kernel: Buffer I/O error on device sdd, logical block 0 Jan 22 09:37:08 server02 kernel: sd 7:0:1:12: rejecting I/O to offline device Jan 22 09:37:08 server02 kernel: Buffer I/O error on device sdd, logical block 0 Jan 22 09:38:08 server02 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:12): Abort command issued -- 1 3b8a77d 2002. Jan 22 09:38:18 server02 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:12): Abort command issued -- 1 3b8a77d 2002. Jan 22 09:38:18 server02 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:12): DEVICE RESET ISSUED. Jan 22 09:38:18 server02 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:12): DEVICE RESET SUCCEEDED. Jan 22 09:38:28 server02 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:12): Abort command issued -- 1 3b8a77d 2002. Jan 22 09:38:28 server02 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:12): LOOP RESET ISSUED. Jan 22 09:38:28 server02 kernel: qla2xxx 0000:2f:00.0: qla2xxx_eh_bus_reset: reset succeeded Jan 22 09:38:48 server02 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:12): Abort command issued -- 1 3b8a77d 2002. Jan 22 09:38:48 server02 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:12): ADAPTER RESET ISSUED. Jan 22 09:38:48 server02 kernel: qla2xxx 0000:2f:00.0: Performing ISP error recovery - ha= ffff81047d0584f8. Jan 22 09:38:49 server02 kernel: qla2xxx 0000:2f:00.0: LIP reset occured (f700). Jan 22 09:38:49 server02 kernel: qla2xxx 0000:2f:00.0: LIP occured (f700). Jan 22 09:38:49 server02 kernel: qla2xxx 0000:2f:00.0: LIP reset occured (f7f7). Jan 22 09:38:49 server02 kernel: qla2xxx 0000:2f:00.0: LOOP UP detected (4 Gbps). Jan 22 09:38:50 server02 kernel: qla2xxx 0000:2f:00.0: qla2xxx_eh_host_reset: reset succeeded Jan 22 09:39:11 server02 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:12): Abort command issued -- 1 3b8a77d 2002. Jan 22 09:39:11 server02 kernel: sd 9:0:1:12: scsi: Device offlined - not ready after error recovery Jan 22 09:39:11 server02 kernel: sd 9:0:1:12: rejecting I/O to offline device Jan 22 09:39:11 server02 kernel: Buffer I/O error on device sdf, logical block 0 Jan 22 09:39:11 server02 kernel: sd 9:0:1:12: rejecting I/O to offline device Jan 22 09:39:11 server02 kernel: Buffer I/O error on device sdf, logical block 0
qla2xxx 0000:2f:00.0:
QLogic Fibre Channel HBA Driver: 8.03.07.09.05.08-k QLogic QLE2562 - Sun StorageTek 8Gb FC PCIe HBA, dual port ISP2532: PCIe (5.0Gb/s x4) @ 0000:2f:00.0 hdma+, host#=9, fw=5.06.03 (90d5) GSI 32 sharing vector 0x74 and IRQ 32 ACPI: PCI Interrupt 0000:2f:00.1[B] -> GSI 42 (level, low) -> IRQ 116 qla2xxx 0000:2f:00.1: Found an ISP2532, irq 116, iobase 0xffffc200100ec000 qla2xxx 0000:2f:00.1: Configuring PCI space... PCI: Setting latency timer of device 0000:2f:00.1 to 64 qla2xxx 0000:2f:00.1: Configure NVRAM parameters... qla2xxx 0000:2f:00.1: Verifying loaded RISC code... qla2xxx 0000:2f:00.1: Allocated (64 KB) for EFT... qla2xxx 0000:2f:00.1: Allocated (1414 KB) for firmware dump... scsi10 : qla2xxx qla2xxx 0000:2f:00.1: QLogic Fibre Channel HBA Driver: 8.03.07.09.05.08-k QLogic QLE2562 - Sun StorageTek 8Gb FC PCIe HBA, dual port ISP2532: PCIe (5.0Gb/s x4) @ 0000:2f:00.1 hdma+, host#=10, fw=5.06.03 (90d5) qla2xxx 0000:21:00.0: LIP reset occured (f700). qla2xxx 0000:21:00.0: LIP occured (f700).
We have focused analysis on server server01
We have replaced the FC HBAs on server server01 Also replaced the SFP on port6 of Brocade switches where the FC HBAs of this server are connected. Customer also replaced the FC cables
Mar 3 14:28:01 server01 kernel: scsi(7): RFT_ID exiting normally.
Mar 3 14:28:01 server01 kernel: scsi(7): RFF_ID exiting normally. Mar 3 14:28:01 server01 kernel: scsi(7): RNN_ID exiting normally. Mar 3 14:28:01 server01 kernel: scsi(7): RSNN_NN exiting normally. Mar 3 14:28:01 server01 kernel: scsi(7): GID_PT entry - nn 20000024ff4a8b58 pn 21000024ff4a8b58 portid=010600. Mar 3 14:28:01 server01 kernel: scsi(7): GID_PT entry - nn 20000024ff34235e pn 21000024ff34235e portid=010100. Mar 3 14:28:01 server01 kernel: scsi(7): GID_PT entry - nn 20000024ff38c330 pn 21000024ff38c330 portid=010000. Mar 3 14:28:01 server01 kernel: scsi(7): GPSC failed, rejected request: Mar 3 14:28:01 server01 kernel: 0 1 2 3 4 5 6 7 8 9 Ah Bh Ch Dh Eh Fh Mar 3 14:28:01 server01 kernel: -------------------------------------------------------------- Mar 3 14:28:02 server01 kernel: 01 00 00 00 fa 01 00 00 80 01 00 00 00 0b 00 00 Mar 3 14:28:02 server01 kernel: scsi(7): GPSC command unsupported, disabling query... Mar 3 14:28:02 server01 kernel: scsi(7): device wrap (010000) Mar 3 14:28:02 server01 kernel: scsi(7): Trying Fabric Login w/loop id 0x0081 for port 010100. Mar 3 14:28:02 server01 kernel: scsi(7): Trying Fabric Login w/loop id 0x0082 for port 010000. Mar 3 14:28:02 server01 kernel: scsi(7): LOOP READY Mar 3 14:28:02 server01 kernel: scsi(7): qla2x00_loop_resync - end Mar 3 14:28:02 server01 kernel: scsi(7): Asynchronous PORT UPDATE ignored 0000/0004/0600. Mar 3 14:28:02 server01 kernel: scsi(7): Asynchronous PORT UPDATE ignored 0000/0007/0b00. Mar 3 14:28:04 server01 kernel: qla2xxx_eh_abort(9): aborting sp ffff8103917cc840 from RISC. pid=28264. Mar 3 14:28:04 server01 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:4) FCP command status: 0x5-0x0 (0x80000) portid=010000 oxid=0x23 ser=0x6e68 cdb=008000 len=0x0 rsp_info=0x0 resid=0x0 fw_resid=0x0 Mar 3 14:28:04 server01 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:4): Abort command issued -- 1 6e68 2002. Mar 3 14:28:04 server01 kernel: qla2xxx 0000:2f:00.0: scsi(9:1:4): LOOP RESET ISSUED. Mar 3 14:28:04 server01 kernel: qla2x00_wait_for_hba_online return_status=0 Mar 3 14:28:04 server01 kernel: qla2xxx 0000:2f:00.0: qla2xxx_eh_bus_reset: reset succeeded Mar 3 14:28:04 server01 kernel: scsi(9): Asynchronous P2P MODE received. Mar 3 14:28:04 server01 kernel: scsi(9): qla2x00_reset_marker() Mar 3 14:28:04 server01 kernel: scsi(9): Asynchronous PORT UPDATE. Mar 3 14:28:04 server01 kernel: scsi(9): Port database changed ffff 0006 0000. Mar 3 14:28:04 server01 kernel: scsi(9): qla2x00_loop_resync() Mar 3 14:28:04 server01 kernel: scsi(9): F/W Ready - OK Mar 3 14:28:04 server01 kernel: scsi(9): fw_state=3 (3, 0, 4, 0) curr time=1149a59d8. Mar 3 14:28:04 server01 kernel: scsi(9): Configure loop -- dpc flags =0x5260 Mar 3 14:28:04 server01 kernel: scsi(9): RSCN queue entry[0] = [00/000000]. Mar 3 14:28:04 server01 kernel: scsi(9): device_resync: rscn overflow. Mar 3 14:28:04 server01 kernel: scsi(9): RFT_ID exiting normally. Mar 3 14:28:04 server01 kernel: scsi(9): RFF_ID exiting normally. Mar 3 14:28:04 server01 kernel: scsi(9): RNN_ID exiting normally. Mar 3 14:28:04 server01 kernel: scsi(9): RSNN_NN exiting normally. Mar 3 14:28:04 server01 kernel: scsi(9): GID_PT entry - nn 20000024ff35ce36 pn 21000024ff35ce36 portid=010600. Mar 3 14:28:04 server01 kernel: scsi(9): GID_PT entry - nn 20000024ff34235f pn 21000024ff34235f portid=010100. Mar 3 14:28:04 server01 kernel: scsi(9): GID_PT entry - nn 20000024ff38c331 pn 21000024ff38c331 portid=010000. Mar 3 14:28:04 server01 kernel: scsi(9): GPSC failed, rejected request: Mar 3 14:28:04 server01 kernel: 0 1 2 3 4 5 6 7 8 9 Ah Bh Ch Dh Eh Fh Mar 3 14:28:04 server01 kernel: --------------------------------------------------------------
scsi 9:0:1:4: scsi: Device offlined - not ready after error recovery
scsi 9:0:1:4: timing out command, waited 22s scsi 9:0:1:0: Unexpected response from lun 4 while scanning, scan aborted qla2xxx 0000:2f:00.0: scsi(9:1:4): Abort command issued -- 1 3bd 2002. qla2xxx 0000:21:00.0: scsi(7:1:4): Abort command issued -- 1 1a900 2002. scsi 7:0:1:4: scsi: Device offlined - not ready after error recovery scsi 7:0:1:4: timing out command, waited 22s scsi 7:0:1:0: Unexpected response from lun 4 while scanning, scan aborted qla2xxx 0000:2f:00.0: scsi(9:1:4): Abort command issued -- 1 3bd 2002. qla2xxx 0000:2f:00.0: scsi(9:1:4): DEVICE RESET ISSUED. qla2xxx 0000:2f:00.0: scsi(9:1:4): DEVICE RESET SUCCEEDED. qla2xxx 0000:21:00.0: scsi(7:1:4): Abort command issued -- 1 1b362 2002. qla2xxx 0000:2f:00.0: scsi(9:1:4): Abort command issued -- 1 3bd 2002. qla2xxx 0000:2f:00.0: scsi(9:1:4): LOOP RESET ISSUED. qla2xxx 0000:2f:00.0: qla2xxx_eh_bus_reset: reset succeeded qla2xxx 0000:21:00.0: scsi(7:1:4): Abort command issued -- 1 1b362 2002. qla2xxx 0000:21:00.0: scsi(7:1:4): DEVICE RESET ISSUED. qla2xxx 0000:21:00.0: scsi(7:1:4): DEVICE RESET SUCCEEDED. qla2xxx 0000:21:00.0: scsi(7:1:4): Abort command issued -- 1 1b362 2002. qla2xxx 0000:21:00.0: scsi(7:1:4): LOOP RESET ISSUED. qla2xxx 0000:21:00.0: qla2xxx_eh_bus_reset: reset succeeded qla2xxx 0000:2f:00.0: scsi(9:1:4): Abort command issued -- 1 3bd 2002. qla2xxx 0000:2f:00.0: scsi(9:1:4): ADAPTER RESET ISSUED. qla2xxx 0000:2f:00.0: Performing ISP error recovery - ha= ffff81047acdc4f8. qla2xxx 0000:2f:00.0: LIP reset occured (f700). qla2xxx 0000:2f:00.0: LIP occured (f700). qla2xxx 0000:2f:00.0: LIP reset occured (f7f7). qla2xxx 0000:2f:00.0: LOOP UP detected (4 Gbps). qla2xxx 0000:2f:00.0: qla2xxx_eh_host_reset: reset succeeded qla2xxx 0000:21:00.0: scsi(7:1:4): Abort command issued -- 1 1b362 2002. qla2xxx 0000:21:00.0: scsi(7:1:4): ADAPTER RESET ISSUED. qla2xxx 0000:21:00.0: Performing ISP error recovery - ha= ffff81047f5944f8. qla2xxx 0000:21:00.0: LIP reset occured (f700). qla2xxx 0000:21:00.0: LIP occured (f700). qla2xxx 0000:21:00.0: LIP reset occured (f7f7). qla2xxx 0000:21:00.0: LOOP UP detected (4 Gbps). qla2xxx 0000:21:00.0: qla2xxx_eh_host_reset: reset succeeded qla2xxx 0000:2f:00.0: scsi(9:1:4): Abort command issued -- 1 3bd 2002.
switchName: swA
Index Port Address Media Speed State Proto ============================================== 6 6 010600 id N4 Online FC F-Port 21:00:00:24:ff:4a:8b:58 switchName: swB Index Port Address Media Speed State Proto ============================================== 6 6 010600 id N4 Online FC F-Port 21:00:00:24:ff:35:ce:36
These ports present enc_out values being way above normal threshold, also high values for link fail, loss sync, loss sig : Switch Name: swA Version: v6.4.0c
fabriclog --show :
Time Stamp Input and *Action S, P Sn,Pn Port Xid =================================================================================== Switch 0; Thu Feb 27 10:00:38 2014 GMT (GMT+0:00) 10:00:38.349377 *Removing all nodes from port D2,P0 D2,P0 6 NA 10:00:38.349494 SCN Port F_PORT D2,P1 D2,P0 6 NA 10:01:25.450826 SCN Port Offline;g=0x381e8 D2,P0 D2,P0 6 NA 10:01:25.450843 *Removing all nodes from port D2,P0 D2,P0 6 NA 10:01:25.464843 SCN LR_PORT (0);g=0x381e8 D2,P0 D2,P0 6 NA 10:01:25.464883 SCN Port Online;g=0x381e8 D2,P0 D2,P1 6 NA 10:01:25.465060 Port Elp engaged D2,P1 D2,P0 6 NA 10:01:25.465123 *Removing all nodes from port D2,P0 D2,P0 6 NA 10:01:25.465243 SCN Port F_PORT D2,P1 D2,P0 6 NA 10:01:45.453616 SCN Port Offline;g=0x381ea D2,P0 D2,P0 6 NA 10:01:45.453633 *Removing all nodes from port D2,P0 D2,P0 6 NA 10:01:46.460183 SCN LR_PORT (0);g=0x381ea D2,P0 D2,P0 6 NA 10:01:46.463966 SCN Port Online;g=0x381ea D2,P0 D2,P1 6 NA 10:01:46.464143 Port Elp engaged D2,P1 D2,P0 6 NA 10:01:46.464193 *Removing all nodes from port D2,P0 D2,P0 6 NA 10:01:46.464309 SCN Port F_PORT D2,P1 D2,P0 6 NA
qla2xxx 0000:21:00.0: LIP reset occured (f700).
qla2xxx 0000:21:00.0: LIP occured (f700). qla2xxx 0000:21:00.0: LIP reset occured (f7f7). qla2xxx 0000:21:00.0: LOOP UP detected (4 Gbps). qla2xxx 0000:21:00.0: qla2xxx_eh_host_reset: reset succeeded qla2xxx 0000:2f:00.0: scsi(9:1:4): Abort command issued -- 1 3bd 2002.
8bit/10bit encoding errors occurred in words (ordered sets) outside of the FC frame. Words outside of frames are encoded, if this encoding is corrupted or an
error is detected enc_out is generated. It indicates a problem if it increments faster than the link-bit error rate allows, approximately once every 20 minutes for 1 Gbit/s. Statistically, enc_out errors on their own imply a cable/connector problem. Enc_out errors and crc_err together imply GBIC/SFP problem. Such errors are also expected every time a user brings a port down and up (i.e. reboot host, power-cycle storage subsystem, unplug/plug cable or portdisable/portenable etc).Such errors will also be generated on a link which has a 1Gbit/s port connected to a 2Gbit/s port when autonegotiation is turned off.
CauseOn the Storage 7320 , we observed that we have high load on system: over 9 load average considering only 8 CPU cores available. If problem recur, next time can try to compare time stamps on qlc messages with switch messages to work out which messages happened first, and also we can try to enable on linux servers qla2xxx extended logging to see if any ASC/ASCQ sense data is received: Doc ID 1452418.1 - How to configure detailed error logging for QLogic/Emulex (qla2xxx/lpfc) HBA driver on Oracle Linux & Oracle VMDelete Reference SolutionUpgrade the 7320 storage to the latest firmware. And take into account the next document: References<NOTE:1582197.1> - Very Slow I/O Performance On Storage - Qlogic HBA Reporting Dropped Frame And I/O Error<NOTE:1003754.1> - How to Gather Brocade SupportSave or SupportShow Logs <NOTE:1452418.1> - How to configure detailed error logging for QLogic/Emulex (qla2xxx/lpfc) HBA driver on Oracle Linux & Oracle VM <NOTE:1531816.1> - Enable Error Logging for QLogic/Emulex (qla2xxx/lpfc) HBA Driver on Oracle Linux & Oracle VM <NOTE:1612676.1> - How to get Fibre Channel HBA information from Linux SOSreport Attachments This solution has no attachment |
||||||||||||||||||
|