Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1602837.1 : FC HBA Emlxs ERROR: 420: Adapter Hardware Error. (Host Error Attention: Status=0x40000000
Created from <SR 3-8084961071>

Applies to:
Sun SPARC Enterprise T5140 Server - Version All Versions and later
Solaris Operating System - Version 8.0 and later
Emulex FC HBA - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms
A Solaris 10 SPARC server shows one FC HBA LPe11000-S as NOT CONNECTED:

C#  INST#   PORT WWN          MODEL       FCODE   STATUS         DEVICE PATH
--  -----   --------          -----       -----   ------         -----------
c2  emlxs0  10000000c981b0f2  LPe11000-S  1.50a9  CONNECTED      /pci@400/pci@0/pci@c/SUNW,emlxs@0
c3  emlxs1  10000000c9b03dd4  LPe11000-S  1.50a9  NOT CONNECTED  /pci@500/pci@0/pci@9/SUNW,emlxs@0  <<--Problem

The fcinfo command does not return any errors; it only shows that the state of port c3 is Offline. The firmware information is below:

HBA Port WWN: 10000000xxxxxxx2
HBA Port WWN: 10000000xxxxxxx4

The emlxs patch 149173-03 was installed, the server was rebooted, and the firmware on that FC HBA was upgraded. The link comes up during boot:

Nov 13 14:09:47 host1 emlxs: [ID 349649 kern.info] [13.0303]emlxs1: NOTICE: 200: Adapter initialization. (Firmware update not needed.)
Nov 13 14:09:48 host1 emlxs: [ID 349649 kern.info] [ B.1A84]emlxs1: NOTICE: 100: Driver attach. (Emulex-S s10-64 sparc v2.80.8.0 (2012.09.17.15.45))
Nov 13 14:09:48 host1 emlxs: [ID 349649 kern.info] [ B.1A87]emlxs1: NOTICE: 100: Driver attach. (LPe11000-S Dev_id:fc20 Sub_id:fc21 Id:25)
Nov 13 14:09:48 host1 emlxs: [ID 349649 kern.info] [ B.1A94]emlxs1: NOTICE: 100: Driver attach. (Firmware:2.82a4 (Z3D2.82A4) Boot:5.02a1 Fcode:1.50a9)
Nov 13 14:09:48 host1 emlxs: [ID 349649 kern.info] [ B.1AC4]emlxs1: NOTICE: 100: Driver attach. (SLI:3 MSI:2 NPIV:0 FCA)
Nov 13 14:09:48 host1 emlxs: [ID 349649 kern.info] [ B.1ACC]emlxs1: NOTICE: 100: Driver attach. (WWPN:10000000C9B03DD4 WWNN:20000000C9B03DD4)
Nov 13 14:09:48 host1 pcieb: [ID 586369 kern.info] PCIE-device: SUNW,emlxs@0, emlxs1
Nov 13 14:09:48 host1 genunix: [ID 936769 kern.info] emlxs1 is /pci@500/pci@0/pci@9/SUNW,emlxs@0
Nov 13 14:09:48 host1 pcieb: [ID 586369 kern.info] PCIE-device: SUNW,emlxs@0, emlxs1
Nov 13 14:09:48 host1 emlxs: [ID 349649 kern.info] [ B.0680]emlxs1: NOTICE: 720: Link up. (4Gb, fabric, initiator)
Nov 13 14:09:48 host1 genunix: [ID 936769 kern.info] fp4 is /pci@500/pci@0/pci@9/SUNW,emlxs@0/fp@0,0

About six minutes later the adapter reports a hardware error and the link goes down:

Nov 13 14:15:52 host1 emlxs: [ID 349649 kern.info] [13.11EE]emlxs1: ERROR: 420: Adapter hardware error. (HS_FFER1 cleared)
Nov 13 14:15:52 host1 emlxs: [ID 349649 kern.info] [13.1208]emlxs1: ERROR: 420: Adapter hardware error. (Host Error Attention: status=0x40000000 status1=0x93994 status2=0x6000000d)
Nov 13 14:15:52 host1 emlxs: [ID 349649 kern.info] [ 5.03DD]emlxs1: NOTICE: 710: Link down.
Nov 13 14:15:54 host1 emlxs: [ID 349649 kern.info] [ 6.0901]emlxs1:WARNING: 231: Adapter shutdown. (Reboot required.)
Nov 13 14:16:14 host1 emlxs: [ID 349649 kern.info] [13.0303]emlxs1: NOTICE: 200: Adapter initialization. (Firmware update not needed.)
Nov 13 14:16:24 host1 genunix: [ID 408114 kern.info] /pci@500/pci@0/pci@9/SUNW,emlxs@0 (emlxs1) down
Nov 13 14:17:10 host1 emcp: [ID 801593 kern.notice] Error: Path Bus 3076 Tgt 500009740825F959 Lun 3 to 000292602430 is dead.
Nov 13 14:17:10 host1 emcp: [ID 801593 kern.notice] Error: Killing bus 3076 to Symmetrix 000292602430 port 7fB.
Nov 13 14:17:10 host1 emcp: [ID 801593 kern.notice] Error: Path Bus 3076 Tgt 500009740825F959 Lun 6 to 000292602430 is dead.
Nov 13 14:17:10 host1 emcp: [ID 801593 kern.notice] Error: Path Bus 3076 Tgt 500009740825F959 Lun 5 to 000292602430 is dead.
Nov 13 14:17:10 host1 emcp: [ID 801593 kern.notice] Error: Path Bus 3076 Tgt 500009740825F959 Lun 4 to 000292602430 is dead.
Nov 13 14:17:10 host1 emcp: [ID 801593 kern.notice] Error: Path Bus 3076 Tgt 500009740825F959 Lun 1 to 000292602430 is dead.
Nov 13 14:17:10 host1 emcp: [ID 801593 kern.notice] Error: Path Bus 3076 Tgt 500009740825F959 Lun 2 to 000292602430 is dead.
Nov 13 14:17:22 host1 fctl: [ID 517869 kern.warning] WARNING: fp(4)::OFFLINE timeout
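
When searching a messages file for this signature, the three status words on the "Host Error Attention" line are the values worth capturing for a service request. The following is a minimal Python sketch, assuming the message layout is exactly as in the log excerpt above. The function name `find_host_error_attention` is hypothetical and is not part of any Oracle or Emulex tool.

```python
import re

# Pattern for the emlxs "Host Error Attention" message. The layout is an
# assumption taken only from the log excerpt in this note.
HEA_RE = re.compile(
    r"(?P<inst>emlxs\d+):\s*ERROR:\s*420:\s*Adapter hardware error\.\s*"
    r"\(Host Error Attention:\s*status=(?P<status>0x[0-9a-fA-F]+)\s+"
    r"status1=\s*(?P<status1>0x[0-9a-fA-F]+)\s+"
    r"status2=(?P<status2>0x[0-9a-fA-F]+)\)"
)

def find_host_error_attention(lines):
    """Return one dict per matching line: driver instance plus status words as ints."""
    events = []
    for line in lines:
        m = HEA_RE.search(line)
        if m:
            events.append({
                "instance": m.group("inst"),
                "status": int(m.group("status"), 16),
                "status1": int(m.group("status1"), 16),
                "status2": int(m.group("status2"), 16),
            })
    return events

# Two lines copied from the excerpt; only the second carries the status words.
sample = [
    "Nov 13 14:15:52 host1 emlxs: [ID 349649 kern.info] [13.11EE]emlxs1: "
    "ERROR: 420: Adapter hardware error. (HS_FFER1 cleared)",
    "Nov 13 14:15:52 host1 emlxs: [ID 349649 kern.info] [13.1208]emlxs1: "
    "ERROR: 420: Adapter hardware error. (Host Error Attention: "
    "status=0x40000000 status1=0x93994 status2=0x6000000d)",
]
events = find_host_error_attention(sample)
for ev in events:
    print(ev["instance"], hex(ev["status"]), hex(ev["status1"]), hex(ev["status2"]))
```

On a live system the same function could be fed the lines of the messages file instead of the sample list.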

As a result, FMA reports these errors:

bash-3.2$ more fmdump-e.out
TIME                 CLASS
Nov 13 14:15:54.8141 ereport.io.device.inval_state
Nov 13 14:15:54.8142 ereport.io.service.lost

bash-3.2$ more fmdump-eV.out
Nov 13 2013 14:15:54.814112899 ereport.io.device.inval_state
nvlist version: 0
    class = ereport.io.device.inval_state
    ena = 0xab7324acf9e09c01
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        device-path = /pci@500/pci@0/pci@9/SUNW,emlxs@0
    (end detector)
    __ttl = 0x1
    __tod = 0x5283515a 0x30866083

Nov 13 2013 14:15:54.814246955 ereport.io.service.lost
nvlist version: 0
    class = ereport.io.service.lost
    ena = 0xab7324cdc5609c01
    detector = (embedded nvlist)
    nvlist version: 0
        version = 0x0
        scheme = dev
        device-path = /pci@500/pci@0/pci@9/SUNW,emlxs@0
    (end detector)
    __ttl = 0x1
    __tod = 0x5283515a 0x30886c2b
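
To confirm that the ereports point at the same device path as the failing emlxs instance, the class and device-path fields can be pulled out of saved `fmdump -eV` output. This is a rough Python sketch, assuming one `name = value` field per line as fmdump prints it. The function name `ereport_devices` is hypothetical.

```python
def ereport_devices(text):
    """Return (class, device-path) pairs found in saved 'fmdump -eV' text."""
    pairs = []
    current_class = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("class = "):
            # Remember the class until the matching device-path shows up.
            current_class = line[len("class = "):]
        elif line.startswith("device-path = ") and current_class is not None:
            pairs.append((current_class, line[len("device-path = "):]))
            current_class = None
    return pairs

# Abbreviated copy of the two ereports shown in this note.
sample = """\
Nov 13 2013 14:15:54.814112899 ereport.io.device.inval_state
nvlist version: 0
    class = ereport.io.device.inval_state
    detector = (embedded nvlist)
        scheme = dev
        device-path = /pci@500/pci@0/pci@9/SUNW,emlxs@0
    (end detector)
Nov 13 2013 14:15:54.814246955 ereport.io.service.lost
nvlist version: 0
    class = ereport.io.service.lost
    detector = (embedded nvlist)
        scheme = dev
        device-path = /pci@500/pci@0/pci@9/SUNW,emlxs@0
    (end detector)
"""
pairs = ereport_devices(sample)
for cls, path in pairs:
    print(cls, path)
```

Both ereports in this case resolve to /pci@500/pci@0/pci@9/SUNW,emlxs@0, the path that the driver attach messages report for emlxs1.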

NOTE. Observation of the problem shows that if the server reboots, the link comes up online, but after some minutes the link fails again with the same errors.

Changes
BROCADE-SW1:admin> portshow 4/16
portIndex: 28
portName: host1 - 10000000xxxxxxx4
portHealth: No Fabric Watch License
Authentication: None
portDisableReason: None
portCFlags: 0x1
portFlags: 0x4001 PRESENT U_PORT LED
LocalSwcFlags: 0x0
portType: 10.0
portState: 2 Offline
Protocol: FC
portPhys: 4 No_Light
portScn: 2 Offline
port generation number: 5018
state transition count: 22
portId: 082b00
portIfId: 43220013
portWwn: 20:1c:00:05:1e:36:00:02
portWwn of device(s) connected:
Distance: normal
portSpeed: N4Gbps
LE domain: 0
FC Fastwrite: OFF
Interrupts:    216    Link_failure:  4         Frjt:  0
Unknown:       3      Loss_of_sync:  34        Fbsy:  0
Lli:           160    Loss_of_sig:   59
Proc_rqrd:     530    Protocol_err:  0
Timed_out:     0      Invalid_word:  533481    <<<----
Rx_flushed:    0      Invalid_crc:   0
Tx_unavail:    0      Delim_err:     0
Free_buffer:   0      Address_err:   0
Overrun:       0      Lr_in:         11
Suspended:     0      Lr_out:        2
Parity_err:    0      Ols_in:        2
2_parity_err:  0      Ols_out:       11
CMI_bus_err:   0
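
The link-level counters in the switch output can be extracted for side-by-side comparison between two captures. The sketch below is a minimal Python example, assuming the counter names appear exactly as in the portshow output above. The function name `link_error_counters` and the choice of which counters to collect are assumptions made for illustration.

```python
import re

# Counter names as printed in the portshow output in this note.
LINK_COUNTERS = ("Link_failure", "Loss_of_sync", "Loss_of_sig",
                 "Invalid_word", "Invalid_crc")

def link_error_counters(portshow_text):
    """Return {counter_name: value} for the link-level counters that are present."""
    counters = {}
    for name in LINK_COUNTERS:
        m = re.search(r"\b" + re.escape(name) + r":\s*(\d+)", portshow_text)
        if m:
            counters[name] = int(m.group(1))
    return counters

# Statistics portion of the output shown in this note.
sample = (
    "Interrupts: 216 Link_failure: 4 Frjt: 0 Unknown: 3 Loss_of_sync: 34 "
    "Fbsy: 0 Lli: 160 Loss_of_sig: 59 Proc_rqrd: 530 Protocol_err: 0 "
    "Timed_out: 0 Invalid_word: 533481 Rx_flushed: 0 Invalid_crc: 0"
)
counters = link_error_counters(sample)
print(counters)
```

These counters are cumulative, so a single capture only shows that errors occurred at some point. Comparing two captures taken a few minutes apart shows whether they are still increasing.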

Cause
1) Failed Oracle Emulex FC HBA, or Bug 24320491 - LPe12002-S ERROR: 420:Adapter hardware error.

emlxs1: ERROR: 420: Adapter hardware error. (HS_FFER1 cleared)
emlxs1: ERROR: 420: Adapter hardware error. (Host Error Attention: status=0x40000000 status1=0x93994 status2=0x6000000d)

This indicates that an interrupt has occurred and
From a recent Bug 24320491 - LPe12002-S ERROR: 420:Adapter hardware error.
From the error code in the emlxs driver messages we can state that the
(Host Error Attention: status=0x40000000 status1= 0x9ee1a4 status2=0x6000000e)
Specifically, a trap code of "0x6000000e" indicates that a parity error
In the case of a parity error, Emulex does recommend that an adapter be reset
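
The only trap code this note documents is 0x6000000e, reported in the status2 word and described as a parity error. A small lookup like the Python sketch below can keep that distinction explicit when triaging several cases. The table and the function name `describe_trap_code` are hypothetical, and any value not listed in this note is deliberately left unmapped rather than guessed.

```python
# Trap codes documented in this note. Other values are intentionally absent:
# their meaning is not stated here and must not be inferred from this table.
KNOWN_TRAP_CODES = {
    0x6000000E: "parity error",
}

def describe_trap_code(status2):
    """Return the documented meaning of a status2 trap code, or None if undocumented."""
    return KNOWN_TRAP_CODES.get(status2)

for code in (0x6000000E, 0x6000000D):
    print(hex(code), describe_trap_code(code))
```

Note that the case in the Symptoms section reported status2=0x6000000d, which this note does not decode, so the lookup returns None for it.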

2) Other cases with a similar failure have been found, where the FC HBA was not replaced and continues to work with no issues:

Feb 14 06:39:07 server01 emlxs: [ID 349649 kern.info] [13.1225]emlxs1: ERROR: 420: Adapter hardware error. (HS_FFER1 cleared)
Feb 14 06:39:07 server01 emlxs: [ID 349649 kern.info] [13.123F]emlxs1: ERROR: 420: Adapter hardware error. (Host Error Attention: status=0x20000000 status1=0x1e78 status2=0x168200)
Feb 14 06:39:07 server01 emlxs: [ID 349649 kern.info] [ 5.0401]emlxs1: NOTICE: 710: Link down.
Feb 14 06:39:09 server01 emlxs: [ID 349649 kern.info] [ 6.0987]emlxs1:WARNING: 231: Adapter shutdown. (Reboot required.)
Feb 14 06:39:12 server01 emlxs: [ID 349649 kern.info] [13.0315]emlxs1: NOTICE: 200: Adapter initialization. (Firmware update not needed.)
Feb 14 06:39:39 server01 genunix: [ID 408114 kern.info] /pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0,1 (emlxs1) down
Feb 14 06:39:40 server01 fmd: [ID 377184 daemon.error] SUNW-MSG-ID: PCIEX-8000-0A, TYPE: Fault, VER: 1, SEVERITY: Critical
Feb 14 06:39:40 server01 EVENT-TIME: Sat Feb 14 06:39:39 CET 2015
Feb 14 06:39:40 server01 PLATFORM: unknown, CSN: unknown, HOSTNAME: server01
Feb 14 06:39:40 server01 SOURCE: eft, REV: 1.16
Feb 14 06:39:40 server01 EVENT-ID: d1317dc3-aec4-4283-8da7-c859d4a1307d
Feb 14 06:39:40 server01 DESC: A problem was detected for a PCIEX device.
Feb 14 06:39:40 server01 AUTO-RESPONSE: One or more device instances may be disabled
Feb 14 06:39:40 server01 IMPACT: Loss of services provided by the device instances associated with this fault
Feb 14 06:39:40 server01 REC-ACTION: Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/PCIEX-8000-0A for the latest service procedures and policies regarding this diagnosis.
Feb 14 06:39:43 server01 genunix: [ID 631017 kern.notice] NOTICE: Device: already retired: /pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0,1
Feb 14 06:39:53 server01 genunix: [ID 390243 kern.info] Creating /etc/devices/retire_store
Feb 14 06:40:05 server01 emlxs: [ID 349649 kern.info] [13.0315]emlxs1: NOTICE: 200: Adapter initialization. (Firmware update not needed.)

The associated FMA fault:

Feb 14 06:39:39 d1317dc3-aec4-4283-8da7-c859d4a1307d PCIEX-8000-0A Critical

Problem Status : isolated
Diag Engine : eft / 1.16
System
    Manufacturer : unknown
    Name : unknown
    Part_Number : unknown
    Serial_Number : unknown
    Host_ID : 84fafc23
----------------------------------------
Suspect 1 of 1 :
    Fault class : fault.io.pciex.device-interr
    Certainty : 100%
    Affects : dev:////pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0,1
    Status : faulted and taken out of service
    FRU
        Location : "PCIE4"
        Manufacturer : unknown
        Name : unknown
        Part_Number : unknown
        Revision : unknown
        Serial_Number : unknown
        Chassis
            Manufacturer : Oracle Corporation
            Name : SPARC T5-4
            Part_Number : 31930909+7+1
            Serial_Number : AK00117917
        Status : faulty
Description : A problem was detected for a PCIEX device.
Response : One or more device instances may be disabled
Impact : Loss of services provided by the device instances associated with this fault
Action : Use 'fmadm faulty' to provide a more detailed view of this event. Please refer to the associated reference document at http://support.oracle.com/msg/PCIEX-8000-0A for the latest service procedures and policies regarding this diagnosis.

Clear the error; see How to repair FMA module errors seen in 'fmadm faulty' (Doc ID 1332409.1):

# fmadm repaired dev:////pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0,1
# fmadm acquit d1317dc3-aec4-4283-8da7-c859d4a1307d
# fmadm flush PCIE4

Then reboot the server and check that the device has been removed from the retired list:

Before rebooting the server, the FC HBA can be reset to see whether this makes it work again:

# luxadm -e port
/devices/pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0/fp@0,0:devctl NOT CONNECTED
/devices/pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0,1/fp@0,0:devctl CONNECTED
# luxadm -e offline /devices/pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0
# luxadm -e port
/devices/pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0,1/fp@0,0:devctl CONNECTED
# luxadm -e online /devices/pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0
# luxadm -e port
/devices/pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0/fp@0,0:devctl CONNECTED <--it worked, now is connected again
/devices/pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0,1/fp@0,0:devctl CONNECTED

"Resetting a boot adapter may cause system instability.
Emulex assumes no responsibility for the consequences of reseting a boot adapter.
Do you want to continue? Yes No" --> select Yes

After reboot, if the problem persists, replace the FC HBA.
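
The FMA clearing steps and the luxadm offline/online sequence above can be written down as a reviewable plan before anything is run on the host. The following Python sketch only builds and prints the command lines and does not execute them. The function names are hypothetical, and the arguments are the values from this particular case, which will differ on another system.

```python
import shlex

def fma_clear_commands(device_fmri, event_uuid, fru_label):
    """Build the three fmadm commands used in this note to clear the fault."""
    return [
        ["fmadm", "repaired", device_fmri],
        ["fmadm", "acquit", event_uuid],
        ["fmadm", "flush", fru_label],
    ]

def hba_reset_commands(hba_path):
    """Build the luxadm offline/online sequence, checking port state between steps."""
    return [
        ["luxadm", "-e", "port"],
        ["luxadm", "-e", "offline", hba_path],
        ["luxadm", "-e", "port"],
        ["luxadm", "-e", "online", hba_path],
        ["luxadm", "-e", "port"],
    ]

# Values taken from the example case in this note.
plan = fma_clear_commands(
    "dev:////pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0,1",
    "d1317dc3-aec4-4283-8da7-c859d4a1307d",
    "PCIE4",
) + hba_reset_commands("/devices/pci@340/pci@1/pci@0/pci@c/SUNW,emlxs@0")

# Dry run: print each command for review instead of executing it.
for cmd in plan:
    print("# " + shlex.join(cmd))
```

Keeping this as a dry run is a deliberate choice: taking an HBA port offline interrupts I/O on that path, so the commands should be reviewed and run by hand.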

Please refer to emlxs_messages.h, which contains the messages, located here:

Solution
If you are facing the scenario presented above, the recommended action is to collect a firmware dump (if possible) to troubleshoot the problem further (see below).

Other cases with a similar failure have been found, where the FC HBA was not replaced and continues to work with no issues (see the Cause section).

For example, in the case of a parity error (a trap code of "0x6000000e" indicates that)

Note. To troubleshoot this problem further, a firmware dump can be collected and sent
Please pick up and install the OCM version 11.1.218.18-1 for your OS from https://www.broadcom.com/support/oem/oracle-fc/fibre-channel-8gb/sg-xpcie2fc-em8-z
In this particular case (the FW-detected parity error), reboot if there's a

Getting a useful dump:
In order to help us provide you with the best possible support, please download and install OCM as directed above. The installation of this CLI will start the elxhbamgrd daemon process, which will ensure that upon failure, a usable firmware (FW) dump will be available to
This is needed for all operating systems, and the default location for the firmware dump varies by OS:
- Windows : In the Dump directory under the OneCommand Manager Installation Directory \Util\Dump\
Processes around creation and collection of firmware dumps after a fatal firmware error vary by generation of adapter, but in all cases, OCM
In the case of the 8Gb adapters, manual FW dumps apparently do not collect fatal FW errors,
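
On Solaris, the Emulex notes in this document give /opt/ELXocm/Dump/ as the directory where firmware dumps are collected. A quick way to see whether any dump exists, and which one is newest, is sketched below in Python. The function name `list_dumps` is hypothetical, and the sketch makes no assumption about dump file names or extensions.

```python
import os

def list_dumps(dump_dir="/opt/ELXocm/Dump"):
    """Return (name, size_bytes, mtime) for regular files in dump_dir, newest first."""
    if not os.path.isdir(dump_dir):
        # OCM not installed, or no dump directory created yet.
        return []
    entries = []
    for entry in os.scandir(dump_dir):
        if entry.is_file():
            st = entry.stat()
            entries.append((entry.name, st.st_size, st.st_mtime))
    return sorted(entries, key=lambda e: e[2], reverse=True)

for name, size, mtime in list_dumps():
    print(name, size, mtime)
```

An empty result means either that the directory does not exist or that no dump has been written to it yet.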

Some other notes from Emulex:
If a hardware error is noted in the emlxs driver messages but a dump is not
OCM must be installed. Existing fw dumps can be captured by restarting OCM
1. /opt/ELXocm/stop_ocmanager
Server reboots will also collect firmware dumps to /opt/ELXocm/Dump/
Firmware dumps initiated through OCM (with hbacmd)

There is a closed bug that addresses this issue with hbacmd; see Bug 24450164 - Firmware dump collected by emlxs driver in kernel needs to be usuable. That bug was opened after the comments made by Emulex on this other bug: Bug 24320491 - LPe12002-S ERROR: 420:Adapter hardware error. --> this bug has been closed as not reproducible.

References
<BUG:24320491> - LPE12002-S ERROR: 420:ADAPTER HARDWARE ERROR.
<BUG:24450164> - FIRMWARE DUMP COLLECTED BY EMLXS DRIVER IN KERNEL NEEDS TO BE USUABLE
<NOTE:1629921.1> - How To Get a Firmware Dump From an Emulex FC HBA
<NOTE:1356876.1> - Firmware Update Required. (A Manual Hba Reset Or Link Reset (Using Luxadm Or Fcadm) Is Required
<BUG:18940856> - ADAPTER HARDWARE ERROR / FMA ERROR
<NOTE:1399644.1> - How to Locate FC HBA Manual to Get Oracle Fibre Channel (FC) HBA Port LED Patterns and Other HBA information

Attachments
This solution has no attachment