Asset ID: 1-72-1568256.1
Update Date: 2018-03-07
Solution Type: Problem Resolution Sure
Solution 1568256.1: Error messages of a failed Fibrechannel Host Bus Adaptor, identify a reason for SUN4-8000-5A fault
Related Items
- Emulex LP11000 4Gb FC PCI-X HBA, 1 Port
- Sun Blade 6000 System
Related Categories
- PLA-Support>Sun Systems>DISK>HBA>SN-DK: FC HBA
To provide information on how to identify a failed Host Bus Adaptor in a Solaris SPARC system.
Created from <SR 3-7494796125>
Applies to:
Emulex LP11000 4Gb FC PCI-X HBA, 1 Port - Version All Versions to All Versions [Release All Releases]
Sun Blade 6000 System - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on SPARC (64-bit)
Symptoms
The Fault Management Architecture (FMA) on a T6320 blade reported a SUN4-8000-5A fault followed by a PCIEX-8000-0A fault. The administrator cleared the PCIEX-8000-0A fault with the "fmadm repair" command, but the SUN4-8000-5A fault remained active.
This document describes how to identify a failed Host Bus Adaptor as the root cause of the reported faults.
Cause
1) PCIEX-8000-0A fault
- The PCIEX-8000-0A fault could be retrieved with the command
fmadm faulty -a
- The device marked as faulty was a Fibrechannel HBA in slot PCI-EM1
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Jun 27 11:16:44 5c9c0dee-ec3f-4b4f-b3c1-f26b856b071d PCIEX-8000-0A Critical
Host : myhost
Platform : SUNW,Sun-Blade-T6320 Chassis_id :
Product_sn :
Fault class : fault.io.pciex.device-interr
Affects : dev:////pci@0/pci@0/pci@8/fibre-channel@0
faulted and taken out of service
FRU : "PCI-EM1" (hc://:product-id=SUNW,Sun-Blade-T6320:server-id=myhost:chassis-id=1244CCR-08260002BR/motherboard=0/chip=0/hostbridge=0/pciexrc=0/pciexbus=2/pciexdev=0/pciexfn=0/pciexbus=3/pciexdev=8/pciexfn=0/pciexbus=8/pciexdev=0)
not present
2) SUN4-8000-5A fault
- The SUN4-8000-5A fault was still active, as shown by fmadm faulty
--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Jun 27 11:16:39 514d5fc0-5ae4-65b8-e881-ac115d311c55 SUN4-8000-5A Critical
Host : myhost
Platform : SUNW,Sun-Blade-T6320 Chassis_id :
Product_sn :
Fault class : defect.io.fire.pciex.driver
Affects : mod:///mod-name=px/mod-id=34
faulted but still in service
Problem in : hc://:product-id=SUNW,Sun-Blade-T6320:server-id=myhost:chassis-id=1244CCR-08260002BR/motherboard=0/chip=0/hostbridge=0/pciexrc=0
faulted but still in service
- The SUN4-8000-5A fault is reported for a problem with the "PCI-Express subsystem software". This could lead to the assumption that the PCI / Fibrechannel drivers caused the faults, whereas the root cause of both faults was a failing Host Bus Adaptor (HBA).
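The relationship between the two faults can be made visible by ordering the fmadm summary lines by time. The following is a minimal Python sketch, not part of the original analysis. It assumes the summary-line layout seen in the listings above ("Mon DD HH:MM:SS", event ID, MSG-ID, severity); the function name parse_fault_summaries and the placeholder year are illustrative choices.

```python
import re
from datetime import datetime

# Summary lines copied from the two fmadm listings in this document.
FMADM_SUMMARY = """\
Jun 27 11:16:44 5c9c0dee-ec3f-4b4f-b3c1-f26b856b071d PCIEX-8000-0A Critical
Jun 27 11:16:39 514d5fc0-5ae4-65b8-e881-ac115d311c55 SUN4-8000-5A Critical
"""

# Assumed layout: "Mon DD HH:MM:SS <event-id> <MSG-ID> <severity>".
SUMMARY_RE = re.compile(
    r"^(?P<time>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<event_id>[0-9a-f-]{36})\s+"
    r"(?P<msg_id>[A-Z0-9]+-\d{4}-[0-9A-Z]{2})\s+"
    r"(?P<severity>\S+)\s*$"
)


def parse_fault_summaries(text):
    """Return one dict per summary line, sorted oldest first."""
    faults = []
    for line in text.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match is None:
            continue  # skip headers, separators and detail lines
        fault = match.groupdict()
        # The listing carries no year. A placeholder year (2000) is added
        # only so the timestamps can be compared within one listing.
        normalized = " ".join(fault["time"].split())
        fault["when"] = datetime.strptime(
            "2000 " + normalized, "%Y %b %d %H:%M:%S"
        )
        faults.append(fault)
    return sorted(faults, key=lambda f: f["when"])


faults = parse_fault_summaries(FMADM_SUMMARY)
for fault in faults:
    print(fault["time"], fault["msg_id"], fault["severity"])

# Seconds between the first and the last fault in the listing.
gap = (faults[-1]["when"] - faults[0]["when"]).total_seconds()
print(f"faults raised within {gap:.0f} seconds of each other")
```

Two faults raised within a few seconds of each other on the same PCI-Express root complex may point to a single underlying event, which is worth checking before treating the driver defect as an independent software problem.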
Solution
To identify the root cause the following command outputs / files were used:
- prtdiag -v
...
================================= IO Devices =================================
Slot + Bus Name + Model Speed
Status Type Path
------------------------------------------------------------------------------
MB/REM/SASHBA PCIE LSILogic,sas-pciex1000,58 LSI,1068E 2.5GTx4
/pci@0/pci@0/pci@2/LSILogic,sas@0
MB/PCI-EM1 PCIE fibre-channel-pciex10df,fe00 2.5GTx4
/pci@0/pci@0/pci@8/fibre-channel
MB/PCI-EM1 PCIE fibre-channel-pciex10df,fe00 LPem11002-S 2.5GTx4
/pci@0/pci@0/pci@8/fibre-channel@0,1
MB/PCI-EM0 PCIE SUNW,emlxs-pci10df,fc20 LPem11002-S 2.5GTx4
/pci@0/pci@0/pci@9/SUNW,emlxs@0
MB/PCI-EM0 PCIE SUNW,emlxs-pci10df,fc20 LPem11002-S 2.5GTx4
/pci@0/pci@0/pci@9/SUNW,emlxs@0,1
MB/NET0 PCIE network-pciex8086,105e 2.5GTx4
/pci@0/pci@0/pci@c/network@0
MB/NET1 PCIE network-pciex8086,105e 2.5GTx4
...
Prtdiag shows LPem11002-S, which identifies Oracle-branded HBA ports. For help on HBA brand identification, please see Document 1282491.1.
While two LPem11002-S ports were reported in the "Model" column for the Express Module in slot PCI-EM0, only one LPem11002-S port was reported for the Express Module in slot PCI-EM1. The other difference was that PCI-EM0 had the "SUNW,emlxs" driver attached, while PCI-EM1 had the generic "fibre-channel" driver attached.
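The comparison described above can be sketched in Python as follows. This is an illustration only: it assumes the two-line, whitespace-separated layout of the excerpt shown in this document (a device line followed by a path line), which may differ on other platforms or prtdiag versions. The function names parse_io_devices and ports_without_model are hypothetical.

```python
# Excerpt of the "IO Devices" section as it appears in this document.
PRTDIAG_IO = """\
MB/PCI-EM1 PCIE fibre-channel-pciex10df,fe00 2.5GTx4
/pci@0/pci@0/pci@8/fibre-channel
MB/PCI-EM1 PCIE fibre-channel-pciex10df,fe00 LPem11002-S 2.5GTx4
/pci@0/pci@0/pci@8/fibre-channel@0,1
MB/PCI-EM0 PCIE SUNW,emlxs-pci10df,fc20 LPem11002-S 2.5GTx4
/pci@0/pci@0/pci@9/SUNW,emlxs@0
MB/PCI-EM0 PCIE SUNW,emlxs-pci10df,fc20 LPem11002-S 2.5GTx4
/pci@0/pci@0/pci@9/SUNW,emlxs@0,1
MB/NET0 PCIE network-pciex8086,105e 2.5GTx4
/pci@0/pci@0/pci@c/network@0
"""


def parse_io_devices(text):
    """Parse device/path line pairs into a list of dicts."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A path line belongs to the device line just before it.
            if entries and entries[-1]["path"] is None:
                entries[-1]["path"] = line
            continue
        fields = line.split()
        if len(fields) not in (4, 5):
            continue  # not a device line in the assumed layout
        entries.append({
            "slot": fields[0],
            "bus": fields[1],
            "name": fields[2],
            # The Model column is optional; it is empty for the failed port.
            "model": fields[3] if len(fields) == 5 else None,
            "speed": fields[-1],
            "path": None,
        })
    return entries


def ports_without_model(entries, slot_prefix="MB/PCI-EM"):
    """Return Express Module ports for which no model was reported."""
    return [
        e for e in entries
        if e["slot"].startswith(slot_prefix) and e["model"] is None
    ]


suspects = ports_without_model(parse_io_devices(PRTDIAG_IO))
for port in suspects:
    print(port["slot"], port["name"], port["path"])
```

The filter is restricted to Express Module slots because other devices, such as the onboard network ports in the listing above, legitimately report no model.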
- messages
Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [13.0812]emlxs0: ERROR: 240: Adapter reset failed. (Timeout: status=0x1)
Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [13.00FB]emlxs0: ERROR: 201: Adapter initialization failed. (Unable to init hba.)
Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [ 5.067D]emlxs0: ERROR: 201: Adapter initialization failed. (status=5)
Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [ B.19B2]emlxs0: ERROR: 101: Driver attach failed. (Unable to initialize adapter.)
The messages above show a failed adapter initialization: instance 0 of the SUNW,emlxs driver (emlxs0) could not communicate with the adapter port.
Jun 27 09:23:36 myhost emlxs: [ID 349649 kern.info] [13.02E9]emlxs1: NOTICE: 200: Adapter initialization. (Firmware update not needed.)
...
Jun 27 09:23:37 myhost emlxs: [ID 349649 kern.info] [ B.17C7]emlxs1: NOTICE: 100: Driver attach. (Emulex-S s10-64 sparc v2.60k (2011.03.24.16.45))
Jun 27 09:23:37 myhost emlxs: [ID 349649 kern.info] [ B.17CA]emlxs1: NOTICE: 100: Driver attach. (LPem11002-S Dev_id:fc20 Sub_id:fc2e Id:31)
Jun 27 09:23:37 myhost emlxs: [ID 349649 kern.info] [ B.17D7]emlxs1: NOTICE: 100: Driver attach. (Firmware:2.82a4 (Z3F2.82A4) Boot:5.02a1 Fcode:none)
Jun 27 09:23:37 myhost emlxs: [ID 349649 kern.info] [ B.1800]emlxs1: NOTICE: 100: Driver attach. (SLI:3 MSI:2 NPIV:0 FCA)
Jun 27 09:23:37 myhost emlxs: [ID 349649 kern.info] [ B.1806]emlxs1: NOTICE: 100: Driver attach. (WWPN:10000000C98560C3 WWNN:20000000C98560C3)
Jun 27 09:23:37 myhost pcieb: [ID 586369 kern.info] PCIE-device: fibre-channel@0,1, emlxs1
Jun 27 09:23:37 myhost genunix: [ID 936769 kern.info] emlxs1 is /pci@0/pci@0/pci@8/fibre-channel@0,1
Jun 27 09:23:37 myhost pcieb: [ID 586369 kern.info] PCIE-device: fibre-channel@0,1, emlxs1
The messages above show that port 1 of the adapter could be initialized, but the driver did not attach properly: the path is /pci@0/pci@0/pci@8/fibre-channel@0,1 instead of the expected /pci@0/pci@0/pci@8/SUNW,emlxs@0,1.
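When the messages file is large, the emlxs entries can be grouped by driver instance to show which instance reported errors. The sketch below is an assumption-based example in Python. It relies on the message layout seen in the excerpt above ("]emlxsN: LEVEL: code: text"), and the function name errors_by_instance is illustrative.

```python
import re
from collections import defaultdict

# Selected lines copied from the messages excerpt in this document.
MESSAGES = """\
Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [13.0812]emlxs0: ERROR: 240: Adapter reset failed. (Timeout: status=0x1)
Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [13.00FB]emlxs0: ERROR: 201: Adapter initialization failed. (Unable to init hba.)
Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [ 5.067D]emlxs0: ERROR: 201: Adapter initialization failed. (status=5)
Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [ B.19B2]emlxs0: ERROR: 101: Driver attach failed. (Unable to initialize adapter.)
Jun 27 09:23:36 myhost emlxs: [ID 349649 kern.info] [13.02E9]emlxs1: NOTICE: 200: Adapter initialization. (Firmware update not needed.)
Jun 27 09:23:37 myhost emlxs: [ID 349649 kern.info] [ B.17CA]emlxs1: NOTICE: 100: Driver attach. (LPem11002-S Dev_id:fc20 Sub_id:fc2e Id:31)
"""

# Assumed layout after the "[xx.yyyy]" tag: "emlxsN: LEVEL: code: text".
EMLXS_RE = re.compile(
    r"\](?P<instance>emlxs\d+): (?P<level>[A-Z]+): +(?P<code>\d+): (?P<text>.*)$"
)


def errors_by_instance(log_text):
    """Map each emlxs instance to the (code, text) pairs of its ERROR lines."""
    errors = defaultdict(list)
    for line in log_text.splitlines():
        match = EMLXS_RE.search(line)
        if match and match.group("level") == "ERROR":
            errors[match.group("instance")].append(
                (int(match.group("code")), match.group("text"))
            )
    return dict(errors)


failed = errors_by_instance(MESSAGES)
for instance, entries in failed.items():
    for code, text in entries:
        print(instance, code, text)
```

For the excerpt above, only emlxs0 appears in the result, which matches the observation that instance 0 failed to initialize while instance 1 reported NOTICE messages only.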
- /etc/path_to_inst
# grep emlxs /etc/path_to_inst
"/pci@0/pci@0/pci@8/fibre-channel@0" 0 "emlxs"
"/pci@0/pci@0/pci@8/fibre-channel@0,1" 1 "emlxs"
"/pci@0/pci@0/pci@9/SUNW,emlxs@0" 2 "emlxs"
"/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@0,0" 1 "fp"
"/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@1,0" 4 "fp"
"/pci@0/pci@0/pci@9/SUNW,emlxs@0,1" 3 "emlxs"
"/pci@0/pci@0/pci@9/SUNW,emlxs@0,1/fp@0,0" 3 "fp"
...
Look up "emlxs" in /etc/path_to_inst to retrieve the path related to the driver instance ("/pci@0/pci@0/pci@8/fibre-channel@0" -> driver instance 0 of driver "emlxs"), which is Express Module PCI-EM1 per prtdiag output.
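The lookup from driver instance to device path can also be scripted. The following Python sketch assumes the usual /etc/path_to_inst entry format of a quoted physical path, an instance number, and a quoted driver name, as seen in the grep output above. The function name instance_paths is hypothetical.

```python
import re

# Entries copied from the grep output in this document.
PATH_TO_INST = '''\
"/pci@0/pci@0/pci@8/fibre-channel@0" 0 "emlxs"
"/pci@0/pci@0/pci@8/fibre-channel@0,1" 1 "emlxs"
"/pci@0/pci@0/pci@9/SUNW,emlxs@0" 2 "emlxs"
"/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@0,0" 1 "fp"
"/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@1,0" 4 "fp"
"/pci@0/pci@0/pci@9/SUNW,emlxs@0,1" 3 "emlxs"
"/pci@0/pci@0/pci@9/SUNW,emlxs@0,1/fp@0,0" 3 "fp"
'''

# One entry: "physical path" instance-number "driver name".
ENTRY_RE = re.compile(
    r'"(?P<path>[^"]+)"\s+(?P<instance>\d+)\s+"(?P<driver>[^"]+)"'
)


def instance_paths(text, driver):
    """Return {instance number: physical path} for one driver."""
    return {
        int(m["instance"]): m["path"]
        for m in ENTRY_RE.finditer(text)
        if m["driver"] == driver
    }


paths = instance_paths(PATH_TO_INST, "emlxs")
# emlxs0 is the instance that logged the initialization errors.
print("emlxs0 ->", paths[0])
```

Matching on the driver field rather than on the path avoids picking up the fp entries, whose paths also contain the string "emlxs".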
The firmware of the Express Module in slot PCI-EM1 did not respond properly, so the SUNW,emlxs driver failed to attach. Replacing the Express Module resolved the issue and cleared the SUN4-8000-5A fault.
References
<NOTE:1021362.1> - SUN4-8000-5A - Defective PCI-Express Device Driver