Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1568256.1
Update Date:2018-03-07
Keywords:

Solution Type  Problem Resolution Sure

Solution  1568256.1 :   Error messages of a failed Fibrechannel Host Bus Adaptor, identify a reason for SUN4-8000-5A fault  


Related Items
  • Emulex LP11000 4Gb FC PCI-X HBA, 1 Port
  • Sun Blade 6000 System
Related Categories
  • PLA-Support>Sun Systems>DISK>HBA>SN-DK: FC HBA


This document describes how to identify a failed Host Bus Adaptor (HBA) on a Solaris SPARC system.

In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-7494796125>

Applies to:

Emulex LP11000 4Gb FC PCI-X HBA, 1 Port - Version All Versions to All Versions [Release All Releases]
Sun Blade 6000 System - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on SPARC (64-bit)

Symptoms

The Fault Management Architecture (FMA) on a T6320 blade reported a SUN4-8000-5A fault followed by a PCIEX-8000-0A fault.
The administrator cleared the PCIEX-8000-0A fault with the "fmadm repair" command, but the SUN4-8000-5A fault remained active.

This document shows how to identify a failed Host Bus Adaptor as the root cause of the reported faults.

Cause

1) PCIEX-8000-0A fault

  • The PCIEX-8000-0A fault could be retrieved with the command

    fmadm faulty -a

  • The device marked as faulty was a Fibre Channel HBA in slot PCI-EM1:

    --------------- ------------------------------------  -------------- ---------
    TIME            EVENT-ID                              MSG-ID         SEVERITY
    --------------- ------------------------------------  -------------- ---------
    Jun 27 11:16:44 5c9c0dee-ec3f-4b4f-b3c1-f26b856b071d  PCIEX-8000-0A  Critical  
    
    Host        : myhost
    Platform    : SUNW,Sun-Blade-T6320	Chassis_id  : 
    Product_sn  : 
    Fault class : fault.io.pciex.device-interr
    Affects     : dev:////pci@0/pci@0/pci@8/fibre-channel@0
                      faulted and taken out of service
    FRU         : "PCI-EM1" (hc://:product-id=SUNW,Sun-Blade-T6320:server-id=myhost:chassis-id=1244CCR-08260002BR/motherboard=0/chip=0/hostbridge=0/pciexrc=0/pciexbus=2/pciexdev=0/pciexfn=0/pciexbus=3/pciexdev=8/pciexfn=0/pciexbus=8/pciexdev=0)
                      not present


2) SUN4-8000-5A fault

  • The SUN4-8000-5A fault was still active, as shown by fmadm faulty:

    --------------- ------------------------------------  -------------- ---------
    TIME            EVENT-ID                              MSG-ID         SEVERITY
    --------------- ------------------------------------  -------------- ---------
    Jun 27 11:16:39 514d5fc0-5ae4-65b8-e881-ac115d311c55  SUN4-8000-5A   Critical  
    
    Host        : myhost
    Platform    : SUNW,Sun-Blade-T6320	Chassis_id  : 
    Product_sn  : 
    
    Fault class : defect.io.fire.pciex.driver
    Affects     : mod:///mod-name=px/mod-id=34
                      faulted but still in service
    Problem in  : hc://:product-id=SUNW,Sun-Blade-T6320:server-id=myhost:chassis-id=1244CCR-08260002BR/motherboard=0/chip=0/hostbridge=0/pciexrc=0
                      faulted but still in service
    
     
  • The SUN4-8000-5A fault is reported for a problem with the "PCI-Express subsystem software". This can lead to the assumption that the PCI or Fibre Channel drivers caused the faults.
    In this case, however, the root cause of both faults was a failing Host Bus Adaptor (HBA).
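When several faults are logged close together, it can help to condense each record to its message ID, fault class, and affected resource before comparing them. The following is a minimal sketch, assuming the "fmadm faulty" output has the layout shown above; the function name fault_summary is hypothetical and not part of any Solaris tool.

```shell
# fault_summary (hypothetical helper): condense "fmadm faulty" output read
# from stdin to one line per fault: MSG-ID, fault class, affected resource.
# Assumes the record layout shown in this document.
fault_summary() {
    awk '
        # Event line: month day time EVENT-ID(UUID) MSG-ID severity
        NF == 6 && length($4) == 36 && $4 ~ /^[0-9a-f-]+$/ { msgid = $5 }
        # "Fault class : <class>"
        $1 == "Fault" && $2 == "class" { class = $4 }
        # "Affects : <resource>" closes the part of the record we need
        $1 == "Affects" { print msgid, class, $3 }
    '
}

# Usage on a live system (not run here):
#   fmadm faulty -a | fault_summary
```

For the two records in this case, the summary would place the "dev:" resource of the HBA next to the "mod:" resource of the px driver, which makes it easier to see that both faults were raised within seconds of each other on the same PCI-Express root complex.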

 

 

Solution

The following command outputs and files were used to identify the root cause:

  • prtdiag -v

    ...
    ================================= IO Devices =================================
    Slot +            Bus   Name +                            Model        Speed 
    Status            Type  Path                                                 
    ------------------------------------------------------------------------------
    MB/REM/SASHBA     PCIE  LSILogic,sas-pciex1000,58         LSI,1068E    2.5GTx4
                            /pci@0/pci@0/pci@2/LSILogic,sas@0           
    
    MB/PCI-EM1        PCIE  fibre-channel-pciex10df,fe00                   2.5GTx4
                            /pci@0/pci@0/pci@8/fibre-channel            
    MB/PCI-EM1        PCIE  fibre-channel-pciex10df,fe00      LPem11002-S  2.5GTx4
                            /pci@0/pci@0/pci@8/fibre-channel@0,1        
    
    MB/PCI-EM0        PCIE  SUNW,emlxs-pci10df,fc20           LPem11002-S  2.5GTx4
                            /pci@0/pci@0/pci@9/SUNW,emlxs@0             
    MB/PCI-EM0        PCIE  SUNW,emlxs-pci10df,fc20           LPem11002-S  2.5GTx4
                            /pci@0/pci@0/pci@9/SUNW,emlxs@0,1           
    
    MB/NET0           PCIE  network-pciex8086,105e                         2.5GTx4
                            /pci@0/pci@0/pci@c/network@0                
    MB/NET1           PCIE  network-pciex8086,105e                         2.5GTx4
    ...
    
     
    The prtdiag output shows LPem11002-S ports, which are Oracle-branded HBA ports. For help with HBA brand identification, see Document 1282491.1.

    While two LPem11002-S ports were reported in the "Model" column for the Express Module in slot PCI-EM0, only one LPem11002-S port was reported for the Express Module in slot PCI-EM1.
    The other difference was that PCI-EM0 had the "SUNW,emlxs" driver attached while PCI-EM1 had the generic "fibre-channel" driver attached.
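The comparison of the "Model" column can be sketched as a small filter. This is only an illustration under stated assumptions: it relies on the whitespace-separated layout of the "IO Devices" table shown above, where an entry with a model has five fields and an entry without one has four, and the function name fc_entries_without_model is hypothetical.

```shell
# fc_entries_without_model (hypothetical helper): read "prtdiag -v" output
# from stdin and print slot and name of Fibre Channel entries whose Model
# column is empty. Assumes the table layout shown in this document.
fc_entries_without_model() {
    awk '
        # Device lines carry the bus type in field 2; path lines do not.
        # Fields: slot, bus type, name, [model], speed
        $2 == "PCIE" && $3 ~ /fibre-channel|emlxs/ && NF == 4 {
            print $1, $3
        }
    '
}

# Usage on a live system (not run here):
#   prtdiag -v | fc_entries_without_model
```

Other devices, such as the onboard network ports in this output, may legitimately have no model string, which is why the filter is restricted to Fibre Channel names.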


  • messages

    Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [13.0812]emlxs0:  ERROR: 240: Adapter reset failed. (Timeout: status=0x1)
    Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [13.00FB]emlxs0:  ERROR: 201: Adapter initialization failed. (Unable to init hba.)
    Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [ 5.067D]emlxs0:  ERROR: 201: Adapter initialization failed. (status=5)
    Jun 27 09:23:33 myhost emlxs: [ID 349649 kern.info] [ B.19B2]emlxs0:  ERROR: 101: Driver attach failed. (Unable to initialize adapter.)
    
     
    The messages above show a failed Adapter initialization. The SUNW,emlxs driver instance 0 could not communicate with the Adapter port.

    Jun 27 09:23:36 myhost emlxs: [ID 349649 kern.info] [13.02E9]emlxs1: NOTICE: 200: Adapter initialization. (Firmware update not needed.)
    ...
    Jun 27 09:23:37 myhost emlxs: [ID 349649 kern.info] [ B.17C7]emlxs1: NOTICE: 100: Driver attach. (Emulex-S s10-64 sparc v2.60k (2011.03.24.16.45))
    Jun 27 09:23:37 myhost emlxs: [ID 349649 kern.info] [ B.17CA]emlxs1: NOTICE: 100: Driver attach. (LPem11002-S Dev_id:fc20 Sub_id:fc2e Id:31)
    Jun 27 09:23:37 myhost emlxs: [ID 349649 kern.info] [ B.17D7]emlxs1: NOTICE: 100: Driver attach. (Firmware:2.82a4 (Z3F2.82A4) Boot:5.02a1 Fcode:none)
    Jun 27 09:23:37 myhost emlxs: [ID 349649 kern.info] [ B.1800]emlxs1: NOTICE: 100: Driver attach. (SLI:3 MSI:2 NPIV:0 FCA)
    Jun 27 09:23:37 myhost emlxs: [ID 349649 kern.info] [ B.1806]emlxs1: NOTICE: 100: Driver attach. (WWPN:10000000C98560C3 WWNN:20000000C98560C3)
    
    Jun 27 09:23:37 myhost pcieb: [ID 586369 kern.info] PCIE-device: fibre-channel@0,1, emlxs1
    Jun 27 09:23:37 myhost genunix: [ID 936769 kern.info] emlxs1 is /pci@0/pci@0/pci@8/fibre-channel@0,1
    Jun 27 09:23:37 myhost pcieb: [ID 586369 kern.info] PCIE-device: fibre-channel@0,1, emlxs1

    The messages above show that port 1 of the Adapter could be initialized but the driver did not attach properly: the device path is /pci@0/pci@0/pci@8/fibre-channel@0,1
    instead of the expected /pci@0/pci@0/pci@8/SUNW,emlxs@0,1.
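To find the affected driver instances quickly in a large log, the "Driver attach failed" errors can be filtered out of the messages file. The sketch below assumes the emlxs message format shown above and the default Solaris log location /var/adm/messages; the function name failed_emlxs_instances is hypothetical.

```shell
# failed_emlxs_instances (hypothetical helper): list the emlxs driver
# instances that logged "ERROR: 101: Driver attach failed", one per line.
# Usage: failed_emlxs_instances [FILE]   (FILE defaults to /var/adm/messages)
failed_emlxs_instances() {
    file=${1:-/var/adm/messages}
    grep 'ERROR: 101: Driver attach failed' "$file" |
        # Keep only the instance name that precedes ": ERROR", e.g. emlxs0
        sed -n 's/.*\(emlxs[0-9][0-9]*\): *ERROR.*/\1/p' |
        sort -u
}
```

In this case the only instance reported would be emlxs0, which can then be mapped to a device path and a slot.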

  • /etc/path_to_inst

    # grep emlxs /etc/path_to_inst

    "/pci@0/pci@0/pci@8/fibre-channel@0" 0 "emlxs"
    "/pci@0/pci@0/pci@8/fibre-channel@0,1" 1 "emlxs"

    "/pci@0/pci@0/pci@9/SUNW,emlxs@0" 2 "emlxs"
    "/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@0,0" 1 "fp"
    "/pci@0/pci@0/pci@9/SUNW,emlxs@0/fp@1,0" 4 "fp"

    "/pci@0/pci@0/pci@9/SUNW,emlxs@0,1" 3 "emlxs"
    "/pci@0/pci@0/pci@9/SUNW,emlxs@0,1/fp@0,0" 3 "fp"
    ...
     
    Look up "emlxs" in /etc/path_to_inst to find the device path bound to each driver instance. Instance 0 of the "emlxs" driver maps to "/pci@0/pci@0/pci@8/fibre-channel@0", which is Express Module PCI-EM1 according to the prtdiag output.
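The lookup from driver instance to device path can also be scripted. This is a minimal sketch that assumes the three-field /etc/path_to_inst format shown above (quoted physical path, instance number, quoted driver name); the function name inst_to_path is hypothetical.

```shell
# inst_to_path (hypothetical helper): print the physical device path that
# is bound to a given driver instance.
# Usage: inst_to_path DRIVER INSTANCE [FILE]   (FILE defaults to /etc/path_to_inst)
inst_to_path() {
    drv=$1
    inst=$2
    file=${3:-/etc/path_to_inst}
    # Each entry has three fields: "physical path" instance "driver"
    awk -v drv="\"$drv\"" -v inst="$inst" '
        $3 == drv && $2 == inst {
            gsub(/"/, "", $1)    # strip the surrounding quotes
            print $1
        }' "$file"
}

# Usage on a live system (not run here):
#   inst_to_path emlxs 0
```

The path returned for instance 0 can then be matched against the "Path" lines of the prtdiag output to find the slot name.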


    The firmware of the Express Module in slot PCI-EM1 did not respond properly, so the SUNW,emlxs driver failed to attach.
    Replacing the Express Module resolved the issue and cleared the SUN4-8000-5A fault.

 

 

 

 

References

<NOTE:1021362.1> - SUN4-8000-5A - Defective PCI-Express Device Driver

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.