Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1624749.1
Update Date:2017-10-05

Solution Type  Problem Resolution Sure

Solution  1624749.1 :   Emulex HBA faulted following ereport.io.pciex.pl.re events when operating at PCIe Gen2  


Related Items
  • SPARC T3-4
  • SPARC T4-4
  • SPARC SuperCluster T4-4
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T4




In this Document
Symptoms
Cause
Solution


Created from <SR 3-8218008443>

Applies to:

SPARC T4-4 - Version All Versions to All Versions [Release All Releases]
SPARC T3-4 - Version All Versions to All Versions [Release All Releases]
SPARC SuperCluster T4-4 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

Systems report repeated FMA faults against Emulex Dual FC/Gigabit Ethernet host adapters operating at PCIe Gen2 link speed.

Impacted component details:

371-4666-02 / SG-XPCIEFCGBE-E8-Z
Emulex LPem12002E-S

Fault events reported by 'fmadm faulty' will look similar to the following:

--------------- ------------------------------------ -------------- ---------
TIME EVENT-ID MSG-ID SEVERITY
--------------- ------------------------------------ -------------- ---------
Feb 08 22:51:10 d57ba133-abf6-c03c-8419-a4663472040b PCIEX-8000-J5 Major

Host : t4test
Platform : ORCL,SPARC-T4-4 Chassis_id :
Product_sn :

Fault class : fault.io.pciex.device-interr-corr
Affects : dev:////pci@600/pci@1/pci@0/pci@4/pci@0
faulted but still in service
FRU : "PCI-EM4" (hc://:product-id=ORCL,SPARC-T4-4:product-sn=1204BDY993:server-id=t4cdrl01:chassis-id=1204BDY993/chassis=0/motherboard=0/hostbridge=2/pciexrc=4/pciexbus=1/pciexdev=0/pciexfn=0/pciexbus=2/pciexdev=4/pciexfn=0/pciexbus=4/pciexdev=0)
faulty

Description : Too many recovered internal errors have been detected within the
specified PCIEX device. This may degrade into a non-recoverable
fault.
Refer to http://sun.com/msg/PCIEX-8000-J5 for more information.

Response : One or more device instances may be disabled

Impact : Loss of services provided by the device instances associated with
this fault

Action : Schedule a repair procedure to replace the affected device. Use
fmadm faulty to identify the device or contact Sun for support.
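
When many hosts need to be checked, saved 'fmadm faulty' output can be screened for this signature. The following is a minimal sketch only: it assumes the output layout matches the sample above, and the function and constant names are hypothetical. A match identifies a candidate, not a confirmed case; the link speed must still be checked.

```python
import re

# Signature values taken from the sample fault event above.
SIGNATURE_MSG_ID = "PCIEX-8000-J5"
SIGNATURE_CLASS = "fault.io.pciex.device-interr-corr"


def matches_signature(fmadm_text):
    """Return the affected device path if the text shows this fault, else None.

    Assumes "Fault class :" and "Affects :" each start a line, as in the sample.
    """
    if SIGNATURE_MSG_ID not in fmadm_text:
        return None
    fault_class = re.search(r"^Fault class\s*:\s*(\S+)", fmadm_text, re.MULTILINE)
    if not fault_class or fault_class.group(1) != SIGNATURE_CLASS:
        return None
    affects = re.search(r"^Affects\s*:\s*(\S+)", fmadm_text, re.MULTILINE)
    return affects.group(1) if affects else ""


# Excerpt of the sample event shown in this document.
SAMPLE = """\
Feb 08 22:51:10 d57ba133-abf6-c03c-8419-a4663472040b PCIEX-8000-J5 Major
Fault class : fault.io.pciex.device-interr-corr
Affects : dev:////pci@600/pci@1/pci@0/pci@4/pci@0
"""

if __name__ == "__main__":
    print(matches_signature(SAMPLE))
```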

 

All 371-4666-02 Emulex HBAs should operate at PCIe Gen1 (2.5GT/s); if they do not, there is a risk they will suffer from excessive PCIe correctable errors. To confirm the card's configured operating speed, first check prtdiag:

/SYS/PCI-EM4 PCIE SUNW,assigned-device-pciex10df,fc40 5.0GTx4
/pci@600/pci@1/pci@0/pci@4/pci@0/pci@3/SUNW,assigned-device@0
/SYS/PCI-EM4 PCIE SUNW,assigned-device-pciex10df,fc40 5.0GTx4
/pci@600/pci@1/pci@0/pci@4/pci@0/pci@3/SUNW,assigned-device@0,1

 

In this example the impacted device is operating at the incorrect PCIe Gen2 speed (5.0GT/s) and should be replaced.
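
The prtdiag check can be automated against captured output. The sketch below is illustrative only: it assumes each adapter entry has the same two-line layout as the sample above (a slot line ending in the link speed and width, followed by the device path), and the function name and regular expression are hypothetical, not part of any Oracle tool.

```python
import re

# Assumption: a slot line looks like "<slot> PCIE <name> <speed>GTx<width>",
# as in the sample above, and the device path follows on the next line.
ENTRY = re.compile(
    r"^(?P<slot>/SYS/\S+)\s+PCIE\s+(?P<name>\S+)\s+"
    r"(?P<speed>\d+\.\d+)GTx(?P<width>\d+)\s*$"
)
EMULEX_ID = "pciex10df,fc40"  # device ID as shown in the sample output


def find_gen2_emulex(prtdiag_text):
    """Return (slot, device_path, speed) for Emulex entries above 2.5GT/s."""
    hits = []
    lines = prtdiag_text.splitlines()
    for i, line in enumerate(lines):
        m = ENTRY.match(line.strip())
        if not m or EMULEX_ID not in m.group("name"):
            continue
        speed = float(m.group("speed"))
        if speed > 2.5:
            path = lines[i + 1].strip() if i + 1 < len(lines) else ""
            hits.append((m.group("slot"), path, speed))
    return hits


# The prtdiag excerpt shown in this document.
SAMPLE = """\
/SYS/PCI-EM4 PCIE SUNW,assigned-device-pciex10df,fc40 5.0GTx4
/pci@600/pci@1/pci@0/pci@4/pci@0/pci@3/SUNW,assigned-device@0
/SYS/PCI-EM4 PCIE SUNW,assigned-device-pciex10df,fc40 5.0GTx4
/pci@600/pci@1/pci@0/pci@4/pci@0/pci@3/SUNW,assigned-device@0,1
"""

if __name__ == "__main__":
    for slot, path, speed in find_gen2_emulex(SAMPLE):
        print(f"{slot} {path}: {speed}GT/s (expected 2.5GT/s)")
```

Both functions of the dual-port card are reported, so a single faulty HBA produces two entries.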

Alternatively, run POST in operations mode; the PCIe switch configuration, including link speed and width, is reported towards the end of POST.

On the SP:

set /HOST/diag mode=ops0

set /HOST/diag level=max

set /HOST/diag verbosity=max

Cause

The Emulex HBA (371-4666-02/SG-XPCIEFCGBE-E8-Z) should negotiate at PCIe Gen1 (2.5GT/s). If it is configured to operate at PCIe Gen2 (5.0GT/s), the card can generate excessive PCIe correctable errors, which result in an FMA fault event if a sufficient density of events occurs.
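
The "sufficient density of events" condition can be pictured as a rate threshold: a fault is declared once at least N events are seen within a time window T (FMA refers to this pattern as a SERD engine). The sketch below is a simplified illustration of that idea, not FMA's implementation; the class name is hypothetical, and the thresholds are placeholders because this document does not state the values used for this fault class.

```python
from collections import deque


class RateThreshold:
    """Fires once at least n events fall within a sliding window of t seconds."""

    def __init__(self, n, t):
        self.n = n            # hypothetical event count threshold
        self.t = t            # hypothetical window length in seconds
        self.events = deque()

    def record(self, timestamp):
        """Record one event; return True if the threshold is now reached."""
        self.events.append(timestamp)
        # Discard events that have aged out of the window.
        while self.events and timestamp - self.events[0] > self.t:
            self.events.popleft()
        return len(self.events) >= self.n


if __name__ == "__main__":
    engine = RateThreshold(n=3, t=10.0)  # placeholder values
    for ts in (0.0, 1.0, 2.0):
        print(ts, engine.record(ts))
```

Isolated correctable errors age out of the window without effect, which is why only a sustained stream of them, as produced by a card running at the wrong link speed, leads to a fault.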
 

Solution

Replace the impacted Emulex HBA with part number 371-4666-02 or 7053435.

 

This issue is documented in Bug 18068575. A manufacturing process issue at Emulex caused a number of LPem12002E-S HBAs to be flashed with incorrect parameters that allowed the HBA to negotiate to PCIe Gen2 speeds. Unlike the process to update the preload table on QLogic HBAs, there is no way to update these flash parameters outside of the Emulex manufacturing environment, so, as documented in Doc 1624749.1, the HBA must be replaced. Emulex has added an additional test at manufacturing to screen for the correct flash parameters so that this issue does not recur.


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.