Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-2007819.1
Update Date: 2017-11-27
Keywords:

Solution Type  Sun Alert Sure

Solution  2007819.1 :   Solaris 10 and Solaris 11 Fault Management Architecture (FMA) on SPARC T5-2 Systems May Report PCIEX-8000-YJ Major Events  


Related Items
  • Sun Software - Generic
  • Solaris Operating System
  • SPARC T5-2
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert




In this Document
Description
Occurrence
Symptoms
Workaround
Patches
History
References


Applies to:

Solaris SPARC Operating System
Sun Software - Generic
SPARC T5-2
SPARC
Information in this document applies to any platform.
_________________________________________



Date of Workaround Release: 07-May-2015
Date of Resolved Release: 18-May-2015
_________________________________________

Description

Solaris 10 and Solaris 11 Fault Management Architecture (FMA) on SPARC T5-2 systems may report 'PCIEX-8000-YJ' major events in response to a series of 'ereport.io.pciex.dl.rto' ereports. These fault reports may lead to unnecessary hardware replacement.

Occurrence

This issue can occur in the following releases on SPARC T5-2 Systems:

SPARC Platform:

  • Solaris 10 without patch 149279-04
  • Solaris 11.0.0.2.0 through Solaris 11.2.9.5.0

Note 1: Solaris 8 and Solaris 9 will not be evaluated regarding the potential impact of the issue described in this document.

Note 2: This issue only affects SPARC T5-2 platforms; other SPARC T-series and x86 platforms are not affected.
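
When checking a system against the releases listed above, the Solaris 11 version strings need to be compared field by field as numbers; a plain string comparison would sort 11.2.10.5.0 before 11.2.9.5.0. The Python sketch below illustrates one way to do this. It assumes the version is already available as a dotted string in the form used in this document and that patch IDs use the base-revision form; the function names are hypothetical and are not part of any Solaris tool.

```python
def parse_version(text):
    """Turn a dotted version such as '11.2.9.5.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Range bounds quoted from this document.
FIRST_AFFECTED = parse_version("11.0.0.2.0")
LAST_AFFECTED = parse_version("11.2.9.5.0")


def solaris11_affected(version):
    """True if the Solaris 11 version lies inside the affected range."""
    return FIRST_AFFECTED <= parse_version(version) <= LAST_AFFECTED


def patch_has_fix(patch_id, base="149279", fixed_rev=4):
    """True if a Solaris 10 patch ID such as '149279-04' is at or past the fix."""
    patch_base, _, rev = patch_id.strip().partition("-")
    return patch_base == base and rev.isdigit() and int(rev) >= fixed_rev


if __name__ == "__main__":
    for v in ("11.2.9.5.0", "11.2.10.5.0"):
        print(v, "affected" if solaris11_affected(v) else "not affected")
    for p in ("149279-03", "149279-04"):
        print(p, "has fix" if patch_has_fix(p) else "lacks fix")
```

Tuples of integers compare element by element, which is why 11.2.10.5.0 correctly falls after 11.2.9.5.0 here.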

Symptoms

If the described issue occurs, fmdump(1M) will show 'ereport.io.pciex.dl.rto' ereports, and fmadm(1M) will show a device fault similar to the following:

    % fmdump -e | grep 'dl.rto'
    Mar 20 09:07:49.5799 ereport.io.pciex.dl.rto        
    Mar 20 09:09:54.3963 ereport.io.pciex.dl.rto        
    Mar 20 09:30:07.2231 ereport.io.pciex.dl.rto        
    Mar 20 09:52:47.2987 ereport.io.pciex.dl.rto        
    Mar 20 10:06:18.8544 ereport.io.pciex.dl.rto        
    Mar 20 10:09:16.1113 ereport.io.pciex.dl.rto        
    Mar 20 10:10:07.0947 ereport.io.pciex.dl.rto        
    Mar 20 10:22:56.2874 ereport.io.pciex.dl.rto        
    Mar 20 10:30:08.9394 ereport.io.pciex.dl.rto

    % fmadm faulty
    --------------- ------------------------------------  -------------- ---------
    TIME            EVENT-ID                              MSG-ID         SEVERITY
    --------------- ------------------------------------  -------------- ---------
    Mar 20 10:30:09 b81624b5-a939-6195-e9a2-b5b1ff54af60  PCIEX-8000-YJ  Major

    Host        : xxx
    Platform    : sun4v-platform    Chassis_id  : xxx
    Product_sn  : xxx

    Fault class : fault.io.pciex.device-pcie-ce 67%
                  fault.io.pciex.bus-linkerr-corr 33%
    Affects     : dev:////pci@340/pci@1/pci@0
                  faulted but still in service
    FRU         : "/SYS/MB" (hc://:product-id=sun4v-platform:product-sn=xxx:server-id=xxx:chassis-id=xxx:serial=xxx:part=7076601:revision=01/chassis=0/motherboard=0)
                  faulty

    Description : Too many recovered bus errors have been detected, which indicates
                  a problem with the specified bus or with the specified
                  transmitting device. This may degrade into an unrecoverable
                  fault.

    Response    : One or more device instances may be disabled

    Impact      : Loss of services provided by the device instances associated with
                  this fault

    Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
                  If a plug-in card is involved check for badly-seated cards or
                  bent pins. Please refer to the associated reference document at
                  http://sun.com/msg/PCIEX-8000-YJ for the latest service
                  procedures and policies regarding this diagnosis.
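
To gauge how many 'dl.rto' ereports were logged and over what period, saved 'fmdump -e' output can be summarized with a short script. The Python sketch below is one possible approach, not a supported tool. It assumes the four-column layout shown above (month, day, time, ereport class) with English month abbreviations; because that layout carries no year, the caller supplies one, and the value 2015 used here is only a placeholder. The name rto_times is hypothetical.

```python
from datetime import datetime

RTO_CLASS = "ereport.io.pciex.dl.rto"


def rto_times(fmdump_text, year):
    """Return the timestamps of dl.rto ereport lines in 'fmdump -e' text.

    Assumes lines of the form 'Mar 20 09:07:49.5799 <class>'. The year is
    not present in that layout, so it is passed in by the caller.
    """
    times = []
    for line in fmdump_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[3] == RTO_CLASS:
            stamp = f"{year} {fields[0]} {fields[1]} {fields[2]}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S.%f"))
    return times


# Sample lines copied from the output shown in this document.
SAMPLE = """\
Mar 20 09:07:49.5799 ereport.io.pciex.dl.rto
Mar 20 09:09:54.3963 ereport.io.pciex.dl.rto
Mar 20 09:30:07.2231 ereport.io.pciex.dl.rto
Mar 20 09:52:47.2987 ereport.io.pciex.dl.rto
Mar 20 10:06:18.8544 ereport.io.pciex.dl.rto
Mar 20 10:09:16.1113 ereport.io.pciex.dl.rto
Mar 20 10:10:07.0947 ereport.io.pciex.dl.rto
Mar 20 10:22:56.2874 ereport.io.pciex.dl.rto
Mar 20 10:30:08.9394 ereport.io.pciex.dl.rto
"""

times = rto_times(SAMPLE, year=2015)  # placeholder year, see above
span = times[-1] - times[0]
print(len(times), "dl.rto ereports over", span)
```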

Workaround

This issue is addressed in the following releases:

SPARC Platform:

  • Solaris 10 with patch 149279-04 or later
  • Solaris 11.2.10.5.0 or later

Patches

 <PATCH:149279-04>

History

07-May-2015: Document released, status Workaround.
15-May-2015: Updated with Solaris 10 patch information.
18-May-2015: Updated with Solaris 11 fix. Status Resolved.

Internal Section: Comments:

The issue here is that PCIe Replay Time Out (RTO) events occur often enough to exceed the FMA Soft Error Rate Discriminator (SERD) threshold, so the SERD engine generates an FMA fault. The fix for this issue raises the RTO FMA SERD threshold to 18 events in 2 hours (previously 6 events in 2 hours).
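
The effect of the threshold change can be illustrated with a simplified sliding-window model. The Python sketch below is not the FMA implementation: it assumes the engine fires once at least N events fall within one window of length T, whereas the real SERD engine may count differently (for instance per device path). The function name and the minute-based representation are hypothetical.

```python
def serd_trips(event_minutes, n, window_minutes):
    """True if any n events fall within one window of window_minutes.

    Simplified model only; see the assumptions stated in the text above.
    """
    events = sorted(event_minutes)
    for i in range(len(events) - n + 1):
        # events[i] .. events[i + n - 1] are n consecutive events.
        if events[i + n - 1] - events[i] <= window_minutes:
            return True
    return False


# Times of the nine sample ereports from Symptoms, as minutes after
# midnight (seconds dropped).
SAMPLE = [
    9 * 60 + 7, 9 * 60 + 9, 9 * 60 + 30, 9 * 60 + 52,
    10 * 60 + 6, 10 * 60 + 9, 10 * 60 + 10, 10 * 60 + 22, 10 * 60 + 30,
]

print("6 events in 2 hours :", serd_trips(SAMPLE, 6, 120))
print("18 events in 2 hours:", serd_trips(SAMPLE, 18, 120))
```

Under this model, the nine sample ereports exceed the previous threshold of 6 events in 2 hours but stay below the new threshold of 18 events in 2 hours.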

Note that 20245857 is a software workaround for T5-2 hardware CR 18504976, which tracks the investigation into the underlying cause of the RTOs.

Questions regarding this document should be addressed to
sunalertpublication_us_grp@oracle.com, with a copy to the
responsible engineer listed below.

Internal Contributor/Submitter: daniel.ice@oracle.com
Internal Eng Responsible Engineer:  daniel.ice@oracle.com
Oracle Knowledge Analyst: jeff.folla@oracle.com
Internal Eng Business Unit Group: Systems RPE
Internal Associated SRs: 3-10448066231, 3-10448380851, 3-10466059695, 3-10240009291,
3-10093660061, 3-10087820551, 3-10024267795, 3-10343354651, 3-10308431283, 3-10489080881
Internal Pending Patches:
Internal Resolution Patches: 149279-04, 11.2.10.5.0

References

<BUG:20245857> - FAULT.IO.PCIEX.DEVICE-PCIE-CE & FAULT.IO.PCIEX.BUS-LINKERR-CORR REPORT ON T5-2


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.