Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-77-1369835.1
Update Date:2018-02-13
Keywords:

Solution Type  Sun Alert Sure

Solution  1369835.1 :   Solaris 10 SPARC Kernel Patch 137137-09 May Cause Erroneous PCIEX-8000-KP/-J5 Reports During PCIE Correctable Events  


Related Items
  • Sun SPARC Enterprise M9000-64 Server
  • Sun SPARC Enterprise M9000-32 Server
  • Sun SPARC Enterprise M5000 Server
  • Sun SPARC Enterprise M8000 Server
  • Sun Hardware - Generic
  • Solaris Operating System
  • Sun SPARC Enterprise M4000 Server
  • Sun SPARC Enterprise M3000 Server
Related Categories
  • PLA-Support>Sun Systems>Sun_Other>Sun Collections>SN-OTH: Sun Alert
  • _Old GCS Categories>Sun Microsystems>Sun Alert>Release Phase>Resolved




In this Document
Description
Occurrence
Symptoms
Workaround
Patches
History
References


Applies to:

Sun SPARC Enterprise M8000 Server - Version Not Applicable and later
Sun SPARC Enterprise M9000-32 Server - Version Not Applicable and later
Sun SPARC Enterprise M4000 Server - Version Not Applicable and later
Sun Microsystems > Operating Systems > Solaris Operating System
Solaris Operating System - Version 10 10/09 U8 to 10 10/09 U8 [Release 10.0]
Information in this document applies to any platform.
_____________________



Date of Resolved Release: 21-Oct-2011
____________________________________


Description

An issue with the Fault Management Architecture (FMA) in Solaris 10 SPARC kernel patch 137137-09 and in certain Solaris 11 Express builds may cause erroneous PCIEX-8000-KP/PCIEX-8000-J5 reports during PCIe correctable error events. These erroneous reports may lead to unnecessary hardware replacement.

Occurrence

This issue can occur in the following releases:

SPARC Platform

  • Solaris 10 with patch 137137-09 and without patch 147705-01
  • Solaris 11 Express based upon builds snv_87 through snv_170
Note 1: All SPARC platforms with PCI-E I/O Expansion Slots are impacted by this issue.

Note 2: Solaris 8, Solaris 9, and Solaris on the x86 platform are not impacted by this issue.
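For Solaris 10, the condition listed above (patch 137137-09 present, patch 147705-01 absent) can be checked against the system's patch list. The following is a minimal sketch, not an Oracle-supplied tool. It assumes the listing uses the "Patch: NNNNNN-RR ..." line format printed by showrev -p, and the function names patch_rev and s10_is_affected are hypothetical. It only looks for the two patch IDs named in this alert, so a release that incorporates 137137-09 without listing it separately will not be detected.

```shell
#!/bin/sh
# Hypothetical helpers. Input is assumed to be "showrev -p" style lines:
#   Patch: 137137-09 Obsoletes: ... Requires: ... Packages: ...

# Print the highest listed revision of patch ID $1 (leading zeros
# stripped), or nothing if the patch does not appear on stdin.
patch_rev() {
    awk -v id="$1" '
        $1 == "Patch:" {
            split($2, p, "-")
            if (p[1] == id && (best == "" || p[2] + 0 > best + 0))
                best = p[2]
        }
        END { if (best != "") print best + 0 }'
}

# Return 0 (affected) if 137137-09 or a later revision is listed and no
# revision of 147705 (i.e. 147705-01 or later) is listed; 1 otherwise.
s10_is_affected() {
    list=$(cat)
    kr=$(printf '%s\n' "$list" | patch_rev 137137)
    fr=$(printf '%s\n' "$list" | patch_rev 147705)
    if [ -z "$kr" ] || [ "$kr" -lt 9 ]; then
        return 1
    fi
    if [ -n "$fr" ]; then
        return 1
    fi
    return 0
}
```

On a Solaris 10 system this might be used as: showrev -p | s10_is_affected && echo "possibly affected".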

Note 3: Solaris 11 Express distributions may include bug fixes beyond those in the build from which they were derived. The base build can be determined as follows:

   $ uname -v
   snv_151

If the output is of the format 151.x.x.x, then the build installed is snv_151.
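The mapping in Note 3 can be expressed as a small check against the affected range snv_87 through snv_170. This is a sketch under the assumption that uname -v prints one of the two forms described above ("snv_151" or "151.x.x.x"); the function names base_build and build_is_affected are hypothetical. Any other string, such as a Solaris 10 kernel version, is reported as unrecognized rather than guessed at.

```shell
#!/bin/sh
# Hypothetical helpers for Note 3: map a "uname -v" string to its base
# build number and test it against the affected range snv_87..snv_170.

# "snv_151" -> 151, "151.0.1.12" -> 151; prints nothing if unrecognized.
base_build() {
    printf '%s\n' "$1" | sed -e 's/^snv_//' -e 's/[^0-9].*//'
}

# Return 0 if affected, 1 if outside the range, 2 if the format is unknown.
build_is_affected() {
    b=$(base_build "$1")
    if [ -z "$b" ]; then
        return 2
    fi
    if [ "$b" -ge 87 ] && [ "$b" -le 170 ]; then
        return 0
    fi
    return 1
}
```

Example: build_is_affected "$(uname -v)"; echo $? prints 0 on an snv_151 system.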

Symptoms

When patch 137137-09 is installed, or a system is upgraded to a release that includes this patch or to an affected Solaris 11 Express build, FMA may report correctable errors not previously observed on the system. Eventually, suspect devices may be reported as faulty if Soft Error Rate Discrimination (SERD) thresholds are exceeded.

If the described issue occurs, a message similar to the following is displayed on the system console:

    SUNW-MSG-ID: PCIEX-8000-KP, TYPE: Fault, VER: 1, SEVERITY: Major
    EVENT-TIME: Tue Mar  29 21:03 PDT 2011
    PLATFORM: SUNW,SPARC-Enterprise , CSN: -, HOSTNAME: -
    SOURCE: eft, REV: 1.16
    EVENT-ID: af46a1fb-a712-617b-cab3-fc57b79a1dd9
    DESC: Too many recovered bus errors have been detected, which indicates a problem with the specified bus
    or with the specified transmitting device. This may degrade into an unrecoverable fault.
    
    Refer to http://sun.com/msg/PCIEX-8000-KP for more information.
    
    AUTO-RESPONSE: One or more device instances may be disabled

    IMPACT: Loss of services provided by the device instances associated with this fault

    REQ-ACTION: If a plug-in card is involved check for badly-seated cards or bent pins. Otherwise schedule a repair procedure 
    to replace the affected device.

Use fmadm(1M) faulty to identify the device or contact Oracle for support.

    # fmadm faulty
    --------------- ------------------------------------  -------------- ---------
    TIME            EVENT-ID                              MSG-ID         SEVERITY
    --------------- ------------------------------------  -------------- ---------
    Mar 29 21:22:03 af46a1fb-a712-617b-cab3-fc57b79a1dd9  PCIEX-8000-KP  Major

    Host        : xyz1
    Platform    : SUNW,SPARC-Enterprise     Chassis_id  : xyz2400L

    Fault class : fault.io.pciex.device-interr-corr max 15%
                  fault.io.pciex.bus-linkerr-corr max 8%
    Affects     : dev:////pci@12,600000/network@0,3
                  dev:////pci@12,600000/network@0
                  dev:////pci@12,600000/network@0,1
                  dev:////pci@12,600000/network@0,2
                  dev:////pci@12,600000
                  faulted but still in service

    FRU         : "iou#1-pci#3" (hc://:product-id=SUNW,SPARC-Enterprise:chassis-id=xyz2400L:server-id=xyz1305/chassis=0/ioboard=1/hostbridge=1/pciexrc=0
                  /pciexbus=2/pciexdev=0) max 15%
                  "iou#1-pci#3" (hc:///component=iou#1-pci#3) 8%
                  faulty

    Description: Too many recovered bus errors have been detected, which indicates 
    a problem with the specified bus or with the specified transmitting device. This may degrade into an unrecoverable fault.
    Refer to http://sun.com/msg/PCIEX-8000-KP for more information.

    Response: One or more device instances may be disabled.
    
    Impact: Loss of services provided by the device instances associated with this fault

    Action: If a plug-in card is involved check for badly-seated cards or 
    bent pins. Otherwise schedule a repair procedure to replace the 
    affected device.  Use fmadm(1M) faulty to identify the device or contact Oracle for support.

Then execute the fmstat(1M) command to determine if a SERD threshold has been exceeded.

Note: The output seen when encountering this issue varies with the patch level and with the SERD engine whose threshold was exceeded, as follows:

For patch 142909-17 or patch 147440-01:

    # fmstat -s -m eft
    NAME >N     T CNT                DELTA STAT
    serd.io.device.nonfatal_bdllp@... >6    2h   3    ...
    serd.io.pciex.corrlink-bus_bdllp@... >6    2h   3   ...

or

    serd.io.device.nonfatal_btlp@... >6    2h   3     ...
    serd.io.pciex.corrlink-bus_btlp@... >6    2h   3    ...

or

    serd.io.device.nonfatal_re@... >6    2h   3      ...
    serd.io.pciex.corrlink-bus_re@... >6    2h   3     ...

For patch 141444-09:

    # fmstat -s -m eft
    NAME >N     T CNT                DELTA STAT
    serd.io.device.nonfatal_corr@.. >6    2h   3    ...
    serd.io.pciex.corrlink-bus@... >6    2h   3       ...

For patch 137137-09 or patch 139555-08:

    # fmstat -s -m eft
    NAME >N     T CNT                DELTA STAT
    serd.io.pciex.corrlink@.. >6    2h   3          ...
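To pick out the SERD engines listed above from a longer fmstat listing, a filter such as the following may help. It is a sketch that assumes NAME is the first whitespace-separated field and CNT the fourth, as in the sample output, and it matches only the serd.io.pciex.corrlink* and serd.io.device.nonfatal* prefixes shown above; the function name pciex_serd_rows is hypothetical. It prints each matching engine's name, its >N threshold and its current count.

```shell
#!/bin/sh
# Hypothetical filter for "fmstat -s -m eft" output on stdin.
# Prints NAME, >N threshold and CNT for the PCIe-related SERD engines
# named in this alert; the header line and other engines are skipped.
pciex_serd_rows() {
    awk '$1 ~ /^serd\.io\.pciex\.corrlink/ || $1 ~ /^serd\.io\.device\.nonfatal/ {
        print $1, $2, $4
    }'
}
```

Usage: fmstat -s -m eft | pciex_serd_rows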

Workaround

There is no workaround for this issue.

This issue is resolved in the following releases:

SPARC Platform

  • Solaris 10 with patch 147705-01 or later
  • Solaris 11 Express based upon builds snv_171 or later
Note: After installing the Solaris 10 patch, clear any PCIEX-8000-KP/PCIEX-8000-J5 faults using the fmadm(1M) command:

    # fmadm acquit <EVENT-ID>

where <EVENT-ID> is taken from the EVENT-ID column of the "fmadm faulty" output shown in the Symptoms section above.
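When several faults are present, the EVENT-ID lookup and the acquit step can be combined in a small filter. This is a minimal sketch, assuming the "fmadm faulty" summary row layout shown in the Symptoms section, where the MSG-ID field directly follows the EVENT-ID; the function name pciex_acquit_cmds is hypothetical. It only prints the fmadm acquit commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: read "fmadm faulty" output on stdin and print one
# "fmadm acquit <EVENT-ID>" command per PCIEX-8000-KP/-J5 fault found.
# It only prints the commands; review them before running any.
pciex_acquit_cmds() {
    awk '{
        # In the summary row, MSG-ID directly follows the EVENT-ID field.
        for (i = 2; i <= NF; i++)
            if ($i == "PCIEX-8000-KP" || $i == "PCIEX-8000-J5")
                print "fmadm acquit " $(i - 1)
    }' | sort -u
}
```

Usage: fmadm faulty | pciex_acquit_cmds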

Please also see Reference <Document:1021322.1> for additional information.

Patches

<SUNPATCH:147705-01>

History

21-Oct-2011: Date of Resolved Release
03-Nov-2011: Updated product field to include version for Hot Topics
09-Feb-2012: Updated to include specific Product attribution
16-Aug-2013: Updated to include reference to PCIEX-8000-J5 and 1021322.1

Internal Notes:

This regression was caused by the putback for CR 6510830.
This change was first included in patch 137137-02, but the only revision of that patch available to customers is 137137-09.

In Solaris 11 Express 2010.11 this issue is resolved in SRU12. The SRU installed on a customer system may be determined by running the following command:

# pkg info entire | grep Summary

Summary: entire incorporation including Support Repository
Update (Oracle Solaris 11 Express 2010.11 SRU 11). (....)
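The SRU comparison can be scripted as below. This is a sketch that assumes the Summary text contains "SRU <number>" as in the sample above; the function names sru_number and sru_has_fix are hypothetical. It reads the whole pkg info entire output instead of only the grep Summary line, so it still works if the summary spans lines as in the sample.

```shell
#!/bin/sh
# Hypothetical helpers for "pkg info entire" output on stdin.
# Print the first SRU number found, e.g. "... SRU 11). (....)" -> 11.
sru_number() {
    sed -n 's/.*SRU \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Return 0 if the installed SRU is 12 or later (the fix level named above).
sru_has_fix() {
    n=$(sru_number)
    [ -n "$n" ] && [ "$n" -ge 12 ]
}
```

Usage: pkg info entire | sru_has_fix || echo "SRU 12 or later not installed"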

Apply IDR147518-01 (available since Aug. 09, 2011, built for S10U9 KU 144488-05), which addresses this bug.
It requires FMA patch 146855-01 or later.

Please send technical questions to the following email:
sunalertpublication_us_grp@oracle.com
and copy the Responsible Engineer/Contributor listed below.

Internal Contributor/Submitter: daniel.ice@oracle.com
Internal Eng Responsible Engineer: daniel.ice@oracle.com
Internal Services Knowledge Engineer: jeff.folla@oracle.com
Internal Eng Business Unit Group: Systems RPE
Internal Escalation ID: 3-3313205201 3-3842072191
Internal Resolution Patches: 147705-01


  Copyright © 2018 Oracle, Inc.  All rights reserved.