Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Sun Alert Sure Solution

1369835.1 : Solaris 10 SPARC Kernel Patch 137137-09 May Cause Erroneous PCIEX-8000-KP/-J5 Reports During PCIE Correctable Events
Applies to:
Sun SPARC Enterprise M8000 Server - Version Not Applicable and later
Sun SPARC Enterprise M9000-32 Server - Version Not Applicable and later
Sun SPARC Enterprise M4000 Server - Version Not Applicable and later
Sun Microsystems > Operating Systems > Solaris Operating System
Solaris Operating System - Version 10 10/09 U8 to 10 10/09 U8 [Release 10.0]
Information in this document applies to any platform.

Date of Resolved Release: 21-Oct-2011

Description
An issue with the Fault Management Architecture (FMA) in Solaris 10 SPARC kernel patch 137137-09 and certain Solaris 11 Express builds may cause erroneous PCIEX-8000-KP/PCIEX-8000-J5 reports during PCIE correctable events. These erroneous reports may result in unnecessary hardware replacement.

Occurrence
This issue can occur in the following releases:
Note 1: All SPARC platforms with PCI-E I/O Expansion Slots are impacted by this issue.
Note 2: Solaris 8, Solaris 9, and Solaris on the x86 platform are not impacted by this issue.
Note 3: Solaris 11 Express distributions may include additional bug fixes above and beyond the build from which they were derived. The base build can be derived as follows:

    $ uname -v
    snv_151

If the output is of the format 151.x.x.x, then the build installed is snv_151.

Symptoms
When patch 137137-09 is installed, or a system is upgraded to a release that includes this patch or to an affected Solaris 11 Express build, FMA may report correctable errors not previously observed on the system. Eventually, suspect devices may be reported faulty if Soft Error Rate Discrimination (SERD) thresholds are exceeded.

    SUNW-MSG-ID: PCIEX-8000-KP, TYPE: Fault, VER: 1, SEVERITY: Major
    EVENT-TIME: Tue Mar 29 21:03 PDT 2011
    PLATFORM: SUNW,SPARC-Enterprise, CSN: -, HOSTNAME: -
    SOURCE: eft, REV: 1.16
    EVENT-ID: af46a1fb-a712-617b-cab3-fc57b79a1dd9
    DESC: Too many recovered bus errors have been detected, which indicates a problem with the specified bus or with the specified transmitting device. This may degrade into an unrecoverable fault. Refer to http://sun.com/msg/PCIEX-8000-KP for more information.
    AUTO-RESPONSE: One or more device instances may be disabled
    IMPACT: Loss of services provided by the device instances associated with this fault
    REQ-ACTION: If a plug-in card is involved check for badly-seated cards or bent pins. Otherwise schedule a repair procedure to replace the affected device. Use fmadm(1M) faulty to identify the device or contact Oracle for support.
    # fmadm faulty
    --------------- ------------------------------------ -------------- ---------
    TIME            EVENT-ID                             MSG-ID         SEVERITY
    --------------- ------------------------------------ -------------- ---------
    Mar 29 21:22:03 af46a1fb-a712-617b-cab3-fc57b79a1dd9 PCIEX-8000-KP  Major

    Host        : xyz1
    Platform    : SUNW,SPARC-Enterprise
    Chassis_id  : xyz2400L

    Fault class : fault.io.pciex.device-interr-corr max 15%
                  fault.io.pciex.bus-linkerr-corr max 8%
    Affects     : dev:////pci@12,600000/network@0,3
                  dev:////pci@12,600000/network@0
                  dev:////pci@12,600000/network@0,1
                  dev:////pci@12,600000/network@0,2
                  dev:////pci@12,600000
                      faulted but still in service
    FRU         : "iou#1-pci#3" (hc://:product-id=SUNW,SPARC-Enterprise:chassis-id=xyz2400L:server-id=xyz1305/chassis=0/ioboard=1/hostbridge=1/pciexrc=0/pciexbus=2/pciexdev=0) max 15%
                  "iou#1-pci#3" (hc:///component=iou#1-pci#3) 8%
                      faulty

    Description: Too many recovered bus errors have been detected, which indicates a problem with the specified bus or with the specified transmitting device. This may degrade into an unrecoverable fault. Refer to http://sun.com/msg/PCIEX-8000-KP for more information.
    Response: One or more device instances may be disabled.
    Impact: Loss of services provided by the device instances associated with this fault
    Action: If a plug-in card is involved check for badly-seated cards or bent pins. Otherwise schedule a repair procedure to replace the affected device. Use fmadm(1M) faulty to identify the device or contact Oracle for support.

Then execute the fmstat(1M) command to determine if a SERD threshold has been exceeded.

Note: The output seen when encountering this issue will vary depending upon the patch level and affected SERD threshold as follows:
For patch 142909-17 or patch 147440-01:

    # fmstat -s -m eft
    NAME                                    >N  T   CNT  DELTA  STAT
    serd.io.device.nonfatal_bdllp@...       >6  2h  3    ...
    serd.io.pciex.corrlink-bus_bdllp@...    >6  2h  3    ...

or

    serd.io.device.nonfatal_btlp@...        >6  2h  3    ...
    serd.io.pciex.corrlink-bus_btlp@        >6  2h  3    ...

or

    serd.io.device.nonfatal_re@...          >6  2h  3    ...
    serd.io.pciex.corrlink-bus_re@...       >6  2h  3    ...

For patch 141444-09:

    # fmstat -s -m eft
    NAME                                    >N  T   CNT  DELTA  STAT
    serd.io.device.nonfatal_corr@..         >6  2h  3    ...
    serd.io.pciex.corrlink-bus@...          >6  2h  3    ...

For patch 137137-09 or patch 139555-08:

    # fmstat -s -m eft
    NAME                                    >N  T   CNT  DELTA  STAT
    serd.io.pciex.corrlink@..               >6  2h  3    ...

Workaround
There is no workaround for this issue.
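The SERD engines named in the fmstat listings above can be picked out of a longer listing by filtering on the engine-name prefix. The following is a minimal sketch and not part of the original alert: the function name `filter_pcie_serd` is hypothetical, and it assumes the engine name is the first whitespace-separated column, as in the listings above. It only filters text and makes no judgment about whether a threshold has been exceeded.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle-supplied tool): reads fmstat-style
# text on stdin and prints only the lines whose first column starts with
# one of the SERD engine prefixes quoted in this document.
# Assumption: a POSIX-style awk is available; on Solaris 10 this may mean
# nawk or /usr/xpg4/bin/awk rather than the legacy /usr/bin/awk.
filter_pcie_serd() {
    awk '
        $1 ~ /^serd\.io\.device\.nonfatal_/ ||
        $1 ~ /^serd\.io\.pciex\.corrlink/ { print }
    '
}

# Intended use on an affected system (shown as a comment, not executed):
#   fmstat -s -m eft | filter_pcie_serd
```

The filter is deliberately loose (prefix match only), so it covers the `_bdllp`, `_btlp`, `_re`, `_corr` and `-bus` variants listed for the different patch levels without enumerating them.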
Note: After installing the Solaris 10 patch, PCIEX-8000-KP/PCIEX-8000-J5 faults should be cleared using the fmadm(1M) command.
    # fmadm acquit <EVENT-ID>

where the event-id is obtained from the output of the "fmadm faulty" command, as shown in the Symptoms section above.

Please also see Reference <Document:1021322.1> for additional information.

Patches
<SUNPATCH:147705-01>

History
21-Oct-2011: Date of Resolved Release

    # pkg info entire | grep Summary

Apply IDR147518-01, available on Aug. 09, 2011, made for S10U9 KU144488-05, which addresses this bug.

Please send technical questions to the following email:
Internal Contributor/Submitter: daniel.ice@oracle.com

Attachments
This solution has no attachment
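The clearing step described above (fmadm acquit with the EVENT-ID taken from "fmadm faulty") can be sketched as a dry run that only prints the commands for review. This is a minimal sketch under stated assumptions, not an Oracle-supplied procedure: the function name `print_acquit_commands` is hypothetical, and it assumes the summary line layout shown in the Symptoms section, where the EVENT-ID is the fourth and the MSG-ID the fifth whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical helper: reads "fmadm faulty" style text on stdin and prints
# one "fmadm acquit <EVENT-ID>" line for each summary row whose MSG-ID is
# PCIEX-8000-KP or PCIEX-8000-J5. It does not run fmadm itself.
# Assumed row layout (from the example in this document):
#   Mar 29 21:22:03 <EVENT-ID> <MSG-ID> <SEVERITY>
print_acquit_commands() {
    awk '
        NF >= 6 && ($5 == "PCIEX-8000-KP" || $5 == "PCIEX-8000-J5") {
            print "fmadm acquit " $4
        }
    '
}

# Intended use on a patched system (shown as a comment, not executed):
#   fmadm faulty | print_acquit_commands
```

Printing the commands rather than executing them leaves the decision to acquit each fault with the administrator, which matters because a PCIEX-8000-KP report can also indicate a genuine hardware problem.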