Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1924028.1
Update Date:2017-09-21
Keywords:

Solution Type  Problem Resolution Sure

Solution  1924028.1 :   Fujitsu M10-4/M10-4S: PCI access errors (12bb0000) on both CMUL and CMUU  


Related Items
  • Fujitsu M10-4S
  • Fujitsu M10-4
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: Fujitsu M10




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-9519312981>

Applies to:

Fujitsu M10-4S - Version All Versions to All Versions [Release All Releases]
Fujitsu M10-4 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

PCI access errors are logged against both the CMUL and the CMUU in the 'showlogs monitor' output:

Date: Aug 22 11:06:16 UTC 2014
  Code: 40000000-00a20400480400a204-12bb00000000000000000000
  Status: Warning Occurred: Aug 22 11:06:12.605 UTC 2014
  FRU: /BB#0/CMUL
  Msg: PCI access error
  Diagnostic Code:
  00000100 00000000 0000
  00000001 00000000 0000
  00000100 00000000 0000
  00000000 00000000 00000000 00000000
  00000000 00000000 0000
Date: Aug 22 10:48:35 UTC 2014
  Code: 40000000-006b0400a20400a204-12bb00000000000000000000
  Status: Warning Occurred: Aug 22 10:48:26.778 UTC 2014
  FRU: /BB#0/CMUU,/BB#0/CMUL
  Msg: PCI access error
  Diagnostic Code:
  00000101 00000000 0000
  00000301 00000000 0000
  00000301 00000000 0000
  00000000 00000000 00000000 00000000
  00000000 00000000 0000
Date: Aug 22 10:16:34 UTC 2014
  Code: 40000000-00a204006b0400a204-12bb00000000000000000000
  Status: Warning Occurred: Aug 22 10:16:30.788 UTC 2014
  FRU: /BB#0/CMUL,/BB#0/CMUU
  Msg: PCI access error
  Diagnostic Code:
  00000301 00000000 0000
  00000101 00000000 0000
  00000301 00000000 0000
  00000000 00000000 00000000 00000000
  00000000 00000000 0000
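
The entries above can be summarized programmatically when many of them have to be reviewed. The following is a minimal Python sketch, not an Oracle or Fujitsu tool. It assumes the log text has been saved from the XSCF, that each entry starts with a "Date:" line, and that the error identifier (12bb0000 here) leads the third hyphen-separated field of the Code value. The function names parse_entries and pci_access_frus are hypothetical.

```python
# Abridged copy of two entries shown above (Diagnostic Code lines omitted).
SAMPLE = """\
Date: Aug 22 11:06:16 UTC 2014
  Code: 40000000-00a20400480400a204-12bb00000000000000000000
  Status: Warning Occurred: Aug 22 11:06:12.605 UTC 2014
  FRU: /BB#0/CMUL
  Msg: PCI access error
Date: Aug 22 10:48:35 UTC 2014
  Code: 40000000-006b0400a20400a204-12bb00000000000000000000
  Status: Warning Occurred: Aug 22 10:48:26.778 UTC 2014
  FRU: /BB#0/CMUU,/BB#0/CMUL
  Msg: PCI access error
"""


def parse_entries(text):
    """Split saved log text into entries (assumption: each starts with 'Date:')."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Date:"):
            current = {"date": line[len("Date:"):].strip(),
                       "code": "", "frus": [], "msg": ""}
            entries.append(current)
        elif current is None:
            continue  # ignore anything before the first entry
        elif line.startswith("Code:"):
            current["code"] = line.split(":", 1)[1].strip()
        elif line.startswith("FRU:"):
            value = line.split(":", 1)[1]
            current["frus"] = [f.strip() for f in value.split(",") if f.strip()]
        elif line.startswith("Msg:"):
            current["msg"] = line.split(":", 1)[1].strip()
    return entries


def pci_access_frus(entries, error_prefix="12bb0000"):
    """Collect every FRU named by entries whose third Code field starts with error_prefix."""
    frus = set()
    for entry in entries:
        fields = entry["code"].split("-")
        if len(fields) == 3 and fields[2].startswith(error_prefix):
            frus.update(entry["frus"])
    return frus


if __name__ == "__main__":
    print(sorted(pci_access_frus(parse_entries(SAMPLE))))
```

If the resulting set contains both /BB#0/CMUL and /BB#0/CMUU, the log matches the symptom described in this document.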

 

The FMA output also reports faults against both the CMUL and the CMUU. No PCI slot information is available in these reports.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 22 18:06:10 fcf4298f-99b8-e4dd-d488-d30804e03186  PCIEX-8000-YJ  Major    

Problem Status    : solved
Diag Engine       : eft / 1.16
System
   Manufacturer  : unknown
   Name          : ORCL,SPARC64-X
   Part_Number   : unknown
   Serial_Number : PZ01426021
   Host_ID       : 90071189

----------------------------------------
Suspect 1 of 3 :
  Fault class : fault.io.pciex.device-pcie-ce
  Certainty   : 75%
  Affects     : dev:////pci@8100/pci@4/pci@0
  Status      : faulted but still in service

  FRU
    Location         : "/BB0/CMUL"
    Manufacturer     : unknown
    Name             : unknown
    Part_Number      : 7088706
    Revision         : unknown
    Serial_Number    : PP142602BL
    Chassis
       Manufacturer  : unknown
       Name          : ORCL,SPARC64-X
       Part_Number   : 7088788              
       Serial_Number : PZ01426021
       Status        : faulty
----------------------------------------
Suspect 2 of 3 :
  Fault class : fault.io.pciex.bus-linkerr-corr
  Certainty   : 25%
  Affects     : dev:////pci@8100/pci@4
  Status      : faulted but still in service

  FRU
    Location         : "/BB0/CMUL"
    Manufacturer     : unknown
    Name             : unknown
    Part_Number      : 7088706
    Revision         : unknown
    Serial_Number    : PP142602BL
    Chassis
       Manufacturer  : unknown
       Name          : ORCL,SPARC64-X
       Part_Number   : 7088788              
       Serial_Number : PZ01426021
       Status        : faulty
----------------------------------------
Suspect 3 of 3 :
  Fault class : fault.io.pciex.device-pcie-ce
  Certainty   : 75%
  Affects     : dev:////pci@8100/pci@4/pci@0
  Status      : faulted but still in service

  FRU
    Location         : "/BB0/CMUL"
    Manufacturer     : unknown
    Name             : unknown
    Part_Number      : 7088706
    Revision         : unknown
    Serial_Number    : PP142602BL
    Chassis
       Manufacturer  : unknown
       Name          : ORCL,SPARC64-X
       Part_Number   : 7088788              
       Serial_Number : PZ01426021
       Status        : faulty

Description : Too many recovered bus errors have been detected, which indicates
             a problem with the specified bus or with the specified
             transmitting device. This may degrade into an unrecoverable
             fault.

Response    : One or more device instances may be disabled

Impact      : Loss of services provided by the device instances associated with
             this fault

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
             If a plug-in card is involved check for badly-seated cards or
             bent pins. Please refer to the associated reference document at
             http://support.oracle.com/msg/PCIEX-8000-YJ for the latest
             service procedures and policies regarding this diagnosis.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Aug 22 17:16:29 198aa417-5c27-69a5-87af-998bc3fe4936  PCIEX-8000-YJ  Major    

Problem Status    : solved
Diag Engine       : eft / 1.16
System
   Manufacturer  : unknown
   Name          : ORCL,SPARC64-X
   Part_Number   : unknown
   Serial_Number : PZ01426021
   Host_ID       : 90071189

----------------------------------------
Suspect 1 of 3 :
  Fault class : fault.io.pciex.device-pcie-ce
  Certainty   : 75%
  Affects     : dev:////pci@8700/pci@4/pci@0
  Status      : faulted but still in service

  FRU
    Location         : "/BB0/CMUU"
    Manufacturer     : unknown
    Name             : unknown
    Part_Number      : 7088708
    Revision         : unknown
    Serial_Number    : PP142601JR
    Chassis
       Manufacturer  : unknown
       Name          : ORCL,SPARC64-X
       Part_Number   : 7088788              
       Serial_Number : PZ01426021
       Status        : faulty
----------------------------------------
Suspect 2 of 3 :
  Fault class : fault.io.pciex.bus-linkerr-corr
  Certainty   : 25%
  Affects     : dev:////pci@8700/pci@4
  Status      : faulted but still in service

  FRU
    Location         : "/BB0/CMUU"
    Manufacturer     : unknown
    Name             : unknown
    Part_Number      : 7088708
    Revision         : unknown
    Serial_Number    : PP142601JR
    Chassis
       Manufacturer  : unknown
       Name          : ORCL,SPARC64-X
       Part_Number   : 7088788              
       Serial_Number : PZ01426021
       Status        : faulty
----------------------------------------
Suspect 3 of 3 :
  Fault class : fault.io.pciex.device-pcie-ce
  Certainty   : 75%
  Affects     : dev:////pci@8700/pci@4/pci@0
  Status      : faulted but still in service

  FRU
    Location         : "/BB0/CMUU"
    Manufacturer     : unknown
    Name             : unknown
    Part_Number      : 7088708
    Revision         : unknown
    Serial_Number    : PP142601JR
    Chassis
       Manufacturer  : unknown
       Name          : ORCL,SPARC64-X
       Part_Number   : 7088788              
       Serial_Number : PZ01426021
       Status        : faulty

Description : Too many recovered bus errors have been detected, which indicates
             a problem with the specified bus or with the specified
             transmitting device. This may degrade into an unrecoverable
             fault.

Response    : One or more device instances may be disabled

Impact      : Loss of services provided by the device instances associated with
             this fault

Action      : Use 'fmadm faulty' to provide a more detailed view of this event.
             If a plug-in card is involved check for badly-seated cards or
             bent pins. Please refer to the associated reference document at
             http://support.oracle.com/msg/PCIEX-8000-YJ for the latest
             service procedures and policies regarding this diagnosis.
-------------------------------------------------------------------
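
To confirm quickly that the FMA suspects span both boards, the suspect lists can be grouped by FRU location. The sketch below is illustrative only: it assumes the FMA output has been saved as text in the layout shown above, and it reads only the "Fault class", "Affects" and "Location" fields. The function name suspects_by_fru is hypothetical, and the sample is an abridged excerpt of one suspect from each of the two events.

```python
import re
from collections import defaultdict

# Abridged excerpt: suspect 2 of the first event and suspect 1 of the second.
FMA_SAMPLE = """\
Suspect 2 of 3 :
  Fault class : fault.io.pciex.bus-linkerr-corr
  Certainty   : 25%
  Affects     : dev:////pci@8100/pci@4
  Status      : faulted but still in service

  FRU
    Location         : "/BB0/CMUL"
Suspect 1 of 3 :
  Fault class : fault.io.pciex.device-pcie-ce
  Certainty   : 75%
  Affects     : dev:////pci@8700/pci@4/pci@0
  Status      : faulted but still in service

  FRU
    Location         : "/BB0/CMUU"
"""

# Only these three fields are read; every other line is skipped.
FIELD = re.compile(r"^\s*(Fault class|Affects|Location)\s*:\s*(.+?)\s*$")


def suspects_by_fru(text):
    """Map each FRU location to the set of (fault class, affected device) pairs."""
    grouped = defaultdict(set)
    fault_class = affects = None
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        if key == "Fault class":
            fault_class = value
        elif key == "Affects":
            affects = value
        else:  # "Location" closes one suspect record
            grouped[value.strip('"')].add((fault_class, affects))
    return dict(grouped)


if __name__ == "__main__":
    for fru, suspects in sorted(suspects_by_fru(FMA_SAMPLE).items()):
        print(fru, sorted(suspects))
```

Note that FMA writes the location as /BB0/CMUL while the XSCF log writes /BB#0/CMUL, so the two outputs need normalizing before they are compared.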

 

Changes

 

Cause

Possible Soft Error Rate Discrimination (SERD) issue.

Possible cable or seating issue on PCIe cable connection between CMUL and CMUU.

Possible hardware issue with the CMUL, the CMUU or the cable (the cable is included with the CMUL, so a CMUL replacement effectively replaces the cable as well).

Possible exposure to FCO A0335-1: red phosphorus in the PCIe cable connecting CMUL to CMUU causes corrosion, resulting in a short circuit on the DDC control signal and a system panic (Doc ID 1629497.1).

Solution


1. If maintenance or a system move was just performed, or if this is a new system installation, reseat and verify the PCIe cable connection between CMUL and CMUU. Boot Solaris and wait at least 20 minutes, or long enough to verify that no new errors are logged (OS: PCIEX-8000-YJ, XSCF: 12bb0000).

2. Once a seating problem is ruled out, rule out the SERD issue discussed in Doc ID 1617956.1. At a minimum, apply:
    Solaris 11.1 : SRU 18.5
    or
    Solaris 10 SPARC : <SunPatch 149279-03>

3. If items 1 and 2 above are complete, check that XCP is at or above XCP2321 (see Doc ID 2211342.1 for more details).

4. If the patch or upgrade was already installed, or if additional errors are seen after applying the upgrade or patch, replace the parts listed in the XCP error logs.

Date: Aug 22 11:06:16 UTC 2014
    Code: 40000000-00a20400480400a204-12bb00000000000000000000
    Status: Warning                Occurred: Aug 22 11:06:12.605 UTC 2014
    FRU: /BB#0/CMUL
         ~~~~~~~~~~ (*) A PCI error has occurred in CMUL. Replace only the CMUL.
    Msg: PCI access error


Date: Aug 22 10:48:35 UTC 2014
    Code: 40000000-006b0400a20400a204-12bb00000000000000000000
    Status: Warning                Occurred: Aug 22 10:48:26.778 UTC 2014
    FRU: /BB#0/CMUU,/BB#0/CMUL
         ~~~~~~~~~~ ~~~~~~~~~~
    Msg: PCI access error

 

  Apply the FCO fix if it applies. The fix described in FCO A0335-1 (Doc ID 1629497.1) was applied proactively to all systems except the seven serial numbers listed below. If the error described in this document occurs on one of these systems, apply the fix described in FCO A0335-1 first.

  Serial numbers: PZ01334012, PZ01334013, PZ01337013, PZ01337014, PZ01326001, PZ01326002, PZ01326003
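
The XCP level check and the FCO serial-number check can be expressed as two small helpers. This is a hedged Python sketch rather than a supported utility. The seven serial numbers and the XCP2321 threshold come from this document. The helper names are hypothetical, and the code assumes that XCP levels are four-digit numbers that can be compared numerically.

```python
import re

# Chassis serial numbers that this document lists as not proactively fixed.
FCO_EXCEPTION_SERIALS = frozenset({
    "PZ01334012", "PZ01334013", "PZ01337013", "PZ01337014",
    "PZ01326001", "PZ01326002", "PZ01326003",
})

# Minimum firmware level named in this document.
MIN_XCP = 2321


def xcp_number(version):
    """Return the numeric part of a string such as 'XCP2321' or '2321'.

    Assumption: XCP levels are four-digit numbers that order numerically.
    """
    match = re.fullmatch(r"(?:XCP)?\s*(\d{4})", version.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized XCP version: {version!r}")
    return int(match.group(1))


def xcp_meets_minimum(version, minimum=MIN_XCP):
    """True if the given XCP level is at or above the minimum."""
    return xcp_number(version) >= minimum


def needs_fco_fix_first(chassis_serial):
    """True if the chassis serial is one of the seven FCO A0335-1 exceptions."""
    return chassis_serial.strip().upper() in FCO_EXCEPTION_SERIALS


if __name__ == "__main__":
    print(xcp_meets_minimum("XCP2321"))       # at the minimum level
    print(needs_fco_fix_first("PZ01334012"))  # listed exception
    print(needs_fco_fix_first("PZ01426021"))  # chassis from the example output
```

The chassis serial number in the example FMA output (PZ01426021) is not in the exception list, so on that system the FCO fix would already have been applied and the numbered steps apply directly.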

 

References

<NOTE:1600364.1> - Fujitsu M10-4/M10-4s: Error: How to Decode the Correct Cable Location for Error 0200242d PCI Express link up failed After Replacing CMUL ( with PCIBP ) or CMUU.
<NOTE:1617956.1> - I/O SERD threshold values are set too low and may result in PCIEX-8000-J5, PCIEX-8000-YJ and PCIEX-8000-KP faults.
<BUG:20264642> - CONFLICTING DIAGNOSIS BETWEEN FMA AND XCP. FMA = CMUU - XCP = CMUL,CMUU
<NOTE:1629497.1> - FCO A0335-1: Proactive - Scheduled: Red phosphorus in the PCI-e cable connecting CMUL to CMUU causes corrosion resulting in a short circuit on the DDC control signal and a system panic.

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.