Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-75-2054473.1
Update Date: 2017-10-11

Solution Type  Troubleshooting Sure

Solution  2054473.1 :   SPARC M7 Series Servers : Suspect list for PCIe link faults is incomplete  


Related Items
  • SPARC M7-8
Related Categories
  • PLA-Support>Sun Systems>SPARC>Enterprise>SN-SPARC: M7




In this Document
Purpose
Troubleshooting Steps
References


Applies to:

SPARC M7-8 - Version All Versions and later
Information in this document applies to any platform.

Purpose

When a fault is detected on the PCIe link to an add-in card slot, there should be three fault suspects:

  • The CMIOU containing the PCIe root port or the PCIe slot

Examples:
      /SYS/CMIOUn/IOH/IOSn/RP0
      /SYS/CMIOUn/PCIEn

  • The PCIe hot-plug carrier

Example:
      /SYS/CMIOUn/PCIEn/CAR

  • The PCIe add-in card

Example:
      /SYS/CMIOUn/PCIEn/CAR/CARD


However, the list of suspects reported by FMA is incomplete: only one or two suspect FRUs are identified.
The CMIOU is always diagnosed as faulty.

The following faults are affected:

  • fault.io.pciex.bus-linkerr
  • fault.io.pciex.bus-linkerr-deg
  • fault.io.pciex.bus-linkerr-unaf
  • fault.io.pciex.device


Whenever a root port or PCIe slot is faulted, also consider the PCIe hot-plug carrier and PCIe add-in card as suspects. Use the following information to determine if an add-in slot is affected:

Root Port                   PCIe Slot
/SYS/CMIOUn/IOH/IOS3/RP0    /SYS/CMIOUn/PCIE1
/SYS/CMIOUn/IOH/IOS0/RP0    /SYS/CMIOUn/PCIE2
/SYS/CMIOUn/IOH/IOS1/RP0    /SYS/CMIOUn/PCIE3

 

A fix is available in 9.5.2.g.
  

Troubleshooting Steps

1. Identify the PCIe link fault. This can be done in either of two ways:

By starting the Fault Management Shell:

    -> start -script /SP/faultmgmt/shell
    faultmgmtsp>  fmadm faulty

By checking the /System/Open_Problems property:

    -> show /System/Open_Problems


Example:

        -> show /System/Open_Problems
            
        Open Problems (4)
        Date/Time                 Subsystems          Component
        ------------------------  ------------------  ------------
        Mon Aug 31 11:25:07 2015  Domain Configuration Unit  CMIOU0 (CPU Memory IO Unit 0)
                An IO interconnect has failed to initialize during power on testing.
                (Probability:100, UUID:a0435632-59f9-4e8c-f9bd-a38dbe929e2c, Resource:/SYS/CMIOU0/IOH/IOS0/RP0/PCIE_LINK,
                Part Number:7094491, Serial Number:465769T+14266N00YE,
                Reference Document:http://support.oracle.com/msg/SPSUN4V-8000-YE)
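When the Open Problems output has been saved to a text file or captured by a script, the Resource field can be pulled out mechanically. The following is a minimal sketch, not an Oracle-provided tool; the function name extract_resources and the regular expression are illustrative assumptions based only on the output format shown in the example above.

```python
import re

# Assumption: the Resource field is written as "Resource:<path>" and the path
# ends at the next comma, whitespace, or closing parenthesis, as in the example.
RESOURCE_RE = re.compile(r"Resource:\s*([^,\s)]+)")


def extract_resources(open_problems_text):
    """Return every Resource path found in saved Open Problems output."""
    return RESOURCE_RE.findall(open_problems_text)


# Text copied from the example output in this document.
sample = (
    "(Probability:100, UUID:a0435632-59f9-4e8c-f9bd-a38dbe929e2c, "
    "Resource:/SYS/CMIOU0/IOH/IOS0/RP0/PCIE_LINK, Part Number:7094491, "
    "Serial Number:465769T+14266N00YE, "
    "Reference Document:http://support.oracle.com/msg/SPSUN4V-8000-YE)"
)

print(extract_resources(sample))
```

For the example fault this yields the single path /SYS/CMIOU0/IOH/IOS0/RP0/PCIE_LINK, which identifies the root port to look up in step 2.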



2. In addition to the CMIOU reported as the suspect, also consider the PCIe hot-plug carrier and the PCIe add-in card as suspects. Use the following table to map the faulted root port to its PCIe slot:

Root Port                   PCIe Slot
/SYS/CMIOUn/IOH/IOS3/RP0    /SYS/CMIOUn/PCIE1
/SYS/CMIOUn/IOH/IOS0/RP0    /SYS/CMIOUn/PCIE2
/SYS/CMIOUn/IOH/IOS1/RP0    /SYS/CMIOUn/PCIE3



For the example above (/SYS/CMIOU0/IOH/IOS0/RP0), the complete suspect list to consider is:
    - CMIOU0
    - /SYS/CMIOU0/PCIE2/CAR/CARD
    - /SYS/CMIOU0/PCIE2/CAR
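The lookup in step 2 can be sketched as a small script. This is an illustrative helper under stated assumptions, not part of ILOM or FMA: the name suspects_for and the regular expressions are hypothetical, and the only data used is the root port to PCIe slot table in this document.

```python
import re

# Root port IOS index -> PCIe slot number, taken from the table in this document.
IOS_TO_SLOT = {3: 1, 0: 2, 1: 3}

# Assumption: faulted resources are reported as one of these two path forms,
# optionally followed by a further component such as /PCIE_LINK.
ROOT_PORT_RE = re.compile(r"^/SYS/CMIOU(\d+)/IOH/IOS(\d+)/RP0(?:/|$)")
SLOT_RE = re.compile(r"^/SYS/CMIOU(\d+)/PCIE(\d+)(?:/|$)")


def suspects_for(resource):
    """Return the full suspect list for a faulted root port or PCIe slot path.

    Returns an empty list when the path does not correspond to an add-in slot
    listed in the mapping table.
    """
    match = ROOT_PORT_RE.match(resource)
    if match:
        cmiou = int(match.group(1))
        slot = IOS_TO_SLOT.get(int(match.group(2)))
    else:
        match = SLOT_RE.match(resource)
        if not match:
            return []
        cmiou = int(match.group(1))
        slot = int(match.group(2))
        if slot not in IOS_TO_SLOT.values():
            slot = None
    if slot is None:
        return []
    base = f"/SYS/CMIOU{cmiou}"
    return [base, f"{base}/PCIE{slot}/CAR", f"{base}/PCIE{slot}/CAR/CARD"]


for fru in suspects_for("/SYS/CMIOU0/IOH/IOS0/RP0/PCIE_LINK"):
    print(fru)
```

Run against the example resource, the sketch lists CMIOU0 together with the carrier and card in PCIE2, matching the suspect list above. A root port that is not in the table returns an empty list, meaning no add-in slot is implicated by this mapping.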

References

<BUG:20924762> - DIAGNOSIS RULES NEED TO BE CHANGED TO FAULT / DECONFIG FAILED PCIE SLOTS

  Copyright © 2018 Oracle, Inc.  All rights reserved.