M10-io.pcie.device.fe - Fatal error detected on a card in a PCI slot or an onboard device

Asset ID:	1-79-1527089.1
Update Date:	2015-12-31
Keywords:

Solution Type Predictive Self-Healing Sure

Solution 1527089.1 : M10-io.pcie.device.fe - Fatal error detected on a card in a PCI slot or an onboard device

Applies to:

Fujitsu M10-4
Fujitsu M10-4S
Fujitsu M10-1
SPARC

Purpose

Provide additional information for message ID: M10-io.pcie.device.fe

Fujitsu fault codes:

02002415, 02002431, 02002433, 02002435

Details

Type

: Hardware Fault; io.pcie.device.fe

Severity

: Minor

Description

: Fault due to a fatal error detected on a card in a PCI slot or an onboard device.

Automated Response

: No immediate action is taken.

Impact

When the failure is detected on a PCI card, the PCI card is marked for replacement.

POST/OBP stops using the PCI card when the failure is detected. However, after the domain is powered off and back on, the domain will start using the card again.

When the failure is detected on an onboard device, the onboard device chip is deconfigured.

Indicted Hardware

: For M10-1 systems, if the ereport class is fe_linkup_* or fe_no_access, then the MBU is the second suspect on the suspect list.

For M10-4/4S systems, if the ereport class is fe_linkup_* or fe_no_access, then the CMUL is the second suspect on the suspect list.

When the failure is detected on an onboard device (built-in SAS chip, GbE chip, or USB chip), the MBU is marked for replacement on M10-1 systems and the CMUL is marked for replacement on M10-4/4S systems.

If the fault was detected while running POST, then the fault may be detected on:

- For M10-1 systems, the PCIe slots are on the MBU
- For M10-4/4S systems, the PCIe slots are on the CMUL

The fault may also be detected while running OBP.

If the fault was detected while running POST, the event is listed in one of the following categories:

- fe-no-access 02002415 The PCI device was not accessible
- fe-read-err 02002431 Read access to the PCI device causes an uncorrectable error
- fe-reg-cmp-err 02002433 The register on a PCIe device shows an unexpected result
- fe-linkup-err 02002435 The PCIe linkup process failed

The fault information for this fault is not stored in the FMA resource cache, nor is it stored on the XSCF's persistent storage. Instead, the fault information is stored only in the hardware descriptor (HWD) of the domain that the device belongs to. The HWD itself is cleared of all information about faulty devices when the domain is powered down (this includes platform resets and platform power-downs).

The HWD information about this device being faulty is also cleared when a hot-plug operation is performed on the faulty PCIe card from within Solaris running on the domain. However, even though the fault information is not stored in the FMA resource cache or XSCF persistent storage, the fault occurrence is logged in the relevant error logs and fault logs.
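
For administrators who triage these events from log output, the fault-code categories and the second-suspect rule described above can be sketched as a small lookup. This is only an illustrative sketch, not an Oracle or Fujitsu tool: the function names describe_fault_code and second_suspect are hypothetical, the padding of a 7-digit code to 8 digits is an assumption about how a code might be transcribed, and the ereport class is assumed to be passed as the short name used in this article (for example fe_no_access).

```python
from fnmatch import fnmatch

# Fault codes and categories as listed in this article for POST-detected faults.
POST_FAULT_CATEGORIES = {
    "02002415": ("fe-no-access", "The PCI device was not accessible"),
    "02002431": ("fe-read-err", "Read access to the PCI device causes an uncorrectable error"),
    "02002433": ("fe-reg-cmp-err", "The register on a PCIe device shows an unexpected result"),
    "02002435": ("fe-linkup-err", "The PCIe linkup process failed"),
}

# Board that is named as the second suspect, per system model (from this article).
BOARD_BY_MODEL = {"M10-1": "MBU", "M10-4": "CMUL", "M10-4S": "CMUL"}


def describe_fault_code(code):
    """Return (category, description) for a fault code, or None if it is not listed.

    Assumption: a 7-digit code is an 8-digit code with its leading zero dropped.
    """
    return POST_FAULT_CATEGORIES.get(code.strip().zfill(8))


def second_suspect(model, ereport_class):
    """Return the second-suspect board, or None when the article's rule does not apply.

    The rule covers only ereport classes fe_linkup_* and fe_no_access.
    """
    if fnmatch(ereport_class, "fe_linkup_*") or ereport_class == "fe_no_access":
        return BOARD_BY_MODEL.get(model)
    return None


if __name__ == "__main__":
    print(describe_fault_code("02002415"))
    print(second_suspect("M10-4S", "fe_no_access"))
```

The lookup returns None for anything outside the four codes and two ereport class patterns named in this article, so an unlisted value is not silently mapped to a board.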

Suggested Action for System Administrator

: The recommended service action for this event is to schedule replacement of the affected component(s) at the earliest possible convenience. Although the hardware may be functioning, it is neither intended nor recommended that the faulted component(s) remain in the system for a prolonged period of time.

Refer to the following document for the latest procedures for displaying event content in preparation for submitting a service request and applying any post-repair actions that may be required.

PSH Procedural Article for Fujitsu M10 Diagnosis (Doc ID 1525156.1)

Attachments

This solution has no attachment