
Asset ID: 1-72-1672818.1
Update Date: 2017-10-04
Solution Type: Problem Resolution

Solution 1672818.1: Troubleshooting "Fatal error has occured in: PCIe fabric" panics on Tx-x servers


Related Items
  • SPARC T4-1
  • SPARC T3-4
  • Netra SPARC T4-2 Server
  • Netra SPARC T4-1 Server
  • SPARC T3-1B
  • Netra SPARC T3-1B
  • SPARC T3-2
  • Netra SPARC T4-1B
  • SPARC T5-4
  • SPARC T4-1B
  • SPARC T5-2
  • SPARC T4-4
  • SPARC T4-2
  • SPARC T3-1
  • Netra T3-1
  • SPARC T5-8
  • SPARC T5-1B
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T4


"Fatal error has occured in: PCIe fabric" is a common panic type with multiple possible causes, both hardware and software.  This document will assist in determining what caused the panic.

In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-8978367222>

Applies to:

SPARC T4-4 - Version All Versions to All Versions [Release All Releases]
SPARC T4-2 - Version All Versions to All Versions [Release All Releases]
SPARC T4-1 - Version All Versions to All Versions [Release All Releases]
SPARC T3-1 - Version All Versions to All Versions [Release All Releases]
SPARC T3-2 - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on SPARC (64-bit)

Symptoms

Below is an example of a PCIe fabric panic:

May  9 08:21:00 xxxxxxx^Mpanic[cpu120]/thread=2a104e41c80:
May  9 08:21:00 xxxxxxx unix: [ID 198415 kern.notice] Fatal error has occured in: PCIe fabric.(0x1)(0x101)
May  9 08:21:00 xxxxxxx unix: [ID 100000 kern.notice]
May  9 08:21:00 xxxxxxx genunix: [ID 723222 kern.notice] 000002a104e416f0 px:px_err_panic+1ac (19c5400, 13aa000, 101, 2a104e417a0, 1, 0)
May  9 08:21:00 xxxxxxx genunix: [ID 179002 kern.notice]   %l0-3: 0000009980001602 00000000019c5400 0000000000000000 0000000000000001
May  9 08:21:00 xxxxxxx %l4-7: 0000000000000000 000000000190d400 0000000000000001 0000000000000000
May  9 08:21:00 xxxxxxx genunix: [ID 723222 kern.notice] 000002a104e41800 px:px_err_fabric_intr+1c0 (6012a13d280, 1, 19c5800, 1, 101, 100)
May  9 08:21:00 xxxxxxx genunix: [ID 179002 kern.notice]   %l0-3: 0000000000000100 0000000000000001 00000000019c5960 00000000019c5800
May  9 08:21:00 xxxxxxx %l4-7: 00000000019c5958 00000000019c5800 0000000000000001 00000300147bd2e0
May  9 08:21:00 xxxxxxx genunix: [ID 723222 kern.notice] 000002a104e41970 px:px_msiq_intr+1e8 (300147c6638, 1, 30011727d38, 30011727d38, 6012a140a88, 2)
May  9 08:21:00 xxxxxxx genunix: [ID 179002 kern.notice]   %l0-3: 000006012a15f660 000006012a0654a0 000006012a15dd60 000002a104e41a80
May  9 08:21:00 xxxxxxx %l4-7: 0000000000000000 0000000003820000 0000000000000000 0000000000000030

Check the second number (in this case 0x101) against the known issues for this panic type before continuing; for example, "PCIe fabric.(0x1)(0x41)" has a known issue documented in Doc ID 1519563.1.  Since 0x101 does not have a known issue listed, we need to continue the analysis.
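
One quick way to locate the panic string and its codes is to search the message logs directly.  The sketch below is an assumption (default /var/adm/messages location, panic not yet rotated out of the logs), not a command taken from the original SR:

# Search the current and rotated message logs for the PCIe fabric panic string
# (note: the kernel message itself contains the misspelling "occured")
grep "Fatal error has occured in: PCIe fabric" /var/adm/messages*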

Next, look at the FMA data and the console logs.  In this instance the customer only provided an Explorer, so we will concentrate on FMA.
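
As a starting point on the FMA side, the standard Solaris fault management commands below give an overview before digging into individual error events (output layout varies between Solaris releases, and an Explorer typically captures equivalent output):

# Any currently diagnosed faults and their suspect FRUs
fmadm faulty

# Summary of logged fault events and of the underlying error events
fmdump -v
fmdump -e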

Run 'fmdump -eV' and look through the output to find the longest device paths, then compare them against the output of 'prtdiag -v' and Doc ID 1005907.1 to map each path to a slot or onboard device.  In this SR the following paths were found (a command sketch for extracting them follows the list):

/pci@600/pci@1/pci@0/pci@4/network@0  ---------------------> EM4:  /pci@600/pci@1/pci@0/pci@4

/pci@600/pci@2/pci@0/pci@3/network@0  ---------------------> NET2: /pci@600/pci@2/pci@0/pci@3/network@0

/pci@600/pci@2/pci@0/pci@5/pci@0/pci@2/SUNW,qlc@0,1  --> EM5:  /pci@600/pci@2/pci@0/pci@5

/pci@600/pci@2/pci@0/pci@5/pci@0/pci@3/network@0   -----> EM5:  /pci@600/pci@2/pci@0/pci@5
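
A minimal sketch of how those device paths can be pulled out of the FMA error log and mapped back to slots.  It assumes only that the error events contain the paths as text including 'pci@' and that /pci@600 is the root complex of interest in this particular example; adjust to match your own fmdump output:

# List device paths mentioned in the error events, most frequent first
fmdump -eV | grep 'pci@' | sort | uniq -c | sort -rn

# Show how the platform maps paths under this root complex to slots/devices
prtdiag -v | grep 'pci@600'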

Cause

As these paths point to multiple devices that perform the same function (network), this is most likely a driver issue.

Had there been multiple devices with different functions (network, HBA, other), then the system firmware should be checked.  If the firmware is already up to date, open an SR with Oracle Support.

Had there been only one path, the component on that path is more likely to be at fault, but the drivers should still be updated first.
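
A rough sketch of checking the installed driver and firmware levels from Solaris before deciding between a software and a hardware cause.  The placeholder <driver_name> is hypothetical, and command availability differs by release ('pkg' exists only on Solaris 11):

# Version of the loaded driver module (<driver_name> is a placeholder,
# e.g. the network or HBA driver implicated by the device paths)
modinfo | grep -i <driver_name>

# OpenBoot PROM / firmware version visible from the OS
prtconf -V

# Solaris 11 only: current package/SRU level of the system
pkg list entire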

Solution

In the SR from which this document was developed, network driver patches were missing and the customer was advised to apply them.
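
As an illustrative sketch only (not the exact commands from the SR), the presence of the relevant driver patches or package updates can be checked as below; <patch_id> and <driver_package> are placeholders:

# Solaris 10: confirm whether the driver patch is installed
showrev -p | grep <patch_id>

# Solaris 11: dry-run check for available updates to the driver package
pkg update -nv <driver_package>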

