Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

1672818.1 : Troubleshooting "Fatal error has occured in: PCIe fabric" panics on TX-X servers
"Fatal error has occured in: PCIe fabric" is a common panic type with multiple possible causes, both hardware and software. This document helps determine what caused the panic.
Created from <SR 3-8978367222>

Applies to:
SPARC T4-4 - Version All Versions to All Versions [Release All Releases]
SPARC T4-2 - Version All Versions to All Versions [Release All Releases]
SPARC T4-1 - Version All Versions to All Versions [Release All Releases]
SPARC T3-1 - Version All Versions to All Versions [Release All Releases]
SPARC T3-2 - Version All Versions to All Versions [Release All Releases]
Oracle Solaris on SPARC (64-bit)

Symptoms

Below is an example of a PCIe panic:

May 9 08:21:00 xxxxxxx^Mpanic[cpu120]/thread=2a104e41c80:
May 9 08:21:00 xxxxxxx unix: [ID 198415 kern.notice] Fatal error has occured in: PCIe fabric.(0x1)(0x101)
May 9 08:21:00 xxxxxxx unix: [ID 100000 kern.notice]
May 9 08:21:00 xxxxxxx genunix: [ID 723222 kern.notice] 000002a104e416f0 px:px_err_panic+1ac (19c5400, 13aa000, 101, 2a104e417a0, 1, 0)
May 9 08:21:00 xxxxxxx genunix: [ID 179002 kern.notice] %l0-3: 0000009980001602 00000000019c5400 0000000000000000 0000000000000001
May 9 08:21:00 xxxxxxx %l4-7: 0000000000000000 000000000190d400 0000000000000001 0000000000000000
May 9 08:21:00 xxxxxxx genunix: [ID 723222 kern.notice] 000002a104e41800 px:px_err_fabric_intr+1c0 (6012a13d280, 1, 19c5800, 1, 101, 100)
May 9 08:21:00 xxxxxxx genunix: [ID 179002 kern.notice] %l0-3: 0000000000000100 0000000000000001 00000000019c5960 00000000019c5800
May 9 08:21:00 xxxxxxx %l4-7: 00000000019c5958 00000000019c5800 0000000000000001 00000300147bd2e0
May 9 08:21:00 xxxxxxx genunix: [ID 723222 kern.notice] 000002a104e41970 px:px_msiq_intr+1e8 (300147c6638, 1, 30011727d38, 30011727d38, 6012a140a88, 2)
May 9 08:21:00 xxxxxxx genunix: [ID 179002 kern.notice] %l0-3: 000006012a15f660 000006012a0654a0 000006012a15dd60 000002a104e41a80
May 9 08:21:00 xxxxxxx %l4-7: 0000000000000000 0000000003820000 0000000000000000 0000000000000030

Check the second number (in this case 0x101) against known issues before continuing. For example, PCIe fabric.(0x1) (0x41) has a known issue documented in 1519563.1. No known issue is listed for 0x101, so the investigation continues.

Next, look at FMA and the console logs. In this instance the customer provided only an explorer, so we concentrate on FMA. Run fmdump -eV and look through the output for the longest device paths, then compare them with the output of prtdiag -v and with doc 1005907.1. In this SR the following were found:
/pci@600/pci@1/pci@0/pci@4/network@0 ---------------------> EM4: /pci@600/pci@1/pci@0/pci@4
/pci@600/pci@2/pci@0/pci@3/network@0 ---------------------> NET2: /pci@600/pci@2/pci@0/pci@3/network@0
/pci@600/pci@2/pci@0/pci@5/pci@0/pci@2/SUNW,qlc@0,1 --> EM5: /pci@600/pci@2/pci@0/pci@5
/pci@600/pci@2/pci@0/pci@5/pci@0/pci@3/network@0 -----> EM5: /pci@600/pci@2/pci@0/pci@5

Cause

These paths lead to multiple devices that serve the same function (network), so this is most likely a driver issue. Had there been multiple devices with different uses (network, HBA, other), the firmware should be checked; if it is up to date, open an SR with support. Had there been only one path, the component would more likely be at fault, but drivers should still be updated first.

Solution

In the SR this document was developed from, network driver patches were missing and the customer was advised to update them.
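The first triage step in Symptoms, reading the codes out of the panic string and checking them against known issues, can be sketched in Python as follows. This is only an illustration: the function names and the KNOWN_ISSUES table are hypothetical, and the table contains just the one entry this document mentions.

```python
import re

# Matches the two hex codes in a string such as "PCIe fabric.(0x1)(0x101)".
# A space between the two groups is tolerated, as in "(0x1) (0x41)".
PANIC_RE = re.compile(r"PCIe fabric\.\((0x[0-9a-fA-F]+)\)\s*\((0x[0-9a-fA-F]+)\)")

# Hypothetical lookup table keyed by (first code, second code).
# Only the known issue named in this document is listed here.
KNOWN_ISSUES = {(0x1, 0x41): "1519563.1"}


def parse_panic_codes(line):
    """Return (first, second) codes as integers, or None if the line has no PCIe fabric panic."""
    match = PANIC_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def known_issue_doc(line):
    """Return the document ID of a known issue for this panic line, or None if none is listed."""
    codes = parse_panic_codes(line)
    if codes is None:
        return None
    return KNOWN_ISSUES.get(codes)
```

For the panic line shown in Symptoms, known_issue_doc returns None, which corresponds to the "continue with FMA and the console logs" branch.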
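The path analysis and the rule of thumb in Cause can be sketched the same way. The sketch assumes the fmdump -eV output has been saved as text and that device paths appear in it as whitespace-delimited tokens starting with /pci@. It treats "longest" as "not a prefix of any other path seen", which is one reading of the step above. The use labels passed to likely_cause (for example "network" or "hba") are assigned by the engineer after comparing the paths with prtdiag -v; the function names and return strings are hypothetical.

```python
import re

# Assumption: device paths show up as whitespace-delimited tokens beginning with "/pci@".
PATH_RE = re.compile(r"/pci@\S+")


def longest_paths(fmdump_text):
    """Return the unique device paths that are not a parent of any other path in the text."""
    paths = sorted(set(PATH_RE.findall(fmdump_text)))
    return [
        p for p in paths
        if not any(q != p and q.startswith(p + "/") for q in paths)
    ]


def likely_cause(uses):
    """Apply the rule of thumb from Cause to one use label per affected device."""
    if not uses:
        raise ValueError("no device paths to assess")
    if len(uses) == 1:
        # A single path points at the component, but drivers are updated first.
        return "component suspected; update drivers first"
    if len(set(uses)) == 1:
        # Several devices with the same use point at a driver.
        return "driver"
    # Several devices with different uses: check firmware, then open an SR.
    return "firmware"
```

With the four paths found in this SR, all labelled "network" as in Cause, likely_cause returns "driver".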