Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1593389.1
Update Date:2017-05-01
Keywords:

Solution Type  Problem Resolution Sure

Solution  1593389.1 :   T5140/T5240/Netra T5440 Panics with FMA alert PCIEX-8000-6D due to "Fatal error has occured in: PCIe fabric.(0x2)(0x45)"  


Related Items
  • Sun SPARC Enterprise T5240 Server
  • Sun SPARC Enterprise T5140 Server
  • Sun Netra T5440 Server
Related Categories
  • PLA-Support>Sun Systems>SPARC>CMT>SN-SPARC: T5xx0


Server Panics with FMA alert PCIEX-8000-6D with the panic string of "Fatal error has occured in: PCIe fabric.(0x2)(0x45)"

In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-7755065081>

Applies to:

Sun SPARC Enterprise T5140 Server - Version All Versions to All Versions [Release All Releases]
Sun SPARC Enterprise T5240 Server - Version All Versions to All Versions [Release All Releases]
Sun Netra T5440 Server - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.

Symptoms

ILOM flags a fault against the motherboard (MB):

 
##### Tx000/showfaults_-v#####
Last POST Run: Mon May  4 00:22:36 2009

Post Status: Passed all devices
  ID Time                           FRU               Class             Fault
   1 Sep 02 10:51:39                /SYS/MB                             Host detected fault MSGID: PCIEX-8000-6D  UUID: 2ea76502-01e6-45b4-c736-955849f48f29  <------------- 6D event
 

FMA reports the same event:
##### fma/fmadm-faulty.out#####
--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Sep 02 06:47:12 2ea76502-01e6-45b4-c736-955849f48f29  PCIEX-8000-6D  Major     
Fault class : fault.io.pci.device-invreq 67%
              fault.io.pciex.device-interr 33%
Affects     : dev:////pci@400/pci@0/pci@1/pci@0/usb@0,2  <------------- Event points to one of the external USB ports
              dev:////pci@400
                  faulted but still in service
FRU         : "MB" (hc://:product-id=SUNW,T5240:server-id=<host name>:chassis-id=<serial number>:serial=100375:part=501784806/motherboard=0)
                  faulty
Description : Either the transmitting device sent an invalid request or the
              receiving device is reporting an internal fault.
              Refer to http://sun.com/msg/PCIEX-8000-6D for more information.


Panic string:
 
Sep  2 04:33:07 <hostname> ^Mpanic[cpu102]/thread=2a103f9bc80: 
panic[cpu102]/thread=2a103f9bc80: Fatal error has occured in: PCIe fabric.(0x2)(0x45)  <------------------ Causes this panic string
000002a103fc3d50 px:px_err_panic+1ac (19e1c00, 13ae400, 45, 2a103fc3e00, 2, 0)
  %l0-3: 0000000000000001 00000000019e1c00 0000000000000000 0000000000000001
  %l4-7: 0000000000000000 000000000190d400 0000000000000001 0000000000000000
000002a103fc3e60 px:px_err_intr+1a0 (2, 2, 21, 2, 30006e6aaa0, 2)
  %l0-3: 0000030006e451e0 0000030006e6a940 0000000000000001 0000060053115a30
  %l4-7: 0000060053115a30 0000000000000004 0000000000000001 0000000000000001
000002a103fc3f50 unix:current_thread+188 (16, 1, c, 1000, 101010101010101, 12)
  %l0-3: 0000000001009944 000002a103f9afe1 000000000000000e 0000000070010140
  %l4-7: 00000000ffffffff 0000000000000000 0000000000000000 000002a103f9b890
000002a103f9b930 unix:cpu_halt+104 (3000baae000, 66, 1913ba8, 1913a78, 3000baae000, 0)
  %l0-3: 000006005898bdf4 0000000000000001 0000000000000016 0000000000000001
  %l4-7: 0000000001000000 0000000000000002 000003000baae178 0000000000000001
000002a103f9b9e0 unix:idle+128 (183ec00, 0, 3000baae000, ffffffffffffffff, 67, 183d400)
  %l0-3: 000006005898bdd0 000000000000001b 0000000000000000 ffffffffffffffff
  %l4-7: 000002a103febc80 0000000000000000 0000000001938400 00000000010432e0
syncing file systems... 26 12 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 done (not all i/o completed)


 
##### fma/fmdump-e.out(Skips CEs & other benign ereports) #####
Sep 02 04:33:07.4303 ereport.io.fire.dmc.tte_inv     <------------------- tte_inv ereport at the same timestamp as the panic
Sep 02 04:01:11.3689 ereport.io.fire.epkt            
Sep 02 04:33:07.4303 ereport.io.pci.sec-sta          
Sep 02 04:01:11.3689 ereport.io.pci.fabric           
Sep 02 04:01:11.3689 ereport.io.pci.fabric           
Sep 02 04:01:11.3689 ereport.io.pci.fabric           
Sep 02 04:33:07.4303 ereport.io.pci.target-rta       
Sep 02 04:01:11.3689 ereport.io.pci.fabric           
Sep 02 04:33:07.4303 ereport.io.pciex.tl.ca          
Sep 02 04:01:11.3689 ereport.io.pci.fabric           
Sep 02 04:01:11.3689 ereport.io.pci.fabric           
Sep 02 04:01:11.3689 ereport.io.pci.fabric           
Sep 02 04:01:11.3689 ereport.io.pci.rta              
Sep 02 04:01:11.3689 ereport.io.pci.rta              
Sep 02 04:01:11.3689 ereport.io.pci.sec-sta         
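
When the ereport listing is long, the tte_inv entry can be picked out with a short script. The following is a minimal sketch, assuming the input lines follow the "timestamp, then ereport class" layout shown in the listing above. The names EREPORT_LINE and find_tte_inv are illustrative only and are not part of any Oracle tool.

```python
import re

# Matches lines in the style of the fmdump -e listing above, for example:
#   "Sep 02 04:33:07.4303 ereport.io.fire.dmc.tte_inv"
# Anything after the class name (such as an annotation) is ignored.
EREPORT_LINE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+(?P<cls>ereport\.\S+)"
)

def find_tte_inv(lines):
    """Return the timestamp of every ereport.io.fire.dmc.tte_inv line."""
    hits = []
    for line in lines:
        m = EREPORT_LINE.match(line.strip())
        if m and m.group("cls") == "ereport.io.fire.dmc.tte_inv":
            hits.append(m.group("stamp"))
    return hits

sample = [
    "Sep 02 04:33:07.4303 ereport.io.fire.dmc.tte_inv     <--- annotation",
    "Sep 02 04:01:11.3689 ereport.io.fire.epkt",
    "Sep 02 04:33:07.4303 ereport.io.pci.sec-sta",
]
print(find_tte_inv(sample))
```

A timestamp returned here can then be compared with the timestamp on the panic line in the messages file.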
 
 

Cause

The panic is generally triggered by the following FMA ereport, which is logged at the same timestamp as the panic: Sep 02 04:33:07.4303 ereport.io.fire.dmc.tte_inv

The following sequence of operations leads to the panic:

  1. The system writes to the MMU TTE Cache Flush Address to clear the old
     TTE cache entry, if one exists.

  2. The system sets up the new TTE entry in TSB memory.

  3. At this point, VF does not ensure that the old TTE cache entry (if one
     exists) has already been flushed.

  4. When PCI-E accesses the memory area mapped by the new TTE entry from
     step 2, it wrongly hits the old TTE cache entry if VF has *not* yet
     completed the flush. VF then flushes the TTE cache entry, and the TTE
     entry is set to "invalid".

     This causes the "tte_inv" panic, because VF changes the TTE entry from
     "valid" to "invalid" in the middle of the memory access.
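
The ordering problem in the steps above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: it does not represent real hardware registers or the actual firmware, and every name in it (ToyIommu, TteInvalidError, remap, the flush_lands_mid_access flag) is hypothetical. It shows that the same remap sequence succeeds when the flush completes before the new entry is used, and fails when the flush lands during the access.

```python
class TteInvalidError(Exception):
    """Stands in for the tte_inv condition in this toy model."""


class ToyIommu:
    """Toy translation cache in front of a TSB; not real hardware behaviour."""

    def __init__(self):
        self.tsb = {}         # page -> translation held in memory
        self.cache = {}       # cached copies of TSB entries
        self.pending = set()  # flushes requested but not yet completed

    def map_page(self, page, target):
        # Step 2: write the (new) entry to the TSB.
        self.tsb[page] = target

    def request_flush(self, page):
        # Step 1: the flush is requested, but it completes later.
        self.pending.add(page)

    def complete_flushes(self):
        # The flush finishes: the cached entry goes away.
        for page in self.pending:
            self.cache.pop(page, None)
        self.pending.clear()

    def dma_access(self, page, flush_lands_mid_access=False):
        # Step 4: the access looks in the cache first.
        hit = page in self.cache
        if flush_lands_mid_access:
            self.complete_flushes()
        if hit:
            if page not in self.cache:
                # The entry was valid at lookup and invalid during use.
                raise TteInvalidError(f"page {page:#x} invalidated mid-access")
            return self.cache[page]
        self.cache[page] = self.tsb[page]
        return self.cache[page]


def remap(wait_for_flush):
    """Remap one page; return the translation seen or 'tte_inv'."""
    mmu = ToyIommu()
    mmu.map_page(0x10, "old buffer")
    mmu.dma_access(0x10)               # caches the old translation
    mmu.request_flush(0x10)            # step 1
    if wait_for_flush:
        mmu.complete_flushes()         # the ordering that step 3 lacks
    mmu.map_page(0x10, "new buffer")   # step 2
    try:
        return mmu.dma_access(0x10, flush_lands_mid_access=not wait_for_flush)
    except TteInvalidError:
        return "tte_inv"


print(remap(wait_for_flush=True))
print(remap(wait_for_flush=False))
```

In this model the first call sees the new translation, and the second call reports "tte_inv" because the stale cache entry is hit and then invalidated while it is in use.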

Solution

Update the system firmware to version 7.2.11 or later.
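
When checking several systems, the installed firmware version can be compared against 7.2.11 numerically rather than as text, since a plain string comparison would rank "7.2.9" above "7.2.11". The sketch below assumes the version is available as a string that contains a dotted three-part number. How that string is obtained is not covered here, and the names MINIMUM, parse_version and has_fix are illustrative only.

```python
import re

MINIMUM = (7, 2, 11)  # firmware level named in the Solution section

def parse_version(text):
    """Extract the first dotted three-part version number from text."""
    m = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in m.groups())

def has_fix(text):
    """True if the version in text is 7.2.11 or later."""
    return parse_version(text) >= MINIMUM

for candidate in ("7.2.9", "7.2.11", "7.3.0"):
    print(candidate, has_fix(candidate))
```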

References

<NOTE:1021307.1> - PCIEX-8000-6D - PCIEX subsystem problem
<BUG:15529944> - SUNBT6784914-1.9.X VF PLATFORMS GETTING TTE_INVALID PANIC FROM TO USB DMA TRANSA

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.