Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2082722.1
Update Date: 2018-05-15
Keywords:

Solution Type  Problem Resolution Sure

Solution  2082722.1 :   Oracle ZFS Storage Appliance: FMA 'pcie-fatal' event seen after upgrade of Infiniband CX-2 firmware on ZS3-ES  


Related Items
  • Oracle ZFS Storage ZS3-2
  • Oracle ZFS Storage ZS4-4
  • Oracle ZFS Storage ZS3-4
  • Oracle ZFS Storage ZS3-BA
Related Categories
  • PLA-Support>Sun Systems>DISK>ZFS Storage>SN-DK: ZS




In this Document
Symptoms
Cause
Solution
References


Applies to:

Oracle ZFS Storage ZS3-4 - Version All Versions and later
Oracle ZFS Storage ZS3-BA - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
Oracle ZFS Storage ZS4-4 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms

Upgrading the IB CX-2 firmware from 2.7.8130 to 2.11.2010 resulted in PCIE fatal errors that caused /SYS, /SYS/MB, /SYS/MB/RISER1/PCIE1 and /SYS/MB/P1 to be faulted on an Oracle ZFS Storage ZS3-ES.

Once the faults are cleared, they stay cleared until the next firmware upgrade.

 

Downgrading from 2.11.2010 back to 2.7.8130, or re-flashing the same version, does not trigger the problem.

 

FMA event:

------------------- ------------------------------------ -------------- --------
Time                UUID                                 msgid          Severity
------------------- ------------------------------------ -------------- --------
2015-09-24/01:01:27 94130e54-45cb-6a19-d61e-fa52e9fc25c4 SPX86-8003-RR  Critical

Problem Status    : open
Diag Engine       : fdd 1.0
System
  Manufacturer   : Oracle Corporation
  Name           : Exalogic X4-2
  Part_Number    : Exalogic X4-2
  Serial_Number  : AK00260761

System Component
  Manufacturer   : Oracle Corporation
  Name           : SUN FIRE X4170 M3
  Part_Number    : 7078183
  Serial_Number  : 1441NML0G2

----------------------------------------
Suspect 1 of 3
  Fault class  : fault.io.intel.iio.pcie-fatal
  Certainty    : 33%
  Affects      : /SYS/MB/RISER1/PCIE1
  Status       : faulted

  FRU
     Status            : faulty
     Location          : /SYS/MB/RISER1/PCIE1
     Chassis
        Manufacturer   : Oracle Corporation
        Name           : SUN FIRE X4170 M3
        Part_Number    : 7078183
        Serial_Number  : 1441NML0G2
----------------------------------------
Suspect 2 of 3
  Fault class  : fault.io.intel.iio.pcie-fatal
  Certainty    : 33%
  Affects      : /SYS/MB/P1
  Status       : faulted

  FRU
     Status            : faulty
     Location          : /SYS/MB/P1
     Name              : Intel(R) Xeon(R) CPU E5-2658 0 @ 2.10GHz
     Part_Number       : 060D
     Chassis
        Manufacturer   : Oracle Corporation
        Name           : SUN FIRE X4170 M3
        Part_Number    : 7078183
        Serial_Number  : 1441NML0G2
----------------------------------------
Suspect 3 of 3
  Fault class  : fault.io.intel.iio.pcie-fatal
  Certainty    : 33%
  Affects      : /SYS/MB
  Status       : faulted

  FRU
     Status            : faulty
     Location          : /SYS/MB
     Manufacturer      : MiTAC International Corporation
     Name              : MOTHER BOARD ASSEMBL
     Part_Number       : 7048712
     Revision          : 06
     Serial_Number     : 489089M+1434U92M3W
     Chassis
        Manufacturer   : Oracle Corporation
        Name           : SUN FIRE X4170 M3
        Part_Number    : 7078183
        Serial_Number  : 1441NML0G2

Description : An Integrated I/O (IIO) fatal error in downstream PCIE device
             has occurred.

Response    : The service-required LED on the chassis will be illuminated.
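All three suspects in the event above carry the same fault class at equal certainty, which is easier to see once the report is reduced to its key fields. The following is a minimal Python sketch, not an appliance tool: the function name parse_suspects and the SAMPLE excerpt are illustrative, and the parsing assumes the "Key : value" layout shown in this document.

```python
import re

def parse_suspects(report):
    """Collect fault class, certainty and affected path for each 'Suspect N of M' block."""
    wanted = ("Fault class", "Certainty", "Affects")
    suspects = []
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if re.match(r"Suspect \d+ of \d+", line):
            # Start a new suspect record.
            current = {}
            suspects.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            # Keep only the first occurrence of each wanted field per suspect.
            if key in wanted and key not in current:
                current[key] = value.strip()
    return suspects

# Excerpt of the FMA event shown in this document (FRU details omitted).
SAMPLE = """\
Suspect 1 of 3
  Fault class  : fault.io.intel.iio.pcie-fatal
  Certainty    : 33%
  Affects      : /SYS/MB/RISER1/PCIE1
  Status       : faulted
Suspect 2 of 3
  Fault class  : fault.io.intel.iio.pcie-fatal
  Certainty    : 33%
  Affects      : /SYS/MB/P1
  Status       : faulted
Suspect 3 of 3
  Fault class  : fault.io.intel.iio.pcie-fatal
  Certainty    : 33%
  Affects      : /SYS/MB
  Status       : faulted
"""

suspects = parse_suspects(SAMPLE)
for s in suspects:
    print(s["Affects"], s["Certainty"], s["Fault class"])
```

A report in which every suspect shares one fault class at equal certainty, as here, points to a single underlying event rather than three independent component failures.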

 

Cause

Bug 22012490 - PCIE fatal encountered when upgrading infiniband CX-2 firmware

 

Solution

There is no current resolution to this issue.


Workaround:

Following this procedure exactly results in no FMA event:

  1.  Initial upgrade of IB CX-2 firmware from '2.7.8130' to '2.11.2010'
  2.  Reboot
  3.  FMA event is reported
  4.  Clear the FMA fault  (maintenance problems  ......  markrepaired)
  5.  Re-flash the IB CX-2 firmware with '2.11.2010'
  6.  Reboot


      => No FMA event

 

The required order is: Clear FMA -> Re-flash -> Reboot

 

By contrast, the procedure below, which re-flashes before clearing the fault, still results in the FMA event being reported:

  1.   Initial upgrade of IB CX-2 firmware from '2.7.8130' to '2.11.2010'
  2.   Reboot
  3.   FMA event is reported
  4.   Re-flash the IB CX-2 firmware with '2.11.2010'
  5.   Clear FMA fault
  6.   Reboot


      => FMA event reported
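The two procedures above differ only in whether the fault is cleared before or after the re-flash. A minimal Python sketch of that ordering rule follows; the step labels and function names are hypothetical placeholders for this illustration and are not appliance CLI commands.

```python
# Hypothetical step labels used only for this illustration.
UPGRADE, REBOOT, FMA_EVENT = "upgrade", "reboot", "fma_event"
CLEAR_FMA, REFLASH = "clear_fma", "reflash"

REQUIRED_RECOVERY = [CLEAR_FMA, REFLASH, REBOOT]

def recovery_steps(procedure):
    """Return the steps that follow the first reported FMA event."""
    if FMA_EVENT not in procedure:
        return []
    return procedure[procedure.index(FMA_EVENT) + 1:]

def follows_required_order(procedure):
    """True only if recovery is exactly: clear FMA -> re-flash -> reboot."""
    return recovery_steps(procedure) == REQUIRED_RECOVERY

# First procedure in this document: ends with no FMA event.
WORKING = [UPGRADE, REBOOT, FMA_EVENT, CLEAR_FMA, REFLASH, REBOOT]
# Second procedure in this document: the FMA event is reported again.
FAILING = [UPGRADE, REBOOT, FMA_EVENT, REFLASH, CLEAR_FMA, REBOOT]

for name, proc in (("working", WORKING), ("failing", FAILING)):
    print(name, follows_required_order(proc))
```

A check of this kind could sit at the start of a maintenance runbook so that a planned sequence is rejected before any re-flash is attempted in the wrong order.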

 

Further information from TSC Engineer:

I have seen many CPU1 faults, and only on ZS3-ES platforms. The error description says the CPU is faulted.

Action:
 
Follow this document to mark the error repaired, then schedule a reboot of the node to confirm that the PCIE downstream CPU error does not return.
So far I have done this on 4 nodes: when I check maintenance problems the error is gone, and after a reboot the problem does not come back.

Note: If the error returns, follow the procedure in this document: clear the fault, re-flash the IB HCA, reboot the node, then check maintenance problems again.

 

Check for relevancy - 10-May-2018

References

<BUG:22012490> - PCIE FATAL ENOUNTERED WHEN UPGRADING INFINIBAND CX-2 FIRMWARE

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.