Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2116695.1
Update Date:2018-03-22
Keywords:

Solution Type  Problem Resolution Sure

Solution  2116695.1 :   X4800 / X2-8 Server might Post CPU/PCI Faults and Fail to Boot after Firmware Upgrade on Sun InfiniBand Dual Port 4x QDR PCIe EM  


Related Items
  • Sun Fire X4800 Server
  • Sun Server X2-8
Related Categories
  • PLA-Support>Sun Systems>x86>Server>SN-x64: SERVER 64bit




Applies to:

Sun Fire X4800 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Server X2-8 - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Symptoms

The system posts CPU/PCI faults and fails to boot.

For example:

FMA fault on SP:
fault.cpu.intel.internal --> SPX86-8000-F4


HOST Console output:
[  162.215634] mlx4_core 0000:08:00.0: command 0x23 timed out (go bit not cleared).
[  162.292781] mlx4_core 0000:08:00.0: device is going to be reset.
[  163.411212] pcieport 0000:00:05.0: AER: Uncorrected (Non-Fatal) error received: id=0028.
[  163.498970] pcieport 0000:00:05.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0028(Requester ID).
[  163.644466] pcieport 0000:00:05.0:   device [8086:340c] error status/mask=00004000/00000000.
[  163.728809] pcieport 0000:00:05.0:    [14] Completion Timeout     (First).
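To help spot this symptom in a saved host console log, the following minimal sketch searches for the message fragments shown above. The signature names and the rule in `matches_symptom` (an mlx4 command timeout plus a PCIe Completion Timeout) are assumptions for illustration; timestamps, PCI addresses and exact wording may differ between kernel versions.

```python
import re

# Fragments copied from the console output in this note. The signature names
# are hypothetical labels, not kernel or FMA identifiers.
SIGNATURES = {
    "mlx4_cmd_timeout": re.compile(r"mlx4_core \S+: command 0x[0-9a-fA-F]+ timed out"),
    "mlx4_reset": re.compile(r"mlx4_core \S+: device is going to be reset"),
    "aer_uncorrected": re.compile(r"AER: Uncorrected \(Non-Fatal\) error received"),
    "completion_timeout": re.compile(r"\[14\] Completion Timeout"),
}


def scan_console_log(lines):
    """Return the set of signature names that occur anywhere in the lines."""
    found = set()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                found.add(name)
    return found


def matches_symptom(lines):
    """True if an mlx4 command timeout and a PCIe Completion Timeout both appear."""
    return {"mlx4_cmd_timeout", "completion_timeout"} <= scan_console_log(lines)


if __name__ == "__main__":
    sample = [
        "[  162.215634] mlx4_core 0000:08:00.0: command 0x23 timed out (go bit not cleared).",
        "[  163.728809] pcieport 0000:00:05.0:    [14] Completion Timeout     (First).",
    ]
    print(matches_symptom(sample))  # True
```

A match only suggests this note may apply; the FMA fault on the SP and the hardware configuration described under Changes should still be confirmed.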

Changes

This issue affects systems on which both PCIe EM slots of a CMOD/IOH are populated with a Sun InfiniBand Dual Port 4x QDR PCIe ExpressModule Host Channel Adapter M2.

The issue might be triggered by:

  •  Upgrading InfiniBand Dual Port 4x QDR PCIe EMs to FW 2.11.2014 or later.
  •  Installing a spare part with FW 2.11.2014 or later.
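Whether an EM is on an affected firmware level comes down to a version comparison. The sketch below compares the dotted version numerically; a plain string comparison would wrongly rank "2.9.1000" above "2.11.2014". It assumes the version is read from the "Firmware version:" line of `ibstat` output; the sample text in the code is illustrative, not captured from an affected system.

```python
import re

# First firmware release that, per this note, can trigger the issue.
AFFECTED_FROM = (2, 11, 2014)


def parse_fw(version):
    """Convert a dotted firmware string such as '2.11.2014' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def fw_from_ibstat(text):
    """Return the first 'Firmware version:' value in ibstat-style text, or None."""
    match = re.search(r"Firmware version:\s*([0-9.]+)", text)
    return match.group(1) if match else None


def fw_may_trigger_issue(version):
    """True if the firmware is 2.11.2014 or later (field-by-field numeric compare)."""
    return parse_fw(version) >= AFFECTED_FROM


if __name__ == "__main__":
    sample = "CA 'mlx4_0'\n\tFirmware version: 2.11.2014\n"  # illustrative text
    fw = fw_from_ibstat(sample)
    print(fw, fw_may_trigger_issue(fw))  # 2.11.2014 True
```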


Cause

The BAR (Base Address Register) space required by FW 2.11.2014 and later exceeds the default MMIOH Size per IOH setting of 2Gb when both PCIe EM slots of a CMOD/IOH are populated with a Sun InfiniBand Dual Port 4x QDR PCIe ExpressModule Host Channel Adapter M2.

If the PCIe EMs are installed according to the population rules, the issue should only be seen when more than four InfiniBand Dual Port 4x QDR PCIe EMs are installed.
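The exposure condition described above can be expressed as a per-IOH check. In this sketch the layout dictionary, its IOH labels and the use of None for an empty slot (or a slot holding another card) are hypothetical. It also assumes, conservatively, that an IOH is exposed when both slots hold these EMs and at least one of them runs FW 2.11.2014 or later, since installing a single spare part on that firmware is listed as a trigger.

```python
from typing import Dict, List, Optional

AFFECTED_FROM = (2, 11, 2014)  # first affected firmware, from the Changes section


def parse_fw(version: str) -> tuple:
    """Convert a dotted firmware string into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def exposed_iohs(layout: Dict[str, List[Optional[str]]]) -> List[str]:
    """Return IOH labels whose two EM slots both hold a QDR InfiniBand EM and
    at least one of those EMs is on firmware 2.11.2014 or later.

    layout maps a hypothetical IOH label to the firmware version of the EM in
    each of its two EM slots (None = empty slot or a different card).
    """
    exposed = []
    for ioh, slots in layout.items():
        populated = [fw for fw in slots if fw is not None]
        if len(slots) == 2 and len(populated) == 2 and any(
            parse_fw(fw) >= AFFECTED_FROM for fw in populated
        ):
            exposed.append(ioh)
    return exposed


if __name__ == "__main__":
    layout = {
        "IOH-A": ["2.11.2014", "2.9.1000"],  # both slots populated, one affected
        "IOH-B": ["2.11.2014", None],        # only one slot populated
    }
    print(exposed_iohs(layout))  # ['IOH-A']
```

An IOH flagged here is only at risk while MMIOH Size per IOH remains at its 2Gb default.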

Solution

For the Sun Fire X4800, this issue is fixed in the ILOM SW1.8.0 firmware update and later.

For the Sun Fire X4800 M2, this issue is fixed in the ILOM SW1.5.1 firmware update and in ILOM SW1.6 and later.
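The fixed releases listed above can be checked with a small helper. This is a sketch: the model strings "X4800" and "X4800 M2" and the accepted release formats ("SW1.8.0", "SW1.6", "1.5.1") are assumptions for illustration. Because the note names SW1.5.1 specifically, other SW1.5.x releases on the X4800 M2 are treated as not fixed.

```python
def parse_sw(release: str) -> tuple:
    """Turn 'SW1.8.0', 'SW1.6' or '1.5.1' into a tuple of at least three ints."""
    text = release.strip().upper()
    if text.startswith("SW"):
        text = text[2:]
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad so 'SW1.8' compares like 'SW1.8.0'
    return tuple(parts)


def ilom_has_fix(model: str, release: str) -> bool:
    """True if the given ILOM software release includes the fix from this note."""
    v = parse_sw(release)
    if model == "X4800":
        return v >= (1, 8, 0)
    if model == "X4800 M2":
        # SW1.5.1 is listed explicitly, plus SW1.6 and later.
        return v == (1, 5, 1) or v >= (1, 6, 0)
    raise ValueError(f"model not covered by this note: {model}")


if __name__ == "__main__":
    print(ilom_has_fix("X4800", "SW1.8.0"))    # True
    print(ilom_has_fix("X4800 M2", "SW1.5.0"))  # False
```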


Alternatively, the issue can be avoided by changing the BIOS settings as described below:

  1. Access the BIOS Setup Utility.
  2. Navigate to RC Settings > QPI and change
    * MMIOH Size per IOH from 2Gb to 4Gb (the default is 2Gb).
  3. Navigate to Chipset > North Bridge Configuration and change
    * PCI MMIO 64 Bits Support to Enabled (the default is Disabled)
  4. Save your changes and exit the BIOS Setup Utility.
  5. Reboot the server.
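When applying the workaround to several servers, it may help to record the two target values and compare them against what was observed in each BIOS. The sketch below does only that bookkeeping; the dictionary keys reuse the menu paths and option names from the steps above, but the data representation and the `pending_changes` helper are hypothetical and do not read or write any real BIOS or ILOM setting.

```python
# (menu path, option) -> value, taken from steps 2 and 3 above.
WORKAROUND = {
    ("RC Settings > QPI", "MMIOH Size per IOH"): "4Gb",
    ("Chipset > North Bridge Configuration", "PCI MMIO 64 Bits Support"): "Enabled",
}
DEFAULTS = {
    ("RC Settings > QPI", "MMIOH Size per IOH"): "2Gb",
    ("Chipset > North Bridge Configuration", "PCI MMIO 64 Bits Support"): "Disabled",
}


def pending_changes(current):
    """List workaround settings that differ from the recorded current values.

    Options missing from `current` are assumed to be at their defaults.
    """
    changes = []
    for key, target in WORKAROUND.items():
        value = current.get(key, DEFAULTS[key])
        if value != target:
            menu, option = key
            changes.append(f"{menu}: set {option} from {value} to {target}")
    return changes


if __name__ == "__main__":
    for change in pending_changes({}):
        print(change)
```

Both settings are changed together in the steps above, so a server counts as done only when `pending_changes` returns an empty list.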


  Copyright © 2018 Oracle, Inc.  All rights reserved.