Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2358708.1
Update Date:2018-03-25
Keywords:

Solution Type  Problem Resolution Sure

Solution  2358708.1 :   [PCA] Amber Light from Compute Node after Applied PCA PSU on PCA X3-2 Server  


Related Items
  • Oracle Virtual Compute Appliance X3-2 Hardware
Related Categories
  • PLA-Support>Infrastructure>Operating Systems and Virtualization>Virtualization>Oracle PCA




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-16756380114>

Applies to:

Oracle Virtual Compute Appliance X3-2 Hardware - Version All Versions to All Versions [Release All Releases]
Linux x86-64

Symptoms

After applying a PCA PSU release such as 2.3.2, some compute nodes show an amber light.

The ILOM snapshot reports a critical fault on PCIE2:

Fault class : fault.io.intel.iio.pcie-fatal

ASRU : /SYS/MB/RISER2/PCIE2
faulted

FRU : /SYS/MB/RISER2/PCIE2
(Part Number: unknown)
(Serial Number: unknown) 100%
faulty

PCIE2 is the slot that holds the InfiniBand HCA card in a PCA X3-2 compute node:

System/PCI_Devices/Add-on/Device_2
Properties:
part_number = XXXXX
description = Sun Infiniband Dual Port 4x QDR PCIe LP HCA M2
location = PCIE2 (PCIe Slot 2)
pci_vendor_id = 0x15b3
pci_device_id = 0x673c
pci_subvendor_id = 0x15b3
pci_subdevice_id = 0x673c
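
When checking several compute nodes, it can help to confirm programmatically that the faulted slot really holds the InfiniBand HCA. The following is a minimal Python sketch, assuming the device properties have been saved as plain `key = value` text in the layout shown above. The function names `parse_properties` and `is_ib_hca_in_slot` are hypothetical helpers, not part of any Oracle tool. The value 0x15b3 is the PCI vendor ID registered to Mellanox Technologies.

```python
MELLANOX_VENDOR_ID = 0x15B3  # PCI vendor ID registered to Mellanox Technologies


def parse_properties(text):
    """Parse 'key = value' lines into a dict, skipping lines without '='."""
    props = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def is_ib_hca_in_slot(props, slot="PCIE2"):
    """Return True if the device is in the given slot and has the Mellanox vendor ID."""
    in_slot = props.get("location", "").startswith(slot)
    vendor = int(props.get("pci_vendor_id", "0"), 16)
    return in_slot and vendor == MELLANOX_VENDOR_ID


# Sample text copied from the properties listing in this note.
SAMPLE = """\
Properties:
part_number = XXXXX
description = Sun Infiniband Dual Port 4x QDR PCIe LP HCA M2
location = PCIE2 (PCIe Slot 2)
pci_vendor_id = 0x15b3
pci_device_id = 0x673c
"""

props = parse_properties(SAMPLE)
print(is_ib_hca_in_slot(props))
```

The check is deliberately narrow: it only compares the slot name and vendor ID, so it should be treated as a quick sanity test rather than a full device identification.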

The console logs show the following errors:

Fri Jan 19 10:57:09 2018 ID ffff P0 Fatals GFERRST 0x00000000,GFFERRST 0x00000400,GFNERRST 0x00000000, 1
Fri Jan 19 10:57:10 2018 ID 02E7 P0 Fatal:FIRST:PCIe 2:port2c:PCIE:84::Completion Time-out Status
Fri Jan 19 10:57:10 2018 ID ffff P0:PCIe 2:port2c: UNCERR:00004000 LNERR:00000000
Fri Jan 19 10:57:10 2018 ID 02E8 P0 Fatal:MSG:PCIe 2:port2c:PCIE:Fatal Error Msgs Received
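
To scan a saved console log for this signature, a small script along the following lines may be useful. It is a sketch only: the regular expressions are inferred from the four sample lines above, not from a documented log format, and the names `find_pcie_fatals` and `completion_timeout_flagged` are hypothetical. It also assumes that the `UNCERR` value follows the PCIe Advanced Error Reporting Uncorrectable Error Status layout, in which bit 14 is Completion Timeout; that assumption is consistent with the "Completion Time-out Status" text in the log but is not confirmed by this note.

```python
import re

# Field layout inferred from the sample "Fatal:FIRST" line only (an assumption).
FATAL_FIRST = re.compile(
    r"ID (?P<id>[0-9A-Fa-f]{4}) (?P<cpu>P\d+) Fatal:FIRST:"
    r"PCIe (?P<slot>\d+):(?P<port>port\w+):PCIE:(?P<code>\d+)::(?P<status>.+)$"
)
UNCERR = re.compile(r"UNCERR:(?P<uncerr>[0-9A-Fa-f]{8})")

# Assumption: UNCERR uses the PCIe AER Uncorrectable Error Status bit layout,
# where bit 14 is Completion Timeout.
COMPLETION_TIMEOUT_BIT = 1 << 14


def find_pcie_fatals(lines):
    """Return one dict per 'Fatal:FIRST' PCIe line found in the given lines."""
    hits = []
    for line in lines:
        match = FATAL_FIRST.search(line.strip())
        if match:
            hits.append(match.groupdict())
    return hits


def completion_timeout_flagged(line):
    """Return True if the line carries an UNCERR value with bit 14 set."""
    match = UNCERR.search(line)
    return bool(match) and bool(int(match.group("uncerr"), 16) & COMPLETION_TIMEOUT_BIT)


# Sample lines copied from the console log in this note.
SAMPLE = [
    "Fri Jan 19 10:57:09 2018 ID ffff P0 Fatals GFERRST 0x00000000,GFFERRST 0x00000400,GFNERRST 0x00000000, 1",
    "Fri Jan 19 10:57:10 2018 ID 02E7 P0 Fatal:FIRST:PCIe 2:port2c:PCIE:84::Completion Time-out Status",
    "Fri Jan 19 10:57:10 2018 ID ffff P0:PCIe 2:port2c: UNCERR:00004000 LNERR:00000000",
    "Fri Jan 19 10:57:10 2018 ID 02E8 P0 Fatal:MSG:PCIe 2:port2c:PCIE:Fatal Error Msgs Received",
]

hits = find_pcie_fatals(SAMPLE)
for hit in hits:
    print(hit["slot"], hit["port"], hit["status"])
```

A match on PCIe slot 2 with a completion time-out status, seen right after a PSU upgrade, fits the symptom described here; other slots or statuses should be investigated separately.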

Changes

The PCA software was upgraded (PSU applied).

Cause

This appears to be an InfiniBand (IB) driver or firmware bug.

NOTE:

So far this issue has only been observed once, during the patch update. The root cause is not known at this time.

The engineering team is waiting for the EEST team to log an official bug. This document will be updated once a solution is available.

Solution

This is not a hardware failure.

The fault can be cleared from the ILOM fault management shell as follows:

% ssh -l root <IP address of Service Processor>

-> show /SP/faultmgmt/shell

-> start /SP/faultmgmt/shell
Are you sure you want to start /SP/faultmgmt/shell(y/n)? y

faultmgmtsp>

faultmgmtsp> fmadm repair /SYS/MB/RISER2/PCIE2
faultmgmtsp> exit

-> reset /SP
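
When the same fault has to be cleared on several compute nodes, it can be convenient to keep the steps above as data, for example to paste into a runbook or to feed into an interactive session tool. The sketch below only builds the ordered list of inputs taken from the session shown above; it does not connect to any Service Processor. The function name `ilom_repair_sequence` is hypothetical.

```python
def ilom_repair_sequence(fru_path):
    """Return the ILOM CLI inputs, in order, that clear a fault on fru_path.

    The sequence mirrors the manual session in this note: start the fault
    management shell, confirm the prompt, repair the FRU, exit, reset the SP.
    """
    if not fru_path.startswith("/SYS/"):
        raise ValueError(f"expected an ILOM target path under /SYS/, got {fru_path!r}")
    return [
        "start /SP/faultmgmt/shell",
        "y",  # answer to the "Are you sure you want to start ..." prompt
        f"fmadm repair {fru_path}",
        "exit",
        "reset /SP",
    ]


steps = ilom_repair_sequence("/SYS/MB/RISER2/PCIE2")
for step in steps:
    print(step)
```

The session in this note does not show whether `reset /SP` asks for confirmation, so no answer to such a prompt is included; anyone automating the sequence should be prepared to handle one.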

References

<NOTE:1501450.1> - INTERNAL Exadata Database Machine Hardware Current Product Issues (X3-2, X4-2, X3-8, X4-8 w/X4-2L)

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.