Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure
Solution 2369194.1 : Oracle ZFS Storage Appliance: Many false persistent link errors which lead to FMA fault - INTEL-8001-ND
Created from <SR 3-16600912681>

Applies to:
Sun ZFS Storage 7420 - Version All Versions and later
Sun ZFS Storage 7320 - Version All Versions and later
Sun ZFS Storage 7120 - Version All Versions and later
Oracle ZFS Storage ZS3-4 - Version All Versions and later
Oracle ZFS Storage ZS3-2 - Version All Versions and later
7000 Appliance OS (Fishworks)

Symptoms
CPU faults are erroneously reported only when the INTEL-8001-ND CPU errors are triggered by Memory Controller North Bound or South Bound FB-DIMM link events. The INTEL-8001-ND errors are logged in fmadm.out as shown in the following example:

------------------- ------------------------------------------------ ---------------- -----------
Problem Status : open
System Component
----------------------------------------
FRU
Description : A quickpath memory link correctable error was detected.
Response : Processor is not off-lined, as the memory controller on the CPU chip is still accessible by other processors

Cause
This issue is related to Bug 15807381. To confirm that this problem is being hit, run fmdump against the errlog with the -u option and the UUID from the fault, and check whether quickpath.mem_lnkpers is the cause. For example:
From the fltlog: Jan 14 19:15:03.7607 88b430c3-b48e-42e1-d73d-c35d688a5a08 INTEL-8001-ND
/usr/sbin/fmdump -eu 88b430c3-b48e-42e1-d73d-c35d688a5a08 fltlog
TIME                 CLASS
Jan 14 19:14:32.7134 ereport.cpu.intel.quickpath.mem_lnkpers
Jan 14 19:14:22.6365 ereport.cpu.intel.quickpath.mem_lnkpers
Jan 14 19:14:07.6437 ereport.cpu.intel.quickpath.mem_lnkpers
Jan 14 19:13:42.2797 ereport.cpu.intel.quickpath.mem_lnkpers
Jan 14 19:13:05.4707 ereport.cpu.intel.quickpath.mem_lnkpers
........
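
When the fmdump output is long, the check described above can be scripted. The following is a minimal sketch, not an Oracle-supplied tool: it assumes the output has been saved as text in the two-column TIME/CLASS layout shown in this article, and the helper names `ereport_classes` and `only_mem_lnkpers` are hypothetical. How to treat output that mixes mem_lnkpers with other ereport classes is not covered by this article, so the sketch simply reports that case as not matching.

```python
import re

# Matches one event line of the fmdump -e output as it appears in this article:
# "Jan 14 19:14:32.7134 ereport.cpu.intel.quickpath.mem_lnkpers"
EVENT_LINE = re.compile(r"^\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\.\d+\s+(\S+)$")

TARGET_CLASS = "ereport.cpu.intel.quickpath.mem_lnkpers"


def ereport_classes(fmdump_output):
    """Return the class of every event line, skipping the header and other text."""
    classes = []
    for line in fmdump_output.splitlines():
        match = EVENT_LINE.match(line.strip())
        if match:
            classes.append(match.group(1))
    return classes


def only_mem_lnkpers(fmdump_output):
    """True when at least one event was found and every event is mem_lnkpers."""
    classes = ereport_classes(fmdump_output)
    return bool(classes) and all(c == TARGET_CLASS for c in classes)


# Sample input copied from the example output in this article.
SAMPLE = """\
TIME                 CLASS
Jan 14 19:14:32.7134 ereport.cpu.intel.quickpath.mem_lnkpers
Jan 14 19:14:22.6365 ereport.cpu.intel.quickpath.mem_lnkpers
Jan 14 19:14:07.6437 ereport.cpu.intel.quickpath.mem_lnkpers
"""

if __name__ == "__main__":
    print(only_mem_lnkpers(SAMPLE))
```

A result of True indicates that every ereport behind the fault is a persistent link error, which is the pattern this article describes.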

Solution
There is no workaround for this issue. The INTEL-8001-ND faults that have been triggered by these Memory Controller events can be ignored or repaired using the "mark repaired" command.
This issue is addressed in the following releases: AK 2013.1.6.15 (OS 8.6.15) or later
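
The "or later" comparison has to be numeric per component, because a plain string comparison would rank 2013.1.6.9 above 2013.1.6.15. The sketch below illustrates this, assuming the installed release is written in the same short dotted form used in this article. The appliance may report its version in a longer format that would need to be normalized first. The names `parse_release` and `has_fix` are hypothetical.

```python
def parse_release(text):
    """Split a dotted release string such as '2013.1.6.15' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# First release that contains the fix, as stated in this article.
FIXED_IN = parse_release("2013.1.6.15")


def has_fix(installed):
    """True when the installed release is 2013.1.6.15 or later."""
    # Tuples compare element by element, so 15 > 9 is handled numerically.
    return parse_release(installed) >= FIXED_IN


if __name__ == "__main__":
    for release in ("2013.1.6.9", "2013.1.6.15", "2013.1.7.0"):
        print(release, has_fix(release))
```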

References
<NOTE:1920585.1> - Solaris 10 and Solaris 11 Fault Management Architecture (FMA) may Erroneously Report Xeon 7500 Series and E7-x800 Series Intel CPUs as Faulty After Memory Controller Events are Logged
<BUG:24711310> - FMA FOR NHM-EX SHOULD NOT SERD CORRECTABLE PERSISTENT LINK ERROR

Attachments
This solution has no attachment