Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type Problem Resolution Sure

Solution 1956549.1 : The BDA utility, bdacheckhw Reports a Memory Failure: "WARNING: Hardware errors reported by ILOM : fault.memory.intel.sb.dimm_ce"
Created from <SR 3-9966605076>

Applies to:

Big Data Appliance X3-2 Hardware - Version All Versions and later
Big Data Appliance X4-2 Hardware - Version All Versions and later
Big Data Appliance Hardware - Version All Versions and later
x86_64

Symptoms

1. Running the BDA utility bdacheckhw on a node reports a memory failure: fault.memory.intel.sb.dimm_ce.

WARNING: Hardware errors reported by ILOM : fault.memory.intel.sb.dimm_ce
INFO: Run 'ipmitool sunoem cli "show faulty"' to see the full error
...
WARNING: Big Data Appliance warnings during hardware validation checks

# ipmitool sunoem cli "show faulty"
Connected. Use ^D to exit.
-> show faulty
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB/P0/D7
/SP/faultmgmt/0/    | class                  | fault.memory.intel.sb.dimm_ce
 faults/0           |                        |
/SP/faultmgmt/0/    | sunw-msg-id            | SPX86-8004-CE
 faults/0           |                        |
...
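When many nodes have to be checked, it can help to pull the fru, class and sunw-msg-id values out of the 'show faulty' output programmatically. The following is a minimal sketch, not an Oracle-supplied tool: the function name parse_show_faulty and the SAMPLE text are hypothetical, and the layout is assumed to match the excerpt above (three columns separated by "|", with long Target values wrapped onto a continuation row).

```python
import re

# Sample text modeled on the 'show faulty' excerpt in this note (assumed layout).
SAMPLE = """\
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB/P0/D7
/SP/faultmgmt/0/    | class                  | fault.memory.intel.sb.dimm_ce
 faults/0           |                        |
/SP/faultmgmt/0/    | sunw-msg-id            | SPX86-8004-CE
 faults/0           |                        |
"""


def parse_show_faulty(text):
    """Return {fault_index: {property: value}} from 'show faulty' output.

    Hypothetical helper: rows are grouped by the N in /SP/faultmgmt/N.
    """
    faults = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip prompts, the "---+---" separator, "..." lines and the
        # continuation rows that only carry the wrapped Target column.
        if len(cells) != 3 or not cells[1]:
            continue
        match = re.match(r"/SP/faultmgmt/(\d+)", cells[0])
        if match:  # the header row ("Target") does not match and is skipped
            faults.setdefault(int(match.group(1)), {})[cells[1]] = cells[2]
    return faults


faults = parse_show_faulty(SAMPLE)
for index, props in sorted(faults.items()):
    print(index, props.get("fru"), props.get("class"), props.get("sunw-msg-id"))
```

The text passed in would typically be the captured output of the ipmitool command shown above. Grouping by fault index keeps the properties of separate faults apart if more than one DIMM is reported.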
a) From ./ilom/@usr@local@bin@spshexec_show_-script_@X@logs@event@list.out we see the fault and that it was cleared after reboot:

1060 Wed Dec 3 15:08:08 2014 Fault Repair minor
Fault fault.memory.intel.sb.dimm_ce on component /SYS/MB/P0/D7 cleared
...
1058 Tue Dec 2 11:51:11 2014 Fault Fault critical
Fault detected at time = Tue Dec 2 11:51:11 2014. The suspect component: /SYS/MB/P0/D7 has fault.memory.intel.sb.dimm_ce with probability=100. Refer to http://www.sun.com/msg/SPX86-8004-CE for details.
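The event-log check in a) amounts to matching each "Fault detected" entry with a later "cleared" entry for the same component. A minimal sketch of that matching follows. The names open_faults and SAMPLE_LOG are hypothetical, and two assumptions are made: the log lists the newest entries first, as in the excerpt, and each message sits on a single line.

```python
import re

# Patterns modeled on the two message forms shown in the excerpt.
DETECTED = re.compile(r"suspect component: (\S+) has (\S+) with probability=\d+")
CLEARED = re.compile(r"Fault (\S+) on component (\S+) cleared")

# Sample text modeled on the event-log excerpt in this note.
SAMPLE_LOG = """\
1060 Wed Dec 3 15:08:08 2014 Fault Repair minor
Fault fault.memory.intel.sb.dimm_ce on component /SYS/MB/P0/D7 cleared
1058 Tue Dec 2 11:51:11 2014 Fault Fault critical
Fault detected at time = Tue Dec 2 11:51:11 2014. The suspect component: /SYS/MB/P0/D7 has fault.memory.intel.sb.dimm_ce with probability=100.
"""


def open_faults(event_log):
    """Return the set of (component, fault_class) detected but never cleared.

    Hypothetical helper. Assumes newest-first ordering, so the lines are
    walked in reverse to replay the events from oldest to newest.
    """
    still_open = set()
    for line in reversed(event_log.splitlines()):
        detected = DETECTED.search(line)
        if detected:
            still_open.add((detected.group(1), detected.group(2)))
            continue
        cleared = CLEARED.search(line)
        if cleared:
            # The cleared message names the fault class first, then the component.
            still_open.discard((cleared.group(2), cleared.group(1)))
    return still_open


print(open_faults(SAMPLE_LOG) or "no open faults in the event log")
```

An empty result only says that every detected fault has a matching cleared entry in the log. It does not replace the checks described in the referenced notes.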
Target              | Property               | Value
--------------------+------------------------+---------------------------------
-> Session closed

ipmiint_sunoem_led_get.out fault leds

P0/SERVICE | OFF
...
P0/D6/SERV | OFF
P0/D7/SERV | OFF <<<<<<<<<<<<<<< Not faulted
P1/SERVICE | OFF
...
P1/D7/SERV | OFF

c) Also from the ILOM snapshot the fault is "Repaired"/"Resolved" after reboot:

...
2014-12-02/11:51:11 ef3f77c5-d16b-6a08-fca1-dbce2c725eee SPX86-8004-CE FRU = /SYS/MB/P0/D7
2014-12-03/15:08:08 ef3f77c5-d16b-6a08-fca1-dbce2c725eee SPX86-8004-CE Repaired
2014-12-03/15:08:08 ef3f77c5-d16b-6a08-fca1-dbce2c725eee SPX86-8004-CE Resolved
...

Cause

The bdacheckhw errors and the ILOM snapshot taken prior to reboot confirm a memory fault, i.e. "fault.memory.intel.sb.dimm_ce", due to excessive memory correctable errors.
Memory is tested with every POST, so if a server is rebooted and the error does not recur, memory is assumed to be OK. MOS document 1155200.1 - PSH Procedural Article for ILOM-Based Diagnosis states that you should check FMA errors.

Solution

In the case of a memory fault due to excessive memory correctable errors the solution is to:

References

<NOTE:1438864.1> - SPX86-8004-CE - Fault due to excessive memory correctable errors (CE's)
<NOTE:1155200.1> - PSH Procedural Article for ILOM-Based Diagnosis
<NOTE:1956552.1> - Running Enabled and Extended Diagnostics to Confirm Hardware Errors on the BDA

Attachments

This solution has no attachment