Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Problem Resolution Sure Solution

2294697.1: After SPX86A-8000-TL Fault, bdacheckhw Command Reports Wrong Number of CPU Cores
Applies to:

Big Data Appliance X6-2 Hardware - Version All Versions and later
Oracle Server X6-2L - Version All Versions and later
Linux x86-64

Symptoms

After an SPX86A-8000-TL fault, the bdacheckhw command reports the wrong number of CPU cores.

This example shows an X6-2L BDA node that reported an SPX86A-8000-TL CPU fault:

```
2017-02-04/11:11:48 a27a3456-20a3-4c06-91ff-f8b14dc64220 SPX86A-8000-TL
timestamp ereports fault = fault.cpu.intel.internal@/SYS/MB/P1/CORE12
```

The fault turned on the service LED for the CPU:

```
/SYS/MB/P1 | ON
```
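To match this fault against the hwdiag output used in the Solution, it helps to reduce the fault path to a socket and core number. The following is a minimal Python sketch, assuming every CPU-core fault path has the "@/SYS/MB/P<n>/CORE<m>" layout seen in this example; the helper name parse_cpu_fault is hypothetical and not part of any Oracle tool.

```python
import re

# Hypothetical helper (not an Oracle utility). Assumes the
# "@/SYS/MB/P<n>/CORE<m>" fault path layout shown in this note.
FAULT_PATH = re.compile(r"@/SYS/MB/P(?P<socket>\d+)/CORE(?P<core>\d+)$")


def parse_cpu_fault(fault: str) -> tuple[int, int]:
    """Return (socket, core) for a CPU-core fault string.

    Fault classes or FRU paths that do not end in
    /SYS/MB/P<n>/CORE<m> raise ValueError.
    """
    match = FAULT_PATH.search(fault.strip())
    if match is None:
        raise ValueError(f"not a CPU core fault path: {fault!r}")
    return int(match.group("socket")), int(match.group("core"))


if __name__ == "__main__":
    socket, core = parse_cpu_fault(
        "fault = fault.cpu.intel.internal@/SYS/MB/P1/CORE12"
    )
    print(f"socket {socket}, core {core}")  # socket 1, core 12
```

In this note's example, socket 1 and core 12 correspond to the hwdiag entry for Socket ID 01, where core 12 is missing from the list of enabled cores.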
A field engineer (FE) may already have been dispatched for the repair. If so, the FE replaced the CPU and cleared the fault:

```
-> show faulty
Target              | Property               | Value
--------------------+------------------------+---------------------------------
[No faults are listed]
```

All service LEDs are off, and an ILOM snapshot identifies no CPU faults. However, bdacheckhw still reports the wrong number of CPU cores:

```
bdanode06: ERROR: Wrong number of CPU cores : 86
bdanode06: INFO: Expected number of CPU cores : 88
bdanode06: ERROR: Big Data Appliance failed hardware validation checks
```

The ILOM version on the node is earlier than 3.2.8.24 build 114611:

```
-> version
SP firmware 3.2.6.26
SP firmware build number: 107051
```

Cause

The CPU core(s) may still be disabled because the CPU fault was cleared after the node had already been powered on. Even if the power was off while the FE replaced the CPU, another power cycle may be needed to re-enable the core(s).

Solution

Using the restricted shell in the ILOM, run the HWdiag command to check whether the same CPU core is still disabled. Do this with the node powered on; the OS on the node can keep running:

1. Log in to the ILOM of the affected node using ssh.
2. Start the restricted shell, run the 'hwdiag cpu info all' command, then exit the shell:

```
-> set SESSION mode=restricted
WARNING: The "Restricted Shell" account is provided solely
[(restricted_shell) bdanode-ilom:~]# hwdiag cpu info all
HWdiag (Restricted Mode) - Build Number 107051 (Jan 24 2016, 14:40:48)
Current Date/Time: Aug 03 2017, 18:51:17

CPU0 CPUID: 406F1 (B0 Stepping)
CPU 1 CPUID: 406F1 (B0 Stepping)
Socket ID: 01
Number of cores: 21    <-- [22 cores are expected on X6-2L]
Number of threads: 2
Current cores enabled: 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21
                                                ^ [Core 12 is not enabled and is missing from the list]
Current threads enabled: 0 1
[snip]
-> exit
```
3. Gracefully shut down the OS. Follow Note 2099858.1, "Steps to Gracefully Shutdown and Power on a Single Node on Oracle Big Data Appliance Prior to Maintenance".
4. Once the OS has been shut down, power cycle the node from the ILOM: stop /SYS, check the power state, then start /SYS:

```
-> stop /SYS
Are you sure you want to stop /SYS (y/n)? y
-> show /SYS power_state
/SYS
-> start /SYS
Are you sure you want to start /SYS (y/n)? y
-> exit
```

5. After the power cycle, check again whether the missing core(s) are enabled (for example, by repeating step 2 and rerunning bdacheckhw). If they are, continue to monitor the node to see whether the fault returns.

BDA image 4.8.0 contains ILOM-3_2_8_24_r114611-Oracle_Server_X6-2L.pkg. This package contains new microcode that Engineering believes should resolve the missing-core issue. On BDA 4.8.0, 4.9.0, or 4.10.0, you can use bdaupdatefw to apply the X6-2L ILOM-3_2_8_24_r114611 firmware. However, if the one-off patch is not applied, this option causes both bdacheckhw and bdacheckcluster to log an error.

Documentation on the use of bdaupdatefw is located in the Oracle Big Data Appliance documentation set: Owner's Guide, chapter "Oracle Big Data Appliance Utilities".

Documentation on the use of the restricted shell and HWdiag is located in the Oracle® x86 Server Diagnostics, Applications, and Utilities Guide.
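As a sketch only, the checks described in this note (finding the disabled core in the hwdiag output after the power cycle, relating it to the bdacheckhw count, and deciding whether the bdaupdatefw route applies) can be expressed in Python. All function and constant names are hypothetical. The two-socket layout, the 22 cores per socket, and the reading of bdacheckhw's "CPU cores" as logical CPUs (hardware threads) are assumptions inferred from the 86/88 figures and the hwdiag annotation above, not documented bdacheckhw behavior.

```python
# Assumptions inferred from this note, not documented tool behavior:
# hwdiag annotates 22 cores per socket on X6-2L with 2 threads per core,
# and bdacheckhw expects 88, which matches 2 sockets * 22 cores * 2 threads.
SOCKETS = 2
CORES_PER_SOCKET = 22
THREADS_PER_CORE = 2

# ILOM release named in this note (ILOM-3_2_8_24_r114611): version, build.
TARGET_ILOM = ((3, 2, 8, 24), 114611)
# BDA releases on which the note says bdaupdatefw can apply that ILOM.
BDAUPDATEFW_RELEASES = {(4, 8, 0), (4, 9, 0), (4, 10, 0)}


def missing_cores(hwdiag_line: str) -> list[int]:
    """Core IDs absent from one socket's 'Current cores enabled:' line."""
    _, _, ids = hwdiag_line.partition(":")
    enabled = {int(tok) for tok in ids.split()}
    return [core for core in range(CORES_PER_SOCKET) if core not in enabled]


def expected_logical_count(disabled_cores: int = 0) -> int:
    """Logical CPU count, assuming each disabled core drops all its threads."""
    total = SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE
    return total - disabled_cores * THREADS_PER_CORE


def parse_version(text: str) -> tuple[int, ...]:
    """Convert a dotted version such as '3.2.6.26' to (3, 2, 6, 26)."""
    return tuple(int(part) for part in text.strip().split("."))


def ilom_older_than_target(sp_firmware: str, build: int) -> bool:
    """True if the SP firmware predates 3.2.8.24 build 114611."""
    return (parse_version(sp_firmware), build) < TARGET_ILOM


def bdaupdatefw_option_applies(bda_release: str) -> bool:
    """True for the BDA releases this note lists for the bdaupdatefw route."""
    return parse_version(bda_release) in BDAUPDATEFW_RELEASES


if __name__ == "__main__":
    line = "Current cores enabled: 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16 17 18 19 20 21"
    gone = missing_cores(line)
    print(gone)                                        # [12]
    print(expected_logical_count(len(gone)))           # 86
    print(ilom_older_than_target("3.2.6.26", 107051))  # True
    print(bdaupdatefw_option_applies("4.10.0"))        # True
```

Versions are compared as integer tuples because a plain string comparison would sort "4.10.0" before "4.8.0".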
References

<NOTE:1611199.1> - SPX86A-8000-TL - Internal Processor Fault
<BUG:25652534> - BDA X6L-2 NODE HAD SPX86A-8000-TL. CPU REPLACED BUT THE CORE IS STILL OFFLINED.
<NOTE:2099858.1> - Steps to Gracefully Shutdown and Power on a Single Node on Oracle Big Data Appliance Prior to Maintenance
<BUG:25606502> - X6-2L BDA NODE MULTIPLE SPX86A-8000-TL FAULTS AT SAME CUSTOMER SITE