Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type Problem Resolution Sure

Solution 2081723.1 : ILOM sending occasional incorrect sensor readings via IPMI when being polled by hwmgmtd on BDA V4.2
Created from <SR 3-11415470031>

Applies to:
Big Data Appliance X4-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms
The problem symptoms are as follows:

    Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: overall alarm state changed from "Cleared" (1) to "Critical" (2).
    Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Temperature" changed state from "Cleared" (1) to "Critical" (2).
    Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Fan Speed" changed state from "Cleared" (1) to "Critical" (2).
    Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Other" changed state from "Cleared" (1) to "Major" (3).
    Sep 21 16:13:14 bdanode03 modprobe: WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.
    Sep 21 16:14:20 bdanode03 modprobe: WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.
    Sep 21 16:15:13 bdanode03 hwmgmtd[13804]: State change: overall alarm state changed from "Critical" (2) to "Cleared" (1).
    Sep 21 16:15:13 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Temperature" changed state from "Critical" (2) to "Cleared" (1).
    Sep 21 16:15:13 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Fan Speed" changed state from "Critical" (2) to "Cleared" (1).
    ...

    # grep hwmgmtd /var/log/messages
    Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: indicator: /SYS/MB/FM0/OK (ID: 208) changed state from "On" (4) to "Off" (3).
    Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: indicator: /SYS/MB/FM1/OK (ID: 209) changed state from "On" (4) to "Off" (3).
    Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: service indicator: /SYS/SERVICE (ID: 213) changed state from "Off" (3) to "On" (4).
    Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: locator indicator: /SYS/LOCATE (ID: 214) changed state from "Off" (3) to "On" (4).
    Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: indicator: /SYS/SP/OK (ID: 215) changed state from "On" (4) to "Off" (3).
    Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: indicator: /SYS/PS_FAULT (ID: 217) changed state from "Off" (3) to "On" (4).
    Oct 25 09:09:54 bdanode03 hwmgmtd[12805]: State change: indicator: /SYS/MB/FM0/OK (ID: 208) changed state from "Off" (3) to "On" (4).
    ...

However, the ILOM snapshot shows that the fault LEDs are off, FMA did not log any fault, and the SEL events are clear as well.

Cause
There are several bugs which may be related to the ILOM sending incorrect sensor readings via IPMI when being polled by hwmgmtd.

Solution
The recommendation is to upgrade to the latest ILOM version. However, upgrading the ILOM is not supported on BDA V4.2, because the BDA hardware checks (and therefore the cluster checks) do not support the newer ILOM version.

References
<BUG:21764888> - OHMP EVENT SHOWN, BUT NO RELATED EVENT SEL NOR FMA

Attachments
This solution has no attachment
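
The symptom pattern described above, where hwmgmtd reports a state change that reverts a few minutes later, can be spotted in /var/log/messages with a short script. The following is a minimal sketch and not an Oracle-provided tool: the function name find_flaps, the 300-second window, and the placeholder year are assumptions made for illustration, and the regular expression is based only on the log lines quoted in this document. It only flags state changes that reverse quickly; anything it reports still needs to be compared against the ILOM snapshot (fault LEDs, FMA, SEL) before concluding that the reading was spurious.

```python
import re
from datetime import datetime
from typing import Iterable, List, NamedTuple

# Pattern derived from the hwmgmtd lines quoted in this note (an assumption:
# other hwmgmtd versions may log in a different format).
LINE_RE = re.compile(
    r'^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+hwmgmtd\[\d+\]:\s+'
    r'State change:\s+(?P<what>.+?)\s+changed(?: state)? from\s+'
    r'"(?P<old>[^"]+)" \(\d+\) to "(?P<new>[^"]+)" \(\d+\)\.'
)


class Change(NamedTuple):
    when: datetime
    old: str
    new: str


class Flap(NamedTuple):
    host: str
    what: str      # e.g. 'alarm state of subsystem "Temperature"'
    state: str     # the state that was entered and then left again
    seconds: float


def find_flaps(lines: Iterable[str], max_seconds: int = 300,
               year: int = 2000) -> List[Flap]:
    """Return state changes that were reversed within max_seconds.

    Syslog timestamps carry no year, so `year` is only a placeholder;
    the result depends on time differences, not on absolute dates.
    """
    pending = {}   # (host, what) -> last unmatched Change
    flaps = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue   # not a hwmgmtd state-change line
        when = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        key = (m["host"], m["what"])
        prev = pending.pop(key, None)
        if prev and prev.old == m["new"] and prev.new == m["old"]:
            # This event undoes the previous one for the same item.
            elapsed = (when - prev.when).total_seconds()
            if elapsed <= max_seconds:
                flaps.append(Flap(m["host"], m["what"], prev.new, elapsed))
        else:
            pending[key] = Change(when, m["old"], m["new"])
    return flaps


# Sample input copied from the Symptoms section of this note.
SAMPLE = [
    'Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: overall alarm state changed from "Cleared" (1) to "Critical" (2).',
    'Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Temperature" changed state from "Cleared" (1) to "Critical" (2).',
    'Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Fan Speed" changed state from "Cleared" (1) to "Critical" (2).',
    'Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Other" changed state from "Cleared" (1) to "Major" (3).',
    'Sep 21 16:13:14 bdanode03 modprobe: WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.',
    'Sep 21 16:15:13 bdanode03 hwmgmtd[13804]: State change: overall alarm state changed from "Critical" (2) to "Cleared" (1).',
    'Sep 21 16:15:13 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Temperature" changed state from "Critical" (2) to "Cleared" (1).',
    'Sep 21 16:15:13 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Fan Speed" changed state from "Critical" (2) to "Cleared" (1).',
]

if __name__ == "__main__":
    for flap in find_flaps(SAMPLE):
        print(f"{flap.host}: {flap.what} was {flap.state} for {flap.seconds:.0f}s")
```

To run it against a real node, the SAMPLE list could be replaced by the lines of /var/log/messages (for example, by iterating over the opened file). State changes that never revert, such as the "Other" subsystem in the excerpt, are deliberately not reported, since they do not match the transient pattern.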