Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2081723.1
Update Date: 2016-07-12

Solution Type  Problem Resolution Sure

Solution 2081723.1: ILOM sending occasional incorrect sensor readings via IPMI when being polled by hwmgmtd on BDA V4.2


Related Items
  • Big Data Appliance X4-2 Hardware
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST

In this Document
Symptoms
Cause
Solution
References


Created from <SR 3-11415470031>

Applies to:

Big Data Appliance X4-2 Hardware - Version All Versions and later
Linux x86-64

Symptoms

The problem symptoms are as follows:

1. On BDA V4.2 (X4-2 hardware, Oracle Linux 6.6, ILOM version 3.1.2.32), hwmgmtd periodically raises and then clears "Temperature", "Fan Speed" and "Other" alarms, for example:

Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: overall alarm state changed from "Cleared" (1) to "Critical" (2).
Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Temperature" changed state from "Cleared" (1) to "Critical" (2).
Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Fan Speed" changed state from "Cleared" (1) to "Critical" (2).
Sep 21 16:12:36 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Other" changed state from "Cleared" (1) to "Major" (3).
Sep 21 16:13:14 bdanode03 modprobe: WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.
Sep 21 16:14:20 bdanode03 modprobe: WARNING: Deprecated config file /etc/modprobe.conf, all config files belong into /etc/modprobe.d/.
Sep 21 16:15:13 bdanode03 hwmgmtd[13804]: State change: overall alarm state changed from "Critical" (2) to "Cleared" (1).
Sep 21 16:15:13 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Temperature" changed state from "Critical" (2) to "Cleared" (1).
Sep 21 16:15:13 bdanode03 hwmgmtd[13804]: State change: alarm state of subsystem "Fan Speed" changed state from "Critical" (2) to "Cleared" (1).
...
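
To see how often each subsystem alarm flaps, the hwmgmtd state-change lines can be tallied. The following is a minimal sketch, not an Oracle-supplied tool: the function name summarize_alarms is hypothetical, and the pattern assumes the exact message format shown in the excerpt above. The "overall alarm state" lines are deliberately not counted.

```shell
# Hypothetical helper: count hwmgmtd subsystem alarm transitions.
# Usage: summarize_alarms [logfile]   (defaults to /var/log/messages)
summarize_alarms() {
    logfile="${1:-/var/log/messages}"
    # Keep only hwmgmtd lines, then reduce each subsystem alarm message
    # to "<subsystem>: <old state> -> <new state>" and count duplicates.
    grep 'hwmgmtd\[' "$logfile" |
        sed -n 's/.*alarm state of subsystem "\([^"]*\)" changed state from "\([^"]*\)" ([0-9]*) to "\([^"]*\)" ([0-9]*)\..*/\1: \2 -> \3/p' |
        sort | uniq -c
}
```

Running summarize_alarms with no argument reads /var/log/messages. Roughly equal counts of "Cleared -> Critical" and "Critical -> Cleared" for a subsystem suggest the transient pattern described here rather than a persistent fault.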


2. Searching for hwmgmtd in /var/log/messages also shows many related indicator state-change messages, such as:

# grep hwmgmtd /var/log/messages
Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: indicator: /SYS/MB/FM0/OK (ID: 208) changed state from "On" (4) to "Off" (3).
Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: indicator: /SYS/MB/FM1/OK (ID: 209) changed state from "On" (4) to "Off" (3).
Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: service indicator: /SYS/SERVICE (ID: 213) changed state from "Off" (3) to "On" (4).
Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: locator indicator: /SYS/LOCATE (ID: 214) changed state from "Off" (3) to "On" (4).
Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: indicator: /SYS/SP/OK (ID: 215) changed state from "On" (4) to "Off" (3).
Oct 25 09:08:52 bdanode03 hwmgmtd[12805]: State change: indicator: /SYS/PS_FAULT (ID: 217) changed state from "Off" (3) to "On" (4).
Oct 25 09:09:54 bdanode03 hwmgmtd[12805]: State change: indicator: /SYS/MB/FM0/OK (ID: 208) changed state from "Off" (3) to "On" (4).
...
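
The indicator messages can be grouped the same way to show which indicators toggle and in which direction. This is a sketch under the assumption that every indicator line follows the format shown above; the function name indicator_flips is hypothetical.

```shell
# Hypothetical helper: count hwmgmtd indicator transitions per indicator path.
# Usage: indicator_flips [logfile]   (defaults to /var/log/messages)
indicator_flips() {
    logfile="${1:-/var/log/messages}"
    # Matches "indicator:", "service indicator:" and "locator indicator:"
    # lines and prints "<path> <old>-><new>" before counting duplicates.
    grep 'hwmgmtd\[' "$logfile" |
        sed -n 's/.*indicator: \([^ ]*\) (ID: [0-9]*) changed state from "\([^"]*\)" ([0-9]*) to "\([^"]*\)" ([0-9]*)\..*/\1 \2->\3/p' |
        sort | uniq -c
}
```

An indicator such as /SYS/MB/FM0/OK that appears with both On->Off and Off->On counts is toggling, which matches the excerpt above where it returns to "On" about a minute after going "Off".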

3. However, the ILOM snapshot shows no corresponding problem: the Fault LEDs are off, FMA has not logged any fault, and the SEL (System Event Log) contains no related events.
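
The article's check was made from an ILOM snapshot. As a quick host-side alternative, the SEL can also be read with ipmitool. The sketch below assumes ipmitool is installed on the node and run as root, neither of which this document states. The function name sel_entry_count is hypothetical, and it counts entries by looking for the pipe-delimited rows that `ipmitool sel elist` prints.

```shell
# Hypothetical helper: count SEL entries.
# With no arguments it runs `ipmitool sel elist` (assumed to be available);
# pass another command to parse saved output instead.
sel_entry_count() {
    if [ "$#" -eq 0 ]; then
        set -- ipmitool sel elist
    fi
    # SEL entries are printed as pipe-delimited rows, so count lines with "|".
    # "|| true" keeps the exit status clean when the count is zero.
    "$@" 2>/dev/null | grep -c '|' || true
}
```

For example, `sel_entry_count cat saved_sel.txt` parses a previously captured listing (saved_sel.txt is a placeholder name). A count of 0 while hwmgmtd is logging "Critical" alarms is consistent with symptom 3.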

Cause

There are several bugs that may cause the ILOM to send incorrect sensor readings via IPMI when it is polled by hwmgmtd.

It is not certain which bug is causing this particular problem.

Solution

The recommendation is to upgrade to the latest ILOM version. However, upgrading the ILOM is not supported on BDA V4.2, because the BDA hardware checks (and therefore the cluster checks) do not support the newer ILOM version.

As a workaround, disable the hwmgmtd daemon. This is supported on the BDA because no BDA monitoring functions rely on it. To disable the hwmgmtd daemon, follow the steps in: How to Disable the Oracle Server Hardware Management Agent (Hardware Management Agent) Daemon hwmgmtd on BDA V4.2 (Doc ID 2081716.1).
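
Once the daemon has been disabled using the procedure in Doc ID 2081716.1, the result can be verified from the shell. The sketch below does not replace that procedure and does not disable anything itself. The function names are hypothetical, and it assumes the process is named hwmgmtd, as the syslog tag in the excerpts above suggests.

```shell
# Hypothetical helper: succeed if a process named exactly hwmgmtd (or $1) is running.
hwmgmtd_running() {
    pgrep -x "${1:-hwmgmtd}" >/dev/null 2>&1
}

# Hypothetical helper: print the most recent hwmgmtd syslog line, if any.
# Usage: last_hwmgmtd_line [logfile]   (defaults to /var/log/messages)
last_hwmgmtd_line() {
    grep 'hwmgmtd\[' "${1:-/var/log/messages}" | tail -n 1
}
```

After the workaround is in place, hwmgmtd_running should return a non-zero status, and the timestamp printed by last_hwmgmtd_line should stop advancing.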

References

<BUG:21764888> - OHMP EVENT SHOWN, BUT NO RELATED EVENT SEL NOR FMA

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.