Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-2113987.1
Update Date: 2017-08-17
Keywords:

Solution Type: Problem Resolution Sure

Solution  2113987.1 :   Exalytics OVS Server Unexpectedly Rebooted and Seeing Lots Of Smartd Messages in Log before Reboot  


Related Items
  • Oracle Exalytics Software
  • Exalytics In-Memory Machine X3-4
Related Categories
  • PLA-Support>Eng Systems>Exalytics>Oracle Exalytics>DB: Exalytics_EST




In this Document
Symptoms
Changes
Cause
Solution
References


Created from <SR 3-12228609641>

Applies to:

Exalytics In-Memory Machine X3-4 - Version All Versions and later
Oracle Exalytics Software - Version 1.0.0.0.0 and later
Information in this document applies to any platform.

Symptoms

An Oracle Exalytics X3-4 server with Patchset 4 (PS4) installed and two OVM 3.2.7 servers rebooted unexpectedly for no apparent reason. The machine had been running without problems for a few months before it was taken down for a hardware battery replacement.
After the battery replacement, the server restarted successfully and ran without problems for a few days before it rebooted unexpectedly.

The /var/log/messages file shows many messages like:

Feb 23 07:01:45 exaovm1 smartd[12979]: Device: /dev/sdaa [SAT], 130107444297728 Offline uncorrectable sectors (changed +386547056640)

These messages match the known issue described in <Document 1904921.1> - Exalytics /var/log/messages Shows Smartd Error: "smartd[10687]: Device: /dev/sdd [SAT]...Offline uncorrectable sectors". According to that note, however, the issue does not occur in Patchset 4 (PS4), and this machine is running PS4.

Changes

Hardware battery replacement.

Cause

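The system time and the NTP server time disagreed, as evidenced in the log files: the timestamps jump back from 07:28:44 to 02:38:33 when syslogd restarts, and ntpdate then steps the clock by 17994.972014 seconds (about five hours).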
 
Messages show:

Feb 23 07:01:45 exaovm1 smartd[12979]: Device: /dev/sdaa [SAT], 130107444297728 Offline uncorrectable sectors (changed +386547056640)
Feb 23 07:28:44 exaovm1 rsnapshot[23516]: /usr/bin/rsnapshot daily: completed successfully
...
Feb 23 07:28:44 exaovm1 rsnapshot[23516]: /usr/bin/rsnapshot daily: completed successfully
Feb 23 02:38:33 exaovm1 syslogd 1.4.1: restart.
Feb 23 02:38:33 exaovm1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
...
Feb 23 02:38:40 exaovm1 kernel: FS-Cache: Netfs 'nfs' registered for caching
Feb 23 02:38:44 exaovm1 kernel: bonding: bondib0: link status definitely up for interface ib1, 4294967295 Mbps full duplex.
Feb 23 07:38:55 exaovm1 ntpdate[11922]: step time server 10.2.73.40 offset 17994.972014 sec    *** NOTE: NTP time correction. ***
Feb 23 07:38:55 exaovm1 ntpd[11926]: ntpd 4.2.2p1@1.1570-o Thu Jul 19 04:33:53 UTC 2012 (1)

These systems are part of an OCFS2 cluster, and the clocks on the servers must agree to within seconds.  The current system has only one NTP server configured.
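
The two clues in the excerpt above, timestamps that move backward and an ntpdate step, can be searched for mechanically. The following Python sketch is illustrative only and is not part of the original service request: the function names are hypothetical, the sample lines are copied from the excerpt, and the year is an assumed placeholder because syslog timestamps omit it.

```python
import re
from datetime import datetime

# Sample lines copied from the log excerpt in this note.
LOG_LINES = [
    "Feb 23 07:28:44 exaovm1 rsnapshot[23516]: /usr/bin/rsnapshot daily: completed successfully",
    "Feb 23 02:38:33 exaovm1 syslogd 1.4.1: restart.",
    "Feb 23 02:38:44 exaovm1 kernel: bonding: bondib0: link status definitely up for interface ib1, 4294967295 Mbps full duplex.",
    "Feb 23 07:38:55 exaovm1 ntpdate[11922]: step time server 10.2.73.40 offset 17994.972014 sec",
]

# Matches the "step time server <addr> offset <seconds> sec" message logged by ntpdate.
STEP_RE = re.compile(r"ntpdate\[\d+\]: step time server (\S+) offset (-?\d+\.\d+) sec")


def parse_timestamp(line, year=2000):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog line.

    Syslog lines carry no year, so 'year' is an assumed placeholder.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_backward_jumps(lines):
    """Return (previous, current, seconds_back) wherever a timestamp goes backward."""
    jumps = []
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if previous is not None and current < previous:
            jumps.append((previous, current, (previous - current).total_seconds()))
        previous = current
    return jumps


def find_ntp_steps(lines):
    """Return (server, offset_seconds) for every ntpdate step message."""
    steps = []
    for line in lines:
        match = STEP_RE.search(line)
        if match:
            steps.append((match.group(1), float(match.group(2))))
    return steps


if __name__ == "__main__":
    for before, after, seconds in find_backward_jumps(LOG_LINES):
        print(f"clock went back {seconds:.0f} s: {before:%H:%M:%S} -> {after:%H:%M:%S}")
    for server, offset in find_ntp_steps(LOG_LINES):
        print(f"ntpdate stepped the clock by {offset} s using {server}")
```

On a live system the same functions could be given the lines of /var/log/messages. A backward jump next to a syslogd restart, followed by a large ntpdate offset, may indicate that the clock was wrong when the server booted.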

Solution

Configure additional NTP servers to prevent this issue.  Three to four NTP servers are recommended. For the steps, see <Document 1554253.1> or the VM Install Guide.
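
To check how many NTP servers a host has configured, the server lines in its ntp.conf can be counted. The Python sketch below is an illustration under assumptions, not the procedure from <Document 1554253.1>: the function names are hypothetical, the first sample uses the single server address seen in the log above, and the example.com host names are placeholders.

```python
def configured_ntp_servers(conf_text):
    """Return the addresses on 'server' lines of ntp.conf-style text.

    Comments are ignored. Addresses of the form 127.127.t.u are skipped
    because ntpd uses them for local reference clocks, not remote servers.
    """
    servers = []
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "server":
            if not fields[1].startswith("127.127."):
                servers.append(fields[1])
    return servers


def meets_recommendation(servers, minimum=3):
    """True when at least 'minimum' servers are configured (this note recommends three to four)."""
    return len(servers) >= minimum


# Assumed configuration matching the situation described in this note.
SINGLE_SERVER_CONF = """
# one NTP server only
server 10.2.73.40
server 127.127.1.0   # local clock, not counted
"""

# Hypothetical corrected configuration; host names are placeholders.
MULTI_SERVER_CONF = """
server ntp1.example.com
server ntp2.example.com
server ntp3.example.com
"""

if __name__ == "__main__":
    for name, text in (("single", SINGLE_SERVER_CONF), ("multi", MULTI_SERVER_CONF)):
        found = configured_ntp_servers(text)
        status = "OK" if meets_recommendation(found) else "too few NTP servers"
        print(f"{name}: {len(found)} server(s) configured, {status}")
```

On most Linux systems the file to read would be /etc/ntp.conf, although that path is an assumption here and should be confirmed against the referenced note.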
 

References

<NOTE:1904921.1> - Exalytics /var/log/messages Shows Smartd Error: "smartd[10687]: Device: /dev/sdd [SAT]...Offline uncorrectable sectors"
<NOTE:1554253.1> - NTP Server and Client Configuration
<NOTE:238278.1> - Linux: What's OCFS or OCFS2

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.