Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-2096087.1
Update Date:2017-10-16
Keywords:

Solution Type  Technical Instruction Sure

Solution  2096087.1 :   How to Check if a Reboot is Due to a Node being Fenced out of an OCFS2 / o2cb Cluster  


Related Items
  • Oracle Exalogic Elastic Cloud Software
  • Oracle VM
  • Linux OS
  • Private Cloud Appliance
  • Private Cloud Appliance X5-2 Hardware
Related Categories
  • PLA-Support>Infrastructure>Operating Systems and Virtualization>Virtualization>Oracle VM




In this Document
Goal
Solution


Created from <SR 3-11985344801>

Applies to:

Oracle VM - Version 3.0.1 and later
Linux OS - Version Enterprise Linux 3.0 and later
Private Cloud Appliance - Version 1.0.1 and later
Oracle Exalogic Elastic Cloud Software - Version 2.0.6.2.2 to 2.0.6.2.2
Private Cloud Appliance X5-2 Hardware
Linux x86-64

Goal

On Oracle Linux, Oracle VM and Oracle Private Cloud Appliance (PCA), when using the Oracle Cluster File System version 2 (OCFS2), it can sometimes be difficult to determine whether a server that rebooted was fenced out of the OCFS2 cluster because it did not write its heartbeat in time, or whether the cause of the reboot is external to the o2cb cluster.

Solution

In several node-fencing situations, the log files of the node that rebooted unexpectedly contain no trace of the potential root cause of the reboot: the syslog ends abruptly and then a new server boot is recorded. This happens, for example, when the fenced node has lost access to its clustered file system.
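To see where the syslog of the rebooted node ends abruptly, it can help to print the last lines written before each new boot. The helper below (`show_boot_boundaries`, a hypothetical name) is a minimal sketch; it assumes that each boot is marked in the syslog file by the kernel banner line containing `kernel: Linux version`, which is common on Oracle Linux but should be checked against your own logs.

```shell
# show_boot_boundaries FILE [N]: print the last N lines logged before each boot.
# Hypothetical helper. Assumption: a new boot is marked by the kernel banner
# "kernel: Linux version ..."; N defaults to 5 and must be at least 1.
show_boot_boundaries() {
    local file=$1
    local n=${2:-5}
    awk -v n="$n" '
        /kernel: Linux version/ {
            printf "=== boot at line %d, preceded by:\n", NR
            start = (NR - n > 1) ? NR - n : 1
            # Lines start..NR-1 are still held in the ring buffer.
            for (i = start; i < NR; i++) print buf[i % n]
            print $0
        }
        { buf[NR % n] = $0 }   # ring buffer of the last n lines
    ' "$file"
}
```

For example, `show_boot_boundaries /var/log/messages 10` on the rebooted node. Normal activity immediately before the banner, with no shutdown messages in between, is consistent with an abrupt reboot such as a fence, but does not prove one by itself.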

A node is fenced out of the o2cb cluster when it has not written its heartbeat to the shared file system for about O2CB_HEARTBEAT_THRESHOLD times two seconds; more precisely, the timeout is (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds. This parameter is defined on each node in the /etc/sysconfig/o2cb configuration file.
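As a sketch, assuming the usual OCFS2 relation timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds (a threshold of 31 gives 60 seconds), the following shell functions read the threshold from /etc/sysconfig/o2cb and print the resulting timeout. The function names `read_threshold` and `heartbeat_timeout` are hypothetical, and the parser assumes a plain `O2CB_HEARTBEAT_THRESHOLD=value` assignment, optionally quoted.

```shell
# heartbeat_timeout THRESHOLD: print the disk heartbeat timeout in seconds.
# Assumption: timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds.
heartbeat_timeout() {
    echo $(( ($1 - 1) * 2 ))
}

# read_threshold [FILE]: print O2CB_HEARTBEAT_THRESHOLD from FILE
# (default /etc/sysconfig/o2cb); return 1 if it is missing or not a number.
# Hypothetical helper; assumes a shell-style KEY=value line.
read_threshold() {
    local file=${1:-/etc/sysconfig/o2cb}
    local value
    # Take the last assignment in the file and strip optional quotes/whitespace.
    value=$(grep -E '^[[:space:]]*O2CB_HEARTBEAT_THRESHOLD=' "$file" 2>/dev/null \
        | tail -n 1 | cut -d= -f2 | tr -d '"[:space:]')
    case $value in
        ''|*[!0-9]*) return 1 ;;
    esac
    echo "$value"
}
```

On a server, `t=$(read_threshold) && heartbeat_timeout "$t"` prints the configured timeout. Comparing this value on all nodes is a quick way to spot inconsistent settings.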

However, on the surviving nodes of the o2cb cluster (for Oracle VM, the other servers in the server pool; for a PCA, the other compute nodes), the following message is reliably logged on at least one member of the cluster or server pool:

ovs1 kernel: o2cb: o2dlm has evicted node X from domain ovm
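The message identifies the evicted server by its o2cb node number rather than by its host name. A minimal sketch for mapping the number back to a host name is shown below; it assumes the cluster configuration lives in /etc/ocfs2/cluster.conf and uses `node:` stanzas with `number = ...` and `name = ...` entries. The function name `node_name` is hypothetical.

```shell
# node_name NUMBER [CONF]: print the host name of the o2cb node with that number.
# Hypothetical helper. Assumptions: CONF defaults to /etc/ocfs2/cluster.conf,
# stanza headers (node:, cluster:, ...) start in column one, and entries are
# indented "key = value" lines.
node_name() {
    local want=$1
    local conf=${2:-/etc/ocfs2/cluster.conf}
    awk -v want="$want" '
        # Print the stored name if the stanza that just ended is the wanted node.
        function flush() {
            if (in_node && num == want && name != "") print name
            num = ""; name = ""
        }
        # A stanza header such as node: or cluster: starts in column one.
        /^[^ \t].*:[ \t]*$/ {
            flush()
            in_node = ($1 == "node:")
            next
        }
        # Inside a node stanza, remember the number and name entries.
        in_node && /=/ {
            key = $0; sub(/=.*/, "", key); gsub(/[ \t]/, "", key)
            val = $0; sub(/^[^=]*=/, "", val); gsub(/[ \t]/, "", val)
            if (key == "number") num = val
            if (key == "name") name = val
        }
        END { flush() }
    ' "$conf"
}
```

For instance, after finding `o2dlm has evicted node 2`, running `node_name 2` on a surviving node should print the name of the evicted server, provided the file follows the layout assumed above.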

So, when checking whether a node has been evicted, look at the surviving nodes: they often give more leads about a possible fence than the fenced node itself.

Running a command along the lines of:

# grep "has evicted node" /var/log/messages*

on the surviving nodes of the cluster or server pool often gives an initial lead as to whether or not this is a cluster eviction.
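With several rotated log files, the raw `grep` output can get long. The sketch below (the name `summarise_evictions` is hypothetical) counts the eviction messages per o2cb domain and node number; it assumes the message text matches the `o2dlm has evicted node X from domain ...` example above and that the log files are uncompressed plain text.

```shell
# summarise_evictions FILE...: count o2dlm eviction messages per domain and node.
# Hypothetical helper; assumes the "has evicted node N from domain D" wording
# and plain-text (not compressed) log files. Unreadable files are ignored.
summarise_evictions() {
    grep -h "has evicted node" "$@" 2>/dev/null |
        sed -n 's/.*has evicted node \([^ ]*\) from domain \([^ ]*\).*/domain \2 node \1/p' |
        sort | uniq -c
}
```

Run it on each surviving node, for example `summarise_evictions /var/log/messages*`, and compare the reported node numbers and log timestamps with the time of the unexpected reboot.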


  Copyright © 2018 Oracle, Inc.  All rights reserved.