Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-2099858.1
Update Date: 2017-04-21
Keywords:

Solution Type: Predictive Self-Healing Sure

Solution  2099858.1 :   Steps to Gracefully Shutdown and Power on a Single Node on Oracle Big Data Appliance Prior to Maintenance  


Related Items
  • Big Data Appliance X3-2 Full Rack
  • Big Data Appliance X5-2 Starter Rack
  • Big Data Appliance X5-2 Full Rack
  • Big Data Appliance X3-2 In-Rack Expansion
  • Big Data Appliance Integrated Software
  • Big Data Appliance X4-2 Full Rack
  • Big Data Appliance Hardware
  • Big Data Appliance X4-2 Starter Rack
  • Big Data Appliance X5-2 In-Rack Expansion
  • Big Data Appliance X4-2 In-Rack Expansion
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Purpose
Scope
Details
 Shutdown Procedure to Gracefully Shutdown / Power off a Single Node
 Stop Software Services
 Stop NoSQL Services
 Stop Cloudera Manager and CDH Service Roles
 Stop ODI Agents
 EM Agents
 ASR Agents
 Unmount NFS Mounts
 Shutdown Node
 Startup Procedure to Gracefully Startup / Power on a Single Node
 Powering On
 Powering On Server Remotely Using Oracle ILOM
 Start Software Services
 Start NoSQL Services
 Start Cloudera Manager and CDH Service Roles
 Start ODI Agents
 Mount NFS Mounts


Created from <SR 3-12044100744>

Applies to:

Big Data Appliance Hardware - Version All Versions and later
Big Data Appliance Integrated Software - Version 3.0 and later
Big Data Appliance X5-2 Starter Rack - Version All Versions and later
Big Data Appliance X5-2 In-Rack Expansion - Version All Versions and later
Big Data Appliance X5-2 Full Rack - Version All Versions and later
Linux x86-64

Purpose

This document provides the steps to gracefully shut down / power off and start up / power on a single node in Oracle Big Data Appliance (BDA).

Scope

Oracle Big Data Appliance (BDA) Administrators, Oracle Advanced Customer Support (ACS), Oracle Field Engineers, etc.

Details

Shutdown Procedure to Gracefully Shutdown / Power off a Single Node

The procedures below need to be followed to gracefully shut down a single node of the BDA.

Stop Software Services

The following software services running on the BDA need to be stopped prior to shutting down the node.

Stop NoSQL Services

If the cluster is a NoSQL cluster, then stop the Storage Node Agent on the BDA nodes prior to shutdown.

On the BDA, NoSQL server agents are set up to run as a service, following the UNIX service paradigm.

a) To check the status of the NoSQL server agent:

# service nsdbservice status

b) Command to stop NoSQL server agent, if it is running:

# service nsdbservice stop

# service nsdbservice status
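The check-then-stop sequence above can be wrapped in a small shell helper so the stop is only attempted when the agent reports as running (a minimal sketch; `stop_if_running` is a hypothetical helper name, not a BDA-provided tool):

```shell
# Hypothetical helper (not shipped with BDA): stop an init-style service
# only when its status command reports it running (exit code 0).
stop_if_running() {
  svc="$1"
  if service "$svc" status >/dev/null 2>&1; then
    service "$svc" stop
  else
    echo "$svc is not running; nothing to stop"
  fi
}

# Usage on the node:
#   stop_if_running nsdbservice
```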

Stop Cloudera Manager and CDH Service Roles

If the cluster is a CDH cluster then Hadoop Distributed File System (HDFS) services running on that node need to be stopped.

1. Log in to the BDA node where part replacement will be performed prior to the Field Engineer replacing the part.

2. Gracefully shutdown any running applications on the node by following these steps:

a) Log in to Cloudera Manager (CM) as the admin user in a browser at http://<node03>:7180 (or https://<node03>:7183 if TLS is enabled) and click "Hosts" to determine which roles the node has. Click the > by the node that needs to be shut down.

b) Under Roles, check which roles the node has. Click the check box for each role, then from the "Actions for Selected" drop-down click "Stop".

Or, for 4.3.0 or higher, click the check box next to the node that needs to be stopped, then from the "Actions for Selected" drop-down click "Stop Roles on Host".

c) From command line ssh to the server. Log in as root.

On the node issue:

# service cloudera-scm-agent status
# service cloudera-scm-agent stop

d) For a node 3 shutdown, issue:

# service cloudera-scm-server status
# service cloudera-scm-server stop

Note: this does not need to be done in all cases; it is needed only when shutting down node 3.
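The command-line portion of this procedure can be sketched as a single helper; pass `true` only when shutting down node 3, where cloudera-scm-server runs (a minimal sketch; `stop_cm_services` is a hypothetical name, not a Cloudera or BDA tool):

```shell
# Hypothetical helper: stop the CM agent on any node, and the CM server
# only on node 3 (the flag is supplied by the operator).
stop_cm_services() {
  is_node3="$1"
  service cloudera-scm-agent stop
  if [ "$is_node3" = "true" ]; then
    service cloudera-scm-server stop
  fi
}

# Usage: stop_cm_services true    # on node 3
#        stop_cm_services false   # on any other node
```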

Stop ODI Agents

If Oracle Data Integrator (ODI) Application Adapter for Hadoop agents are running on the BDA, then stop the agent running on the node before shutting down the BDA node.

On the BDA, ODI is set up to run as a service, following the UNIX service paradigm.

a) To check the status of the ODI service:

# service odi-agent status

b) Command to stop the ODI agent, if it is running:

# service odi-agent stop

# service odi-agent status

For the list of available scripts and tools provided in the /opt/oracle/odiagent-11.1.1.6.5/agent_standalone/oracledi/agent/bin directory, refer to http://docs.oracle.com/cd/E28280_01/install.1111/e16453/overview.htm#r15c1-t6

EM Agents

If Oracle Enterprise Manager (OEM) Agents are installed on BDA then the Agents are stopped and started automatically on shutdown and startup.

When a node is down, the OEM console reports the agent as unreachable and marks all targets monitored by that agent with an X. When the node is started back up, the agent is auto-started, and an EM console refresh will display the agent and all related targets as healthy.

Thus NO specific action needs to be performed for OEM Agents before shutting down or after starting up BDA.

ASR Agents

No actions need to be performed for Oracle Auto Service Request (ASR) Agents running on BDA.

Unmount NFS Mounts

1) Once Mammoth has been deployed, there is an NFS directory shared across all nodes.

The NFS mount is created as an automount. On the BDA, /etc/auto.master includes the file /etc/auto_direct_bda, which contains the entry for the NFS shared directory.

/opt/shareddir -rw,tcp,soft,intr,timeo=10,retrans=10 <node03>:/opt/exportdir

So the NFS mount is automatically mounted when accessed and automatically unmounted after a period of inactivity.

However, if the server (node03) that contains the NFS mount source (/opt/exportdir) is shut down, then attempting to shut down any other host will hang. This issue is being addressed in internal Bug 17215768.

As a workaround for Bug 17215768, unmount the NFS mount before shutting down.

a) Check/access the NFS mount:

# ls /opt/shareddir

Example output:

# ls /opt/shareddir
data oracle spatial

 

# mount | grep shareddir

Example output:

<node03>:/opt/exportdir on /opt/shareddir type nfs (rw,tcp,soft,intr,timeo=10,retrans=10,sloppy,vers=4,addr=192.xxx.x.xx,clientaddr=192.xxx.x.xx)

b) Unmount /opt/shareddir

# umount /opt/shareddir
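The check and unmount can be combined so the umount is only attempted when the directory is actually mounted, leaving an inactive automount untouched (a minimal sketch; `umount_if_mounted` is a hypothetical helper name):

```shell
# Hypothetical helper: unmount a path only when `mount` shows it active,
# so an inactive automount is left alone.
umount_if_mounted() {
  mp="$1"
  if mount | grep -q " $mp "; then
    umount "$mp"
    echo "unmounted $mp"
  else
    echo "$mp not mounted; skipping"
  fi
}

# Usage: umount_if_mounted /opt/shareddir
```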

2) If Fuse-DFS is set up, then it needs to be unmounted prior to shutdown.

a) Execute the command below to check for Fuse-DFS mount points:

# mount -l | grep fuse

Sample output (a line of the form "fuse_dfs on <mount> type fuse.fuse_dfs" indicates a Fuse-DFS mount):

fuse_dfs on /mnt/hdfs-nnmount type fuse.fuse_dfs (rw,nosuid,nodev,allow_other,allow_other,default_permissions)

b) Unmount the Fuse-DFS mount point on the servers where it exists. For example, for a Fuse-DFS mount point on server *.*.*.12:

# umount /mnt/hdfs-nnmount

c) Use the commands below to verify Fuse-DFS is unmounted.

Try to access with 'ls' after 'umount'. No output should be returned.

# ls /mnt/hdfs-nnmount

'mount -l' should return no output:

# mount -l | grep fuse

3) Also check for any custom NFS mounts (mount -l) and unmount them as well.

# mount -l
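The mount points of all NFS mounts can be extracted from the `mount -l -t nfs` output (field 3 is the mount point); a sketch, with the actual umount left to the operator after reviewing the list (`list_nfs_mountpoints` is a hypothetical helper name):

```shell
# Hypothetical helper: print the mount point (field 3) of every NFS mount.
list_nfs_mountpoints() {
  mount -l -t nfs | awk '{print $3}'
}

# Usage: review the list, then unmount each entry, e.g.:
#   for mp in $(list_nfs_mountpoints); do umount "$mp"; done
```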

 

Shutdown Node

1. Use the Linux shutdown command to power off the server. Enter this command as root to shut down a server:

# shutdown -hP now

Include the "-P" option to also power down the server after halting.

Or

Use ILOM to power down the server

1. Log into the web interface as root and stop the server from the Host Management > Power Control page.

2. For "Select Action" drop down select "Graceful Shutdown and Power Off".
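Alternatively, the graceful shutdown can be issued from the ILOM CLI, mirroring the power-on CLI shown later in this document (per the Oracle ILOM CLI reference, `stop /System` requests a graceful shutdown):

```
# ssh root@<hostname>-ilom

-> stop /System
```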

 

Startup Procedure to Gracefully Startup / Power on a Single Node

Powering On

After part replacement, the Field Engineer (FE) should power on the server and check the ILOM system status.

An Oracle Big Data Appliance Node is powered on locally by pressing the power button after the ILOM has booted, or remotely using the ILOM interface as described in the sub-section below.

Powering On Server Remotely Using Oracle ILOM

If the server does not start automatically, then you can start it locally by pressing the power button on the front of the server, or remotely by using Oracle ILOM. Oracle ILOM has several interfaces, including a web console and command-line interface (CLI). Use whichever interface you prefer.

1. From the ILOM Web Console:

The URL for Oracle ILOM is the same as for the host, except that it typically has a -c or -ilom extension. This URL connects to Oracle ILOM for bda1node4:
http://bda1node04-ilom.example.com

a) Log into the Oracle ILOM web interface as root and start the server from the Host Management > Power Control page.

b) For "Select Action" drop down select "Power On".

2. From the CLI:

a) ssh into the ILOM as 'root' user:

# ssh root@<hostname>-ilom

b) run 'start /System' to power on the server.

-> start /System

See: The Oracle ILOM Quick Reference for CLI Commands, Section on Host and System Control.

Start Software Services

The Cloudera Manager service on node03 is auto-started on boot, but the HDFS cluster and management services need to be started manually in Cloudera Manager (CM). The /opt/shareddir NFS mount is auto-mounted when accessed, so there is no need to mount it manually after startup.

Start NoSQL Services

If this is a NoSQL cluster, there is no need to start the NoSQL server agents, as the server agent service is set up to start on boot.

a) To check the status of the NoSQL server agent:

# service nsdbservice status

b) Also execute the bdacheckcluster command (after waiting 10 minutes after boot) on the first node of every cluster to ensure that all expected services are up.

Start Cloudera Manager and CDH Service Roles

After the server is powered on, the services should be started automatically:

1. From command line ssh to the server. Log in as root.

a. For node 3 only (if the cloudera-scm-server was stopped in the shutdown steps above), issue the following to check whether it has started:

# service cloudera-scm-server status

If the service is not started on the node, issue the following command to start it:

# service cloudera-scm-server start

b. For all nodes issue the following to check if the cloudera-scm-agent is started:

# service cloudera-scm-agent status

If the service is not started on the node, issue the following command to start it:

# service cloudera-scm-agent start

Once you have confirmed the services are up, continue with the next step.
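The status check followed by a conditional start in the steps above can be sketched as a helper (a minimal sketch; `start_if_stopped` is a hypothetical name, not a BDA-provided tool):

```shell
# Hypothetical helper: start an init-style service only when its status
# check reports it as not running.
start_if_stopped() {
  svc="$1"
  if service "$svc" status >/dev/null 2>&1; then
    echo "$svc already running"
  else
    service "$svc" start
  fi
}

# Usage on the node:
#   start_if_stopped cloudera-scm-agent
#   start_if_stopped cloudera-scm-server   # node 3 only
```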

2. Restart Role services in Cloudera Manager:
a. Log in to Cloudera Manager and click on "Hosts".

b. Under Roles, you can see which roles the node has.

c. Click the check box for the roles, then from the "Actions for Selected" drop-down click "Start".

Or, for 4.3.0 or higher, click the check box next to the node that needs to be started, then from the "Actions for Selected" drop-down click "Start Roles on Host".

3. Verify in Cloudera Manager that all roles for the node are healthy.

Start ODI Agents

If Oracle Data Integrator (ODI) Application Adapter for Hadoop is being used then start the ODI Agent on BDA as needed.

On the BDA, ODI is set up to run as a service, following the UNIX service paradigm. The ODI service should generally be started on boot.

1. To check the status of the ODI service

# service odi-agent status

2. Command to start the ODI agent, if not started already

# service odi-agent start

# service odi-agent status

For the list of available scripts and tools provided in the /opt/oracle/odiagent-11.1.1.6.5/agent_standalone/oracledi/agent/bin directory, refer to http://docs.oracle.com/cd/E28280_01/install.1111/e16453/overview.htm#r15c1-t6

Mount NFS Mounts

1. The Mammoth-mounted /opt/shareddir is an automount and will be mounted automatically when accessed on all nodes, so there is no need to mount it manually on startup.

2. If the Fuse-DFS mount point entry is part of the /etc/fstab file, then it is mounted automatically on startup.

Check whether Fuse-DFS is mounted:

# mount -l | grep fuse

Sample output (a line of the form "fuse_dfs on <mount> type fuse.fuse_dfs" indicates a Fuse-DFS mount):

fuse_dfs on /mnt/hdfs-nnmount type fuse.fuse_dfs (rw,nosuid,nodev,allow_other,allow_other,default_permissions)

If the Fuse-DFS mount point entry is not part of the /etc/fstab file, then mount it manually on the needed nodes:

# mount /mnt/hdfs-nnmount
INFO /data/1/jenkins/workspace/**********/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/hdfs-nnmount
INFO /data/1/jenkins/workspace/***********/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option allow_other
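Whether a manual mount is needed can be decided by checking /etc/fstab for a fuse_dfs entry for the mount point (a minimal sketch; `needs_manual_fuse_mount` is a hypothetical helper, with the fstab path parameterized only for illustration):

```shell
# Hypothetical helper: returns success (0) when the given mount point has
# no fuse_dfs entry in the fstab file, i.e. a manual mount is needed.
needs_manual_fuse_mount() {
  mp="$1"
  fstab="${2:-/etc/fstab}"
  ! grep -q "fuse_dfs.*$mp" "$fstab"
}

# Usage:
#   if needs_manual_fuse_mount /mnt/hdfs-nnmount; then
#     mount /mnt/hdfs-nnmount
#   fi
```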

3. If any custom NFS mounts were unmounted prior to shutdown and are not listed in the /etc/fstab file, then mount them manually.

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.