Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-79-1607802.1
Update Date: 2017-10-27

Solution Type: Predictive Self-Healing

Solution 1607802.1: Steps to Gracefully Shutting Down and Powering On Oracle Big Data Appliance


Related Items
  • Big Data Appliance Integrated Software
  • Big Data Appliance Hardware
Related Categories
  • PLA-Support>Eng Systems>BDA>Big Data Appliance>DB: BDA_EST




In this Document
Purpose
Scope
Details
 Prerequisites
 Shutdown Procedure
 Stop Software Services
 Stop NoSQL Services
 Stop HDFS Services
 Stop ODI Agents
 EM Agents
 ASR Agents
 Unmount NFS Mounts
 Shutdown Nodes
 Poweroff HW Components
 Startup Procedure
 Powering On
 Powering On Servers Remotely Using Oracle ILOM
 On-Disk Encryption Enabled Clusters
 Start Software Services
 Start NoSQL Services
 Start HDFS Services
 Start ODI Agents
 Mount NFS Mounts
References


Applies to:

Big Data Appliance Integrated Software - Version 2.2.1 and later
Big Data Appliance Hardware - Version All Versions and later
Linux x86-64

Purpose

This document provides the steps to gracefully shut down / power off and start up / power on an Oracle Big Data Appliance (BDA).

Scope

Oracle Big Data Appliance (BDA) Administrators, Oracle Advanced Customer Support (ACS), Oracle Field Engineers, etc.

Details

Prerequisites

In the steps documented, dcli is used to stop services across the nodes of the BDA cluster. The dcli utility is usually set up on node01 of the cluster, so unless a particular node is stated, execute the commands in this document as the root user from node01 of the cluster.

Ensure that passwordless SSH is set up on node01 of the cluster by running:

dcli -C hostname

If it is not set up, configure it with the following command:

setup-root-ssh -C 
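The output of 'dcli -C hostname' can also be checked programmatically: every node that is reachable over passwordless SSH prints one "<ip>: <hostname>" line. The helper below is an illustrative sketch (the function name and the expected-node-count argument are assumptions, not BDA tooling):

```shell
# Compare the number of "<ip>: <hostname>" lines returned by
# `dcli -C hostname` against the expected number of nodes in the cluster.
# Succeeds (exit 0) only when every node responded.
dcli_nodes_ok() {
    expected="$1"   # expected node count, e.g. 6
    output="$2"     # captured output of: dcli -C hostname
    actual=$(printf '%s\n' "$output" | grep -c ':')
    [ "$actual" -eq "$expected" ]
}

# Usage on the appliance (illustrative):
#   out=$(dcli -C hostname)
#   dcli_nodes_ok 6 "$out" || echo "passwordless ssh is not set up on all nodes"
```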

Shutdown Procedure

Follow the procedures below to gracefully shut down the BDA and all related hardware components.

Stop Software Services

The following software services running on the BDA need to be stopped before shutting down the nodes.

Stop NoSQL Services

If this is a NoSQL cluster, stop the Storage Node Agent on all BDA nodes before shutdown.

On the BDA, the NoSQL server agents are set up to run as a service, following the Unix service paradigm.

a) Check the status of the NoSQL server agent:

# dcli -C service nsdbservice status 

b) Stop the NoSQL server agent, if it is running:

# dcli -C service nsdbservice stop

# dcli -C service nsdbservice status

Stop HDFS Services

If this is a CDH cluster, all Hadoop Distributed File System (HDFS) services need to be stopped.

1) Stop all services managed by Cloudera Manager (CM). Starting with BDA V2.1.2, the Cloudera Manager 4.5 release manages all the HDFS services running on the BDA.

a) Log in to CM as the admin user by browsing to http://<node03>:7180

b) Stop 'All Services'.

From Cloudera Manager:

Services > All Services > Actions > Stop

(Screenshot: Stop Hadoop Services)

c) Stop 'Cloudera Management Service'.

From Cloudera Manager:

Services >  Cloudera Management Services > Actions > Stop

(Screenshot: Stop CM Services)

2) Stop the Cloudera Manager agents and server from the command line as 'root' user.

a) From Node 1 stop the CM agents on all nodes:

# dcli -C service cloudera-scm-agent status
# dcli -C service cloudera-scm-agent stop
# dcli -C service cloudera-scm-agent status 

b) From the Cloudera Manager master node, which is server 3 by default, use the commands below to stop the CM server.

Note: 'dcli -c <node03> service cloudera-scm-server stop' can be used from node01 to stop the CM server.

service cloudera-scm-server status
service cloudera-scm-server stop

  Sample Output:

# service cloudera-scm-server status
  
cloudera-scm-server (pid  21476) is running...
# service cloudera-scm-server stop
  
Stopping cloudera-scm-server:                              [  OK  ]

Once the Cloudera Manager server is stopped, it can no longer be accessed via the web console.


3) Prior to BDA V2.1.2, the Hive service is not managed by CM, so only on BDA releases prior to 2.1.2 follow the steps below to stop the Hive service.

a) Stop the Hive server from the command line as root on the Hive master node, which is server 3.

service hive-server status
service hive-server stop

Note: 'dcli  -c  <node03>  service hive-server stop' can be used from node01 to stop hive-server.

Sample Output:

# service hive-server status
  
Checking for service : hive-server is running              [  OK  ]

# service hive-server stop
  
Stopping (hadoop-hive-server):                             [  OK  ]

b) Stop the hive-metastore service from the command line as root on the Hive master node, which is server 3.

service hive-metastore status
service hive-metastore stop

Note: 'dcli -c <node03> service hive-metastore stop' can be used from node01 to stop hive-metastore.

Sample Output:

# service hive-metastore status
  
Checking for service : metastore is running              [  OK  ]

# service hive-metastore stop
  
Stopping (hive-metastore):                              [  OK  ]

Stop ODI Agents

If Oracle Data Integrator (ODI) Application Adapter for Hadoop agents are running on the BDA, stop them before shutting down the BDA nodes.

On the BDA, ODI is set up to run as a service, following the Unix service paradigm.

a) Check the status of the ODI service:

# dcli -C service odi-agent status 

b) Stop the ODI agent, if it is running:

# dcli -C service odi-agent stop

# dcli -C service odi-agent status

For the list of available scripts and tools provided in the /opt/oracle/odiagent-11.1.1.6.5/agent_standalone/oracledi/agent/bin directory, refer to http://docs.oracle.com/cd/E28280_01/install.1111/e16453/overview.htm#r15c1-t6

EM Agents

If Oracle Enterprise Manager (OEM) agents are installed on the BDA, they are stopped and started automatically on shutdown and startup.

While a node is down, the OEM console reports the agent as unreachable and marks all targets monitored by that agent with an X. Once the node is started back up, the agent starts automatically, and a refresh of the EM console shows the agent and all related targets as healthy.

Thus, no specific action needs to be performed for OEM agents before shutting down or after starting up the BDA.

ASR Agents

No actions need to be performed for Oracle Auto Service Request (ASR) Agents running on BDA.

Unmount NFS Mounts

1) Once Mammoth has been deployed, there is an NFS directory shared across all nodes.

The NFS mount is created as an automount. On the BDA, /etc/auto.master includes the file /etc/auto_direct_bda, which contains the entry for the NFS shared directory:

/opt/shareddir -rw,tcp,soft,intr,timeo=10,retrans=10 <node03>:/opt/exportdir

So the NFS mount is mounted automatically when accessed and unmounted automatically after a period of inactivity.

However, if the server (node03) that contains the NFS mount source (/opt/exportdir) is shut down, attempting to shut down any other host will hang. This issue is being addressed in internal Bug 17215768.

As a workaround for Bug 17215768, unmount all NFS mounts before shutting down.

a) Check/access the NFS mount:

# dcli -C ls /opt/shareddir 
<private-ip-node01>: oracle
............
<private-ip-lastnode>: oracle

  

#  dcli -C  mount | grep shareddir 
<private-ip-node01>: <node03>:/opt/exportdir on /opt/shareddir type nfs (rw,tcp,soft,intr,timeo=10,retrans=10,addr=192.**.41.**)
<private-ip-node02>: <node03>:/opt/exportdir on /opt/shareddir type nfs (rw,tcp,soft,intr,timeo=10,retrans=10,addr=192.**.41.**)
<private-ip-node03>: /opt/exportdir on /opt/shareddir type none (rw,bind)
...
<private-ip-lastnode>: <node03>:/opt/exportdir on /opt/shareddir type nfs (rw,tcp,soft,intr,timeo=10,retrans=10,addr=192.**.41.**)
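In the output above, only the "type nfs" entries are real NFS mounts; node03 itself holds a local bind mount ("type none") of the export directory. A small illustrative helper (not BDA tooling) can pick out the nodes that actually hold NFS mounts from captured `dcli -C mount | grep shareddir` output:

```shell
# From "dcli -C mount | grep shareddir" output, print the node prefix of
# every line describing a real NFS mount of /opt/shareddir. The bind mount
# on the source node is "type none" and is deliberately excluded.
nfs_shareddir_nodes() {
    printf '%s\n' "$1" | awk '/type nfs/ { sub(/:$/, "", $1); print $1 }'
}

# Usage on the appliance (illustrative):
#   out=$(dcli -C mount | grep shareddir)
#   nfs_shareddir_nodes "$out"
```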

b) Unmount /opt/shareddir

# dcli -C umount /opt/shareddir

2) If Fuse-DFS is set up, it needs to be unmounted prior to shutdown.

a) Execute the command below to check for Fuse-DFS mount points:

dcli -C mount -l | grep fuse 

Sample output: fuse_dfs on <mount> type fuse.fuse_dfs indicates a Fuse-DFS mount.

*.*.*.12:fuse_dfs on /mnt/hdfs-nnmount type fuse.fuse_dfs (rw,nosuid,nodev,allow_other,allow_other,default_permissions)
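The mount point to pass to umount can be extracted from such lines mechanically. The helper below is an illustrative sketch that parses "mount -l"-style output for Fuse-DFS entries:

```shell
# Print the mount point of every Fuse-DFS entry in "mount -l"-style output.
# Lines of the form "... fuse_dfs on <mountpoint> type fuse.fuse_dfs ..."
# identify Fuse-DFS mounts; the word after "on" is the mount point.
fuse_dfs_mountpoints() {
    printf '%s\n' "$1" | awk '$0 ~ /type fuse\.fuse_dfs/ {
        for (i = 1; i <= NF; i++)
            if ($i == "on") { print $(i + 1); break }
    }'
}

# Usage on the appliance (illustrative):
#   out=$(dcli -C mount -l | grep fuse)
#   fuse_dfs_mountpoints "$out"
```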

b) Unmount the Fuse-DFS mount point on the servers where it exists. For example, for a Fuse-DFS mount point on server *.*.*.12:

# dcli -c *.*.*.12 umount /mnt/hdfs-nnmount


c) Use the commands below to verify Fuse-DFS is unmounted.

Try to access the mount point with 'ls' after the 'umount'. No output should be returned.

# dcli -c *.*.*.12 ls /mnt/hdfs-nnmount

'mount -l' should also return no output:

dcli -C mount -l | grep fuse  

3) Also check for any custom NFS mounts (mount -l) and unmount them as well.

Shutdown Nodes

Use the Linux shutdown command to power off the servers. Enter this command as root to shut down a server: 

# shutdown -h now

The dcli utility can be used to run the shutdown command on multiple servers at the same time. Do not run the dcli utility from a server that will be shut down.

The following command shows the syntax of the command: 

# dcli -g group_name shutdown -h now 

In this command, group_name is a file that contains a list of servers.

Here is a sample group_name file:

mybdanode02
mybdanode03
.............
..........
mybdalastnode 
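Such a group file must exclude the node the dcli command is run from, since that node is shut down last by hand. An illustrative helper (the function name and node names are assumptions, not BDA tooling) for generating the file from a full node list:

```shell
# Emit every cluster node except the one the shutdown is driven from,
# one per line, suitable for use as a dcli group file.
make_shutdown_group() {
    skip="$1"; shift           # node to exclude (the dcli driver node)
    for n in "$@"; do
        [ "$n" = "$skip" ] || printf '%s\n' "$n"
    done
}

# Usage (illustrative):
#   make_shutdown_group mybdanode01 mybdanode01 mybdanode02 mybdanode03 > server_group
#   dcli -g server_group shutdown -h now
#   shutdown -h now            # finally, shut down the driver node itself
```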

The following example shuts down all Oracle Big Data Appliance servers listed in the server_group file: 

# dcli -g server_group shutdown -h now 

Finally, shut down node01, the node from which the dcli command shutting down the rest of the cluster was executed:

shutdown -h now

Poweroff HW Components

The network switches do not have power buttons; they shut down only when power is removed.

To stop the switches, turn off all breakers in the two PDUs.

Startup Procedure

Powering On

Oracle Big Data Appliance is powered on by either pressing the power button on the front of the servers, or by logging in to the Oracle ILOM interface and applying power to the system.

To power on Oracle Big Data Appliance: 

  1. Turn on all 12 breakers on both PDUs.

    Allow 4 to 5 minutes for Oracle ILOM to start.

  2. Power up the servers.

Powering On Servers Remotely Using Oracle ILOM

If the servers do not start automatically, then you can start them locally by pressing the power button on the front of the servers, or remotely by using Oracle ILOM. Oracle ILOM has several interfaces, including a command-line interface (CLI) and a web console. Use whichever interface you prefer.

For example, you can log into the web interface as root and start the server from the Remote Power Control page. The URL for Oracle ILOM is the same as for the host, except that it typically has a -c or -ilom extension. This URL connects to Oracle ILOM for bda1node4: 

http://bda1node04-ilom.example.com 
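Since the ILOM host name follows the host's name with a suffix inserted before the domain, the URL can be derived mechanically. A minimal sketch, assuming the "-ilom" suffix convention described above (some sites use "-c" instead, so verify against local DNS):

```shell
# Derive the Oracle ILOM URL from a server's fully qualified host name by
# inserting the conventional "-ilom" suffix before the domain part.
ilom_url() {
    host="$1"
    short="${host%%.*}"        # e.g. bda1node04
    domain="${host#*.}"        # e.g. example.com
    printf 'http://%s-ilom.%s\n' "$short" "$domain"
}
```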

On-Disk Encryption Enabled Clusters

For HDFS clusters at release 2.5 and above, there is an option to enable on-disk encryption.

If password-based on-disk encryption is enabled, then mount-hadoop-dirs needs to be executed before starting the HDFS services.

A) Check if On-Disk Encryption is enabled

i) On BDA 3.0 and higher, execute the command below to check:

#  bdacli getinfo cluster_disk_encryption_enabled
true

  
If the output is true, continue with step B.

ii) On the BDA 2.5 release, check whether DISK_ENCRYPTION_ENABLED is set to true in the /opt/oracle/BDAMammoth/mammoth-<rack_name>.params file:

DISK_ENCRYPTION_ENABLED=true
USE_TPM_ENCRYPTION=false
DISK_ENCRYPTION_PWD=**** 

If DISK_ENCRYPTION_ENABLED is set to true, continue with step B.
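The BDA 2.5 check can be scripted against the params-file text. An illustrative helper (not BDA tooling; on 3.0+ use 'bdacli getinfo cluster_disk_encryption_enabled' instead):

```shell
# Decide whether mount-hadoop-dirs is required by looking for the
# DISK_ENCRYPTION_ENABLED=true flag in mammoth params-file content
# (BDA 2.5 layout). Succeeds (exit 0) when encryption is enabled.
disk_encryption_enabled() {
    printf '%s\n' "$1" | grep -q '^DISK_ENCRYPTION_ENABLED=true$'
}

# Usage (illustrative):
#   params=$(cat /opt/oracle/BDAMammoth/mammoth-<rack_name>.params)
#   disk_encryption_enabled "$params" && echo "run mount-hadoop-dirs first"
```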

B) Use dcli to execute mount-hadoop-dirs on all nodes in the cluster.

1) Log in to node01 of the primary rack as the root user.

2) mount-hadoop-dirs prompts for the password used to set up encryption, so interactive user input is needed during execution. As the password is a static value and is the same on all nodes in the cluster, pass the password via stdin (e.g. 'echo "value" | cmd') to the dcli command.

Syntax 

# dcli -C 'echo "<passwd>" |  mount_hadoop_dirs'


If the password used to set up encryption is welcome1, then the command is:

# dcli -C 'echo "welcome1" |  mount_hadoop_dirs' 
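Passwords containing shell metacharacters need careful quoting inside the dcli command string. The helper below is an illustrative sketch of building the remote command with the password single-quoted (the function name is an assumption; it mirrors the 'echo "<passwd>" | mount_hadoop_dirs' pattern above):

```shell
# Build the remote command string for dcli, single-quoting the password so
# that each node's shell does not expand it. An embedded single quote is
# escaped with the standard '\'' idiom.
mount_dirs_cmd() {
    pw=$(printf '%s' "$1" | sed "s/'/'\\\\''/g")
    printf "echo '%s' | mount_hadoop_dirs\n" "$pw"
}

# Usage (illustrative):
#   dcli -C "$(mount_dirs_cmd 'welcome1')"
```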

See Doc ID 1615029.1, Tips to Execute Commands that Require User Input using dcli on Oracle Big Data Appliance (BDA).

Start Software Services

The Cloudera Manager service on node03 is started automatically on boot, but the HDFS cluster and the Management Services need to be started manually in Cloudera Manager (CM). The /opt/shareddir NFS mount is mounted automatically when accessed, so there is no need to mount it manually after startup.

Start NoSQL Services

If this is a NoSQL cluster, there is no need to start the NoSQL server agents, as the server agent service is set up to start on boot.

a) To check the status of the NoSQL server agent

# dcli -C service nsdbservice status 

b) Also execute the bdacheckcluster command (about 10 minutes after boot) on the first node of every cluster to ensure that all expected services are up.

Start HDFS Services

1) Check if the Cloudera Manager server and agents are started.

a) On the Cloudera Manager master node, which is server 3 by default, use the command below.

Note: 'dcli -c <node03> service cloudera-scm-server status' can be used from node01 to check the CM server status.

service cloudera-scm-server status

 Sample Output:

# service cloudera-scm-server status 

cloudera-scm-server (pid  11399) is running...

If the CM server is not up, start it with:

# service cloudera-scm-server start 

b) From Node 1 as 'root' user verify the agents are running:

# dcli -C service cloudera-scm-agent status 

Start the agents as below if they are not already started.

# dcli -C service cloudera-scm-agent start 

2) Once the Cloudera Manager service is confirmed running, log in to CM as the admin user by browsing to http://<node03>:7180

3) Start all services managed by Cloudera Manager (CM). Starting with BDA V2.1.2, the Cloudera Manager 4.5 release manages all the HDFS services running on the BDA.

a) Start Cloudera Management services

Services > Cloudera Management Services > Actions > Start

(Screenshot: Start Management Service)

b) Start HDFS Cluster by starting 'All Services'

Services > All Services > Actions > Start

(Screenshot: Start Cluster)

4) Prior to BDA V2.1.2, the Hive service is not managed by CM, so only on BDA releases prior to 2.1.2 follow the steps below to start the Hive service.

a) Check the status of hive-metastore and hive-server as root on the Hive master node, which is server 3.

Note: 'dcli -c <node03> service <hive-*> status' can be used from node01 to check the status.

service hive-metastore status
service hive-server status 

Sample output:

# service hive-metastore status
Hive Metastore is not running                              [FAILED]
# service hive-server status
Hive Server is not running                                 [FAILED]

b) If hive-metastore and hive-server are not running, start them.

Note: 'dcli -c <node03> service <hive-*> start' can be used from node01.

service hive-metastore start
service hive-server start

 Sample Output:   

# service hive-metastore start
Starting Hive Metastore (hive-metastore):                  [  OK  ]
[root@scaj21bda10 ~]# service hive-server start
Starting Hive Server (hive-server):                        [  OK  ]

5) Execute bdacheckcluster command on the first node of every cluster to ensure that all expected services are up.

Start ODI Agents

If the Oracle Data Integrator (ODI) Application Adapter for Hadoop is being used, start the ODI agent on the BDA as needed.

On the BDA, ODI is set up to run as a service, following the Unix service paradigm, and the ODI service is generally started on boot.

a) Check the status of the ODI service:

# dcli -C service odi-agent status

b) Start the ODI agent, if it is not already started:

# dcli -C service odi-agent start

# dcli -C service odi-agent status

For the list of available scripts and tools provided in the /opt/oracle/odiagent-11.1.1.6.5/agent_standalone/oracledi/agent/bin directory, refer to http://docs.oracle.com/cd/E28280_01/install.1111/e16453/overview.htm#r15c1-t6

Mount NFS Mounts

1) The Mammoth-created /opt/shareddir is an automount and is mounted automatically when accessed on all nodes, so there is no need to mount it manually on startup.

2) If the Fuse-DFS mount point entry is part of the /etc/fstab file, it is mounted automatically on startup.

Check if Fuse-DFS is mounted

dcli -C mount -l | grep fuse 

Sample output: fuse_dfs on <mount> type fuse.fuse_dfs indicates a Fuse-DFS mount.

*.*.*.12:fuse_dfs on /mnt/hdfs-nnmount type fuse.fuse_dfs (rw,nosuid,nodev,allow_other,allow_other,default_permissions)

But if the Fuse-DFS mount point entry is not part of the /etc/fstab file, mount it manually on the nodes where it is needed:

# mount /mnt/hdfs-nnmount
INFO /data/1/jenkins/workspace/**********/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/hdfs-nnmount
INFO /data/1/jenkins/workspace/***********/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option allow_other
  

3) If any custom NFS mounts were unmounted prior to shutdown and are not listed in the /etc/fstab file, mount them manually.

 


Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.