Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Predictive Self-Healing Sure Solution
Doc ID 1607802.1: Steps to Gracefully Shut Down and Power On Oracle Big Data Appliance
Applies to:
Big Data Appliance Integrated Software - Version 2.2.1 and later
Big Data Appliance Hardware - Version All Versions and later
Linux x86-64

Purpose
This document provides the steps to follow to gracefully shut down / power off and start up / power on an Oracle Big Data Appliance (BDA).

Scope
Oracle Big Data Appliance (BDA) Administrators, Oracle Advanced Customer Support (ACS), Oracle Field Engineers, etc.

Details

Prerequisites
In the steps documented below, dcli is used to stop services across nodes in the BDA cluster. The dcli utility is usually set up on node01 of the cluster, so unless a particular node is stated, execute the commands listed in this document as the root user from node01 of the cluster. Ensure that passwordless SSH is set up on node01 of the cluster by running:

# dcli -C hostname
If it is not set up, set it up by executing the command below:

# setup-root-ssh -C
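As a quick sanity check before beginning the shutdown, it can help to confirm that dcli reaches every node. This is a minimal sketch, assuming the usual dcli output format of one '<ip>: <hostname>' line per node; the expected count depends on your rack and cluster size:

# dcli -C hostname
# dcli -C hostname | wc -l

The count reported by the second command should match the number of nodes in the cluster.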
Shutdown Procedure
Follow the procedures below to gracefully shut down the BDA and all related hardware components.

Stop Software Services
The following software services running on the BDA need to be stopped prior to shutting down the nodes.

Stop NoSQL Services
If this is a NoSQL cluster, stop the Storage Node Agent on all of the BDA nodes prior to shutdown. On BDA, the NoSQL server agents are set up to run as a service, following the Unix service paradigm.

a) To check the status of the NoSQL server agent:

# dcli -C service nsdbservice status
b) To stop the NoSQL server agent, if it is running:

# dcli -C service nsdbservice stop
# dcli -C service nsdbservice status

Stop HDFS Services
If this is a CDH cluster, all Hadoop Distributed File System (HDFS) services need to be stopped.

1) Stop all services managed by Cloudera Manager (CM). Starting with BDA V2.1.2, the Cloudera Manager 4.5 release manages all the HDFS services running on the BDA.

a) Log into CM as the admin user in a browser by bringing up http://<node03>:7180
b) Stop 'All Services'. From Cloudera Manager: Services > All Services > Actions > Stop
c) Stop the 'Cloudera Management Service'. From Cloudera Manager: Services > Cloudera Management Services > Actions > Stop

2) Stop the Cloudera Manager agents and server from the command line as the 'root' user.

a) From Node 1 stop the CM agents on all nodes:

# dcli -C service cloudera-scm-agent status
# dcli -C service cloudera-scm-agent stop
# dcli -C service cloudera-scm-agent status

b) From the Cloudera Manager master node, which is server 3 by default, use the commands below to stop the CM server. Note: 'dcli -c <node03> service cloudera-scm-server stop' can be used from node01 to stop the CM server.

service cloudera-scm-server status
service cloudera-scm-server stop

Sample output:

# service cloudera-scm-server status
cloudera-scm-server (pid 21476) is running...
# service cloudera-scm-server stop
Stopping cloudera-scm-server: [ OK ]

Once the Cloudera Manager server is stopped, it can no longer be accessed via the web console.
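Optionally, a quick way to confirm the console is down is a curl probe from any node. This is a hedged sketch, not part of the standard procedure; substitute the actual node03 hostname for the <node03> placeholder:

# curl -s -m 5 -o /dev/null -w '%{http_code}\n' http://<node03>:7180

While the CM server is up this prints an HTTP status code (e.g. 200); once it is stopped, the connection fails and curl prints 000.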
3) Stop the Hive services from the command line. (On BDA releases starting with V2.1.2, Hive is managed by CM and is stopped as part of 'All Services' above; the steps below apply to releases prior to V2.1.2.)

a) Stop the Hive server from the command line as root from the Hive server master node, which is server 3:

service hive-server status
service hive-server stop

Note: 'dcli -c <node03> service hive-server stop' can be used from node01 to stop hive-server.

Sample output:

# service hive-server status
Checking for service : hive-server is running [ OK ]
# service hive-server stop
Stopping (hadoop-hive-server): [ OK ]

b) Stop the hive-metastore service from the command line as root from the Hive server master node, which is server 3:

service hive-metastore status
service hive-metastore stop

Note: 'dcli -c <node03> service hive-metastore stop' can be used from node01 to stop hive-metastore.

Sample output:

# service hive-metastore status
Checking for service : metastore is running [ OK ]
# service hive-metastore stop
Stopping (hive-metastore): [ OK ]

Stop ODI Agents
If Oracle Data Integrator (ODI) Application Adapter for Hadoop agents are running on the BDA, stop the agents before shutting down the BDA nodes. On BDA, ODI is set up to run as a service, following the Unix service paradigm.

a) To check the status of the ODI service:

# dcli -C service odi-agent status
b) To stop the ODI agent, if it is running:

# dcli -C service odi-agent stop
# dcli -C service odi-agent status

For the list of available scripts and tools provided in the /opt/oracle/odiagent-11.1.1.6.5/agent_standalone/oracledi/agent/bin directory, refer to http://docs.oracle.com/cd/E28280_01/install.1111/e16453/overview.htm#r15c1-t6

EM Agents
If Oracle Enterprise Manager (OEM) agents are installed on the BDA, the agents are stopped and started automatically on shutdown and startup. While a node is down, the OEM console states that the agent is unreachable, and all the targets monitored by that agent are marked with an X. Once the node is started back up, the agent is auto-started, and a refresh of the EM console will display the agent and all related targets as healthy. Thus NO specific action needs to be performed for OEM agents before shutting down or after starting up the BDA.

ASR Agents
No actions need to be performed for the Oracle Auto Service Request (ASR) agents running on the BDA.

Unmount NFS Mounts
1) Once Mammoth has been deployed, there is an NFS directory shared on all nodes. The NFS mount is created as an automount: on BDA, /etc/auto.master includes the file /etc/auto_direct_bda, and that file contains the entry for the NFS shared directory.

/opt/shareddir -rw,tcp,soft,intr,timeo=10,retrans=10 <node03>:/opt/exportdir
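To see how this automount is wired on a given node, the entries can be displayed directly. A minimal sketch, assuming the /etc/auto.master and /etc/auto_direct_bda files described above:

# grep auto_direct_bda /etc/auto.master
# cat /etc/auto_direct_bda

The same check can be run cluster-wide from node01 with: dcli -C "grep auto_direct_bda /etc/auto.master"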
So the NFS mount is automatically mounted when accessed and automatically unmounted after not being accessed for a while.

a) Check/access the NFS mount:

# dcli -C ls /opt/shareddir
<private-ip-node01>: oracle
............
<private-ip-lastnode>: oracle
# dcli -C mount | grep shareddir
<private-ip-node01>: <node03>:/opt/exportdir on /opt/shareddir type nfs (rw,tcp,soft,intr,timeo=10,retrans=10,addr=192.**.41.**)
<private-ip-node02>: <node03>:/opt/exportdir on /opt/shareddir type nfs (rw,tcp,soft,intr,timeo=10,retrans=10,addr=192.**.41.**)
<private-ip-node03>: /opt/exportdir on /opt/shareddir type none (rw,bind)
...
<private-ip-lastnode>: <node03>:/opt/exportdir on /opt/shareddir type nfs (rw,tcp,soft,intr,timeo=10,retrans=10,addr=192.**.41.**)

b) Unmount /opt/shareddir:

# dcli -C umount /opt/shareddir
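To confirm the unmount took effect everywhere, rerun the earlier mount filter; after a successful umount it should return no output:

# dcli -C mount | grep shareddir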
2) If Fuse-DFS is set up, it needs to be unmounted prior to shutdown.

a) Execute the command below to check for Fuse-DFS mount points:

# dcli -C mount -l | grep fuse
Sample output (a line of the form 'fuse_dfs on <mount> type fuse.fuse_dfs' indicates a Fuse-DFS mount):

*.*.*.12:fuse_dfs on /mnt/hdfs-nnmount type fuse.fuse_dfs (rw,nosuid,nodev,allow_other,allow_other,default_permissions)
b) Unmount the Fuse-DFS mount point on the servers where it exists. For example, for a Fuse-DFS mount point on server *.*.*.12:

# dcli -c *.*.*.12 umount /mnt/hdfs-nnmount
Try to access with 'ls' after the 'umount'. No output should be returned:

# dcli -c *.*.*.12 ls /mnt/hdfs-nnmount
'mount -l' should also return no output:

# dcli -C mount -l | grep fuse
3) Also check for any custom NFS mounts (mount -l) and unmount them as well.
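One way to spot such custom mounts across the whole cluster is to filter the labeled mount list for NFS entries. This is a hedged sketch; it excludes the already-handled /opt/shareddir automount:

# dcli -C "mount -l | grep ' type nfs '" | grep -v shareddir

Any line returned points at a custom NFS mount to unmount before proceeding.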
Shutdown Nodes
Use the Linux shutdown command to shut down the nodes. To shut down a single node immediately as root:

# shutdown -h now

To shut down multiple nodes at once, run shutdown through dcli. The following shows the syntax of the command:

# dcli -g group_name shutdown -h now
In this command, group_name is a file that contains a list of servers, one per line. Here is a sample group_name file:

mybdanode02
mybdanode03
.............
..........
mybdalastnode

The following example shuts down all Oracle Big Data Appliance servers listed in the server_group file:

# dcli -g server_group shutdown -h now
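If a group file has not been prepared in advance, it can be generated from the dcli node list. This is a hypothetical sketch, not part of the standard procedure: it assumes dcli prints lines in the '<ip>: <hostname>' form shown earlier and that the local hostname matches an entry exactly, so review the file before using it:

# dcli -C hostname | awk -F': ' '{print $2}' | grep -vw "$(hostname)" > /root/server_group
# cat /root/server_group

Excluding the local hostname keeps node01 out of the file, so that it can be shut down last as described below.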
Finally, shut down node01 itself, the node from which the dcli command was executed to shut down the rest of the nodes in the cluster:

# shutdown -h now
Poweroff HW Components
The network switches do not have power buttons; they shut down only when power is removed. To stop the switches, turn off all breakers in the two PDUs, or turn off a PDU or a breaker in the data center.

Startup Procedure

Powering On
Oracle Big Data Appliance is powered on either by pressing the power button on the front of the servers, or by logging in to the Oracle ILOM interface and applying power to the system.
Powering On Servers Remotely Using Oracle ILOM
If the servers do not start automatically, then you can start them locally by pressing the power button on the front of the servers, or remotely by using Oracle ILOM. Oracle ILOM has several interfaces, including a command-line interface (CLI) and a web console. Use whichever interface you prefer. For example, you can log into the web interface at http://bda1node04-ilom.example.com
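As an illustration of the CLI route, a server can be powered on from its ILOM with the standard 'start /SYS' command. This is a minimal sketch; the ILOM hostname is the example name above, and the exact prompt and confirmation may vary by ILOM firmware version:

$ ssh root@bda1node04-ilom.example.com
-> start /SYS

Repeat for each server ILOM that needs to be powered on.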
On-Disk Encryption Enabled Clusters
For HDFS clusters at release 2.5 and above, there is an option to enable on-disk encryption. If password-based on-disk encryption is enabled, then mount-hadoop-dirs needs to be executed prior to starting the HDFS services.

A) Check if on-disk encryption is enabled.

i) In BDA 3.0 and higher, execute the command below to check:

# bdacli getinfo cluster_disk_encryption_enabled
true

ii) In the BDA 2.5 release, check in the /opt/oracle/BDAMammoth/mammoth-<rack_name>.params file whether DISK_ENCRYPTION_ENABLED is set to true:

DISK_ENCRYPTION_ENABLED=true
USE_TPM_ENCRYPTION=false
DISK_ENCRYPTION_PWD=****

If DISK_ENCRYPTION_ENABLED is set to true, continue on with step B.

B) Use dcli to execute mount-hadoop-dirs on all nodes in the cluster.

1) Log into node01 of the primary rack as the root user.

2) mount-hadoop-dirs prompts for the password used to set up encryption, so interactive user input is needed during execution. As the password is a static value and is the same on all nodes in the cluster, pass the password in via stdin (e.g. 'echo "value" | cmd') to the dcli command.

Syntax:

# dcli -C 'echo "<passwd>" | mount_hadoop_dirs'
Example:

# dcli -C 'echo "welcome1" | mount_hadoop_dirs'
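Afterwards it may help to verify that the encrypted directories actually mounted. The sketch below is an assumption-laden check: it presumes the password-based encryption layer appears as ecryptfs entries in the mount table, which should be confirmed for your BDA release:

# dcli -C 'mount | grep ecryptfs | wc -l'

Each node should report a non-zero count once mount_hadoop_dirs has succeeded on it.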
For more details on passing user input to dcli, see Doc ID 1615029.1, "Tips to Execute Commands that Require User Input using dcli on Oracle Big Data Appliance (BDA)".

Start Software Services
The Cloudera Manager service on node03 will be auto-started on boot, but the HDFS cluster and Management Services need to be manually started in Cloudera Manager (CM). Also, the /opt/shareddir NFS mount is auto-mounted when accessed, so there is no need to manually mount it after startup.

Start NoSQL Services
If this is a NoSQL cluster, there is no need to start the NoSQL server agents, as the server agent service is set up to start on boot.

a) To check the status of the NoSQL server agent:

# dcli -C service nsdbservice status
b) Also execute the bdacheckcluster command (after waiting 10 minutes after boot) on the first node of every cluster to ensure that all expected services are up.

Start HDFS Services
1) Check if the Cloudera Manager server and agents are started.

a) On the Cloudera Manager master node, which is server 3 by default, use the command below. Note: 'dcli -c <node03> service cloudera-scm-server status' can be used from node01 to check the CM server status.

service cloudera-scm-server status
Sample output:

# service cloudera-scm-server status
cloudera-scm-server (pid 11399) is running...

Start the CM server if it is not up with:

# service cloudera-scm-server start
b) From Node 1 as the 'root' user, verify the agents are running:

# dcli -C service cloudera-scm-agent status
Start the agents as below if they are not already started:

# dcli -C service cloudera-scm-agent start
2) Once it is confirmed that the Cloudera Manager service is running, log into CM as the admin user in a browser by bringing up http://<node03>:7180

3) Start all services managed by Cloudera Manager (CM). Starting with BDA V2.1.2, the Cloudera Manager 4.5 release manages all the HDFS services running on the BDA.

a) Start the Cloudera Management services: Services > Cloudera Management Services > Actions > Start
b) Start the HDFS cluster by starting 'All Services': Services > All Services > Actions > Start

4) Prior to BDA V2.1.2, the Hive service is not managed by CM, so only on BDA releases prior to 2.1.2 follow the steps below to start the Hive service.

a) Check the status of hive-metastore and hive-server as root from the Hive server master node, which is server 3. Note: 'dcli -c <node03> service <hive-*> status' can be used from node01 to check the status.

service hive-metastore status
service hive-server status

Sample output:

# service hive-metastore status
Hive Metastore is not running [FAILED]
# service hive-server status
Hive Server is not running [FAILED]

b) If hive-metastore and hive-server are not running, then start them. Note: 'dcli -c <node03> service <hive-*> start' can be used from node01.

service hive-metastore start
service hive-server start

Sample output:

# service hive-metastore start
Starting Hive Metastore (hive-metastore): [ OK ]
[root@scaj21bda10 ~]# service hive-server start
Starting Hive Server (hive-server): [ OK ]

5) Execute the bdacheckcluster command on the first node of every cluster to ensure that all expected services are up.

Start ODI Agents
If the Oracle Data Integrator (ODI) Application Adapter for Hadoop is being used, then start the ODI agent on the BDA as needed. On BDA, ODI is set up to run as a service, following the Unix service paradigm, and the ODI service should generally be started on boot.

a) To check the status of the ODI service:

# dcli -C service odi-agent status
b) To start the ODI agent, if it is not started already:

# dcli -C service odi-agent start
# dcli -C service odi-agent status

For the list of available scripts and tools provided in the /opt/oracle/odiagent-11.1.1.6.5/agent_standalone/oracledi/agent/bin directory, refer to http://docs.oracle.com/cd/E28280_01/install.1111/e16453/overview.htm#r15c1-t6

Mount NFS Mounts
1) The Mammoth-mounted /opt/shareddir is an automount and will be automatically mounted when accessed on all nodes, so there is no need to manually mount it on startup.

2) If the Fuse-DFS mount point entry is part of the /etc/fstab file, then it is automatically mounted on startup. Check if Fuse-DFS is mounted:

# dcli -C mount -l | grep fuse
Sample output (a line of the form 'fuse_dfs on <mount> type fuse.fuse_dfs' indicates a Fuse-DFS mount):

*.*.*.12:fuse_dfs on /mnt/hdfs-nnmount type fuse.fuse_dfs (rw,nosuid,nodev,allow_other,allow_other,default_permissions)
But if the Fuse-DFS mount point entry is not part of the /etc/fstab file, then manually mount it on the nodes that need it:

# mount /mnt/hdfs-nnmount
INFO /data/1/jenkins/workspace/**********/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/hdfs-nnmount
INFO /data/1/jenkins/workspace/***********/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option allow_other

3) If any custom NFS mounts were unmounted prior to shutdown and are not listed in the /etc/fstab file, then manually mount them.
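As a final check that the expected mounts came back after startup, accessing /opt/shareddir triggers its automount, and the mount table can then be filtered for it along with any NFS entries. A small sketch combining the checks used earlier in this document:

# dcli -C 'ls /opt/shareddir > /dev/null; mount | grep shareddir'
# dcli -C "mount -l | grep ' type nfs '"

Every node should show the /opt/shareddir mount, plus whatever custom NFS mounts were remounted.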