Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Predictive Self-Healing Sure

Solution 2099858.1: Steps to Gracefully Shutdown and Power on a Single Node on Oracle Big Data Appliance Prior to Maintenance
Created from <SR 3-12044100744>

Applies to:

Big Data Appliance Hardware - Version All Versions and later
Big Data Appliance Integrated Software - Version 3.0 and later
Big Data Appliance X5-2 Starter Rack - Version All Versions and later
Big Data Appliance X5-2 In-Rack Expansion - Version All Versions and later
Big Data Appliance X5-2 Full Rack - Version All Versions and later
Linux x86-64

Purpose

This document provides the steps to gracefully shut down / power off and start up / power on a single node in Oracle Big Data Appliance (BDA).

Scope

Oracle Big Data Appliance (BDA) administrators, Oracle Advanced Customer Support (ACS), Oracle Field Engineers, etc.

Details

Shutdown Procedure to Gracefully Shutdown / Power off a Single Node

Follow the procedures below to gracefully shut down a single node of the BDA.

Stop Software Services

The following software services running on the BDA need to be stopped prior to shutting down the node.

Stop NoSQL Services

If the cluster is a NoSQL cluster, then stop the Storage Node Agent on the BDA node prior to shutdown. On BDA, NoSQL server agents are set up to run as a service, following the UNIX service paradigm.

a) To check the status of the NoSQL server agent:

# service nsdbservice status

b) To stop the NoSQL server agent, if it is running, and confirm that it stopped:

# service nsdbservice stop
# service nsdbservice status
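The check-then-stop sequence above can also be scripted for hands-off maintenance. The lines below are a minimal sketch, not part of the standard procedure, and assume the nsdbservice init script follows the LSB convention of exiting 0 only while the service is running:

if service nsdbservice status >/dev/null 2>&1; then
    service nsdbservice stop      # stop the running agent
    service nsdbservice status    # confirm it is now stopped
fi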
Stop Cloudera Manager and CDH Service Roles

If the cluster is a CDH cluster, then the Hadoop Distributed File System (HDFS) service roles running on that node need to be stopped.

1. Log in to the BDA node where part replacement will be performed, prior to the Field Engineer replacing the part.

2. Gracefully shut down any running applications on the node by following these steps:

a) Log in to Cloudera Manager (CM) as the admin user in a browser at http://<node03>:7180 or https://<node03>:7183, and click on "Hosts" to determine what roles the node has. Click on the > by the node that needs to be shut down.

b) Under Roles, check the roles that the node has. Click on the check box for the roles, then click on the "Actions for Selected" drop-down and click "Stop". Or, for 4.3.0 or higher, click on the check box next to the node that needs to be stopped, then click on the "Actions for Selected" drop-down and click "Stop Roles on Host".

c) From the command line, ssh to the server and log in as root. On the node issue:

# service cloudera-scm-agent status
# service cloudera-scm-agent stop

d) For a node 3 shutdown, also issue:

# service cloudera-scm-server status
# service cloudera-scm-server stop

Note: this does not need to be done in all cases, only if you are shutting down node3.
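Steps c) and d) can likewise be combined into one small script. This is an illustrative sketch only; the hostname pattern used to detect node 3 is a hypothetical convention and should be adjusted to the actual node naming in your rack:

service cloudera-scm-agent stop
if [[ $(hostname -s) == *03 ]]; then
    # the Cloudera Manager server runs on node 3 only
    service cloudera-scm-server stop
fi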
Stop ODI Agents

If Oracle Data Integrator (ODI) Application Adapter for Hadoop agents are running on the BDA, then stop the agent running on the node before shutting down the BDA node. On BDA, ODI is set up to run as a service, following the UNIX service paradigm.

a) To check the status of the ODI service:

# service odi-agent status

b) To stop the ODI agent, if it is running, and confirm that it stopped:

# service odi-agent stop
# service odi-agent status

For the list of available scripts and tools provided in the /opt/oracle/odiagent-11.1.1.6.5/agent_standalone/oracledi/agent/bin directory, refer to http://docs.oracle.com/cd/E28280_01/install.1111/e16453/overview.htm#r15c1-t6

EM Agents

If Oracle Enterprise Manager (OEM) Agents are installed on the BDA, the agents are stopped and started automatically on shutdown and startup. While a node is down, the OEM console states that the agent is unreachable, and all targets monitored by that agent are marked with an X. When the node is started back up, the agent is auto-started, and an EM Console refresh will display the agent and all related targets as healthy. Thus NO specific action needs to be performed for OEM Agents before shutting down or after starting up the BDA.

ASR Agents

No actions need to be performed for Oracle Auto Service Request (ASR) Agents running on the BDA.

Unmount NFS Mounts

1) Once Mammoth has been deployed, there is an NFS directory shared on all nodes. The NFS mount is created as an automount: on BDA, /etc/auto.master includes the file /etc/auto_direct_bda, and that file contains the entry for the NFS shared directory:

/opt/shareddir -rw,tcp,soft,intr,timeo=10,retrans=10 <node03>:/opt/exportdir
The NFS mount is therefore mounted automatically when accessed and unmounted automatically after a period of inactivity. But if the server (node03) that contains the NFS mount source (/opt/exportdir) is shut down, then trying to shut down any other host will hang. This issue is being addressed in internal Bug 17215768. As a workaround, unmount the NFS mount before doing the shutdown.

a) Check/access the NFS mount:

# ls /opt/shareddir

Example output:

# ls /opt/shareddir
data oracle spatial

# mount | grep shareddir

Example output:

<node03>:/opt/exportdir on /opt/shareddir type nfs (rw,tcp,soft,intr,timeo=10,retrans=10,sloppy,vers=4,addr=192.xxx.x.xx,clientaddr=192.xxx.x.xx)

b) Unmount /opt/shareddir:

# umount /opt/shareddir
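Because the automounter may already have expired the mount, an unconditional umount can fail with a "not mounted" error. A small guard, using only the commands shown above, avoids that:

if mount | grep -q /opt/shareddir; then
    umount /opt/shareddir
fi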
2) If Fuse-DFS is set up, then it needs to be unmounted prior to shutdown.

a) Execute the command below to check for Fuse-DFS mount points:

# mount -l | grep fuse

Sample output ("fuse_dfs on <mount> type fuse.fuse_dfs" indicates a Fuse-DFS mount):

fuse_dfs on /mnt/hdfs-nnmount type fuse.fuse_dfs (rw,nosuid,nodev,allow_other,allow_other,default_permissions)

b) Unmount the Fuse-DFS mount point on the servers where it exists. For example, for a Fuse-DFS mount point on server *.*.*.12:

# umount /mnt/hdfs-nnmount

c) Use the commands below to verify that Fuse-DFS is unmounted.

Try to access the mount point with 'ls' after 'umount'. No output should be returned:

# ls /mnt/hdfs-nnmount

'mount -l' should also return no output:

# mount -l | grep fuse
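If a node carries more than one Fuse-DFS mount point, the mount table can be walked and each entry unmounted in turn. This is a sketch using standard shell tools; the awk field number assumes the "fuse_dfs on <mount> type fuse.fuse_dfs" output format shown above:

for mp in $(mount -l | grep fuse.fuse_dfs | awk '{print $3}'); do
    umount "$mp"    # $3 is the mount point in the mount output
done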
3) Also check for any custom NFS mounts (mount -l) and unmount them as well:

# mount -l
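To narrow the listing to NFS filesystems only, mount's type filter can be used; this is standard Linux mount behavior, not BDA-specific:

# mount -l -t nfs,nfs4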
Shutdown Node

1. Use the Linux shutdown command to power off the server. Enter this command as root:

# shutdown -hP now

Include the "-P" option to also power down the server after halting.

Or use ILOM to power down the server:

1. Log in to the ILOM web interface as root and stop the server from the Host Management > Power Control page.

2. From the "Select Action" drop-down, select "Graceful Shutdown and Power Off".
Startup Procedure to Gracefully Startup / Power on a Single Node

Powering On

After part replacement the FE should power on the server and check the ILOM system status. An Oracle Big Data Appliance node is powered on locally by pressing the power button after the ILOM has booted, or remotely using the ILOM interface as described in the sub-section below.

Powering On the Server Remotely Using Oracle ILOM

If the server does not start automatically, then you can start it locally by pressing the power button on the front of the server, or remotely by using Oracle ILOM. Oracle ILOM has several interfaces, including a web console and a command-line interface (CLI). Use whichever interface you prefer.

1. From the ILOM web console. The URL for Oracle ILOM is the same as for the host, except that it typically has a -c or -ilom extension; for example, bda1node4-ilom connects to Oracle ILOM for bda1node4.

a) Log in to the Oracle ILOM web interface as root and start the server from the Host Management > Power Control page.

b) From the "Select Action" drop-down, select "Power On".

2. From the CLI:

a) ssh into the ILOM as the 'root' user:

# ssh root@<hostname>-ilom

b) Run 'start /System' to power on the server:

-> start /System
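To confirm that the host is powering up, the same CLI session can query the system target. On recent ILOM firmware, 'show /System' lists the host properties, including the power state; the exact property names vary by firmware version, so treat this as a sketch:

-> show /System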
See the Oracle ILOM Quick Reference for CLI Commands, section on Host and System Control.

Start Software Services

The Cloudera Manager service on node03 is auto-started on boot, but the HDFS cluster and management services need to be started manually in Cloudera Manager (CM). The /opt/shareddir NFS mount is auto-mounted when accessed, so there is no need to mount it manually after startup.

Start NoSQL Services

If this is a NoSQL cluster, there is no need to start the NoSQL server agents, as the server agent service is set up to start on boot.

a) To check the status of the NoSQL server agent:

# service nsdbservice status

b) Also execute the bdacheckcluster command (after waiting 10 minutes after boot) on the first node of every cluster to ensure that all expected services are up.

Start Cloudera Manager and CDH Service Roles

After the server is powered on, the services should be started automatically. To verify:

1. From the command line, ssh to the server and log in as root.

a. For node 3 (only if the cloudera-scm-server was shut down in the step above), issue the following to check whether the server is started:

# service cloudera-scm-server status

If the service is not started on the node, you can start it with:

# service cloudera-scm-server start

b. For all nodes, issue the following to check whether the cloudera-scm-agent is started:

# service cloudera-scm-agent status

If the service is not started on the node, you can start it with:

# service cloudera-scm-agent start
Once you have confirmed the services are up, continue with the next step.

2. Restart the role services in Cloudera Manager:

a. Log in to Cloudera Manager (CM) as the admin user, click on "Hosts", and click on the node that was started.

b. Under Roles you can see what roles the node has.

c. Click on the check box for the roles, then select the "Actions for Selected" drop-down and click "Start". Or, for 4.3.0 or higher, click on the check box next to the node that needs to be started, click on the "Actions for Selected" drop-down, then click "Start Roles on Host".

3. Verify in Cloudera Manager that all roles for the node are healthy.

Start ODI Agents

If the Oracle Data Integrator (ODI) Application Adapter for Hadoop is being used, then start the ODI Agent on the BDA as needed. On BDA, ODI is set up to run as a service, following the UNIX service paradigm. The ODI service should generally be started on boot.

1. To check the status of the ODI service:

# service odi-agent status
2. To start the ODI agent, if it is not started already, and confirm:

# service odi-agent start
# service odi-agent status

For the list of available scripts and tools provided in the /opt/oracle/odiagent-11.1.1.6.5/agent_standalone/oracledi/agent/bin directory, refer to http://docs.oracle.com/cd/E28280_01/install.1111/e16453/overview.htm#r15c1-t6

Mount NFS Mounts

1. The Mammoth-mounted /opt/shareddir is an automount and will be mounted automatically when accessed on all nodes, so there is no need to mount it manually on startup.

2. If the Fuse-DFS mount point entry is part of the /etc/fstab file, then it is mounted automatically on startup. Check whether Fuse-DFS is mounted:

# mount -l | grep fuse
Sample output ("fuse_dfs on <mount> type fuse.fuse_dfs" indicates a Fuse-DFS mount):

fuse_dfs on /mnt/hdfs-nnmount type fuse.fuse_dfs (rw,nosuid,nodev,allow_other,allow_other,default_permissions)
If the Fuse-DFS mount point entry is not part of the /etc/fstab file, then mount it manually on the nodes that need it:

# mount /mnt/hdfs-nnmount
Sample output:

INFO /data/1/jenkins/workspace/**********/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:164 Adding FUSE arg /mnt/hdfs-nnmount
INFO /data/1/jenkins/workspace/***********/src/hadoop-hdfs-project/hadoop-hdfs/src/main/native/fuse-dfs/fuse_options.c:115 Ignoring option allow_other

3. If any custom NFS mounts were unmounted prior to shutdown and are not listed in the /etc/fstab file, then mount them manually.
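For example, a custom NFS export is remounted with the standard mount syntax. The server name and paths below are hypothetical placeholders, not values taken from this appliance:

# mount -t nfs <nfs-server>:/export/custom /mnt/custom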