Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition
Solution Type: Technical Instruction Sure
Solution 2102784.1: How To Free Up Disk Space on Exalogic Nodes and vServers
Created from <SR 3-12042952741>

Applies to:
Oracle Exalogic Elastic Cloud Software - Version 2.0.0.0.0 and later
Exalogic Elastic Cloud X4-2 Hardware
Exalogic Elastic Cloud X5-2 Eighth Rack - Version X5 to X5 [Release X5]
Linux x86-64
Oracle Virtual Server x86-64

Goal
To Free Up Disk Space on Exalogic Nodes and vServers

Solution
Over time, Linux operating systems can fill up with files that are left behind. Unlike Solaris, Linux does not clean up /tmp on reboot. This document goes through some steps to identify and clean up files to avoid out-of-space issues.

IMPORTANT: In some cases, a file can be removed while its contents remain on disk. If a process is holding a large file open when the file is deleted, the space is not reclaimed until the process is stopped. To address this, either reboot the operating system or run the command "lsof +L1 / | grep <file or directory you deleted>" to find any processes still holding on to the data from those files.
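The lsof check described here can be reduced to just the process IDs and command names of the holders. The following is a minimal sketch, assuming the usual lsof layout in which the first two columns are COMMAND and PID; the function name holders_of_deleted and the sample path are hypothetical.

```shell
#!/bin/sh
# Sketch: list processes that still hold deleted files matching a path.
# Reads "lsof +L1 /" output on stdin; assumes columns 1 and 2 are
# COMMAND and PID (the usual lsof layout).
holders_of_deleted() {
    # $1: path fragment of the file or directory that was deleted
    grep -F -- "$1" | awk '{ print $2, $1 }' | sort -u
}

# Typical use on a node (requires root and lsof):
#   lsof +L1 / | holders_of_deleted /tmp/big.log
```

Each output line is a PID followed by the command name, which is enough to decide whether the process can be restarted instead of rebooting the node.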
To verify, compare the output of "df -kh" with that of "du -sh /". If the two differ noticeably, it is likely due to this behavior.

Pre-Requisites
The commands in this document are best run with the dcli tool and an exclusion list. Set up these pre-requisites as follows:

1. On node 1, create a nodes file and add all the compute nodes in the rack to it. For example (change el01cn01 to your hostnames):

    echo "el01cn01" > /var/tmp/nodes

2. On node 1, run these commands to create an exclusion list of filesystems, directories and files that should never be touched, because changes to them can affect the way the system runs:

    echo '/proc' > /var/tmp/exclude.out
    echo '/dev' >> /var/tmp/exclude.out
    echo '/var/tmp/exalogic' >> /var/tmp/exclude.out
    echo '/opt/exalogic' >> /var/tmp/exclude.out
    echo '/bin' >> /var/tmp/exclude.out
    echo '/ssd' >> /var/tmp/exclude.out
    echo '/boot' >> /var/tmp/exclude.out
    echo '/etc' >> /var/tmp/exclude.out
    echo '/lib' >> /var/tmp/exclude.out
    echo '/lib64' >> /var/tmp/exclude.out
    echo '/opt' >> /var/tmp/exclude.out
    echo '/sbin' >> /var/tmp/exclude.out
    echo '/usr' >> /var/tmp/exclude.out
    echo '/tftpboot' >> /var/tmp/exclude.out
    echo '/MegaSAS.log' >> /var/tmp/exclude.out
    echo '/var/lib/rpm' >> /var/tmp/exclude.out

3. Copy the exclude.out file to all nodes:

    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes -f /var/tmp/exclude.out -d /var/tmp/
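Steps 1 and 2 can also be scripted. The following is a minimal sketch, assuming the compute nodes follow the el01cnNN naming used in this document and are numbered consecutively; the function names gen_nodes and write_exclude are hypothetical, and the output paths are supplied by the caller.

```shell
#!/bin/sh
# Print consecutive host names such as el01cn01 .. el01cnNN.
# $1: prefix (for example "el01cn"), $2: number of compute nodes.
gen_nodes() {
    i=1
    while [ "$i" -le "$2" ]; do
        printf '%s%02d\n' "$1" "$i"
        i=$((i + 1))
    done
}

# Write the exclusion list from step 2 to the file named in $1.
write_exclude() {
    cat > "$1" <<'EOF'
/proc
/dev
/var/tmp/exalogic
/opt/exalogic
/bin
/ssd
/boot
/etc
/lib
/lib64
/opt
/sbin
/usr
/tftpboot
/MegaSAS.log
/var/lib/rpm
EOF
}

# Typical use on node 1 (adjust prefix and node count for your rack):
#   gen_nodes el01cn 4 > /var/tmp/nodes
#   write_exclude /var/tmp/exclude.out
```

Writing the list with a here-document keeps all entries in one place, which makes it easier to compare against the list in step 2.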
Identify and Free Space for the Root Filesystem
1. Determine which directories need to be reviewed:

    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'for a in /*; do mountpoint -q -- "$a" || du -X /var/tmp/exclude.out -s -h -x "$a"; done | egrep "(M|G)\W"'
This command returns a list of directories and their sizes, prefixed with the host name, similar to this:

    el01cn01: 1.1M /root
    el01cn01: 34M /tmp
    el01cn01: 358M /var
    el01cn02: 9.0M /root
    el01cn02: 1.9M /tmp
    el01cn02: 341M /var
    el01cn03: 2.2M /root
    el01cn03: 341M /var
    el01cn04: 2.3M /root
    el01cn04: 341M /var

Compile a list for the next step. In this case, not all nodes have files of concern in /tmp, but we want a list that works for all nodes. In this example, our list is "/root /tmp /var".

2. Modify this command, replacing "/root /tmp /var" with the list from step 1:

    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'ls -lhR -Irun -Ilib -Iyum -S /root /tmp /var | egrep "((M|G)\W)|\/" | grep -v ^l'
This returns a significant amount of output per host, so it is helpful to limit the output to one node at a time by adding " | grep <host>" at the end. The output shows each directory on the host, followed by any files in it whose size is in megabytes or gigabytes.

    # /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'ls -lhR -Irun -Ilib -Iyum -S /root /tmp /var | egrep "((M|G)\W)|\/" | grep -v ^l' | grep el01cn02
    el01cn02: /root:
    el01cn02: -rw-r--r-- 1 root root 6.6M Jun 24 2015 files
    el01cn02: -rw-r--r-- 1 root root 2.1M Dec 1 12:13 MegaSAS.log
    el01cn02: /tmp:
    el01cn02: /tmp/exaware:
    el01cn02: /tmp/exaware/1_2:
    el01cn02: /tmp/exaware/1_2_1:
    el01cn02: /tmp/hmptemp:
    el01cn02: /var:
    el01cn02: /var/cache:
    el01cn02: /var/cache/opensm:
    el01cn02: /var/cache/ovs-disk_info:
    el01cn02: /var/cache/ovs-multipath:
    el01cn02: /var/crash:
    el01cn02: /var/db:
    el01cn02: /var/db/nscd:
    el01cn02: /var/empty:
    el01cn02: /var/empty/sshd:
    el01cn02: /var/empty/sshd/etc:
    el01cn02: /var/exalogic:
    el01cn02: /var/exalogic/info:
    el01cn02: /var/games:
    el01cn02: /var/local:
    el01cn02: /var/lock:
    el01cn02: /var/lock/dmraid:
    el01cn02: /var/lock/iscsi:
    el01cn02: /var/lock/lvm:
    el01cn02: /var/lock/subsys:
    el01cn02: /var/log:
    el01cn02: -rw-r--r-- 1 root root 154M Feb 4 11:09 lastlog
    el01cn02: -rw------- 1 root root 5.0M Oct 7 19:20 ovs-agent.log.4
    el01cn02: -rw------- 1 root root 5.0M Dec 3 09:30 ovs-agent.log.2
    el01cn02: -rw------- 1 root root 5.0M Oct 17 13:09 ovs-agent.log.3
    el01cn02: -rw------- 1 root root 5.0M Jul 19 2015 ovs-agent.log.5
    el01cn02: -rw------- 1 root root 5.0M Dec 9 11:56 ovs-agent.log.1
    el01cn02: /var/log/xen:
    el01cn02: -rw-r--r-- 1 root root 1.0M Jun 24 2015 xend.log.3
    el01cn02: -rw-r--r-- 1 root root 1.0M Jan 27 21:31 xend.log.1
    el01cn02: -rw-r--r-- 1 root root 1.0M Nov 25 07:13 xend.log.2
    el01cn02: -rw------- 1 root root 1.0M Apr 13 2015 xend.log.4
    el01cn02: /var/log/disconnect_device:
    el01cn02: /var/log/flush_inflight_io:
    el01cn02: /var/log/init-exalogic-node:
    el01cn02: /var/log/pm:
    el01cn02: /var/log/prelink:
    el01cn02: /var/log/rescan_for_disks:
    el01cn02: /var/log/sa:
    el01cn02: -rw-r--r-- 1 root root 16M Jan 28 23:53 sar28
    el01cn02: -rw-r--r-- 1 root root 16M Feb 2 23:53 sar02
    el01cn02: -rw-r--r-- 1 root root 16M Jan 29 23:53 sar29
    el01cn02: -rw-r--r-- 1 root root 16M Feb 3 23:53 sar03
    el01cn02: -rw-r--r-- 1 root root 16M Jan 30 23:53 sar30
    el01cn02: -rw-r--r-- 1 root root 16M Feb 1 23:53 sar01
    el01cn02: -rw-r--r-- 1 root root 16M Jan 31 23:53 sar31
    el01cn02: -rw-r--r-- 1 root root 16M Jan 27 23:53 sar27
    el01cn02: -rw-r--r-- 1 root root 16M Feb 1 23:50 sa01
    el01cn02: -rw-r--r-- 1 root root 16M Feb 2 23:50 sa02
    el01cn02: -rw-r--r-- 1 root root 16M Feb 3 23:50 sa03
    el01cn02: -rw-r--r-- 1 root root 16M Jan 28 23:50 sa28
    el01cn02: -rw-r--r-- 1 root root 16M Jan 29 23:50 sa29
    el01cn02: -rw-r--r-- 1 root root 16M Jan 30 23:50 sa30
    el01cn02: -rw-r--r-- 1 root root 16M Jan 31 23:50 sa31
    el01cn02: -rw-r--r-- 1 root root 16M Jan 27 23:50 sa27
    el01cn02: -rw-r--r-- 1 root root 7.8M Feb 4 12:10 sa04
    el01cn02: /var/log/sun-ssm:
    el01cn02: /var/mpp:
    el01cn02: /var/nis:
    el01cn02: /var/opt:
    el01cn02: /var/preserve:
    el01cn02: /var/spool:
    el01cn02: /var/spool/anacron:
    el01cn02: /var/spool/at:
    el01cn02: /var/spool/at/spool:
    el01cn02: /var/spool/cron:
    el01cn02: /var/spool/lpd:
    el01cn02: /var/spool/mail:
    el01cn02: /var/spool/repackage:
    el01cn02: /var/tmp:
    el01cn02: /var/tmp/ebi_conf.bak20852403:
    el01cn02: /var/tmp/ebi_conf.pre20612.bak:
    el01cn02: /var/tmp/ebi_conf.pre20620.bak:
    el01cn02: /var/tmp/ebi_conf.pre20621.bak:
    el01cn02: /var/tmp/ebi_conf.pre21349631.bak:
    el01cn02: /var/xen:
    el01cn02: /var/xen/dump:
    el01cn02: /var/yp:

For each file or directory that is returned, evaluate whether it can be moved to another filesystem. Exalogic nodes have the /ssd filesystem for that purpose (check for the /ssd directory; it is sometimes named /SSD, so change the commands as necessary). /var/db and /var/cache are populated by the YUM and RPM tools. These can be left alone, as they should not be cleaned out by hand. Commands to clean them are given later.
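Because the recursive listing prints every directory, including empty ones, it can help to reduce it to the full paths of the large files only. The following is a minimal sketch, assuming it is fed plain "ls -lhR" output from a single node (without the dcli host prefix) and that file names contain no spaces; the function name large_files is hypothetical.

```shell
#!/bin/sh
# Reduce "ls -lhR" output (stdin) to "<size> <full path>" for files
# whose human-readable size is in megabytes or gigabytes.
large_files() {
    awk '
        # A line ending in ":" starts a new directory section.
        /:$/ { dir = substr($0, 1, length($0) - 1); next }
        # Skip symbolic links, as the original command does.
        /^l/ { next }
        # Field 5 is the size and field 9 the name in "ls -l" output.
        $5 ~ /^[0-9.]+[MG]$/ { print $5, dir "/" $9 }
    '
}

# Typical use on one node:
#   ls -lhR -Irun -Ilib -Iyum -S /root /tmp /var | large_files
```

The resulting list of paths is a convenient starting point when deciding what to move to /ssd in the following steps.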
3. Once you have determined which files to move, create a backup directory for them:

    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'mkdir /ssd/backup'
4. In our example, we saw the following files in /root: /root/MegaSAS.log and /root/files. Both can be moved off the root filesystem:

    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'mkdir /ssd/backup/root'
    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'mv /root/MegaSAS.log /ssd/backup/root'
    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'mv /root/files /ssd/backup/root'

Any large files in /root should be moved to the /ssd directory. It is always safer to use /ssd rather than the root home directory for storing patches and similar files. For files needed for a longer time, consider mounting and using /export/common/patches from the storage nodes.
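The mkdir and mv commands in steps 3 and 4 report errors when the backup directory already exists or a file is absent on some node. The following is a minimal sketch of a more tolerant helper, which also covers the /ssd versus /SSD naming noted earlier; the function names backup_root and move_to_backup are hypothetical, and the optional prefix argument exists only so the helper can be tried against a scratch directory.

```shell
#!/bin/sh
# Print the backup filesystem root: /ssd if present, else /SSD.
# $1: optional path prefix (leave empty on a real node).
backup_root() {
    for d in "${1:-}/ssd" "${1:-}/SSD"; do
        if [ -d "$d" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

# Move $1 into directory $2, creating $2 if needed.
# Does nothing (and succeeds) when $1 does not exist.
move_to_backup() {
    [ -e "$1" ] || return 0
    mkdir -p "$2" && mv -- "$1" "$2"/
}

# Typical use on a node:
#   root=$(backup_root) && move_to_backup /root/MegaSAS.log "$root/backup/root"
```

To run this across the rack, the script would first have to be copied to each node, for example with the dcli -f option used for exclude.out.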
5. For /var/log, we found sa and sar files left from sysstat commands run by oswbb. These files can also be moved off:

    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'mkdir /ssd/backup/varlog'
    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'mkdir /ssd/backup/varlog/sa'
    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'mv /var/log/sa/* /ssd/backup/varlog/sa'

6. For /var/log/wtmp, if wtmp regularly grows too large, you can change the rotation from monthly to weekly by modifying this block in /etc/logrotate.conf:

from:

    /var/log/wtmp {
        monthly
        minsize 1M
        create 0664 root utmp
        rotate 1
    }

to:

    /var/log/wtmp {
        weekly
        minsize 1M
        create 0664 root utmp
        rotate 1
    }

Running this command forces logrotate to run on demand and rotate the configured logs, including /var/log/wtmp:

    /usr/sbin/logrotate -f /etc/logrotate.conf
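The edit in step 6 can be previewed with sed before changing the file by hand. The following is a minimal sketch, assuming the block starts with a line reading "/var/log/wtmp {" and ends with a line starting with "}", as in the block shown above; the function name wtmp_to_weekly is hypothetical, and the result is printed to standard output rather than written back to the file.

```shell
#!/bin/sh
# Print the logrotate configuration in $1 with "monthly" changed to
# "weekly" inside the /var/log/wtmp block only.
wtmp_to_weekly() {
    sed '/^\/var\/log\/wtmp {/,/^}/ s/^\([[:space:]]*\)monthly/\1weekly/' "$1"
}

# Typical use: review the difference, then edit /etc/logrotate.conf.
#   wtmp_to_weekly /etc/logrotate.conf | diff /etc/logrotate.conf -
```

Restricting the substitution to the address range of the wtmp block leaves any other "monthly" directives in the file unchanged.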
7. For rotated log files in /var/log such as ovs-agent, maillog, messages, secure, etc., there are 4 backups by default. If these are large, the line "#compress" in /etc/logrotate.conf can be uncommented to compress the files going forward. To move the existing files off, run the following:

    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'for file in /var/log/*.[0-9]; do mv "$file" /ssd/backup/varlog; done'
    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'mv /var/log/.exapatch* /ssd/backup/varlog'
    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'mv /var/log/exapatch* /ssd/backup/varlog'
    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'mv /var/log/ebi* /ssd/backup/varlog'

8. lastlog can also grow large, as shown in our example. Move it over as well:

    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'mv /var/log/lastlog /ssd/backup/varlog'
9. Clean up the yum repository cache and rebuild the RPM database:

    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'yum clean all'
    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'rpm --rebuilddb'

10. Repeat steps 2 through 9 for all the other nodes.

11. Check all machines for anything that may have been missed. All root file systems should have about 1.3G of free space.

    /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'df -khl /'

* If one or more nodes still do not have at least 1GB free, compare their "du -sh /*" output with that of the other nodes to identify any oddities. For example, /opt, /etc and /usr should be about the same size across nodes.
* If "du -sh /opt/oswbb" shows very high usage, please open an SR with Oracle Support to clean up the OS Watcher logs. We do not want to delete those logs indiscriminately.
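The final check can be automated by flagging the nodes that fall below the 1GB threshold. The following is a minimal sketch, assuming dcli prefixes each output line with "<host>: " as in the examples above, and that df is run with -P -k so that the available space is a plain number of kilobytes on a single line; the function name low_space_nodes is hypothetical.

```shell
#!/bin/sh
# Read dcli-prefixed "df -P -k -l /" output on stdin and print
# "<host> <available KB>" for nodes with less than 1GB available.
low_space_nodes() {
    awk -v min_kb=1048576 '
        # With the host prefix, field 5 is the Available column;
        # header lines are skipped because that field is not numeric.
        $5 ~ /^[0-9]+$/ && $5 + 0 < min_kb {
            host = $1
            sub(/:$/, "", host)
            print host, $5
        }
    '
}

# Typical use on node 1:
#   /opt/exalogic.tools/tools/dcli -g /var/tmp/nodes 'df -P -k -l /' | low_space_nodes
```

An empty result means every node has at least 1GB available on the root filesystem.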
Additional Changes If Necessary
By default, OSWatcher is installed, configured, and running on all the Oracle VM Server nodes on an Exalogic machine and on all guest vServers created using the EECS 2.0.6.0.0 Guest Base Template. The utilities that OSWatcher invokes run as background processes and impose minimal incremental overhead on the system. By default, OSWatcher invokes the data-collection utilities once every 30 seconds and archives the data for the last 48 hours on the local disk in the /opt/oswbb directory. You can configure the data-collection interval, the archive-retention period, and the archive location. For more information about configuring OSWatcher to store files in alternate directories, see the OS Watcher Black Box User's Guide in My Oracle Support <Doc ID 1531223.1>.

Please be aware that Oracle Support requires OS Watcher files for ALL performance issues, so do not disable the collection. If relocating, only use /ssd/<oswbb> or similar.

ALSO, if clearing out oswbb logs does not appear to affect the output of "df -k", it is likely that oswbb is holding files open. Run "service oswbb stop; service oswbb start" and recheck.

Identify and Free Space for the Boot Filesystem
1. Compile a list of the kernels installed:

    > cat /etc/grub.conf | egrep 'title|default'
    default=0
    title Oracle Linux Server (2.6.39-400.250.1.el5uek)
    title Oracle Linux Server (2.6.18-406.el5)
    title Oracle Enterprise Linux (2.6.39-400.215.10.el5uek)

2. The default value refers to the kernel that GRUB selects by default on startup. The entries are numbered from 0 to 2 in the order they appear. For Exalogic, we need to keep a minimum of 2 kernels: the one non-UEK kernel, in this case 2.6.18-406.el5, and the default UEK kernel, which is 2.6.39-400.250.1.el5uek. This leaves one kernel that can be removed to free up space: 2.6.39-400.215.10.el5uek.

*** NEVER REMOVE THE DEFAULT UEK KERNEL ***
*** NEVER REMOVE THE LATEST NON-UEK KERNEL ***

3. Search for the full contents between the parentheses to find all the packages for a particular kernel:

    > rpm -qa | grep 2.6.39-400.215.10.el5uek

4. Each package in the list needs to be removed, as seen here:

    yum remove kernel-uek-2.6.39-400.215.10.el5uek
    yum remove kernel-uek-devel-2.6.39-400.215.10.el5uek
    yum remove kernel-uek-headers-2.6.39-400.215.10.el5uek
    yum remove kernel-uek-firmware-2.6.39-400.215.10.el5uek

5. Verify that the kernel has been removed from the grub.conf file:

    > cat /etc/grub.conf | egrep 'title|default'
    default=0
    title Oracle Linux Server (2.6.39-400.250.1.el5uek)
    title Oracle Linux Server (2.6.18-406.el5)

6. Check the size of /boot:

    > df -khl /boot
    Filesystem Size Used Avail Use% Mounted on
    /dev/sda1 99M 42M 53M 45% /boot

7. If space in /boot is still not sufficient, contact Oracle Support. By default, a maximum of 3 kernels can exist at a time, so there are no others to remove.

What About the /ssd Filesystem?
The /ssd filesystem was originally created as an area where kdump files or vmcores could go without affecting the root partition and system operations. Since then, it has also become a place to copy files temporarily for operations such as sending ILOM snapshots to the support team for review or copying patch files onto the compute nodes. The files in this partition are not used in the normal operation of the Exalogic compute node and can be removed completely at the customer's discretion.

References
<NOTE:1374890.1> - Exalogic FAQs