Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-71-1934039.1
Update Date: 2016-06-26

Solution Type  Technical Instruction Sure

Solution  1934039.1 :   How to free file system space when the IB switch (NM2-36p and NM2-GW) file systems become full or almost full.


Related Items
  • Exadata X3-2 Hardware
  • Sun Datacenter InfiniBand Switch 36
  • Oracle SuperCluster T5-8 Hardware
  • Sun Network QDR InfiniBand Gateway Switch
Related Categories
  • PLA-Support>Sun Systems>SAND>Network>SN-SND: Sun Network Infiniband




In this Document
Goal
Solution
References


Created from <SR 3-9701410261>

Applies to:

Oracle SuperCluster T5-8 Hardware - Version All Versions and later
Exadata X3-2 Hardware - Version All Versions and later
Sun Network QDR InfiniBand Gateway Switch - Version All Versions and later
Sun Datacenter InfiniBand Switch 36 - Version All Versions and later
Information in this document applies to any platform.

Goal

How to free file system space when the IB switch (NM2-36p and NM2-GW) file systems become full or almost full.

When the root file system (/) or the tmpfs file system (/tmp/) of an IB switch becomes full or almost full, the switch's functionality is at risk from a resource point of view.  Running out of resources can cause hangs or other side effects.  For example, if the root file system is full while a user changes a password via passwd, /etc/shadow can become corrupt or disappear entirely.

For NM2-36p IB switches, / contains everything except /dev/shm/ and /tmp/, which are two separate tmpfs file systems.

For NM2-GW IB gateway switches, / contains everything except /dev/shm/, /config/, /var/log/, and /tmp/, where /config/ and /var/log/ are two disk file systems separate from /.
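Because the layout differs between the two switch models, it can help to confirm which file system actually holds a given path before cleaning up.  The following is a minimal sketch, assuming df on the switch accepts the POSIX -P option; mount_of is a hypothetical helper name, not a command shipped with the switch.

```shell
# Hypothetical helper: print the mount point of the file system that holds a path.
# Assumes `df -P` (POSIX output format) is available.
mount_of() {
    # Line 1 of `df -P` is the header; line 2 describes the file system.
    # Field 6 is the "Mounted on" column.
    df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Usage:
#   mount_of /var/log
#   mount_of /tmp
```

If the printed mount point is /, the path consumes space on the root file system; otherwise it lives on the separate file system that is printed.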

Solution

While logged in to the IB switch in question, the customer can run df and showfree to check file system usage.

# df -h
# df -k
# showfree -d

An example of / becoming full on an arbitrary NM2-36p IB switch is given below.

# df -h
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/hda2   471M  471M  0      100%  /
tmpfs       250M  24K   250M   1%    /dev/shm
tmpfs       250M  956K  249M   1%    /tmp
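The check above can also be scripted.  The following is a minimal sketch that reads df output and prints only the file systems at or above a usage threshold.  The helper name full_filesystems and the default threshold of 90 are assumptions chosen for illustration, and the sketch assumes df supports the POSIX -P option so that each file system is reported on a single line.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the mount point
# and usage of every file system at or above the given percentage (default 90).
# Mount points containing spaces are not handled.
full_filesystems() {
    limit=${1:-90}
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%$/, "", use)            # strip the trailing percent sign
        if (use + 0 >= limit) print $6, $5
    }'
}

# Usage:
#   df -P | full_filesystems 90
```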

For / or anything underneath it, the customer can use find to identify which large files are taking up file system space, without rebooting the IB switch in question.

In the following example, find is used to list the 20 largest files whose sizes exceed 10 MBytes.  The customer can adjust the find command to suit the search, for example by changing the "-size" option to specify a different size.

# find / -size +10M -exec ls -l {} \; | awk '{ print $5, "\t\t", $NF }' | sort -nr | head -20

In the following example, find is used to list the 20 largest files whose sizes exceed 1 MByte.

# find / -size +1M  -exec ls -l {} \; | awk '{ print $5, "\t\t", $NF }' | sort -nr | head -20

In the find output, the customer can ignore file entries inside /proc/ and /sys/: they belong to kernel pseudo file systems, are system critical, and do not consume disk space.  Instead, the customer should focus on file entries inside /tmp/, /var/log/ and /var/tmp/, which normally hold log files and temporary files.
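As a variation on the find commands above, the search can be limited to a single file system so that /proc/ and /sys/ never appear in the output.  The following is a minimal sketch, assuming GNU find (for -printf), which may or may not match the find shipped in the switch firmware.  largest_files is a hypothetical helper name, and its defaults mirror the first example above.

```shell
# Hypothetical helper: list the largest regular files under a directory
# without crossing into other mounted file systems (-xdev).
# Assumes GNU find, because -printf is a GNU extension.
largest_files() {
    dir=${1:-/}
    count=${2:-20}
    min=${3:-+10M}          # passed to find -size unchanged
    # Print "size-in-bytes<TAB>path", sort by size descending, keep the top N.
    find "$dir" -xdev -type f -size "$min" -printf '%s\t%p\n' 2>/dev/null |
        sort -nr | head -n "$count"
}

# Usage:
#   largest_files /               # root file system only
#   largest_files /var/log 20 +1M
```

Because -xdev stops at mount points, separate file systems such as /var/log/ on an NM2-GW switch or /tmp/ must be searched with their own invocation.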

Based on the above find output, the customer can also determine which processes (i.e. which PIDs) have which files open by using a simple for-loop.  Only part of each file name needs to be supplied for the search.

# for d in `find /proc -name fd`; do  echo $d;  ls -ls $d | egrep -i 'file1|file2|...|fileN'; done

In the for-loop output, each process directory that lists one of the specified large files identifies a PID that has the file open.

An example of finding which process has /var/log/messages open on an arbitrary NM2-GW IB gateway switch is given below.  In this example, syslogd is found to have /var/log/messages open.

# for d in `find /proc -name fd`; do  echo $d;  ls -ls $d | egrep -i 'messages'; done
...
/proc/2247/task/2247/fd
0 l-wx------ 1 root root 64 Aug 12 14:41 1 -> /var/log/messages
/proc/2247/fd
0 l-wx------ 1 root root 64 Jul 14 10:45 1 -> /var/log/messages
...
# ps -ef | grep '2247'
root      2247     1  0 Jul14 ?        00:00:22 syslogd -m 0
root     27522 23559  0 13:30 pts/0    00:00:00 grep 2247
#
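The same lookup can be written so that only the matching PIDs are printed.  The following is a minimal sketch that reads the symbolic links under /proc/<PID>/fd/ with readlink.  pids_holding is a hypothetical helper name, and the optional second argument exists only so that the function can be pointed at a different directory tree for testing.

```shell
# Hypothetical helper: print "PID<TAB>file" for every open file descriptor
# whose target path contains the given string.
pids_holding() {
    pattern=$1
    proc_root=${2:-/proc}
    for link in "$proc_root"/[0-9]*/fd/*; do
        # Skip descriptors that vanished or cannot be read.
        target=$(readlink "$link" 2>/dev/null) || continue
        case $target in
            *"$pattern"*)
                pid=${link#"$proc_root"/}   # strip the leading /proc/
                pid=${pid%%/*}              # keep only the PID component
                printf '%s\t%s\n' "$pid" "$target"
                ;;
        esac
    done
}

# Usage (as root, so that every process can be inspected):
#   pids_holding /var/log/messages
```

Each PID printed can then be passed to ps, as in the example above, to identify the owning process.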

Old log files can be deleted manually, for example those that are more than 12 months old.  Log files are often rotated and compressed via logrotate into *.n.gz files, where n is a number.  However, rotated and compressed log files still take up file system space, so the customer can delete them manually too.

Current log files should be cleared (truncated) rather than deleted, because they are still in use by their respective processes and the space held by a deleted file is not released until the owning process closes it.  Clearing is a very direct way to free file system space.

Old temporary files that are no longer in use can also be deleted manually.  This is also a very direct way to free file system space.
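Before deleting anything, it can be useful to list the candidates first.  The following is a minimal sketch that prints, without deleting, the files under a directory that were last modified more than a given number of days ago.  old_files is a hypothetical helper name, and the 365-day default simply reflects the 12-month example above.

```shell
# Hypothetical helper: print (do not delete) regular files under a directory
# that were last modified more than the given number of days ago.
old_files() {
    dir=$1
    days=${2:-365}
    name=${3:-*}            # file name pattern passed to find -name
    find "$dir" -xdev -type f -name "$name" -mtime +"$days" 2>/dev/null
}

# Usage:
#   old_files /var/log 365 '*.gz'    # rotated, compressed logs over a year old
#   old_files /var/tmp 30            # temporary files untouched for 30 days
```

After reviewing the list, the files can be removed with rm, provided none of them is still open in a running process.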

In the following example, /var/log/fd.log and /var/log/secure each take up more than 100 MBytes of file system space.

-rw-r--r-- 1 root root 112610288 Oct 6 22:11 fd.log
-rw------- 1 root root 119200758 Oct 6 22:10 secure

In this particular example, the most direct way is to clear the offending log files and delete their rotated backups.  There is no need to restart the processes that have /var/log/fd.log and /var/log/secure open.

# cp /dev/null /var/log/fd.log
# rm /var/log/fd.log.1.gz
# cp /dev/null /var/log/secure
# rm /var/log/secure.1.gz

The same principle applies to other log files and other logging processes.
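The clearing step above can be wrapped in a small function that refuses to touch anything other than a regular file.  This is a minimal sketch; clear_log is a hypothetical helper name, and the shell redirection ": > file" is used here as an equivalent of "cp /dev/null file".  Both truncate the file in place, so its inode is unchanged and the logging process can keep writing to it.

```shell
# Hypothetical helper: truncate each named log file to zero length in place.
# The file is not removed or recreated, so its inode stays the same.
clear_log() {
    for f in "$@"; do
        if [ -f "$f" ] && [ ! -L "$f" ]; then
            : > "$f"
        else
            echo "skipping $f: not a regular file" >&2
        fi
    done
}

# Usage:
#   clear_log /var/log/fd.log /var/log/secure
```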

A less direct way is to rotate and compress the offending log files via logrotate.

- Open and view /var/lib/logrotate.status.

- Find the entries for the offending log files inside /var/lib/logrotate.status and check their timestamps.  If the timestamps are far in the past or in the future, logrotate may not be functioning at all.

- Clear the entries for the offending log files inside /var/lib/logrotate.status.

- Restart crond to restart logrotate, for example, "service crond restart".

- After about 60 seconds, check /var/lib/logrotate.status, the offending log files, and their rotated and compressed backups.
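The inspection and clearing steps above can be sketched as two small functions.  This sketch assumes that each entry in the status file is a single line beginning with the log file path in double quotes followed by a date, which should be verified against the file on the switch before use.  show_entry and drop_entry are hypothetical helper names, and a backup copy of the status file is kept with a .bak suffix.

```shell
# Assumption: each state entry is one line of the form  "/path/to/log" <date>

# Hypothetical helper: show the status entry for one log file.
show_entry() {
    grep -F -- "\"$1\"" "${2:-/var/lib/logrotate.status}"
}

# Hypothetical helper: remove the status entry for one log file,
# keeping a backup of the original status file alongside it.
drop_entry() {
    status=${2:-/var/lib/logrotate.status}
    cp -p "$status" "$status.bak" || return 1
    grep -v -F -- "\"$1\"" "$status.bak" > "$status"
    # grep exits 1 when it prints nothing; only treat real errors as failure.
    [ $? -le 1 ]
}

# Usage:
#   show_entry /var/log/fd.log
#   drop_entry /var/log/fd.log
#   service crond restart
```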

For the tmpfs file system, the quickest way to free space is to reboot the IB switch in question.  The reboot clears all files in the tmpfs file system.

If the above approaches still cannot free sufficient file system space, the customer should contact Oracle Support by logging a service request (SR).

References

<NOTE:1987078.1> - How to clear for memory spaces when the IB switch (NM2-36p and NM2-GW) memory becomes full or almost full.

Attachments
This solution has no attachment
  Copyright © 2018 Oracle, Inc.  All rights reserved.