Sun Microsystems, Inc.  Oracle System Handbook - ISO 7.0 May 2018 Internal/Partner Edition

Asset ID: 1-72-1577344.1
Update Date:2014-03-14
Keywords:

Solution Type  Problem Resolution Sure

Solution  1577344.1 :   OS Watcher is Taking Huge Space on Database Server  


Related Items
  • Exadata X3-2 Hardware
  • Exadata Database Machine X2-8
  • Exadata X3-8 Hardware
  • Linux OS
  • Exadata Database Machine X2-2 Hardware
  • Exadata Database Machine V2
  • Oracle Exadata Hardware
Related Categories
  • PLA-Support>Eng Systems>Exadata/ODA/SSC>Oracle Exadata>DB: Exadata_EST


OS Watcher is Taking Huge Space on Database Server

Created from <SR 3-7366655141>

Applies to:

Linux OS - Version Oracle Linux 5.8 and later
Exadata Database Machine V2 - Version All Versions and later
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Exadata Database Machine X2-8 - Version All Versions and later
Exadata X3-2 Hardware - Version All Versions and later
Information in this document applies to any platform.

Symptoms

 

OS Watcher consumes a large amount of disk space on database nodes when the compute nodes run many instances (100+) and many processes (ten thousand or more).

For example, on a system with 110 instances and 10000 processes, the OS Watcher archives take 56 GB of space when OS Watcher runs with default settings:

# du -hs .
56G .

The breakdown by directory is as follows:

# du -hs *
6.7G ExadataOSW
20G ExadataRDS
4.0K oswcellsrvstat
23M oswdiskstats
8.9M oswiostat
3.5M oswmeminfo
2.4M oswmpstat
3.9G oswnetstat
4.0K oswprvtnet
3.3G oswps
35M oswslabinfo
22G oswtop
3.2M oswvmstat

The top, RDS, and ExadataOSW (which includes lsof command output) dumps take a particularly large share of the space because of the high number of processes and the high number of open file and network descriptors.
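
To see which collectors dominate on a given node, the breakdown above can be sorted by size. The following is a minimal sketch: the function name osw_usage is hypothetical, and the default path assumes that the archive directory is /opt/oracle.oswatcher/osw/archive, which this note implies but does not state explicitly.

```shell
# Sketch: list OS Watcher archive subdirectories by size, largest first.
# osw_usage is a hypothetical helper name; the default path is an assumption.
osw_usage() (
    dir="${1:-/opt/oracle.oswatcher/osw/archive}"
    cd "$dir" || exit 1
    # du -sk prints sizes in KB so that sort -rn can order them numerically.
    du -sk -- * | sort -rn
)

# Example: show the five largest collectors.
# osw_usage | head -5
```

Sizes are printed in kilobytes rather than with -h so that the numeric sort stays correct.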

 

** IMPORTANT NOTE **

The OS Watcher tool will be disabled in upcoming releases and replaced by a complete rewrite named ExaWatcher. The new version comes with more configuration options, proactive compression, and cleanup. The following solution is a workaround in the current version of OS Watcher for large customer systems.
 

Cause

There are 100+ instances and many processes (ten thousand or more) on the database server node.

Solution

OS Watcher uses a set of system parameters when the node boots up. To change these startup parameters, modify the oswatcher script under oracle.cellos. Stopping the OS Watcher processes before making any changes is recommended.

 

# cd /opt/oracle.oswatcher/osw
# ./stopOSW.sh
# vi /opt/oracle.cellos/validations_server/init.d/oswatcher

Change the startOSW.sh parameters on the following line:
(umask 0037; nohup ./startOSW.sh 15 168 bzip2 9 >/var/log/cellos/start_oswatcher.log 2>&1 &)

The parameters are described as follows:

    15 : snapshot interval, in seconds
   168 : duration, in hours (7*24)
 bzip2 : compression utility
     9 : archive size limit, in GB; cleanup starts when the limit is reached (a value of 3 gives a 3 GB limit)

Then start OS Watcher so that the changes take effect.

# /opt/oracle.cellos/vldrun -script oswatcher 
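
Before restarting, it can help to confirm which parameters the init script will actually pass to startOSW.sh. The following is a minimal sketch, assuming the line has the four-parameter form shown above; show_osw_params is a hypothetical helper name, not part of OS Watcher.

```shell
# Sketch: print the four startOSW.sh parameters found in the oswatcher
# init script. show_osw_params is a hypothetical helper name.
show_osw_params() (
    file="${1:-/opt/oracle.cellos/validations_server/init.d/oswatcher}"
    # Capture "<interval> <duration> <compressor> <size limit>" after startOSW.sh.
    sed -n 's/.*startOSW\.sh \([0-9]* [0-9]* [a-z0-9]* [0-9]*\).*/\1/p' "$file"
)

# Example:
# show_osw_params
```

If the function prints nothing, the line in the script does not match the assumed form and should be inspected manually.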

  


Besides the system parameters, there are frequency parameters for the individual OS Watcher scripts such as top, ps, lsof, netstat, etc. They dump every 5-15 seconds over a one-hour duration (3600 seconds).

The frequency for each individual command is hard-coded (for example, "5 3600") in the OSWatcher.sh file.

The OS Watcher scripts are located in /opt/oracle.oswatcher/osw. Lines similar to the one below in "OSWatcher.sh" need to be changed manually so that the commands create fewer snapshots per hour.


echo "archive/oswtop/${hostn}_top_$hour $zip 5 3600" > archive/oswtop/HighFreq

For example, changing "5 3600" to "15 3600" dumps data every 15 seconds instead of every 5 seconds.
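
The effect of the interval on data volume can be estimated with simple arithmetic. The following sketch computes the approximate number of snapshots per collection window; snapshots_per_window is a hypothetical helper name, and the estimate assumes that each snapshot has roughly the same size.

```shell
# Sketch: approximate number of snapshots a collector writes per window.
# snapshots_per_window is a hypothetical helper name.
snapshots_per_window() (
    interval="$1"            # seconds between snapshots, e.g. 5
    window="${2:-3600}"      # window length in seconds (one hour in this note)
    echo $(( window / interval ))
)

snapshots_per_window 5     # prints 720
snapshots_per_window 15    # prints 240
```

Under that assumption, moving from 5 to 15 seconds reduces the data written by a collector to about one third.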

For vmstat, mpstat, netstat, iostat, top, and RDS, the following lines in the OSWatcher.sh script may need to be modified according to the size and space requirements.

 

"OSWatcher.sh"
.
echo "archive/oswvmstat/${hostn}_vmstat_$hour $zip 5 3600" > archive/oswvmstat/HighFreq
..
echo "archive/oswmpstat/${hostn}_mpstat_$hour $zip 5 3600" > archive/oswmpstat/HighFreq
..
echo "archive/oswnetstat/${hostn}_netstat_$hour $zip 15 3600" > archive/oswnetstat/HighFreq
..
echo "archive/oswiostat/${hostn}_iostat_$hour $zip 5 3600" > archive/oswiostat/HighFreq
..
echo "archive/oswtop/${hostn}_top_$hour $zip 5 3600" > archive/oswtop/HighFreq
..
echo "archive/ExadataRDS/${hostn}_ExadataRDS_$hour $zip 10 3600" > archive/ExadataRDS/HighFreq
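
The manual edit of these lines can also be scripted. The following is a minimal sketch, assuming the lines have exactly the form shown above; set_highfreq_interval is a hypothetical helper name, and the original file is kept with a .bak suffix.

```shell
# Sketch: change the snapshot interval for one collector in OSWatcher.sh,
# keeping a backup. set_highfreq_interval is a hypothetical helper name.
set_highfreq_interval() (
    file="$1"; name="$2"; secs="$3"
    # Reject anything that is not a positive integer.
    case "$secs" in
        ''|*[!0-9]*|0) echo "interval must be a positive integer" >&2; exit 1 ;;
    esac
    # Rewrite only the interval field; the original is saved as FILE.bak.
    sed -i.bak 's/\(_'"$name"'_\$hour \$zip\) [0-9][0-9]* 3600"/\1 '"$secs"' 3600"/' "$file"
)

# Example: sample top every 15 seconds instead of every 5.
# set_highfreq_interval /opt/oracle.oswatcher/osw/OSWatcher.sh top 15
```

The collector name is the part between the underscores in the line, for example vmstat, netstat, or ExadataRDS.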



For lsof, a change in /opt/oracle.oswatcher/osw/ExadataOSW.sh is required. The following change generates lsof dumps every 5 minutes instead of every 2.

"ExadataOSW.sh"


Change the line from

if [[ "$1" =~ '2mY' ]]; then

to

if [[ "$1" =~ '5mY' ]]; then

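
The same change can be applied with sed instead of a manual edit. The following is a minimal sketch, assuming the test appears in ExadataOSW.sh exactly as quoted above; set_lsof_5min is a hypothetical helper name, and a .bak backup of the file is kept.

```shell
# Sketch: switch the lsof dump marker in ExadataOSW.sh from 2mY to 5mY.
# set_lsof_5min is a hypothetical helper name; a .bak backup is kept.
set_lsof_5min() (
    file="${1:-/opt/oracle.oswatcher/osw/ExadataOSW.sh}"
    sed -i.bak "s/=~ '2mY'/=~ '5mY'/" "$file"
    # Print the resulting line so that the change can be checked by eye.
    grep -n "=~ '5mY'" "$file"
)

# Example:
# set_lsof_5min
```

If grep prints nothing, the pattern was not found and the file should be edited manually.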
 

 

 

References

<BUG:16433644> - ALCOA: DB NODE OSWATCHER DATA HITS 3GB LIMIT IN 36 HOURS
<BUG:17155362> - OS WATCHER IS TAKING HUGE SPACE ON DATABASE SERVER
<NOTE:1617454.1> - ExaWatcher utility on Exadata database servers and storage cells

  Copyright © 2018 Oracle, Inc.  All rights reserved.