Diameter Signaling Router (DSR) /var/TKLC/ Filesystem Filled by TKLCsnmp.log Files

Asset ID:	1-72-1959851.1
Update Date:	2018-01-16
Keywords:

Solution Type:	Problem Resolution Sure

Solution 1959851.1 : Diameter Signaling Router (DSR) /var/TKLC/ Filesystem Filled by TKLCsnmp.log Files

Applies to:

Oracle Communications Diameter Signaling Router (DSR) - Version DSR 5.0 to DSR 7.1.1 [Release DSR 5.0 to DSR 7.0]
Oracle Communications User Data Repository - Version UDR 10.0 and later
Tekelec

Symptoms

The DSR server reports syscheck errors indicating that the /var/TKLC/ filesystem is full. On an application server, this raises DSR alarms. On a TVOE server, it may generate an SNMP notification but will not raise an alarm in the application GUI.
The immediate impact of this condition is loss of logging capability. Syscheck also reports an NTP failure and a stale hpacucliStatus; these are artifacts of the full filesystem and must be cleaned up once the filesystem issue is resolved. Other impacts depend on the server role and other circumstances.

[root@rmsTVOE-DRA-ABC-1-2 ~]# syscheck
ERROR: Could not rename log /var/TKLC/log/syscheck/fail_log to /var/TKLC/log/syscheck/fail_log.1: No space left on device
Use of uninitialized value $time in pattern match (m//) at /usr/TKLC/plat/lib/Syscheck/modules/proc/ntp/Test.pm line 786.
Use of uninitialized value $time in concatenation (.) or string at /usr/TKLC/plat/lib/Syscheck/modules/proc/ntp/Test.pm line 788.
Running modules in class disk...
* fs: FAILURE:: MAJOR::3000000000001000 -- Server Disk Space Shortage Error
* fs: FAILURE:: Space used in "/var/TKLC" exceeds the set limit 90%. 100% used.
* hpdisk: FAILURE:: MAJOR::3000000200000000 -- The hpacucliStatus utility needs intervention.
* hpdisk: FAILURE:: Failure message: The hpacu status is stale, and server has been up longer than 600
One or more module in class "disk" FAILED

Running modules in class hardware...
  OK

Running modules in class net...
  OK

Running modules in class proc...
* ntp: FAILURE:: File has corrupted value for time
* ntp: FAILURE:: TIME:
* ntp: FAILURE:: Could not get last good time info
One or more module in class "proc" FAILED

Running modules in class system...
  OK

LOG LOCATION: /var/TKLC/log/syscheck/fail_log
[root@rmsTVOE-DRA-ABC-1-2 ~]#
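
The disk-space symptom can also be confirmed independently of syscheck by reading the usage figure from df. The following is a minimal sketch, not part of the original procedure: the function names fs_usage_pct and fs_over_limit are hypothetical, and the default limit of 90 simply mirrors the limit shown in the syscheck output above.

```shell
# Print the "Use%" figure (digits only) for the filesystem holding the given path.
# `df -P` forces POSIX single-line output so the column positions are stable.
fs_usage_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 when usage is at or above the limit (second argument, default 90).
# Return 2 when the usage figure could not be read.
fs_over_limit() {
    local pct
    pct=$(fs_usage_pct "$1")
    [ -n "$pct" ] || return 2
    [ "$pct" -ge "${2:-90}" ]
}
```

On an affected server, `fs_over_limit /var/TKLC && du -sk /var/TKLC/log/snmp/* | sort -n | tail` would be one way to list the largest entries in the SNMP log directory.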

Cause

The /var/TKLC/log/snmp/TKLCsnmp.log files fill rapidly with debug and info messages, so their combined size overruns the limited space allocated to /var/TKLC/.

Example content:

Connection from UDP: [10.121.16.231]:55333->[10.121.15.90]
Connection from UDP: [10.121.16.231]:55333->[10.121.15.90]
Connection from UDP: [10.121.16.231]:55333->[10.121.15.90]
…
error on subcontainer 'ia_addr' insert (-1) {DEBUG level messages}
error on subcontainer 'ia_addr' insert (-1)
error on subcontainer 'ia_addr' insert (-1)

This is an SNMP problem thought to be related to multiple interfaces (bond and bridge) sharing the same link-local address.
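
To gauge how much of a given log consists of the two message types shown in the example content, the matching lines can be counted with grep. This is a rough sketch only: the function name count_snmp_noise is hypothetical, and the two patterns are taken directly from the example content above, so other noisy messages would not be counted.

```shell
# Count lines in a log file that match the two message types from the example,
# and report them next to the total line count.
count_snmp_noise() {
    local file=$1 conn sub total
    # grep -c prints 0 and exits non-zero when nothing matches; "|| true" keeps
    # the function usable under "set -e".
    conn=$(grep -c 'Connection from UDP:' "$file" || true)
    sub=$(grep -c "error on subcontainer 'ia_addr' insert" "$file" || true)
    # Arithmetic expansion strips any padding some wc implementations add.
    total=$(( $(wc -l < "$file") ))
    echo "connection=$conn subcontainer=$sub total=$total"
}
```

For example, `count_snmp_noise /var/TKLC/log/snmp/TKLCsnmp.log` prints the three counts on one line; if the first two account for most of the total, the log matches the condition described here.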

Solution

Raise the SNMP logging threshold to Warning (priority '4') so that only messages of Warning severity or higher are logged.
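
In Net-SNMP logging options, -Lf FILE logs every message to FILE, while -LF PRI FILE logs only messages of priority PRI and more severe; priority 4 corresponds to Warning. The option change made in this procedure can be previewed with sed before any file is edited. This is a sketch only: the function name raise_snmp_log_level is hypothetical, and it assumes the option appears exactly as " -Lf " followed by the log path, as in the OPTIONS strings in this note.

```shell
# Read text on stdin and rewrite "-Lf <file>" to "-LF 4 <file>".
# Lines without " -Lf " (including lines already changed) pass through unchanged.
raise_snmp_log_level() {
    sed 's/ -Lf / -LF 4 /'
}
```

For example, `raise_snmp_log_level < /etc/init/TKLCsnmp.conf | diff /etc/init/TKLCsnmp.conf -` shows the line that would change without modifying the file.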

The following procedure addresses the SNMP logging problem only; cleanup of the other syscheck alarms stemming from this condition is not covered here.

Log into the affected server and verify that the condition matches the description.
Delete one of the older TKLCsnmp.log files from /var/TKLC/log/snmp/ to free some space. The other (old) TKLCsnmp.log files can then be deleted or compressed to free more space.
Back up and change the parameter in snmptrapd:
# cd /etc/init.d
# cp snmptrapd /tmp/snmptrapd.old
# rcstool co snmptrapd
# vi snmptrapd
Change
OPTIONS="udp:162 udp6:162 -Lf /var/log/TKLCsnmp.log -p /var/run/snmptrapd.pid"
To
OPTIONS="udp:162 udp6:162 -LF 4 /var/log/TKLCsnmp.log -p /var/run/snmptrapd.pid"
Save and check in:

# rcstool ci snmptrapd
Back up and change the parameter in TKLCsnmp.conf:

# cd /etc/init/
# cp TKLCsnmp.conf /tmp/TKLCsnmp_conf.old
# vi TKLCsnmp.conf
Change

env OPTIONS="udp:161 udp6:161 -f -Lf /var/TKLC/log/snmp/TKLCsnmp.log"
To

env OPTIONS="udp:161 udp6:161 -f -LF 4 /var/TKLC/log/snmp/TKLCsnmp.log"
Save
Restart both processes:

# service snmptrapd restart
# initctl stop TKLCsnmp
# initctl start TKLCsnmp
Verify that /var/TKLC/log/snmp/ no longer contains a large TKLCsnmp.log file and that, over time, the file receives little or no data (a file size of 0 is not uncommon).
As a final verification, check that both SNMP processes are running:

# ps -ef | grep -i snmp
This will produce lines that look like the OPTIONS strings above. Ensure they reflect the changes made.
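
The final check can be scripted rather than read by eye. The sketch below is an illustration under stated assumptions, not part of the original procedure: the function name check_snmp_options is hypothetical, and it identifies the SNMP processes by the "udp:161" and "udp:162" listening addresses from the OPTIONS strings above, since the process names themselves are not given in this note.

```shell
# Read `ps -ef` style output on stdin and print any SNMP process line that
# lacks the " -LF 4 " option. Exit status is 0 only when at least one SNMP
# process line was seen and every such line carries the new option.
check_snmp_options() {
    awk '
        /udp:16[12]/ {
            seen++
            if ($0 !~ / -LF 4 /) { print "not updated: " $0; bad++ }
        }
        END { exit (seen == 0 || bad > 0) }
    '
}
```

A typical call would be `ps -ef | check_snmp_options && echo "SNMP logging options updated"`.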

References

<BUG:20744242> - TKLCSNMP.LOG FILES ARE GROWING TOO LARGE AND FILLING UP DISK SPACE

Attachments

This solution has no attachment