Excessive Memory Consumption by cmha Process Leading to ELAP System Freeze

Asset ID:	1-77-2342529.1
Update Date:	2018-03-26
Keywords:

Solution Type Sun Alert Sure

Solution 2342529.1 : Excessive Memory Consumption by cmha Process Leading to ELAP System Freeze

Applies to:

Oracle Communications EAGLE (Hardware) - Version ELAP 10.1 to ELAP 10.1 [Release ELAP 10.0]
Information in this document applies to any platform.

Description

The cmha process, which the system uses for high availability functionality, consumes memory excessively and fails to release it. Once the pool of available memory is exhausted, the system fails. Because a system in this condition has no available RAM, the failure will likely also prevent failover from operating properly.

The fix for this issue is included in ELAP 10.1.2, which is available from the Oracle Software Delivery Cloud.

Occurrence

The cmha process consumes additional memory at a fairly consistent rate. From the start of the cmha process, the application operates for approximately 150 days before the system runs out of available memory.

Symptoms

The symptoms of a system that has exhausted its memory pool vary depending on which application is requesting memory. At present, the most efficient way to determine the current system status is to perform the checks in the Workaround section below.

Workaround

Determining Present Memory Usage of cmha

Perform the following steps to determine whether a Maintenance Window is required to restart the cmha process.

1. On both the Standby and Active ELAP servers, capture the output of the following command:

ps aux| grep "cmha"| grep -v grep | awk '{print $4}'

2. If the output (the percentage of memory used by cmha, taken from the %MEM column of ps) is greater than 40 on any server, arrange a Maintenance Window to restart the cmha processes.
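
The check in steps 1 and 2 can be scripted. The following is a minimal sketch, not part of the ELAP software: the helper names cmha_mem_pct and needs_restart are hypothetical, and the threshold of 40 is taken from step 2. It reads ps aux output and compares the cmha %MEM value against the threshold.

```shell
#!/bin/bash
# Sketch only: cmha_mem_pct and needs_restart are hypothetical helper names,
# not ELAP tools. The threshold comes from step 2 of this document.
THRESHOLD=40   # percent of memory

# Read `ps aux` output on stdin and print the highest %MEM value (column 4)
# among lines that mention cmha. The grep and awk processes themselves are
# skipped. Prints 0 when no cmha line is found.
cmha_mem_pct() {
    awk 'BEGIN { max = 0 }
         /cmha/ && !/grep/ && !/awk/ { if ($4 + 0 > max) max = $4 + 0 }
         END { print max }'
}

# Succeed (exit status 0) when the given percentage is greater than THRESHOLD.
needs_restart() {
    awk -v pct="$1" -v limit="$THRESHOLD" 'BEGIN { exit !(pct + 0 > limit + 0) }'
}

# Usage on a live server:
#   pct=$(ps aux | cmha_mem_pct)
#   if needs_restart "$pct"; then
#       echo "cmha at ${pct}% of memory: arrange a Maintenance Window"
#   fi
```

The comparison is done in awk because %MEM is a decimal value and the shell's built-in integer tests would reject it.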

Restarting the cmha Processes

Run the following procedure in the Maintenance Window on the servers where the output in step 2 above was greater than 40.

1. On the standby ELAP server, capture the output of the following command. Note the percentage of memory consumed by the “cmha” process.

[root@elap-b ~]# ps aux| grep "cmha"| grep -v grep | awk '{print $4}'

Also note down the output of the command “free”.

2. On the STANDBY server, confirm the HA state with hastatus, then restart TKLCha:

# hastatus

STANDBY

# service TKLCha restart

3. After the Standby node has started, capture the output of the following command. Note the percentage of memory consumed by the “cmha” process.

[root@elap-b ~]# ps aux| grep "cmha"| grep -v grep | awk '{print $4}'

Also note down the output of the command “free”.

4. Verify that memory has been freed after the restart of the TKLCha service.

The memory consumed by cmha will be less than 5%; immediately after the restart it will be close to 0%. The memory consumed by the system (second row, second column of the free output) will be significantly less than 8 GB, that is, less than 2 GB.

In the following output of free, the used memory is in the second row (-/+ buffers/cache), second column. The value here is 1430912 KB, i.e. approximately 1.43 GB.

[root@elap-b ~]# free
             total       used       free     shared    buffers     cached
Mem:       8050280    7723756     326524          0     780216    5512628
-/+ buffers/cache:    1430912    6619368
Swap:      2048184          0    2048184

5. Log in to the ACTIVE server and execute the following command to perform a failover:

# /usr/TKLC/plat/sbin/hafailover --gostandby

6. On the new STANDBY server, confirm the HA state with hastatus, then restart TKLCha:

# hastatus

STANDBY

# service TKLCha restart

7. After the Standby node has restarted, capture the output of the following command. Note the percentage of memory consumed by the “cmha” process.

# ps aux| grep "cmha"| grep -v grep | awk '{print $4}'

Also note down the output of the command “free”.

8. Verify that memory has been freed after the restart of the TKLCha service.

In the following output of free, the used memory is in the second row (-/+ buffers/cache), second column. The value here is 1430912 KB, i.e. approximately 1.43 GB.

[epapdev@inde5epap1d1 ~]$ free
             total       used       free     shared    buffers     cached
Mem:       8050280    7723756     326524          0     780216    5512628
-/+ buffers/cache:    1430912    6619368
Swap:      2048184          0    2048184

9. Log in to the current ACTIVE server and perform another failover to return the HA roles to their original state.

# /usr/TKLC/plat/sbin/hafailover --gostandby

# hastatus

STANDBY
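
The restart and verification steps in this procedure can be guarded by a small script. The following is a minimal sketch under stated assumptions, not an Oracle-supplied tool: the helper names restart_if_standby, used_kb_excl_cache and memory_freed are hypothetical, hastatus is assumed to print exactly STANDBY on a standby server (as shown above), and free is assumed to produce the older layout shown in this document, which includes the -/+ buffers/cache row. The 2 GB limit is the figure given in step 4.

```shell
#!/bin/bash
# Sketch only: the three function names below are hypothetical helpers.
# hastatus, "service TKLCha restart" and free are the commands used in
# this document's procedure.

# Restart TKLCha only when hastatus reports STANDBY; otherwise refuse.
restart_if_standby() {
    local state
    state=$(hastatus)
    if [ "$state" != "STANDBY" ]; then
        echo "HA state is ${state:-unknown}; not restarting TKLCha" >&2
        return 1
    fi
    service TKLCha restart
}

# Read `free` output on stdin and print the "used" value in KB from the
# "-/+ buffers/cache:" row (second row, second column in this document).
used_kb_excl_cache() {
    awk '$1 == "-/+" && $2 == "buffers/cache:" { print $3 }'
}

# Succeed when the given used-memory value is below 2 GB (2,000,000 KB).
# An empty argument (row not found) is treated as a failure.
memory_freed() {
    [ -n "$1" ] || return 1
    awk -v kb="$1" 'BEGIN { exit !(kb + 0 < 2000000) }'
}

# Usage on the standby server:
#   restart_if_standby || exit 1
#   kb=$(free | used_kb_excl_cache)
#   memory_freed "$kb" && echo "memory freed: ${kb} KB in use"
```

Checking the HA state inside the same function as the restart keeps the restart from being issued on the ACTIVE server by mistake.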

History

12-22-2017 Initial Publication

Attachments

This solution has no attachment