ODA Nodes Lacking Space Due to Large Cluster Health Monitor File crfclust.bdb

Asset ID:	1-71-1616910.1
Update Date:	2017-08-24

Solution Type: Technical Instruction Sure Solution

1616910.1: ODA Nodes Lacking Space Due to Large Cluster Health Monitor File crfclust.bdb

Applies to:

Oracle Database Appliance - All Versions
Oracle Database - Enterprise Edition - Version 11.2.0.4 to 11.2.0.4 [Release 11.2]
Information in this document applies to any platform.

Goal

Checking disk space on the ODA nodes, you see the /u01 partition is 66-67% full:

[root@oda1 oda1]# df -h /u01
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroupSys-LogVolU01
97G 61G 32G 67% /u01

[root@oda1 oda1]# df -h /u01
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroupSys-LogVolU01
97G 61G 32G 66% /u01

Checking the CHM repository directory under GRID_HOME, you see that the file crfclust.bdb occupies 31G:

[root@oda1 oda1]# ls -lrth
total 33G
-rw-r----- 1 root root 8.0K Oct 18 23:01 repdhosts.bdb
-rw-r----- 1 root root 24K Dec 10 18:48 __db.001
-rw-r--r-- 1 root root 115M Dec 10 18:48 oda1.ldb
-rw-r----- 1 root root 8.0K Dec 10 18:49 crfconn.bdb
-rw-r----- 1 root root 16M Jan 13 08:21 log.0000019932
-rw-r----- 1 root root 306M Jan 13 08:28 crfts.bdb
-rw-r----- 1 root root 472M Jan 13 08:28 crfloclts.bdb
-rw-r----- 1 root root 375M Jan 13 08:28 crfcpu.bdb
-rw-r----- 1 root root 31G Jan 13 08:28 crfclust.bdb <<<<<<<<<<<<<<<<<<<<<<<<<<<
-rw-r----- 1 root root 16M Jan 13 08:29 log.0000019933
-rw-r----- 1 root root 56K Jan 13 08:29 __db.006
-rw-r----- 1 root root 386M Jan 13 08:29 crfhosts.bdb
-rw-r----- 1 root root 375M Jan 13 08:29 crfalert.bdb
-rw-r----- 1 root root 1.2M Jan 13 08:29 __db.005
-rw-r----- 1 root root 392K Jan 13 08:29 __db.002
-rw-r----- 1 root root 2.1M Jan 13 08:29 __db.004
-rw-r----- 1 root root 2.6M Jan 13 08:29 __db.003

This is true for both ODA nodes.
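
The same check can be done without reading the whole listing. The following is a minimal sketch, assuming GNU find as shipped with Linux; the function name files_over_mb is hypothetical and not part of any Oracle tool. It prints the regular files in a directory that are larger than a given number of mebibytes.

```shell
# Print regular files directly under DIR whose size exceeds MIN_MB mebibytes.
# Usage: files_over_mb DIR MIN_MB
files_over_mb() {
    dir=$1
    min_mb=$2
    # -maxdepth 1 keeps the search to DIR itself; -size +NM matches files larger than N MiB.
    find "$dir" -maxdepth 1 -type f -size +"${min_mb}"M
}
```

For example, files_over_mb /u01/app/11.2.0.3/grid/crf/db/oda1 1024 would list only the files above 1G in the repository directory shown later in this note.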

Solution

This is caused by a known issue in which the Cluster Health Monitor (CHM) database grows beyond its intended size.

This is outlined in the following MOS Notes:

Oracle Cluster Health Monitor (CHM) using large amount of space (more than default) (Doc ID 1343105.1)

db_delete: BDB grown beyond user desired limits disabling loggerd (Doc ID 1574492.1)

To resize this database:

1. As the grid user on oda1, run the following command:

[grid@oda1 ~]$ oclumon manage -repos resize 259200
oda1 --> retention check successful
oda2 --> retention check successful
New retention is 259200 and will use 4516300800 bytes of disk space

CRS-9115-Cluster Health Monitor repository size change completed on all nodes.
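
The two numbers in that output are easier to judge in familiar units. The sketch below is plain shell arithmetic on the values printed above, reading the retention as seconds, which is how the 11.2 oclumon documentation describes the resize argument. The bytes-per-second figure is only a ratio of the two reported values; that the repository size scales linearly with retention is an assumption.

```shell
retention_s=259200        # value passed to "oclumon manage -repos resize"
repo_bytes=4516300800     # disk space reported by oclumon for that retention

# Integer arithmetic only; results are truncated, not rounded.
echo "retention:  $((retention_s / 3600)) hours ($((retention_s / 86400)) days)"
echo "repository: $((repo_bytes / 1024 / 1024)) MiB"
echo "ratio:      $((repo_bytes / retention_s)) bytes per second of retention"
# retention:  72 hours (3 days)
# repository: 4307 MiB
# ratio:      17424 bytes per second of retention
```

A repository of roughly 4.2 GiB for three days of data is far below the 31G file seen on these nodes.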

2. After the resize, checking the repository size fails with the following error:

[grid@oda1 ~]$ oclumon manage -get repsize
CRS-9011-Error manage: Failed to initialize connection to the Cluster Logger Service

This happens because the existing CHM repository (a Berkeley DB database) is still larger than the size allowed by the new retention setting.

3. Restart the ora.crf resource on both nodes to resolve the issue:

[grid@oda1 ~]$ crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'oda1'
CRS-2677: Stop of 'ora.crf' on 'oda1' succeeded
[grid@oda1 ~]$ crsctl start res ora.crf -init

[grid@oda2 oda2]$ crsctl stop res ora.crf -init
CRS-2673: Attempting to stop 'ora.crf' on 'oda2'
CRS-2677: Stop of 'ora.crf' on 'oda2' succeeded
[grid@oda2 ~]$ crsctl start res ora.crf -init
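
Because the same two commands are run on each node, they can be wrapped in a small function. This is a sketch only: the function name restart_crf and the CRSCTL override variable are hypothetical, and the final crsctl stat res call is an addition not shown in the session above, included so the resource state can be confirmed after the restart.

```shell
# Stop and then start the ora.crf resource on the local node.
# CRSCTL defaults to "crsctl"; set CRSCTL=echo to print the commands instead of running them.
restart_crf() {
    crsctl_bin=${CRSCTL:-crsctl}
    "$crsctl_bin" stop res ora.crf -init || return 1
    "$crsctl_bin" start res ora.crf -init || return 1
    # Show the resource state so the restart can be confirmed.
    "$crsctl_bin" stat res ora.crf -init
}
```

Run it as the grid user on oda1 and again on oda2. With CRSCTL=echo set, it only prints the three command lines, which is a safe way to review what would be executed.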

4. Check that crfclust.bdb is now smaller:

[grid@oda1 ~]$ cd /u01/app/11.2.0.3/grid/crf/db/oda1
[grid@oda1 oda1]$ ls -lrth
total 572K
-rw-r----- 1 root root 56K Jan 17 08:04 __db.006
-rw-r----- 1 root root 1.2M Jan 17 08:04 __db.005
-rw-r----- 1 root root 2.1M Jan 17 08:04 __db.004
-rw-r----- 1 root root 392K Jan 17 08:04 __db.002
-rw-r----- 1 root root 24K Jan 17 08:04 __db.001
-rw-r----- 1 root root 8.0K Jan 17 08:04 crfhosts.bdb
-rw-r----- 1 root root 8.0K Jan 17 08:04 crfconn.bdb
-rw-r----- 1 root root 128K Jan 17 08:04 crfclust.bdb <<<<<<<<<<<
-rw-r----- 1 root root 16M Jan 17 08:04 log.0000000001
-rw-r----- 1 root root 2.6M Jan 17 08:04 __db.003
-rw-r----- 1 root root 8.0K Jan 17 08:04 crfts.bdb
-rw-r----- 1 root root 8.0K Jan 17 08:04 crfloclts.bdb
-rw-r----- 1 root root 8.0K Jan 17 08:04 crfcpu.bdb
-rw-r----- 1 root root 8.0K Jan 17 08:04 crfalert.bdb
-rw-r--r-- 1 root root 115M Jan 17 08:05 oda1.ldb

The disk space on /u01 has also been reclaimed:

[grid@oda1 ~]$ df -h /u01
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VolGroupSys-LogVolU01
97G 29G 64G 32% /u01
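
The df output above wraps onto two lines because the device name is long, which makes it awkward to read from a script. The sketch below, with the hypothetical function name use_percent, uses the POSIX output format of df to keep each filesystem on one line and prints only the Use% figure.

```shell
# Print the Use% figure (digits only) for the filesystem that holds the given path.
use_percent() {
    # -P forces the POSIX single-line format, so long device names do not wrap.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

Running use_percent /u01 on each node before and after the change gives a quick comparison of the two states.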

References

<NOTE:1343105.1> - Oracle Cluster Health Monitor (CHM) using large amount of space (more than default)
<NOTE:1574492.1> - db_delete: BDB grown beyond user desired limits disabling loggerd
