Information

Title	How to reclaim space used by the graph/activity database

URL Name	cachedata-directory-clear-down-000079680

Article Number	000181523

Environment	Product: OpenEdge Management; Version: 11.3.x and later, 12.x; OS: All supported platforms; Other: Orient Database

Question/Problem Description

Reclaiming disk space used in the cachedata directory by the graph/activity database
What are the consequences of clearing down the cachedata directory?
How are the Orient databases used by OpenEdge Management re-created after being deleted?
How to re-create the (embedded) Orient databases to upgrade them to the current version
How to upgrade the configuration and activity databases used by OpenEdge Management
How to clear out a corrupt graph cache before restarting the AdminServer

What are these Orient databases used by OpenEdge Management for?

<oem dlc>\config\configdb\

<oem wrk>\cachedata\cachedb\
<oem wrk>\cachedata\fathomdatacache
<oem wrk>\graphcache\

Steps to Reproduce

Clarifying Information

Error Message	[STDERR] GRAVE: Exception `741EEABD` in storage `plocal:<path>/cachedata/cachedb`: 2.2.31 (build #, branch 2.2.x)
[STDERR] java.lang.OutOfMemoryError: Java heap space
[STDERR] at com.orientechnologies.orient.core.storage.impl.local.paginated.wal.OWALPageV2.getRecord(OWALPageV2.java:112)
[Fathom] * Fathom startup failed. (9661)
[UnexpectedError] * recorded as exception # in file ads0.exp.

Defect Number

Enhancement Number

Cause

Resolution

What are the Orient Databases used by OpenEdge Management

OpenEdge Management (OEM) has used Orient databases since OpenEdge 11.3. There are three reasons to re-create them:

The Orient version is upgraded in a newer OpenEdge version or later Service Pack, so the Orient databases need to be re-created
Over time, the OEM server disk fills up with gigabytes of .ocf, .cpm, .pcl, .sbt and .wal files
The databases are not properly closed when the server is rebooted or the AdminServer process is terminated before it has completed shutdown, which leads to slower AdminServer startup, high CPU and memory usage, and possibly Java out-of-memory issues
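Before deleting anything, it can help to see which file types account for the space. The following is a minimal sketch, not a Progress-supplied tool: the function name is hypothetical, and the extension list is simply the one named above. It only reads file sizes, so it is safe to run while the AdminServer is online.

```python
from collections import defaultdict
from pathlib import Path

# Extensions named in this article as the files that accumulate over time.
ORIENT_EXTENSIONS = {".ocf", ".cpm", ".pcl", ".sbt", ".wal"}


def usage_by_extension(directory):
    """Return {extension: total bytes} for Orient files under `directory`.

    Hypothetical helper: it walks the directory tree read-only and sums
    the size of every file whose extension is in ORIENT_EXTENSIONS.
    """
    totals = defaultdict(int)
    for path in Path(directory).rglob("*"):
        if path.is_file() and path.suffix.lower() in ORIENT_EXTENSIONS:
            totals[path.suffix.lower()] += path.stat().st_size
    return dict(totals)
```

Calling usage_by_extension on the cachedb or configdb directory returns byte totals per extension, which shows, for example, whether the .wal files dominate.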

i) OpenEdge Management configuration database:

<oem dlc>\config\configdb

ii) OpenEdge Management graph cache/activity database:

From 11.3: <oem wrk>\cachedata\fathomdatacache and <oem wrk>\graphcache
From 11.5: <oem wrk>\cachedata\fathomdatacache and <oem wrk>\cachedata\cachedb

In either case:

a) Delete the content of the folder, never the folder itself
b) Delete these (embedded) Orient database files only while the AdminServer is offline (proadsv -stop)
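Rule a) can be sketched as a small helper that empties a directory but leaves the directory itself in place. This is an illustrative sketch under stated assumptions: the function name and the dry_run flag are hypothetical, and the code does not check that the AdminServer is stopped, so rule b) remains the operator's responsibility.

```python
import shutil
from pathlib import Path


def clear_directory_contents(directory, dry_run=True):
    """Delete everything inside `directory` but keep the directory itself.

    Hypothetical helper. With dry_run=True (the default) nothing is deleted
    and the entries that would be removed are only listed.
    Run it only after the AdminServer has been stopped.
    """
    root = Path(directory)
    if not root.is_dir():
        raise NotADirectoryError(f"{root} is not an existing directory")
    entries = sorted(root.iterdir())  # snapshot before deleting anything
    for entry in entries:
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove a sub-directory and its files
        else:
            entry.unlink()  # remove a plain file or a symlink
    return entries
```

Running it once with the default dry_run=True shows what would be removed; passing dry_run=False performs the deletion.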

i) configuration database:

The configuration database is used for OpenEdge Management Resource configuration information.

It is rarely necessary to re-create this database. It is updated only when Resources are added or their properties changed for OpenEdge Explorer (OEE) or OEM, or when Jobs, Reports and so on are added or updated for OEM. It also keeps the current registered state of managed resources and other information important to Log File management/monitoring. Deleting and re-creating it:

Will not typically regain space unless there was an issue with the .wal files (akin to the OpenEdge database bi files)
Will load the default 'fathom.xml' configuration and a fresh configuration from the .properties files for OpenEdge Explorer and OpenEdge Management (unless the current configuration has previously been dumped, see below). This is particularly important when admserv messages report a failure such as the following:

OpenEdge Management configuration database update failed.

[STDERR] java.lang.RuntimeException: setProperty(FileName, <.properties; .log) in [hostname]:resource.openedge.* [persistent, ] failed

When the OpenEdge Management license is used, do not delete the 'configdb' without first running fathom -dump to dump the Orient configuration content to an XML file. Otherwise tailored configuration information for Jobs, Alerts and so on is lost: when the database is re-created, it holds only the default configuration plus the current content of the ubroker.properties and conmgr.properties files.

$ fathom -dump fathom.xml -httpport 9090 -user admin -password admin

Where:

'admin/admin' are the default credentials used to access the OE Console with the Administrator privilege. They may have been changed: the default user is admin, and the password is chosen during the first-use configuration
When the fathom.xml file is placed in the <oem dlc>\config directory, its content is used to rebuild the Configuration Database when the AdminServer is restarted.

To delete and re-create the configdb:

Stop the adminserver: $ proadsv -stop [ -keepservers ]
Delete the configdb files (not the folder): $ rm <oem dlc>\config\configdb\*
Ensure the dumped configuration is present (as required): <oem dlc>\config\fathom.xml
Restart the adminserver: $ proadsv -start
Confirm the fathom-plugin has started: $ fathom -query
When the fathom-plugin starts, the config database will get re-created:
[Fathom] Creating OpenEdge Management configuration database: <oee_install>\config\configdb. (17231)
[Fathom] Loading Fathom project file: <oem dlc>\config\fathom.xml (10185)
[Fathom] OpenEdge Management configuration database created. (17232)

ii) graph cache/activity database:

This database stores all the data from monitored resource polling. The cachedb is very fast at fetching and collating a large number of data points at the same time, which suits the graph/chart activity seen in the OE Console screens
The graph/activity database can be deleted to regain space.
After a java.lang.OutOfMemoryError condition, always reset the graph cache database before restarting the AdminServer, as it often becomes corrupted as a result of the Java OOM condition
The impact of deleting this database is that historic chart information will no longer be available.
The template will get re-created when the AdminServer is restarted
This database is rarely used by OpenEdge Explorer

To delete and re-create the cachedb:

Stop the adminserver: $ proadsv -stop -keepservers
Delete the cachedb files (not the folder): $ rm <oem wrk>\cachedata\cachedb\*
Restart the adminserver: $ proadsv -start
Confirm the fathom-plugin has started: $ fathom -query
When the fathom-plugin starts, the cache database will get re-created, using less space:
[Fathom-GRAPH] Creating OpenEdge Management configuration database: <oem_wrk>\cachedata\cachedb. (17231)
[Fathom-GRAPH] OpenEdge Management configuration database created. (17232)
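The re-creation can also be confirmed by scanning the AdminServer log for the two message numbers shown above. The sketch below assumes only what those log lines show: a "creating" message (17231) followed by a "created" message (17232), each prefixed with a tag such as [Fathom] or [Fathom-GRAPH]. The function name is hypothetical, and reading the lines from the log file is left to the caller.

```python
# Message numbers taken from the log lines quoted in this article.
CREATING_MSG = "(17231)"
CREATED_MSG = "(17232)"


def database_recreated(log_lines, tag):
    """Return True if `tag` logged 'creating' (17231) and then 'created' (17232).

    Hypothetical helper: `log_lines` is any iterable of log text lines and
    `tag` is the bracketed prefix, e.g. "[Fathom]" or "[Fathom-GRAPH]".
    """
    creating_seen = False
    for line in log_lines:
        if tag not in line:
            continue  # message belongs to another component
        if CREATING_MSG in line:
            creating_seen = True
        elif CREATED_MSG in line and creating_seen:
            return True
    return False
```

Use the tag "[Fathom]" for the configdb and "[Fathom-GRAPH]" for the cachedb; a False result after startup suggests the creation did not complete.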

Since OpenEdge 11.7.5 and 12.0, the cache database can be re-created with the AdminServer online:

This feature was introduced to resolve slow OE Console access time after deleting the cachedb.
The initcachedb option deletes the existing Orient cache database, disables the .wal file and then recreates the database.

Stop the fathom-plugin: $ fathom -stop
Re-create the cache database: $ fathom -initcachedb
Once the cachedb is created, start the fathom-plugin. OEM loads much faster because the cachedb already exists and does not need to be created: $ fathom -start

For further information, refer to the Article "High i/o on the cachedb".

How to configure the graph/activity database to use less space?

After deleting the current Graph Cache:

1. Revise and re-configure the retention period for graph cache depending on the Resource:

Only retain graph history for the resources whose graph data you want to see over a given time period. A longer graph cache sample retention period greatly increases the disk space used by the graph cache samples (particularly for Database resources). Larger graph caches may also noticeably slow the drawing of graphs, because more paging is needed to access the data.

Resources > Options > Graph Cache Database Configuration
Graph Cache Database Configure

If the "Sample time period to collect" needs to be changed to the same period for multiple resources, many resources of one Resource Type can be added to the "Selected" box in one go.

Example: Graph Cache Database Configure

Time period setting to apply to selected resources
Sample time period to collect: 8 hours

List resource of type: Database
for AdminServer: <Container name>

To change all monitored databases, select the <container name>.Database* and move it to the right hand 'selected' box
Apply to Selected

Done.
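To get a feel for how the sample time period affects the amount of data kept, the arithmetic can be sketched as below. This is a rough estimate under a stated assumption, namely that one sample is kept per poll for each graphed value; the article does not confirm the exact storage model. The function name is hypothetical, and the 300-second default is the usual poll interval of 5 minutes.

```python
def samples_retained(retention_hours, poll_interval_seconds=300):
    """Estimate samples kept per graphed value for a retention period.

    Assumption (not confirmed by the article): one sample is stored per poll.
    """
    if retention_hours <= 0 or poll_interval_seconds <= 0:
        raise ValueError("retention and poll interval must be positive")
    return int(retention_hours * 3600 // poll_interval_seconds)
```

Under that assumption, the 8-hour setting in the example keeps 96 samples per value at a 5-minute poll, against 288 for a 24-hour period.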

2. Revise and modify the Poll interval of monitored resources.

The defaults are usually POLL 300 (5 minutes) and Trend every 1 poll, which is more frequent than usually required, especially when monitoring many resources.
The right values depend on the resource. For example, the polling interval can be lengthened from 5 to 15 minutes for resources that do not need high-frequency polls.

[Resource] > Monitoring Plans > Schedule Plan > EDIT : Advanced Settings

To trend data with, say, one-hour granularity:

If the POLL is 30 minutes, the Trend would be every 2 polls
If the POLL is 10 minutes, the Trend would be every 6 polls
If the POLL is 5 minutes, the Trend would be every 12 polls
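The rule behind these values is that the trend period equals the poll interval multiplied by the number of polls. A minimal sketch of that calculation follows; the function and parameter names are hypothetical.

```python
def trend_every_n_polls(poll_interval_minutes, trend_period_minutes=60):
    """Return the 'Trend every N poll(s)' value for a desired trend period.

    Raises ValueError when the trend period is not a whole multiple of the
    poll interval, because a fractional poll count cannot be configured.
    """
    if poll_interval_minutes <= 0:
        raise ValueError("poll interval must be positive")
    polls, remainder = divmod(trend_period_minutes, poll_interval_minutes)
    if polls < 1 or remainder:
        raise ValueError("trend period must be a whole multiple of the poll interval")
    return polls
```

For example, trend_every_n_polls(10) gives 6, and a 12-hour trend period at a 10-minute poll, trend_every_n_polls(10, 720), gives 72.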

You don't have to do this for every resource. You can copy one resource's monitoring settings to some or all resources of the same type through:

OPTIONS > User Preferences > Distribute Resource Properties

Example:

Change it for one type: Database: Monitoring Plans : Schedule Plan EDIT : Advanced Settings :

Change for example:

Schedule: Default_Schedule
Polling Interval: 10 minutes
Alerts Enabled: true
Trend Performance Data: true <-- consider if this resource needs trending at all?
Trend values:
Trend Db_ActAPW every: 6 poll(s)
Trend Db_ActBuf every: 6 poll(s)
Trend Db_ActIOFile every: 6 poll(s)
Trend Db_ActIOType every: 6 poll(s)
Trend Db_ActIdx every: 6 poll(s)
Trend Db_ActLock every: 6 poll(s)
Trend Db_ActLog every: 6 poll(s)
Trend Db_ActRec every: 6 poll(s)
Trend Db_ActServ every: 6 poll(s)
Trend Db_ActSum every: 6 poll(s)
Trend Db_AreaStatus every: 6 poll(s)
Trend Db_Checkpoint every: 6 poll(s)
Trend Db_IndexStat every: 72 poll(s)
Trend Db_TableStat every: 72 poll(s)

Then you can migrate these to all of the same resources through:

OPTIONS > Distribute Resource Properties

The Selection Filters are for the resource (e.g. the database) you have just changed. Only tick "Remove and replace existing rules" if you have not set up rules that differ from, or add to, those you have just changed.

The Target Container can be the same container as the "source" (e.g. when you have several databases on the same container) or a remote container.

The "Target Resources" are then the resources you want to copy those new monitoring rules to.

3. Additional Considerations

When using OpenEdge 11.6, upgrade to OpenEdge 11.6.3 or later, where a known cachedb deadlocking issue was addressed with an OrientDB version upgrade. OpenEdge 11.6.4 and 11.7.3 are recommended because they improve the performance of deleting expired records. Both Orient databases may need to be deleted so that they are created fresh with the upgraded version when the AdminServer is next started. Further details are provided in Articles:

Tune fathom.init.params

These parameters apply to both the cachedb and configdb (one fathom.init.params is used)
Typically these only need to be considered for OpenEdge Management (OEM) environments which manage many resources (local and/or remote)

storage.record.lockTimeout=75000
storage.wal.maxSegmentSize=64
storage.wal.maxSize=512
storage.diskCache.bufferSize=512
storage.wal.syncOnPageFlush=false
storage.compressionMethod=gzip
fathomWorkQueueSize=400
fathomWorkQueueThreshold=15 [== never more than 20]
fathomWorkThreadMax=36
fathomPriorityThreadMax=36
environment.concurrency.level=8
class.minimumClusters=1
memory.chunk.size=64

storage.diskCache.bufferSize - memory size allocated specifically for the database cache. Ensure enough memory is available so that increasing this does not lead to swapping
storage.wal.syncOnPageFlush - when false, disables the OS-level sync when a page is flushed, to improve write throughput

Do not disable the journal (WAL) (storage.useWAL=false). OpenEdge uses orient transactions. A product redesign would be required to allow transactions to be disabled as well (fathom.orientdb.disable.transaction=true)

Disabling WAL could cause database integrity issues with the cachedb and configdb if the AdminServer crashes for any reason. This could prevent the AdminServer from starting and would require recreating these databases.
This would be akin to starting an OpenEdge database with no-integrity (-i).
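A quick check for these two settings can be scripted. The sketch below assumes that fathom.init.params holds one key=value pair per line, as in the listing above, and that lines starting with # are comments; both are assumptions about the file format rather than documented facts, and the function names are hypothetical.

```python
# Settings this article warns against, with the value that disables the safeguard.
RISKY_SETTINGS = {
    "storage.useWAL": "false",
    "fathom.orientdb.disable.transaction": "true",
}


def parse_params(text):
    """Parse key=value lines into a dict (assumed file format)."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, assumed comments and malformed lines
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def risky_settings(params):
    """Return the sorted keys in `params` that disable WAL or transactions."""
    return sorted(
        key for key, bad in RISKY_SETTINGS.items()
        if params.get(key, "").lower() == bad
    )
```

An empty list from risky_settings means neither safeguard has been switched off in the parsed text.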

Workaround

Notes

Progress Article:

How to manage the size of the FathomTrend Database

Keyword Phrase

Last Modified Date	6/25/2024 1:38 PM
