Information

Title	High i/o on the cachedb

URL Name	cachedb-excessive-i-o-000094727

Article Number	000111203

Environment	Product: OpenEdge
Version: 11.7.3, 11.7.4
OS: All supported platforms

Question/Problem Description

After the OpenEdge Management (OEM) cachedb is deleted, OEM is slow to start and shows high disk I/O on the cachedb.
Once the AdminServer and fathom-plugin have started, OE Console access, login, and navigation are slow.

Steps to Reproduce

Clarifying Information

Disk write I/O on the cachedb .wal files is highest when OEM starts.

High I/O on the cachedb leads to longer startup times when the fathom-plugin restarts, because of crash recovery from the write-ahead log (WAL), similar to crash recovery from the BI file of an OpenEdge database.

While OEM is running, the cachedb writes 4,200,000 blocks of 4 KB in 1 hour (about 16 GB per hour); the rate then drops in the following hours.
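The hourly figure can be checked with a few lines of arithmetic. This is only a sanity check of the numbers quoted in this article; it assumes "4 k" means 4 KiB blocks, and the variable names are illustrative.

```python
# Back-of-the-envelope check of the hourly write volume quoted in the article.
BLOCKS_PER_HOUR = 4_200_000   # figure from the article
BLOCK_SIZE_KIB = 4            # assumption: "4 k" means 4 KiB per block

kib_per_hour = BLOCKS_PER_HOUR * BLOCK_SIZE_KIB
gib_per_hour = kib_per_hour / 1024**2          # KiB -> GiB
mib_per_second = kib_per_hour / 1024 / 3600    # average sustained rate

print(f"{gib_per_hour:.1f} GiB/hour, {mib_per_second:.1f} MiB/s average")
# prints: 16.0 GiB/hour, 4.6 MiB/s average
```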

Disk I/O to the cachedb bursts during polls, as much as 24 MB over 10 seconds:
Disks:     % tm_act     Kbps      tps    Kb_read   Kb_wrtn
hdisk1          2.0   2468.0     43.3          0     24680
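The iostat sample is internally consistent, which a short calculation shows. This sketch assumes the Kbps column reports kilobytes per second and that the sample covers the 10-second interval mentioned above; the variable names are illustrative.

```python
# Cross-check the iostat sample for hdisk1 quoted in the article.
KB_WRITTEN = 24680        # Kb_wrtn column
INTERVAL_SECONDS = 10     # sampling interval stated in the article
KBPS_REPORTED = 2468.0    # Kbps column (assumed to mean KB per second)
TPS_REPORTED = 43.3       # tps column (transfers per second)

kb_per_second = KB_WRITTEN / INTERVAL_SECONDS
mb_written = KB_WRITTEN / 1024
avg_kb_per_transfer = kb_per_second / TPS_REPORTED

print(f"{kb_per_second:.1f} KB/s, {mb_written:.1f} MB in total, "
      f"{avg_kb_per_transfer:.0f} KB per transfer on average")
```

The computed rate matches the reported Kbps value, so the burst is almost entirely write traffic.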

Many storage areas are spread across these databases, typically 84 storage areas and 712 extents.

Error Message

Defect Number	Enhancement ADAS-10709, ADAS-11178

Enhancement Number

Cause

OEM was tested for scalability, but never for disk I/O impact.

In OpenEdge 11.7.3:

The OrientDB activity database (cachedb) was changed from schema-less mode to full schema mode to gain performance and storage space benefits.
At the time, a reduction of about one third to one half in the space used by the activity database was observed (it varied with how the testing was done).
What was not considered at the time is that schema creation is very expensive and ends up writing ~700 MB to disk, most of which is the .wal file tracking schema changes.

The disk write overhead during polling was greatly overshadowed by the amount of disk I/O during creation of the cache database, which lengthens OEM startup and slows OE Console navigation.

Creating the cache database, which is typically about 40 MB once created, incurs disk writes in the realm of 700+ MB.
During cachedb creation, as each schema item is created and the database schema grows, it takes more disk writes to create each additional schema object.
This is because the embedded OrientDB writes the full schema to disk on each change, rather than only the item that changed.
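The effect of rewriting the full schema on every change can be illustrated with a toy cost model. This is a simplified sketch, not a description of OrientDB internals: the function name, the object count, and the per-object size are hypothetical values chosen only to show how the totals grow.

```python
def total_schema_writes(num_objects, bytes_per_object, full_rewrite):
    """Total bytes written while creating num_objects schema objects.

    full_rewrite=True  : every change rewrites the whole schema so far.
    full_rewrite=False : every change writes only the new object.
    """
    total = 0
    schema_size = 0
    for _ in range(num_objects):
        schema_size += bytes_per_object
        total += schema_size if full_rewrite else bytes_per_object
    return total

# Hypothetical numbers for illustration only.
NUM_OBJECTS = 1000
BYTES_PER_OBJECT = 1024

full = total_schema_writes(NUM_OBJECTS, BYTES_PER_OBJECT, full_rewrite=True)
incremental = total_schema_writes(NUM_OBJECTS, BYTES_PER_OBJECT, full_rewrite=False)
print(f"full rewrite: {full:,} bytes, incremental: {incremental:,} bytes")
```

In this model the full-rewrite total grows with the square of the number of schema objects, while the incremental total grows linearly. That is the same shape as a small final database combined with a much larger volume of writes during creation.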

Resolution

Upgrade to OpenEdge 11.7.5 where:

ADAS-10709 enhanced IO performance of the OpenEdge Management graph cache:

1. Record sizes were reduced, which reduces the amount of data that needs to be written to disk. Over time this is expected to have an impact on the on-disk size of the cachedb.
2. Orient transactions are used for all database write operations, which reduced the size of writes to disk during polling by about 50%.
3. When OEM performs polling, it retrieves data from the file, field, index, area, and areaextent VST tables.

Previously, on each poll the old data was deleted from the graph cache and the new records were written. This overhead is unnecessary, because the values in these tables rarely or never change.
A cache was put in place and checks were added to compare old and new records, so that old VST information is not deleted and rewritten if it has not changed since the last poll interval (default 5 minutes).
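The compare-before-write idea can be sketched as follows. This is a minimal illustration of the general technique, not the OEM implementation: the function `apply_poll`, the record keys, and the sample area records are all hypothetical.

```python
def apply_poll(cache, polled):
    """Sync cache with the latest poll; return (written, deleted) keys.

    Only records that are new or changed are written, and only records
    that vanished from the poll are deleted. Unchanged records are skipped.
    """
    written = [key for key, rec in polled.items()
               if key not in cache or cache[key] != rec]
    deleted = [key for key in cache if key not in polled]
    for key in deleted:
        del cache[key]
    for key in written:
        cache[key] = polled[key]
    return written, deleted

# Hypothetical storage-area records keyed by (kind, number).
cache = {}
poll_1 = {("area", 6): {"name": "Schema Area", "extents": 1},
          ("area", 7): {"name": "Data", "extents": 4}}
first = apply_poll(cache, poll_1)            # both records are new
second = apply_poll(cache, dict(poll_1))     # nothing changed, nothing written
poll_3 = {("area", 6): {"name": "Schema Area", "extents": 1},
          ("area", 7): {"name": "Data", "extents": 5}}
third = apply_poll(cache, poll_3)            # only area 7 changed
print(first, second, third)
```

With a delete-and-rewrite approach, every poll would write all records. Here the second poll writes nothing and the third writes a single record.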

YourKit was used to evaluate these changes by recording disk write events on a database with 100 each of tables, fields, indexes, areas, and extents. Disk writes were reduced to about 25% of the original values (results may vary).

ADAS-11178 added fast creation of the cache database.

A new option, -initcachedb, deletes the existing activity database, disables the .wal file, and then recreates the database. Turning off the .wal file significantly reduces the disk write I/O overhead:

$ fathom -stop
$ fathom -initcachedb

OEM then loads much faster, since the activity database already exists and does not need to be created:
$ fathom -start

Workaround

Until the upgrade to the OpenEdge 11.7.5 Service Pack, the workaround is twofold:

1. Don't delete the cachedb

2. When shutting down the AdminServer, first stop the fathom-plugin to ensure that the cachedb (and configdb) are closed properly.
This minimises the amount of WAL I/O (much like BI recovery on OE databases) when the fathom-plugin is restarted.

$ fathom -stop
$ proadsv -stop

Additionally consider:

Review the number of events configured for monitoring.
Review and adjust the poll interval of monitored resources.
Review and reconfigure the retention period for the graph cache, depending on the resource.
For more detailed instruction on the above, refer to Articles:

Notes

Keyword Phrase

Last Modified Date	11/20/2020 7:03 AM