
High I/O on the cachedb


Information

Title: High I/O on the cachedb
URL Name: cachedb-excessive-i-o-000094727
Article Number: 000111203
Environment:
Product: OpenEdge
Version: 11.7.3, 11.7.4
OS: All supported platforms
Question/Problem Description
After the OEM cachedb is deleted, OEM is slow to start and shows high disk I/O on the cachedb.
Once the AdminServer and the fathom plugin have started, access to the OE Console (login and navigation) is slow.
Clarifying Information
Disk write I/O on the cachedb .wal files is highest when OEM starts.

High I/O on the cachedb also lengthens startup when the fathom plugin restarts, because WAL crash recovery has to run (comparable to crash recovery from the BI file of an OpenEdge database).

While OEM is running, the cachedb writes 4,200,000 blocks of 4 KB in the first hour (about 16 GB/hour). The write rate then drops over the following hours.
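The hourly figure can be checked with plain shell arithmetic. This sketch assumes that 4 KB means 4,096 bytes (binary units), and it rounds down at each step:

```shell
#!/bin/sh
# Convert the observed block count into an hourly write volume.
# Assumption: one block = 4 KB = 4096 bytes; integer division rounds down.
blocks=4200000
block_kb=4

total_kb=$((blocks * block_kb))   # KB written in the hour
total_mb=$((total_kb / 1024))     # whole MB
total_gb=$((total_mb / 1024))     # whole GB

echo "${total_kb} KB/hour = ${total_mb} MB/hour, about ${total_gb} GB/hour"
```

The result is 16,800,000 KB, or about 16 GB, which matches the hourly rate quoted above.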

Disk I/O to the cachedb bursts during polls, reaching as much as 24 MB written in 10 seconds:
Disks:   % tm_act   Kbps     tps    Kb_read   Kb_wrtn
hdisk1   2.0        2468.0   43.3   0         24680
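The Kbps column is consistent with the write total. Writing 24,680 KB over a 10-second interval is 2,468 KB/s, or about 24 MB in total. The minimal sketch below derives both figures from the sample line. The function name summarize is hypothetical, and the 10-second interval comes from the text above rather than from the sample itself:

```shell
#!/bin/sh
# Summarize one disk statistics line: the volume written and the write rate.
# Field order: disk, % tm_act, Kbps, tps, Kb_read, Kb_wrtn
summarize() {
    echo "$1" | awk -v secs="$2" '{
        printf "%s: %.1f MB written in %d s (%.0f KB/s)\n", $1, $6 / 1024, secs, $6 / secs
    }'
}

# Sample line from above; the 10 s interval is the one stated in the article.
summarize "hdisk1 2.0 2468.0 43.3 0 24680" 10
```

Run as shown, it prints `hdisk1: 24.1 MB written in 10 s (2468 KB/s)`.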


Many storage areas are spread across the monitored databases. A typical figure is 84 storage areas with 712 extents.
Defect Number: Enhancement ADAS-10709, ADAS-11178
Cause
OEM had been tested for scalability, but never for its disk I/O impact.

In OpenEdge 11.7.3:
  • The OrientDB activity database (cachedb) was switched from schema-less mode to full schema mode, to improve performance and reduce storage space.
  • At the time, this cut the space used by the activity database by about one third to one half (the exact figure varied with how the testing was done).
  • What was not considered at the time is that schema creation is very expensive: it writes about 700 MB to disk, most of it to the .wal file that tracks schema changes.
The disk write overhead during polling is small compared with the disk I/O generated when the cache database is created. It is this creation I/O that lengthens OEM startup and slows OE Console navigation:
  • Although the cache database is typically only about 40 MB once created, creating it writes more than 700 MB to disk.
  • During cachedb creation, the schema grows with each item created, so each additional schema object costs more disk writes than the one before it.
  • This happens because the embedded OrientDB writes the full schema to disk on every change, rather than writing only the item that changed.
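A simplified model helps explain why creation writes so much more data than the final database holds. Assume, for illustration only, that each of the n schema objects adds a roughly constant amount b to the schema, and that the full schema is rewritten on every change. The k-th change then writes about k·b, so the total grows quadratically with n:

```latex
W(n) \approx \sum_{k=1}^{n} k\,b \;=\; b\,\frac{n(n+1)}{2}
```

Under this model, doubling the number of schema objects roughly quadruples the writes. The article's own figures show the scale of the effect: writing more than 700 MB to produce a database of about 40 MB is a ratio of at least 17.5 to 1.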
Resolution
Upgrade to OpenEdge 11.7.5, which includes the following enhancements:

ADAS-10709 improved the I/O performance of the OpenEdge Management graph cache:

1. Record sizes were reduced, so less data has to be written to disk. Over time, this is also expected to reduce the on-disk size of the cachedb.
2. OrientDB transactions are used for all database write operations, which cut the volume of disk writes during polling by about 50%.
3. When OEM polls, it retrieves data from the file, field, index, area, and areaextent VST tables.
  • Previously, on each poll the old data was deleted from the graph cache and the new records were written. This overhead is unnecessary, because the values in these fields rarely or never change.
  • A cache was added, along with checks that compare old and new records. VST information is no longer deleted and rewritten when it has not changed since the last poll interval (default 5 minutes).
YourKit was used to evaluate these changes by recording disk write events on a database with 100 each of tables, fields, indexes, areas, and extents. Disk writes fell to about 25% of their original values (your results may vary).
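The compare-before-write check in item 3 can be illustrated with a small shell sketch. This is only an analogy: the snapshot files and the refresh_snapshot function are hypothetical, and OEM applies the check to records in its OrientDB graph cache, not to files:

```shell
#!/bin/sh
# Hypothetical analogy for compare-before-write: the freshly polled snapshot
# replaces the cached one only when its contents differ. OEM's real cache is
# an OrientDB database, not flat files.
refresh_snapshot() {
    cached="$1"    # snapshot kept from the previous poll
    polled="$2"    # snapshot produced by the current poll
    if [ -f "$cached" ] && cmp -s "$cached" "$polled"; then
        rm -f "$polled"              # unchanged: discard, no rewrite needed
        echo "unchanged"
    else
        mv -f "$polled" "$cached"    # new or changed: replace the cached copy
        echo "rewritten"
    fi
}
```

When nothing has changed, the only I/O is the read done by cmp. This mirrors the intent of the enhancement, which avoids a delete-and-rewrite cycle on every poll.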


ADAS-11178 added fast creation of the cache database.

A new fathom option, -initcachedb, deletes the existing activity database, disables the .wal file, and then recreates the database. Turning off the .wal file significantly reduces disk write I/O:

$ fathom -stop
$ fathom -initcachedb


OEM then loads much faster, because the activity database already exists and does not need to be created:
$ fathom -start
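The three commands above can also be wrapped in a small shell function that stops at the first failing step. This is a sketch with stated assumptions: rebuild_cachedb is a hypothetical name, fathom is assumed to be on PATH (for example in a proenv shell), and fathom is assumed to return a non-zero exit status when a step fails, which the article does not state:

```shell
#!/bin/sh
# Sketch: rebuild the cachedb with the 11.7.5 -initcachedb option.
# Assumes fathom is on PATH and exits non-zero on failure (not confirmed).
rebuild_cachedb() {
    fathom -stop || return 1          # the plugin must be down first
    fathom -initcachedb || return 1   # recreate the activity database
    fathom -start                     # start OEM with the fresh cachedb
}
```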
Workaround
Until you upgrade to the OpenEdge 11.7.5 Service Pack, use this two-part workaround:

1. Do not delete the cachedb.

2. Before shutting down the AdminServer, stop the fathom plugin so that the cachedb (and configdb) are closed cleanly.
This minimises WAL recovery I/O (similar to BI recovery on OpenEdge databases) when the fathom plugin is restarted.
$ fathom -stop
$ proadsv -stop
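The shutdown order in step 2 can be enforced with a small function. This is a sketch: stop_oem_cleanly is a hypothetical name, and it assumes fathom and proadsv are on PATH and return a non-zero exit status on failure, which the article does not state:

```shell
#!/bin/sh
# Sketch of the step 2 shutdown order: close the cachedb and configdb by
# stopping the fathom plugin, and only then stop the AdminServer.
# Assumes both commands are on PATH and exit non-zero on failure.
stop_oem_cleanly() {
    if ! fathom -stop; then
        echo "fathom -stop failed; not stopping the AdminServer" >&2
        return 1
    fi
    proadsv -stop
}
```

Leaving the AdminServer running after a failed fathom -stop is a choice made in this sketch. It avoids an unclean close of the cachedb and leaves the situation open for investigation.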

Additionally, consider the following:
  1. Review the number of events configured for monitoring.
  2. Review and adjust the poll interval of monitored resources.
  3. Review and reconfigure the graph cache retention period for each resource.
  4. For more detailed instruction on the above, refer to Articles:
Last Modified Date: 11/20/2020 7:03 AM
