Does loading small records first still affect fragmentation in the Type II Storage Area architecture ?

« Go Back

Information

Title	Does loading small records first still affect fragmentation in the Type II Storage Area architecture ?

URL Name	P91250

Article Number	000141401

Environment	Product: OpenEdge Version: 10.x, 11.x OS: All supported platforms

Question/Problem Description

Does loading small records first still affect fragmentation in the Type II Storage Area architecture ?
How is fragmentation of records improved in TypeII Storage Areas?

Steps to Reproduce

Clarifying Information

Error Message

Defect Number

Enhancement Number

Cause

Resolution

When optimizing a database, fragmentation and therefore performance is affected by the order of load and how data are being loaded.

For Type I Storage Areas, this still holds true, table fragmentation (and therefore i/o performance when retrieving data) is assisted greatly by the order in which tables are loaded - it is perferable to load the smallest records before large records in order to ensure minimum fragmentation and maximum contiguous compaction (low scatter).

Whether loading into a Type I or Type II Storage Area, there should still only ever be one load operation per Storage Area at any time so as not to introduce fragmentation during the load. However, time-line considerations outweigh this strict "rule" and particularly for data of application-associated tables or static tables that were initially carefully considered during the dump phase ito record sorting (aka: Logical dump). Additionally, the "fragmentation" in the Type II case, will always be at a higher granularity to Type I Storage Areas (see example below).

When loading into an Type II Storage Area, it doesn't really matter if the tables with large records are loaded first, because Blocks in a Cluster are: contiguous and belong to the same object. “Database objects” can be in one or more clusters, but a cluster can only have one database object assigned to it. Unlike Type I Storage Areas, if there's 'spare space' in the blocks in the cluster, this space will not be filled by smaller records from another database object. Again, advice is one load per Storage Area - otherwise loading more than one load per Storage Area, will introduce the "fragmentation" on disk as a scatter of clusters.

Consider the following example:

IF table A is a table with big records and tables B and C are tables with small records - and they all belong to the same Type II Storage Area 1 (SA1):

IF these tables are loaded all at the same time .. the “fragmentation” is at cluster level. A full read of TABLE A, because it's spread over more than one cluster - we have to go to other clusters to read the full table content, there is still some need to reposition on the disk but it is dramatically reduced compared to what one might get if these tables had been loaded concurrently into a Type I Storage Area.

IF the tables are loaded all at once:

SA1 CL 0 Area root cluster
CL 1 TABLE A
CL 2 TABLE B
CL 3 TABLE C
CL 4 TABLE A
CL 5 TABLE B
CL 6 TABLE A

If the tables are loaded one at a time - big records first:

SA1 CL 0 Area root cluster
CL 1 TABLE A
CL 2 TABLE A
CL 3 TABLE A
CL 4 TABLE B
CL 5 TABLE B
CL 6 TABLE C

If the tables are loaded one at a time - small records first:

SA1 CL 0 Area root cluster
CL 1 TABLE C
CL 2 TABLE B
CL 3 TABLE B
CL 4 TABLE A
CL 5 TABLE A
CL 6 TABLE A

As with any optimization, it is critical to understand the nature of the data, the expected pattern of expansion for that data and the variability of that data.
The example above is over simplified - typically it wouldn't be advisable to have a large table with small records in the same storage area as a small table with large records because they would not have the same optimal "records per block".

Workaround

Notes

References to Other Documentation:

OpenEdge Data Management: Database Design, Chapter 1: "The OpenEdge RDBMS"
Open Edge Getting Started: Database Essentials, 3. OpenEdge RDBMS

Progress Article:
Guidelines for Type II Area Blocks per Cluster setting?
How is scatter factor calculated by DBANALYS

Keyword Phrase

Last Modified Date	11/20/2020 7:30 AM