What factors contribute to the variance in performance of an ABL query when a different index is used?

« Go Back

Information

Title	What factors contribute to the variance in performance of an ABL query when a different index is used?

URL Name	000033289

Article Number	000168551

Environment	Product: Progress Version: 9.x Product: OpenEdge Version: All supported versions OS: All supported platforms

Question/Problem Description

What factors contribute to the variance in performance of an ABL query when a different index is used?
Why the same result set returns faster when an ABL query is forced to use a different index with the USE-INDEX option
What could cause the same result set to be returned faster when a different index is used?
What factors contribute to the variance in the query performance?

Steps to Reproduce

Clarifying Information

Error Message

Defect Number

Enhancement Number

Cause

Resolution

Factors contribute to the variance in the query performance:

1. Index Bracketing

Depending on which index is used in a query, result list retrieval times can vary. The difference between worst case and best case retrieval times can be about 3 orders of magnitude, or more. For tables with only a few tens or hundreds of rows, these differences are generally not important.

A query that retrieves the same rows using a different index can take significantly more time. Each index will usually give a different retrieval time. Each index imposes some ordering on the rows, based on the key values. But there is only one storage order where the records are saved in the database. When the index order and the storage order match and the rows are reasonably dense, then good read performance occurs for queries that use the "good" index. For further information refer to Article:

Progress Indexing and Bracketing Rules

2. Storage Order

When using Type I Storage Areas (and no one should), rows from different tables are allocated space from the same storage pool and there is only one rm chain for the entire area. As a result, data blocks will usually have rows from several tables (unless the database has a table per area structure). Over time the storage order for the rows will become fairly randomized as table's records get updated.

In Type II Storage Areas, each table gets its own storage, built up from allocation clusters. There is no mixing of rows from different tables and tables interfere (compete) with each other much less. So the storage order tends to stay much more organized.

Starting with OpenEdge 11.0, if the table is in a Type II Storage Area, the TABLE-SCAN phrase can be used for better performance. However, when the physical storage order of the records created in a table closely match the ordering given by the index that would otherwise be used at compile-time, then the TABLE-SCAN option may not result in better performance. An example is provided in Article:

What is the benefit of using the TABLE-SCAN option?

3. Record Fragmentation

Row fragmentation affects retrieval time. When rows become fragmented, extra random I/O is needed to collect the fragments and construct the full row. This disrupts sequential I/O and also increases the total number of I/O operations needed. For further information refer to Articles:

4. Filesystem

Randomization at the filesystem level also disrupts sequential disk access.

5. Dump and Load re-ordering

A dump and load of a table changes the storage order:

Usually the dump of rows uses an index unless the data are dumped by RECID or with tablescan (-index 0), so the data in the dump file will be in that index order.
When loading that data into a Type II Storage Area or a new Type I area, the storage order will (mostly) match the index order and queries that use that index will be fast(er) when data on disk in the logical order that this data are usually queried by (logical ordering).

Example: sports2000 database:

The Customer Table has the CustNum as the primary index.
Customer data are usually read in by the Customer Name.
By dumping out the customer table using the Name index ordering as opposed to the CustNum ordering, gives better reads from disk as the records are grouped by alphabetical name instead of CustNum numbered order.
To visualize the resulting disk orientation of records when the storage order in index order after the Customer table is dumped and loaded using Name ordering:
- The sequential progression of record numbers shows that multiple records exist within the same block. When more records are retrieved, the records found in database blocks that are adjacent to each other.
- The non-sequential progression of record numbers shows that records are randomly ordered. The query accesses each record in the order described by the index which means that retrieval will move back and forth around the disk. The traversal time between non-contiguous blocks is necessary but the time the disk is not actively reading/writing is inefficient. It is necessary because the index selected due to the usage of the USE-INDEX option is likely needed to enforce a particular order for reporting purposes, however it is inefficient to read the data from disk in this order.

OUTPUT TO custnum.txt.
FOR EACH customer NO-LOCK USE-INDEX custnum.
    PUT UNFORMATTED RECID(customer) SKIP.
END.
OUTPUT CLOSE.

sample of the output in custnum.txt:

Before D&L: 97 ; 98 ; 99 ; 100 ; 101 ; 102 ; 103 ; 104 ; 105 ; 106 ; 107 ; 108 ; 109 ; 110 ; 111 ; 112 ...
After D&L: 789 ; 1322 ; 659 ; 580 ; 834 ; 525 ; 122 ; 562 ; 933 ; 719 ; 726 ; 1262 ; 282 ; 915 ; 658 ; 1285 ...

OUTPUT TO custname.txt.
FOR EACH customer NO-LOCK USE-INDEX NAME.
    PUT UNFORMATTED RECID(customer) SKIP.
END.
OUTPUT CLOSE.

sample of the output in custname.txt:

Before D&L: 1269 ; 1313 ; 1232 ; 377 ; 458 ; 1101 ; 1122 ; 443 ; 613 ; 139 ; 202 ; 516 ; 537 ; 236 ; 667 ; 1065 ...
After D&L: 97 ; 98 ; 99 ; 100 ; 101 ; 102 ; 103 ; 104 ; 105 ; 106 ; 107 ; 108 ; 109 ; 110 ; 111 ; 112 ...

OUTPUT TO custnameU.txt.
FOR EACH customer NO-LOCK USE-INDEX NAME WHERE NAME BEGINS "U":
PUT UNFORMATTED RECID(customer) " ; ".
END.

There are 10 customer records that satisfy this query in sports2000, the Cust_Data area has 32 records per block:

Before D&L: 179 ; 177 ; 821 ; 141 ; 530 ; 503 ; 876 ; 635 ; 842 ; 98 == 9 database blocks i/o
After D&L: 1313 ; 1314 ; 1315 ; 1316 ; 1317 ; 1318 ; 1319 ; 1320 ; 1321 ; 1322 == 1 database block i/o

Other factors that may contribute to the efficiency of an index include:

1. Index compression

All indexes on a table have the same number of entries, but they are not all the same size. The more duplicate key values there are, or the more keys with common leading bytes (i.e. not duplicate keys but almost duplicate), the better the index will compress. The more it compresses, the higher the likelihood that a level will be eliminated. When a level is eliminated this leads to a significant reduction in i/o.

2. The density of the index

When an index is packed into as few index blocks as possible, the number of i/o operations required to scan an index or part of it is reduced.

When to use IDXCOMPACT instead of IDXBUILD?

3. The presence of index placeholder entries

An index that has many placeholders after delete operations, will be less dense than one that has no placeholders and the i/o required for an index scan is increased.

Why deleting records in a table leads to index fragmentation ?

4. Key value size

The larger the key values are, the fewer index entries will fit into an index leaf block. The fewer index entries per block, the more I/O operations will be required to scan the index.

Workaround

Notes

Keyword Phrase

Last Modified Date	11/20/2020 7:02 AM