Salesforce

How to calculate the best number of records per block for Storage Areas

« Go Back

Information

 
TitleHow to calculate the best number of records per block for Storage Areas
URL NameP7561
Article Number000148910
EnvironmentProduct: OpenEdge
Version: All supported versions
OS: All supported platforms
Question/Problem Description
How to calculate the best number of records per block for Storage Areas ?
How to determine the optimal record per block for tables in a Specific Area ?
What Storage Area size for the new records per block ?
Why does the Storage Area take up more space than a DBANALYS reports?
Steps to Reproduce
Clarifying Information
Error Message
Defect Number
Enhancement Number
Cause
Resolution
The discussion below is simplified. The intent is to clarify the considerations and consequences of the record per block setting in:
a. Deciding the best records per block for a Storage Area structure
b. Estimating the minimum Storage Area size for the new Storage Area structure
c. Understanding why the size reported in the tabanalys report is not the same as the size of the area on disk.

a. Deciding the best records per block for a Storage Area structure

To determine the best database block size and records per block (record slots per block) for Storage Areas, first determine which are important tables for your database and application.Usually tables which are very large and used frequently should be considered first. 

Information from a recent DBANALYS report needs to be analysed for those tables, specifically, the tabanalys section.  For discussion let's say that "Employee" is a critical table. It satisfies standard criteria that it is big (both in record count and byte count). In the example below, we see that the average record size is 115 bytes.
 
RECORD BLOCK SUMMARY FOR AREA "DATA_AREA" : 7
-------------------------------------------------------
                                  -Record Size (B)-------Fragments---Scatter
Table            Records    Size   Min   Max  Mean      Count Factor  Factor
PUB.Employee    60309414    5.0G    27  9928   115   60310824    1.0     4.3

The possible database block sizes Progress OpenEdge databases can use are: 1, 2, 4 and 8 KB.
Most commonly a database block size which is equal to or a multiple of the file system block size is chosen.

Defining each Storage Area's records per block will always a balancing act between I/O performance (filling the block and not fragmenting records) and wasted rowids, even for Type II Storage Areas.

For example:
Let's say the database block size is 4K (or 4096 bytes). Progress must use a portion of that 4K for internal referencing in the block header so the usable space for data is typically 100 bytes less than the block size.  The dbanalys/tabanalys report already includes record overhead and entry size, however there are pro and counter arguments for additionally subtracting the Create Limit (150 bytes) in this consideration. For the purposes of this Article, we'll use the 100 bytes ballpark.
How many 115 byte records could fit in 3996? Answer: 34. (database blocksize / mean record size)

The valid number of records per block values for a Progress OpenEdge Progress are: 1, 2, 4, 8, 16, 32, 64, 128, or 256.
So which records per block value should be chosen, 32 or 64?

If 32 records per block are defined:
  • There would be 32 records each 115 bytes in a 4k block.
  • That would utilize 3680 bytes (+ 100 for the block header information).
  • While 316 bytes would be wasted in each database block and no record slots would be unused.
  • Assuming these records don't grow, this would mean that ~581,597 KB of this Storage Area are unused:  (60309414 records / 32 rpb x 316 bytes)
If 64 records per block are defined:
  • There would be 34 records each 115 bytes in a 4k block.
  • That would utilize 3910 bytes (+ 100 for our header information).
  • Only 86 bytes would be wasted and 30 record slots would be unused.
  • Assuming these records don't grow, this would mean that ~148,972 KB of this Storage Area are unused (60309414 records / 34 rpb x 86 bytes)
When defining the number of records per block we are stipulating that within each database block a certain range of record identifiers are assigned (RECIDS | ROWIDS).
These record numbers are assigned explicitly to a block when the Storage Area extents are created with PROSTRCT commands.
The Record Manager controls record placement based on the number of records per block and the available space within that block following specific algorithms and assigns the record identifier accordingly on record creation.

Example: 32 records per block
 
Block 5 = contains records 129 to 160
Block 6 = contains records 161 to 192
 
If all the space in Block 5 is used by 1 record then 31 additional records can't be placed into this block because while there are still record slots available, there is no space remaining in the block. In the above example, the maximum record size is 9928 bytes, which would mean that this record would be fragmented across 3 database blocks: 
  • Block 1: using 1, wasting 31 rowids +
  • Block 2: using 1 for the record fragment, wasting 31 rowids, +
  • Block 3: assuming average record size populates the remaining 2060 bytes in the 3rd block, 18 of the remaining 31 rowids would be used, wasting 13 rowids. (total 75 rowids) 
Let's change the example and say our database block size is 8 KB:
 
8192 - 100 bytes (our header) = 8092
8092 / 115 = 70 records.

The area structure could be configured for 64 or 128 records per block:
  • By defining 128 records per block, each block would hold 70 records, wasting 58 record slots that could not be used and leave 42 bytes remaining (~19,325 KB for the area).
  • By defining 64 rpb, while no record slots would be unused, 732 bytes of each database block would be unused (~673,622 KB for the area).
The extra 6 records afforded by defining 128 rpb instead of 64 rpb would mean:
  • 80,773 less database blocks would have to be read from disk into shared memory for a full table scan (942,357 vs 864,583)
  • 19 MB  vs 658 MB of unused space
 
While it may be inferred from the above discussion that wasting rowids is preferable to wasting space by having better data compaction, there is a limit of 2 billion records per area to consider. At the point in time when approaching 2 billion records in a given area, proactive measures need to be taken to avoid hitting this limit. For further discussion, refer to Article When does the 2 billion rowid limit per Storage Area apply?     
 
b. Estimating the minimum Storage Area size for the new Storage Area structure

Having decided on the best rpb setting for the table, the next logical question is:
How much space will be needed to load this data into the new storage area structure?

The minimum Storage Area size can be estimated by:
a. Take the lesser value of: 
The rbp value decided vs the more likely value of database blocksize / mean record size.
Let's assign the lesser value = R
b. Take the number of records for the table and divide by R
This gives you the number of database blocks that will be used
c. Multiply by the database blocksize to give you a rough estimate of the area size for data.
 
If the table has many records and there have over the history of this table been field schema updates to this table, where the 'historic' records haven't since been touched, the actual size of the loaded data could be a lot larger than this calculation as all records when loaded are bought up to the current schema definition. In other words, a tabanalys report after the load will show that the min, mean and average record sizes are different to the report taken prior to the load.

In our Employee example:
An 8 KB database blocksize with 128 records per block have been configured for the area structure.
There are 60309414 Employee records, R = 70 records 
60309414 \ 70 = 861563 database blocks x 8 =  6.57 GB estimated area space to load this data.

For indexes it's slightly easier, the DBANALYS report under the index section (idxanalys), provides the number of index "Blocks" currently used, total this value and multiply by the database blocksize to give a  estimate of the space needed for the indexes in an area.  Ideally an IDXCOMPACT 100% should be run beforehand to get the best value or factor in the "% Util" value in the report.

In our Employee example:

INDEX BLOCK SUMMARY FOR AREA                                                          
-----------------------------                                                         
Table                      Index  Fields Levels         Blocks    Size  % Util  Factor
PUB.Employee                                                                          
  Comments                    88       1      3          161325   611.3M    97.0     1.1
  DeptCode                    38       2      3           31000   118.7M    98.0     1.0
  EmpNo                       37       1      3           11775    44.9M    97.7     1.0
  Name                        39       1      3           34100   132.4M    99.4     1.0


The Employee indexes will need an estimated: 247753 x 8 = 1936 MB / 2 = 968 MB

Hang on! Why did you divide by 2?
Because the report is in the current number of database blocks, which is 4 KB
And will be loaded into an 8 KB database.

c. Understanding why the size reported in the tabanalys report is not the same as the size of the area on disk.

The above examples also demonstrate why the size of a given table in a tabanalys report is not the same as the size that table takes up on disk.  Our Employee table is reported to use 5GB of space in the report. This does not include unused space in blocks or empty blocks in its calculation, which will be taking up physical disk space for the Storage Area.  For further discussion, Refer to Articles:
Database, not including empty blocks, is substantially larger than the actual data size   
Dbanalys gives contradicting Information about the DB size   
Workaround
Notes

 
Keyword Phrase
Last Modified Date11/20/2020 7:21 AM

Powered by