What is a Block Checksum?
- Block checksums were introduced in OpenEdge 10 for RM and IX blocks in Type II areas.
- The checksum is an indicator of corrupt blocks at the filesystem level.
- It is used by the Block Manager to validate a database block when it is first read into shared memory:
- A checksum entry is stored in a Type II Area block just before the block is written to disk.
- When that block is read back into the buffer pool, the checksum is first validated against the value recorded when the block was written.
- If the values differ, what is being read is not what was written. This indicates that the block was corrupted somewhere on its way from shared memory to disk, and error 14410 is reported.
- Any online utility may raise 14410 errors when the block it reads is not already in the buffer pool. For example, DBTOOL Options 3 & 5 execute checksum validation on data blocks but not index blocks.
Under these conditions, further database integrity checks should be undertaken after correcting block checksums.
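The write-then-validate cycle described above can be illustrated with a minimal sketch. It is not the OpenEdge implementation: the POSIX cksum utility stands in for the block checksum (whose real algorithm is not described here), the checksum is kept in a side file rather than inside the block, and write_block and read_block are hypothetical names.

```shell
#!/bin/sh
# Illustration only: cksum is a stand-in for the real block checksum,
# and the ".sum" side file is a simplification of the in-block entry.

# write_block FILE DATA: store the data and record its checksum at write time
write_block() {
    printf '%s' "$2" > "$1"
    cksum < "$1" | cut -d' ' -f1 > "$1.sum"
}

# read_block FILE: recompute the checksum and compare it with the recorded one
read_block() {
    stored=$(cat "$1.sum")
    actual=$(cksum < "$1" | cut -d' ' -f1)
    if [ "$stored" = "$actual" ]; then
        echo "OK"
    else
        # Analogous to the condition reported as error 14410
        echo "BAD CHECKSUM"
        return 1
    fi
}
```

If the file is modified after write_block has run, read_block reports a mismatch, which mirrors a block that changed between being written and being read back.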
DBRPR to fix RM and IX block checksums
DBRPR provides a backend means of correcting block checksums.
- The DBRPR checksum fix was first implemented in OpenEdge 10.0A01. However, it repairs one block at a time, which is inefficient for area-wide changes.
- Since OpenEdge 10.0B01 and 10.1A, DBRPR has been enhanced to fix all block checksums in the specified Type II Area.
There were some issues with the initial block checksum implementation. As a consequence, the affected blocks raise a checksum error when validated on a subsequent read, for example:
- Databases restored from backups taken using the PROBKUP -com parameter.
- The checksum for free blocks was not calculated when they were written.
- When backend DBRPR block repair frees corrupt blocks, the checksum may need to be recalculated.
Steps:
1. Shut down the database and truncate the BI file. If after-imaging is enabled, disable it.
$ proshut dbname -by
$ proutil dbname -C truncate bi
$ rfutil dbname -C aimage end
2. Open the database repair utility
$ proutil dbname -C dbrpr
3. From the "Database Repair Menu" which is presented:
- Select Option "9. Change Current Working Area" to the TYPE II Storage Area under investigation.
- Select Option "16. Scan/Fix block checksum (Type II Area)", which presents a Submenu:
BLOCK CHECKSUM SCAN/FIX MENU
1. Report Bad Checksum
2. Fix Bad Checksum
G. Go
Choose Option 1, 2, or both 1 & 2 then "G. Go"
Input the range of blocks to scan/fix, or scan/fix all in current working area:
Scan all blocks in the area? (Y)es/ (N)o /(Q)uit:
- Y makes DBRPR scan/fix all blocks in the current Type II Storage Area,
- N will prompt for the data block dbkey range:
Type the dbkey of the block to start the scan at low range of block dbkey
(Input the low range block dbkey)
Type the dbkey of the block to use as endpoint
(Input the high range block dbkey)
After confirmation, DBRPR scans and reports/fixes block checksums for known issues in the Type II Area or block range provided.
- There is no data loss when repairing the block checksum.
- An application code recompile is not needed after repairing database checksum errors at the block level. ABL code never has to worry about database block structure, only about schema and content. CRC checksums and database block checksums are two entirely different things and are not connected in any way.
- After fixing checksums, if bad block checksum errors are still reported, error 14410 is being reported for its intended purpose: to validate a database block when it is first read into shared memory. In that case, further memory / hardware investigation is needed. Apart from OS and peripheral investigation, the following Article outlines further advice from the OpenEdge perspective:
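The non-interactive preparation in step 1 could be wrapped in a small script along the following lines. This is a sketch under stated assumptions: prepare_for_dbrpr, run and DRY_RUN are hypothetical names, each utility is assumed to return a non-zero exit status on failure, and the operator decides whether after-imaging is enabled by passing "ai" as the second argument. The DBRPR menu itself remains interactive.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative, not part of OpenEdge.

# run CMD...: echo the command, then execute it unless DRY_RUN=1
run() {
    echo "+ $*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# prepare_for_dbrpr DBNAME [ai]: shut down, truncate the BI file, and
# optionally end after-imaging, stopping at the first failing command
prepare_for_dbrpr() {
    db=$1
    run proshut "$db" -by || return 1
    run proutil "$db" -C truncate bi || return 1
    if [ "${2:-}" = "ai" ]; then
        run rfutil "$db" -C aimage end || return 1
    fi
}
```

Running `DRY_RUN=1 prepare_for_dbrpr dbname ai` only prints the commands, which allows them to be reviewed before they are executed against a real database.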