CRC-VALUE calculation question

Posted by pdbibby on 08-Aug-2018 04:16

Scenario: using git and Bamboo (CI), the same code effectively ends up in two directories.
When compiled, a small number of our sources (around 1-2%) generate different CRC, and therefore MD5, values.

Any thoughts on why?
The DBs are the same and the sources are unchanged, just in a different directory.
I know the filename is part of the crc-value calculation, BUT if the path were involved, surely all our files would get different crc-values rather than just this small number.

All Replies

Posted by Adriano Correa on 08-Aug-2018 09:16

Maybe you are confusing rcode-info:crc-value with rcode-info:table-crc-list

crc-value is a number for the .r file, which changes even with just a different directory.

table-crc-list holds the CRCs of the tables the program was compiled against, and those stay the same.

Posted by pdbibby on 14-Aug-2018 10:39

I understand the difference and am also using the table-crc-list, but when using the MD5 or crc-value I'm noticing that a small number of files that shouldn't be flagged as changed are.

What I can't understand is this: if the directory where the source or rcode lives matters, why isn't it all files?

Posted by Adriano Correa on 14-Aug-2018 15:34

Yeah, I don't know. But I don't recommend using the CRC to identify changes. For example, I have 164 thousand .r files in the system. Hundreds of these .r files have the same CRC value, but the code is completely different. There is a chance that you change a program and it generates the same CRC. It's not likely, but it's possible.

Posted by jbijker on 15-Aug-2018 05:48

Some time back I was also looking into CRC and MD5 checksums. For some unexplained reason the CRC checksum was not as good as the MD5, i.e. I got different CRC checksums on the same file but no source code was changed. Using the MD5 seems to be a better choice - but then you need to compile with the GENERATE-MD5 option.

However I was never able to find a good reason why the CRC would change but the MD5 stays the same.

Posted by gus bjorklund on 15-Aug-2018 09:06

the reason CRC is not as good as MD5 is because CRC was designed for a completely different purpose, namely to detect certain types of transmission errors in data communications. in such uses, the CRC is computed over a relatively small block of data (up to a few thousand bytes). differing inputs can quite normally result in the same checksum values, especially when the data blocks are too large. this makes it unsuitable for detecting identical or changed files.
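To illustrate the point about differing inputs sharing a checksum, here is a minimal Python sketch. It assumes a 16-bit CRC (the thread does not state how wide CRC-VALUE is) and uses Python's `binascii.crc_hqx` (CRC-CCITT), which is not necessarily the algorithm OpenEdge uses. The `program-N.p` strings are made-up inputs. With only 65,536 possible 16-bit values, any 65,537 distinct inputs must contain at least one collision, while their MD5 digests still differ.

```python
import binascii
import hashlib


def find_crc16_collision():
    """Return two distinct byte strings that share a 16-bit CRC-CCITT value."""
    seen = {}
    # 65,537 distinct inputs cannot all map to distinct 16-bit values.
    for i in range(65537):
        data = f"program-{i}.p".encode("ascii")  # hypothetical input
        crc = binascii.crc_hqx(data, 0)
        if crc in seen:
            return seen[crc], data, crc
        seen[crc] = data
    raise RuntimeError("no collision found; should be impossible")


a, b, crc = find_crc16_collision()
print("same CRC:", a, b, hex(crc))
print("MD5 differs:", hashlib.md5(a).hexdigest() != hashlib.md5(b).hexdigest())
```

The same counting argument would apply to any fixed-width checksum; a wider digest such as MD5 only makes an accidental match far less likely.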

Posted by jbijker on 16-Aug-2018 04:25

So why would a CRC change but the MD5 stay the same with exactly the same source code?

Posted by Roger Blanchard on 16-Aug-2018 07:24

CRC will change based on DB schema changes but the MD5 will not.

Posted by gus bjorklund on 16-Aug-2018 15:14

> On Aug 16, 2018, at 5:26 AM, jbijker wrote:

> So why would a CRC change but the MD5 stays the same with exactly the same source code?

you have not actually explained how you obtain either of these values so i cannot say why the CRC differs and MD5 does not.

Posted by pdbibby on 20-Aug-2018 10:27

the issue seems to have drifted a little

After compiling, I check the table-crc-list and the MD5 (I've also checked the crc-value as a test) and compare them with a previous list of the same values to see which pieces of rcode need adding to a patch library. A small percentage of the rcode is flagged as changed because of the MD5, but I know those files have not changed. The only difference is that we have extracted from the git repo into a different directory.

I was wondering if someone could explain why I get a different MD5, as it can't be the source directory, since that would flag all code.

Posted by jbijker on 21-Aug-2018 04:54

This is what I'm doing for my tests: I compile from the same source in the same directory and connected to the same DB. The compile is done with GENERATE-MD5.

Then I cycle through all r files and use the following code to get the CRC & MD5:

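PROCEDURE checkrcode:
    /* Assumes stream sCRC is defined and opened for output by the caller. */
    DEFINE INPUT PARAMETER ipcFilename AS CHARACTER NO-UNDO.

    /* Point RCODE-INFO at the r-code file to inspect. */
    RCODE-INFO:FILE-NAME = ipcFilename.

    /* Write one comma-delimited row: file name, CRC, MD5. */
    EXPORT STREAM sCRC DELIMITER ",":U
        ipcFilename
        RCODE-INFO:CRC-VALUE
        RCODE-INFO:MD5-VALUE.
END PROCEDURE.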

I've got about 2500 files and 4 files have different MD5 values which I can't explain. All CRC values match.

If I compile from a different path then I get a lot more CRC mismatches (just more than 200 files), and funny enough the same 4 files as before also give different MD5 values.

My previous test from a couple of years ago was on version 11.3.3 - there I did see a change in CRC without a change in MD5. But now with 11.7.2 it seems to be fixed.
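The comparison described in this post can be sketched in Python as follows. This is only an illustration under stated assumptions: each exported row is `filename,crc,md5` with optionally quoted fields, paths use forward slashes, and the build roots `/build/one` and `/build/two` plus all sample values are made up. File names are keyed relative to their build root so that the same program compiled in two directories lines up.

```python
import csv
import io
from pathlib import PurePosixPath


def load_manifest(text, root):
    """Parse 'filename,crc,md5' rows and key them by path relative to root."""
    rows = {}
    for name, crc, md5 in csv.reader(io.StringIO(text)):
        rel = PurePosixPath(name).relative_to(root).as_posix()
        rows[rel] = (int(crc), md5)
    return rows


def compare(old, new):
    """Classify files present in both manifests by which value changed."""
    result = {"crc_only": [], "md5_only": [], "both": []}
    for rel in sorted(old.keys() & new.keys()):
        crc_changed = old[rel][0] != new[rel][0]
        md5_changed = old[rel][1] != new[rel][1]
        if crc_changed and md5_changed:
            result["both"].append(rel)
        elif crc_changed:
            result["crc_only"].append(rel)
        elif md5_changed:
            result["md5_only"].append(rel)
    return result


# Made-up sample rows; real input would be the two exported files.
OLD = '"/build/one/a.r",1111,"aaa"\n"/build/one/b.r",2222,"bbb"\n"/build/one/c.r",3333,"ccc"\n'
NEW = '"/build/two/a.r",1111,"aaa"\n"/build/two/b.r",9999,"bbb"\n"/build/two/c.r",3333,"zzz"\n'

report = compare(load_manifest(OLD, "/build/one"), load_manifest(NEW, "/build/two"))
print(report)
```

Splitting the result into CRC-only and MD5-only changes makes it easier to see whether a mismatch follows the compile path or a specific group of programs.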

Posted by ske on 21-Aug-2018 05:05

If you do a binary compare of the r-files, what differences do you find?

Many? A few bytes? Where?

Maybe we can figure out what it is.
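A binary compare of this kind can be sketched in a few lines of Python. This is a generic byte-level diff, not an r-code parser; the sample byte strings are made up, and with real files the inputs would come from something like `Path(...).read_bytes()` on the two .r files.

```python
def diff_ranges(a: bytes, b: bytes):
    """Return half-open (start, end) offset ranges where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a new run of differing bytes begins here
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # Bytes past the shorter input are reported as one trailing range.
        ranges.append((common, max(len(a), len(b))))
    return ranges


# Made-up stand-ins for two r-code files that embed different paths.
old = b"HEADER/build/one/prog.pTAIL"
new = b"HEADER/build/two/prog.pTAIL"
for start, end in diff_ranges(old, new):
    print(start, end, old[start:end], new[start:end])
```

Printing the bytes around each reported range usually shows quickly whether the difference is an embedded path, a generated name, or something else.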

Posted by jbijker on 21-Aug-2018 05:32

While doing the binary compare I did see some changes that have to do with SQL buffer names, so I had a closer look, and all 4 of these programs have SQL statements inside them, either

DELETE FROM

or

SELECT MIN

After converting the SQL statement to 4GL I get a consistent MD5 value back. Interesting.

Posted by jbijker on 22-Aug-2018 06:15

I've logged this with support: case # 00455861

Posted by jbijker on 27-Aug-2018 06:06
This thread is closed