Scenario: we use Git and Bamboo (CI), so the code is effectively checked out in two directories.
When compiled, a small number of our sources (around 1-2%) generate different CRC, and therefore MD5, values.
Any thoughts on why?
The DBs are the same and the sources have not changed; only the directory is different.
I know the filename is part of the crc-value calculation, but if the path were involved, surely all our files would get different crc-values rather than just this small number.
Maybe you are confusing rcode-info:crc-value with rcode-info:table-crc-list.
crc-value is a number for the .r file itself, and it changes even when only the directory is different.
table-crc-list holds the CRCs of the tables the .r file was compiled against, and those stay the same.
I understand the difference and am also using the table-crc-list, but when using the MD5 or crc-value I'm noticing that a small number of files that shouldn't be flagged as changed are.
What I can't understand is this: if the directory where the source or r-code lives matters, why isn't it all files?
Yeah, I don't know. But I don't recommend using the CRC to identify changes. For example, I have 164 thousand .r files in the system. Hundreds of these .r files have the same CRC value, but the code is completely different. There is a chance that you change a program and it generates the same CRC. It's not likely, but it's possible.
Some time back I was also looking into CRC and MD5 checksums. For some unexplained reason the CRC checksum was not as good as the MD5, i.e. I got different CRC checksums on the same file even though no source code was changed. Using the MD5 seems to be a better choice, but then you need to compile with the GENERATE-MD5 option.
However I was never able to find a good reason why the CRC would change but the MD5 stays the same.
The reason CRC is not as good as MD5 is that CRC was designed for a completely different purpose, namely to detect certain types of transmission errors in data communications. In such uses, the CRC is computed over a relatively small block of data (up to a few thousand bytes). Differing inputs can quite normally result in the same checksum value, especially when the data blocks are too large. This makes it unsuitable for detecting identical or changed files.
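To make the collision point concrete, here is a minimal sketch in Python (used instead of ABL so it can be run outside an OpenEdge session). It uses `binascii.crc_hqx` (CRC-CCITT) as a stand-in 16-bit CRC. That choice is an assumption for illustration only and is not claimed to be the algorithm behind RCODE-INFO:CRC-VALUE. The function name `first_crc16_collision` and the generated inputs are made up.

```python
import binascii
import hashlib


def first_crc16_collision(limit=70000):
    """Return two different byte strings that share a 16-bit CRC.

    binascii.crc_hqx (CRC-CCITT) is a stand-in 16-bit CRC chosen for
    illustration; it is an assumption, not the OpenEdge algorithm.
    """
    seen = {}  # CRC value -> first input that produced it
    for i in range(limit):
        data = f"program-{i}.p".encode("ascii")
        crc = binascii.crc_hqx(data, 0)
        if crc in seen:
            return seen[crc], data, crc
        seen[crc] = data
    return None  # unreachable once limit exceeds 65536 distinct inputs


a, b, crc = first_crc16_collision()
print(a, b, hex(crc))              # two different inputs, one CRC
print(hashlib.md5(a).hexdigest())  # the MD5 digests still differ
print(hashlib.md5(b).hexdigest())
```

A 16-bit checksum has only 65,536 possible values, so any set of more than 65,536 distinct inputs must contain a collision, and in practice one turns up far sooner. A 128-bit MD5 does not have that problem at these file counts.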
So why would a CRC change but the MD5 stays the same with exactly the same source code?
CRC will change based on DB schema changes but the MD5 will not.
> On Aug 16, 2018, at 5:26 AM, jbijker wrote:
>
> So why would a CRC change but the MD5 stays the same with exactly the same source code?
>
You have not actually explained how you obtain either of these values, so I cannot say why the CRC differs and the MD5 does not.
The issue seems to have drifted a little.
After compiling, I check the table-crc-list and the MD5 (and, as a test, the crc-value) and compare them with the previous list of the same values to see which pieces of r-code need adding to a patch library. A small percentage of r-code files are flagged as changed because of the MD5, but I know they have not changed. The only difference is that we have extracted from the Git repo into a different directory.
I was wondering if someone could explain why I get a different MD5. It can't be the source directory, as that would flag all code.
This is what I'm doing for my tests: I compile from the same source in the same directory and connected to the same DB. The compile is done with GENERATE-MD5.
Then I cycle through all r files and use the following code to get the CRC & MD5:
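PROCEDURE checkrcode:
    /* Writes one CSV line (file name, CRC, MD5) for the given r-code file. */
    /* Assumes stream sCRC is defined and opened by the caller. */
    DEFINE INPUT PARAMETER ipcFilename AS CHARACTER NO-UNDO.
    RCODE-INFO:FILE-NAME = ipcFilename.
    EXPORT STREAM sCRC DELIMITER ",":U
        ipcFilename
        RCODE-INFO:CRC-VALUE
        RCODE-INFO:MD5-VALUE.
END PROCEDURE.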
I've got about 2500 files and 4 files have different MD5 values which I can't explain. All CRC values match.
If I compile from a different path then I get a lot more CRC mismatches (just over 200 files), and funnily enough the same 4 files as before also give different MD5 values.
My previous test from a couple of years ago was on version 11.3.3 - there I did see a change in CRC without a change in MD5. But now with 11.7.2 it seems to be fixed.
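The comparison step described in this thread (diffing a new list of checksums against the previous one to pick patch candidates) could be sketched roughly as follows, in Python. It assumes each list is a comma-delimited file with file name, CRC and MD5 per line, as the export above writes it. The helper names `load_manifest` and `changed_files`, the sample paths and the checksum values are all made up for illustration. It strips the checkout directory from each path and compares only the MD5, since the CRC was seen to vary with the compile path.

```python
import csv
import io
from pathlib import PurePosixPath


def load_manifest(text, root):
    """Parse 'filename,crc,md5' lines into {relative_path: (crc, md5)}.

    `root` is the checkout directory to strip, so that two checkouts in
    different directories produce comparable keys.
    """
    entries = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        filename, crc, md5 = (field.strip() for field in row)
        path = PurePosixPath(filename.replace("\\", "/"))
        entries[path.relative_to(root).as_posix()] = (crc, md5)
    return entries


def changed_files(old, new):
    """Return sorted relative paths that are new or whose MD5 differs."""
    return sorted(
        rel for rel, (_crc, md5) in new.items()
        if rel not in old or old[rel][1] != md5
    )


# Made-up sample data; these paths and checksum values are not real output.
old_csv = '"/build/a/src/order.r",1234,"AAAA"\n"/build/a/src/cust.r",2222,"BBBB"\n'
new_csv = '"/build/b/src/order.r",9999,"AAAA"\n"/build/b/src/cust.r",2222,"CCCC"\n'

old = load_manifest(old_csv, "/build/a")
new = load_manifest(new_csv, "/build/b")
print(changed_files(old, new))
```

In the sample data, `order.r` has a different CRC but the same MD5 and is not flagged, while `cust.r` has a different MD5 and is. Files affected by the unexplained MD5 differences would still show up here, so this only removes the path-related noise.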
If you do a binary compare of the r-files, what differences do you find?
Many? A few bytes? Where?
Maybe we can figure out what it is.
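One way to answer the "how many bytes, and where" question is a small script that reports the offset ranges where two files differ. The following is a minimal sketch in Python. The function names `diff_ranges` and `diff_files` and the sample byte strings are hypothetical, and nothing here interprets the r-code format itself.

```python
def diff_ranges(a, b):
    """Return (start, end) offset ranges (end exclusive) where a and b differ.

    A length mismatch is reported as a final range covering the extra tail.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i  # a differing run begins here
        elif start is not None:
            ranges.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges


def diff_files(path_a, path_b):
    """Read both files fully and return their differing offset ranges."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return diff_ranges(fa.read(), fb.read())


# Made-up byte strings standing in for two compiles of the same program.
sample_a = b"HEADER-buf0001-BODY"
sample_b = b"HEADER-buf0042-BODY"
for start, end in diff_ranges(sample_a, sample_b):
    print(start, end, sample_a[start:end], sample_b[start:end])
```

Printing the bytes on either side of each range often makes it obvious whether the difference is an embedded path, a timestamp, or a generated name.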
While doing the binary compare I did see some changes that have to do with the SQL buffer name, so I had a closer look, and all 4 of these programs have SQL statements inside them, either
DELETE FROM
or
SELECT MIN
After converting the SQL statement to 4GL I get a consistent MD5 value back. Interesting.
I've logged this with support: case # 00455861
It is already logged as bug# OCTA-3224, but it is not fixed.