GUID or not? - Forum - OpenEdge Development - Progress Community
 Forum

GUID or not?

  • In some current code, I have been using

    base64-encode(generate-uuid)

    to get a character GUID to use as a record identifier.  In Windows 7, it works fine.  In Windows 8.1 it is coming up with duplicates.  In particular, this test program:

    define temp-table ttKeys no-undo
      field chID as character
      field inCnt as integer
      index ixID as primary unique chID
      index ixCnt as unique inCnt.
    define variable pinCnt as integer no-undo.
    define variable pinLimit as integer no-undo initial 400000.
    
    repeat while pinCnt < pinLimit:
      create ttKeys.
      assign   
          pinCnt = pinCnt + 1
          ttKeys.chID = base64-encode(generate-uuid)
          ttKeys.inCnt = pinCnt
          .
      catch eobSysError as Progress.Lang.SysError :
        define variable inWhich as integer no-undo.
        message substitute("Unexpected Exception: &1", eobSysError:getMessage(1)) view-as alert-box.
        message substitute("Count = %1", pinCnt) view-as alert-box.
      end.
    end.
    if pinCnt >= pinLimit then message "Success" view-as alert-box.

    will pretty consistently produce a duplicate key, although sometimes it is 330000 records in before that happens.  With Windows 7, we have gone to a million with no dups.

    Do I have some misunderstanding in thinking this should work or is the 8.1 behavior a bug.  TS is working on it.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • We ran into this as well.. it’s not a Windows 8.1 thing.
     
    The base64-encode function will produce identical values (as far as the 4GL is concerned) but different case.
     
    We caps and hex-encode our id’s to ensure no dupes:
     
    cUUID = caps(hex-encode(generate-uuid)).
     
     
     
    From: Thomas Mercer-Hursh [mailto:bounce-tamhas@community.progress.com]
    Sent: Tuesday, June 03, 2014 2:40 PM
    To: TU.OE.Development@community.progress.com
    Subject: [Technical Users - OE Development] GUID or not?
     

    In some current code, I have been using

    base64-encode(generate-uuid)

    to get a character GUID to use as a record identifier.  In Windows 7, it works fine.  In Windows 8.1 it is coming up with duplicates.  In particular, this test program:

    define temp-table ttKeys no-undo
      field chID as character
      field inCnt as integer
      index ixID as primary unique chID
      index ixCnt as unique inCnt.
    define variable pinCnt as integer no-undo.
    define variable pinLimit as integer no-undo initial 400000.
     
    repeat while pinCnt < pinLimit:
      create ttKeys.
      assign   
          pinCnt = pinCnt + 1
          ttKeys.chID = base64-encode(generate-uuid)
          ttKeys.inCnt = pinCnt
          .
      catch eobSysError as Progress.Lang.SysError :
        define variable inWhich as integer no-undo.
        message substitute("Unexpected Exception: &1", eobSysError:getMessage(1)) view-as alert-box.
        message substitute("Count = %1", pinCnt) view-as alert-box.
      end.
    end.
    if pinCnt >= pinLimit then message "Success" view-as alert-box.

    will pretty consistently produce a duplicate key, although sometimes it is 330000 records in before that happens.  With Windows 7, we have gone to a million with no dups.

    Do I have some misunderstanding in thinking this should work or is the 8.1 behavior a bug.  TS is working on it.

    Jeff Ledbetter

    Roundtable Product Architect

    www.roundtable-software.com

  • Base-64 encoding is case-sensitive and having two values that vary only by case is perfectly acceptable.  So using caps() will induce collisions with other base-64 encoded UUID values and not allow you to recover the initial raw value.   So this is not a valid workaround.

    for example:   the value 2000 encodes to "MjAwMAo=" : if you uppercase and decode "MJAWMAO=" you get an invalid value ( 0?0 ).

  • Jeff was suggesting using hex-encode instead of base64-encode. AFAIK, hex-encode is not case-sensitive, unlike base64-encode.

  • You could define the chID field in your temp-table as CASE-SENSITIVE and still use BASE64-ENCODE. Then it will depends on how the code is using it later (comparisons should then be made against the temp-table field for the comparison to be case-sensitive too).

  • Thomas did you see if the generate-uuid generated duplicates as well? It is most likely that the way the "random" GUID is generated that is at fault even before encoding is done.

    I had to write something in the mainframe world as we did not have standard libraries to create GUID's and  some of the suggested "random" methods did not work at all.

  • If I do MESSAGE GUID(generate-uuid). I generate a lot of numbers which is in format:

    XXXXXXXX—XXXX—XXXX—XXXX—XXXXXXXXXXXX

    but the first 3 groupings are exactly the same, which is not EXPECTED!

    Is it possible to get the code that generate these UUID and GUID?

  • the fact that you do not expect the first 3 groupings to be the same does not mean it is wrong.

    uuids are made up of various parts. one of those is a proxy for location, often derived from an ethernet mac address. it is the whole things that is supposed to be unique, not the parts that go into one.

  • Gus, I do understand that. :-) my gut feeling is just that the smaller the "random part" the greater the chance of duplication.  

    Also the V1 GUID Algorithms tend to use MAC address and Time as part of the value but these are indicated by a 1 as the first digit/character on the 3rd grouping. These ID's are all Version 4 which are PSEUDO random number (number 4 in first digit of the 3rd grouping.

    And the numbers I generated  today all had exactly the same values in the first 4 groupings. The UUID only changes from position 23 onwards.  

    Not saying it is wrong, but merely asking id the GUID's could not be a bit more random?

    But I will go write some code to populate a table with extremely large number of GUID as primary keys and see how often duplication comes up.

    I am just interested in knowing how the IDs are created and if it could be improved upon.  :-)

  • @alex: if you are getting duplicate GUID's then that is a bug and has to be fixed.

    if you are not getting duplicates then it is working as it is supposed to and does not require any improvement. uniqueness is all you have a right to expect.

  • Switching to a case-sensitive field has fixed things for me.  Youssif, who is working with me on my TS case, said:

    I just tested the following two version (one tests values generated by GENERATE-UUID calls, the other tests values generated by the BASE64-ENCODE(GENERATE-UUID) calls. My count was set to 10 million records on Windows 2008 R2 under 11.3.2 without duplicates:

    We still have no idea why, without the case-sensitive, we were getting dups on 8.1 and not on 7.  But, with the case-sensitive I get no dups up to 1 million in multiple passes.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • Is Progress calling an OS function to generate UUIDs?

  • To wrap this up, thanks to the thoroughness of Youssif Shanshiry ...

    One can index a temp-table by a raw, but not a database table, so one *must* convert to character to use a guid as a primary key.  There are three methods:

    base64-encode(generate-uuid)

    Yields a string 24 long, the last 2 of which can be discarded.  The string is case-sensitive.

    guid(generate-uuid)

    Yields a string 36 characters long which is not case sensitive.

    hex-encode(generate-uuid)

    Yields a string 32 characters long which is not case sensitive.

    So, you makes your pick.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • Thanks for the wrap up Thomas, that is quite a relief.

    We are heavily relying on GUID(GENERATE-UUID) and use the generated values as globally unique keys for merging records in distributed databases.

    In the beginning of this thread I was shocked to find that this might not be safe in hindsight when the indexes are not defined as case-sensitive (which the are not). But now it seems to be safe after all, as the documentation has always suggested.

    So the only problem appears to be with base64-encode(generate-uuid).

  • No need to use GUID(GENERATE-UUID), just GUID is enough. Besides a separate guid-generator wrapper class / method or whatever you call it can be handy. The 36 char guid as primary key has a performance trade-off that could be improved in the future. See f.e. www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database and the recent disussions on the clustered index / comb guid on the peg.