GUID/sequence - Forum - OpenEdge Development - Progress Community
 Forum

GUID/sequence

  • Some background info: http://www.codinghorror.com/blog/2007/03/primary-keys-ids-versus-guids.html and a thead on the peg (see below the sig).

    I'm interested to use the GUID as primary unique key, because of the pro's. What I forgot to mention in the thread on the peg is that a sequential number, f.e. for an ordertable is useful/needed. You can make a PUK (orderid field) and assign a GUID to it, and make the index on the ordernumberfield unique

    (and assign it with a sequence). Expect the create of an order to become a bit slower and queries with joins on the GUID fields ditto. But that could be acceptable. Thoughts?

    Regards, Stefan.

    guid vs sequence
    peg
    x

    Stefan Houtzager
    Jan 30
    to peg
    Hi,

    OE 10.2B04/11. Using Java ZK framework as front-end. In new
    applications I often see GUID's being used as unique identifiers for
    records. What would be good reasons to abandon the use of sequences
    and replace them with guid's?
    Julian Lyndon-Smith julian@lyndon-smith.com
    Jan 30
    to me, peg
    One of the reasons is if you ever transfer data from one system to another there will be no clash in id's.
    Another reason is that you don't have to create a sequence for every table that has a unique key requirement
    Scott Augé sauge@amduus.com
    Jan 30
    to me, peg
    Sequences are easily guessed.  GUIDs are not.

    Sequences can be screwed up on db reloads.  GUIDs are not.
    Julian Lyndon-Smith julian@lyndon-smith.com via peg.com
    Jan 30
    to peg
    Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
    Jan 30
    to peg
    On 01/30/12 14:15, Stefan Houtzager wrote:
    > What would be good reasons to abandon the use of sequences
    > and replace them with guid's?

    The only good reason to use guid as primary key is when you somehow need
    to merge the data into one table - that is, when you essentially have a
    distributed database. Other that that, guid's are hard to read/write,
    need additional storage and take more time to generate.
    Mike Fechner mike.fechner@consultingwerk.de
    Jan 30
    to me, peg
    A GUID can be created on the client.

    Plus all the other arguments.

    -----Ursprüngliche Nachricht-----
    Von: postmaster@peg.com [mailto:postmaster@peg.com] Im Auftrag von Stefan Houtzager
    Gesendet: Montag, 30. Januar 2012 13:15
    An: peg@peg.com
    Betreff: guid vs sequence
    Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
    Jan 30
    to peg
    On 01/30/12 14:29, Scott Augé wrote:
    > Sequences are easily guessed.  GUIDs are not.

    I wouldn't be so sure. GUIDs are unique, but they are not necessarily
    hard to guess for a given system.

    For example, if I generate my IDs using my ethernet MAC address and a
    sequence, I get globally unique values but they are completely
    predictable after some recent IDs have been sampled.
    Julian Lyndon-Smith julian@lyndon-smith.com via peg.com
    Jan 30
    to peg
    we just use

    guid(generate-uuid)

    cannot guess a pattern. However, surely guessing a sequence is a
    *whole* lot easier

    1
    2
    3
    *wonder what comes next*

    Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
    Jan 30
    to peg
    On 01/30/12 15:11, Julian Lyndon-Smith wrote:
    > we just use
    >
    > guid(generate-uuid)
    >
    > cannot guess a pattern.

    Run this many times in a tight loop and see how many bits actually change.

    > However, surely guessing a sequence is a
    > *whole* lot easier
    >
    > 1
    > 2
    > 3
    > *wonder what comes next*
    >
    >

    What I wanted to say is that no-one should believe that GUIDs are
    automatically a lot safer in this respect. I can just as well guess
    (more or less, with a bit of luck) what comes next with this:

    e0fb87f6-59f8-04a0-e111-454b9e2ac94f
    e0fb87f6-59f8-04a0-e111-454b5492c94f
    e0fb87f6-59f8-04a0-e111-454b5cadc94f

    Depending on the circumstances, it might be just as easy as 1 2 3.
    Gus Bjorklund gus@progress.com via peg.com
    Jan 30
    to peg
    why does it matter if identifiers can be guessed?

    if it is for some security reason, then both are the wrong solution.

    -gus
    Tom Bascom tom@wss.com via peg.com
    Jan 30
    to peg
    On 1/30/12 7:44 AM, Antanas Kompanas wrote:
    On 01/30/12 14:15, Stefan Houtzager wrote:
    What would be good reasons to abandon the use of sequences
    and replace them with guid's?
    The only good reason to use guid as primary key is when you somehow need
    to merge the data into one table - that is, when you essentially have a
    distributed database. Other that that, guid's are hard to read/write,
    need additional storage
    So far, so good...

    and take more time to generate.
    One should always test performance assertions

    I find that guid( myUUID ) is 30x-40x faster than next-value( someSequence )...
    None the less -- I like sequences in many cases.  Being able to get a quick estimate of how rapidly something is growing simply by watching the current-value can be quite handy.
    Gregory Higgins ablsaurusrex@gmail.com
    Jan 30
    to me, peg
    A sequence implies an ordering that may not be wanted. A guid can
    remove the implied ordering.

    On Mon, Jan 30, 2012 at 7:15 AM, Stefan Houtzager
    <stefan.houtzager@gmail.com> wrote:
    > Hi,
    >
    > OE 10.2B04/11. Using Java ZK framework as front-end. In new
    > applications I often see GUID's being used as unique identifiers for
    > records. What would be good reasons to abandon the use of sequences
    > and replace them with guid's?
    >

    --
    I'm all-in with PPA!
    http://theppa.org
    Gregory Higgins ablsaurusrex@gmail.com
    Jan 30
    to me, peg
    Also, a guid allows any number of sources to be a primary key generator.
    Scott Augé sauge@amduus.com via peg.com
    Jan 31
    to peg
    I agree with this.

    For stuff that will show up on web pages and other dangerous areas (don't trust those web services either!) identifying a record, I use:

    http://www.oehive.org/amduus/ServiceExpress/srvexp/src/MakeID2.i

    with the date and time postfixed.
    Scott Augé sauge@amduus.com via peg.com
    Jan 31
    to peg
    Yes, you are correct.   As noted in another email that just went out, I usually uniquely id a record going into the public with a very random string (with 30 or more characters on it).

    For really important records, the random string seen in public relates to a web state record with the real unique identifier on it tucked away in the database - thus the random string seen in public is throw away, something like:

    FIND WebState NO-LOCK
    WHERE WebState.UserCookie = GET-VALUE("AppCookie")
    AND WebState.WebLookUpData = GET-VALUE("ThrowAway")
    NO-ERROR.

    ...

    FIND RealRecord EXCLUSIVE-LOCK
    WHERE RealRecord.RID = WebState.DBLookUpData
    NO-ERROR.


    I was thinking maybe GUIDs can be used perhaps the same way, but the evidence is against it.
    Stefan Houtzager
    Feb 1
    to Gregory, peg, bcc: engineers, bcc: Lorens.Zhou, bcc: Richard.vanWuy.
    Thanks all!

    Another reason I can think of is that sequeces are specific for the
    progress db. If you want to use another datastore you have to use
    something else then next-value(<sequencename>). The GUID can be used
    for all datastores (except maybe one from before the second worldwar).
    Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
    Feb 1
    to peg
    On 02/01/12 10:47, Stefan Houtzager wrote:
    > Another reason I can think of is that sequeces are specific for the
    > progress db. If you want to use another datastore you have to use
    > something else then next-value(<sequencename>). The GUID can be used
    > for all datastores (except maybe one from before the second worldwar).

    I don't think sequences or some form of them (like serial or autonumber
    fields) are not available in other databases. If I had to guess you
    would find sequences in those DBs "from before the second worldwar",
    although that would not be so true about GUIDs. Mainly because they take
    relatively a lot of space to store.
    Stefan Houtzager
    Feb 1
    to Antanas
    Could be, but you would have to write separate code for that other
    datastore. And then there is the unmanaged datastore...
    Darrell Davis ddavis.gso@gmail.com via peg.com
    Feb 1
    to peg
    >> Another reason I can think of is that sequeces are specific for the
    >> progress db. If you want to use another datastore you have to use
    >> something else then next-value(<sequencename>). The GUID can be used
    >> for all datastores (except maybe one from before the second worldwar).
    >
    > I don't think sequences or some form of them (like serial or autonumber
    > fields) are not available in other databases. If I had to guess you
    > would find sequences in those DBs "from before the second worldwar",
    > although that would not be so true about GUIDs. Mainly because they take
    > relatively a lot of space to store.

    I believe that the he meant the sequence is going to be unique only in
    the progress db that is the source of the sequence.  If you are
    storing data in multiple db's, or the data is sourced from multiple
    sources, a guid is unique across disparate systems while a sequence is
    only unique if you get it from the source db first.

    Hope I said that clearly.

    Darrell Davis
    RHCE / System Analyst / Progress Enthusiast

    717/884-9231 office
    336/324-0598 cell
    Stefan Houtzager
    Feb 1
    to Darrell, peg
    That is said clearly to me at least and it was (implicitly) said by
    someone else before in this thread as a good argument for the use of
    the GUID, but it is not what I meant. I mean that NEXT-VALUE in your
    DA layer or databasetrigger means that you are tied to the progress
    type of db. But of course you could use another way of numbering for
    other datastores. For me the con of the space a guid needs does not
    make a heavy weight on the balance (that now leans to the GUID).
    Richard Kelters richard.kelters@gmail.com
    Feb 1
    to peg
    I believe from the DB point of view there's a penalty of using a character datatype instead of an integer when a field is used in an index.

    Richard

    Op 1-2-2012 16:24, Stefan Houtzager schreef:
    Gregory Higgins ablsaurusrex@gmail.com via peg.com
    Feb 1
    to peg
    Systems are fast, space is cheap. There is some extra overhead I
    supposed inserting a well constructed unique identifier into the
    database, maybe the index pages get used more, but let's be honest,
    it's probably less expensive than the appserver call you made to get
    the data to the data base.
    Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
    Feb 2
    to peg
    On 02/01/12 17:50, Gregory Higgins wrote:
    > Systems are fast, space is cheap.

    Every system has its limits. When every programmer and architect starts
    thinking this way one eventually ends up with a system that's neither
    fast, nor cheap.
    Scott Augé sauge@amduus.com via peg.com
    Feb 2
    to peg
    On Feb 2, 2012, at 4:06 AM, Antanas Kompanas wrote:

    > On 02/01/12 17:50, Gregory Higgins wrote:
    >> Systems are fast, space is cheap.
    >
    > Every system has its limits. When every programmer and architect starts
    > thinking this way one eventually ends up with a system that's neither
    > fast, nor cheap.
    >
    >

    We are not using 486's anymore either  ... but this is a good conversation to have.  Allow me to prattle on like someone who has stayed up all night (which I have.)

    I have certainly seen some bass ackward code written because it was "faster."  Often, it wasn't.  For those rare exceptions, it wasn't as fast as someone thought it would be.

    Then came those scenarios based on use - and often, it didn't matter to the user that the machine spent some more CPU cycles between times it sat there waiting for something to do.

    Actually, often users are interested in seeing the code actually do something (right) - because often having a computer do it is a multitude of times faster than a human doing it - and allows for longer coffee breaks.  Even if the code was "slow" (judged by us old 8 bit assembly programmers who smashed code and data into 16,000 bytes), it is still much faster for the users anyhow.

    What did happen, is that programmers spent overtime and thousands of dollars trying to figure out the "short cut" someone made because it pulled a few micro-seconds out of code execution, and being bass ackwards contained multiple defects on the boundaries, in re-use, or in extension of purpose.

    This of course, pulls in the second no-no of software development - delays.  What should have been "a few minutes to do" turns into a big long frustrating battle with sh!t code written by people with no concept of coupling or cohesion.  (Just speaking from experience - not saying your's is!)  Having experience with C programming, I can certainly remember where people got way out of their league with "faster code" (regardless of if it is "correct code.")  (Entertainment: http://www.ioccc.org/)  I have seen some really horrible "tuned code" in Progress apps also.

    So one spends thousands of dollars and people's time in development, testing, and deployment to shave 120 seconds a day off a program.  I can almost guarantee you the finance people, the people who pay for it, prefer "it just does something."  And if they found out they spent thousands of dollars for 120 seconds... ooooooo.

    Most of us figure out a way for mono-threaded Progress to open up a process behind the scenes (aka a thread-like library) to grind through something while still allowing the user to move onto the next thing.  (Aka my Zeno project: http://www.oehive.org/node/1008)  Because lets face it, a lot of times there is another core sitting there twiddling it's thumbs for something to do.

    Lets look at Facebook - PHP wasn't cutting it anymore.  They switched to automatic PHP -> C++ code conversion (https://developers.facebook.com/blog/post/358/) and compiled C++. (Hmmm, an ABL to C++ project?)

    Most places Progress is used (and I have used it in some pretty big companies with pretty big user counts) only wishes they had Facebook problems.

    Quite often, the solution is a faster computer and/or another computer in the pool (think AppServers/behind the scene processes) for a couple thousand bucks with clear, re-usable, and extensible code.

    I now step off the soap box waving my hands around. 


    Scott Augé
    President
    Amduus Information Works, Inc.
    http://www.amduus.com

    http://www.linkedin.com/in/scottauge
    Twitter: ScAuge

    Store Front: http://stores.lulu.com/store.php?fAcctID=2520351

    PLEASE CONSIDER OUR ENVIRONMENT BEFORE PRINTING THIS EMAIL
    Scott Augé sauge@amduus.com via peg.com
    Feb 2
    to peg
    LOL - I am a Dr. Pepper man - caffeine caffeine caffeine - and the bathroom breaks associated with that!

    Tonight I have been contemplating many things... 

    On Feb 2, 2012, at 5:15 AM, Marian Edu wrote:

    > Well said Scott, what have you been doing all night long and more important... what is that thing that you were using (smoking???)
    >
    > Greg should probably think of a way to add that 'like' button
    Gregory Higgins ablsaurusrex@gmail.com via peg.com
    Feb 2
    to peg
    Seriously? Limits? Have you been in the cloud lately? The most likely
    limitation to any system today is an inadequacy of imagination by its
    architect.
    Antanas Kompanas a.kompanas@baltic-amadeus.lt via peg.com
    Feb 2
    to peg
    On 02/02/12 12:49, Gregory Higgins wrote:
    > Seriously? Limits? Have you been in the cloud lately?

    Not everyone is happy enough to reach the clouds. Some of us here need
    to write apps for earth.

    Seriously, clound services are still too expensive (budget limit) and
    have the unfortunate property of being remote (internet availability,
    latency and throughput limit).
    Scott Augé sauge@amduus.com via peg.com
    Feb 2
    to peg
    Not to mention interesting legal principles and liabilities... especially if housed in a foreign country.
    Rick Terrell terrell.rick@gmail.com via peg.com
    Feb 2
    to peg
    Not to mention also, that some customers absolutely will not put their
    data elsewhere (like Law Enforcement Agencies)...

  • My primary objection against using technical keys is that when there is a functional key available (invoice number, debtor number, g/l account) that any table referencing this data must always join to referential table to be able to show or sort on anything meaningful. Imagine a g/l transaction table with debtor numbers and invoice numbers. The query for showing the results sorted by debtor by invoice number is simply:

    FOR EACH gltrans

       WHERE ...

    BY debtor BY invoice_nr:

       ...

    END.

    When using technical keys this becomes:

    FOR EACH gltrans

       WHERE ...

    , debtor

       WHERE debtor.guid = gltrans.debtor_guid

    , invoice

       WHERE invoice.guid = gltrans.invoice_guid

    BY debtor.debtor_no BY invoice.invoice_no:

       ...

    END.

    I get the impression that result set based databases (SQL Server / Oracle) are very good at doing this. I am not so sure about ABL.

  • Well, yes, but the minute you want the name or default sign or anything else out of the account, then you have to do the join anyway.  The only time you can avoid the join is when the simple account number is all you want and you need no more information.

    One of the big attractions of *not* using functional keys is that it is much more forgiving about mistakes.    Assign the wrong customer number, create a few dozen transactions, and then fixing your mistake becomes "exciting" if you have used the customer number as the link.  But, used a GUID as a link, then one can just change the number and all is well.

    The big attraction of GUID over sequence is that GUID is unique to the record across all tables.  Sequence is not.  To be sure, there is no sensible link in which one would confuse the value across tables, but using GUID it is only going to be a link if it is a valid link.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • tamhas wrote:

    Well, yes, but the minute you want the name or default sign or anything else out of the account, then you have to do the join anyway.  The only time you can avoid the join is when the simple account number is all you want and you need no more information.

    I mentioned two issues. For referential data, yes a join is necessary - and I have no problems with that. The problem I do have is with sorting.

    tamhas wrote:

    One of the big attractions of *not* using functional keys is that it is much more forgiving about mistakes.    Assign the wrong customer number, create a few dozen transactions, and then fixing your mistake becomes "exciting" if you have used the customer number as the link.  But, used a GUID as a link, then one can just change the number and all is well.

    True, but once again when sorting it gets messy. It becomes a balancing act between how often do you need to report and how often do you need to fix mistakes.

    For referntial links I am completely fine and can admire the virtue of a technical key. I am however very worried about the performance impact on sorting and selection. In the one key sort you can take your referential data to the outside part of the query and inner join to your transaction table, but sort on two or more referential technical keys and you are stuck with having to scan through all your transactions to get transactions on debtors between 1000 and 4000 and g/l accounts between 8000 and 8070 (for example).

    If this concern can be taken away - I would be happy to hear it!

    Three of our corporate products, which are SQL Server only, do all this guid joining. But it's SQL Server. It's a result set based database, not a record based database.

    Sure, we need complex recoding logic. But recoding is the exception.

  • Sorting and selection are partly a question of how one roles it... 

    For starters, one might wonder in what context is a range of account numbers a genuinely interesting thing.

    And, of course, there is the direction of the join.   If one wants all transactions within a certain account number range, as one might doing the general ledger, for example, one *could* start with the transactions and that would work efficiently only if the account number itself was in the transaction record.  But, in a general ledger, one *is* going to want to display things like the account name normal CR/DR etc. so one is going to need the join anyway.  Knowing that one needs the join, one can select based on the account table and join efficiently to the transaction table on the GUID.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • tamhas wrote:

    Sorting and selection are partly a question of how one roles it... 

    For starters, one might wonder in what context is a range of account numbers a genuinely interesting thing.

    Up till know our customers think multiple selections are genuinely interesting, so I do not need to wonder about this. Please show me all g/l transactions on g/l accounts x thru y, project a thru c, cost center e and f, cost categories a and b.

    tamhas wrote:

    And, of course, there is the direction of the join.   If one wants all transactions within a certain account number range, as one might doing the general ledger, for example, one *could* start with the transactions and that would work efficiently only if the account number itself was in the transaction record.  But, in a general ledger, one *is* going to want to display things like the account name normal CR/DR etc. so one is going to need the join anyway.  Knowing that one needs the join, one can select based on the account table and join efficiently to the transaction table on the GUID.

    Which is fine for one selection / sorting criterium as I mentioned in my previous post, but fails miserably for multiple selection / sorting criteria. There is a big difference in joining to simply get additional referential data and sorting on primary data.

  • Given the context of reporting such as you describe, it is hard for me to imagine how one avoids the join.  If one has the join, it is a problem in query optimization.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • I find the possibility to assign the PUK on the client (java/.net/progress) a big plus too. But I would be interested in some performanceconsiderations, did anyone of you do some tests?

  • FWIW: We talked about a GUID as part of the Progress Exchange 2014 "InfoExchange" session.  Possible outcome could be a Guid datatype (16 byte binary) to optimize storage.  Currently we store a guid in a character string which would be 32bytes (with the dashes removed).  There was talk about sequential GUIDs also to optimize using as a database key field.

    I'm going to create a Progress "Idea" Enhancement so feel free to vote/comment on that thread.

    community.progress.com/.../abl_datatype_-_guid.aspx

  • The time I had a look I found the sequential GUIDs most interesting. For those new to the matter: www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database

  • Consider this as an attempt to have a serious discussion with Mr. Houtzager.:
     
    Have you tried to implement this in ABL already in your framework? How is the runtime of the GUID generator compared to the plain GUID function?

     

    Architect of the SmartComponent Library and WinKit

    Consultingwerk Ltd.

  • Mike,

    I leave it to others to see your attempt as serious or not (in case not they might not dare to say so). I for myself think you're far too smart to ask this seriously. You try to test my knowledge I think. That's ok, but I think what I can answer (clustered indexes do not exist in the openedge db, some cut and paste comments of others like "Is the slower creation a concern?  Personally, I find it acceptable.  Unless your application involves very frequent inserts (in which case a GUID key may not be ideal for other reasons), the cost of occasional GUID creation will pale in comparison to the benefits of faster database operations." would not add much to the discussions that have taken place already on the peg and at exchange (I assume). Besides I'm not a db guru. You better discuss this with f.e. Gus and Tom.

  • Believe it or not – I am really curious to see it this has real benefit.
    And anybody following the ABL documentation should be able to use the C# solution from within the ABL – hence my interest if it was tried already.

    Architect of the SmartComponent Library and WinKit

    Consultingwerk Ltd.

  • According to what is written in the link I sent: www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database you can expect the comb guid creation to be a bit slower than the plain guid as we have it now. The clustered indexes would be needed for the real benefits: faster reads. I'll copy a discussion from the peg, you can find more info there. And maybe some who involved in the discussion at exchange want to explain more.

    Re: clustered indexesImportant mainly because of your interaction with messages in the conversation.

    Click to teach Gmail this conversation is not important.

    pegx

    Tom BascomOn 6/4/14, 2:39 AM, Stefan Houtzager wrote: > > Has the possibity to mark an ...

    Jun 4

    On 6/4/14, 2:39 AM, Stefan Houtzager wrote: > > Has the possibity to mark an ...

    Tom Bascom  

    Jun 4

    to peg

    On 6/4/14, 2:39 AM, Stefan Houtzager wrote:

    Has the possibity to mark an index as clustered in a progress db been

    considered bij psc by the way?

    A clustered index is one where the data is arranged in index order on disk.  (The name strikes me as misleading -- "clustered data" seems more appropriate...)

    There can only be one such index per table because the data can only be in one "physical" order.

    When reading data in the clustered index order you can mostly skip reading the index -- so it is much faster (sort of like the oe11 TABLE-SCAN option except that the data is ordered).

    But when writing it can be *very* expensive -- if you insert a record in the middle everything above it has to be shifted.

    Personally I don't see the feature as being very attractive.  I suspect that it gets a lot of attention because in the Microsoft world they get created automatically and they are a way to do fast table scans.  So it is "received wisdom" that clustered indexes are an important feature.  Somewhat like the idea that fields should have a  known width that is never exceeded ;)

    The ordering part is, maybe, occasionally useful but the write-performance trade-off seems far too high to me.  I also suspect that a great many table scans either don't care about the ordering or will need a sort order different from such a key.  My understanding is that with MS SQL these are often found in association with the "primary" key which in most Progress databases that I have seen is probably not the order that you are likely to want data in.  Lastly -- if a GUID is used for the key to cluster around it would likely be an even less useful sort order.  Even if the GUID were to be a monotonically increasing sort order the replication/merge use-case, which IMHO is the strongest reason to consider GUIDs as keys, makes the order attribute basically useless.

    One Progress-specific issue that immediately springs to mind when thinking about Progress is that RECIDs would become volatile even within a session (not just across dump & load etc).  You could read a record, someone could insert another record in the table, and your RECID would no longer point to what it used to point to.

    --

    Tom Bascom

    Reply Reply to all Forward

    Rob FitzpatrickFollowing that train of thought re volatile recids, if an insert of one recor...

    Jun 4

    Following that train of thought re volatile recids, if an insert of one recor...

    Rob FitzpatrickJun 4

    Loading...

    Rob Fitzpatrick  

    Jun 4

    to peg

    Following that train of thought re volatile recids, if an insert of one

    record caused a change in the recids of all the following records (in

    clustered index order) then wouldn't any index entries for those records

    in other indexes on that table also require the updated recids?  If so I

    wonder how SQL Server deals with that.  Or are their data structures and

    physical storage layout so different from Progress' that the question

    isn't meaningful?

    -----Original Message-----

    From: Tom Bascom

    One Progress-specific issue that immediately springs to mind when

    thinking about Progress is that RECIDs would become volatile even within

    a session (not just across dump & load etc).  You could read a record,

    someone could insert another record in the table, and your RECID would

    no longer point to what it used to point to.

    Reply Reply to all Forward

    Gus Bjorklundwhat makes you guys think that everything else moves when you add a row to a ...

    Jun 4

    what makes you guys think that everything else moves when you add a row to a ...

    Gus BjorklundJun 4

    Loading...

    Gus Bjorklund

    Jun 4

    to peg

    On 6/4/14 9:17 AM, "Rob Fitzpatrick"  wrote:

    >Following that train of thought re volatile recids, if an insert of one

    >record caused a change in the recids of all the following records (in

    >clustered index order) then wouldn't any index entries for those records

    >in other indexes on that table also require the updated recids?  If so I

    >wonder how SQL Server deals with that.  Or are their data structures and

    >physical storage layout so different from Progress' that the question

    >isn't meaningful?

    >

    what makes you guys think that everything else moves when you add a row to

    a table that has a clustered index ?

    -gus

    Reply Reply to all Forward

    12 older messagesTim KuehnFrom the definition - "data is in index order on disk"

    Jun 4

    From the definition - "data is in index order on disk"

    Tim KuehnJun 4

    Loading...

    Tim Kuehn

    Jun 4

    to peg

    On Wed, Jun 4, 2014 at 9:45 AM, Gus Bjorklund wrote:

    > what makes you guys think that everything else moves when you add a row to

    > a table that has a clustered index ?

    From the definition - "data is in index order on disk"

    --

    Tim Kuehn:  Senior Consultant  - TDK Consulting Services

    Ontario PUG President

    PUG Challenge Americas Executive, Program Committee Chair

    Reply Reply to all Forward

    Rob FitzpatrickGus, I don't know how clustered indexes work. That's why I wrote my question ...

    Jun 4

    Gus, I don't know how clustered indexes work. That's why I wrote my question ...

    Rob FitzpatrickJun 4

    Loading...

    Rob Fitzpatrick  

    Jun 4

    to peg

    Gus,

    I don't know how clustered indexes work.  That's why I wrote my question

    in conditional form.  If the condition is false then my question is moot

    and can be ignored.

    -----Original Message-----

    what makes you guys think that everything else moves when you add a row

    to a table that has a clustered index ?

    -gus

    Reply Reply to all Forward

    DavidMSSQL will leave some gaps, especially when using something like a guid. I gu...

    Jun 4

    MSSQL will leave some gaps, especially when using something like a guid. I gu...

    DavidJun 4

    Loading...

    DavidMSSQL will leave some gaps, especially when using something like a guid. I gu...

    Jun 4

    MSSQL will leave some gaps, especially when using something like a guid. I gu...

    David

    Jun 4

    to peg

    On 6/4/2014 8:51 AM, Rob Fitzpatrick wrote:

    Gus,

    I don't know how clustered indexes work.  That's why I wrote my question

    in conditional form.  If the condition is false then my question is moot

    and can be ignored.

    -----Original Message-----

    what makes you guys think that everything else moves when you add a row

    to a table that has a clustered index ?

    -gus

    MSSQL will leave some gaps, especially when using something like a guid.   I guess its similar to create/toss limits in Progress.  Guids are a poor choice for a clustered key since they are effectively random.   Also, the clustered key becomes the implied last component of all other indexes.  The ideal clustered key will be a monotonically increasing integer (eg. an identity column or a sequence assigned column)

    Reply Reply to all Forward

    Tom BascomBecause that's what the Wiki that I found for "clustered indexes" says happen...

    Jun 4

    Because that's what the Wiki that I found for "clustered indexes" says happen...

    Tom BascomJun 4

    Loading...

    Tom Bascom  

    Jun 4

    to peg

    On 6/4/14, 10:12 AM, David wrote:

    On 6/4/2014 8:51 AM, Rob Fitzpatrick wrote:

    Gus,

    I don't know how clustered indexes work.  That's why I wrote my question

    in conditional form.  If the condition is false then my question is moot

    and can be ignored.

    -----Original Message-----

    what makes you guys think that everything else moves when you add a row

    to a table that has a clustered index ?

    Because that's what the Wiki that I found for "clustered indexes" says happens.  Various other material says the same thing.

    It passes the sniff test because of the "stored in order on disk" characteristic.

    I suppose it would not be a problem if records are always appended but I did say "insert a record in the middle".

    -gus

    MSSQL will leave some gaps, especially when using something like a guid.   I guess its similar to create/toss limits in Progress. Guids are a poor choice for a clustered key since they are effectively random.   Also, the clustered key becomes the implied last component of all other indexes.  The ideal clustered key will be a monotonically increasing integer (eg. an identity column or a sequence assigned column)

    So maybe it doesn't *always* cause everything to move but eventually gaps will get filled and when they do it is going to be really painful.

    Sounds like a bad idea to me.

    --

    Tom Bascom

    Reply Reply to all Forward

    DavidLogically, the rows are ordered, but not necessarily physically. That is why ...

    Jun 4

    Logically, the rows are ordered, but not necessarily physically. That is why ...

    DavidJun 4

    Loading...

    David  

    Jun 4

    to peg

    Logically, the rows are ordered, but not necessarily physically.  That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL.

    On 6/4/2014 9:44 AM, Tom Bascom wrote:

    On 6/4/14, 10:12 AM, David wrote:

    On 6/4/2014 8:51 AM, Rob Fitzpatrick wrote:

    Gus,

    I don't know how clustered indexes work.  That's why I wrote my question

    in conditional form.  If the condition is false then my question is moot

    and can be ignored.

    -----Original Message-----

    what makes you guys think that everything else moves when you add a row

    to a table that has a clustered index ?

    Because that's what the Wiki that I found for "clustered indexes" says happens.  Various other material says the same thing.

    It passes the sniff test because of the "stored in order on disk" characteristic.

    I suppose it would not be a problem if records are always appended but I did say "insert a record in the middle".

    -gus

    MSSQL will leave some gaps, especially when using something like a guid.   I guess its similar to create/toss limits in Progress. Guids are a poor choice for a clustered key since they are effectively random.   Also, the clustered key becomes the implied last component of all other indexes.  The ideal clustered key will be a monotonically increasing integer (eg. an identity column or a sequence assigned column)

    So maybe it doesn't *always* cause everything to move but eventually gaps will get filled and when they do it is going to be really painful.

    Sounds like a bad idea to me.

    Reply Reply to all Forward

    Thomas Mercer-Hursh, Ph.D.Is it, perhaps, the case that clustered index is used only on table which are...

    Jun 4

    Is it, perhaps, the case that clustered index is used only on table which are...

    Tom BascomSorry, you lost me. I would have thought that the index is the "logical" part...

    Jun 4

    Sorry, you lost me. I would have thought that the index is the "logical" part...

    Tom BascomJun 4

    Loading...

    Tom Bascom  

    Jun 4

    to peg

    On 6/4/14, 10:59 AM, David wrote:

    Logically, the rows are ordered, but not necessarily physically.  That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL.

    Sorry, you lost me.  I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion.

    Can you elaborate?

    --

    Tom Bascom

    Reply Reply to all Forward

    DavidI can't recall the specifics, but there is some kind of pointer left where th...

    Jun 4

    I can't recall the specifics, but there is some kind of pointer left where th...

    DavidJun 4

    Loading...

    David

    Jun 4

    to peg

    On 6/4/2014 10:07 AM, Tom Bascom wrote:

    On 6/4/14, 10:59 AM, David wrote:

    Logically, the rows are ordered, but not necessarily physically.  That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL.

    Sorry, you lost me.  I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion.

    Can you elaborate?

    I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be.  Like a "jump to this block".

    Reply Reply to all Forward

    Gus Bjorklundthere are a variety of ways to implement storage for clustered index tables. ...

    Jun 4

    there are a variety of ways to implement storage for clustered index tables. ...

    Gus BjorklundJun 4

    Loading...

    Gus Bjorklund

    Jun 4

    to peg

    On 6/4/14 11:07 AM, "Tom Bascom"  wrote:

    >On 6/4/14, 10:59 AM, David wrote:

    >> Logically, the rows are ordered, but not necessarily physically.  That

    >> is why there is a need to periodically do a rebuild/reorg of the

    >> tables in MSSQL.

    >

    >Sorry, you lost me.  I would have thought that the index is the

    >"logical" part of things and that the ordering in the file is the "on

    >disk" or "physical" portion.

    >

    >Can you elaborate?

    >

    >

    there are a variety of ways to implement storage for clustered index

    tables. and for handling inserts and page splits. you don't need to

    reorganize the whole table for every insert. that would  negate whatever

    advantage you might get from the clustering.

    note that "physical order" does not have much meaning with modern disk and

    disk arrays. nobody stores things in disk sector order anymore. clustered

    indexes serve to reduce the I/O overhead but don't have to be perfectly

    optimized. good enough is good enough.

    regards, gus

    "Culture eats strategy for breakfast." -- Peter Drucker

    Reply Reply to all Forward

    Tom BascomThat would seem to defeat the whole "in order on disk" assumption that the be...

    Jun 4

    That would seem to defeat the whole "in order on disk" assumption that the be...

    Tom BascomJun 4

    Loading...

    Tom Bascom  

    Jun 4

    to peg

    On 6/4/14, 11:13 AM, David wrote:

    On 6/4/2014 10:07 AM, Tom Bascom wrote:

    On 6/4/14, 10:59 AM, David wrote:

    Logically, the rows are ordered, but not necessarily physically.  That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL.

    Sorry, you lost me.  I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion.

    Can you elaborate?

    I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be.  Like a "jump to this block".

    That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on.

    I guess that if it is a rare event that it might not be too horrible?

    --

    Tom Bascom

    Reply Reply to all Forward

    DavidYou have some control over how rare. If you have an ascending numeric key, yo...

    Jun 4

    You have some control over how rare. If you have an ascending numeric key, yo...

    DavidJun 4

    Loading...

    David

    Jun 4

    to peg

    On 6/4/2014 10:20 AM, Tom Bascom wrote:

    On 6/4/14, 11:13 AM, David wrote:

    On 6/4/2014 10:07 AM, Tom Bascom wrote:

    On 6/4/14, 10:59 AM, David wrote:

    Logically, the rows are ordered, but not necessarily physically.  That is why there is a need to periodically do a rebuild/reorg of the tables in MSSQL.

    Sorry, you lost me.  I would have thought that the index is the "logical" part of things and that the ordering in the file is the "on disk" or "physical" portion.

    Can you elaborate?

    I can't recall the specifics, but there is some kind of pointer left where the new middle row would logically be.  Like a "jump to this block".

    That would seem to defeat the whole "in order on disk" assumption that the benefits of clustered indexes are supposedly based on.

    I guess that if it is a rare event that it might not be too horrible?

    You have some control over how rare.  If you have an ascending numeric key, you can pack rows pretty tightly.  If you have something like a GUID, then you'd set the "fill %" accordingly when you rebuild the index.  This underscores why an plain GUID for a clustered key is not a great idea, since you have to have your fill % lower than other alternatives.  The article mentioned earlier in the thread describes a way around this by manipulating the GUID to be ordered of sorts.

    Reply Reply to all Forward

    Stefan Houtzager

    Jun 5

    to David, peg

    Seems like important content in the article I sent (

    www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

    is forgotten by some in this thread. "This article outlines an

    approach for using GUID values as primary keys/clustered indexes that

    avoids most of the normal disadvantages, adapting the COMB model for

    sequential GUIDs".

     So it is not about the normal guid with a clustered index.

    On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

    > On 6/4/2014 10:20 AM, Tom Bascom wrote:

    >>

    >> On 6/4/14, 11:13 AM, David wrote:

    >>>

    >>> On 6/4/2014 10:07 AM, Tom Bascom wrote:

    >>>>

    >>>> On 6/4/14, 10:59 AM, David wrote:

    >>>>>

    >>>>> Logically, the rows are ordered, but not necessarily physically.  That

    >>>>> is why there is a need to periodically do a rebuild/reorg of the tables in

    >>>>> MSSQL.

    >>>>

    >>>>

    >>>> Sorry, you lost me.  I would have thought that the index is the

    >>>> "logical" part of things and that the ordering in the file is the "on disk"

    >>>> or "physical" portion.

    >>>>

    >>>> Can you elaborate?

    >>>>

    >>> I can't recall the specifics, but there is some kind of pointer left

    >>> where the new middle row would logically be.  Like a "jump to this block".

    >>

    >>

    >> That would seem to defeat the whole "in order on disk" assumption that the

    >> benefits of clustered indexes are supposedly based on.

    >>

    >> I guess that if it is a rare event that it might not be too horrible?

    >>

    > You have some control over how rare.  If you have an ascending numeric key,

    > you can pack rows pretty tightly.  If you have something like a GUID, then

    > you'd set the "fill %" accordingly when you rebuild the index.  This

    > underscores why an plain GUID for a clustered key is not a great idea, since

    > you have to have your fill % lower than other alternatives.  The article

    > mentioned earlier in the thread describes a way around this by manipulating

    > the GUID to be ordered of sorts.

    >

    --

    Kind regards,

    Stefan Houtzager

    Houtzager ICT consultancy & development

    www.linkedin.com/in/stefanhoutzager

    Reply Reply to all Forward

    Tom Bascom  

    Jun 5

    to me, peg

    We didn't forget that.  It has been mentioned several times.  I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids.

    --

    Tom Bascom

    > On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote:

    >

    > Seems like important content in the article I sent (

    > www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

    > is forgotten by some in this thread. "This article outlines an

    > approach for using GUID values as primary keys/clustered indexes that

    > avoids most of the normal disadvantages, adapting the COMB model for

    > sequential GUIDs".

    >  So it is not about the normal guid with a clustered index.

    >

    >

    >> On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

    >>> On 6/4/2014 10:20 AM, Tom Bascom wrote:

    >>>

    >>>> On 6/4/14, 11:13 AM, David wrote:

    >>>>

    >>>>> On 6/4/2014 10:07 AM, Tom Bascom wrote:

    >>>>>

    >>>>>> On 6/4/14, 10:59 AM, David wrote:

    >>>>>>

    >>>>>> Logically, the rows are ordered, but not necessarily physically.  That

    >>>>>> is why there is a need to periodically do a rebuild/reorg of the tables in

    >>>>>> MSSQL.

    >>>>>

    >>>>>

    >>>>> Sorry, you lost me.  I would have thought that the index is the

    >>>>> "logical" part of things and that the ordering in the file is the "on disk"

    >>>>> or "physical" portion.

    >>>>>

    >>>>> Can you elaborate?

    >>>> I can't recall the specifics, but there is some kind of pointer left

    >>>> where the new middle row would logically be.  Like a "jump to this block".

    >>>

    >>>

    >>> That would seem to defeat the whole "in order on disk" assumption that the

    >>> benefits of clustered indexes are supposedly based on.

    >>>

    >>> I guess that if it is a rare event that it might not be too horrible?

    >> You have some control over how rare.  If you have an ascending numeric key,

    >> you can pack rows pretty tightly.  If you have something like a GUID, then

    >> you'd set the "fill %" accordingly when you rebuild the index.  This

    >> underscores why an plain GUID for a clustered key is not a great idea, since

    >> you have to have your fill % lower than other alternatives.  The article

    >> mentioned earlier in the thread describes a way around this by manipulating

    >> the GUID to be ordered of sorts.

    >

    >

    >

    > --

    > Kind regards,

    >

    > Stefan Houtzager

    >

    > Houtzager ICT consultancy & development

    >

    > www.linkedin.com/in/stefanhoutzager

    >

    >

    Reply Reply to all Forward

    Stefan Houtzager

    Jun 5

    to Tom, peg

    Ah, dummy me. David wrote "The article mentioned earlier in the thread

    describes a way around this by manipulating the GUID to be ordered of

    sorts." indeed.

    On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom  wrote:

    > We didn't forget that.  It has been mentioned several times.  I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids.

    >

    > --

    > Tom Bascom

    >

    >> On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote:

    >>

    >> Seems like important content in the article I sent (

    >> www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

    >> is forgotten by some in this thread. "This article outlines an

    >> approach for using GUID values as primary keys/clustered indexes that

    >> avoids most of the normal disadvantages, adapting the COMB model for

    >> sequential GUIDs".

    >>  So it is not about the normal guid with a clustered index.

    >>

    >>

    >>> On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

    >>>> On 6/4/2014 10:20 AM, Tom Bascom wrote:

    >>>>

    >>>>> On 6/4/14, 11:13 AM, David wrote:

    >>>>>

    >>>>>> On 6/4/2014 10:07 AM, Tom Bascom wrote:

    >>>>>>

    >>>>>>> On 6/4/14, 10:59 AM, David wrote:

    >>>>>>>

    >>>>>>> Logically, the rows are ordered, but not necessarily physically.  That

    >>>>>>> is why there is a need to periodically do a rebuild/reorg of the tables in

    >>>>>>> MSSQL.

    >>>>>>

    >>>>>>

    >>>>>> Sorry, you lost me.  I would have thought that the index is the

    >>>>>> "logical" part of things and that the ordering in the file is the "on disk"

    >>>>>> or "physical" portion.

    >>>>>>

    >>>>>> Can you elaborate?

    >>>>> I can't recall the specifics, but there is some kind of pointer left

    >>>>> where the new middle row would logically be.  Like a "jump to this block".

    >>>>

    >>>>

    >>>> That would seem to defeat the whole "in order on disk" assumption that the

    >>>> benefits of clustered indexes are supposedly based on.

    >>>>

    >>>> I guess that if it is a rare event that it might not be too horrible?

    >>> You have some control over how rare.  If you have an ascending numeric key,

    >>> you can pack rows pretty tightly.  If you have something like a GUID, then

    >>> you'd set the "fill %" accordingly when you rebuild the index.  This

    >>> underscores why an plain GUID for a clustered key is not a great idea, since

    >>> you have to have your fill % lower than other alternatives.  The article

    >>> mentioned earlier in the thread describes a way around this by manipulating

    >>> the GUID to be ordered of sorts.

    >>

    >>

    >>

    >> --

    >> Kind regards,

    >>

    >> Stefan Houtzager

    >>

    >> Houtzager ICT consultancy & development

    >>

    >> www.linkedin.com/in/stefanhoutzager

    >>

    >>

    --

    Kind regards,

    Stefan Houtzager

    Houtzager ICT consultancy & development

    www.linkedin.com/in/stefanhoutzager

    Reply Reply to all Forward

    Marian Edu

    Jun 5

    to me, peg

    have found this video to be quite helpful on explaining the differences - www.youtube.com/watch

    main idea, for a clustered index all record data is available (leaf index structure = table structure) while for non-clustered if values from non-index columns are needed the rowid will be used to get the record from table structure... however I don't think this should require reshuffling whole records when adding an 'out of order' record, meaning that might be one implementation but it's definitively a bad one :)

    didn't actually get the idea of the article though, depending on how rows gets stored in the heap using a sorted or random value for the field of clustered index (pk defaults or is mandatory clustered) the time required to insert a record might be higher but that is probably because of the algorithm the engine uses to find the right table structure to put it in and not because it has to rearrange all existing records in some cases.

    and beside, imho the performance impact of using guid (string) as pk is more on the read end than on insert/create one

    On 06/05/2014 02:09 PM, Stefan Houtzager wrote:

    Ah, dummy me. David wrote "The article mentioned earlier in the thread

    describes a way around this by manipulating the GUID to be ordered of

    sorts." indeed.

    On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom  wrote:

    We didn't forget that.  It has been mentioned several times.  I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids.

    --

    Tom Bascom

    On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote:

    Seems like important content in the article I sent (

    www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

    is forgotten by some in this thread. "This article outlines an

    approach for using GUID values as primary keys/clustered indexes that

    avoids most of the normal disadvantages, adapting the COMB model for

    sequential GUIDs".

     So it is not about the normal guid with a clustered index.

    On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

    On 6/4/2014 10:20 AM, Tom Bascom wrote:

    On 6/4/14, 11:13 AM, David wrote:

    On 6/4/2014 10:07 AM, Tom Bascom wrote:

    On 6/4/14, 10:59 AM, David wrote:

    Logically, the rows are ordered, but not necessarily physically.  That

    is why there is a need to periodically do a rebuild/reorg of the tables in

    MSSQL.

    Sorry, you lost me.  I would have thought that the index is the

    "logical" part of things and that the ordering in the file is the "on disk"

    or "physical" portion.

    Can you elaborate?

    I can't recall the specifics, but there is some kind of pointer left

    where the new middle row would logically be.  Like a "jump to this block".

    That would seem to defeat the whole "in order on disk" assumption that the

    benefits of clustered indexes are supposedly based on.

    I guess that if it is a rare event that it might not be too horrible?

    You have some control over how rare.  If you have an ascending numeric key,

    you can pack rows pretty tightly.  If you have something like a GUID, then

    you'd set the "fill %" accordingly when you rebuild the index.  This

    underscores why an plain GUID for a clustered key is not a great idea, since

    you have to have your fill % lower than other alternatives.  The article

    mentioned earlier in the thread describes a way around this by manipulating

    the GUID to be ordered of sorts.

    --

    Kind regards,

    Stefan Houtzager

    Houtzager ICT consultancy & development

    www.linkedin.com/in/stefanhoutzager

    --

    m.edu

    keep it simple

    http://www.ganimede.ro

    ro.linkedin.com/.../marianedu

    Reply Reply to all Forward

    Stefan Houtzager

    Jun 5

    to Marian, peg

    > the performance impact of using guid (string) as pk is more on the read end than on insert/create one

    And that's where the 'comb guid' with the clustered index comes in

    (see the article).

    On Thu, Jun 5, 2014 at 2:38 PM, Marian Edu wrote:

    > have found this video to be quite helpful on explaining the differences -

    > www.youtube.com/watch

    >

    > main idea, for a clustered index all record data is available (leaf index

    > structure = table structure) while for non-clustered if values from

    > non-index columns are needed the rowid will be used to get the record from

    > table structure... however I don't think this should require reshuffling

    > whole records when adding an 'out of order' record, meaning that might be

    > one implementation but it's definitively a bad one :)

    >

    > didn't actually get the idea of the article though, depending on how rows

    > gets stored in the heap using a sorted or random value for the field of

    > clustered index (pk defaults or is mandatory clustered) the time required to

    > insert a record might be higher but that is probably because of the

    > algorithm the engine uses to find the right table structure to put it in and

    > not because it has to rearrange all existing records in some cases.

    >

    > and beside, imho the performance impact of using guid (string) as pk is more

    > on the read end than on insert/create one

    >

    >

    > On 06/05/2014 02:09 PM, Stefan Houtzager wrote:

    >>

    >> Ah, dummy me. David wrote "The article mentioned earlier in the thread

    >> describes a way around this by manipulating the GUID to be ordered of

    >> sorts." indeed.

    >>

    >> On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom  wrote:

    >>>

    >>> We didn't forget that.  It has been mentioned several times.  I changed

    >>> the subject text though because topic of clustered indexes is interesting on

    >>> its own and comes up once in a while and I wanted to talk about it more

    >>> generally - not necessarily in relation to guids.

    >>>

    >>> --

    >>> Tom Bascom

    >>>

    >>>> On Jun 5, 2014, at 3:05 AM, Stefan Houtzager

    >>>> wrote:

    >>>>

    >>>> Seems like important content in the article I sent (

    >>>>

    >>>> www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

    >>>> is forgotten by some in this thread. "This article outlines an

    >>>> approach for using GUID values as primary keys/clustered indexes that

    >>>> avoids most of the normal disadvantages, adapting the COMB model for

    >>>> sequential GUIDs".

    >>>>   So it is not about the normal guid with a clustered index.

    >>>>

    >>>>

    >>>>> On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

    >>>>>>

    >>>>>> On 6/4/2014 10:20 AM, Tom Bascom wrote:

    >>>>>>

    >>>>>>> On 6/4/14, 11:13 AM, David wrote:

    >>>>>>>

    >>>>>>>> On 6/4/2014 10:07 AM, Tom Bascom wrote:

    >>>>>>>>

    >>>>>>>>> On 6/4/14, 10:59 AM, David wrote:

    >>>>>>>>>

    >>>>>>>>> Logically, the rows are ordered, but not necessarily physically.

    >>>>>>>>> That

    >>>>>>>>> is why there is a need to periodically do a rebuild/reorg of the

    >>>>>>>>> tables in

    >>>>>>>>> MSSQL.

    >>>>>>>>

    >>>>>>>>

    >>>>>>>> Sorry, you lost me.  I would have thought that the index is the

    >>>>>>>> "logical" part of things and that the ordering in the file is the

    >>>>>>>> "on disk"

    >>>>>>>> or "physical" portion.

    >>>>>>>>

    >>>>>>>> Can you elaborate?

    >>>>>>>

    >>>>>>> I can't recall the specifics, but there is some kind of pointer left

    >>>>>>> where the new middle row would logically be.  Like a "jump to this

    >>>>>>> block".

    >>>>>>

    >>>>>>

    >>>>>> That would seem to defeat the whole "in order on disk" assumption that

    >>>>>> the

    >>>>>> benefits of clustered indexes are supposedly based on.

    >>>>>>

    >>>>>> I guess that if it is a rare event that it might not be too horrible?

    >>>>>

    >>>>> You have some control over how rare.  If you have an ascending numeric

    >>>>> key,

    >>>>> you can pack rows pretty tightly.  If you have something like a GUID,

    >>>>> then

    >>>>> you'd set the "fill %" accordingly when you rebuild the index.  This

    >>>>> underscores why an plain GUID for a clustered key is not a great idea,

    >>>>> since

    >>>>> you have to have your fill % lower than other alternatives.  The

    >>>>> article

    >>>>> mentioned earlier in the thread describes a way around this by

    >>>>> manipulating

    >>>>> the GUID to be ordered of sorts.

    >>>>

    >>>>

    >>>>

    >>>> --

    >>>> Kind regards,

    >>>>

    >>>> Stefan Houtzager

    >>>>

    >>>> Houtzager ICT consultancy & development

    >>>>

    >>>> www.linkedin.com/in/stefanhoutzager

    >>>>

    >>>>

    >>

    >>

    >

    >

    > --

    > m.edu

    > keep it simple

    > http://www.ganimede.ro

    > ro.linkedin.com/.../marianedu

    --

    Kind regards,

    Stefan Houtzager

    Houtzager ICT consultancy & development

    www.linkedin.com/in/stefanhoutzager

    Reply Reply to all Forward

    Tom Bascom  

    Jun 5

    to peg

    Interesting.  I don't think I followed everything.

    I especially don't understand the point of apparently duplicating the data in both the index and in the table.  I'm going to need to go back and watch it again.

    On 6/5/14, 8:38 AM, Marian Edu wrote:

    have found this video to be quite helpful on explaining the differences - www.youtube.com/watch

    main idea, for a clustered index all record data is available (leaf index structure = table structure) while for non-clustered if values from non-index columns are needed the rowid will be used to get the record from table structure... however I don't think this should require reshuffling whole records when adding an 'out of order' record, meaning that might be one implementation but it's definitively a bad one :)

    didn't actually get the idea of the article though, depending on how rows gets stored in the heap using a sorted or random value for the field of clustered index (pk defaults or is mandatory clustered) the time required to insert a record might be higher but that is probably because of the algorithm the engine uses to find the right table structure to put it in and not because it has to rearrange all existing records in some cases.

    and beside, imho the performance impact of using guid (string) as pk is more on the read end than on insert/create one

    On 06/05/2014 02:09 PM, Stefan Houtzager wrote:

    Ah, dummy me. David wrote "The article mentioned earlier in the thread

    describes a way around this by manipulating the GUID to be ordered of

    sorts." indeed.

    On Thu, Jun 5, 2014 at 12:53 PM, Tom Bascom  wrote:

    We didn't forget that.  It has been mentioned several times.  I changed the subject text though because topic of clustered indexes is interesting on its own and comes up once in a while and I wanted to talk about it more generally - not necessarily in relation to guids.

    --

    Tom Bascom

    On Jun 5, 2014, at 3:05 AM, Stefan Houtzager wrote:

    Seems like important content in the article I sent (

    www.codeproject.com/.../GUIDs-as-fast-primary-keys-under-multiple-database)

    is forgotten by some in this thread. "This article outlines an

    approach for using GUID values as primary keys/clustered indexes that

    avoids most of the normal disadvantages, adapting the COMB model for

    sequential GUIDs".

     So it is not about the normal guid with a clustered index.

    On Wed, Jun 4, 2014 at 5:44 PM, David wrote:

    On 6/4/2014 10:20 AM, Tom Bascom wrote:

    On 6/4/14, 11:13 AM, David wrote:

    On 6/4/2014 10:07 AM, Tom Bascom wrote:

    On 6/4/14, 10:59 AM, David wrote:

    Logically, the rows are ordered, but not necessarily physically.  That

    is why there is a need to periodically do a rebuild/reorg of the tables in

    MSSQL.

    Sorry, you lost me.  I would have thought that the index is the

    "logical" part of things and that the ordering in the file is the "on disk"

    or "physical" portion.

    Can you elaborate?

    I can't recall the specifics, but there is some kind of pointer left

    where the new middle row would logically be.  Like a "jump to this block".

    That would seem to defeat the whole "in order on disk" assumption that the

    benefits of clustered indexes are supposedly based on.

    I guess that if it is a rare event that it might not be too horrible?

    You have some control over how rare.  If you have an ascending numeric key,

    you can pack rows pretty tightly.  If you have something like a GUID, then

    you'd set the "fill %" accordingly when you rebuild the index.  This

    underscores why an plain GUID for a clustered key is not a great idea, since

    you have to have your fill % lower than other alternatives.  The article

    mentioned earlier in the thread describes a way around this by manipulating

    the GUID to be ordered of sorts.

    --

    Kind regards,

    Stefan Houtzager

    Houtzager ICT consultancy & development

    www.linkedin.com/in/stefanhoutzager

    --

    Tom Bascom

    Reply Reply to all Forward

    Gus Bjorklundmaybe. there are tradeoffs, always these d**n tradeoffs. when multiple users ...

    Jun 5

    maybe. there are tradeoffs, always these d**n tradeoffs. when multiple users ...

    Gus BjorklundJun 5

    Loading...

    Gus Bjorklund

    Jun 5

    to peg

    On 6/5/14 8:38 AM, "Marian Edu" wrote:

    >and beside, imho the performance impact of using guid (string) as pk is

    >more on the read end than on insert/create one

    >

    >

    maybe.  there are tradeoffs, always these d**n tradeoffs.

    when multiple users are making inserts to the same index, if the key

    values are monotonically increasing then all the inserts can end up in the

    same leaf block of a b-tree.  then there will be contention and for locks

    on the index block.  the transactions serialize and you lose concurrency.

    regards, gus

    "Culture eats strategy for breakfast." -- Peter Drucker

    Reply Reply to all Forward

    Marian Edutrue, but my point was in normal circumstances one will read that data more t...

    Jun 5

    true, but my point was in normal circumstances one will read that data more t...

    Marian EduJun 5

    Loading...

    Marian Edu

    Jun 5

    to peg

    On 06/05/2014 05:29 PM, Gus Bjorklund wrote:

    On 6/5/14 8:38 AM, "Marian Edu" wrote:

    and beside, imho the performance impact of using guid (string) as pk is

    more on the read end than on insert/create one

    maybe.  there are tradeoffs, always these d**n tradeoffs.

    when multiple users are making inserts to the same index, if the key

    values are monotonically increasing then all the inserts can end up in the

    same leaf block of a b-tree.  then there will be contention and for locks

    on the index block.  the transactions serialize and you lose concurrency.

    true, but my point was in normal circumstances one will read that data more than once while there can only be one insert for the lifespan of an 'event'

    and that assertion holds true for all indexes, back to those darn trade-offs I guess... how often one insert data in a table, how often data will be read and how will help a designer/dba to put those on balance and try to find the lesser of two evils :)

    --

    m.edu

    keep it simple

    http://www.ganimede.ro

    ro.linkedin.com/.../marianedu

    mobile: +40 740 036 212

    skype: marian.edu

    Reply Reply to all Forward

    Tom BascomSome data is write-only. (Most implementations of auditing spring to mind...)

    Jun 5

    Some data is write-only. (Most implementations of auditing spring to mind...)

    Tom Bascom  

    Jun 5

    to peg

    true, but my point was in normal circumstances one will read that data more than once while there can only be one insert for the lifespan of an 'event'

    Some data is write-only.

    (Most implementations of auditing spring to mind...)

    --

    Tom Bascom

    Reply Reply to all Forward

    Marian Edu

    Jun 6

    to peg

    On 06/05/2014 09:33 PM, Tom Bascom wrote:

    true, but my point was in normal circumstances one will read that data more than once while there can only be one insert for the lifespan of an 'event'

    Some data is write-only.

    (Most implementations of auditing spring to mind...)

    yeah, but in those cases one could argue there is no need for a primary-key... the most probably index on date (time-stamp) will lead to theracing condition Gus was mentioned on the b-tree leaf anyway since time flow isn't too random unfortunately, always increasing :)

    but that makes me think... a good index must have a good distribution, does this mean a sequence should not be used as a primary key? An always incrementing number (like order number mentioned before) will lead to a skew index and not so great data distribution... well, luckily it's Friday :)

    as for clustered index, wouldn't aordinary b-tree index on all table's fields will be the exact equivalent(not considering limitations)... this will of course duplicate the whole table data (table structure and b-tree index)

    feel like those kind of thoughts would make a great camp fire subject at pug's/exchange events if kept at a level where developers can follow instead of flag it as `dba only` :)

    - what makes a good index

    - when enough is enough(don't over index)

    - composed key vs. single key index (when, why,will a composed key be enough for a query if all fields required are there or will fetch the whole record from table anyway)

  • The implementation for OpenEdge is very relevant.

    From the article:

    I could generate a million random GUIDs in 140 milliseconds, but sequential GUIDs took 2800 milliseconds

    With 11.4 it takes me 498 milliseconds to generate only 10,000 guids.

    def var ii as int no-undo.
    
    def var itime as int no-undo.
    
    itime = etime.
    
    do ii = 1 to 10000:
    
      guid.
    
    end.  
    
    message etime - itime view-as alert-box.