
A case for deferred TT creation

  • This DBI listing came from a development system after the client had problems with users on the production system "running out of space" on an HP-UX box running 10.1B.

    (user info) 168230912 Sep 30 11:56 /protemp/DBIa22729

    In this case, the user went to an order-entry screen and then did nothing. The net result is a DBI file of 168,230,912 bytes for a set of TTs that barely hold anything, or nothing at all!

    I'm not sure where the TTs that go into this file are coming from, either, as there are plenty of cases where TT definitions may be included in a procedure but never used.

    However, the time spent creating this DBI file may be the single root cause of why my client's procedure-managed code is taking so long to load.

  • That's a lot of TTs! It would be interesting to know how many.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • It may not be that many. If the DBI area is a Type II structure, then the DBI file size makes sense: for every TT (and possibly each of its associated indexes) there may be at least one cluster's worth of space required. If a cluster is a couple of MB in size, then getting to 160 MB may not take that much trying.

    What I find puzzling is why, in cases where a static TT is never referenced either explicitly or via a handle, the AVM would still make an entry in the DBI file for it - with the resulting space usage and the time spent writing it to disk. (A sketch of that scenario follows this post.)
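
    A minimal sketch of the scenario being described, purely for illustration - the procedure and table names are invented, and the behaviour noted in the comments is only what the posts above report:

        /* neverUsed.p - hypothetical example: a static TT that this
           procedure defines but never references.  Per the behaviour
           discussed above, the AVM still appears to reserve DBI space
           for it when the procedure is instantiated.                   */
        DEFINE TEMP-TABLE ttNeverUsed NO-UNDO
            FIELD custNum  AS INTEGER
            FIELD custName AS CHARACTER
            INDEX pkCust IS PRIMARY UNIQUE custNum.

        /* the rest of the procedure never CREATEs, FINDs, or takes a
           BUFFER against ttNeverUsed                                    */
        RUN doSomethingElse.

        PROCEDURE doSomethingElse:
            MESSAGE "No temp-table activity here." VIEW-AS ALERT-BOX.
        END PROCEDURE.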

  • I don't know if it is correct, but that is certainly an interesting model. One thinks of a non-instantiated TT as a tiny little thing, using very little memory, but if it is one per cluster in the DBI, then certainly it would add up pretty quickly. And it could well be that PSC never thought that anyone would use 50-100 TTs all at once, much less multiple hundreds (like the guy on PEG).

    Have you considered taking this and the other information to tech support to see if you can get an explanation?

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • I don't know if it is correct, but that is certainly an interesting model. One thinks of a non-instantiated TT as a tiny little thing, using very little memory ...

    Exactly! There should be some kind of placeholder in memory, and the TT instantiated when it's first used and not before.

    Have you considered taking this and the other information to tech support to see if you can get an explanation?

    Yep.

    "Not a bug, this is expected behavior, we're not going to change it, move along."

    They're actually the ones who told me that TTs are set up when a new program is run, which provided me with the clue needed to look for things like this rather impressively sized DBI file.

  • We have a smoking gun.....

    Tim

    ID: P122597

    Title: "The size of the DBI file is much bigger in OpenEdge 10"

    Created: 03/02/2007 Last Modified: 01/08/2008

    Status: Unverified

    Symptoms:

    1. The size of the DBI file is much bigger in OpenEdge 10

    2. The DBI file size is much bigger after upgrading to OpenEdge

    3. The session DBI file in Progress 9.x is typically a few K blocks in size

    4. In OpenEdge 10.x the DBI file is a few hundred K blocks for the same application code

    5. A temp table in OpenEdge increases the size of the DBI file much more than in Progress version 9

    Facts:

    1. All Supported Operating Systems

    2. OpenEdge 10.x

    Cause:

    The cause is that the DBI file utilizes Type II Storage Areas in OpenEdge 10. As such, the DBI file is allocated in clusters instead of blocks, and consequently uses much more disk space.

    The size of the DBI file can be estimated: it should be about 9 blocks * the number of ACTIVE temp tables (unless any one temp table contains more than 8 blocks of data).

    Also, be aware that the default block size of the temp tables has changed in OpenEdge 10.1B from 1K to 4K. This is to support the new large index key entries feature. So, if there are no plans to use large index key entries, then request a 1K temp table block size by specifying -tmpbsize 1 on client startup.

    Fixes:

    This is expected behavior.

    To reduce the size of the DBI file use the client startup parameter -tmpbsize set to 1.

    Notes:

    References to Written Documentation:

    Progress Solution(s):

    P81853, "Is there any advantage to using Type II Areas for a table when I "

    P81745, "Guidelines for Type II Area Blocks per Cluster setting with Progress OpenEdge 10.X?"
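
    As a rough illustration of the estimation formula quoted above (the temp-table count and block size here are assumptions for the sake of the arithmetic, not measurements from the system being discussed):

        /* dbiEstimate.p - back-of-the-envelope estimate based on the
           "about 9 blocks per ACTIVE temp-table" rule from P122597.
           The number of TTs and the block size are made-up inputs.      */
        DEFINE VARIABLE iActiveTTs AS INTEGER INITIAL 500  NO-UNDO. /* assumption */
        DEFINE VARIABLE iBlockSize AS INTEGER INITIAL 4096 NO-UNDO. /* -tmpbsize 4, the 10.1B default */

        MESSAGE "Approx. DBI size:" 9 * iBlockSize * iActiveTTs "bytes"
            VIEW-AS ALERT-BOX INFORMATION.

        /* 9 * 4096 * 500 = 18,432,000 bytes; the same estimate with
           -tmpbsize 1 drops to 9 * 1024 * 500 = 4,608,000 bytes.        */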

  • another one...

    ID: P134680

    Title: "Increasing temptable blocksize decreases performance due to DBI growth."

    Created: 08/29/2008 Last Modified: 08/29/2008

    Status: Unverified

    Symptoms:

    1. Increasing temptable blocksize decreases performance due to DBI growth.

    Facts:

    1. After upgrading to OpenEdge 10.1B the DBI size increases.

    2. Increasing tmpbsize from 1kb to 8kb increases DBI size substantially.

    Cause:

    Increasing the temptable blocksize will increase the amount initially allocated for temptables. The more temptables that are defined, the more memory/DBI space is consumed.

    Using an 8kb temptable blocksize will allocate approximately 8 times the amount compared to using a 1kb blocksize.

    Note that it is only the initial allocation that increases to this degree. Once the temptables actually get used, the sizes are more comparable.

    For example (DBI file sizes in bytes):

                       1 KB blocks     8 KB blocks
    Defining only          442,368       3,538,944
    Populating          15,884,288      17,629,184

    Due to the increase in temptable space allocation, if the DBI file is used much more heavily it can negatively impact performance, especially when large numbers of users are involved.

    In 10.1B the default temptable blocksize has been increased from 1kb to 4kb.

    Fixes:

    There are a couple of solutions to the problem:

    1. Decrease the temptable blocksize by using the -tmpbsize parameter:

    -tmpbsize 1

    2. Increase the amount of memory each client allocates for temptable buffers by using the -Bt parameter. The DBI file is only used once the temptable buffers have all been filled. Note that this memory is allocated for EACH CLIENT; increasing this parameter on a system that doesn't have adequate memory can itself hurt performance.

    The -Bt parameter specifies the number of buffers to allocate, each buffer is the size of a temptable block.

    In Progress versions up to 10.0B the default is -Bt 10.

    In Progress version 10.1A onward the default is -Bt 255.
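
    A small, hedged sketch for checking what a given client session is actually running with - useful when experimenting with the -tmpbsize and -Bt values mentioned above (any specific values, such as -tmpbsize 1 -Bt 1024, are illustrations, not recommendations):

        /* checkTmpSettings.p - show the startup parameters the session was
           launched with (so -tmpbsize / -Bt can be confirmed) and the
           directory where its DBI/SRT temp files are written.             */
        MESSAGE "Startup parameters:" SESSION:STARTUP-PARAMETERS SKIP
                "Temp file directory:" SESSION:TEMP-DIRECTORY
            VIEW-AS ALERT-BOX INFORMATION.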

  • Here's the difference running the same code:

    -tmpbsize 8

    52,887,552 Sep 30 17:17 DBIa27416

    -tmpbsize 1

    6,709,248 Sep 30 17:15 DBIa27253

  • So, what kind of performance are you seeing with the 1K size?

    Just thinking off the top of my head here, but it seems to me that if there are a lot of TTs defined, there is a good chance that many are either empty or have only a few records (or, at least, if there are a large number of active TTs all holding a lot of data, then one shouldn't be surprised at less-than-good performance). That being the case, it seems like going for a small block size would be indicated.

    Deferred instantiation would only actually help if there were many more TTs defined than were actually used. I suppose that is not a very ordinary programming model. Instead, I would think that if there were a TT (or group) that might or might not get used, then one would put it (them) into their own .p or .cls and only instantiate it when needed (a sketch of that approach follows this post).

    In any event, it does seem to produce a very strong argument against the use of a TT in a class to hold entity data since that is bound to result in a large number of very small TTs and thus a huge disk overhead.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com
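
    A rough sketch of the "own .p, instantiated only when needed" idea mentioned in the post above. All names are invented; the point is simply that the TT definition costs nothing in the session until the wrapper procedure is actually run:

        /* ttOrderCache.p - hypothetical wrapper that owns a temp-table.
           The TT is not instantiated until this .p is run, so its
           memory/DBI footprint is deferred until a caller needs it.      */
        DEFINE TEMP-TABLE ttOrderCache NO-UNDO
            FIELD orderNum    AS INTEGER
            FIELD orderStatus AS CHARACTER
            INDEX pkOrder IS PRIMARY UNIQUE orderNum.

        PROCEDURE addOrder:
            DEFINE INPUT PARAMETER piOrderNum AS INTEGER   NO-UNDO.
            DEFINE INPUT PARAMETER pcStatus   AS CHARACTER NO-UNDO.
            CREATE ttOrderCache.
            ASSIGN ttOrderCache.orderNum    = piOrderNum
                   ttOrderCache.orderStatus = pcStatus.
        END PROCEDURE.

        /* caller.p - instantiate the wrapper only when it is first needed */
        DEFINE VARIABLE hCache AS HANDLE NO-UNDO.

        IF NOT VALID-HANDLE(hCache) THEN
            RUN ttOrderCache.p PERSISTENT SET hCache.
        RUN addOrder IN hCache (42, "open").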

  • So, what kind of performance are you seeing with the 1K size?

    Operationally - I didn't try it; this was just to see if -tmpbsize 1 would shrink the DBI very much. I would think that the user could handle the operational slowdown compared to the time delay loading the program set.

    Just thinking off the top of my head here, but it seems to me that if there are a lot of TTs defined, there is a good chance that many are either empty or have only a few records (or, at least, if there are a large number of active TTs all holding a lot of data, then one shouldn't be surprised at less-than-good performance). That being the case, it seems like going for a small block size would be indicated.

    Yes.

    Deferred instantiation would only actually help if there were many more TTs defined than were actually used. I suppose that is not a very ordinary programming model.

    In our case, there are a number of cases where a set of related TTs are defined in a .tt file, even though the including program may only use one of those TTs, so there could be a number of un-used TTs created when the procedures are instantiated.

    Instead, I would think that if there were a TT (or group) that might or might not get used, then one would put it (them) into their own .p or .cls and only instantiate it when needed.

    If one wanted to get that fine-grained about it, sure. But I don't see that happening in a typical shop.

    In any event, it does seem to produce a very strong argument against the use of a TT in a class to hold entity data since that is bound to result in a large number of very small TTs and thus a huge disk overhead.

    Which is what I'll bet was going on when running code with a high TT count from the class instance generator I posted earlier.
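
    For anyone unfamiliar with the .tt include pattern being discussed, a hypothetical sketch (all names invented): several related TT definitions live in one include, and the including program only ever touches one of them, so the others are compiled in but never used.

        /* ttOrderSet.tt - hypothetical include holding a family of TTs */
        DEFINE TEMP-TABLE ttOrder NO-UNDO
            FIELD orderNum AS INTEGER
            INDEX pkOrder IS PRIMARY UNIQUE orderNum.

        DEFINE TEMP-TABLE ttOrderLine NO-UNDO
            FIELD orderNum AS INTEGER
            FIELD lineNum  AS INTEGER
            INDEX pkLine IS PRIMARY UNIQUE orderNum lineNum.

        DEFINE TEMP-TABLE ttOrderNote NO-UNDO
            FIELD orderNum AS INTEGER
            FIELD noteText AS CHARACTER.

        /* orderInquiry.p - pulls in the whole set but only uses ttOrder;
           ttOrderLine and ttOrderNote are compiled in yet never touched.  */
        {ttOrderSet.tt}

        CREATE ttOrder.
        ASSIGN ttOrder.orderNum = 1.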

  • So, are you going to try this at the customer with the problem? It would certainly be interesting to see the result.

    In our case, there are a number of cases where a set of related TTs are defined in a .tt file, even though the including program may only use one of those TTs, so there could be a number of un-used TTs created when the procedures are instantiated.

    Doesn't seem like the way I would design. Reminds me of the old days where people had these huge include files with a zillion shared variables and any one program might only access a couple of them.

    If one wanted to get that fine-grained about it, sure. But I don't see that happening in a typical shop.

    I don't suppose I can make any claims for typicity, but it is sure the way things are done around here. It is very, very rare these days that I use an include file of any form ... and the only instance I can think of off the top of my head in several years was a consequence of not having generics for the collection classes.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • So, are you going to try this at the customer with the problem? It would certainly be interesting to see the result.

    I'll certainly bring it up.

    In our case, there are a number of cases where a set of related TTs are defined in a .tt file, even though the including program may only use one of those TTs, so there could be a number of un-used TTs created when the procedures are instantiated.

    Doesn't seem like the way I would design. Reminds me of the old days where people had these huge include files with a zillion shared variables and any one program might only access a couple of them.

    With associated TTs defined in a file, there are no 'zillions' of things to maintain in different places like with shared vars.

    If that approach weren't followed, then it would have to be a bunch of base TT definitions, and then a multitude of different includes to get the different TT combinations. Or multiple includes in a given parent file, one for each TT - but that would require remembering those combinations when passing multiple TTs to an API.

    If one wanted to get that fine-grained about it, sure. But I don't see that happening in a typical shop.

    I don't suppose I can make any claims for typicity, but it is sure the way things are done around here. It is very, very rare these days that I use an include file of any form ... and the only instance I can think of off the top of my head in several years was a consequence of not having generics for the collection classes.

    So how do you pass TTs between APIs? Does a code generator define the TT in a class, and then you use the class to manage the TT?

  • So how do you pass TTs between APIs?

    Current theory is to pass XML if over a wire or the enclosing class if not.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com
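
    A hedged sketch of the "pass XML over the wire" half of that answer, using the standard WRITE-XML and READ-XML methods on the temp-table object handle. The table, field, and variable names are invented, and in a real application the receiving side would normally define (or verify) a matching schema:

        /* Serialize a TT to XML on the sending side and load it on the
           receiving side; both halves are shown together only for
           illustration.                                                  */
        DEFINE TEMP-TABLE ttOrder NO-UNDO
            FIELD orderNum AS INTEGER
            FIELD custNum  AS INTEGER.

        DEFINE VARIABLE lcPayload AS LONGCHAR NO-UNDO.

        /* sender: turn the TT contents into an XML document */
        CREATE ttOrder.
        ASSIGN ttOrder.orderNum = 42
               ttOrder.custNum  = 7.
        TEMP-TABLE ttOrder:WRITE-XML("LONGCHAR", lcPayload, TRUE).

        /* receiver: empty the local copy and load the XML payload */
        TEMP-TABLE ttOrder:READ-XML("LONGCHAR", lcPayload, "EMPTY", ?, ?).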

  • Current theory is to pass XML if over a wire or the enclosing class if not.

    Which is why I do things the way I do - there's no "enclosing" class in this application.

    Yet.

  • Doesn't need to be a .cls. A .p is just as enclosing.

    The question to ask here is whether or not there is a sound reason to have large numbers of temp-tables. If not, then discovering this "property" - that large numbers of temp-tables use up large amounts of disk and slow performance - is an interesting fact, but not really a problem that needs to be solved. If there were a large number of TTs with a lot of data, then the problem is inherently complex and one probably has modest expectations of performance. Moreover, lazy instantiation wouldn't help. Further, a smaller block size probably wouldn't help either.

    Your issue, though, is that you have a large number of temp-tables, many of which are not being used. You don't expect the performance issue because you aren't actually doing that much, so for you lazy instantiation would help and a smaller block size might provide some level of help ... we hope that is the case and are looking forward to hearing what you find.

    The key question, though, is whether PSC should invest the work necessary to move to lazy instantiation. I don't know how complex this is, but it sounds like it might be difficult. So, we have situations where people have wanted to use large numbers of temp-tables, like that fellow on PEG who put one in every entity object. Lazy instantiation doesn't help there since there is actually data in them. All that would help there is not allocating disk space until it was necessary to overflow memory and giving it enough memory to work with.

    So, I guess we actually have two versions of lazy instantiation: 1) not building a structure at all until needed; and 2) not allocating disk space until an overflow condition is reached. Reminds me a bit of the problem with some Unix systems in which one had to allocate large amounts of disk space for swap even though one had more than enough RAM to never swap.

    Both 1 & 2 sound like they could be good ideas, but the question to ask is how important they are. #1 is only relevant if there are large numbers of TTs which are never used. My own reaction is to think that possibly this isn't the best possible programming and that if one encapsulated things, the issue wouldn't arise. I have some trouble coming up with examples where I would want to do such a thing by design.

    #2 seems like it could be of wider applicability since much of the time we are likely to provide enough memory for the TTs to all be entirely in memory and on those rare occasions when we want really large TTs, then we are unlikely to mind a tiny performance hit for initializing the overflow area.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com
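
    As a closing aside: application code can already approximate "version 1" above by building a temp-table dynamically only when it is first needed, instead of compiling in a static definition. A hedged sketch (names invented); whether this actually avoids the up-front DBI allocation described in this thread is an assumption that has not been measured here:

        /* Build the TT lazily: nothing exists for it until ensureTT runs. */
        DEFINE VARIABLE hTT AS HANDLE NO-UNDO.

        PROCEDURE ensureTT:
            IF VALID-HANDLE(hTT) THEN RETURN.   /* already built */
            CREATE TEMP-TABLE hTT.
            hTT:ADD-NEW-FIELD("custNum",  "integer").
            hTT:ADD-NEW-FIELD("custName", "character").
            hTT:ADD-NEW-INDEX("pkCust", TRUE /* unique */, TRUE /* primary */).
            hTT:ADD-INDEX-FIELD("pkCust", "custNum").
            hTT:TEMP-TABLE-PREPARE("ttCustomer").
        END PROCEDURE.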