Slow indexes - Forum - OpenEdge Development - Progress Community

Slow indexes

  • Hi again,

    Just wondered if there is anything to look out for when creating indexes on a table?

    We have a program searching through an index (I have checked the XREF and it is definitely using the index), minus a couple of fields, but it still sometimes takes 30-40 seconds to return 3,000-4,000 records.

    Are there any combinations of data etc... that can cause an index to still be slow?

    As a reference, this table has well over 1 million records so without using the index it would take 30-40 minutes to do the same thing.

    Thanks for any help/pointers to reference text.

    Oliver

  • The ABL Handbook has a bit of info about the use of multi-component indexes. If your index comprises columns A, B and C in that order, and your query uses columns A and C, you will only get the benefit of column A when the index is bracketed.
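    As a sketch of that bracketing behaviour (table, field and index names here are invented):

    ```abl
    /* hypothetical table tt with index ix-abc on fields a, b, c */
    FOR EACH tt NO-LOCK
        WHERE tt.a = 1        /* equality on the leading component: bracketed */
          AND tt.c = "x":     /* b is skipped, so c only filters the results  */
        /* every record with tt.a = 1 is read; tt.c is tested afterwards */
    END.
    ```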

  • Thanks,

    The index uses column A, B, C and D (for example).

    The "for each" that we have uses columns A, B and C (but not D). In fact, we also have another variable on the end of A, B and C that is separate from the index altogether.

    A, B and C usually return about 3000 records.  Using the final variable takes them down to 5 or 6.

  • A much better way than studying the xref for determining what is going on is studying the vst table in order to find out how many reads there are on what tables and what indexes.
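    As a rough sketch of that approach (assuming the statistics ranges cover the relevant tables and indexes via the -tablerangesize and -indexrangesize startup parameters; the join fields are from my reading of the VST schema, so treat them as assumptions):

    ```abl
    /* Dump logical read counters per index from the VSTs. */
    FOR EACH _IndexStat NO-LOCK,
        FIRST _Index NO-LOCK
            WHERE _Index._idx-num = _IndexStat._IndexStat-id,
        FIRST _File NO-LOCK
            WHERE RECID(_File) = _Index._File-recid:
        DISPLAY _File._File-Name   FORMAT "x(20)"
                _Index._Index-Name FORMAT "x(20)"
                _IndexStat._IndexStat-read.
    END.
    ```

    Run it once before and once after the suspect query; the difference in _IndexStat-read shows how many index reads the query really performed.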

  • Is it any faster returning the 5 or 6 than the 3000 (assuming they aren't all in memory already due to -B)? If the final variable is not a component of the index, then you will only be bracketing on A,B and C (assuming the check on A and B is for equality).

  • No, the query seems to be slow even without the last variable.

    We've added a log for this particular search to put out some params and the time it takes (on the live database), and it seems to be run all the time (i.e. it is kept in memory). At the moment each run takes about 300-400 ms (according to ETIME).

    Next time we get a complaint that it is going slowly we'll be able to see exactly how long the query took (hopefully).

    The query is ....

    (input list of numbers (depot numbers))

    loop each number in the list:

        FOR EACH blah
            WHERE blah.int1  = 1
              AND blah.char  = "value"
              AND blah.depot = <current entry in list>
              AND blah.dec   = 1.00
            NO-LOCK:


    The blah.dec is the extra variable not included in the index.

    Going through some of the different depots and checking the number of records returned by the index part of the query I get the following...

    3754

    6028

    3237

    5224

    3319

    and so on.

    with the final variable I get 3 at the most returned.

  • A few results from the log show that there is a definite problem.

    Because the search is run fairly often it seems to be in the memory a lot.

    The query generally takes around half a second to run.

    But if it hasn't been run for around 10 or 11 minutes then the next time it runs it will take 8 minutes to run before then going back to half a second.

    This is for exactly the same query with exactly the same parameters.

  • Seems to me that -B is too small then.

  • That would be one fix but it's kind of ignoring the problem at hand.

    It just means that there will be a longer period between queries when it suddenly takes 8 minutes again instead of half a second.

    TBH I think we're going to have to rework how this query works completely.

  • The problems are:

    1) A, B and C describes a set of several thousand records, so your query will return all of these records.  The "extra variable" is then used to filter that result set.  Adding an index on A, B, C and the extra variable would be the obvious way to make it so that the query only returns the 5 or 6 records that you need.
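    For example, a data-definition (.df) fragment along these lines would add such an index; the index name, field names and area name are invented to match the example:

    ```
    ADD INDEX "ix-abcd" ON "blah"
      AREA "Schema Area"
      INDEX-FIELD "int1"  ASCENDING
      INDEX-FIELD "char"  ASCENDING
      INDEX-FIELD "depot" ASCENDING
      INDEX-FIELD "dec"   ASCENDING
    ```

    With equality matches on all four components the bracket shrinks to just the handful of records you actually need.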

    2) Because you are processing thousands of records you need a substantial number of IO ops when those records are not present in the buffer pool.  You might be able to mitigate this by increasing the size of the buffer pool (small increases are pointless: the effectiveness of -B follows an inverse square law, which means that you must roughly double it in size to cut IO in half) or by decreasing the demand for IO ops (see #1).

    Depending on your storage area configuration you might also be able to reorganize the data to improve the effectiveness of the IO ops that you are doing, and thus reduce the number needed.  IOW, if you are using Type 1 areas the data may be scattered 1 record per block all over the area, whereas a well designed Type 2 area could pack as many as 256 records into a block and thus reduce IO substantially.

    As Peter said, using the VSTs to see how many requests you are really making would shed more light on the problem than etime() and an examination of XREF alone can.  Tools like ProTop - http://www.dbappraise.com/protop.html  are very useful in such an endeavour.

    --
    Tom Bascom
    tom@wss.com

  • A much better way than studying the xref for determining what is going on is studying the vst table in order to find out how many reads there are on what tables and what indexes.

    Peter, have you tried using the -logentrytypes QryInfo ? If so, how does it compare with the VST data?

    -- peter

  • It's interesting information but it is not very handy to work with.

    1) You have to plan ahead and specify a log file.

    2) Managing individual log files for each session on large systems promises to be "unwieldy".

    3) Having to scan through the log file and parse the output is decidedly unpleasant.  Particularly at any scale.

    4) The best bit of info is "4GL records" as opposed to "records from server".  So far as I know this is the only way to get that precious nugget.  (It needs to find its way into a VST.)

    So, in my opinion: almost usable, albeit ugly, for single-user development issues when you already know that a particular bit of code might be a problem, but almost completely useless for a deployed system.

    --
    Tom Bascom
    tom@wss.com

  • When you say "current entry in list", are you talking about a single variable which contains the extracted value of the current entry or are you talking about entry(n, list)?

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • pjudge wrote:

    Peter, have you tried using the -logentrytypes QryInfo ? If so, how does it compare with the VST data?

    Hi Peter,

    Tom beat me to it. Indeed QryInfo provides interesting information but it is not easy to process, although it all depends on the tools at hand. VSTs certainly are not easy to process either, but there are some good tools available to make them visible, such as ProTop, the VST viewer and the Query Tester, and that makes all the difference. I suppose one could develop such a tool for processing the contents of the log file, but it would not be as easy, and you would have to deal with filtering the data, cleaning up the file etc. Personally I have never been tempted to undertake that endeavor.

  • Tom beat me to it. Indeed QryInfo provides interesting information but it is not easy to process, although it all depends on the tools at hand. VSTs

    Thanks to you both for your replies.

    I suppose one could develop such a tool for processing the contents of the log file, but it would not be as easy and you have to deal with filtering the data, cleaning up the file etc. Personally I have never been tempted to undertake that endeavor.

    I should note that a tool to help read the logs exists already; it's in the code share section at http://communities.progress.com/pcom/docs/DOC-3498 *

    -- peter

    • I've not figured out how to 'properly' link in email replies