Use-index prevent multiple brackets: documented?

Posted by Rob Fitzpatrick on 09-Feb-2015 20:23

I'm using both 10.2B and 11.3.

I have a question about the effect of USE-INDEX on index bracketing (and I'd rather not get mired in a discussion of the general evils of USE-INDEX... :) ).  I've been through the documentation and so far I haven't found a reference to the behaviour I have seen.  First, some back story.  I came across a query like this:

for each table no-lock where
  code = 'a' and
  (status = 'w' or
   status = 'x' or
   status = 'y' or
   status = 'z') and
   log = false:


The order in which records are returned by the query is not important.  For the purpose of this discussion, let's say there are two indexes: code-num (components "code" and "num") which is primary and non-unique, and code-stat (components "code" and "status") which is non-unique.  When the query is written as above, the compiler selects index code-num and brackets only on "code".  This is inefficient as the bracket ends up being almost all of the records in the table.  A little less than half of the records in the table satisfy the WHERE clause.

First, the developer changed the query by adding "use-index code-stat", thinking this would improve things.  Using -zqil and QryInfo I demonstrated that this query still only bracketed on "code" and read the same number of records and index keys as the first query.  I suggested rewriting the query like this:

for each table no-lock where
  (code    = 'a' and
   status  = 'w' and
   log     = false)  or
  (code    = 'a' and
   status  = 'x' and
   log     = false)  or
  (code    = 'a' and
   status  = 'y' and
   log     = false)  or
  (code    = 'a' and
   status  = 'z' and
   log     = false):


Written this way, -zqil and INDEX-INFORMATION both showed that the compiler selected index code-stat and used four brackets, each an equality match on both components.  QryInfo showed that this query was much more efficient than the first two; blocks accessed dropped in half and records read were just slightly more than the number of records that satisfy the query.  Good stuff.  I understand that using multiple brackets means record order is not guaranteed, but as I said in this particular case it doesn't matter.

Just for kicks I tried adding "use-index code-stat" to the end of the query.  Can't hurt, right?  It's going to use that index anyway.  Wrong.  This is the least efficient of the queries: it results in a table scan.  Re-reading the docs (FOR statement, Triggers and Indexes) and various KB articles (Index Brackets Explained, Single Index Selection Explained, Multiple Index Selection Explained, Using USE-INDEX Explained, etc.), I couldn't understand this at first.  Then I found this info in the article "4GL Query Concepts" (S000012195):

"In version 6, FOR EACH queries use a single index cursor and a single index bracket, and their performance is similar to that of FIND queries. In version 7 and later, they use server queries, and utilize multiple index brackets and multiple indexes when possible.

There are two ways to force FOR EACH queries in version 7 to behave like they did in version 6: by using the -v6q startup parameter, or by using the USE-INDEX clause."

So that explains the runtime behaviour.  But I may still file a documentation bug.  Apart from the KB article mentioned above, does anyone know if this side-effect of USE-INDEX is explicitly mentioned anywhere in the OE docs?  Any help is appreciated.

All Replies

Posted by Tim Kuehn on 09-Feb-2015 22:05

In your final scenario, USE-INDEX disables index bracketing, and the query isn't structured in a way the compiler can map anything in the WHERE phrase to the leading components of the specified index, so the compiler wrote the query as a TABLE-SCAN.

Posted by S33 on 10-Feb-2015 09:19

A better approach to solve the real issue is to create a workfile or temp-table and populate it with the 4 values for status, and then use that in the "for each" to make better use of the keys.

def temp-table tt1 no-undo

   field s as c.

do transaction:

   create tt1. tt1.s = 'w'.

   create tt1. tt1.s = 'x'.

   create tt1. tt1.s = 'y'.

   create tt1. tt1.s = 'z'.

end.

for each tt1 no-lock,

   each table where table.code = 'a' and table.status = tt1.s and table.log = false:

Posted by TheMadDBA on 10-Feb-2015 09:21

I don't know if it is specifically spelled out in the docs that USE-INDEX will cause the query to use one index exactly one time, but it is certainly implied in the function name. It has certainly worked this way since the introduction of USE-INDEX.

If you check the XREF you will see that the version without USE-INDEX will show 4 SEARCH entries for that index, because Progress is using that index 4 times. With USE-INDEX the XREF will only show 1 SEARCH entry.

The documentation could certainly use more detail on the evils of USE-INDEX, especially about how it will not magically break the indexing rules to force proper usage of that index.

Posted by Rob Fitzpatrick on 11-Feb-2015 20:04

While I agree that "USE-INDEX" does imply that only one index will be used, I wouldn't go so far as to say it also implies the use of only a single cursor and a single bracket on that index.  It would be helpful to have that spelled out in the docs.

Thanks to all for the input.  I will open a case and ask TS to get the documentation clarified in a future release.

Posted by dbeavon on 30-Oct-2018 08:09

Rob, Did Progress ever clarify the unintentional and non-intuitive impact of USE-INDEX on query performance?  

I agree that this should be documented.  Many developers these days use SQL engines on a regular basis as well.  Many would think of "USE-INDEX" as little more than a "hint" to guide the query optimization process (especially if that syntax is used dynamically at run-time and included in a QUERY-PREPARE statement).  When using "USE-INDEX" as a "hint", a developer would not assume that the syntax would preclude multiple brackets.

Posted by Brian K. Maher on 30-Oct-2018 08:18

USE-INDEX has always told the AVM to ignore anything it might be able to do and just ‘use this index’.  By using it you are telling us that you know your data better than we can know it and to just ‘do what i tell you to do’.
 
Search KB for “4GL Query Concepts and Usage” article.

Posted by marian.edu on 30-Oct-2018 08:20

I would say this is pretty well documented already, in record phrase it does say it identifies the index to use otherwise Progress will select one based on query selection... it doesn't say it's just a hint that might well be ignored by the engine :)


Marian Edu

Acorn IT 
www.acorn-it.com
www.akera.io
+40 740 036 212

Posted by Rick Terrell on 30-Oct-2018 08:42

The documentation is buried in the ABL reference in the notes for both FOR and OPEN QUERY. 


Multi bracketing is not a given on any query, especially complex ones. To get multi bracketing usually requires repeating fields in the query, particularly if an “or” is involved. All criteria must be in both (or all) sides of the “or”. 


Rick Terrell 
Principle Consultant, Professional Services 
Progress

Sent from my iPhone

On Oct 30, 2018, at 8:21 AM, Brian K. Maher <bounce-maher@community.progress.com> wrote:

Update from Progress Community
Brian K. Maher

USE-INDEX has always told the AVM to ignore anything it might be able to do and just ‘use this index’.  By using it you are telling us that you know your data better than we can know it and to just ‘do what i tell you to do’.
 
Search KB for “4GL Query Concepts and Usage” article.

View online

 

You received this notification because you subscribed to the forum.  To unsubscribe from only this thread, go here.

Flag this post as spam/abuse.

Posted by dbeavon on 30-Oct-2018 08:43

It should be replaced with a more explicit keyword (maybe "USE-INDEX-WITHOUT-MULTIPLE-BRACKETING").  PDSOE's auto-completion should help with that. ;)

I ran into the same confusing issue that Rob originally described, only at run-time with a dynamically-created QUERY-PREPARE statement.  The index is fine and I do want to use it.  But what is counter-intuitive is where OE decides it cannot use the index more than *once*.  Because of that decision on OE's part, the keyword essentially becomes self-defeating.

Posted by Brian K. Maher on 30-Oct-2018 08:50

Please submit an enhancement request
 
 
Brian Maher
Principal Engineer, Technical Support
Progress
Progress
14 Oak Park | Bedford, MA 01730 | USA
phone
+1 781 280 3075
 
 
Twitter
Facebook
LinkedIn
Google+
 
 

Posted by Rob Fitzpatrick on 30-Oct-2018 10:16

David, thanks for bringing this up.  It relates to some developer training we'll be doing internally and to a conversation I was in recently at PCA. 

The consensus was that there are a lot of developers who don't understand the nuances of USE-INDEX and that it isn't well documented.  This isn't changed by the fact that it has always worked the same way or that there are people out there who understand it perfectly.  New developers shouldn't have to spend years unknowingly writing bad code or develop the intuition that the reference docs don't tell the whole story so they have to go spelunking in the KB. 

> Rob, Did Progress ever clarify the unintentional and non-intuitive impact of USE-INDEX on query performance? 

This is what the 11.7 docs say under "Record phrase":

USE-INDEX index

Identifies the index you want to use while selecting records. If you do not use this option, the AVM
selects an index to use based on the criteria specified with the WHERE, USING, OF, or constant
options.

No mention of bracketing.  Clearly, this isn't the whole story on USE-INDEX.  Is there some other place in the docs that explains it in more detail?

I looked back through my old cases and now realize I didn't file a doc bug as I said I would.  I'll do that.  Thanks.

> I would say this is pretty well documented already

I wouldn't.  :)

> Multi bracketing is not a given on any query, especially complex ones. To get multi bracketing usually requires repeating fields in the query, particularly if an “or” is involved. All criteria must be in both (or all) sides of the “or”.

This still doesn't mention USE-INDEX's effects on bracketing.

> It should be replaced with a more explicit keyword (maybe "USE-INDEX-WITHOUT-MULTIPLE-BRACKETING").

I'd be happy with simply improving the reference docs for USE-INDEX, adding a sentence or two to say what the feature actually does.  It would be a quick, cost-effective change that would help people.  Without that change it's an incomplete reference.

Posted by Lars Neumeier on 31-Oct-2018 13:20

Hello,

This is a very interesting topic, when I discovered this problem in december 2014 I thought that this is a bug, but unfortunately it was expected behavior. (case-number: 00294978).

attached article:

knowledgebase.progress.com/.../P7066

knowledgebase.progress.com/.../21097

knowledgebase.progress.com/.../21094

USE-INDEX has some advantages when working with dataservers (Oracle):

(case-number: 00330487)

knowledgebase.progress.com/.../DataServer-for-Oracle-generates-different-queries-depending-on-use-of-USE-INDEX-or-BY-clause

Kind regards,

Lars Neumeier

Posted by dbeavon on 31-Oct-2018 14:23

Thanks for sharing the links with us.  I see you have a lot of experience running ABL on machines that are remote from the database (ie. not running in "shared memory").  One thing I noticed - and made me a bit envious - was your DataServer references.  They talk about "server-side joins" which is something we don't even have yet for code that connects from ABL-to-OE.  

On the CVP forums I had tried to start a discussion about the possibility of allowing us access to a "DataServer for Progress SQL" (see community.progress.com/.../38825  )  It seems to me that this might be a path towards having server-side joins in our client-server code (ABL-to-OE).  

I'm assuming that the OpenEdge "DataServer for ODBC" could be made to work against Progress SQL, and I'm assuming that unlike regular client-server connectivity,  the DataServer WOULD actually have server-side join functionality (just like the other DataServers).  That would be awesome.  It is a missed opportunity that Progress has never allowed us to connect to its SQL engine via DataServer.  It would probably alleviate many performance problems which exist in their standard client-server connectivity...  

My understanding that for OE 12 they are trying to enhance their standard client-server it to allow server-side joins but it seems to me that this would already be available to us by now if they had ever given us "DataServer for ODBC".

(Sorry to take things a bit off-course with a discussion about another type of performance concern)

Posted by Stefan Drissen on 03-Nov-2018 05:08

A few years ago, prior to the recent single-shot query DataServer adjustments, we adjusted our report engine to use send-sql-statement to send the sql statements directly to the database allowing database join resolution. We implemented this for both Oracle and SQL Server DataServers. I had a look at also implementing this for the OpenEdge side (using ado.Net I think) where it also made a substantial difference.

The only reason I did not continue on this path was the deployment issue, the database would need to have a sql port open (security concern) and the clients would need to have an odbc connection - both which current installations did not yet have.

Posted by ske on 07-Dec-2018 09:27

> ...I had tried to start a discussion about the possibility of allowing us
> access to a "DataServer for Progress SQL" ... It seems to me that this
> might be a path towards having server-side joins in our client-server
>code (ABL-to-OE).

Some years ago, I think the standard answer to such problems would have been to use a local AppServer on the database server machine (or close to it over high-speed LAN) with pre-programmed procedures to perform all complex queries (both joins or any kind of complex data processing), and then return the data to the client. That removes the inefficient network traffic for executing remote database queries over multiple tables from ABL. (Maybe it is still the answer.)

I'm guessing that remote access from clients directly to the database for complex queries may not have been so common, since those would usually have been implemented via an AppServer, and that way most direct database access from ABL did have access to shared-memory access or at least high-speed network access, and thus Progress did not prioritize (or were able to hold off) optimizing remote database access within ABL queries.

Posted by dbeavon on 07-Dec-2018 13:31

@ske

I think you are correct about the workaround of running appserver locally (in shared memory). There is a KB which is intended to discuss the topic of client/server database performance : knowledgebase.progress.com/.../18342   ... but that KB concludes by explaining the "APPSERVER ADVANTAGE", just as you did.  It almost sounds like a capitulation, and encourages people to avoid client/server in the first place.

However, these days people want their ABL logic running on another, lower-cost, tier instead of putting it all on the same hardware that is shared by the OE database.  Moving this stuff to another tier also adds fault-tolerance and load-balancing.  I think the new PASOE will probably run with client/server connections more frequently than "classic" appserver ever did.  Anyone trying to deploy PASOE on DOCKER, for example, will probably avoid shared memory connections.  Hopefully the client/server configuration will start becoming a high priority for Progress.

This thread is closed