Indexing of Multiple (lots) fields in tables - Forum - Community Groups - Progress Community

Indexing of Multiple (lots) fields in tables

 Forum

Indexing of Multiple (lots) fields in tables

  • OK:

    I'm a little stumped by this one, so thought I would ask the Progress community for some input.

    I have a client who has run a package for about 15 years. There is now a request to index (or at least have a quick search process) for an almost unlimited number of data elements in a table (eg reference#1, #2, .....#n), some currently inside extents.

    Of course, I can create 10, 15 100 separate indexes against this table for all of these tracing data, but that is bordering on pointless, involves a lot of rewriting, and is an immense resource grab. Not a big deal for 1000 records (say a customer table), but in an order table with maybe several million and Gigs of raw data?

    My initial reaction is just to create an indexed cross referencing table with all of the fields and the reference relating back to the order (in this case), but that requires a lot of manual coding as I would need to maintain all the references on every update.

    I cannot think of any easy way around this, so anyone do this? I'm sure it's not an isolated request, so maybe the concepts are already in production somewhere.

    Hope this makes sense, and thanks for your time folks.

    Jim

  • I have a client who has run a package for about 15 years. There is now a request to index (or at least have a quick search process) for an almost unlimited number of data elements in a table (eg reference#1, #2, .....#n), some currently inside extents.

    Can you provide samples of required new searches and existing indexes?

    Requests like this are often successfully solved with word indexes, sometimes with custom word-break tables to support a tokenized search. In a write trigger you could build a super-index, like

    "SalesRep_BBB Country_USA State_NH Name_Lift$Line$Skiing" etc.

    That will allow searching for any combination of the tokens using the Contains phrase:

    CONTAINS "SalesRep_BBB State_N"

  • Sure:

    Some indexes below:

    I would need to make accessible (typically text) for a selection of disparate data entered, so a word index based around a trigger could do this fairly easily.

    As there are multiple companies (entity below in indexes), this could fall apart unless I write something intelligent relating to each company scan as that could be duplicated in the word search. Maybe a bit dangerous.

    Thanks,

    Jim

    ADD INDEX "ndx_ord-pro" ON "order"             

    AREA "Index Area"                            

    UNIQUE                                       

    PRIMARY                                      

    INDEX-FIELD "entity" ASCENDING               

    INDEX-FIELD "pro#" ASCENDING                 

    INDEX-FIELD "pro-sub" ASCENDING 

    ADD INDEX "ndx_ord-barc" ON "order"            

    AREA "Index Area"                            

    INDEX-FIELD "entity" ASCENDING               

    INDEX-FIELD "bar-code" ASCENDING   

    ADD INDEX "ndx_ord-bill" ON "order"            

    AREA "Index Area"                            

    UNIQUE                                       

    INDEX-FIELD "entity" ASCENDING               

    INDEX-FIELD "cust-num" ASCENDING             

    INDEX-FIELD "o_servdt" DESCENDING            

    INDEX-FIELD "pro#" DESCENDING                

    INDEX-FIELD "pro-sub" DESCENDING    

    ADD INDEX "ndx_ord-con#" ON "order"            

    AREA "Index Area"                            

    UNIQUE                                       

    INDEX-FIELD "entity" ASCENDING               

    INDEX-FIELD "o_cnum" ASCENDING               

    INDEX-FIELD "o_servdt" DESCENDING            

    INDEX-FIELD "pro#" DESCENDING                

    INDEX-FIELD "pro-sub" DESCENDING     

    ADD INDEX "ndx_ord-cons" ON "order"            

    AREA "Index Area"                            

    INDEX-FIELD "entity" ASCENDING               

    INDEX-FIELD "o_cname" ASCENDING                I

    NDEX-FIELD "o_servdt" ASCENDING             

    INDEX-FIELD "pro#" ASCENDING                 

    INDEX-FIELD "pro-sub" ASCENDING       

    ADD INDEX "ndx_ord-cref" ON "order"            

    AREA "Index Area"                            

    INDEX-FIELD "entity" ASCENDING               

    INDEX-FIELD "o_cref#" ASCENDING              

    INDEX-FIELD "o_servdt" DESCENDING            

    INDEX-FIELD "pro#" ASCENDING

  • As there are multiple companies (entity below in indexes)

    Assuming Entity is a 4 digit integer, you could add that to the indexed string like this:

    Entity_0001

    Entity_0002

  • so you would need a double "contains"

    (Contains Entity__x

         AND contains Reference IDy) in an element that you load with all the references you need.

    How big a hit is this on resources?

    Thanks

  • so you would need a double "contains"

    That would just be a single contains:

    Contains "Entity__x Reference_IDy"

    The word index is able to resolve this very effectively. AND is the default, OR is optional. The word index can dynamically use the required number of brackets.

  • ID: P12969
    Title: "4GL Query concepts (FOR EACH, FIND, GET, INDEX)"

    ...


    QUERY BY WORDS

    Queries by words are supported through the 4GL CONTAINS clause. They use word-indexes, which have the same structure as regular indexes, but contain an entry for each word in a character field rather than one entry for the whole field, as regular indexes do.

    Evaluating CONTAINS clauses differs from evaluating other index expressions in two ways:

    - a CONTAINS clause is viewed as a single index bracket by the
    client, but may actually use multiple brackets, depending on the
    expression. For example, the query

    for each claim
    where (description contains "lawyer | attorney"):
    end.

    requires two brackets. The decision of how many brackets and how to use them is done by the server, at run-time. This allows the 4GL program to not only change the words in the clause, but also the operators between the words.

    - the CONTAINS clause cannot be evaluated during selection - a
    word-index must always be used. As a result, the client cannot
    deal with it, and the server must evaluate it using the word-index.

    ...

  • Could work. Not sure about scanning an entire set of records for one entry.

    How efficient is the word indexing? How much overhead would this take up in a fairly large scan, say 2 million records looking for 2 elelments in a fairly large word key, maybe 15 - 30 individual elements?

  • I have no data at hand.

    But I've used that technique in the past with very large tables and many different search tokens. It did perform very good. Only specialized indexes should perform better, because the server wouldn't need to merge the result set of multiple brackets.

    But the disadvantage of specialized indexes is, that you'll need a lot of them - for any possible combination - as you mentioned in your initail post.

  • OK.

    Sounds like its a worthwhile option.

    I'll give it a try.

    As the system is already slow during peak activity, this sounds like a great way to leverage a new server.

    Thanks for your help.

    Jim