A hundred million rows?

I'm having trouble with a table that has about a hundred million rows in total.  Is that the limit of what the DBMS is capable of?

For example, adding new indexes is extremely painful.  On top of everything else, I think I've run into a new bug where the OE database refuses to create an active, unique index via a DDL statement.  The statement runs for about 20 minutes, then the disk rate slowly drops off to zero while the CPU remains at 100% indefinitely.  I knew that adding an active index via DDL was going to take a lot longer than using proutil idxbuild... but I didn't imagine that it would run forever!

In SQL Server I can handle a hundred million rows (per year) pretty easily, especially if I give the table a clustered index of my own choosing (usually some type of custom surrogate key).

We are using the enterprise edition of the OE RDBMS but haven't purchased the "table partitioning" feature yet.  I suppose that is the next step forward, right?  Can someone tell me how my table compares to their biggest table (in a single database *without* partitioning)?  I'd also like to know how you would suggest adding an index to it if you were allowed a large outage window, e.g. 10 hours or so.
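For reference, the offline route I had in mind is roughly the following (a sketch only; the database name and tuning flags are placeholders and should be checked against your OE release):

    # 1. Add the new index as INACTIVE (e.g. load a .df with the index marked inactive).
    # 2. With the database offline, build the inactive indexes:
    proutil mydb -C idxbuild inactiveindexes -TB 31 -TM 32 -B 1024 -T /fast/tmp

Does that look right, or is there a better sequence?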

All Replies
  • I think one has to step back and realize that while adding an index to a 100M-row table might seem like just adding some reference information, one *is* actually adding 100M pieces of data.  While there might be ways to do this particularly quickly with a DB that is inactive and not being replicated, if the DB is active and being replicated, then that is a huge amount of new information being added to the DB and a huge amount of information which must be passed to the replication target.  Hence the whole thing taking a long time using SQL and/or with replication active.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • 20 years ago HP used to make some nice servers.

    I cannot think of any reason why I would want to still be running on HPUX in 2020.  You may as well shoot yourself in the foot while your foot is inserted in your ear.  Every moment that you spend not migrating away from HPUX is more agony for your users.

    I feel the same way about Solaris.

    AIX still has some value at the extreme high end but it isn't very competitive in the mid range or lower.

    --
    Tom Bascom
    tom@wss.com

  • Tom, I was thinking along these lines earlier, to wit: if one is trying to do things at this scale and the performance isn't what one would like, should one actually be blaming Progress, or is it really the platform that is to blame?  Do you or others have experience with tasks of equal or greater scale which perform at dramatically higher levels without exotic hardware that would be beyond a site like this?  Perhaps it is time to point the finger at the real problem?  I know you have a rich store of examples where enterprise IT departments have dictated storage for DBs that is not good for supporting DBs, and the contrast between that and some in-box disks or even SSDs is staggering.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • We've been running HPUX for 20 years and the performance has been great for the last 14 years.  I personally love HPUX.  It's really all about the storage and the ABL programs.  Anyhow, we are going to migrate to RH Linux in the near future, not because of performance but because OE 12 does not support HPUX anymore.

  • I think that unless you have hobbled your new servers by choosing lots of slow cores instead of a small number of fast cores, or by continuing to use slow SAN storage, you will very likely be amazed at just how much faster a modest Linux box is.  There is simply no comparison.  HP and Sun lost their way a long, long time ago, and it hasn't been close for 10+ years.

    --
    Tom Bascom
    tom@wss.com

  • > So in reality I would probably add it inactive, leave replication running, and build it online with a combination of idxfix and idxactivate.

    That is what I would do too.  But the question is how much time it would take to run idxactivate on a 400-million-row table.  I have several tables over 400 million rows, and while that is not even close to George's case, I do not think a single-threaded idxactivate can finish in 24 or 48 hours.

    idxbuild, on the other hand, is multi-threaded and runs circles around idxactivate (and idxfix)!

    As to "adding 100M pieces of data", yes SQL DDL would take some time in Oracle DDL too. I am easily adding indexes to the Oracle tables of the size much greater than we discussed here ( except George's client of course ). And IMHO, adding an index to a big table is the biggest thing that defines the difference between Progress and others (Oracle) for me. Everything else I can tolerate.

    I hope that the speed of idxactivate will match idxbuild soon!  We really, desperately need it!
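    Roughly, the sequence I mean is the following (a sketch only; the database, table, and index names are placeholders, and the exact syntax should be verified for your release):

        # index was added as INACTIVE beforehand; replication stays up
        proutil mydb -C idxfix
        #   -> menu option 4: cross-reference check of the base and new indexes
        proutil mydb -C idxactivate mytable.newindex useindex baseindex recs 5000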

    Dmitri Levin

    Alphabroder

  • Hello

    I did some tests of idxactivate (with and without idxfix beforehand) on a sports2000 DB and another test DB (each time 2 fields in the new index; more fields => more time).

     - On a 10-million-row table, idxactivate took around 10 minutes (mean value 614 seconds).

     - On a 1-million-row table, idxactivate took around 1 minute (mean value 60 seconds).

    YMMV; I will let you estimate and run your own tests on idxactivate.  I did not test with a 400-million-row table...

    With idxactivate there are 2 different possibilities for useindex:

    - useindex [UNIQUE INDEX]

    - useindex [NON-UNIQUE INDEX]

    Progress writes that you can run IDXACTIVATE "online":

    - with useindex on a non-unique index, you cannot create, update, or delete rows of the table;

    - with useindex on a unique index, you "can" create, update, or delete rows of the table.

    For the case of using a unique index to drive the idxactivate, I ran some tests with create/update/delete activity on the table while idxactivate was running.

    I noticed a few things:

    - At the end of the wait time (the usernotifytime), the schema lock taken to update this index could crash my other session (for example message 6527: "Table <table name> is locked.  Record create failed.")

                -> so make sure there is no activity when you launch it.

    - During the IDXACTIVATE (create index keys) phase, the session doing the update activity could get some 2624 messages, due to the locks held by idxactivate.

             -> Testing shows that decreasing recs from 5000 to 50 reduces those messages,

            -> but it increased my AI file size by 3% and the elapsed time by 1.5%.
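    For example, the run with the smaller transaction size looked roughly like this (a sketch; option names should be verified for your release):

        proutil mydb -C idxactivate PUB.OrderLine.OL_new useindex itemnum recs 50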

    I also noticed that the idxfix step before idxactivate can be skipped when activating a non-unique index: the blocks of the "inactive" index created by idxfix are dropped at the beginning of IDXACTIVATE anyway, so it only increases the IDXACTIVATE time (by 0.8%) and the AI file size (by 3.5 to 4%, depending on the recs value, 50 or 5000).

    Remember YMMV.

    Patrice

  • Hi

    Forgot to thank Paul and James for their workshop and breakout session.

    Patrice

  • >> I did some tests on idxactivate ...

    Your times are very similar to what I experienced.  I was adding two indexes of about 5 fields each to a table of 100 million rows.  The addition of each new index took about two hours.  For my tests I used two approaches, both done on an offline database: (1) adding the INDEX so that it is initially active, via SQL, and (2) adding the INDEX as an inactive one and then using idxactivate afterwards.

    Did you happen to monitor your CPU and disk while doing this?  My observation is that the CPU runs on a single core at 100% the entire time and the disk is *not* really taxed at all (no issues with disk response times, especially on a fast SSD).

    The bottleneck definitely appears to be CPU (or more specifically the clock speed of a *single* core).

    Insofar as performance goes, it seemed that "idxactivate" was similar to adding an INDEX that is initially active via SQL.  The main difference is whether the system is available to users while the index is being built.
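    For approach (1), the DDL was essentially the following, run through sqlexp (a sketch; the table, column, and index names are placeholders):

        CREATE INDEX new_idx ON PUB.BigTable (f1, f2, f3, f4, f5);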

  • If I understand correctly, the activity of idxactivate consists of:
    1. Record reads using a base index (“useindex”);
    2. Index key reads from the uncompleted tree of the activated index;
    3. Creating in memory the recovery notes for the index blocks.

    Idxactivate does not write BI blocks; that is the duty of the BIW process.  The transaction size (the "recs" option of idxactivate) or the BI cluster size may affect the result, but it is an indirect influence and it can be more or less eliminated.
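    For reference, the BI cluster size can be changed offline with something like the following (a sketch; size in KB, placeholder database name):

        proutil mydb -C truncate bi -bi 16384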

    So the main activity is reading, methinks.  Did anybody compare the idxactivate time with the time to read the table using the base index and/or the activated index?

    My guess: idxactivate is faster than idxfix option 4 ("Cross-reference check of multiple indexes for a table"), if you select two indexes there: the base one and the activated one.  Note that idxfix supports the -rusage option.

    And, of course, the reading tests require emptying the database and filesystem caches first.
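    A minimal ABL sketch of such a reading test (table and index names are placeholders; run it once per index, after emptying the caches):

        /* Count every record via a chosen index and report the elapsed time. */
        DEFINE VARIABLE recCount AS INT64   NO-UNDO.
        DEFINE VARIABLE msTimer  AS INTEGER NO-UNDO.

        msTimer = ETIME(TRUE).  /* reset the millisecond timer */

        FOR EACH OrderLine NO-LOCK USE-INDEX itemnum:
            recCount = recCount + 1.
        END.

        MESSAGE recCount "records read in" ETIME / 1000 "seconds.".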

  • > George Potemkin wrote:

    > If I understand correctly, the activity of idxactivate consists of: 1. Record reads using a base index ("useindex"); 2. Index key reads from the uncompleted tree of the activated index; 3. Creating in memory the recovery notes for the index blocks.

    Note on 1: mostly sequential, if the reference index matches the storage order.

    Note on 2: random reads, not sequential.

    And there are two more steps:

    4. Creating the index entries in the new index.

    5. Writing the modified index blocks to disk.

  • > 5. writing modified index blocks to disk

    Should it be done by APWs?

  • Hi,

    An extract of my tests, on OrderLine with 10 000 000 rows: idxactivate of index OL_new, using itemnum (or orderline) as the base index.

    Table activity during idxactivate:

    - Read:

       - _File          -> 10 000 410

       - OrderLine      -> 10 000 000

       - _Field         ->        111

       - _Index         ->         42

       - _Index-Field   ->          6

    - Update:

       - _Index         ->          1

    Index activity during IDXACTIVATE:

    - Create:

        - OrderLine.OL_new             -> 10 010 615

    - Read:

        - OrderLine.itemnum            -> 10 000 001   (same value for OrderLine.orderline)

        - _File._File-Name             ->        407

        - _Index-Field._Index/Number   ->         10

        - _Index._File/Index           ->          5

    - Split:

        - OrderLine.OL_new             ->     10 637

    I think we can add:

    => for each record of the table (OrderLine), there is one read of _File.

    I see a huge difference between IDXACTIVATE using a unique index and using a non-unique index in the "Record LOCK" counter of promon:

    - the difference is 20 000 000;

    - probably explained by the fact that updates are allowed when we use a unique index

            => idxactivate cannot create the new index key while its record is locked by another user.

    One more point: after IDXACTIVATE, you probably need to run IDXCOMPACT on the new index.
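    Something like this (a sketch; the trailing number is the target space utilization percentage, to be verified for your release):

        proutil mydb -C idxcompact PUB.OrderLine.OL_new 80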

    Patrice

  • > George Potemkin wrote:

    > > 5. writing modified index blocks to disk

    > Should it be done by APWs?

    Yes.  But it must be done, and it is part of the additional load on the database and system imposed by index activation.  As are the additional transaction log writes.

  • > Patrice Perrot wrote:

    > - _File -> 10 000 410

    It should not be necessary to read _File ten million times.  Smells like a bug.