A hundred million rows?

I'm having trouble with a table that has a total of about a hundred million rows.  Is that the limit of what the DBMS is capable of? 

For example, adding new indexes is extremely painful.  On top of everything else, I think I've run into a new bug where the OE database refuses to create an active, unique index via a DDL statement.  The statement runs for about 20 minutes, then the disk I/O rate slowly drops off to zero while the CPU remains at 100% indefinitely.  I knew that adding the index as active via DDL was going to take a lot longer than using proutil idxbuild... but I didn't imagine that it would run forever!
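
For reference, this is the shape of the statement that hangs.  A minimal, hypothetical sketch -- the db name, port, credentials, and table/column names are placeholders, not our real schema:

    # addindex.sql holds the one statement (names are made up):
    #     CREATE UNIQUE INDEX ix_history_key ON PUB.history (cust_id, trans_date);
    #     COMMIT;
    # run it through the sqlexp client that ships with OpenEdge:
    sqlexp -db mydb -S 20001 -user sysprogress -password secret \
           -infile addindex.sql -outfile addindex.log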

In SQL Server I can handle a hundred million rows (per year) pretty easily, especially if I give the table a clustered index of my own choosing (usually some type of custom surrogate key or keys).

We are using the enterprise edition of the OE RDBMS, but haven't purchased the "table partitioning" feature yet.  I suppose that is the next step forward, right?  Can someone tell me how my table compares to their biggest table (in a single database, *without* partitioning)?  I'd also like to know how you would suggest adding an index to it, if you were allowed a large outage window, e.g. 10 hours or so.

All Replies
  • No such limit.

    I just took a quick look at a db that I manage which is easy for me to take a peek at.  The largest table in terms of number of rows is 468,323,164.  There are others, more of a pain to log in to and look at, that are probably quite a bit larger.

    There are indirect limits on number of rows related to storage area type, block size, and storage area rows per block.  But unless you have a type 1 storage area it is extremely unlikely that you need to worry about that. Those hypothetical limits are huge.  If you do have data in type 1 areas then that is a serious problem that should have been fixed a long time ago and which should be a very high priority for your DBA team.
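
    Back-of-the-envelope, to show why (verify the exact figures against the limits documentation for your release):

        record ceiling per area  ~=  addressable blocks  x  rows per block

    A type 1 area is bounded by a 32-bit ROWID, so it tops out around 2 billion record slots however the bits are split between block address and slot-in-block.  Type 2 areas can address vastly more blocks, which is why those ceilings are "huge" relative to a mere 100 million rows.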

    When you say "via a DDL statement" are you implying that you are using SQL to add the index?

    If I had a 10 hour outage window to work with I would add the index inactive and do a simple offline proutil idxbuild.  I wouldn't expect it to take more than an hour if I had decent modern hardware and a reasonably up to date release of Progress.
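
    To make that concrete, the recipe is roughly as follows.  This is a sketch, not a runbook -- the db name and the sort/merge tuning values are placeholders you would size for your own hardware:

        # 1. add the new index as INACTIVE (Data Dictionary or an incremental .df load)
        # 2. take the database offline
        proshut mydb -by
        # 3. rebuild only the inactive indexes; -TB/-TM/-B tune sort blocks,
        #    merge buffers, and database buffers for the build
        proutil mydb -C idxbuild inactiveindexes -TB 64 -TM 32 -B 1024
        # 4. bring the database back up
        proserve mydb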

    --
    Tom Bascom
    tom@wss.com

  • > Can someone tell me how my table compares to their biggest table (in a single database *without* partitioning).

                                                               -Record Size (B)-           ---Fragments--- Scatter
    Table                                    Records    Size   Min   Max  Mean                Count Factor  Factor
    PUB.table                            24987814259    3.4T    44 32017   151          24988196147    7.1     1.0
    
  • WRT type II data areas, Tom said:

    > On Jan 13, 2020, at 11:17 AM, ChUIMonster wrote:
    >
    > Those hypothetical limits are huge.

    That's what I said in the V6 days. And I remember when the first customer hit the 2 GB /database/ size limit.

  • When I read the original post, I thought to myself ... just wait until George responds, then the hammer will fall! :)

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice  http://www.cintegrity.com

  • Forgot to add - I don't see a reason to bring table partitioning into the picture.  It *might* make sense but I wouldn't do that merely because of 100 million rows.  I would want to understand the usage of that data and how that usage might be improved via partitioning before I pursued something like that.

    --
    Tom Bascom
    tom@wss.com

  • You win, George :)

  • Indeed. Table partitioning can be quite useful, especially for data that are historical or time-related, where older data are write-locked and then purged when the retention period has expired. But for sheer table size it may be counter-productive, depending on the queries used and the types of updates performed.

  • >> When you say "via a DDL statement" are you implying that you are using SQL to add the index?

    Yes.  For the majority of our database schema it is possible to maintain it via SQL statements.  But it seems as if there is a limit that you eventually reach, based on the size of the data, where you have to switch modes and start using proutil commands as part of the workflow.  It feels like "cheating"... mainly because proutil doesn't feel like a regular part of the DBMS, but more like a back-door tool for editing the internal files *without* the help of a DBMS layer.  (Perhaps it would at least feel less like "cheating" if this stuff could be accomplished with the standard data dictionary user interface.)

    >> If I had a 10 hour outage window to work with I would add the index inactive and do a simple offline proutil idxbuild.

    Yes, I think I'll have to start factoring in the need for proutil whenever working with our larger OE tables.  I guess the SQL/DDL interface isn't cut out for that.

    >> PUB.table                            24,987,814,259

    Wow, I had to put the commas in there to see that this is a table that is 250 times more difficult to manage than the one I'm dealing with.

    I'm betting that is a table whose schema is rarely changed.  Maybe in its own database?  Full backup only once a year?  What happens if a user kicks off the wrong query?  It hurts to think about...

  • Dr Evil wins with one hundred billion rows!

  • I reported this case to Progress.  My table is under 100 million rows but it is still quite difficult to manage.  If Progress doesn't intend for large tables to be managed via SQL statements then hopefully that can be documented in the KB.  I've never heard anything to that effect.  

    >> If I had a 10 hour outage window to work with I would add the index inactive and do a simple offline proutil idxbuild

    Would it change the answer if you had to break replication and re-enable it afterwards (and your database was in the ballpark of 1 TB)?  This becomes a difficult trade-off because the idxbuild itself runs so fast, but having to re-enable replication is a pain (although that can be done outside of the outage window).

    >> PUB.table    24,987,814,259

    I would still love to hear more about this table!  I believe you ... but it seems very hard to envision how we could ever manage a table like this ourselves.  And I want to get the whole story in case I ever need to retell it to someone.  

    The main thing I'm wondering about is how many years of data this includes, and whether you keep a window/range of dates or if the data grows indefinitely.  I'm also wondering if you would consider simply archiving or deleting much of this data in the event of a schema change (like a new index, or a new column of data that was derived from other pre-existing columns).  

  • If replication is involved then "it depends".  The db that my 468 million row example comes from is almost exactly 1TB.  I can back that up in about 90 minutes and restore it in a bit less than an hour (which is the bulk of the effort required to re-initialize replication).  So that still fits pretty easily into a 10 hour outrage window.
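
    The mechanics of that re-initialization are essentially a backup/restore pair plus re-enabling replication (db names and paths below are placeholders):

        # online backup of the source side -- production keeps running
        probkup online mydb /backups/mydb.bck
        # ship the backup to the target machine, then seed the replica from it
        prorest mydb /backups/mydb.bck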

    As it happens I do not generally have 10 hours to work with.  So in reality I would probably add it inactive, leave replication running, and build it online with a combination of idxfix and idxactivate.  See this: pugchallenge.org/.../318_IndexMaintenance.pdf
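
    In outline, the online variant goes something like this.  Very much a sketch -- the real procedure and its caveats are in that PDF, and the table/index names are placeholders:

        # 1. add the new index as INACTIVE, leaving the db and replication up
        # 2. build keys for the inactive index online; idxfix is menu-driven,
        #    so pick the option that builds keys for the chosen index
        proutil mydb -C idxfix
        # 3. switch the index active once its keys are in place
        proutil mydb -C idxactivate PUB.mytable.ix_new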

    OTOH, as I recall, you are on an old klunky HPUX server with some sort of horrifically awful SAN (which is sort of from the department of redundant statements department).  There's an argument to be made that you could get migrated to Linux before that index will be built.

    --
    Tom Bascom
    tom@wss.com


  • I'm guessing the "10 hour outrage window" is an apt description? Or was it a Freudian slip?

    Asking for a friend.

  • > I would still love to hear more about this table! I believe you ... but it seems very hard to envision how we could ever manage a table like this ourselves. And I want to get the whole story in case I ever need to retell it to someone.

    BTW, there are tables of a similar size in the other branches of the same customer.

    The customer does not add new indexes to this table.
    The existing indexes:

    Table                      Index  Fields Levels         Blocks    Size  % Util  Factor
    PUB.table
      index1                       9       3      5       53213748  114.8G    28.4     2.4
      index2                      10       3      4       21102108   89.1G    55.6     1.9
      index3                       8       6      5      144165188  680.6G    62.1     1.8
      index4                      11       2      4       30022068  129.1G    56.6     1.9
    
                        ------------------------------------------------------------------
    Totals:                                              248503234 1013.7G    53.6    1.9
    

    There is a real problem with such large tables: it's not possible to check the health of their indexes within a reasonable timeframe. The fastest way is to rebuild the indexes from scratch. We can only trust that the indexes are not corrupted.
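
    (For scale, the check we would otherwise reach for is something like the line below, and on a table of this size it simply cannot finish in an acceptable window; the db name is a placeholder:)

        # offline index consistency check -- this is the step that runs too long
        proutil mydb -C idxcheck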

  • >> you are on an old klunky HPUX server ...

    Is there any other kind of HPUX server?  Do you suppose anyone is buying the new "Kittson" version of the Itanium chip?  (Especially now that HPUX has an end-of-life date!)

    We do much of our preproduction work on an old processor (Intel(R) Itanium(R) Processor 9350, 1.73 GHz).  It has 8 logical processors, which is fairly decent for a development box... but any time our OE-related work is being done single-threaded (as SQL "create index" appears to be) we have lots of trouble.  Thankfully we have finally made a tentative plan to move off of HPUX, but it is going to take a while!  We are moving to Windows Server on modern 4 GHz x64 chips.  Both PASOE and the OE database are running pretty smoothly on Windows, based on my testing.  And the Progress tech support cases we have opened have gone much more smoothly now that they are based on the Windows platform rather than HPUX.

    Since we are already on the topic of HPUX, has anyone heard of "project kiran", aka "Portable HP-UX"?  In theory it is a binary instruction translator that will allow IA64 code from HP-UX to run seamlessly on Linux x64, to help with migrations...  I don't think it is supposed to be a full VM as such, but something along those lines.  It seems a bit far-fetched to me, and I can hardly find any references to this project in my searches (maybe it is supposed to be a secret?).  The only place I found something was in a youtube video: www.youtube.com/watch

    Even if HPE is working on a project like that, it is doubtful that Progress would ever support hosting their OpenEdge product within that type of container.  I don't have much hope for "project kiran"... but it would be interesting to hear what other HPUX customers think about it.  I suspect it is just giving people false hope, and a reason to drag their feet while staying on the expensive HPUX hardware for a few more years.

    >> The customer does not add new indexes to this table.

    Makes sense.  I suppose much of the data itself is static too.  The size of "index3" alone is huge!  It is as big as you might expect for an entire ERP database.

  • > I suppose much of the data itself is static too.

    No. Table activity per sec:

    Time     Reads  Updates Creates Deletes
    04:30:02 2620.93  11.08  488.19   87.94
    05:00:01 1230.22   8.00  389.70  236.52
    05:30:05 1273.29   1.57  851.79  204.68
    06:00:03 1239.86   2.39  761.11  190.76
    06:30:02 1411.50   0.96  347.13  222.14
    07:00:01 4008.59   1.06  417.46  245.09
    07:30:02 7769.64   1.82  346.03  232.30
    08:00:03 2848.69   1.93  357.93  230.18
    08:30:02 2120.96   2.81  765.26    5.71
    09:51:16 1775.03   0.04   19.70    0.01
    10:00:06 5874.99  13.08  265.98    1.32
    10:30:02 5062.48  13.13  397.14    0.15
    11:00:03 1119.75  34.45  573.81    0.92
    11:30:01 1474.63  59.05  723.82    0.62
    12:00:04 2793.17  71.65  639.60    0.74