I was planning to add some indexes to a table with 20 years of data and about 1,000,000 rows per year (a relatively small table, compared to others). I will be dropping an index and adding two others. Here are the two new indexes:
CREATE INDEX "my_tbl7" ON PUB."my_tbl" ("fy" ASC , "period" ASC , "div" ASC , "center" ASC , "acct" ASC) AREA "my_tbl_idx" PRO_DESCRIPTION '' PRO_ACTIVE 'y';
CREATE UNIQUE INDEX "my_tbl8" ON PUB."my_tbl" ("cy" ASC , "cperiod" ASC , "div" ASC , "center" ASC , "acct" ASC, "cdetail" ASC) AREA "my_tbl_idx" PRO_DESCRIPTION '' PRO_ACTIVE 'y';
What surprised me is that this takes many hours longer than an alternative approach whereby I create the indexes as "PRO_ACTIVE 'n' ", then activate via the idxbuild command:
proutil lumbertrack -C idxbuild table my_tbl -B 5000 -T \usr\tmp -rusage -thread 1 -threadnum 4 -SG 64 -TB 64 -TM 32 -TMB 1024 -TF 80 -datascanthreads 12 -mergethreads 4
The idxbuild variation takes only a few minutes. It seems odd that there would be such a vast difference in performance; ideally the two approaches would complete in a similar amount of time. The first approach substantially under-utilizes the disk and CPU. Unfortunately, the first approach is our normal procedure for schema changes, and we avoid running proutil commands except in fairly unusual circumstances.
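To make the faster path concrete, the two-step workflow looks roughly like this (a sketch only, reusing the database name, table name, and tuning parameters from the command above; the temp directory is an example path, and the exact syntax should be verified against the proutil documentation for your OpenEdge release):

```shell
# Step 1: apply the DDL with PRO_ACTIVE 'n' so the indexes exist in the
# schema but no key entries are built at CREATE time, e.g.:
#   CREATE INDEX "my_tbl7" ON PUB."my_tbl" (...) AREA "my_tbl_idx" PRO_ACTIVE 'n';

# Step 2: build the table's indexes in one multi-threaded offline pass
# (the database must be shut down for idxbuild):
proutil lumbertrack -C idxbuild table my_tbl \
  -B 5000 -T /usr/tmp -rusage \
  -thread 1 -threadnum 4 \
  -SG 64 -TB 64 -TM 32 -TMB 1024 -TF 80 \
  -datascanthreads 12 -mergethreads 4
```

The `-datascanthreads` and `-mergethreads` options are what let idxbuild use all the cores during the scan and merge phases, which is the main reason it finishes in minutes instead of hours.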
IMHO, adding a couple of indexes doesn't seem to qualify as an unusual circumstance. Yet the difference in performance is extreme, so perhaps we should be thinking about this type of schema change differently than we have in the past.
Is it a common thing for OE DBAs to resort to using "proutil" commands when adding indexes? I see that there is a KB article about this, and I'm wondering if we have been doing it wrong (see https://knowledgebase.progress.com/articles/Article/000041377). I suppose it may depend on whether the DBA can tolerate waiting many hours for a basic schema change.
The OpenEdge database index build utility in proutil uses a much faster and more optimised algorithm to create an index than the algorithm used by the OpenEdge SQL engine.
The command-line tools are generally faster for most database maintenance operations. This has always been so.
It is unfortunate that we have to do our schema maintenance one way for tables under 1,000,000 rows and another way for larger tables. The difference in this case is a few minutes vs. a few hours (i.e. extreme).
I don't think most managers would be comfortable with shelling out to the command line and running "proutil" commands as standard operating procedure. Generally speaking, applying our standard monthly dictionary changes is done by one role/individual, while running proutil commands against the database is done by a different type of role/individual (and only in exceptional circumstances).
This seems like a fairly significant limitation when scaling up an OE database.
> Is it a common thing for OE dba's to resort to using "proutil" commands when adding indexes?
It's not an option if the db is participating in OE replication.
>> It's not an option if db is participating in OE replication
It's odd that the KB doesn't mention this limitation.
We do have OE replication in production. Hopefully the proutil command would have stopped us rather than breaking replication.
> Hopefully the proutil command would have stopped us rather than breaking replication.
Yes. OE Repl requires AI. A proutil idxbuild won't run when you have AI enabled.
"You cannot perform this operation with after-imaging or 2phase enabled (1525)"
For anyone using any kind of replication (OE Replication or AI-based replication), meaning after-imaging is enabled, proutil idxbuild will not work, as mentioned above.
For big tables one needs to add indexes inactive and then run proutil idxactivate. That is a great addition to the available tools.
Note: you have to be on 11.7.2 or higher, or on version 12, and use -usernotifytime as a database startup parameter to be able to activate an index online without kicking everyone off.
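Putting those pieces together, the online activation path would look something like the following (an illustrative sketch, assuming the `lumbertrack` database, `my_tbl` table, and `my_tbl7` index names used earlier in the thread; the 60-second notify interval is an arbitrary example, and the exact idxactivate syntax should be checked against the documentation for your release):

```shell
# Start the broker with -usernotifytime so connected clients can refresh
# their schema cache periodically instead of having to be disconnected:
proserve lumbertrack -usernotifytime 60

# Activate the inactive index online, against the running database:
proutil lumbertrack -C idxactivate PUB.my_tbl.my_tbl7
```

The trade-off discussed below applies here: unlike idxbuild, this runs while the database is online, but it is not multi-threaded.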
That would all be nice and fine if idxactivate ran multi-threaded like idxbuild. But unfortunately that is not the case!
There is a huge speed difference between the two. While idxbuild generally runs fast enough in every case I have seen, idxactivate can run for days, months, or even years.
In my case I was initially adding inactive indexes and then using "idxbuild". You are right that it was very fast and fully utilized my CPU and disk. (I think the bottleneck was actually the CPU cores, given the number of threads I had requested in the proutil command.) Unfortunately we will NOT be able to use "idxbuild" in production, where we have replication enabled.
My recollection is that the idxactivate didn't seem much faster than simply making the indexes initially active in the first place (via SQL92). I may have to test that again to be sure....
It is too bad that these types of schema operations are limited by the clock speed of a single core; it has been quite a long time since multi-processor servers became the norm. At moments like these I realize how fortunate I am to do much of my development work on SQL Server, where parallel execution plans are regularly generated for all sorts of SQL queries and DDL operations.
To play devil's advocate for Progress: idxbuild can use methods that only work if the data are not changing. Would you agree to use an online idxbuild that locks the whole table while the utility is building one of the table's indexes?
Those who complain about slow online index builds are ignoring their duty to periodically check index health. Otherwise they would be complaining about slow index checks instead. One of our customers with a large database recently tried to run option 1 of idxfix (scan records for missing index entries) for just one index. The estimated time to complete the scan was 16 years. They did not try option 2 (scan indexes for invalid index entries) – that could have taken hundreds of years. What could they do when faced with critical index corruption but no full list of the records with missing index keys? What could they do when they found even one (just one!) invalid index key? Only run an offline idxbuild. For their data that would take a day or more, which is, of course, absolutely unacceptable for their business.
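For reference, the scans referred to above are options in the interactive proutil idxfix menu (a sketch, using the database name from earlier in the thread; option numbering as described in this post):

```shell
# Launch the interactive index checker; it can be run against an online
# or offline database:
proutil lumbertrack -C idxfix
#   Option 1: scan records for missing index entries
#   Option 2: scan indexes for invalid index entries
```

On very large tables these scans are the slow, single-threaded counterpart to the fast offline idxbuild, which is the point being made here.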
>> Would you agree to use an online idxbuild that locks the whole table while the utility is building one of the table's indexes?
Absolutely, if the outage window is reasonable (as it is when using idxbuild with 16 cores at 4 GHz and SSD storage). Databases operate by locking resources as a matter of course; locking a table is conceptually similar to locking all the records within it. We could reword your question and ask: would you agree to allow users of a database to lock multiple records in a table for a reasonable window of time?
(... and this brings to mind another key SQL Server feature: lock escalation! We have our "-L" in production at 500,000, and there have been times when that was actually needed! It would be extremely hard to imagine SQL Server ever taking that many locks on individual records at any one moment.)
I tested "idxactivate" and it seems just as slow as adding the index in the normal way (as an activated index via a SQL DDL statement). It sounds like we are going ahead with the "idxbuild" approach. Unless we want the operation to drag on indefinitely, idxbuild seems to be the only realistic option (at least for moderately large tables).
We will just have to break and re-establish OE replication after the change. Fortunately the OE replication issue is only relevant to our production environment; we haven't configured it in any of our pre-production databases.
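The break-and-re-establish procedure would be roughly along these lines (an outline only, NOT a runbook: the exact commands, backup steps, and re-baselining requirements depend on the OE Replication configuration and release, so every step should be verified against the replication documentation before use):

```shell
# Outline only -- verify each step against the OpenEdge Replication docs.
proutil lumbertrack -C disablesitereplication source   # stop replication on the source
rfutil  lumbertrack -C aimage end                      # disable after-imaging
proutil lumbertrack -C idxbuild table my_tbl ...       # multi-threaded index build
# Afterwards: take a fresh backup, re-enable after-imaging and site
# replication, and restore the backup on the target to re-baseline it.
```

This is the "hoop-jumping" referred to below: the fast build is only possible after temporarily dismantling the replication setup that normally protects production.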
This seems like a lot of hoop-jumping just to add a new index!
> On Jan 11, 2020, at 9:39 PM, dbeavon wrote:
> This seems like a lot of hoop-jumping just to add a new index!
I completely understand your point. However, as of now, this action is a non-trivial event with respect to replication: the index must be created on the replica side as well, because OpenEdge replication is based on transaction log records.
A worthwhile enhancement would be for replication to ignore updates for indexes it does not know about.