Binary Load - Super Slow on 11.7.2

Posted by ahmed_seedat on 08-Mar-2019 05:51

I have run a binary dump and load on a 280 GB database in Progress version 11.7.2.

Dump time - 30 Minutes

Load time - 11:49 Minutes

Index rebuild - 59 Minutes.

We have noticed that in version 11.7.2 the binary load takes a long time, and this should be the quick part of the process.

Has anybody else experienced this? We are busy doing maintenance for our clients where we are merging DBs, and we are doing binary dumps from one DB and loading them into the merged DB.

All Replies

Posted by George Potemkin on 08-Mar-2019 06:14

> We have noticed in version 11.7.2 the binary load takes long and this should be the quick part of the load.

Dump time depends heavily on the index used if the db is hosted on HDD disks (the ones with seek time). Try to use the smallest index.
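
For illustration, a rough sketch of what that can look like, using the db and table names that appear later in this thread (the index number 12 is purely hypothetical - pick the number of the table's smallest index, e.g. from an index analysis):

proutil hollvalicf -C idxanalys > idxanalys.out
proutil hollvalicf -RO -index 12 -thread 1 -C dump gsc_custom_procedure /dbbackup/dump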

> Index rebuild - 59 Minutes

What parameters were used to run idxbuild?

Posted by ahmed_seedat on 08-Mar-2019 06:20

The following is the index rebuild command:

proutil hollvalicf -C idxbuild -TB 64 -TM 32 -TMB 4096 -SG 64 -TF 80 -T /valdb/tmp/ -threadnum 4 -mergethreads 4 -datascanthreads 16 -pfactor 80 -rusage -z

I think my explanation above is not correct: the load is taking 11 hours 49 minutes. That is the issue.

Posted by George Potemkin on 08-Mar-2019 06:35

> The load is taking 11 hours 49 Minutes.

How do you run the binary load?

Posted by ahmed_seedat on 08-Mar-2019 06:42

I run the load single-user via a load.sh script. Example below:

proutil hollvalicf -C load /dbbackup/dump/gsc_custom_procedure.bd

proutil hollvalicf -C load /dbbackup/dump/gsc_dataset_entity.bd

Here is the dump command. Example below

proutil hollvalicf -RO -index 0  -thread 1  -C dump gsc_custom_procedure /dbbackup/dump

proutil hollvalicf -RO -index 0  -thread 1  -C dump gsc_dataset_entity /dbbackup/dump

proutil hollvalicf -RO -index 0  -thread 1  -C dump gsc_data_tag /dbbackup/dump

I have run a binary load on 2 x 250 GB DBs before and the total load took 3 hours. That was on Progress version 11.3.3.

Posted by Kim Davies on 08-Mar-2019 06:50

Obviously the disk you are writing to for the dump is reasonably fast and I assume you are reading from the same disk location when running the binary load, so disk should not be a factor.

Index rebuild is about what you would expect for a database of this size.  So to echo George's request, how do you perform the binary load?

Posted by George Potemkin on 08-Mar-2019 06:51

> I run this single user via a load.sh script.

Can you post all the messages from the last load (just one binary load session)?

Load in multi-user mode would be faster.

Posted by George Potemkin on 08-Mar-2019 07:01

Regarding idxbuild: here are the results of some rather ad hoc tests run by one of our customers.

Table size = 982.5G (1.3T with indexes)

17073.806 sec 04:44:33.806 -TMB  256 -TF 80 -B 512  -datascanthreads 128 -mergethreads 32
17110.734 sec 04:45:10.734 -TMB  256 -TF 80 -B 5120 -datascanthreads  96 -mergethreads  8 
17133.188 sec 04:45:33.188 -TMB  256 -TF 80 -B 512  -datascanthreads 128 -mergethreads 16
17134.440 sec 04:45:34.440 -TMB  256 -TF 80 -B 512  -datascanthreads  96 -mergethreads  8 
17229.221 sec 04:47:09.221 -TMB  256 -TF 40 -B 512  -datascanthreads  64 -mergethreads  8
17244.425 sec 04:47:24.425 -TMB  256 -TF 80 -B 512  -datascanthreads 128 -mergethreads  8 
17419.778 sec 04:50:19.778 -TMB  256 -TF 80 -B 512  -datascanthreads  96 -mergethreads  4 
17419.839 sec 04:50:19.839 -TMB  256 -TF 40 -B 512  -datascanthreads 128 -mergethreads  8
17503.518 sec 04:51:43.518 -TMB 1024 -TF 80 -B 512  -datascanthreads 128 -mergethreads 16
17965.673 sec 04:59:25.673 -TMB  256 -TF 40 -B 512  -datascanthreads  32 -mergethreads  8
19762.750 sec 05:29:22.750 -TMB  256 -TF 40 -B 512  -datascanthreads  16 -mergethreads  8
19986.706 sec 05:33:06.706 -TMB  256 -TF 40 -B 512  -datascanthreads  16 -mergethreads  4
22282.903 sec 06:11:22.903 -TMB  256 -TF 80 -B 512  -datascanthreads   8 -mergethreads  4
23180.969 sec 06:26:20.969 -TMB  256 -TF 40 -B 512  -datascanthreads   8 -mergethreads  4
28684.978 sec 07:58:04.978 -TMB  256 -TF 80 -B 512  -datascanthreads   4 -mergethreads  2
29606.891 sec 08:13:26.891 -TMB  256 -TF 40 -B 512  -datascanthreads   4 -mergethreads  2

Posted by ahmed_seedat on 08-Mar-2019 07:28

My biggest concern at the moment is why the load is taking so long.
 
Here is the log for the load. The size of the file is 146 GB.
 
                Thu Mar  7 18:36:01 2019
[2019/03/07@18:36:01.453+0200] P-58220      T-140714784515904 I BINLOAD  : (451)   Binary load session begin for root on /dev/pts/2.
[2019/03/07@18:36:01.453+0200] P-58220      T-140714784515904 I BINLOAD  : (15321) Before Image Log Initialization at block 42  offset 7127.
[2019/03/07@18:36:01.460+0200] P-58220      T-140714784515904 I BINLOAD  : (7129)  Usr 0 set name to root.
[2019/03/07@18:36:01.463+0200] P-58220      T-140714784515904 I BINLOAD  : (6203)  Binary Dump created on Thu Mar  7 17:16:10 2019
from database /valdb/db/hollvalicf.
[2019/03/07@18:36:01.463+0200] P-58220      T-140714784515904 I BINLOAD  : (6204)  Loading table gst_audit, Table number 93
starting with record 1, section 1.
 
                Fri Mar  8 05:58:36 2019
[2019/03/08@05:58:36.810+0200] P-58220      T-140714784515904 I BINLOAD  : (13633) Table /dbbackup/dump/gst_audit.bd has been loaded.
[2019/03/08@05:58:36.834+0200] P-58220      T-140714784515904 I BINLOAD  : (15109) At Database close the number of live transactions is 0.
[2019/03/08@05:58:36.834+0200] P-58220      T-140714784515904 I BINLOAD  : (15743) Before Image Log Completion at Block 40 Offset 602.
[2019/03/08@05:58:36.878+0200] P-58220      T-140714784515904 I BINLOAD  : (334)   Binary load session end.
 

Posted by George Potemkin on 08-Mar-2019 07:46

Binary load did 3.65 MB/sec - writing to db files as well as to bi file.

Can you run the Furgal test?

proutil empty -C truncate bi -bi 16384

proutil empty -C bigrow 2 -zextendSyncIO
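
To turn that into a throughput number, one option is simply to time the bigrow step; with the 16384 KB cluster size above the resulting .b1 ends up at roughly 96 MB. A minimal sketch, assuming a scratch db called empty as in the commands above:

proutil empty -C truncate bi -bi 16384
time proutil empty -C bigrow 2 -zextendSyncIO
# rough MB/sec = size of empty.b1 in MB (about 96) / elapsed seconds reported by time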

Posted by ahmed_seedat on 08-Mar-2019 07:52

I created a test DB using an 8K block size. Below are the results.

               Fri Mar  8 09:49:01 2019

[2019/03/08@09:49:01.901+0200] P-10638      T-140587748644672 I DBUTIL   : (451)   Truncate bi session begin for root on /dev/pts/5.

[2019/03/08@09:49:01.901+0200] P-10638      T-140587748644672 I DBUTIL   : (15321) Before Image Log Initialization at block 0  offset 1004.

[2019/03/08@09:49:01.905+0200] P-10638      T-140587748644672 I DBUTIL   : (7129)  Usr 0 set name to root.

[2019/03/08@09:49:01.906+0200] P-10638      T-140587748644672 I DBUTIL   : (1620)  Before-image cluster size set to 16384 kb.

[2019/03/08@09:49:01.910+0200] P-10638      T-140587748644672 I DBUTIL   : (123)   .bi file truncated.

[2019/03/08@09:49:01.911+0200] P-10638      T-140587748644672 I DBUTIL   : (15109) At Database close the number of live transactions is 0.

[2019/03/08@09:49:01.911+0200] P-10638      T-140587748644672 I DBUTIL   : (15743) Before Image Log Completion at Block 0 Offset 1004.

[2019/03/08@09:49:01.926+0200] P-10638      T-140587748644672 I DBUTIL   : (334)   Truncate bi session end.

               Fri Mar  8 09:49:25 2019

[2019/03/08@09:49:25.142+0200] P-11384      T-139803562612544 I DBUTIL   : (451)   Bigrow session begin for root on /dev/pts/5.

[2019/03/08@09:49:25.142+0200] P-11384      T-139803562612544 I DBUTIL   : (15321) Before Image Log Initialization at block 0  offset 0.

[2019/03/08@09:49:37.200+0200] P-11384      T-139803562612544 I DBUTIL   : (7129)  Usr 0 set name to root.

[2019/03/08@09:49:37.200+0200] P-11384      T-139803562612544 I DBUTIL   : (6600)  Adding 2 clusters to the Before Image file.

[2019/03/08@09:49:43.516+0200] P-11384      T-139803562612544 I DBUTIL   : (15109) At Database close the number of live transactions is 0.

[2019/03/08@09:49:43.516+0200] P-11384      T-139803562612544 I DBUTIL   : (15743) Before Image Log Completion at Block 0 Offset 235.

[2019/03/08@09:49:43.529+0200] P-11384      T-139803562612544 I DBUTIL   : (334)   Bigrow session end.

[root@hssmiplive-ind:tmp]# l

total 102288

drwxr-xr-x 2 root root      4096 Mar  8 09:49 .

drwxr-xr-x 9 root root      4096 Mar  8 06:20 ..

-rw-r--r-- 1 root root 100794368 Mar  8 09:49 test.b1

-rw-r--r-- 1 root root   3276800 Mar  8 09:49 test.d1

-rw-r--r-- 1 root root    655360 Mar  8 09:49 test.db

-rw-r--r-- 1 root root      3502 Mar  8 09:49 test.lg

-rw-r--r-- 1 root root        33 Mar  8 09:48 test.st

Posted by ChUIMonster on 08-Mar-2019 07:57

11 hours 49 minutes to load something that took 30 minutes to dump is certainly not the expected behavior.

Have you previously executed similarly sized binary loads *on this hardware* with other versions of Progress that performed better? I ask because you specifically indicated that you think 11.7.2 is the cause of the slowness, and I see that you have had faster results with 11.3, but was that on the same hardware? (And if it was, has anything been changed?)

My first guess is that the load target is on a SAN, probably with RAID5 (or, worse, 6) and that the disks are saturated.  Although that should have also impacted the index rebuild so that idea may not hold water.

I think it would be very interesting to run the Furgal test (ProTop automates this with syncio.sh) to either confirm or rule out a write performance issue.

Posted by Libor Laubacher on 08-Mar-2019 07:58

> proutil hollvalicf -C load /dbbackup/dump/gsc_custom_procedure.bd

Try running with -i and -r as well.
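
For example, appended to the load commands shown earlier (take a backup of the target first: with -i the database is not recoverable if the load is interrupted):

proutil hollvalicf -C load /dbbackup/dump/gsc_custom_procedure.bd -i -r
proutil hollvalicf -C load /dbbackup/dump/gsc_dataset_entity.bd -i -r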

Posted by George Potemkin on 08-Mar-2019 08:01

4 bi clusters (64 MB) were added in 12.058 sec, 2 bi clusters (32 MB) in 6.329 sec. In other words, bi writes are running at 5.06-5.31 MB/sec. Binary load did 3.65 MB/sec, but it was writing to the db files as well.

Posted by George Potemkin on 08-Mar-2019 08:03

> Try running with -i and -r as well.

Plus in multi-user mode with BIW/APW running.
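
A rough sketch of that variant (the -B value is only a placeholder):

proserve hollvalicf -B 100000    # start the broker
probiw hollvalicf                # before-image writer
proapw hollvalicf                # asynchronous page writer
proutil hollvalicf -C load /dbbackup/dump/gsc_custom_procedure.bd
proshut hollvalicf -by           # bring the db down once the loads finish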

Posted by ChUIMonster on 08-Mar-2019 08:03

Libor is right.  I can't believe I didn't see that...

(Of course that means that you should have a good backup first.  Or be willing to throw away the target.)  

Posted by ahmed_seedat on 08-Mar-2019 09:23

The Furgal Test:

[root@hssmiplive-ind:ahmed]# /apps/mipusr/protop/bin/syncio.sh

Procopy session begin for root on batch. (451)

Database copied from /apps/dlc117/sports. (1365)

Procopy session end. (334)

OpenEdge Release 11.7.2 as of Tue Oct 24 18:20:59 EDT 2017

Before-image cluster size set to 16384 kb. (1620)

-rw-r--r-- 1 root root 8192 Mar  8 11:20 sports.b1

Please wait...

real    0m18.161s

9 seconds =  10MB/sec -- which is barely acceptable, anything longer and your disk subsystem is junk.

5 seconds =  20MB/sec -- "ok"

3 seconds =  30MB/sec -- "good"

1 second  = 100MB/sec -- which is "excellent!"

Posted by ChUIMonster on 08-Mar-2019 09:27

So it is also inconsistent.  Or getting worse under load as the day goes on.

Posted by George Potemkin on 08-Mar-2019 09:38

> So it is also inconsistent.

It's consistent:

First run (the bigrow above): 12.058 + 6.329 = 18.387 sec

Run as syncio.sh: 18.161 sec.

But 5 MB/sec is very slow.

Posted by ahmed_seedat on 08-Mar-2019 18:52

I ran the dump and load using the Dictionary Dump, Bulk Load and Index Rebuild, and here are my results on the same DB, same server and same disk subsystem.

Dump - 15:36 - 17:47 - 2 Hours 11 Minutes.

Load - 17:58 - 19:26 - 1 hour 28 Minutes

Index Rebuild - 19:45 - 20:31 - 46 Minutes

Can somebody explain the difference in time between the binary load that took 11 hours 49 minutes and the bulk load that took 1 hour 28 minutes?

Posted by mfurgal on 08-Mar-2019 18:56

Bulk load uses -i by default. Binary load does not, but -i can be specified. If you use -i on the binary load, how long does it take?

Mike
-- 
Mike Furgal
Director – Database and Pro2 Services
PROGRESS
617-803-2870 


Posted by Paul Koufalis on 08-Mar-2019 18:59

Is your bulk load test using a new DB or did you procopy empty into the pre-grown shell of the first test? There may be a cost to growing the data files.

Did you try the binary load with -r ?  -r gives you the benefits of -i (FS cached writes) without some of the drawbacks.
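
On the first point, if growing the extents does turn out to matter, one way to rule it out is to pre-size the target with fixed-length extents in the structure file before loading. A rough sketch only - the area name/number, paths and sizes below are made up:

d "Data":7,64;8 /valdb/db/hollvalicf_7.d1 f 1024000
d "Data":7,64;8 /valdb/db/hollvalicf_7.d2

prostrct create hollvalicf hollvalicf.st -blocksize 8192
procopy $DLC/empty8 hollvalicf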

Posted by ahmed_seedat on 08-Mar-2019 19:28

@Mike - I will need to re-run the binary dump and load. Will try this now and run the load with -i and -r. I am using the same DB with the same data, so this gives me consistency.

@Paul - I created new DBs and the extents grew as I ran the load. But that alone could not account for a binary load of 11 hours 49 minutes compared to a bulk load of 1 hour 28 minutes.

My opinion is that it seems to be a Progress 11.7.2 issue!

Posted by Paul Koufalis on 08-Mar-2019 19:34

Mike wrote that bulk load runs with -i, so that seems to be the most likely explanation for the discrepancy.

If you have plenty of time, I think that running a dictionary load with all indexes deactivated should simulate a bulk load without the -i.

In summary, these are the various combinations:

1. Binary load w/out -i / -r: 12 hours

2. Binary load w/ -r: TO DO

3. Bulk load (default w/ -i): 2h

4. Bulk load w/out -i (simulated by dict load with inactive indexes): TO DO
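
For combination 4, if memory serves proutil has an idxdeactivate qualifier that could be used to switch the indexes off before the dictionary load (the table/index name below is hypothetical); indexes can also be marked inactive in the Data Dictionary. Afterwards they get rebuilt with idxbuild as usual:

proutil hollvalicf -C idxdeactivate gsc_custom_procedure.proc-code-idx
# ...repeat for each index on the tables being loaded, run the dictionary load, then:
proutil hollvalicf -C idxbuild all -TB 64 -TM 32 -T /valdb/tmp/ -threadnum 4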

Posted by ahmed_seedat on 08-Mar-2019 22:05

I added the -i and -r options to the binary load and I have the following results.

Binary Dump - 25 Minutes

Binary Load - 50 minutes - Time down from 11 Hours 49 Minutes to 50 Minutes.

Index Rebuild - 52 minutes

Posted by Paul Koufalis on 08-Mar-2019 22:07

That makes a whole lot more sense!

Posted by Paul Koufalis on 08-Mar-2019 22:07

BTW you don't need both -i and -r. -r alone will suffice.

Posted by ahmed_seedat on 08-Mar-2019 22:13

Thank you all for all your input. It has been a day of learning.

This thread is closed