Binary Dump and Load - minimise downtime

Posted by James Palmer on 02-Mar-2016 11:05

10.2B08 (yes I know). 

Just wondering what methods are out there for minimising downtime on a binary D&L? We have a number of clients who are still running Type I storage, on the basis that they can't understand why so much downtime is required to convert.

Any thoughts much appreciated. 

All Replies

Posted by Keith Sudbury on 02-Mar-2016 11:27

Quite a lot of your options are going to depend on how your existing Type I areas are set up (tables/indexes per area), your available hardware/OS, and whether you really need to do everything at once or can break it into multiple maintenance windows (doing selected tables during each window).

The short version is multi-threading the process as much as possible and avoiding any dead time, like waiting for the dump to completely finish before starting the load.

In most cases I have had better luck with the following overall approach (roughly sketched as commands at the end of this reply):

1) Running multiple proutil dump processes at the same time (some with threads and some without) with the server running with a large -B. The large -B and running server can be skipped if you don't have a lot of mixed-table areas.

2) Loading the data into the new database with all of the new extents allocated, -i and reasonable startup settings. Don't build indexes as you load.

3) Running a single multithreaded idxbuild per area once the load is completed for that area.

The big variables are going to be tables/indexes that are meaningful percentages of the database and whether you are dumping and loading on the same hardware.

It isn't easy to come up with a generic solution that works for every database/server but these have served me pretty well over the years.
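To make that concrete, here is a rough command-line sketch of the flow, assuming a source db mydb, a target db newdb and one dump directory per process. The db, table and area names and the tuning values are purely illustrative, and the exact dump/idxbuild qualifiers and threading switches depend on your 10.2B service pack, so check proutil's own help before copying any of this:

   # several of these run concurrently, one per table (or group of tables)
   proutil mydb -C dump customer /dumps/p1
   proutil mydb -C dump orderline /dumps/p2 -thread 1 -threadnum 4

   # load into the pre-grown target db, with -i and without building indexes
   proutil newdb -C load /dumps/p1/customer.bd -i
   proutil newdb -C load /dumps/p2/orderline.bd -i

   # once an area is fully loaded, rebuild its indexes in one multithreaded pass
   proutil newdb -C idxbuild area "Order_Data" -TB 31 -TM 32 -SG 64 -B 1024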

Posted by Thomas Mercer-Hursh on 02-Mar-2016 12:01

I believe Bravepoint has a product intended to migrate the bulk of the data in advance, limiting the amount that needs to be transferred during the downtime.  I also recall Tom Bascom having given a paper relating to this ... but I don't remember any details since it was not a paper I attended.  That said, a lot of what is possible might depend on the hardware resources, so you might say something about what those usually are.  Obviously, there are a lot of possibilities when moving to new hardware that don't exist when staying on the same machine, especially if that machine is only just enough to run the application.

Posted by George Potemkin on 02-Mar-2016 12:15

In addition to what Keith Sudbury wrote: carefully choose the index for the dump. The dump time of the same table may vary by tens of times depending on the chosen index.

But instead of option 1) I'd suggest dumping the db in read-only mode. There are no resource locks (and no resource conflicts between sessions), and each session will use its own (small) buffer pool.
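For reference, a read-only dump along those lines is just the normal binary dump with -RO added, one process per table, run in parallel. The db, table and directory names are only examples, and because -RO reads the database files as they sit on disk it is only appropriate once the db has been cleanly shut down (or is otherwise not being updated):

   proutil mydb -C dump customer  /dumps/d1 -RO &
   proutil mydb -C dump order     /dumps/d2 -RO &
   proutil mydb -C dump orderline /dumps/d3 -RO &
   wait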

Posted by Keith Sudbury on 02-Mar-2016 13:27

George is correct about the index you choose for the dump impacting the speed of the dump. Take care that it doesn't cause performance issues when the data is loaded though.

As for using -RO for the dump... I have found some cases where that did indeed work better than dumping with the server running. It seems to be directly related to how heavily (or not) your tables are intermixed with each other (Type I). Both are valid approaches to try on your specific databases.

Bravepoint does have the PRO D&L product  www.bravepoint.com/products-pro-dump-and-load.shtml

but there is obviously a cost involved (and most likely some replication triggers added like the PRO2 series). It may or may not make sense if you are just trying to convert a few customers from Type I to Type II.

Posted by gus on 02-Mar-2016 15:04

George is right. However, there is more to the story.

If you choose a dump index with the goal of minimising dump time, this will also establish the order in which the records are loaded. This means that the ordering imposed by the index you used for the dump will match the physical storage order. That index will therefore have the highest cluster ratio, and an index or table scan using that index will require fewer disk reads and (often more importantly) fewer disk seeks than using another index.

This may not be desirable if the application is doing a high percentage of its queries via another index - performance will be higher (possibly a /lot higher/) if you dump each table by an index preferred by the application.

Posted by Thomas Mercer-Hursh on 02-Mar-2016 15:19

Gus, how true is this in practice?  I can see that, if one is doing a table scan regularly and there is a benefit in doing the table scan in the order of a particular index, the performance would be better if the physical layout matched the logical one.  But then, one rarely wants to be doing a table scan and, with a sufficiently modern version of Progress, using the table-scan option *instead* of any index is going to be preferred.  Certainly, when finding individual records the physical order should be irrelevant.  I suppose I can see that, if one is finding a group of records and the bracketing is occurring on the same index which determined the physical ordering, there might be a slight benefit, but how common or significant is that really?  I thought the idea these days was that routine dump and load as a maintenance strategy was not warranted.

Posted by George Potemkin on 03-Mar-2016 00:36

BTW, how can we estimate how many records on average are returned per query?

A very (perhaps too) detailed source of information: -logentrytypes QryInfo

For remote clients we can use promon -> Activity: Servers -> Records sent vs. Queries received

I have checked promon data for one customer and the typical ratio is 2-3. For another customer who uses the same application the ratio is 10.

Are there any other methods to estimate the average length of the query result lists?
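For anyone who wants to try the QryInfo route, it is enabled with client startup parameters; a sketch, where the db name, procedure, log path and logging level are just examples (higher -logginglevel values make QryInfo progressively more verbose):

   mpro mydb -p start.p -clientlog /tmp/qryinfo.lg -logentrytypes QryInfo -logginglevel 2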

Posted by Keith Sudbury on 03-Mar-2016 08:24

How much this matters varies greatly by application and table size. If you are lucky enough to be able to cache your entire working set in one of the buffer pools and you don't shut down the DB nightly... you aren't likely to notice the difference.

The benefits come from a potential reduction in the number of blocks needed to buffer the "active" records, from how many buffer blocks must be read to satisfy a query, and from maximizing the benefit of the read-ahead logic on most SAN/volume managers when the blocks must be read from disk.

Back in the 32 bit days this was much more of an issue for me personally with one specific app and a few specific tables. I could see a 40% improvement in report processing when the core table was loaded in the same order that the application used it. This was a 2TB database with several key tables that were in excess of 20GB in size that had close to a billion reads per day... certainly not your average OE install.

Type II areas and 64-bit buffer pools have certainly reduced the impact, but I still see issues from time to time, though only on very large and very active tables. I am not quite ready to declare D&Ls dead and buried, but they certainly aren't needed for most tables on a regular basis.

Posted by George Potemkin on 03-Mar-2016 09:08

For terabyte databases the main factor that affects the performance of reads from disk is seek time. The slowdown grows much faster than the db size. Record fragmentation also "hurts" performance much more strongly in large databases. In the presentation "DBAnalys To The Rescue!" we had an example:

Table size: 25.5G

20% records are fragmented

33.6% blocks are on RM Chain

Reading the whole table from disk was 35 times slower than when no records are fragmented.

Fortunately the record fragmentation can be fixed online.

The seek time factor can be eliminated by SSD disks.
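For anyone who wants to check their own tables: these fragmentation figures come from dbanalys-style output, which you can produce with something like the following (db name and output file are purely illustrative):

   proutil mydb -C dbanalys > mydb_dbanalys.out

Then, in the RECORD BLOCK SUMMARY for each area, compare the fragment count and fragment factor against the record count for each table: a fragment count well above the record count (factor well above 1.0) means many records are split across blocks.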

Posted by Dmitri Levin on 04-Mar-2016 16:53

Anyone interested in fixing fragmentation online?

Posted by gus on 09-Mar-2016 08:48

> On Mar 4, 2016, at 5:54 PM, Dmitri Levin wrote:

>

> Anyone interested in fixing fragmentation online?

if you come to the pug challenge at the end of june, you will find out how to do that and also an online table move that will not blow up the before-image log.

Posted by gus on 09-Mar-2016 08:54

> On Mar 2, 2016, at 4:20 PM, Thomas Mercer-Hursh wrote:

>

> Gus, how true is this in practice. I can see that, if one is doing a table scan regularly and there is a benefit in doing the table scan in the order of a particular index, that the performance would be better if the physical layout matched the logical one.

in practice, here is a real example from a production system i did last Sunday:

two tables of almost the same size.

table 1 has 60,367,550 rows with 4.7 GB of data.

table 2 has 56,560,647 rows with 3.4 GB of data.

binary dump of table 1 takes 13 minutes

binary dump of table 2 takes 86 minutes

the worst case is /much, much, much/ worse than this. fortunately it is not common.

Posted by George Potemkin on 09-Mar-2016 08:59

> if you come to the pug challenge at the end of june, you will find out how to do that and also a online table move that will not blow up the before-image log.

Is it something new that comes in future Progress versions?

I had planned to write variations of the AreaDefrag.p program. I'd be happy if I could cancel my plans. ;-)

Posted by ChUIMonster on 09-Mar-2016 09:09

You'll have to come to PUG Challenge and find out!

:)

But I happen to know that you do not have to wait for a future version.

Posted by George Potemkin on 09-Mar-2016 09:11

> I can see that, if one is doing a table scan regularly and there is a benefit in doing the table scan in the order of a particular index, that the performance would be better if the physical layout matched the logical one.

I ran tests with idxfix. Its option 2 (scan indexes) reads records exactly as 4GL queries do. Idxfix spent a few hours scanning one index and needed approximately a /year/ (sic!) to scan another index of the same table.

> But then, one rarely wants to be doing a table scan and, with a sufficiently modern version of Progress, using the table-scan option *instead* of any index is going to be preferred.

The table-scan option can be very, very slow as well if the records are fragmented. For example, 10% fragmented records can slow down the reading by 4000%.

Posted by Keith Sudbury on 09-Mar-2016 09:22

Not to mention that table-scan isn't really a valid option if you just want to read a few million records from a table with half a billion rows :-)

Posted by ChUIMonster on 09-Mar-2016 09:22

It doesn't happen every day on every database but it can make a very big difference.

IMHO the biggest offenders tend to be old databases that still have everything in the schema area or which have very poorly thought-out Type I storage areas.

Posted by Thomas Mercer-Hursh on 09-Mar-2016 09:26

Yes, Gus and George, I understand that when traversing the entire table according to a specified index, there is a significant advantage to having the table in physical order matching that index.  My question relates to my expectation that this is a rare requirement, at least in a version modern enough to have table-scan.  Most queries are going to be for either individual records or a group of records bracketed on any one of several indices which are unlikely to match the physical order.  Fragmentation is a separate issue.

Posted by gus on 09-Mar-2016 09:42

> On Mar 9, 2016, at 10:23 AM, Keith Sudbury wrote:

>

> Not to mention that table-scan isn't really a valid option if you just want to read a few million records from a table with half a billion rows :-)

zacly! that is why we have indexes.

Posted by James Palmer on 19-Jul-2017 17:00

Going back to this thread, again.

I'm doing a test D&L of a customer database because they have index corruption, meaning it's as good a chance as any to do this for them. The vast majority of tables have dumped and loaded very quickly, but some haven't. One table in particular is very large and I want to try and select a better index for the dump.

Ignoring which index is the most used in the application, how would I use this dbanalys output to decide on the quickest index for the dump?

Table                          Index  Fields Levels         Blocks    Size  % Util  Factor
PUB.Entrylocatn
  confirmed                  173       7      3          60195  468.4M   100.0     1.0
  Confirmed-JobId            617       2      3          10510   81.7M    99.8     1.0
  Confirmed-OrigJobId        618       2      3          10508   81.7M    99.8     1.0
  confirmed-trx-reference    174       3      3          60551  471.1M    99.9     1.0
  cross-dock                 175       8      3          34898  271.5M    99.9     1.0
  entrylocatn                176       6      3          98477  766.2M    99.9     1.0
  GTIN                       619       1      3          10443   81.2M    99.8     1.0
  location                   177       5      3         115438  897.5M    99.9     1.0
  Pallet-No                  620       1      3          15015  116.8M    99.9     1.0
  pending                    178       5      3          33938  264.1M    99.9     1.0
  picker-stats               179       3      3          10451   81.2M    99.8     1.0
  picking-daily-stats        180       3      3          10451   81.2M    99.8     1.0
  po                         181       1      3          10445   81.2M    99.8     1.0
  SSCC                       621       1      3          10443   81.2M    99.8     1.0
  transaction-reference      182       1      3          36455  283.7M    99.9     1.0
  unique-item-number         172       5      3          87428  680.3M    99.9     1.0
  unit-set                   183       2      3          12704   98.8M    99.9     1.0

Posted by cjbrandt on 19-Jul-2017 18:52

In my experience either a single-field index or the -index 0 option is the quickest way to just dump.

The -index 0 option may require Type II storage areas.  Someone else will remember and update.

Posted by Rob Fitzpatrick on 19-Jul-2017 20:53

If the table is in a Type II area then you can use -index 0.  If you try it with a table in a Type I area you won't get an error message, it will just use the primary index.  If you have a Type I area, I would try dumping with one of those small (81 MB) indexes.
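For the record, the choice is just the value passed to -index: 0 means no index (physical/storage order, Type II areas only), otherwise it is the index number shown next to each index name in the dbanalys listing above. The db and directory names here are examples:

   proutil mydb -C dump entrylocatn /dumps -index 0
   proutil mydb -C dump entrylocatn /dumps -index 619

where 619 is the GTIN index, one of the small (81 MB) ones from the listing.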

Posted by George Potemkin on 20-Jul-2017 00:01

James, you can check the index logical scatter factors:

ftp.progress-tech.ru/.../SeekRatio.p

Change the FOR EACH _File query to select the tables you need. Change ReadLimit (10000) to specify how many records you are going to read for the estimation of the logical scatter factors.

The small indexes will be among the best choices, but you will have more options based on the logical scatter factors.

Posted by James Palmer on 20-Jul-2017 02:54

Thanks everyone. The primary index is unique-item-number and it takes a while! I think I'll try transaction-reference because it's smaller, and it's an index we use a lot in the application.

Posted by James Palmer on 20-Jul-2017 06:41

I'm running a number of dump processes concurrently. At the moment I'm single-threading the majority of them, but, say I'm doing 8 concurrent dumps on an 8-core box, is there any benefit to be gained from multi-threading them at all?

This thread is closed