Data Extraction...the hard way... - Forum - Community Groups - Progress Community

Data Extraction...the hard way...


Data Extraction...the hard way...

  • I have some old (version 9 and older) Prorogress database files that I need to pull data out of....without the nice friendly help of "Progress" itself.

    I no longer have access to a running version of Progress that supports these files. I've found whitepapers that discuss the organization of blocks in the database.

    I'm used to working with binary data, so packed fields, integers, floating point, and offset pointers, block-id's and such don't scare me.

    So far...I'm able to move through the databased from 1024 block to block...and I'm picking out record blocks.

    For each record block I'm following the local offsets in the block header to each record within that block and I'm reading what is most always the record length and table identifier.  Then I'm able to follow the field lengths and move from field to field and export data.  This has worked well for several tables. ...but some records seem to have "more complex" record headers.  Not just "four bytes".  

    Does anyone have technical documentation on the structure of individual record headers?

    Any guidence is welcome.


  • I would expect it's a lot easier - and safer - to send the db to someone who has 9.* and ask them to dump it for you.

    AFAIK, there's no published doc on the internal structure as it can change over time depending on what engineering decides to do with it.

  • Yes, might be much easier if contact some outside company...and send them the data base files....which I suppose they would have to use developer tools to "load" the database. and I would have to explain which tables I wanted etc.  This "easier" way could end up costing me thousands of dollars with some sort of data recovery service.

    I susspect the more recent Progress databases have increased in complexity. Exploring these old database files in a hex seems that really...these version 8 and 9 record blocks just don't seem that complicated.  For the most part..I'm able to take these thousands blocks and separate out the "like" records into groups based on their "table number".  Sometimes however I see records from "obviously" different tables that are being grouped together because there is something "special" about their record header.  I think the crux of my problem comes down to just knowing the organization of these tiny record headers which are probably only 4 to 10(?) bytes in length for each record. I expect there may be block/record address pointers for any records that are divided into "fragments" .  E.G. any record larger than "x" bytes or that no longer fits in their record block will be divided into fragments and the header of each fragment will contain the block id and relative address of the next fragment....or something.  Even if I can't get the "exact" specification for a record header...It'd help if I new what sort of data to expect in a header (like a forward link block pointers or ?).  The Block level headers are well defined in available documentation.   I'm not opposed to sending out the data to a third's the great heaping mounds of cash that must acompany it that gives me pause. Besides...what fun would that be?   :-)

  • how much data are we talking here? Because if it's a "few" GB, I'd be willing to help out with this for a reasonable fee.

  • Thanks, I'll keep that in mind :-)    Actually.... If I could purchase a small license for Progress Version 9,  It'd probably be worth it to me. I think Version 9 comes with the utility to allow it to upgrade version 8 databases. my knoweldge.... the current Version 10 can only "upgrade" version 9 databases.  I'm guessing you have a functional Version 9 system.

    It'd be swell if they had a utility that could ASCI dump tables from a db without the database engine running.

  • I have a pile of versions, although I only have >= 10.2 installed on my current machine.

    A personal db -might- be able to do a D&L, and the last time I got one it was in the $300 range. Failing that, a 4GL client would run around $1K.

    There's also the trial version development suites you can download, but they're for 10.2* systems. They might have the necessary binaries to convert a v8 to v9 db - I don't know for sure though.

  • In a version 11.0 distribution, there are three directories under the bin directory in $DLC - 83dbutils, 91dbutils, 101dbutils.  The 83dbutils includes 83conv78.bat to convert a version 7 DB to version 8.  So, with a current copy of Progress, presumably even the eval, you have the tools needed to migrate all the way from version 7 to the version you have.

    Trying to do this by reading the blocks would be nuts.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice

  • Thanks Thomas for additional input, It really is appreciated.

    I'm looking into if I can get access to these utilities.  This dabase was originally running on an old SCO Unix box...which is "dead / no longer available".  If I need to purchase a current version or use a trial version of Progress, I need to consider where I'm going to get a Server to run it on...and what platform it's going to be...and if it's not the same it going to be able to load these database files.

    I have alot of experience working on internal file structures like various flavors of ISAM (cisam, kisam, tisam) and others.  I have nothing against the Progress database. Seems like a fine database to me. :-)    However; I need to move data formerly hosted on a SCO Unix box with Progress to a new server/database --- and so far, I only have access to the raw .db or .d1 (.d"x") files.  If it's necessary that I perform a tree traversal starting at a root node and recursively moving to intermediate nodes until I arrive at a leaf node which points me to individual records... I can do that.  It won't be a picnic, but I've done this kind of thing before. It's always an "imperfect" science unless you have really good whitepapers that describe the function of status bits etc. Sometimes though....if you just want to do a raw dump of every record in the file... you can just move sequentially through the blocks tossing out everything but record blocks. If I consider a record "fragment" to actually be a valid "record" (has a rowid) in and of itself, I'm 90% of the way there already. Progress uses internal "field compression" that poses some additional challenge (I've seen this kind of thing before too).

    Maybe you do have to be a little nuts to go this route. To me, it's just knowing the "process" behind the magic. If I had 100 numbers to add and my calculator application won't launch,  I can go find, download, install and license another calulator application, or find somebody else with a working calculator application and ask or pay them to do it for me.. That might be "easier" but it doesn't mean I shouldn't know how to do addition. Does it make sense to write my own calculator application?  Well, no, but a typical calculator application does lots of stuff I don't need in order to add 100 numbers.   I don't need subtraction, or multiplication, or division. I don't need a fancy GUI.   I could infact write a Perl script to read a text file of 100 numbers and sum them in a couple minutes - Problem solved.

    I really do appreciate all the suggestions and additional info offered up by this groups members. I understand my questions fall well outside what a typical Progress database user would have any interest in.  Thanks again for the heads up on the conversion programs. I will investigate this avenue as well.

  • If I were paying for this, I sure would be looking for a compatible box to borrow before I authorized spending the kind of time that you are going to have to take.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice

  • Well, working at the problem from the other direction (trying to upgrade the files into Progress 10.2 with supplied tools is proving troublesome (perhaps because we lack the necessary expertise) -- the "paths" stored in the .db for existance don't exist.  The SCO box used different directory structure than a current Windows installation etc.

    We are trying to make adjustments so that the config files and path environment variables support the now non-existant SCO paths.

  • You've tried the repair option?

    You might get more response dealing with some of these move from dead box using utilities questions by posting on the OpenEdge/RDBMS forum.  The DB experts may not see a question "way up here".

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice

  • You can create a new .db file by using the data extents in whatever location you find convenient and creating a new structure definition file (.st) to describe where the extents are. Then you can use a "prostrct builddb foo" command to create a new .db file. Alternately, you can use prostrct repair to change the extent locations in an existing .db file.

  • I continue to be amazed that anyone with the initials VP in their title actually has this much low level root knowledge about the low level underpinnings of their product. That's a rarity these days. Thank you again for your detailed explanation.

  • First of all...I'd like to say that I just installed CentOS on my Macbook Pro, and I will continue to investigate getting Progress Up and Running.  I hope I will find some reasonable SQL administrator type client so I can browse tables etc.   Meanwhile... "Extracting data the Hard way" is actually going fairly well takling into account your tips on some of the underlying architectuure. Several of my "guesses" have turned out correct...and some of the organization I discovered through pure observation has been confirmed with your help. Thank You.

    All of the "smaller"  tables seem to export just fine (tables that have records with less than 16 fields).

    I am still having some difficulty getting some records grouped into the correct table number, The problem centers around the fragmentation and the skip table.

    For the following example record header...(which is indicated to be the first fragment by the record offset directory

    00 01 e9 62 00 d8 e7 00 0c 00 66 00 7c 00 8d 00 9d 00 da 00 ef fa 00 0c

    00 01 e9 62  --> The first four octets contain the rowid of the fragment (I understand row-ids and how to locate them)

    00 d8  --> the Length of this record fragment.

    e7 --> This record has a skip table.

    00 0c --> The length of the skip table

    00 66 00 7c 00 8d 00 9d 00 da 00 ef  --> the skip table (in groups of two octets)

    fa 00  --> This should be the table number right?  

    0c  --> The lengh of the first data field...followed by additional fields of varying sizes and encodings.

    Am I interpretting the table number correctly? Should the next fragment also contain this table number or no?  Guessing NO. When I move to the expected location of the next (and final in this instance) fragment. can additional fragments have a skip table? 

    I appologize for ginving you such a technical workout...again, this truly is appreciated.

    Robert Bednar

  • Have you tried the DB utilities from this CentOS parition (assuming there is a Linux version of the eval)?  You are getting extraordinary support from Gus here ... far beyond what I would expect anyone at PSC to provide for free, especially someone at his level.

    Consulting in Model-Based Development, Transformation, and Object-Oriented Best Practice