Information

Title	How to convert an existing database to Unicode UTF-8?

URL Name	19912

Article Number	000173318

Environment	Product: Progress Version: 9.x Product: OpenEdge Version: 10.x, 11.x, 12.x OS: All Supported Operating Systems

Question/Problem Description

How to convert an existing database to Unicode UTF-8?
How to convert an existing database to Unicode?
How to convert an existing database to UTF-8 with dump and load?
Can an existing database be converted to UTF-8 without dump and load?
Can binary dump and binary load be used for different codepages?

Steps to Reproduce

Clarifying Information

Error Message

Defect Number

Enhancement Number

Cause

Resolution

To convert an existing database to UTF-8, two approaches are possible: with or without a dump and load.

Before starting either process:

Ensure that a valid database backup exists.
If Auditing is enabled, disable it. Refer to Article How to disable auditing?

PROUTIL is not aware of the database code page (-cpinternal). Specify -cpinternal on every PROUTIL command line that modifies data, so that the command executes correctly and does not introduce data corruption resulting from code page differences. This behaviour is described further in Article Inconsistent behaviour of PROUTIL when -cpinternal is not specified
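One way to avoid forgetting the parameter is a small wrapper that always appends it. The following is a minimal sketch, not part of the product: proutil_cp, DB_CODEPAGE and DRY_RUN are hypothetical names, and with DRY_RUN=1 (the default here) the command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper: prints or runs a PROUTIL command with -cpinternal appended.
# Assumption: DB_CODEPAGE holds the code page of the database being modified.
DB_CODEPAGE="${DB_CODEPAGE:-UTF-8}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the command, anything else = run it

proutil_cp() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "proutil $* -cpinternal $DB_CODEPAGE"
    else
        proutil "$@" -cpinternal "$DB_CODEPAGE"
    fi
}

# Example with a hypothetical database name:
proutil_cp mydb -C idxbuild ALL
```

Set DRY_RUN=0 only after the printed command line has been checked against the intended database and code page.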

OPTION 1: To convert an existing database to UTF-8 without dump and load:

1. Choose the UTF-8 word-break rule file to use. Since OpenEdge 10.1A, a default UTF-8 word rule file named proword.254 is provided in the installation directory and can be used as is.

It can be found under <DLC> or <DLC>/prolang/utf.

Prior to 10.1A, or in order to customize word rules, compile a new version of the word-break table for UTF-8 to a rule number <N>:

$ proutil <dbname> -C wbreak-compiler <DLC>\prolang\convmap\utf8-bas.wbt <N>

Where:

<DLC>: Use an absolute path to the DLC directory instead of the environment variable in the command line.
utf8-bas.wbt: The word-break table to use depends on the language (code page). The tables are found in the <DLC>\prolang\convmap directory.
<N>: The rule number, a number between 1 and 255 of your choosing, that identifies the compiled word-break table.

2. Either place the new file proword.<N> in the install directory (DLC) or define the word-break Environment Variable (available since Progress Version 9.0A):

PROWD<N>=<file-directory>\proword.<N>

3. Convert the database to UTF-8:

$ proutil <dbname> -C convchar convert UTF-8

4. Apply the new word-rules to the database:

$ proutil <dbname> -C word-rules <N> -cpinternal UTF-8

Example:

$ proutil <dbname> -C word-rules 254 -cpinternal UTF-8

5. Use the Data Administration tool to load the file: <DLC>\prolang\utf\_tran.df in order to change the database collation.

6. Rebuild all indexes:

$ proutil <database> -C idxbuild ALL -cpinternal UTF-8
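The Option 1 sequence can be sketched as a dry-run script that sets the word-break variable and prints the PROUTIL commands in order. This is an illustration under assumptions, not a supported tool: DB, RULE and WORD_DIR are hypothetical values, paths use forward slashes, and -cpinternal UTF-8 is added to word-rules in line with the -cpinternal advice given earlier in this article.

```shell
#!/bin/sh
# Dry-run sketch of Option 1: prints the commands instead of running them.
DB="mydb"              # hypothetical database name
RULE=254               # word-break rule number (1-255)
WORD_DIR="/usr/dlc"    # assumption: directory that holds proword.<N>

# Step 2: point PROWD<N> at the word-break rule file.
export "PROWD${RULE}=${WORD_DIR}/proword.${RULE}"

option1_commands() {
    # $1 = database name, $2 = word-break rule number
    echo "proutil $1 -C convchar convert UTF-8"
    echo "proutil $1 -C word-rules $2 -cpinternal UTF-8"
    echo "# manual: load prolang/utf/_tran.df with the Data Administration tool"
    echo "proutil $1 -C idxbuild ALL -cpinternal UTF-8"
}

option1_commands "$DB" "$RULE"
```

The collation load stays a manual step in this sketch because the article performs it through the Data Administration tool.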

OPTION 2: Using Dump and Load to convert the Database Collation:

A. To convert an existing database to UTF-8 using Data Administration dump and load (ASCII):

Once Steps 3 and 4 below have been completed, they do not need to be repeated for subsequent UTF-8 databases on the same system.

1. ASCII Dump the existing database using the Data Administration tool.

2. Create a new empty UTF-8 database.

The utf\empty database has its collation defined as basic by default.

$ prodb <new_database> <DLC>\prolang\utf\empty.db

3. Compile a new version of word break table for UTF-8 to a rule number <N>.

$ proutil <dbname> -C wbreak-compiler <DLC>\prolang\convmap\utf8-bas.wbt <N>

Where:
<N> - is a number between 1 and 255.
<DLC> - Use an absolute path to DLC instead of the environment variable in the command line.

In OpenEdge 10.1A and later, this step is not strictly necessary if word rules have not been customized: a default UTF-8 word rule file named proword.254 is provided in the installation directory and can be used instead.

4. Either place the newly created file proword.<N> in the install directory (DLC) or define the environment variable (available since Progress Version 9.0A):

PROWD<N>=<file-directory>\proword.<N>

5. Apply the new word-rules to the database:

$ proutil <dbname> -C word-rules <N> -cpinternal UTF-8

Example:

$ proutil <dbname> -C word-rules 254 -cpinternal UTF-8

6. Load the database using the Data Administration tool.

A single-byte client can be used for this unless the .df or _user.d files contain characters outside the single-byte code page.
Alternatively, use a (remote) Windows GUI client. This client can also be used to load ASCII-dumped .d files.

$ prowin dbname [ -H host -S port ] -cpinternal UTF-8 -p _admin.p

7. If the indexes were loaded as INACTIVE in the definition file, they need to be built once the database has been converted and the data loads have completed. Otherwise they are built as part of the Data Administration ASCII load.

$ proutil <database> -C idxbuild ALL -cpinternal UTF-8
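Whether the final idxbuild in Step 7 is needed depends on whether the definition file loaded the indexes as INACTIVE. A quick check can be sketched as follows, assuming that an inactive index is marked in the .df file by a line holding only the keyword INACTIVE. has_inactive_indexes is a hypothetical helper and the file name in the usage comment is illustrative.

```shell
#!/bin/sh
# Reports "yes" when a .df file contains at least one index marked INACTIVE.
# Assumption: the marker is a line that holds only the keyword INACTIVE,
# optionally surrounded by white space (this also tolerates CRLF line ends).
has_inactive_indexes() {
    if grep -q '^[[:space:]]*INACTIVE[[:space:]]*$' "$1"; then
        echo "yes"
    else
        echo "no"
    fi
}

# Usage with a hypothetical definition file:
#   has_inactive_indexes mydb.df
```

A "yes" result suggests running the idxbuild command above after the data load. A "no" result suggests the indexes were built during the load.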

B. To convert an existing database to UTF-8 using binary dump and load:

When a binary dump and load strategy is used, the behaviour depends on the version:

Prior to OpenEdge 10.0A, binary dump does not record the code page of the text being written to the dump file (.bd). The text in the .bd file stays in the code page of the database it was dumped from, and binary load does not convert it to any other code page. Loading such a .bd file into a UTF-8 database when it was dumped from a database with a different code page will therefore not work correctly. Use the (ASCII) Data Dictionary Dump Table Contents plus either the Data Dictionary Load Table Contents or the bulkload utility instead.
Starting with OpenEdge 10, binary dump does record the code page of the text being written to the dump file (.bd). The file can only be loaded into a database that uses the same code page, which ensures that there is no possibility of data corruption resulting from code page differences. An attempt to load it into a database with a different code page results in error 10855:

Code page of .bd file (<name>) does not match code page of database(<name>). (10855)
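When this error appears in a log, the two code pages can be pulled out of the message text to confirm which side is mismatched. The following is a minimal sketch, assuming the message has exactly the layout shown above. parse_10855 is a hypothetical helper, not a Progress utility.

```shell
#!/bin/sh
# Extracts "<bd-file code page> <database code page>" from a 10855 message.
# Prints nothing when the line is not a 10855 message in the expected layout.
parse_10855() {
    printf '%s\n' "$1" | sed -n \
        's/^.*Code page of \.bd file (\([^)]*\)) does not match code page of database *(\([^)]*\))\. (10855).*$/\1 \2/p'
}

# Usage with an illustrative message:
#   parse_10855 "Code page of .bd file (ISO8859-1) does not match code page of database(UTF-8). (10855)"
```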

1. Create an empty database with the same code page as the database the data was binary dumped from (e.g. ISO8859-1):

$ prodb <new_database> <DLC>\prolang\ame\empty.db

2. Load the application schema (.df) files into the empty database through the Data Dictionary.

3. Load the binary dump files into the database copy: <new_database>.
This step includes all steps associated with a binary load, including rebuilding indexes.

$ proutil <dbname> -C load <dumped-file>.bd -i
$ proutil <dbname> -C idxbuild all -TM 31 -TB 32 -SG 65 -T /tmp

4. Convert the database copy to UTF-8.

$ proutil <dbname> -C convchar convert UTF-8

5. Assign word rules to the database copy.

See Option 1, Step 1 above.

Example: A default UTF-8 word rule file named proword.254 is provided in the installation directory:

$ proutil <dbname> -C word-rules 254 -cpinternal UTF-8

6. Use the Data Administration tool to load the file <DLC>\prolang\utf\_tran.df in order to change the database collation.
Otherwise the collation is not changed: it remains the originating database's collation (e.g. ISO basic) rather than the UTF-8 basic collation.

7. Rebuild all database indexes:

$ proutil <database> -C idxbuild ALL -cpinternal UTF-8
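Step 3 usually involves one .bd file per table, so the load commands are often generated in a loop. The sketch below only prints one load command per .bd file found in a dump directory and executes nothing. print_load_commands is a hypothetical helper, and the database and directory names in the usage comment are illustrative.

```shell
#!/bin/sh
# Prints one binary-load command per .bd file in a dump directory (dry run).
print_load_commands() {
    db="$1"          # database to load into (same code page as the dump)
    dump_dir="$2"    # directory that holds the .bd files
    for bd in "$dump_dir"/*.bd; do
        [ -e "$bd" ] || continue    # skip the literal pattern when no .bd files exist
        echo "proutil $db -C load $bd -i"
    done
}

# Usage:
#   print_load_commands newdb /path/to/dumps
```

The printed lines can be reviewed and then run before the idxbuild, convchar, word-rules and collation steps that follow in this option.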

Workaround

Notes

Progress Articles:

How to see what word-break table is defined for a database?
Can two IDXBUILDS be avoided when doing a dump load conversion to utf-8
How to convert UTF-8 database to ISO8859-1?
Codepage change to UTF8 German Characters are not sorted

Keyword Phrase

Last Modified Date	1/21/2026 5:58 PM