Convert UTF-8 file to ISO8859-1 - Forum - OpenEdge General - Progress Community

Convert UTF-8 file to ISO8859-1

  • Hello,

    I would like to convert a large UTF-8 encoded file to an ISO8859-1 encoded file. I know I will lose some characters, but I don't care.

    I used INPUT FROM VALUE(filename) CONVERT SOURCE "utf-8" TARGET "iso8859-2", but the program stops reading the file as soon as it finds the first UTF-8 character.

    I would like to use 'CODEPAGE-CONVERT' to convert each line of the file.

    I think I have to set CPINTERNAL, CPSTREAM, etc., but it never works.

    Thanks for your help

    Chris.

  • Version?

    V10: copy-lob from file file1.txt to file file2.txt convert source codepage "utf-8" target codepage "iso8859-1".

    Note: You said iso8859-1, but your code has iso8859-2.

  • Thanks for the answer.

    In fact I have to do this in V9 and V10 under Windows.

    I tried what you suggested, but it doesn't work. The file is exactly the same before and after the COPY-LOB.

    I have a file of about 900 MB. When I open it in UltraEdit (as UTF-8) and save it as ANSI/ASCII, I get a new file of about 450 MB.

    All the 2-byte characters are lost and replaced with '?'. That is what I want.

    Maybe I can do that in Progress, or maybe I have to modify convmap.cp. I don't know.

    Chris.

    Note: it is ISO8859-1

  • On any Linux distribution, the command is:

    iconv --from-code=UTF-8 --to-code=ISO-8859-1 -c -s ./oldfile.p > ./newfile.p

    and, of course, you lose all invalid characters in the output.
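
    Where iconv is not available (the original poster is on Windows), the same idea can be sketched in a short script. What follows is only a minimal sketch in Python, which nobody in this thread used; convert_file and its parameters are hypothetical names chosen for illustration. It reads the file in chunks, so a 900 MB file is never held in memory at once. Note one difference: iconv -c omits characters it cannot convert, while this sketch substitutes '?', which matches the UltraEdit result described earlier.

```python
def convert_file(src_path, dst_path, chunk_chars=1024 * 1024):
    """Copy a UTF-8 text file to ISO-8859-1, writing '?' for anything
    that cannot be represented. Hypothetical helper, not a Progress API."""
    # errors="replace" on input turns malformed UTF-8 bytes into U+FFFD;
    # errors="replace" on output turns every non-Latin-1 character
    # (including U+FFFD) into '?'.
    # newline="" leaves line endings (CRLF or LF) exactly as they are.
    with open(src_path, "r", encoding="utf-8",
              errors="replace", newline="") as src, \
         open(dst_path, "w", encoding="iso-8859-1",
              errors="replace", newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)  # at most chunk_chars characters
            if not chunk:
                break
            dst.write(chunk)


if __name__ == "__main__":
    import sys
    # Usage (assumed): python convert.py oldfile.p newfile.p
    convert_file(sys.argv[1], sys.argv[2])
```

    Characters that exist in ISO-8859-1, such as 'é', are kept as a single byte; only the rest become '?'.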

  • I tried the COPY-LOB example and it is indeed converting nothing at all. Changing the target and source code pages to BERT and ERNIE also does not produce an error. It would seem that the CONVERT is simply, although documented, ignored - which smells like a bug.

  • OpenEdge 10.2A01 Linux:

    It works very well! Just start your Progress session with:
    -cpcase basic
    -cpcoll basic
    -cpstream utf-8
    -cpinternal utf-8

    ==================================================

    /* iconv_UTF-8_ISO8859-1.p */
    DEFINE STREAM lsIN.
    DEFINE STREAM lsOUT.
    DEFINE VARIABLE lc_filename   AS CHARACTER NO-UNDO.
    DEFINE VARIABLE jLigne        AS CHARACTER NO-UNDO.
    lc_filename = "site/index.p".

    /* Read the UTF-8 source file, converting to ISO8859-1 on input. */
    INPUT STREAM lsIN FROM VALUE(lc_filename)
          CONVERT TARGET "iso8859-1" SOURCE "utf-8".
    /* Write the result next to the original, as <filename>.new. */
    OUTPUT STREAM lsOUT TO VALUE(lc_filename + ".new").
    /* Copy line by line; the REPEAT block ends at end of file. */
    REPEAT:
      IMPORT STREAM lsIN UNFORMATTED jLigne.
      PUT STREAM lsOUT UNFORMATTED jLigne SKIP.
    END.
    INPUT  STREAM lsIN  CLOSE.
    OUTPUT STREAM lsOUT CLOSE.

    ==================================================

    * Invalid characters will be replaced with '?'.