Hello everyone,
I'm having a small issue while parsing an XML that's using a UTF-8 codepage.
It contains some special characters like ‘ (U+0091), ’ (U+0092), “ (U+0093), ” (U+0094), œ (U+009C) and so on.
It may not be obvious, but although these characters look like a plain single quotation mark ' and a plain double quotation mark ", they are different characters.
I first read the XML into a LONGCHAR and fix its codepage to UTF-8 (the result is the same with or without FIX-CODEPAGE).
No conversion is needed because the XML file was already created in UTF-8, hence the NO-CONVERT.
I then use the longchar as an input source for the SAX-READER.
Example of my code:
/* Set a fixed codepage (UTF-8) for the longchar */
FIX-CODEPAGE(wclong) = "UTF-8".

/* Copy the xml to a longchar */
COPY-LOB FILE wcxml TO wclong NO-CONVERT.

/* OUTPUT TO "d:\users\geegun\webservice\bal\esbpws\longcontent.txt". */
/* EXPORT wclong. */
/* OUTPUT CLOSE. */

CREATE SAX-READER whParser.
RUN saxparserprocedure.p PERSISTENT SET whHandler.
whParser:HANDLER = whHandler.
whParser:SET-INPUT-SOURCE("LONGCHAR", wclong).
whParser:SAX-PARSE-FIRST() NO-ERROR.

ParseLoop:
REPEAT WHILE whParser:PARSE-STATUS = SAX-RUNNING:
    whParser:SAX-PARSE-NEXT() NO-ERROR.
    IF whParser:PRIVATE-DATA = "FatalErrorInvokedByUser"
    THEN DO:
        ASSIGN ERROR-STATUS:ERROR = TRUE.
        LEAVE ParseLoop.
    END.
END.

IF ERROR-STATUS:ERROR
THEN DO:
    /* ... some error handling here ... */
END.
ELSE DO:
    /* Get the dataset from the saxparserprocedure */
    RUN getdata IN whHandler (OUTPUT DATASET-HANDLE whdataset BIND, OUTPUT iplfuncerror, OUTPUT ipcErrorMsg).
END.
When I uncomment the OUTPUT TO statements in the code above, the exported file still contains all the characters.
But when I look at the attribute's value (using GET-VALUE-BY-INDEX(indexPosition)) during parsing, the value has already changed.
Attached to this post you can find an excerpt of the XML file. The text 'Vidange d’huile' contains one of the special characters: its apostrophe is not a normal apostrophe.
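To see exactly which bytes the file holds for such a character, independent of any codepage handling, the raw bytes can be dumped. The following is only a minimal sketch: the file names and the variables wcdump, wmraw and wipos are assumptions made for illustration and are not part of the program above.

```abl
/* Hypothetical diagnostic, not part of the original program. */
DEFINE VARIABLE wcxml  AS CHARACTER NO-UNDO INITIAL "esbpws2.xml".  /* assumed input path */
DEFINE VARIABLE wcdump AS CHARACTER NO-UNDO INITIAL "bytedump.txt". /* assumed output path */
DEFINE VARIABLE wmraw  AS MEMPTR    NO-UNDO.
DEFINE VARIABLE wipos  AS INTEGER   NO-UNDO.

/* Load the file as raw bytes, so no codepage conversion is involved. */
COPY-LOB FROM FILE wcxml TO wmraw.

OUTPUT TO VALUE(wcdump).
DO wipos = 1 TO GET-SIZE(wmraw):
    /* Log the position and decimal value of every non-ASCII byte. */
    IF GET-BYTE(wmraw, wipos) >= 128 THEN
        PUT UNFORMATTED wipos " " GET-BYTE(wmraw, wipos) SKIP.
END.
OUTPUT CLOSE.

/* Release the memory held by the MEMPTR. */
SET-SIZE(wmraw) = 0.
```

In the dump, the consecutive bytes 194 146 (hex C2 92) are the UTF-8 encoding of U+0092, while 226 128 153 (hex E2 80 99) would be the typographic apostrophe U+2019. Knowing which of the two is in the file should help narrow down where the value changes.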
I've been searching for a solution for a while and found the following KB article from 2014. It describes my problem, but unfortunately it doesn't seem to offer a solution.
http://knowledgebase.progress.com/articles/Article/000054284
Does anyone have an idea on how to solve this?
Or has anyone had the same problem before?
Thanks in advance,
Geert
You don't mention the OE version. Defect PSC00315657 referenced in the kbase was addressed in 11.6.0 and 11.5.1.
Hello Garry,
Unfortunately we are using OE 11.5.0 at the moment. Our management is still deciding whether to update to 11.5.1, 11.6.0 or even 11.7.0.
Thanks for the info.