In our ERP system we export data to XML files. We do this by BUFFER-COPYing rows from database tables to temp-tables, and then exporting those with WRITE-XML. This results in a file starting with <?xml version="1.0" encoding="UTF-8"?>, so it's an XML 1.0 file.
Some of our customers have data in those tables containing Unicode control characters, for instance U+001A. In our application that's perfectly valid data. However, it is an invalid character in XML 1.0. It is valid in XML 1.1, but only when it's written as a character reference such as &#x1A;.
So we have two issues here:
1. As far as I can tell, WRITE-XML always writes XML 1.0 files. Is it possible to write XML 1.1 files, and if so, how? I can't find anything in the docs. We could just replace the header of the resulting document, but that's a hack.
2. We need to rewrite almost all invalid characters from U+0001 through U+001F (everything except TAB, LF and CR, which are legal in XML 1.0) to the &#x__; form. Is there a smart way to do this? As we are always interested in complete records, BUFFER-COPY is the most efficient way for us to copy them to the temp-table. When we need to sanitize the data, we could do a REPLACE for all these characters on all character fields, but that's a performance issue waiting to happen.
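For completeness, the header-replacement hack mentioned in point 1 could look something like this (the file name is illustrative). Note that bumping the version alone is not enough: XML 1.1 still forbids raw control characters, so they would need escaping as well.

```abl
/* Sketch of the header-replacement hack from point 1: rewrite the
   XML declaration after WRITE-XML has produced the file.
   The file name "export.xml" is illustrative. */
DEFINE VARIABLE lcDoc AS LONGCHAR NO-UNDO.

COPY-LOB FROM FILE "export.xml" TO lcDoc.
lcDoc = REPLACE(lcDoc, '<?xml version="1.0"', '<?xml version="1.1"').
COPY-LOB FROM lcDoc TO FILE "export.xml".
```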
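A rough sketch of the escaping in point 2, with an illustrative function name. One catch: if you put the references into the temp-table fields themselves, WRITE-XML will escape the ampersand again ("&" becomes "&amp;"), so a replacement like this has to run on the serialized document, e.g. after WRITE-XML to a LONGCHAR target. Decimal references like &#26; are equivalent to the hex &#x1A; form, and are only legal in XML 1.1.

```abl
/* Sketch for point 2: escape control characters U+0001..U+001F
   (except TAB, LF and CR, which are legal in XML 1.0) as numeric
   character references. Runs on the serialized document, not on
   field values, because WRITE-XML would re-escape the ampersand. */
FUNCTION escapeControlChars RETURNS LONGCHAR (INPUT pcDoc AS LONGCHAR):
    DEFINE VARIABLE iCode AS INTEGER NO-UNDO.

    DO iCode = 1 TO 31:
        /* 9 = TAB, 10 = LF, 13 = CR: valid, leave alone */
        IF iCode = 9 OR iCode = 10 OR iCode = 13 THEN NEXT.
        pcDoc = REPLACE(pcDoc, CHR(iCode), "&#" + STRING(iCode) + ";").
    END.
    RETURN pcDoc.
END FUNCTION.
```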
Any thoughts? Experiences?
1. No, there isn't. The number of XML 1.1 consumers out there is also very disappointing.
2a. When exporting end-user XML we simply strip out all invalid ASCII values; it is not smart.
2b. For all other uses we try to use WRITE-JSON, which handles all of this a lot better. It does lack a 'write schema' option, but you can combine WRITE-XMLSCHEMA with WRITE-JSON.
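A minimal sketch of the combination in 2b; the temp-table and file names are illustrative:

```abl
/* Sketch for 2b: WRITE-JSON for the data plus WRITE-XMLSCHEMA for
   the schema, since WRITE-JSON has no schema option of its own. */
DEFINE TEMP-TABLE ttOrder NO-UNDO
    FIELD OrderNum AS INTEGER
    FIELD Note     AS CHARACTER.

TEMP-TABLE ttOrder:WRITE-XMLSCHEMA("file", "order.xsd", TRUE).
TEMP-TABLE ttOrder:WRITE-JSON("file", "order.json", TRUE).
```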