| | 1 | |
| | 2 | Elements that contain data of type xs:string may need to be escaped so that the XML parser does not mistake their content for markup. How best to do this depends on the encoding of the string. |
| | 3 | |
| | 4 | == ASCII == |
| | 5 | |
| | 6 | If the string is guaranteed to contain only characters from the ASCII character set, it needs to be escaped only if it contains the characters <, & or >. (A > strictly needs escaping only where it follows ]], but it is probably simplest to escape it everywhere.) One way to escape such a string is to wrap it in a CDATA section, which has the form: |
| | 7 | |
| | 8 | {{{ |
| | 9 | <![CDATA[some_string_goes_here]]> |
| | 10 | }}} |
| | 11 | |
| | 12 | This escapes any ASCII string except one that contains the substring ]]>, which would be read as the closing delimiter. Such strings can be handled with a global search and replace: |
| | 13 | |
| | 14 | {{{ |
| | 15 | s/]]>/]]]]><![CDATA[>/g |
| | 16 | }}} |
| | 17 | |
| | 18 | This splits each ]]> between two adjacent CDATA sections, so the closing delimiter never appears inside the data. A conforming XML parser strips the CDATA delimiters and passes the enclosed text through as ordinary character data, so the transformation does not affect what the parser reads in. |
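
The search and replace above can be sketched in Python as follows. This is a minimal illustration rather than a definitive implementation: the names `to_cdata` and `roundtrip` are hypothetical, and the standard-library `xml.etree.ElementTree` parser is used only to check that the original string comes back unchanged.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA sections, splitting any embedded ']]>'."""
    # Same transformation as the search and replace: close the current
    # section after ']]' and open a new one before '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def roundtrip(text: str) -> str:
    """Parse an element holding the CDATA-wrapped text and return its content."""
    # The element name 'value' is arbitrary and chosen for illustration.
    element = ET.fromstring("<value>" + to_cdata(text) + "</value>")
    return element.text


original = "if (a < b && c[d[0]]> e)"
print(to_cdata(original))
print(roundtrip(original) == original)
```

The parser joins the adjacent CDATA sections back into a single run of character data, so the split is invisible to whatever reads the element.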
| | 19 | |
| | 20 | == Unicode == |
| | 21 | |
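| | 22 | Unfortunately, Unicode is more complicated to handle: the bytes written must match the encoding the document declares, and some code points are not legal in XML 1.0 at all. If we need to deal with this, we might want to consider using an external library. |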
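
As a rough sketch of what delegating to a library could look like, the example below uses Python's standard-library `xml.etree.ElementTree` as a stand-in. Which library we would actually adopt is an open question; the helper name `element_xml` and the element name `title` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def element_xml(tag: str, text: str) -> bytes:
    """Serialize a single element, letting the library escape and encode."""
    element = ET.Element(tag)
    element.text = text  # no manual escaping: the serializer handles it
    return ET.tostring(element, encoding="utf-8")


original = "caf\u00e9 < th\u00e9 & \u65e5\u672c\u8a9e"
data = element_xml("title", original)
print(data)

# Parsing the bytes back should give the original string.
print(ET.fromstring(data).text == original)
```

Here the library takes care of both the markup characters and the byte encoding, which is the part that is awkward to get right by hand.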