CDATA Syntax in XML

The content of elements of type xs:string may need to be escaped to prevent the XML parser from misreading it as markup. How best to do this depends on the encoding of the string.

ASCII

If the string is guaranteed to contain only characters from the ASCII character set, then it needs to be escaped only if it contains the characters <, & or >. (A bare > is normally accepted; the exception is when it appears as part of the sequence ]]> in element content, so it's probably best to escape it anyway.) One way to escape such a string is to wrap it in a CDATA section, which has the form:

<![CDATA[some_string_goes_here]]>

This will escape any printable ASCII string except for ones that contain the substring ]]>, since this would otherwise be read as the close delimiter. To escape such strings one can perform a global search and replace:

s/]]>/]]]]><![CDATA[>/g

This splits the substring between two adjacent CDATA sections, so that neither section contains the close delimiter. A conforming XML parser discards the CDATA delimiters themselves and passes the enclosed text through as ordinary character data, so this transformation won't affect what gets read in by the parser.
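
The search and replace above can be sketched in Python roughly as follows. This is only an illustration, not code from this project: the names to_cdata and read_back are hypothetical, and the round-trip check assumes Python's standard xml.etree.ElementTree parser is acceptable as a stand-in for a conforming XML parser.

```python
import xml.etree.ElementTree as ET

CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any embedded ']]>' across two sections."""
    # Equivalent to s/]]>/]]]]><![CDATA[>/g: the first section ends after "]]",
    # and a new section starts just before ">".
    safe = text.replace(CDATA_CLOSE, "]]" + CDATA_CLOSE + CDATA_OPEN + ">")
    return CDATA_OPEN + safe + CDATA_CLOSE


def read_back(cdata: str) -> str:
    """Parse the CDATA inside a throwaway <e> element and return its text."""
    return ET.fromstring("<e>" + cdata + "</e>").text or ""


print(to_cdata("a]]>b"))             # <![CDATA[a]]]]><![CDATA[>b]]>
print(read_back(to_cdata("a]]>b")))  # a]]>b
```

The parser joins the text of the two adjacent sections, so the caller gets the original string back unchanged.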

Unicode

Unfortunately, Unicode is more complicated to handle. If we need to deal with it, we might want to consider using an external library.
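
As one illustration of handing the escaping over to a library, here is a sketch using Python's standard xml.etree.ElementTree. The choice of library, the helper name serialize_text and the element name "name" are all assumptions made for the example; this page does not prescribe any of them. Note that this library escapes markup characters with entity references rather than CDATA sections.

```python
import xml.etree.ElementTree as ET


def serialize_text(tag: str, text: str) -> bytes:
    """Build <tag>text</tag> as UTF-8 bytes, letting the library do the escaping."""
    elem = ET.Element(tag)
    elem.text = text
    return ET.tostring(elem, encoding="utf-8")


# Non-ASCII characters mixed with characters that would need escaping.
sample = "caf\u00e9 < & ]]> \u2603"
xml_bytes = serialize_text("name", sample)

# Parsing the serialized bytes should give back the original string.
assert ET.fromstring(xml_bytes).text == sample
```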

Last modified on 06/30/11 21:26:29