Special Characters in XML

XML symbols

XML is designed to carry structured data for a wide variety of purposes. The XML 1.0 standard defines five predefined entities, often called special characters, and requires every XML processor to honor them.

The characters behind the predefined entities are the ones XML uses for its own markup. If they appear untranslated in data values within the same document, a parser may read them as markup and the document will no longer be well-formed. For this reason, the XML specification defines a way to reference these characters within an XML document so that they represent data rather than markup.

The special characters can be referenced in XML using one of three formats:

  • &name; where name is the character name (if available) such as quot, amp, apos, lt, or gt.
  • &#nn; where nn is the decimal character code.
  • &#xhh; where hh is the hexadecimal character code.
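
The three formats above can be checked with a short script. This is a minimal sketch using Python's standard xml.etree.ElementTree parser; the element names (forms, named, decimal, hex) are made up for illustration. It writes the same ampersand three ways and reads the parsed values back.

```python
import xml.etree.ElementTree as ET

# The same character (ampersand, U+0026) written by name, decimal code, and hex code.
# The element names here are hypothetical and exist only for this illustration.
doc = "<forms><named>&amp;</named><decimal>&#38;</decimal><hex>&#x26;</hex></forms>"

root = ET.fromstring(doc)

# After parsing, every form yields the same single character.
values = [child.text for child in root]
print(values)
```

All three elements should come back holding the same one-character string, which shows that the choice of format is a matter of readability rather than meaning.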

Markup languages such as HTML define additional named character entities so that special characters can be rendered; these act as aliases for certain Unicode characters. HTML 4 defines 252 named character entities, each of which can be referenced by name or by its decimal or hexadecimal code.
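
One practical consequence is that HTML entity names are not automatically available in plain XML. The sketch below assumes Python's standard html.entities table (which covers the HTML 4 names) and the standard ElementTree parser. It looks up the code point for nbsp, shows that the named form is rejected when no DTD declares it, and shows that the numeric form is accepted.

```python
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# Look up the code point behind an HTML 4 entity name.
nbsp_code = name2codepoint["nbsp"]

# A plain XML parser only knows the five predefined names,
# so an undeclared HTML name is reported as a parse error.
try:
    ET.fromstring("<t>&nbsp;</t>")
    named_ok = True
except ET.ParseError:
    named_ok = False

# The decimal reference for the same character parses in any XML document.
numeric_text = ET.fromstring(f"<t>&#{nbsp_code};</t>").text
```

When data moves from HTML into XML, converting named entities to numeric references in this way is one option; declaring the names in a DTD is another.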

XML 1.0 Predefined Entities

Character   Unicode Code Point   XML Character Entity Format   Description
"           U+0022 (34)          &quot;                        double quotation mark
&           U+0026 (38)          &amp;                         ampersand
'           U+0027 (39)          &apos;                        apostrophe
<           U+003C (60)          &lt;                          less than
>           U+003E (62)          &gt;                          greater than
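
In practice the translation is usually left to a library rather than done by hand. The following is a minimal sketch using escape and unescape from Python's standard xml.sax.saxutils module. By default these handle only &, < and >, so the two quote characters are passed in an extra mapping. The sample string is invented for illustration.

```python
from xml.sax.saxutils import escape, unescape

# A made-up value containing all five special characters.
raw = 'Tom & "Jerry" <aren\'t> friends'

# escape() covers &, < and > by default; quotes need an explicit mapping.
quote_entities = {'"': "&quot;", "'": "&apos;"}
escaped = escape(raw, quote_entities)

# unescape() needs the reverse mapping to restore the quotes.
restored = unescape(escaped, {v: k for k, v in quote_entities.items()})

print(escaped)
print(restored == raw)
```

The ampersand is replaced first when escaping and last when unescaping, which avoids double-translating the ampersands that the other entities introduce.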

Example

In the example XML, we represent an ampersand in a book title with its character entity, &amp;. Line 12 illustrates how this is done.

   1: <?xml version="1.0" encoding="UTF-8"?>
   2: <library>
   3:     <category name="dogs">
   4:         <book>
   5:             <title lang="english">All about dogs</title>
   6:             <author>Someone</author>
   7:             <isin>true</isin>
   8:             <daysuntilreturn>0</daysuntilreturn>
   9:             <price>15.10</price>
  10:         </book>
  11:         <book>
  12:             <title lang="english">Dogs &amp; Their Habits</title>
  13:             <author>Someone</author>
  14:             <isin>true</isin>
  15:             <daysuntilreturn>0</daysuntilreturn>
  16:             <price>15.15</price>
  17:         </book>
  18:     </category>
  19:     <category name="cats">
  20:         <book>
  21:             <title lang="english">All about cats</title>
  22:             <author>Someone</author>
  23:             <isin>false</isin>
  24:             <daysuntilreturn>3</daysuntilreturn>
  25:             <price>12.00</price>
  26:         </book>
  27:         <book>
  28:             <title lang="zulu">Konke mayelana ikati</title>
  29:             <author>Else</author>
  30:             <isin>true</isin>
  31:             <daysuntilreturn>0</daysuntilreturn>
  32:             <price>25.99</price>
  33:         </book>
  34:     </category>
  35: </library>
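
To see what a consumer of this document actually receives, the sketch below parses a trimmed-down copy of the library example with Python's standard ElementTree module. The trimming is an assumption made for brevity; only the book from line 12 is kept. The parser hands back the title with a literal ampersand, and the serializer puts the entity back when the element is written out.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the example above, keeping only the book with the ampersand.
snippet = """<library>
    <category name="dogs">
        <book>
            <title lang="english">Dogs &amp; Their Habits</title>
        </book>
    </category>
</library>"""

root = ET.fromstring(snippet)
title_element = root.find("./category/book/title")

# The parsed text contains the real character, not the entity.
title = title_element.text

# Serializing the element escapes the ampersand again.
serialized = ET.tostring(title_element, encoding="unicode")

print(title)
print(serialized.strip())
```

The entity exists only in the serialized form of the document; application code works with the plain character on both sides.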
