XML, whatever its application, is designed to carry data. The XML 1.0 standard defines five predefined entities, which stand in for characters that have special meaning in markup, and requires that XML processors honor them.
The characters these entities represent are used within the markup of an XML document. If they appear unescaped in the data values of the same document, the markup may no longer be well-formed. For this reason, the XML specification defines a way to reference these characters within an XML document so that they are treated as data rather than markup.
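As a rough illustration of the problem, the Python sketch below parses the same data value twice with the standard library: once unescaped and once after escaping. The element name `item` and the sample value are made up for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical data value containing characters that are special in markup.
raw_value = "Fish & Chips <large>"

# Unescaped, the ampersand and angle brackets are read as markup,
# so the parser rejects the document.
try:
    ET.fromstring("<item>" + raw_value + "</item>")
    parsed_unescaped = True
except ET.ParseError:
    parsed_unescaped = False

# escape() replaces &, < and > with &amp;, &lt; and &gt;.
escaped_value = escape(raw_value)
item = ET.fromstring("<item>" + escaped_value + "</item>")

print(parsed_unescaped)  # False
print(escaped_value)     # Fish &amp; Chips &lt;large&gt;
print(item.text)         # Fish & Chips <large>
```

The parser turns the entity references back into the original characters, so the application reading `item.text` sees the data exactly as it was before escaping.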
The special characters can be referenced in XML using one of three formats:
- &name; where name is the character name (if available) such as quot, amp, apos, lt, or gt.
- &#nn; where nn is the decimal character code.
- &#xhh; where hh is the hexadecimal character code.
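The three formats are interchangeable from the parser's point of view. A minimal sketch using Python's standard library, with made-up element names (`forms`, `name`, `dec`, `hex`), shows an ampersand written all three ways:

```python
import xml.etree.ElementTree as ET

# The same character (ampersand, U+0026) written as a named reference,
# a decimal reference and a hexadecimal reference.
doc = "<forms><name>&amp;</name><dec>&#38;</dec><hex>&#x26;</hex></forms>"

root = ET.fromstring(doc)
values = [child.text for child in root]

print(values)  # ['&', '&', '&']
```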
Markup languages such as HTML define additional named character entities for rendering special characters, giving convenient aliases to certain Unicode characters. HTML 4 defines 252 named character entities, each of which can be referenced by name or by its decimal or hexadecimal code.
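To make the distinction concrete, the sketch below uses Python's `html` module to decode the HTML entity `copy` in its three forms, then hands the named form to an XML parser with no DTD. The choice of `copy` and the element name `p` are only for illustration.

```python
import html
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# HTML knows the name "copy" as an alias for U+00A9 (decimal 169).
copy_code_point = name2codepoint["copy"]

# Named, decimal and hexadecimal references all decode to the same character.
decoded = html.unescape("&copy; &#169; &#xA9;")

# A plain XML parser without a DTD only knows the five predefined names,
# so the HTML name is rejected as an undefined entity.
try:
    ET.fromstring("<p>&copy;</p>")
    xml_accepts_copy = True
except ET.ParseError:
    xml_accepts_copy = False

print(copy_code_point)   # 169
print(decoded)           # © © ©
print(xml_accepts_copy)  # False
```

In an XML document that does not declare the HTML entity names, the numeric forms (`&#169;` or `&#xA9;`) are the portable way to write such a character.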
XML 1.0 Predefined Entities
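|Character|Unicode Code Point|XML Character Entity Format|Description|
|"|U+0022|&quot;|double quotation mark|
|&|U+0026|&amp;|ampersand|
|'|U+0027|&apos;|apostrophe|
|<|U+003C|&lt;|less-than sign|
|>|U+003E|&gt;|greater-than sign|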
In the example XML, we represent an ampersand with its character entity. The highlighted line (12) illustrates how this is done.