The World Wide Web Consortium (W3C) defines XML as follows: The Extensible Markup Language (XML) is a simple text-based format for representing structured information.
Wikipedia defines XML as follows: Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format which is both human-readable and machine-readable.
Both definitions are correct, but neither explains much on its own. To understand what XML is, consider each word of the name in turn: Extensible Markup Language. The key word here is "Extensible".
|Extensible||adjective: able to be extended; extendable. "an extensible architecture designed to accommodate changes"|
|Extensible Markup||XML on its own defines the use of tags (elements and attributes); however, it does not define which tags to use. That is the purview of the extended XML standard / format.|
For example: XHTML is an extended XML language standard and it has a number of defined tags (html, head, meta, body, …). HTML itself derives from SGML rather than XML; XHTML is its XML-based reformulation.
XML is a meta-language: it defines only the syntax rules of the language, for example that every opening tag must have a matching closing tag (or be written as a self-closing empty element).
The extended XML standard/format (such as XHTML) defines the semantics of the language; for example, the html tag must have exactly one head and one body tag as children.
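The split between syntax and semantics can be sketched in a few lines of Python using the standard library's xml.etree.ElementTree. This is a minimal illustration, not a real validator: the function names and the sample documents are assumptions made for the example, and the semantic check covers only the single "one head, one body" rule mentioned above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Syntax check: does the text obey the generic XML rules?"""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def has_one_head_and_body(text: str) -> bool:
    """Semantic check (hypothetical rule set): the root must be <html>
    with exactly one <head> and one <body> as direct children."""
    root = ET.fromstring(text)
    return (
        root.tag == "html"
        and len(root.findall("head")) == 1
        and len(root.findall("body")) == 1
    )


good = "<html><head/><body/></html>"
two_bodies = "<html><head/><body/><body/></html>"  # valid syntax, breaks the rule
unclosed = "<html><head></html>"                   # <head> is never closed

print(is_well_formed(good), has_one_head_and_body(good))
print(is_well_formed(two_bodies), has_one_head_and_body(two_bodies))
print(is_well_formed(unclosed))
```

The second document shows the point: an XML parser accepts it without complaint, and only the rules of the extended format reject it.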
This means that XML on its own does not do anything: it only provides a method to describe data in a structured manner using markup (syntax). The tags and the rules that govern them (semantics) are defined by the extended XML standard/format. That format is in turn defined by its users, based on their requirements.
XML is designed to carry data; how that data is used (processing, display, etc.) depends on the implementation. The benefits are immediately apparent:
- Simplicity: XML carries data; it does not do anything.
- Flexibility: XML is an extensible language. Users can define tags that are fit for purpose.
- Human & Machine Readable: XML is a plain text data format. It can be self-descriptive, but only if the extended XML format/standard is designed that way.
- Defined Grammar: Documents can be validated against the rules of the format (for example a DTD or an XML Schema), which enforces structure.
Any data-interchange format has its pros and cons. The same holds true for XML:
- Processing large files can consume significant memory and processing time, particularly when a parser loads the whole document at once. Implementations need to take this into account.
- Some XML formats/standards become complex enough that their documents are, in practice, readable only by machines.
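One common way to limit the cost of large files is to stream the document instead of building the whole tree first. The sketch below uses ElementTree's iterparse for this; the function name, the tag names and the sample data are assumptions made for illustration, and a small in-memory buffer stands in for a large file.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, tag: str) -> int:
    """Count elements named `tag` while reading the document incrementally."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the element's children and text once it has been handled.
            # The emptied element itself stays attached to the root.
            elem.clear()
    return count


# Hypothetical sample data: 1000 small <order> records in one document.
sample = io.BytesIO(b"<orders>" + b"<order><id>1</id></order>" * 1000 + b"</orders>")
print(count_elements(sample, "order"))  # prints 1000
```

Whether streaming is worth the extra code depends on the file sizes involved; for small documents, parsing the whole tree is usually simpler.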
XML was derived from SGML (ISO 8879, published in 1986) and was designed to meet the challenges of large-scale electronic publishing. The design work took place in the latter half of 1996, when the W3C published the first working draft of the XML specification, and XML became a W3C Recommendation in 1998. Since then the original specification has gone through a number of revisions, both to adapt it to new requirements and to address problems identified in the specification.
The understanding of what XML is (or rather what it is used for) has changed in the past two decades, especially as the technology standards using XML developed, matured and shifted focus. Currently there are hundreds of extended XML standards/formats that use the XML specification as their basis. A review of these standards/formats shows that the initial ones focused on publication and rendering of information (e.g. XHTML), and that the focus moved towards information interchange as integration standards (e.g. SOAP) and data storage standards became more prevalent and widely used.
Real-world use of XML will boil down to one of two scenarios:
- Use a pre-existing XML standard/format.
- Use a custom XML standard/format by defining your own language syntax and semantic rules (e.g. tag names, rules).
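The second scenario can be sketched as follows. The vocabulary here (library, book, title and the id attribute) is entirely hypothetical and exists only for this example; XML itself imposes none of these names. The producer writes a document in the custom format, and a consumer that knows the same vocabulary reads the data back.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom vocabulary: a <library> contains <book> elements,
# each with an "id" attribute and a <title> child.
library = ET.Element("library")
book = ET.SubElement(library, "book", id="b1")
ET.SubElement(book, "title").text = "An Example Title"

# Serialize to text: this is the document that would be stored or exchanged.
text = ET.tostring(library, encoding="unicode")
print(text)

# A consumer that knows the vocabulary extracts the data again.
records = [(b.get("id"), b.findtext("title")) for b in ET.fromstring(text).iter("book")]
print(records)
```

The tag names carry no meaning to the XML parser. They only become useful because producer and consumer agree on what they mean, which is exactly what a custom standard/format has to document.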
|SGML||SGML (Standard Generalized Markup Language) is a standard for how to specify a document markup language or tag set. Such a specification is itself a document type definition (DTD). SGML is not in itself a document language, but a description of how to specify one.|
|HTML||HTML or HyperText Markup Language is the standard markup language used to create Web pages. HTML is written in the form of HTML elements consisting of tags enclosed in angle brackets (like <html> ).|
|SOAP||SOAP (Simple Object Access Protocol) is a messaging protocol that allows programs running on disparate systems to communicate by exchanging Extensible Markup Language (XML) messages, typically over Hypertext Transfer Protocol (HTTP).|