CDATA

The term CDATA, meaning character data, is used for distinct, but related, purposes in the markup languages SGML and XML. The term indicates that a certain portion of the document is general character data, rather than non-character data or character data with a more specific, limited structure.

CDATA sections in XML

In an XML document or external entity, a CDATA section is a piece of element content that is marked up to be interpreted literally, as textual data, not as markup. A CDATA section is merely an alternative syntax for expressing character data; there is no semantic difference between character data in a CDATA section and character data in standard syntax where, for example, "<" and "&" are represented by "&lt;" and "&amp;", respectively.

Syntax and interpretation

A CDATA section starts with the following sequence:
<![CDATA[
and ends with the next occurrence of the sequence:
]]>
All characters enclosed between these two sequences are interpreted as characters, not markup or entity references. Every character is taken literally, the only exception being the ]]> sequence of characters. In:
<sender>John Smith</sender>
the start and end "sender" tags are interpreted as markup. However, the code:
<![CDATA[<sender>John Smith</sender>]]>
is equivalent to:
&lt;sender&gt;John Smith&lt;/sender&gt;
Thus, the "tags" have exactly the same status as "John Smith"; they are treated as text.
Similarly, if the numeric character reference &#240; appears in element content, it is interpreted as the single Unicode character U+00F0 (ð). But if the same reference appears in a CDATA section, it is parsed as six characters: ampersand, hash mark, digit 2, digit 4, digit 0, semicolon.
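To see that a CDATA section is only an alternative spelling of character data, here is a minimal sketch that parses both forms with Python's standard `xml.etree.ElementTree` module. The helper `element_text` and the wrapper element `<a>` are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

def element_text(xml_source: str) -> str:
    """Parse a one-element document and return its character data."""
    return ET.fromstring(xml_source).text

# Markup inside a CDATA section is delivered as plain text...
in_cdata = element_text("<a><![CDATA[<sender>John Smith</sender>]]></a>")
# ...exactly as if it had been escaped with &lt; and &gt;.
escaped = element_text("<a>&lt;sender&gt;John Smith&lt;/sender&gt;</a>")
assert in_cdata == escaped == "<sender>John Smith</sender>"

# A character reference is expanded in ordinary content...
print(repr(element_text("<a>&#240;</a>")))              # 'ð'
# ...but stays as six literal characters inside a CDATA section.
print(repr(element_text("<a><![CDATA[&#240;]]></a>")))  # '&#240;'
```

The parser hands the application the same string in both of the first two cases; nothing records which syntax the author used.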

Uses of CDATA sections

New authors of XML documents often misunderstand the purpose of a CDATA section, mistakenly believing that its purpose is to "protect" data from being treated as ordinary character data during processing. Some APIs for working with XML documents do offer options for independent access to CDATA sections, but such options exist above and beyond the normal requirements of XML processing systems, and still do not change the implicit meaning of the data. Character data is character data, regardless of whether it is expressed via a CDATA section or ordinary markup. CDATA sections are useful for writing XML code as text data within an XML document. For example, if one wishes to typeset a book with XSL explaining the use of an XML application, the XML markup to appear in the book itself will be written in the source file in a CDATA section.
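Python's `xml.dom.minidom` is one API that keeps CDATA sections as separate `CDATASection` nodes. The sketch below (the helpers `child_nodes` and `character_data` are hypothetical) shows that although the node types differ, the character data they carry is identical to the escaped form.

```python
from xml.dom import minidom

def child_nodes(xml_source: str):
    """List (node class name, data) for each child of the root element."""
    root = minidom.parseString(xml_source).documentElement
    return [(type(node).__name__, node.data) for node in root.childNodes]

def character_data(xml_source: str) -> str:
    """Concatenate the data of all text-like children, ignoring node types."""
    root = minidom.parseString(xml_source).documentElement
    return "".join(node.data for node in root.childNodes)

mixed = "<p>x &lt; y<![CDATA[ and a & b]]></p>"
plain = "<p>x &lt; y and a &amp; b</p>"

print(child_nodes(mixed))  # [('Text', 'x < y'), ('CDATASection', ' and a & b')]
print(child_nodes(plain))  # [('Text', 'x < y and a & b')]

# The node structure differs, but the character data does not.
assert character_data(mixed) == character_data(plain) == "x < y and a & b"
```

Code that depends on seeing a `CDATASection` node relies on an API convenience, not on anything the XML data itself means.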

Nesting

A CDATA section cannot contain the string "]]>", and therefore it is not possible for a CDATA section to contain nested CDATA sections. The preferred approach to encoding text that contains the triad "]]>" with CDATA sections is to use multiple CDATA sections, splitting each occurrence of the triad just before the ">". For example, to encode "]]>" one would write:
<![CDATA[]]]]><![CDATA[>]]>
This means that to encode "]]>" in the middle of a CDATA section, replace every occurrence of "]]>" with the following:
]]]]><![CDATA[>
This effectively stops and restarts the CDATA section.
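The splitting rule above can be sketched as a small quoting function. This is an illustrative helper (`to_cdata` is a hypothetical name), checked by round-tripping through Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

CDATA_START, CDATA_END = "<![CDATA[", "]]>"

def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting every "]]>" across two sections.

    Each "]]>" becomes "]]]]><![CDATA[>": the current section ends right
    after the first "]]", and a new section opens to carry the ">".
    """
    return CDATA_START + text.replace("]]>", "]]]]><![CDATA[>") + CDATA_END

print(to_cdata("]]>"))  # <![CDATA[]]]]><![CDATA[>]]>

# Round trip: the parser joins the adjacent sections back into one string.
sample = "if (a[b[0]]>1) return;"
assert ET.fromstring("<code>" + to_cdata(sample) + "</code>").text == sample
```

Because the parser concatenates adjacent sections, the reader of the document sees the original text unchanged.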

Issues with encoding

In text data, any Unicode character not available in the encoding declared in the header can be represented using a &#nnn; numeric character reference. But the text within a CDATA section is strictly limited to the characters available in the encoding.
Because of this, using a CDATA section programmatically to quote data that could potentially contain '&' or '<' characters can cause problems when the data happens to contain characters that cannot be represented in the encoding. Depending on the implementation of the encoder, these characters can get lost, can get converted to the characters of the &#nnn; character reference, or can cause the encoding to fail. But they will not be maintained.
Another issue is that an XML document can be transcoded from one encoding to another during transport. When the XML document is converted to a more limited character set, such as ASCII, characters that can no longer be represented are converted to &#nnn; character references for a lossless conversion. But within a CDATA section, these characters cannot be represented at all, and have to be removed or converted to some equivalent, altering the content of the CDATA section.
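A minimal sketch of the difference, assuming Python's `xml.etree.ElementTree` as the serializer and Python's `xmlcharrefreplace` error handler as a stand-in for a transcoder that substitutes &#nnn; references. The helpers `escaped_to_ascii` and `cdata_to_ascii` are hypothetical.

```python
import xml.etree.ElementTree as ET

def escaped_to_ascii(value: str):
    """Serialize value as ordinary element text in US-ASCII, then re-parse it."""
    elem = ET.Element("name")
    elem.text = value
    # ElementTree writes unencodable characters as &#nnn; references.
    data = ET.tostring(elem, encoding="us-ascii")
    return data, ET.fromstring(data).text

def cdata_to_ascii(value: str):
    """Wrap value in a CDATA section, force it into ASCII, then re-parse it."""
    doc = "<name><![CDATA[" + value + "]]></name>"
    # Same substitution, but inside a CDATA section the reference is not
    # recognised on re-parsing, so it becomes literal text.
    data = doc.encode("ascii", errors="xmlcharrefreplace")
    return data, ET.fromstring(data).text

print(escaped_to_ascii("Eð"))  # (b'<name>E&#240;</name>', 'Eð')
print(cdata_to_ascii("Eð"))    # (b'<name><![CDATA[E&#240;]]></name>', 'E&#240;')
```

The escaped form survives the trip to ASCII; the CDATA form comes back with its content altered. A strict encoder (`errors="strict"`) would instead raise `UnicodeEncodeError`, which is the "encoding fails" case.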

Use of CDATA in program output

CDATA sections in XHTML documents are liable to be parsed differently by web browsers if they render the document as HTML, since HTML parsers do not recognise the CDATA start and end markers, nor do they recognise HTML entity references such as &lt; within <script> tags. This can cause rendering problems in web browsers and can lead to cross-site scripting vulnerabilities if used to display data from untrusted sources, since the two kinds of parser will disagree on where the CDATA section ends.
Since it is useful to be able to use less-than signs and ampersands in web page scripts, and to a lesser extent styles, without having to remember to escape them, it is common to use CDATA markers around the text of inline <script> and <style> elements in XHTML documents. But so that the document can also be parsed by HTML parsers, which do not recognise the CDATA markers, the CDATA markers are usually commented-out, as in this JavaScript example:
<script type="text/javascript">
//<![CDATA[
document.write("<");
//]]>
</script>
CSS example:">
or this Cascading Style Sheets (CSS) example:
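<style type="text/css">
/*<![CDATA[*/
body { background-image: url("marble.png?width=300&height=300") }
/*]]>*/
</style>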
This technique is only necessary when using inline scripts and stylesheets, and is language-specific. CSS stylesheets, for example, only support the second style of commenting-out, but CSS also has less need for the < and & characters than JavaScript and so less need for explicit CDATA markers.
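To make the commented-out CDATA pattern concrete, here is a sketch that parses the same `<script>` element twice with Python's standard library: once with `xml.etree.ElementTree` (an XML parser, as an XHTML user agent would use) and once with `html.parser.HTMLParser` (an HTML tokenizer). The helper names `xml_view`, `html_view` and `_ScriptText` are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

SCRIPT = """<script type="text/javascript">
//<![CDATA[
document.write("<");
//]]>
</script>"""

def xml_view(source: str) -> str:
    """Script text as an XML parser delivers it: CDATA markers consumed."""
    return ET.fromstring(source).text

class _ScriptText(HTMLParser):
    """Collect the raw text of <script> elements, as an HTML parser sees it."""
    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        self.in_script = (tag == "script")

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)

def html_view(source: str) -> str:
    parser = _ScriptText()
    parser.feed(source)
    parser.close()
    return "".join(parser.chunks)

print(repr(xml_view(SCRIPT)))   # '\n//\ndocument.write("<");\n//\n'
print(repr(html_view(SCRIPT)))  # '\n//<![CDATA[\ndocument.write("<");\n//]]>\n'

# Either way, every line that carries a marker is a JavaScript line comment.
for line in html_view(SCRIPT).splitlines():
    if "CDATA" in line or "]]>" in line:
        assert line.startswith("//")
```

The XML parser strips the markers and leaves two empty `//` comments; the HTML parser keeps the markers as raw script text, where the `//` prefix hides them from the JavaScript engine.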

CDATA in DTDs

CDATA-type attribute value

In Document Type Definition (DTD) files for SGML and XML, an attribute value may be designated as being of type CDATA: arbitrary character data. Within a CDATA-type attribute, character and entity reference markup is allowed and will be processed when the document is read.
For example, if an XML DTD contains
<!ATTLIST foo a CDATA #IMPLIED>
it means that elements named foo may optionally have an attribute named "a" which is of type CDATA. In an XML document that is valid according to this DTD, an element like this might appear:
<foo a="1 &amp; 2 are &lt; 3"></foo>
and an XML parser would interpret the "a" attribute's value as being the character data "1 & 2 are < 3".
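A minimal sketch of the same example with Python's `xml.etree.ElementTree`, assuming the attribute declaration is supplied as an internal DTD subset. Note that this parser reads the declaration but does not validate against it; the helper `attribute_value` is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE foo [
<!ATTLIST foo a CDATA #IMPLIED>
]>
<foo a="1 &amp; 2 are &lt; 3"></foo>"""

# #IMPLIED: the attribute is optional and has no default value.
NO_ATTRIBUTE = """<!DOCTYPE foo [
<!ATTLIST foo a CDATA #IMPLIED>
]>
<foo></foo>"""

def attribute_value(xml_source: str, name: str):
    """Return the parsed value of attribute `name` on the root element, or None."""
    return ET.fromstring(xml_source).get(name)

print(attribute_value(DOC, "a"))           # 1 & 2 are < 3
print(attribute_value(NO_ATTRIBUTE, "a"))  # None
```

The references `&amp;` and `&lt;` are resolved while reading, so the application receives the plain characters.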

CDATA-type entity

An SGML or XML DTD may also include entity declarations in which the token CDATA is used to indicate that the entity consists of character data. The character data may appear within the declaration itself or may be available externally, referenced by a URI. In either case, character reference and parameter entity reference markup is allowed in the entity, and will be processed as such when it is read.
