To the extent possible under law, the editors have waived all copyright and related or neighboring rights to this work. In addition, as of [DATE: 01 Jan 1901], the editors have made this specification available under the Open Web Foundation Agreement Version 1.0, which is available at http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0.
This specification aims to describe DOM APIs related to parsing markup into DOM trees and serializing DOM trees into markup, with a strong focus on compatibility with existing content.
Various issues are listed in the rest of the document.
This specification currently requires using the XML Parser for some APIs, when in an XML document. It is unclear whether consensus can be found for this approach.
All diagrams, examples, and notes in this specification are non-normative, as are all sections explicitly marked non-normative. Everything else in this specification is normative.
The key words "MUST", "MUST NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC2119. For readability, these words do not appear in all uppercase letters in this specification. [RFC2119]
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and terminate these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)
User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.
When a method or an attribute is said to call another method or attribute, the user agent must invoke its internal API for that attribute or method so that e.g. the author can't change the behavior by overriding attributes or methods with custom properties or functions in ECMAScript.
Unless otherwise stated, string comparisons are done in a case-sensitive manner.
If an algorithm calls into another algorithm, any exception thrown by the latter (unless it is explicitly caught) must cause the former to terminate and the exception to be propagated up to its caller.
The IDL fragments in this specification must be interpreted as required for conforming IDL fragments, as described in the Web IDL specification. [WEBIDL]
Some of the terms used in this specification are defined in DOM4 and HTML. [DOM] [HTML]
Vendor-specific proprietary extensions to this specification are strongly discouraged. Authors must not use such extensions, as doing so reduces interoperability and fragments the user base, allowing only users of specific user agents to access the content in question.
If vendor-specific extensions are needed, the members should be prefixed by vendor-specific strings to prevent clashes with future versions of this specification. Extensions must be defined so that the use of extensions neither contradicts nor causes the non-conformance of functionality defined in the specification.
When vendor-neutral extensions to this specification are needed, either this specification can be updated accordingly, or an extension specification can be written that overrides the requirements in this specification. When someone applying this specification to their activities decides that they will recognise the requirements of such an extension specification, it becomes an applicable specification for the purposes of conformance requirements in this specification.
The term context object means the object on which the method or attribute being discussed was called.
Nodes
The following steps form the fragment parsing algorithm, whose arguments are a markup string and a context element.
If the context element's node document is an HTML document: let algorithm be the HTML fragment parsing algorithm.
If the context element's node document is an XML document: let algorithm be the XML fragment parsing algorithm.
Invoke algorithm with markup as the input, and context element as the context element.
Let new children be the nodes returned.
Let fragment be a new DocumentFragment whose node document is context element's node document.
Append each node in new children to fragment (in order).
This ensures the node document for the new nodes is correct.
Return fragment.
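The steps above can be sketched in ECMAScript with plain objects standing in for DOM nodes. This is only an illustration under stated assumptions: `parseFragment`, the `isHTMLDocument` and `nodeDocument` properties, and the two parser callbacks are hypothetical names, and the callbacks merely stand in for the HTML and XML fragment parsing algorithms, which are defined elsewhere.

```javascript
// Hypothetical model: plain objects stand in for DOM nodes, and the two
// callbacks stand in for the HTML and XML fragment parsing algorithms.
function parseFragment(markup, contextElement, parsers) {
  const doc = contextElement.nodeDocument;
  // Pick the algorithm from the kind of the context element's node document.
  const algorithm = doc.isHTMLDocument ? parsers.html : parsers.xml;
  // Invoke it with markup as the input and the context element as context.
  const newChildren = algorithm(markup, contextElement);
  // Create a fragment whose node document is the context element's.
  const fragment = { type: "DocumentFragment", nodeDocument: doc, childNodes: [] };
  // Append each new child in order; this fixes up its node document.
  for (const child of newChildren) {
    child.nodeDocument = doc;
    fragment.childNodes.push(child);
  }
  return fragment;
}

// Usage with stub parsers that wrap the markup in a single text node.
const htmlDoc = { isHTMLDocument: true };
const contextElement = { type: "Element", localName: "div", nodeDocument: htmlDoc };
const stubParsers = {
  html: (markup) => [{ type: "Text", data: markup, via: "html" }],
  xml: (markup) => [{ type: "Text", data: markup, via: "xml" }],
};
const parsedFragment = parseFragment("hello", contextElement, stubParsers);
console.log(parsedFragment.childNodes[0].via); // html
```

Passing the parsers in as arguments keeps the sketch independent of any real parser; a user agent would call its internal algorithms directly.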
To serialize a Node node, the user agent must run the following steps:
The serialization of a DocumentType node is the string returned by the following steps:
the empty string
the concatenation of
"PUBLIC";
" "" (U+0020 SPACE, U+0022 QUOTATION MARK);
""" (U+0022 QUOTATION MARK).
the concatenation of
"SYSTEM";
" "" (U+0020 SPACE, U+0022 QUOTATION MARK);
""" (U+0022 QUOTATION MARK).
the concatenation of
"PUBLIC";
" "" (U+0020 SPACE, U+0022 QUOTATION MARK);
"" "" (U+0022 QUOTATION MARK, U+0020 SPACE, U+0022 QUOTATION MARK);
""" (U+0022 QUOTATION MARK).
Return the concatenation of
"<!DOCTYPE";
" " (U+0020 SPACE);
" " (U+0020 SPACE);
">" (U+003E GREATER-THAN SIGN).
No attempt is made to ensure that the result of this algorithm is sensible if the public or system ID contains a """ character.
To produce an HTML serialization of a Node node, the user agent must run the appropriate steps, depending on node's interface:
Element
Document
DocumentFragment
Run the HTML fragment serialization algorithm on node. Return the returned string.
Comment
Text
DocumentType
Return the serialization of node.
ProcessingInstruction
To produce an XML serialization of a Node node, the user agent must run the appropriate steps, depending on node's interface:
Element
Return the concatenation of the following strings:
"<" (U+003C LESS-THAN SIGN);
tagName attribute;
escaping / throwing
">" (U+003E GREATER-THAN SIGN);
"</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS);
tagName attribute;
">" (U+003E GREATER-THAN SIGN).
Document
Run the XML fragment serialization algorithm on node. Return the string this produced.
Comment
Return the concatenation of
"<!--" (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS);
"-->" (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).
Text
data
.
If node has its serialize as CDATA flag set, run the following steps:
If data does not match the CData production, throw an InvalidStateError exception and terminate the entire algorithm.
Return the concatenation of "<![CDATA[", data, and "]]>".
Otherwise, run the following steps:
Replace each "&" in markup by "&amp;".
Replace each "<" in markup by "&lt;".
Replace each ">" in markup by "&gt;".
DocumentFragment
DocumentType
Return the serialization of node.
ProcessingInstruction
Return the concatenation of
"<?" (U+003C LESS-THAN SIGN, U+003F QUESTION MARK);
" " (U+0020 SPACE);
"?>" (U+003F QUESTION MARK, U+003E GREATER-THAN SIGN).
The XML serialization of the attributes of an element element is the result of the following algorithm:
For each attribute attr in element's attributes, in order, append the following strings to result:
" " (U+0020 SPACE);
escaping / throwing
"="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
escaping / throwing
""" (U+0022 QUOTATION MARK).
DOMParser interface
enum SupportedType {
  "text/html",
  "text/xml",
  "application/xml",
  "application/xhtml+xml",
  "image/svg+xml"
};
[Constructor]
interface DOMParser {
  Document parseFromString(DOMString str, SupportedType type);
};
The DOMParser() constructor must return a new DOMParser object.
The parseFromString(str, type) method must run these steps, depending on type:
"text/html"
Parse str with an HTML parser, and return the newly created document.
The scripting flag must be set to "disabled".
meta elements are not taken into account for the encoding used, as a Unicode stream is passed into the parser.
script elements get marked unexecutable and the contents of noscript get parsed as markup.
"text/xml"
"application/xml"
"application/xhtml+xml"
"image/svg+xml"
Parse str with a namespace-enabled XML parser.
If the previous step didn't return an error, return the newly created document and terminate these steps.
Let document be a newly-created Document.
The intention is that this document does not support document.load().
It is not clear if it needs to be an XMLDocument for other reasons (such as stringification).
Let root be a new Element, with its local name set to "parsererror" and its namespace set to "http://www.mozilla.org/newlayout/xml/parsererror.xml".
At this point user agents may append nodes to root, for example to describe the nature of the error.
Append root to document.
Return document.
In any case, the returned document's content type must be the type argument.
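Because a failed XML parse yields a document rather than an exception, callers have to inspect the root element to notice the failure. The helper below is a hedged sketch of such a check: `isParserErrorDocument` is a hypothetical name, and it relies only on the local name and namespace given in the steps above. It accepts any object exposing `documentElement`, so it can be exercised here without a browser.

```javascript
const PARSER_ERROR_NS = "http://www.mozilla.org/newlayout/xml/parsererror.xml";

// Returns true when the document's root element is the "parsererror"
// element in the namespace named in the steps above.
function isParserErrorDocument(doc) {
  const root = doc.documentElement;
  return root !== null && root !== undefined &&
    root.localName === "parsererror" &&
    root.namespaceURI === PARSER_ERROR_NS;
}

// Plain-object stand-ins for a failed and a successful parse result.
const failedDocument = {
  documentElement: { localName: "parsererror", namespaceURI: PARSER_ERROR_NS },
};
const parsedDocument = {
  documentElement: { localName: "root", namespaceURI: null },
};
console.log(isParserErrorDocument(failedDocument), isParserErrorDocument(parsedDocument));
// true false
```

In a user agent the argument would presumably be the result of `new DOMParser().parseFromString(str, "application/xml")`.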
It is currently unclear what the URL of the returned document should be.
Results for a test case:
| | Gecko | Opera | Chrome |
|---|---|---|---|
| document.location | null | | |
| document.URL | unsupported | unsupported | "" |
| document.documentURI | Page URL | null | null |
Anne van Kesteren suggests using the default, about:blank.
The returned document's encoding is the default, UTF-8.
XMLSerializer interface
[Constructor]
interface XMLSerializer {
  DOMString serializeToString(Node root);
};
The XMLSerializer() constructor must return a new XMLSerializer object.
The serializeToString(root) method must produce an XML serialization of root and return the result.
Element interface
partial interface Element {
  [TreatNullAs=EmptyString] attribute DOMString innerHTML;
  [TreatNullAs=EmptyString] attribute DOMString outerHTML;
  void insertAdjacentHTML(DOMString position, DOMString text);
};
innerHTML
The innerHTML IDL attribute represents the markup of the Element's contents.
innerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element's contents.
Can be set, to replace the contents of the element with nodes parsed from the given string.
In the case of an XML document, will throw an InvalidStateError if the Element cannot be serialized to XML, and a SyntaxError if the given string is not well-formed.
On getting, if the context object's node document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on the context object; otherwise, the context object's node document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on the context object instead (this might throw an exception instead of returning a string).
On setting, these steps must be run:
Let fragment be the result of invoking the fragment parsing algorithm with the new value as markup, and the context object as the context element.
Replace all with fragment within the context object.
outerHTML
The outerHTML IDL attribute represents the markup of the Element and its contents.
outerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element and its contents.
Can be set, to replace the element with nodes parsed from the given string.
In the case of an XML document, will throw an InvalidStateError if the element cannot be serialized to XML, and a SyntaxError if the given string is not well-formed.
Throws a NoModificationAllowedError exception if the parent of the element is the Document node.
On getting, if the context object's node document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on a fictional node whose only child is context object; otherwise, the context object's node document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on that fictional node instead (this might throw an exception instead of returning a string).
On setting, the following steps must be run:
Let parent be the context object's parent.
If parent is null, terminate these steps. There would be no way to obtain a reference to the nodes created even if the remaining steps were run.
If parent is a Document, throw a NoModificationAllowedError exception and terminate these steps.
If parent is a DocumentFragment, let parent be a new Element with body as its local name,
Let fragment be the result of invoking the fragment parsing algorithm with the new value as markup, and parent as the context element.
Replace the context object with fragment within the context object's parent.
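The setter steps above can be modelled in ECMAScript with plain objects. This is a sketch under assumptions, not the user agent's implementation: `setOuterHTML`, `makeDOMError`, and the `type`, `parent` and `children` properties are hypothetical, the fragment parser is passed in as a stub, and the substitute context element carries only the body local name because the rest of that step is cut off in the text.

```javascript
function makeDOMError(name, message) {
  const error = new Error(message);
  error.name = name;
  return error;
}

// Sketch only: element, its parent and the fragment are plain objects.
function setOuterHTML(element, value, parseFragment) {
  const actualParent = element.parent;
  // An orphan has nowhere to put the new nodes, so nothing happens.
  if (actualParent === null) return;
  if (actualParent.type === "Document") {
    throw makeDOMError("NoModificationAllowedError", "parent is a Document");
  }
  // A DocumentFragment cannot be a context element; use a body stand-in.
  const contextElement = actualParent.type === "DocumentFragment"
    ? { type: "Element", localName: "body", parent: null, children: [] }
    : actualParent;
  const fragment = parseFragment(value, contextElement);
  // Replace the element with the fragment's children within its parent.
  const index = actualParent.children.indexOf(element);
  actualParent.children.splice(index, 1, ...fragment.children);
  for (const child of fragment.children) child.parent = actualParent;
  element.parent = null;
}

// Usage with a stub parser that turns "a,b" into two text nodes.
const stubFragmentParser = (markup) => ({
  type: "DocumentFragment",
  children: markup.split(",").map((data) => ({ type: "Text", data, parent: null })),
});
const hostElement = { type: "Element", localName: "div", parent: null, children: [] };
const targetElement = { type: "Element", localName: "span", parent: hostElement, children: [] };
hostElement.children.push(targetElement);
setOuterHTML(targetElement, "a,b", stubFragmentParser);
console.log(hostElement.children.map((node) => node.data)); // [ 'a', 'b' ]
```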
insertAdjacentHTML()
insertAdjacentHTML(position, text)
Parses the given string text as HTML or XML and inserts the resulting nodes into the tree in the position given by the position argument, as follows:
Throws a SyntaxError exception if the arguments have invalid values (e.g., in the case of an XML document, if the given string is not well-formed).
Throws a NoModificationAllowedError exception if the given position isn't possible (e.g. inserting elements after the root element of a Document).
The insertAdjacentHTML(position, text) method must run these steps:
Use the first matching item from this list:
Let context be the context object's parent.
If context is null or a document, throw a NoModificationAllowedError and terminate these steps.
Let context be the context object.
Throw a SyntaxError
exception.
If context is not an Element or the following are all true:
"html", and
let context be a new Element with body as its local name,
Let fragment be the result of invoking the fragment parsing algorithm with text as markup, and context as the context element.
Use the first matching item from this list:
Insert fragment into the context object's parent before the context object.
Insert fragment into the context object before its first child.
Append fragment to the context object.
Insert fragment into the context object's parent before the context object's next sibling.
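The position keywords that select each branch are missing from the lists above. The ECMAScript sketch below assumes the four values used by deployed implementations of this method ("beforebegin", "afterbegin", "beforeend", "afterend") and, as a further assumption, matches them after lowercasing. Nodes are plain objects, the parser is a stub, and `insertAdjacent` and `createDOMError` are hypothetical names; the substitution of a body element as context is omitted.

```javascript
function createDOMError(name, message) {
  const error = new Error(message);
  error.name = name;
  return error;
}

// Sketch only: element is a plain object with parent and children.
function insertAdjacent(element, position, text, parseFragment) {
  const where = position.toLowerCase();
  let context;
  if (where === "beforebegin" || where === "afterend") {
    context = element.parent;
    if (context === null || context.type === "Document") {
      throw createDOMError("NoModificationAllowedError", "no usable parent");
    }
  } else if (where === "afterbegin" || where === "beforeend") {
    context = element;
  } else {
    throw createDOMError("SyntaxError", "unknown position: " + position);
  }
  const nodes = parseFragment(text, context).children;
  let target;
  let index;
  if (where === "beforebegin") {
    target = element.parent;
    index = target.children.indexOf(element);
  } else if (where === "afterbegin") {
    target = element;
    index = 0;
  } else if (where === "beforeend") {
    target = element;
    index = element.children.length;
  } else {
    // afterend: before the element's next sibling
    target = element.parent;
    index = target.children.indexOf(element) + 1;
  }
  target.children.splice(index, 0, ...nodes);
  for (const node of nodes) node.parent = target;
}

// Usage with a stub parser that yields one text node per call.
const textNodeParser = (markup) => ({
  children: [{ type: "Text", data: markup, parent: null, children: [] }],
});
const listElement = { type: "Element", localName: "ul", parent: null, children: [] };
const itemElement = { type: "Element", localName: "li", data: "item", parent: listElement, children: [] };
listElement.children.push(itemElement);
insertAdjacent(itemElement, "beforebegin", "before", textNodeParser);
insertAdjacent(itemElement, "afterend", "after", textNodeParser);
insertAdjacent(itemElement, "afterbegin", "first", textNodeParser);
insertAdjacent(itemElement, "beforeend", "last", textNodeParser);
console.log(listElement.children.map((node) => node.data)); // [ 'before', 'item', 'after' ]
console.log(itemElement.children.map((node) => node.data)); // [ 'first', 'last' ]
```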
Text interface
partial interface Text {
  attribute boolean serializeAsCDATA;
};
serializeAsCDATA [ = value ]
Text nodes have an additional associated flag, the serialize as CDATA flag.
The serializeAsCDATA attribute must return true if the context object has its serialize as CDATA flag set, or false otherwise.
Setting the serializeAsCDATA attribute must, if the new value is true, set the context object's serialize as CDATA flag, or unset it otherwise.
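The getter and setter above amount to a boolean accessor over a hidden flag. A small ECMAScript model, assuming a hypothetical `TextModel` class in place of the real Text interface, could look like this; the conversion with `Boolean()` stands in for the coercion that a Web IDL boolean attribute applies before the setter steps run.

```javascript
// Sketch only: TextModel is a stand-in, not the DOM Text interface.
class TextModel {
  // The "serialize as CDATA flag", initially unset.
  #serializeAsCDATAFlag = false;

  constructor(data) {
    this.data = data;
  }

  // Returns true if the flag is set, false otherwise.
  get serializeAsCDATA() {
    return this.#serializeAsCDATAFlag;
  }

  // Sets the flag when the new value is true, and unsets it otherwise.
  set serializeAsCDATA(value) {
    this.#serializeAsCDATAFlag = Boolean(value);
  }
}

const textModel = new TextModel("1 < 2");
console.log(textModel.serializeAsCDATA); // false
textModel.serializeAsCDATA = true;
console.log(textModel.serializeAsCDATA); // true
```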
Range interface
partial interface Range {
  DocumentFragment createContextualFragment(DOMString fragment);
};
createContextualFragment(fragment)
Returns a DocumentFragment, created from the markup string given.
The createContextualFragment(fragment) method must run these steps:
Let node be the context object's start node.
Let element be as follows, depending on node's interface:
Document
DocumentFragment
Element
Text
Comment
DocumentType
ProcessingInstruction
If either element is null or the following are all true:
"html", and
let element be a new element with "body" as its local name,
Let fragment node be the result of invoking the fragment parsing algorithm with fragment as markup, and element as the context element.
For each script in fragment node, unset the "parser-inserted" and "already started" flags.
This step is intended to be equivalent to not setting those flags in the first place, and to ensure that scripts are run when fragment node is inserted into a document.
Return fragment node.
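The flag-clearing step near the end of the algorithm can be illustrated with a short ECMAScript sketch over a plain-object tree. The property names `parserInserted` and `alreadyStarted` are hypothetical stand-ins for the two script flags, and `unsetScriptFlags` is not a name used by this specification.

```javascript
// Sketch only: nodes are plain objects with localName and children.
function unsetScriptFlags(node) {
  for (const child of node.children || []) {
    if (child.localName === "script") {
      // Equivalent in effect to never having set the flags while parsing.
      child.parserInserted = false;
      child.alreadyStarted = false;
    }
    // Descend so that nested scripts are covered as well.
    unsetScriptFlags(child);
  }
  return node;
}

const fragmentNode = {
  children: [
    {
      localName: "p",
      children: [
        { localName: "script", parserInserted: true, alreadyStarted: true, children: [] },
      ],
    },
  ],
};
unsetScriptFlags(fragmentNode);
console.log(fragmentNode.children[0].children[0].parserInserted); // false
```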
All references are normative unless marked "Non-normative".
Thanks to Anne van Kesteren, Aryeh Gregor, Boris Zbarsky, David Håsäther, Henri Sivonen, Ryosuke Niwa, Simon Pieters, timeless and Travis Leithead for their useful comments.
Special thanks to Ian Hickson for defining the innerHTML and outerHTML attributes, and the insertAdjacentHTML() method in HTML and his useful comments. [HTML]