This document is a technical specification produced as part of the manakai project. It might be updated, replaced, or obsoleted by other documents at any time.
The scope of this specification is limited to the products within the manakai project. It is not intended to be implemented by multiple parties, although nothing prevents it from being implemented by other DOM implementations.
Comments on this document are welcome and may be sent to the author.
This section is non-normative.
This specification defines details of DOM tree validation not covered by other applicable specifications.
This specification defines:
This specification was originally published at <http://suika.suikawiki.org/www/markup/xml/validation-langs> as Handling of unknown namespaces in conformance checking.
Earlier versions of this specification contained non-normative descriptions of how to validate HTML script and style elements. As definitions of these elements were slightly simplified in the HTML Standard, these descriptions were removed.
This specification depends on the Infra Standard. The terms list, code point, concatenate, HTML namespace, and XMLNS namespace are defined by the Infra Standard.
For the purpose of this specification, user agents are conformance checkers, also known as validators.
User agents MUST implement the DOM Standard. Terms child, parent, ancestor, node, node document, document, content type, HTML document, XHTML document, document element, element, attribute list, attribute, value, namespace (of element or attribute), local name (of element or attribute), comment (Comment), processing instruction (ProcessingInstruction), template content, and shadow tree are defined by the DOM Standard.
User agents MUST implement the HTML Standard. Terms applicable specification, document base URL, inter-element whitespace, and valid e-mail address are defined by the HTML Standard.
User agents MUST support XML [XML] and XML Namespaces [XMLNS].
The terms information item, document information item, element information item, attribute information item, character information item, [children], [parent], [attributes], [document element], [base URI], [normalized value], [character code], [namespace name], and [local name] are defined by the XML Information Set specification.
User agents are not required to implement other specifications, but are expected to follow steps defined in this specification. For example, a user agent might not support XSLT, but it is still required to detect whether an element is interpreted per XSLT specifications or not as defined by this specification.
The terms valid URL and valid absolute URL are defined by the URL Standard as URL and absolute URL, respectively.
The term valid MIME type is defined by the MIME Sniffing Standard as parsable MIME type.
The utf-8 character encoding is defined by the Encoding Standard.
The term valid language tag is defined by BCP 47.
The term Date construct is defined by the Atom 1.0 specification.
The term literal result element is defined by the XSLT1 specification.
The XSLT namespace is http://www.w3.org/1999/XSL/Transform.
The RSS1 namespace is http://purl.org/rss/1.0/.
When a value is required to match the production labeled document in the XML specification, it MUST also conform to other requirements for XML documents.
A validation mode identifies the specifications that describe the requirements used to validate a node. There are five modes:
To determine the validation mode of a node node, run these steps:
application/xml or text/xml
version attribute in the XSLT namespace, return XSLT mode and abort these steps.
application/xslt+xml or text/xsl
application/rdf+xml
null, set parent node to the result of determining the validation mode of parent.
RDF element in the RDF namespace:
xmlns attribute in the XMLNS namespace whose value is the RSS1 namespace and the rdf attribute in the XMLNS namespace whose value is the RDF namespace, return RSS1 mode and abort these steps.
Elements not conforming to the RDF/XML specification (e.g. an element in string literal) are validated in default mode.
An element or attribute is in no namespace if its namespace is null.
A namespace which is not null is supported if it is defined by a native element specification.
A namespace which is not null is unknown if it is not supported.
An element is an unknown namespace element if its namespace is unknown, or if it is in no namespace and is not an RSS2 element.
An attribute is an unknown namespace attribute if its namespace is unknown.
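The definitions above can be sketched in Python as follows. This is only an illustration: `SUPPORTED_NAMESPACES` is a hypothetical, incomplete set (the real set depends on which specifications a checker implements), and whether an element is an RSS2 element is assumed to be decided elsewhere and passed in as a flag.

```python
from typing import Optional

# Hypothetical, incomplete set of supported namespaces (an assumption for
# this sketch; a real checker derives it from the specifications it implements).
SUPPORTED_NAMESPACES = frozenset({
    "http://www.w3.org/1999/xhtml",
    "http://www.w3.org/2005/Atom",
})

def is_unknown_namespace(namespace: Optional[str]) -> bool:
    """A non-null namespace is unknown if it is not supported."""
    return namespace is not None and namespace not in SUPPORTED_NAMESPACES

def is_unknown_namespace_element(namespace: Optional[str],
                                 is_rss2_element: bool = False) -> bool:
    """An element is an unknown namespace element if its namespace is unknown,
    or if it is in no namespace and is not an RSS2 element."""
    if namespace is None:
        return not is_rss2_element
    return is_unknown_namespace(namespace)

def is_unknown_namespace_attribute(namespace: Optional[str]) -> bool:
    """An attribute in no namespace is never an unknown namespace attribute."""
    return is_unknown_namespace(namespace)
```

Note the asymmetry this makes visible: an element in no namespace is an unknown namespace element (unless it is an RSS2 element), while an attribute in no namespace is not an unknown namespace attribute.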
An unknown namespace element or unknown namespace attribute MUST NOT be used anywhere except where it is explicitly allowed.
An unknown namespace element MAY be used as the document element or as an orphan node. It MAY also be used where any kind of element is allowed.
An unknown namespace element MAY have any attribute in no namespace or any unknown namespace attribute.
Additionally, any attribute in a supported namespace might also be specified on an unknown namespace element as long as it is allowed by an applicable specification.
An unknown namespace element MAY have any kind of child (unless otherwise disallowed).
An unknown namespace attribute MAY have any value.
An attribute in no namespace of an unknown namespace element MAY have any value.
Conformance checkers MAY warn about any unknown namespace element or unknown namespace attribute, as its use could be an authoring error.
Conformance checkers are expected to report errors or warnings on unknown elements and attributes in a way that is useful for authors.
This specification is not intended to override any other specification's requirements.
For a public Web document, a non-standard element that is not defined by any applicable specification ought to be reported as an error.
In the following example, as the bookmark element in the http://mybookmark.example/ namespace is not defined by any applicable specification, this fragment is non-conforming:
<div xmlns="http://www.w3.org/1999/xhtml">
<bookmark xmlns="http://mybookmark.example/">Hello</bookmark>
</div>
For XML data that is not expected to be shown directly to users (e.g. XML data retrieved via XMLHttpRequest), use of the null namespace or non-standard namespaces ought not to be an error.
For example, the following document fragment should not be considered non-conforming, even though none of the data and item elements and the name attribute is defined by any public standard:
<data>
<item name="x1"/>
<item name="x2"><p xmlns="http://www.w3.org/1999/xhtml">Hi!</p></item>
</data>
If the p element contained an item element in no namespace, it ought to be reported as an error, as no standard defines the item element in no namespace as phrasing content.
If there is an element in the http://www.w3.org/2000/svg/ namespace, an error or warning ought to be reported. It is likely an authoring error, as the SVG namespace is http://www.w3.org/2000/svg, without the trailing slash.
This section is work in progress.
This section is only applied to user agents supporting RDF/XML.
RDF/XML is defined by the RDF/XML specification, i.e. RDF 1.1 XML Syntax. The terms string literal, XML literal, and transform the Infoset into the sequence of events in document order are defined by the RDF/XML specification. The term RDF graph is defined by the RDF 1.1 Concepts and Abstract Syntax specification.
The RDF namespace is http://www.w3.org/1999/02/22-rdf-syntax-ns#.
To parse a node as RDF/XML, where node is a document or element, run these steps:
If node is in non-RDF nodes, break.
Let infoset be the result of mapping node into an Infoset.
A document is mapped to a document information item whose [children] is the mapped children of the document and [base URI] is the document's document base URL. The document information item's [document element] is the first element information item of its [children], if any.
An element is mapped to an element information item whose [local name] is the element's local name, [namespace name] is the element's namespace, [children] is the mapped children of the element, [attributes] is the mapped attributes of the element, and [base URI] is the element's node document's document base URL. If the element's parent is a node, the element information item's [parent] is the information item mapped from the element's parent.
An attribute is mapped to an attribute information item whose [local name] is the attribute's local name, [namespace name] is the attribute's namespace, and [normalized value] is the attribute's value.
A text is mapped to a sequence of zero or more character information items, whose [character code] values' concatenation as code points, in the same order, is equal to the text's data.
The mapped children of a node is a list of the information items mapped from the element and text children of the node, excluding any node in non-RDF nodes, in the same order.
The mapped attributes of an element is a list of the attribute information items mapped from the attributes in the attribute list of the element, excluding any node in non-RDF nodes, in the same order.
Any other node "attached" to the node, including but not limited to other kinds of children such as processing instructions and comments, template content, and shadow tree is added to non-RDF nodes.
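The mapping of elements, attributes, and text described above might be sketched as follows. This is a rough illustration rather than a complete implementation: it uses Python's `xml.dom.minidom` in place of a full DOM, represents information items as plain dictionaries (a choice made for this sketch only), takes the document base URL as a `base_uri` parameter, and ignores template content and shadow trees, which minidom does not model.

```python
from xml.dom import minidom, Node

def map_text(data):
    """Map text data to one character information item per code point."""
    return [{"kind": "character", "character code": ord(ch)} for ch in data]

def map_element(element, base_uri, non_rdf_nodes, parent_item=None):
    """Map an element to an element information item (a dict in this sketch)."""
    item = {
        "kind": "element",
        "local name": element.localName,
        "namespace name": element.namespaceURI,
        "base URI": base_uri,
        "parent": parent_item,
    }
    # Mapped attributes: skip attributes already recorded as non-RDF nodes.
    item["attributes"] = [
        {"kind": "attribute",
         "local name": attr.localName,
         "namespace name": attr.namespaceURI,
         "normalized value": attr.value}
        for attr in element.attributes.values()
        if attr not in non_rdf_nodes
    ]
    # Mapped children: element and text children only, in tree order.
    children = []
    for child in element.childNodes:
        if child in non_rdf_nodes:
            continue
        if child.nodeType == Node.ELEMENT_NODE:
            children.append(map_element(child, base_uri, non_rdf_nodes, item))
        elif child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            children.extend(map_text(child.data))
        else:
            # Comments, processing instructions, etc. become non-RDF nodes.
            non_rdf_nodes.append(child)
    item["children"] = children
    return item
```

Here `non_rdf_nodes` is a plain list that is read and extended during the mapping, so nodes excluded in an earlier pass stay excluded in later ones.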
Let events be the result of invoking the steps to transform the Infoset into the sequence of events in document order with infoset.
When an attribute information item is removed, add the attribute from which the attribute information item is created to non-RDF nodes unless it is a base or lang attribute in the XML namespace.
Let graph be the RDF graph obtained by matching events with the grammar. The grammar starts with production doc if node is a document, or RDF otherwise.
If events does not match the grammar production, add the node which is transformed to the first event preventing the match to non-RDF nodes. Otherwise, break.
The resolve (e, s) grammar action in RDF/XML MUST run the following steps:
utf-8.
The value of RDF/XML attributes whose value is directly resolved as an IRI reference MUST be a valid URL.
The lexical form of an RDF literal MUST conform to the relevant requirements of the datatype of the literal except when it is embedded in an RDF/XML fragment as parseTypeLiteralPropertyElt or parseTypeOtherPropertyElt.
An XML literal embedded using RDF/XML's feature is validated as part of the document. The result of the validation might be different from the validation result of the lexical form if the XML literal is not self-contained.
If the lexical form of an RDF literal is defined as an HTML or XML document (or a fragment of an HTML or XML document), it MUST be parsed with an appropriate HTML parser or XML parser with the following configurations:
Document associated with the parser is not in any browsing context. That is, no script is executed and no external resource is retrieved by the parser.
Node, the scripting flag is set if and only if scripting is enabled for the Node.
en (English).
An element is an RSS2 element if and only if it is in no namespace and either it is the document element and its local name is rss or one of its ancestors is an RSS2 element.
In other words, elements in no namespace are RSS2 elements if they belong to the main document tree whose document element is the rss element in no namespace.
RSS2 elements are interpreted as described by RSS2 specifications [RSS2] [RSSPROFILE].
Atom and its extension specifications allow extensions such that almost everything is allowed, which is not useful for conformance checkers. This specification defines stricter restrictions for the purpose of conformance checking.
Atom family namespaces are http://purl.org/atom/ns#, http://www.w3.org/2005/Atom, http://purl.org/syndication/thread/1.0, http://purl.org/syndication/history/1.0, http://www.w3.org/2007/app, and http://purl.org/atompub/tombstones/1.0.
An Atom family element is an element in one of the Atom family namespaces.
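A minimal sketch of this test, using the namespace list given above (the names `ATOM_FAMILY_NAMESPACES` and `is_atom_family_element` are introduced here for illustration only):

```python
from typing import Optional

# The Atom family namespaces listed in this specification.
ATOM_FAMILY_NAMESPACES = frozenset({
    "http://purl.org/atom/ns#",
    "http://www.w3.org/2005/Atom",
    "http://purl.org/syndication/thread/1.0",
    "http://purl.org/syndication/history/1.0",
    "http://www.w3.org/2007/app",
    "http://purl.org/atompub/tombstones/1.0",
})

def is_atom_family_element(namespace: Optional[str]) -> bool:
    """True if an element with this namespace is an Atom family element.
    The comparison is an exact string match; None means no namespace."""
    return namespace in ATOM_FAMILY_NAMESPACES
```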
Constraints expressed in RELAX NG schema fragments in applicable specifications for Atom family namespaces are to be interpreted as MUST-level requirements for the purpose of conformance checking.
Comments, inter-element whitespace, and processing instructions MAY be inserted anywhere in an Atom family element.
For an Atom family element, an attribute
or
An unknown namespace element MAY be used as a child of Atom extensible elements, i.e. the following elements:
extensionElement, or extensionSansTitleElement in content
deleted-entry elements in namespace http://purl.org/atompub/tombstones/1.0
feed elements in namespace http://purl.org/atom/ns#
entry elements in namespace http://purl.org/atom/ns#
If the content of an Atom family element is defined as an HTML or XHTML fragment, it MUST conform to relevant requirements in applicable specifications, using the rules for HTML or XHTML documents, respectively.
Elements complete and archive in namespace http://purl.org/syndication/history/1.0 MUST NOT be used more than once in a feed element in namespace http://www.w3.org/2005/Atom.
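A check for this constraint might look like the sketch below. It assumes Python's `xml.dom.minidom` as the DOM and assumes that "used in a feed element" means "used as a child of the feed element"; the function name and error message format are hypothetical.

```python
from xml.dom import minidom, Node

ATOM_NS = "http://www.w3.org/2005/Atom"
HISTORY_NS = "http://purl.org/syndication/history/1.0"

def check_feed_history(feed) -> list:
    """Return one error message per history element name that appears
    more than once among the children of an Atom 1.0 feed element."""
    errors = []
    if feed.namespaceURI != ATOM_NS or feed.localName != "feed":
        return errors  # the constraint applies only to Atom 1.0 feed elements
    for name in ("complete", "archive"):
        count = sum(
            1 for child in feed.childNodes
            if child.nodeType == Node.ELEMENT_NODE
            and child.namespaceURI == HISTORY_NS
            and child.localName == name
        )
        if count > 1:
            errors.append(f"{name} element used {count} times in feed")
    return errors
```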
Concepts used to describe constraints for Atom family elements MUST be interpreted as follows:
addr-spec
XXX need to define <atom:content> content validation when type is a MIME type
An HTML meta element MAY have a property attribute. If the attribute is specified, the element MUST NOT have a name attribute or an attribute that cannot be used when a name attribute is specified. The content attribute MUST be specified if a property attribute is specified. An HTML meta element with a property attribute is metadata content (it is not phrasing content).
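The attribute constraints above can be sketched as a check over an attribute dictionary. The list of attributes that cannot be combined with name (`http-equiv`, `charset`, `itemprop`) is an assumption based on the HTML Standard's rule that exactly one of these attributes is specified on a meta element; the function name and messages are hypothetical.

```python
# Attributes assumed to be mutually exclusive with "name" on a meta element
# (an assumption of this sketch, taken from the HTML Standard's meta rules).
EXCLUDED_WITH_PROPERTY = ("name", "http-equiv", "charset", "itemprop")

def check_meta_property(attrs: dict) -> list:
    """Return error messages for a meta element's attributes, given as a
    mapping from attribute local name to value."""
    errors = []
    if "property" not in attrs:
        return errors  # these constraints apply only when property is present
    for name in EXCLUDED_WITH_PROPERTY:
        if name in attrs:
            errors.append(f"{name} attribute not allowed with property")
    if "content" not in attrs:
        errors.append("content attribute required with property")
    return errors
```

For example, `check_meta_property({"property": "og:title", "content": "Hello"})` reports no errors, while omitting `content` or adding `name` reports one error each.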
The value of a property attribute MUST be an OGP property name. An OGP property name is a property value defined by an applicable specification or a prefixed property value. A prefixed property value is a property prefix followed by a U+003A COLON character (:) followed by one or more characters. A property prefix is a string of one or more characters that are not U+003A COLON characters (:) and that is not used as a prefix by a property value defined by an applicable specification. A deprecated property value SHOULD NOT be used.
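The syntax of OGP property names might be checked as in the sketch below. `DEFINED_PROPERTY_VALUES` is a small hypothetical subset used only for illustration; a real checker would take the full list from the applicable specifications. The sketch reads "property prefix" as the part before the first colon.

```python
# Hypothetical subset of property values defined by applicable specifications.
DEFINED_PROPERTY_VALUES = frozenset({"og:title", "og:type", "og:image", "og:url"})

# Prefixes already used by defined property values (e.g. "og").
RESERVED_PREFIXES = frozenset(
    value.split(":", 1)[0] for value in DEFINED_PROPERTY_VALUES if ":" in value
)

def is_prefixed_property_value(value: str) -> bool:
    """A non-empty, non-reserved prefix, a colon, then one or more characters."""
    prefix, colon, rest = value.partition(":")
    return bool(colon) and bool(prefix) and bool(rest) \
        and prefix not in RESERVED_PREFIXES

def is_ogp_property_name(value: str) -> bool:
    """A defined property value or a prefixed property value."""
    return value in DEFINED_PROPERTY_VALUES or is_prefixed_property_value(value)
```

Under these assumptions `myapp:thing` is accepted as a prefixed property value, while `og:unknown` is rejected because the `og` prefix is reserved by defined values and `og:unknown` is not itself defined.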
A property value MUST NOT be used unless it is defined in the context it is used by an applicable specification.
If the property attribute value is og:type, the content attribute value MUST be a value allowed as an og:type value or a prefixed property value.
An example of applicable specifications is The Open Graph protocol.
This section is non-normative.
Whether an element can be used as the document element or not is determined by the following steps:
This section is non-normative.
Conformance of an element is checked as follows:
This section is non-normative.
The data-web-defs repository contains some machine-readable data for definitions in this specification.
The tests-web repository contains conformance checking test data.
This document is written by Wakaba <wakaba@suikawiki.org>.
This document is developed as part of the manakai project.
Per CC0, to the extent possible under law, the author has waived all copyright and related or neighboring rights to this work.