DOM Tree Validation

The manakai project, 22 April 2018

Latest version
https://manakai.github.io/spec-dom/validation-langs

Status of This Document

This document is a technical specification produced as part of the manakai project. It might be updated, replaced, or obsoleted by other documents at any time.

The scope of this specification is limited to the products within the manakai project. It is not intended to be implemented by multiple parties, although nothing prevents it from being implemented by other DOM implementations.

Comments on this document are welcome and may be sent to the author.

Table of contents

  1 Introduction
    1.1 Scope
    1.2 History
  2 Infrastructure
  3 Validation modes
  4 Validation of unknown namespaces
  5 RDF/XML integration
  6 RSS elements
  7 Atom elements
  8 OGP
  9 Document element
  10 Checking an element
  11 Data
  References
  Author

1 Introduction

This section is non-normative.

This specification defines details of DOM tree validation not covered by other applicable specifications.

1.1 Scope

This specification defines:

... for validator implementations by the manakai project.

1.2 History

This specification was originally published at <http://suika.suikawiki.org/www/markup/xml/validation-langs> as Handling of unknown namespaces in conformance checking since .

Earlier versions of this specification contained non-normative descriptions of how to validate HTML script and style elements. As definitions of these elements were slightly simplified in the HTML Standard, these descriptions were removed.

2 Infrastructure

This specification depends on the Infra Standard. The terms list, code point, concatenate, HTML namespace, and XMLNS namespace are defined by the Infra Standard.

For the purpose of this specification, user agents are conformance checkers, also known as validators.

User agents MUST implement the DOM Standard. The terms child, parent, ancestor, node, node document, document, content type, HTML document, XHTML document, document element, element, attribute list, attribute, value, namespace (of element or attribute), local name (of element or attribute), comment (Comment), processing instruction (ProcessingInstruction), template content, and shadow tree are defined by the DOM Standard.

User agents MUST implement the HTML Standard. The terms applicable specification, document base URL, inter-element whitespace, and valid e-mail address are defined by the HTML Standard.

User agents MUST support XML [XML] and XML Namespaces [XMLNS].

The terms information item, document information item, element information item, attribute information item, character information item, [children], [parent], [attributes], [document element], [base URI], [normalized value], [character code], [namespace name], and [local name] are defined by the XML Information Set specification.

User agents are not required to implement other specifications, but are expected to follow the steps defined in this specification. For example, a user agent might not support XSLT, but it is still required to detect whether or not an element is interpreted per the XSLT specifications, as defined by this specification.


The terms valid URL and valid absolute URL are defined by the URL Standard as URL and absolute URL, respectively.

The term valid MIME type is defined by the MIME Sniffing Standard as parsable MIME type. The utf-8 character encoding is defined by the Encoding Standard.

The term valid language tag is defined by BCP 47.

The term Date construct is defined by the Atom 1.0 specification.

The term literal result element is defined by the XSLT1 specification. The XSLT namespace is http://www.w3.org/1999/XSL/Transform.

The RSS1 namespace is http://purl.org/rss/1.0/.


When a value is required to match the production labeled document in the XML specification, it MUST also conform to other requirements for XML documents.

3 Validation modes

A validation mode identifies the specifications that describe the requirements used to validate a node. There are four modes:

Default mode
Conformance of the node is governed by its definition in the native element specifications or the RSS2 specifications, or by the rules for unknown namespace elements and attributes. The native element specifications are the applicable specifications, such as the HTML Standard, other than the XSLT specifications, the RDF/XML specification, the RSS1 specifications, the RSS2 specifications, and this specification.
XSLT mode
Conformance of the node is governed by the XSLT specifications. For the purpose of this specification, the XSLT specifications are any applicable specifications defining XSLT and its extensions. XSLT family elements are elements in the XSLT namespace and XSLT extension namespaces defined by XSLT specifications. Note that XSLT literal result elements are not XSLT family elements.
RDF mode
Conformance of the node is governed by the RDF/XML specification.
RSS1 mode
Conformance of the node is governed by both the RDF/XML specification and the RSS1 specifications. For the purpose of this specification, the RSS1 specifications are any applicable specifications defining RSS1 and its modules.

To determine the validation mode of a node node, run these steps:

  1. If node is a document element, switch by node's node document's content type:
    application/xml or text/xml
    If node is a literal result element and node has the version attribute in the XSLT namespace, return XSLT mode and abort these steps.
    application/xslt+xml or text/xsl
    If node is an XSLT family element or a literal result element, return XSLT mode and abort these steps.
    application/rdf+xml
    Return RDF mode and abort these steps.
  2. Let parent be node's parent.
  3. Let parent mode be default mode.
  4. If parent is not null, set parent mode to the result of determining the validation mode of parent.
  5. If parent mode is RDF mode or RSS1 mode and parent is an XML literal, set parent mode to default mode.
  6. If node is an XSLT family element, or if parent mode is XSLT mode and node is a literal result element, return XSLT mode and abort these steps.
  7. If node is an RDF element in the RDF namespace:
    1. If node has the xmlns attribute in the XMLNS namespace whose value is the RSS1 namespace and the rdf attribute in the XMLNS namespace whose value is the RDF namespace, return RSS1 mode and abort these steps.
    2. Otherwise, return RDF mode and abort these steps.
  8. If parent mode is RDF mode or RSS1 mode and node is considered as matching a part of the current production in the RDF/XML Syntax Data Model, return parent mode and abort these steps.
  9. Return default mode.
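The steps above can be sketched in Python as follows. This is a minimal, non-normative sketch under stated assumptions: the Element class, its content_type field, and the flags is_literal_result_element, is_xml_literal, and matches_rdf_production are hypothetical stand-ins for the DOM and for checks the sketch does not implement (literal result element detection, XML literal detection, and RDF/XML grammar matching). XSLT extension namespaces are ignored, and an element without a parent is treated as the document element.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Tuple

XSLT_NS = "http://www.w3.org/1999/XSL/Transform"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1_NS = "http://purl.org/rss/1.0/"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"


class Mode(Enum):
    DEFAULT = "default"
    XSLT = "XSLT"
    RDF = "RDF"
    RSS1 = "RSS1"


@dataclass
class Element:
    namespace: Optional[str]
    local_name: str
    # Attributes keyed by (namespace, local name), e.g. (XMLNS_NS, "rdf").
    attributes: Dict[Tuple[Optional[str], str], str] = field(default_factory=dict)
    parent: Optional[Element] = None  # None: this is the document element
    content_type: str = "application/xml"  # only consulted for the document element
    # Hypothetical flags standing in for checks this sketch does not implement.
    is_literal_result_element: bool = False
    is_xml_literal: bool = False
    matches_rdf_production: bool = False

    def is_xslt_family(self) -> bool:
        # Simplification: XSLT extension namespaces are not considered.
        return self.namespace == XSLT_NS


def validation_mode(node: Element) -> Mode:
    # Step 1: the document element, switched on the content type.
    if node.parent is None:
        ct = node.content_type
        if ct in ("application/xml", "text/xml"):
            if node.is_literal_result_element and (XSLT_NS, "version") in node.attributes:
                return Mode.XSLT
        elif ct in ("application/xslt+xml", "text/xsl"):
            if node.is_xslt_family() or node.is_literal_result_element:
                return Mode.XSLT
        elif ct == "application/rdf+xml":
            return Mode.RDF
    # Steps 2-5: the parent's mode, reset to default inside an XML literal.
    parent = node.parent
    parent_mode = validation_mode(parent) if parent is not None else Mode.DEFAULT
    if parent_mode in (Mode.RDF, Mode.RSS1) and parent is not None and parent.is_xml_literal:
        parent_mode = Mode.DEFAULT
    # Step 6: XSLT family elements and literal result elements under XSLT.
    if node.is_xslt_family() or (parent_mode is Mode.XSLT and node.is_literal_result_element):
        return Mode.XSLT
    # Step 7: rdf:RDF, with or without the RSS1 namespace declarations.
    if node.namespace == RDF_NS and node.local_name == "RDF":
        if (node.attributes.get((XMLNS_NS, "xmlns")) == RSS1_NS
                and node.attributes.get((XMLNS_NS, "rdf")) == RDF_NS):
            return Mode.RSS1
        return Mode.RDF
    # Step 8: still matching the RDF/XML grammar.
    if parent_mode in (Mode.RDF, Mode.RSS1) and node.matches_rdf_production:
        return parent_mode
    # Step 9.
    return Mode.DEFAULT
```

The recursion mirrors step 4 directly; a real checker would more likely compute modes top-down while walking the tree.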

Elements not conforming to the RDF/XML specification (e.g. an element in a string literal) are validated in default mode.

4 Validation of unknown namespaces

An element or attribute is in no namespace if its namespace is null.

A namespace which is not null is supported if it is defined by a native element specification.

A namespace which is not null is unknown if it is not supported.

An element is unknown namespace element if its namespace is unknown, or it is in no namespace and is not an RSS2 element.

An attribute is unknown namespace attribute if its namespace is unknown.
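The definitions above reduce to a few predicates. In this sketch, SUPPORTED_NAMESPACES is an illustrative assumption standing in for "namespaces defined by a native element specification", and whether an element is an RSS2 element (section 6) is supplied by the caller.

```python
from typing import FrozenSet, Optional

# Illustrative stand-in for "namespaces defined by a native element
# specification"; a real checker would derive this from its own data.
SUPPORTED_NAMESPACES: FrozenSet[str] = frozenset({
    "http://www.w3.org/1999/xhtml",
    "http://www.w3.org/2000/svg",
    "http://www.w3.org/1998/Math/MathML",
})


def is_supported(ns: Optional[str]) -> bool:
    return ns is not None and ns in SUPPORTED_NAMESPACES


def is_unknown(ns: Optional[str]) -> bool:
    # Only a non-null namespace can be unknown; null means "no namespace".
    return ns is not None and not is_supported(ns)


def is_unknown_namespace_element(ns: Optional[str], is_rss2_element: bool) -> bool:
    # is_rss2_element is decided by the caller, per the RSS2 element definition.
    return is_unknown(ns) or (ns is None and not is_rss2_element)


def is_unknown_namespace_attribute(ns: Optional[str]) -> bool:
    # An attribute in no namespace is never an unknown namespace attribute.
    return is_unknown(ns)
```

Note the asymmetry: an element in no namespace is an unknown namespace element unless it is an RSS2 element, whereas an attribute in no namespace is never an unknown namespace attribute.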


An unknown namespace element or unknown namespace attribute MUST NOT be used anywhere except where it is explicitly allowed.

An unknown namespace element MAY be used as the document element or as an orphan node. It MAY also be used where any kind of element is allowed.

An unknown namespace element MAY have any attribute in no namespace or any unknown namespace attribute.

Additionally, any attribute in a supported namespace can also be specified on an unknown namespace element as long as it is allowed by an applicable specification.

An unknown namespace element MAY have any kind of child (unless otherwise disallowed).

An unknown namespace attribute MAY have any value.

An attribute in no namespace of an unknown namespace element MAY have any value.
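The attribute rules for unknown namespace elements can be summarized as a single check. This is a sketch only: the namespace set is an illustrative assumption, and allowed_by_applicable_spec is a hypothetical flag for the case of a supported-namespace attribute, which this specification defers to other applicable specifications.

```python
from typing import Optional

# Illustrative stand-in for the set of supported namespaces.
SUPPORTED_NAMESPACES = frozenset({
    "http://www.w3.org/1999/xhtml",
    "http://www.w3.org/XML/1998/namespace",
})


def attribute_allowed_on_unknown_ns_element(
    attr_ns: Optional[str], allowed_by_applicable_spec: bool = False
) -> bool:
    if attr_ns is None:
        return True  # any attribute in no namespace, with any value
    if attr_ns not in SUPPORTED_NAMESPACES:
        return True  # an unknown namespace attribute, with any value
    # A supported-namespace attribute needs an applicable specification that
    # allows it here; the caller decides (hypothetical flag).
    return allowed_by_applicable_spec
```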

Conformance checkers MAY warn about any unknown namespace element or unknown namespace attribute, as its use could be an authoring error.

Conformance checkers are expected to report errors or warnings on unknown elements and attributes in a way that is useful for authors.

This specification is not intended to override any other specification's requirements.

For a public Web document, a non-standard element that is not defined by any applicable specification ought to be reported as an error.

In the following example, as the bookmark element in the http://mybookmark.example/ namespace is not defined by any applicable specification, this fragment is non-conforming:

<div xmlns="http://www.w3.org/1999/xhtml">
  <bookmark xmlns="http://mybookmark.example/">Hello</bookmark>
</div>

For XML data that is not expected to be shown directly to the user (e.g. XML data retrieved via XMLHttpRequest), use of null or non-standard namespaces ought not to be an error.

For example, the following document fragment should not be considered non-conforming, even though none of the data and item elements or the name attribute is defined by any public standard:

<data>
  <item name="x1"/>
  <item name="x2"><p xmlns="http://www.w3.org/1999/xhtml">Hi!</p></item>
</data>

If the p element contained an item element in no namespace, it ought to be reported as an error, as no standard defines the item element in no namespace as phrasing content.

If there is an element in the http://www.w3.org/2000/svg/ namespace (note the trailing slash; the SVG namespace is http://www.w3.org/2000/svg), an error or warning ought to be reported, as it is likely an authoring error.

5 RDF/XML integration

This section is work in progress.

This section only applies to user agents supporting RDF/XML.

RDF/XML is defined by the RDF/XML specification, i.e. RDF 1.1 XML Syntax. The terms string literal, XML literal, and transform the Infoset into the sequence of events in document order are defined by the RDF/XML specification. The term RDF graph is defined by the RDF 1.1 Concepts and Abstract Syntax specification. The RDF namespace is http://www.w3.org/1999/02/22-rdf-syntax-ns#.

To parse a node as RDF/XML, where node is a document or element, run these steps:

  1. Let non-RDF nodes be an empty list.
  2. Repeat:
    1. If node is in non-RDF nodes, break.

    2. Let infoset be the result of mapping node into an Infoset.

      A document is mapped to a document information item whose [children] is the mapped children of the document and whose [base URI] is the document's document base URL. The document information item's [document element] is the first element information item of its [children], if any.

      An element is mapped to an element information item whose [local name] is the element's local name, [namespace name] is the element's namespace, [children] is the mapped children of the element, [attributes] is the mapped attributes of the element, and [base URI] is the element's node document's document base URL. If the element's parent is a node, the element information item's [parent] is the information item mapped from the element's parent.

      An attribute is mapped to an attribute information item whose [local name] is the attribute's local name, [namespace name] is the attribute's namespace, and [normalized value] is the attribute's value.

      A text is mapped to a sequence of zero or more character information items, whose [character code] values' concatenation as code points, in the same order, is equal to the text's data.

      The mapped children of a node are the list of the information items mapped from the element and text children of the node, excluding any node in non-RDF nodes, in the same order.

      The mapped attributes of an element are the list of the attribute information items mapped from the attributes in the attribute list of the element, excluding any node in non-RDF nodes, in the same order.

      Any other node "attached" to the node, including but not limited to other kinds of children such as processing instructions and comments, template content, and shadow trees, is added to non-RDF nodes.

    3. Let events be the result of invoking the steps to transform the Infoset into the sequence of events in document order with infoset.

      When an attribute information item is removed, add the attribute from which the attribute information item is created to non-RDF nodes unless it is a base or lang attribute in the XML namespace.

    4. Let graph be the RDF graph obtained by matching events against the grammar.

      The grammar starts with production doc if node is a document, or RDF otherwise.

      If events do not match the grammar production, add the node from which the first event preventing the match was transformed to non-RDF nodes. Otherwise, break.

  3. Return graph and non-RDF nodes.
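The control flow of these steps, a retry loop that excludes one offending node per failed match, can be sketched as below. The attempt callback is a hypothetical hook for substeps 2 to 4 (mapping to an Infoset, transforming into events, and matching the grammar); it is an assumption of this sketch and not part of any existing RDF/XML library. The "no progress" guard is an addition not present in the steps.

```python
from typing import Any, Callable, List, Optional, Tuple

# attempt(non_rdf_nodes) -> (graph or None, nodes to add to non-RDF nodes).
# Hypothetical hook: map the tree to an Infoset while skipping non_rdf_nodes,
# transform it into events, and match the grammar. On a failed match the
# returned nodes must include the node behind the first non-matching event.
Attempt = Callable[[List[Any]], Tuple[Optional[Any], List[Any]]]


def parse_as_rdf_xml(node: Any, attempt: Attempt) -> Tuple[Optional[Any], List[Any]]:
    non_rdf_nodes: List[Any] = []
    graph = None
    while node not in non_rdf_nodes:  # substep 1
        graph, excluded = attempt(list(non_rdf_nodes))
        new = [n for n in excluded if n not in non_rdf_nodes]
        non_rdf_nodes.extend(new)
        if graph is not None:
            break  # the events matched the grammar
        if not new:
            break  # guard against looping forever; not part of the steps
    return graph, non_rdf_nodes
```

Because each failed match adds at least one node, the loop terminates after at most as many iterations as there are nodes in the tree.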

The resolve (e, s) grammar action in RDF/XML MUST run the following steps:

  1. Let url be the result of resolving s relative to the base-uri accessor of e, using character encoding utf-8.
  2. If url is in error, return s.
  3. Otherwise, return url.
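The fallback behavior of resolve (e, s), returning s unchanged when resolution fails, can be sketched as follows. This is an approximation only: Python's urllib.parse.urljoin follows RFC 3986 rather than the URL Standard, does not apply the utf-8 percent-encoding step, and raises ValueError for only some malformed inputs (such as an unbalanced IPv6 bracket). A conforming implementation would use a URL Standard parser instead.

```python
from urllib.parse import urljoin


def resolve(base_uri: str, s: str) -> str:
    # base_uri plays the role of e's base-uri accessor.
    try:
        url = urljoin(base_uri, s)  # RFC 3986 resolution, not the URL Standard
    except ValueError:
        return s  # step 2: url is in error, so return s unchanged
    return url  # step 3
```

For example, resolving c against http://example.com/a/b yields http://example.com/a/c, while http://[::1 is returned as is.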

The value of RDF/XML attributes whose value is directly resolved as an IRI reference MUST be a valid URL.

The lexical form of an RDF literal MUST conform to the relevant requirements of the datatype of the literal except when it is embedded in an RDF/XML fragment as parseTypeLiteralPropertyElt or parseTypeOtherPropertyElt.

An XML literal embedded using RDF/XML's features is validated as part of the document. The result of the validation might differ from the validation result of the lexical form if the XML literal is not self-contained.

If the lexical form of an RDF literal is defined as an HTML or XML document (or a fragment of an HTML or XML document), it MUST be parsed with an appropriate HTML parser or XML parser with the following configurations:

6 RSS elements

An element is an RSS2 element if and only if it is in no namespace and either it is the document element and its local name is rss, or one of its ancestors is an RSS2 element.

In other words, elements in no namespace are RSS2 elements if they belong to the main document tree whose document element is the rss element in no namespace.
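A direct transcription of the definition is sketched below. The Element class is a hypothetical minimal model in which an element without a parent is the document element. The ancestor walk checks every ancestor, not only the parent, so an element in no namespace nested inside an extension element (in some other namespace) under rss is still an RSS2 element, consistent with the "in other words" restatement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    namespace: Optional[str]
    local_name: str
    parent: Optional["Element"] = None  # None: this is the document element


def is_rss2_element(el: Element) -> bool:
    if el.namespace is not None:
        return False
    if el.parent is None:  # the document element
        return el.local_name == "rss"
    # One of its ancestors (at any depth) is an RSS2 element.
    ancestor = el.parent
    while ancestor is not None:
        if is_rss2_element(ancestor):
            return True
        ancestor = ancestor.parent
    return False
```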

RSS2 elements are interpreted as described by RSS2 specifications [RSS2] [RSSPROFILE].

7 Atom elements

Atom and its extension specifications allow extensions such that almost everything is allowed, which is not useful for conformance checkers. This specification defines stricter restrictions for the purpose of conformance checking.

Atom family namespaces are http://purl.org/atom/ns#, http://www.w3.org/2005/Atom, http://purl.org/syndication/thread/1.0, http://purl.org/syndication/history/1.0, http://www.w3.org/2007/app, and http://purl.org/atompub/tombstones/1.0.

An Atom family element is an element in one of Atom family namespaces.

Constraints expressed in RELAX NG schema fragments in applicable specifications for Atom family namespaces are to be interpreted as MUST-level requirements for the purpose of conformance checking.

Comments, inter-element whitespace, and processing instructions MAY be inserted anywhere in an Atom family element.

For an Atom family element, an attribute or child that is not explicitly allowed by an applicable specification MUST NOT be used.

An unknown namespace element MAY be used as a child of Atom extensible elements, i.e. following elements:

If the content of an Atom family element is defined as an HTML or XHTML fragment, it MUST conform to relevant requirements in applicable specifications, using the rules for HTML or XHTML documents, respectively.

The elements complete and archive in the namespace http://purl.org/syndication/history/1.0 MUST NOT be used more than once in a feed element in the namespace http://www.w3.org/2005/Atom.
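A check for this constraint might count the relevant children of a feed element. This sketch assumes that only direct children of the feed element are counted; the Element class and the message strings are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional

ATOM_NS = "http://www.w3.org/2005/Atom"
HISTORY_NS = "http://purl.org/syndication/history/1.0"


@dataclass
class Element:
    namespace: Optional[str]
    local_name: str
    children: List["Element"] = field(default_factory=list)


def history_errors(feed: Element) -> List[str]:
    if (feed.namespace, feed.local_name) != (ATOM_NS, "feed"):
        return []
    # Assumption: only direct children of the feed element are counted.
    counts = Counter(
        child.local_name
        for child in feed.children
        if child.namespace == HISTORY_NS and child.local_name in ("complete", "archive")
    )
    return [f"{name} appears {n} times" for name, n in sorted(counts.items()) if n > 1]
```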

Concepts used to describe constraints for Atom family elements MUST be interpreted as follows:

RFC 3066 language tag
Valid language tag
URI, IRI, IRI reference
Valid URL (If IRI and IRI reference are distinguished, valid absolute URL and valid URL, respectively.)
Media type, MIME media type
Valid MIME type
addr-spec, E-mail address
Valid e-mail address
W3C Date-Time string
Same as element content of Date construct

XXX need to define <atom:content> content validation when type is a MIME type

8 OGP

An HTML meta element MAY have a property attribute. If the attribute is specified, the element MUST NOT have a name attribute or an attribute that cannot be used when a name attribute is specified. The content attribute MUST be specified if a property attribute is specified. An HTML meta element with a property attribute is metadata content (it is not phrasing content).
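The attribute constraints on a meta element with a property attribute can be sketched as follows. The list of attributes that cannot be used together with name is an assumption taken from the HTML Standard's rule that exactly one of name, http-equiv, charset, and itemprop is specified on a meta element; the message strings are hypothetical.

```python
from typing import Dict, List

# Assumption: name and the attributes that cannot be combined with it, per the
# HTML Standard's "exactly one of name, http-equiv, charset, itemprop" rule.
EXCLUSIVE_WITH_NAME = ("name", "http-equiv", "charset", "itemprop")


def check_meta_property(attrs: Dict[str, str]) -> List[str]:
    """attrs maps local names of a meta element's no-namespace attributes to values."""
    if "property" not in attrs:
        return []
    errors = [f"{a} attribute not allowed with property" for a in EXCLUSIVE_WITH_NAME if a in attrs]
    if "content" not in attrs:
        errors.append("content attribute missing")
    return errors
```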

The value of a property attribute MUST be an OGP property name. An OGP property name is a property value defined by an applicable specification or a prefixed property value. A prefixed property value is a property prefix followed by a U+003A COLON character (:) followed by one or more characters. A property prefix is a string of one or more characters, none of which is a U+003A COLON character (:), that is not used as a prefix by any property value defined by an applicable specification. A deprecated property value SHOULD NOT be used.
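The OGP property name grammar can be sketched as below. The set of defined property values is passed in by the caller; the sketch assumes that the prefix "used by" a defined property value is the part before its first colon (so og is taken by og:title). Deprecation and the context rules that follow are not checked.

```python
from typing import AbstractSet, Set


def defined_prefixes(defined: AbstractSet[str]) -> Set[str]:
    # Assumption: a defined value's prefix is the part before its first colon.
    return {v.split(":", 1)[0] for v in defined if ":" in v}


def is_prefixed_property_value(value: str, defined: AbstractSet[str]) -> bool:
    # partition() splits at the first colon, so prefix never contains one.
    prefix, colon, rest = value.partition(":")
    return colon == ":" and prefix != "" and rest != "" and prefix not in defined_prefixes(defined)


def is_ogp_property_name(value: str, defined: AbstractSet[str]) -> bool:
    return value in defined or is_prefixed_property_value(value, defined)
```

With og:title, og:type, and og:image:width as the defined values, myapp:score is accepted as a prefixed property value, while og:foo is rejected because og is already used as a prefix.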

A property value MUST NOT be used unless an applicable specification defines it for the context in which it is used.

Many property values are only defined for specific og:type values.

If the property attribute value is og:type, the content attribute value MUST be a value allowed as an og:type value or a prefixed property value.

An example of applicable specifications is The Open Graph protocol.

9 Document element

This section is non-normative.

Whether an element can be used as the document element or not is determined by the following steps:

  1. Determine the validation mode of the element.
  2. If the element is in the XSLT mode, return whether the element is allowed as the document element according to XSLT specifications.
  3. Otherwise, if the element is in the RDF mode, return whether the element is allowed as the document element according to RDF/XML specification.
  4. Otherwise, if the element is in the RSS1 mode, in a supported namespace, or an RSS2 element, return whether the element is allowed as the document element by an applicable specification.
  5. Otherwise, the element is an unknown namespace element. Return true.
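The decision above can be sketched as a small dispatch on the validation mode. Modes are represented as plain strings, and allowed_by is a hypothetical hook that answers whether the named set of rules allows the element as the document element; neither is defined by this specification.

```python
from typing import Callable


def may_be_document_element(
    mode: str,
    supported_or_rss2: bool,
    allowed_by: Callable[[str], bool],
) -> bool:
    # mode is one of "default", "XSLT", "RDF", "RSS1" (an assumption of this sketch).
    # allowed_by(rules) is a hypothetical hook consulting the named rules.
    if mode == "XSLT":
        return allowed_by("XSLT")
    if mode == "RDF":
        return allowed_by("RDF/XML")
    if mode == "RSS1" or supported_or_rss2:
        return allowed_by("applicable specification")
    # Otherwise the element is an unknown namespace element.
    return True
```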

10 Checking an element

This section is non-normative.

Conformance of an element is checked as follows:

  1. Determine the validation mode of the element.
  2. Let mode be the validation mode of the element.
  3. If mode is XSLT mode, check the element according to XSLT specifications.
  4. Otherwise, if mode is RDF mode, check the element according to the RDF/XML specification.
  5. Otherwise, if mode is RSS1 mode, check the element according to the RDF/XML specification and RSS1 specifications.
  6. Otherwise, if the element is in a supported namespace or is an RSS2 element:
    1. If the element is defined in an applicable specification, check the element according to the specification.
    2. Otherwise, report an "unknown element" error.
  7. Otherwise, the element is an unknown namespace element. Report an "unknown element" warning.
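The checking steps can be sketched in the same style. Here check is a hypothetical hook that validates the element against the named rules and returns its messages, and the message strings are placeholders. The sketch highlights the distinction the steps draw: an undefined element in a supported namespace is an error, while an unknown namespace element only gets a warning.

```python
from typing import Callable, List


def check_element(
    mode: str,
    supported_or_rss2: bool,
    is_defined: bool,
    check: Callable[[str], List[str]],
) -> List[str]:
    # check(rules) is a hypothetical hook returning messages for the named rules.
    if mode == "XSLT":
        return check("XSLT")
    if mode == "RDF":
        return check("RDF/XML")
    if mode == "RSS1":
        return check("RDF/XML") + check("RSS1")
    if supported_or_rss2:
        if is_defined:
            return check("applicable specification")
        return ["error: unknown element"]
    # An unknown namespace element: a warning, not an error.
    return ["warning: unknown element"]
```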

11 Data

This section is non-normative.

The data-web-defs repository contains some machine-readable data for definitions in this specification.

The tests-web repository contains conformance checking test data.

References

RSS2
RSS 2.0, RSS Advisory Board.
RSSPROFILE
Really Simple Syndication Best Practices Profile, RSS Advisory Board.
XML
Extensible Markup Language (XML), W3C Recommendation.
XMLNS
Namespaces in XML, W3C Recommendation.

Author

This document is written by Wakaba <wakaba@suikawiki.org>.

This document is developed as part of the manakai project.

Per CC0, to the extent possible under law, the author has waived all copyright and related or neighboring rights to this work.