Embedding Web Annotations in HTML

Abstract

The Web Annotation Working Group has defined a foundational Web Annotation Data Model [annotation-model] and Web Annotation Vocabulary [annotation-vocab] for annotating resources found on the Web. The Data Model includes the description of a JSON-LD [json-ld] based serialization.

This document describes and illustrates potential approaches for including Web Annotation documents and content within HTML documents.

The approaches described are incomplete and preliminary. They are speculative and representative only and by no means exhaust the full range of feasible options. The approaches discussed in this note have emerged from Working Group discussions [reference issue #147?] and should be considered no more than early starting points for further experimentation and development.

Introduction

According to the Web Annotation Data Model [annotation-model]:

"An annotation is considered to be a set of connected resources, typically including a body and target, and conveys that the body is related to the target.... This perspective results in a basic model with three parts, depicted below."

The Web Annotation Data Model defines an extensible, interoperable framework for expressing annotations in RDF-based vocabularies and additionally provides a definition of a JSON-LD [json-ld] based serialization. However, it does not prescribe a means for expressing a Web Annotation within a Web page, whether that annotation targets the page itself or some external Web resource.

For example, the quotation above is simply an HTML blockquote and contains no encoded reference to the quotation as it exists within the Web Annotation Data Model [annotation-model] document.

This document considers how a conformant expression of a Web annotation that targets a fragment of an HTML document might usefully be embedded within the same or another HTML document. In addition to text embedded within the DOM, HTML documents often reference external resources, such as images, that are meant to be considered integral parts of the HTML document when viewed on the Web, e.g., through the src attribute of the HTML <img> element. The approaches discussed in this Note would also facilitate embedding in HTML an annotation that targets (in the context or scope of a Web page) an image or similar Web page component resource that is external to the HTML document.

Pending further experimentation, use-case development, incubation, and broader collaborative efforts, this document stops short of proposing any new, annotation-specific extensions to HTML 5 [html], nor does it propose any new, annotation-specific HTML elements or attributes. To address specific use cases, possible extensions have been discussed within the Web Annotation Working Group and elsewhere, e.g., a proposal for new note, notegroup, and noteref HTML elements to better handle footnote-style references within HTML documents [html-notes]. For now, however, this document only considers approaches that work out of the box today, without any extension of HTML or of the Web Annotation Vocabulary [annotation-vocab]. These approaches rely on existing mechanisms, such as RDFa, CSS, JavaScript, and JSON-LD, to embed annotations within HTML documents.

This document also neither addresses in detail nor proposes user interface implementations. Where available, demonstration or developmental implementations are referenced to illustrate the approaches described, but defining the optimal interface(s) for creating or editing annotations embedded in HTML is not the focus of this document.

Motivation

The interest in embedding Web Annotations within HTML is motivated by several generic use cases that have arisen over the course of the Web Annotation Working Group's existence, a few of which are listed here:

Personal Annotating: Directly adding personal annotations to a locally stored copy of an HTML Web page obviates the need to create and maintain a separate system to store and manage annotations.
Portable Annotation: Similarly, directly adding annotations to a local copy of an HTML Web page not only facilitates local (perhaps temporary) storage, but also provides a mechanism for distribution (e.g. via email attachment).
Offline-first Annotation: Embedded annotations might later be extracted from the locally stored HTML and published to an online Web Annotation Protocol [annotation-protocol] store of annotations when the user is back online.
Lightweight, decentralized Annotation Tools: Embedding annotations in HTML may facilitate the creation of lightweight annotation tools that are meant to be deployed in a decentralized fashion (i.e., copies deployed by individuals and used independently of a centralized annotation server or service).
Collaborative Annotating: Directly adding annotations to an HTML Web page that is being collaboratively authored, edited, and/or curated by a group of individuals obviates the need for the group to select and agree on a single, central annotation repository or collaboration environment that all members must use to ingest, store, disseminate, and otherwise manage their annotations. Because the annotations stored in the HTML still use a standard model and vocabulary, the individual CSS and JavaScript tools that members of the group use to view and interact with the annotations can all refer to the same standard set of elements and attributes, so the effect is the same regardless of which annotation tool is used.
Wholly Internal Annotations: Requiring external storage for footnote annotations and the like (e.g., annotations linking target text at one point in an HTML document to an item, i.e., the annotation body, in a reference list elsewhere in the same document) is inefficient in most instances and makes interactive display of these footnotes unduly complicated.

Model and Vocabulary Conformance

The Web Annotation Data Model and Vocabulary Recommendations constrain any approach for embedding annotations in HTML:

An Annotation must have exactly one IRI that identifies it. While in some circumstances this IRI could be derived from the IRI of the HTML document (e.g., by adding an HTML fragment identifier), if the annotation is to be maintained and made accessible separately from the HTML document, it may be better to mint a wholly independent IRI. In either case, the requirements of the Data Model [annotation-model] Section 3.3.7 (Other Identities) must be observed.
TODO: put here the salient points from the discussion of context documents and vocab from issue #347
TODO: Should the approach use a recognized RDF-in-HTML serialization? That would also provide a route back to JSON-LD, as the model requires.
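The first constraint above leaves a choice between a document-derived IRI and an independent one. A minimal Python sketch of both options follows. It assumes that the derived IRI replaces any existing fragment of the document IRI with the annotation's local identifier (mirroring the anno- identifiers used in the JSON-LD examples), and that a urn:uuid: IRI is acceptable as an independent identifier. The function names are hypothetical.

```python
import uuid
from typing import Optional
from urllib.parse import urldefrag


def document_derived_iri(document_iri: str, local_id: Optional[str] = None) -> str:
    """Derive an annotation IRI from the IRI of the embedding HTML document.

    Any fragment already present on document_iri is discarded and replaced
    by local_id. If no local_id is given, a random "anno-" + hex identifier
    is generated (a hypothetical naming convention, not a requirement).
    """
    base, _old_fragment = urldefrag(document_iri)
    if local_id is None:
        local_id = "anno-" + uuid.uuid4().hex
    return f"{base}#{local_id}"


def independent_iri() -> str:
    """Mint an IRI that does not depend on where the annotation is embedded."""
    return "urn:uuid:" + str(uuid.uuid4())


print(document_derived_iri(
    "http://example.org/AnnotationIllustration-EO-Camel.html#top",
    "anno-588a322026bbcc00203fd0fb",
))
# http://example.org/AnnotationIllustration-EO-Camel.html#anno-588a322026bbcc00203fd0fb
```

A document-derived IRI changes whenever the page moves, which is one practical reason to prefer an independent IRI for annotations that are maintained separately from the page.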

Terminology

TODO: add / subtract additional terminology as needed.

IRI: An IRI, or Internationalized Resource Identifier, is an extension to the URI specification to allow characters from Unicode, whereas URIs must be made up of a subset of ASCII characters. There is a mapping algorithm for translating between IRIs and the equivalent encoded URI form. IRIs are defined by [rfc3987].
Resource: An item of interest that MAY be identified by an IRI.
Web Resource: A Resource that MUST be identified by an IRI, as described in the Web Architecture [webarch]. Web Resources MAY be dereferenceable via their IRI.
Segment (of Interest): The part of the Resource that is selected using a Selector.
External Web Resource: A Web Resource which is not part of the representation of the Annotation, such as a web page, image, or video. External Web Resources are dereferenceable from their IRI.
Property: A feature of a Resource, that often has a particular data type. In the model sections, the term "Property" is used to refer to only those features which are not Relationships and instead have a literal value such as a string, integer, or date. The valid values for a Property are thus any data type other than object, or an array containing members of that data type if more than one is allowed.
Relationship: In the model sections, the term "Relationship" is used to distinguish those features that refer to other Resources, either by reference to the Resource's IRI or by including a description of the Resource in the representation. The valid values for a Relationship are: a quoted string containing an IRI, an object that has the "id" property, or an array containing either of these if more than one is allowed.
Class: Resources may be divided, conceptually, into groups called "classes"; members of a class are known as Instances of that class. Resources are associated with a particular class through typing. Classes are identified by IRIs, i.e., they are also Web Resources themselves.
Type: A special Relationship that associates an Instance of a class to the Class it belongs to.
Instance: An element of a group of Resources represented by a particular Class.

Annotations Embedded as JSON-LD

JSON-LD [json-ld] is the serialization format specified in the Web Annotation Data Model [annotation-model]. HTML can accommodate this serialization format directly via the use of the HTML <script> element with its type attribute assigned the media type for a Web Annotation: application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"
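A consumer of such a page needs to find these script elements and parse their content. The following is a minimal Python sketch using only the standard library (a browser script would use the DOM instead). It assumes that every annotation script holds exactly one JSON object, and it compares media types after removing whitespace and lower-casing, which is a simplification rather than full media-type parsing. Class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser

ANNO_MEDIA_TYPE = 'application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'


def _normalize(media_type: str) -> str:
    # Simplified comparison: drop all whitespace and ignore case.
    return "".join(media_type.split()).lower()


class AnnotationScriptExtractor(HTMLParser):
    """Collect the JSON content of <script> elements typed as Web Annotations."""

    def __init__(self) -> None:
        super().__init__()
        self._in_anno_script = False
        self._buffer: list[str] = []
        self.annotations: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            script_type = dict(attrs).get("type") or ""
            self._in_anno_script = _normalize(script_type) == _normalize(ANNO_MEDIA_TYPE)
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_anno_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_anno_script:
            self.annotations.append(json.loads("".join(self._buffer)))
            self._in_anno_script = False


def extract_annotations(html: str) -> list[dict]:
    parser = AnnotationScriptExtractor()
    parser.feed(html)
    parser.close()
    return parser.annotations


page = """<p id="mottoTranscription">...</p>
<script id='anno-1' type='application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'>
{"@context": "http://www.w3.org/ns/anno.jsonld", "id": "http://example.org/page.html#anno-1",
 "type": "Annotation", "target": "http://example.org/page.html"}
</script>"""
for anno in extract_annotations(page):
    print(anno["id"], anno["type"])
# http://example.org/page.html#anno-1 Annotation
```

Because the extracted objects are already JSON-LD, they can be passed to a Web Annotation Protocol client or a JSON-LD processor without further transformation.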

Example

Example 1: JSON-LD example - annotating the transcribed motto

<script id='anno-588a322026bbcc00203fd0fb' class='motto' type='application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'>
{
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/AnnotationIllustration-EO-Camel.html#anno-588a322026bbcc00203fd0fb",
    "type": "Annotation",
    "motivation": "describing",

    "target": {
        "source": "http://example.org/AnnotationIllustration-EO-Camel.html",
        "selector": {
            "type": "FragmentSelector",
            "conformsTo": "http://tools.ietf.org/rfc/rfc3236",
            "value": "mottoTranscription"
        }
    },

    "body": {
        "type": "TextualBody",
        "value": "Fiunt, quae posse negabas.",
        "format": "text/plain",
        "language": "la"
    },

    "created": "2017-01-26T17:30:04.639Z",
    "creator": {
        "type": "Person",
        "email": "mailto:t-cole3@illinois.edu",
        "name": "Tim Cole"
    }
}
</script>
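The target of Example 1 uses a FragmentSelector conforming to RFC 3236, so its value, mottoTranscription, names an element id within the page. The Python sketch below shows one way a tool might resolve such a selector to the selected text. It is not a complete implementation of HTML fragment resolution: it assumes the markup closes its elements explicitly (implicitly closed elements such as an unterminated <p> would throw off the depth count), and the names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional

# HTML void elements never get an end tag, so they must not change the depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ElementTextFinder(HTMLParser):
    """Collect the text content of the first element whose id is target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0          # greater than 0 while inside the target element
        self.found = False
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.depth:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_ELEMENTS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def resolve_fragment_selector(html: str, selector: dict) -> Optional[str]:
    """Return the text selected by an HTML FragmentSelector, or None if absent."""
    if selector.get("type") != "FragmentSelector":
        raise ValueError("expected a FragmentSelector")
    finder = ElementTextFinder(selector["value"].lstrip("#"))
    finder.feed(html)
    finder.close()
    return "".join(finder.parts) if finder.found else None
```

Returning None rather than raising when the id is missing lets a viewer mark the annotation as orphaned instead of failing on the whole page.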

Example

Example 2: JSON-LD example - annotating the embedded image

<script id='anno-588a3d0a26bbcc00203fd0fc' class='iconclass' type='application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'>
{
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/AnnotationIllustration-EO-Camel.html#anno-588a3d0a26bbcc00203fd0fc",
    "type": "Annotation",
    "motivation": "tagging",

    "target": {
        "source": "http://emblemimages.grainger.illinois.edu/meditationesembl00voge/JPGthumbnail/emblem/E000004.jpg",
        "scope": "http://example.org/AnnotationIllustration-EO-Camel.html"
    },

    "body": "http://iconclass.org/rkd/25F24",

    "created": "2017-01-26T18:16:42.26Z",
    "creator": {
        "type": "Person",
        "email": "mailto:t-cole3@illinois.edu",
        "name": "Tim Cole"
    }
}
</script>

Example

Example 3: JSON-LD example - annotating the html page

<script id='anno-588a42e726bbcc00203fd0fd' class='related' type='application/ld+json;profile="http://www.w3.org/ns/anno.jsonld"'>
{
    "@context": "http://www.w3.org/ns/anno.jsonld",
    "id": "http://example.org/AnnotationIllustration-EO-Camel.html#anno-588a42e726bbcc00203fd0fd",
    "type": "Annotation",
    "motivation": "linking",

    "target": "http://example.org/AnnotationIllustration-EO-Camel.html",

    "body": "http://hdl.handle.net/10111/EmblemRegistry:E012658",

    "created": "2017-01-26T18:41:43.218Z",
    "creator": {
        "type": "Person",
        "email": "mailto:t-cole3@illinois.edu",
        "name": "Tim Cole"
    }
}
</script>
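Examples 1 to 3 express their targets in three different shapes: a Specific Resource with a selector, a Specific Resource with a scope, and a bare IRI. A tool that lists the annotations relevant to a page has to normalize these shapes. The Python sketch below does so under stated assumptions: a scoped target (such as the image in Example 2) is treated as belonging to the page named in its scope, and a target object without "source" is assumed to carry an "id". Function names are hypothetical.

```python
from typing import Optional


def target_sources(annotation: dict) -> list[tuple[str, Optional[str]]]:
    """Return (source IRI, scope IRI or None) for each target of an annotation."""
    targets = annotation.get("target", [])
    if not isinstance(targets, list):
        targets = [targets]          # a single target may be given without an array
    pairs = []
    for target in targets:
        if isinstance(target, str):                     # bare IRI, as in Example 3
            pairs.append((target, None))
        elif "source" in target:                        # Specific Resource, Examples 1 and 2
            pairs.append((target["source"], target.get("scope")))
        else:                                           # resource described by its "id"
            pairs.append((target["id"], None))
    return pairs


def annotations_about(page_iri: str, annotations: list[dict]) -> list[dict]:
    """Keep annotations whose target source or scope is page_iri."""
    return [
        anno for anno in annotations
        if any(page_iri in pair for pair in target_sources(anno))
    ]
```

Whether a scoped target should count as "about" the page is a policy decision for the viewing tool; this sketch simply includes it.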

TODO: Summarize comparative advantages of this approach, e.g., facilitates overlapping targets, easy export/import without transformation back into JSON-LD, etc.

Annotations Embedded as RDFa

TODO: Describe how dokieli embeds Web Annotations in HTML using RDFa. Include examples of annotations in RDFa. Otherwise keep this relatively succinct and focused on the annotation in RDFa, with less about the complete application; instead, point to the application...

TODO: Summarize comparative advantages of this approach, e.g., markup of the target with attributes, etc. ...

Example

Example 4: dokieli_Example(s)

GET /annotation/csarven.ca/linked-data-notifications/decentralisation HTTP/1.1
Host: linkedresearch.org
Accept: text/html
Accept-Language: en-GB,en;q=0.8, en-US;q=0.6

HTTP/1.1 200 OK
Content-Type: text/html;charset=utf-8
Content-Language: en

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta charset="utf-8" />
    <title>https://linkedresearch.org/annotation/csarven.ca/linked-data-notifications/decentralisation</title>
  </head>
  <body>
    <main>
      <article id="decentralisation" about="i:" typeof="oa:Annotation" prefix="rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns# schema: http://schema.org/ dcterms: http://purl.org/dc/terms/ oa: http://www.w3.org/ns/oa# i: https://linkedresearch.org/annotation/csarven.ca/linked-data-notifications/decentralisation">
        <h1 property="schema:name">Sarven Capadisli <span rel="oa:motivatedBy" resource="oa:comments">comments</span></h1>
        <dl class="author-name"><dt>Authors</dt><dd><span rel="schema:creator"><span about="http://csarven.ca/#i" typeof="schema:Person"><img alt="" height="48" rel="schema:image" src="http://csarven.ca/media/images/sarven-capadisli.jpg" width="48" /> <a href="http://csarven.ca/" rel="schema:url"><span about="http://csarven.ca/#i" property="schema:name">Sarven Capadisli</span></a></span></span></dd></dl>
        <dl class="published"><dt>Published</dt><dd><a href="https://linkedresearch.org/annotation/csarven.ca/linked-data-notifications/decentralisation"><time datetime="2016-12-15T11:31:40.622Z" datatype="xsd:dateTime" property="schema:datePublished" content="2016-12-15T11:31:40.622Z">2016-12-15 11:31:40</time></a></dd></dl>
        <dl class="rights"><dt>Rights</dt><dd><a href="https://creativecommons.org/licenses/by/4.0/" rel="dcterms:rights">CC BY 4.0</a></dd></dl>
        <dl class="canonical"><dt>Canonical</dt><dd about="i:" rel="oa:canonical" resource="urn:uuid:5fdc8f8d-5930-4a81-8534-cad81af7d9c1">5fdc8f8d-5930-4a81-8534-cad81af7d9c1</dd></dl>
        <dl class="target"><dt><a href="http://csarven.ca/linked-data-notifications#abstract" rel="oa:hasTarget">In reply to</a> (<a about="http://csarven.ca/linked-data-notifications#abstract" href="http://csarven.ca/linked-data-notifications" rel="oa:hasSource" typeof="oa:SpecificResource">part of</a>)</dt><dd><blockquote about="http://csarven.ca/linked-data-notifications#abstract" cite="http://csarven.ca/linked-data-notifications#abstract"><span rel="oa:hasSelector" resource="i:#fragment-selector" typeof="oa:FragmentSelector"><meta property="rdf:value" content="#abstract" xml:lang="" lang="" rel="dcterms:conformsTo" resource="https://tools.ietf.org/html/rfc3987" /><span rel="oa:refinedBy" resource="i:#text-quote-selector" typeof="oa:TextQuoteSelector"><span property="oa:prefix" xml:lang="en" lang="en">N provides a building block for </span><mark property="oa:exact" xml:lang="en" lang="en">decentralised</mark><span property="oa:suffix" xml:lang="en" lang="en"> Web applications. This permits e</span></span></span></blockquote></dd></dl><dl class="renderedvia"><dt>Rendered via</dt><dd><a about="http://csarven.ca/linked-data-notifications#abstract" href="https://dokie.li/" rel="oa:renderedVia">dokieli</a></dd></dl>
        <section id="note-decentralisation" rel="oa:hasBody" resource="i:#note-decentralisation"><h2 property="schema:name">Note</h2><div datatype="rdf:HTML" property="rdf:value schema:description" resource="i:#note-decentralisation" typeof="oa:TextualBody">Communities have various semantics for the term <em>decentralisation</em>.</div><dl class="rights"><dt>Rights</dt><dd><a href="https://creativecommons.org/licenses/by/4.0/" rel="dcterms:rights">CC BY 4.0</a></dd></dl></section>
      </article>
    </main>
  </body>
</html>
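The RDFa in Example 4 records a TextQuoteSelector as oa:prefix, oa:exact, and oa:suffix values. A tool that re-anchors such an annotation must locate that quote in the target's text. The Python sketch below is not dokieli's implementation; it assumes the target's text content has already been extracted as a plain string and performs no whitespace or Unicode normalization. It returns character offsets only when the prefix and suffix single out exactly one occurrence of the exact text.

```python
from typing import Optional


def anchor_text_quote(text: str, exact: str, prefix: str = "",
                      suffix: str = "") -> Optional[tuple[int, int]]:
    """Locate a TextQuoteSelector in text.

    Returns (start, end) character offsets of `exact` when exactly one
    occurrence is preceded by `prefix` and followed by `suffix`;
    returns None if there is no such occurrence or more than one.
    """
    matches = []
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        # text[0:start] must end with prefix and text[end:] must start with suffix.
        if text.endswith(prefix, 0, start) and text.startswith(suffix, end):
            matches.append((start, end))
        start = text.find(exact, start + 1)
    return matches[0] if len(matches) == 1 else None
```

With Example 4's values, the call would be anchor_text_quote(text, "decentralised", "N provides a building block for ", " Web applications. This permits e"), where text is the target document's text content; the prefix and suffix are what distinguish this occurrence from any other use of the word.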

Web Annotation-based Citation URLs

The Selectors and States Note [selectors-states] published by the Web Annotation Working Group includes information on encoding Web Annotation Selectors and State classes as IRI Fragment Identifiers. The following examples show how such IRIs could be used to reference portions of a Specific Resource on the Web:

Example using <blockquote> and <q> tags

Example 5: Example blockquote and q tags using the cite attribute

<blockquote cite="https://www.w3.org/TR/annotation-model/">
  <q cite="https://www.w3.org/TR/annotation-model/#selector(type=TextPositionSelector,start=8270,end=8424)">
  An annotation is considered to be a set of connected resources, typically including
   a body and target, and conveys that the body is related to the target.</q>

  <q cite="https://www.w3.org/TR/annotation-model/#selector(type=TextPositionSelector,start=8576,end=8651)">
  This perspective results in a basic model with three parts, depicted below.</q>

  <!-- TODO: how do we make this an annotation? -->
  <img src="images/intro_model.png" alt="Basic Model: Annotation, Body and Target" width="400"/>
</blockquote>
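The cite values in Example 5 follow a regular pattern, so an authoring tool could generate them from a Selector's properties. The Python sketch below assumes a flat Selector (no refinedBy), a source IRI without an existing fragment, and percent-encodes every value with urllib.parse.quote(..., safe=""). That escaping is stricter than necessary and may differ from the exact rules in [selectors-states], which should be consulted before relying on it. Function names are hypothetical.

```python
from urllib.parse import quote


def selector_fragment(selector_type: str, **properties) -> str:
    """Serialize a flat Selector as selector(type=...,key=value,...)."""
    # Keyword arguments keep their call order, so properties appear as written.
    pairs = [("type", selector_type), *properties.items()]
    return "selector(" + ",".join(
        f"{key}={quote(str(value), safe='')}" for key, value in pairs
    ) + ")"


def citation_iri(source: str, selector_type: str, **properties) -> str:
    """Append a selector fragment to the IRI of the source document."""
    return f"{source}#{selector_fragment(selector_type, **properties)}"


print(citation_iri("https://www.w3.org/TR/annotation-model/",
                   "TextPositionSelector", start=8270, end=8424))
# https://www.w3.org/TR/annotation-model/#selector(type=TextPositionSelector,start=8270,end=8424)
```

Escaping commas inside values (for example in a TextQuoteSelector's exact text) keeps them from being confused with the commas that separate properties.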

The Selectors and States Note [selectors-states] explains that the meaning of a fragment identifier is technically defined by the media type of the resource it is applied to. In practice, however, publishers and developers use fragment identifiers for purposes ranging from browser state handling to anchoring highlights of quotations (as seen here).

Using these fragment identifiers as values of the cite attribute on <blockquote> and <q> tags provides both specificity and room for future extensibility. Site authors, as well as browser, server, and JavaScript developers, may take advantage of these citation identifiers to re-anchor selections or to extract (and verify) quotations made within an HTML document that uses this method.
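One way to extract and verify such a quotation is sketched below in Python. It handles only flat TextPositionSelector fragments (no nested refinedBy, no unescaped commas inside values), assumes the cited document's text content is already available as a string, and compares quotations after collapsing whitespace, since HTML source line breaks rarely match the rendered text. The function names are hypothetical.

```python
from urllib.parse import unquote, urldefrag


def parse_selector_fragment(iri: str) -> dict[str, str]:
    """Split '#selector(type=...,k=v,...)' into a dict of decoded values."""
    fragment = urldefrag(iri)[1]
    if not (fragment.startswith("selector(") and fragment.endswith(")")):
        raise ValueError(f"not a selector fragment: {fragment!r}")
    properties = {}
    for pair in fragment[len("selector("):-1].split(","):
        key, _, value = pair.partition("=")
        properties[key] = unquote(value)
    return properties


def quoted_text(document_text: str, iri: str) -> str:
    """Return the text a TextPositionSelector citation points at."""
    selector = parse_selector_fragment(iri)
    if selector.get("type") != "TextPositionSelector":
        raise ValueError("this sketch only handles TextPositionSelector")
    start, end = int(selector["start"]), int(selector["end"])
    # Positions count characters from 0; the end position is exclusive.
    if not 0 <= start <= end <= len(document_text):
        raise ValueError(f"invalid range start={start}, end={end}")
    return document_text[start:end]


def quotation_matches(document_text: str, iri: str, quotation: str) -> bool:
    """Check a <q>/<blockquote> quotation against the text its cite IRI selects."""
    def collapse(s: str) -> str:
        return " ".join(s.split())
    return collapse(quoted_text(document_text, iri)) == collapse(quotation)
```

Rejecting an inverted range, rather than silently returning an empty string, makes a mistyped citation easy to spot.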

Example using an <a> tag

Using the same method described above, <a> tags may also be used to express a desired highlight or reference. However, as noted above, how the retrieved resource makes use of that fragment may vary.

Example 6: Example using an anchor tag

<p>According to the Web Annotation Data Model spec
<a href="https://www.w3.org/TR/annotation-model/#selector(type=TextPositionSelector,start=8270,end=8424)">
an annotation is considered to be a set of what things?</a>
(click the link to find out!)</p>
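Before deciding how to present these links and quotations, a page script would first need to discover them. The Python sketch below stands in for such a script: it scans href on <a> and cite on <q> and <blockquote>, and keeps only IRIs whose fragment begins with selector(. The choice of elements and attributes is an assumption drawn from Examples 5 and 6, and the names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

# Which attribute carries the citation IRI on each element considered here.
CITATION_ATTRIBUTES = {"a": "href", "q": "cite", "blockquote": "cite"}


class SelectorCitationCollector(HTMLParser):
    """Collect (element, IRI) pairs whose IRI fragment is a selector(...)."""

    def __init__(self) -> None:
        super().__init__()
        self.citations: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attribute = CITATION_ATTRIBUTES.get(tag)
        if attribute is None:
            return
        iri = dict(attrs).get(attribute)
        if iri and urldefrag(iri)[1].startswith("selector("):
            self.citations.append((tag, iri))


def selector_citations(html: str) -> list[tuple[str, str]]:
    collector = SelectorCitationCollector()
    collector.feed(html)
    collector.close()
    return collector.citations
```

A plain cite such as the one on the outer <blockquote> in Example 5, which carries no selector fragment, is deliberately skipped.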

Reference Note

W3C Working Group Note 15 April 2025