The Shape Expressions (ShEx) language describes RDF nodes and graph structures. A node constraint describes an RDF node (IRI, blank node or literal) and a shape describes the triples involving nodes in an RDF graph. These descriptions identify predicates and their associated cardinalities and datatypes. ShEx shapes can be used to communicate data structures associated with some process or interface, generate or validate data, or drive user interfaces.

This document defines programming and REST interfaces for instantiating and executing validation services. This includes the use of ShapeMaps to bind RDF data to ShEx schemas. See the Shape Expressions Primer for a non-normative description of shape maps.

This document will be presented to the Shape Expressions Community Group.

Introduction

The Shape Expressions (ShEx) language provides a structural schema for RDF data. This can be used to document APIs or datasets, aid in development of API-conformant messages, minimize defensive programming, guide user interfaces, or anything else that involves a machine-readable description of data organization and typing requirements.

A practical use of ShEx is to test nodes in RDF graphs for conformance with shape expressions. This document defines interfaces for doing that.

To understand the programmatic interface described in this specification and how it is intended to operate in a programming environment, it is useful to have working knowledge of WebIDL [[WebIDL]]. To understand how ShEx relates to RDF, it is helpful to be familiar with the basic RDF concepts [[RDF11-CONCEPTS]].

Terminology

The ShEx interface is defined using terms from RDF semantics [[!rdf11-mt]]:

For the purposes of the ShExProcessor interface, a promise is an object that represents the eventual result of a single asynchronous operation. Promises are defined in [[ECMASCRIPT-6.0]].

Conformance criteria are relevant to authors and authoring tool implementers. As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

ShapeMap structure

The ShEx specification defines two forms of ShapeMaps:

A result ShapeMap has the following properties:

If the status property is absent, the status is assumed to be "conformant". The reason and appInfo properties may also be absent but have no default value.
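The defaulting rule above can be illustrated with a small data model. This is a minimal Python sketch, not part of the interface: the class name ShapeMapEntry and the snake_case field app_info are hypothetical, chosen to mirror the node, shape, status, reason and appInfo properties named in this section.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ShapeMapEntry:
    """Hypothetical model of one entry in a result ShapeMap."""
    node: str                       # the RDF node that was tested
    shape: str                      # label of the shape it was tested against
    status: str = "conformant"      # an absent status is assumed "conformant"
    reason: Optional[str] = None    # may be absent; has no default value
    app_info: Optional[Any] = None  # may be absent; has no default value


# An entry created without a status falls back to "conformant".
entry = ShapeMapEntry(node="http://a.example/n1", shape="http://a.example/S1")
```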

A node selector is a subset of a SPARQL triple pattern with these restrictions:

A query ShapeMap is a ShapeMap with only a node, shape and possibly a status property. A query ShapeMap with a status must have a status of "conformant" or "nonconformant".

A ground ShapeMap is a ShapeMap in which all node selector/shape pairs have been replaced by a set of zero or more node/shape pairs. The ShEx validation process takes as input a ground ShapeMap.
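As a rough illustration of how a query ShapeMap might be turned into a ground ShapeMap, the Python sketch below replaces each node selector with the nodes it matches in a graph. It assumes a graph is a plain list of string triples and that a triple-pattern selector is a 3-tuple in which one end is the marker FOCUS and the other end is either a fixed term or None (standing for the "_" wildcard). The names select_nodes and ground are hypothetical.

```python
from typing import List, Optional, Tuple, Union

FOCUS = "FOCUS"  # marks the position whose matching nodes are selected

Triple = Tuple[str, str, str]
# (subject, predicate, object): one of subject/object is FOCUS; the other is
# a fixed term or None, standing for the "_" wildcard.
TriplePattern = Tuple[Optional[str], str, Optional[str]]
NodeSelector = Union[str, TriplePattern]


def select_nodes(selector: NodeSelector, graph: List[Triple]) -> List[str]:
    """Return the nodes a selector denotes, in first-seen order."""
    if isinstance(selector, str):
        return [selector]  # a plain node selects itself
    s, p, o = selector
    nodes: List[str] = []
    for ts, tp, to in graph:
        if tp != p:
            continue
        if s == FOCUS and (o is None or o == to):
            node = ts
        elif o == FOCUS and (s is None or s == ts):
            node = to
        else:
            continue
        if node not in nodes:
            nodes.append(node)
    return nodes


def ground(query_map: List[Tuple[NodeSelector, str]],
           graph: List[Triple]) -> List[Tuple[str, str]]:
    """Replace every selector/shape pair by zero or more node/shape pairs."""
    fixed: List[Tuple[str, str]] = []
    for selector, shape in query_map:
        for node in select_nodes(selector, graph):
            if (node, shape) not in fixed:
                fixed.append((node, shape))
    return fixed


graph = [(":alice", ":knows", ":bob"),
         (":carol", ":knows", ":bob"),
         (":bob", ":name", '"Bob"')]
query = [(":alice", ":Person"),
         ((None, ":knows", FOCUS), ":Person"),        # all objects of :knows
         ((FOCUS, ":name", '"Nobody"'), ":Named")]    # matches nothing
fixed_map = ground(query, graph)
```

In this example the last selector matches no triple, so it contributes zero pairs to the ground ShapeMap.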

async Application Programming Interface

This API provides a clean mechanism that enables developers to determine conformance of RDF Graphs to a ShEx Schema.

Each function in the ShEx API defines a set of results and errors. Errors are identified by their error code.

The ShExProcessor Interface

The ShExProcessor interface is the high-level programming structure that developers use to determine conformance of RDF Graphs to a ShEx Schema. All input parameters are static, meaning they are not modified by any defined API operation.

parseSchema
Checks conformance of the given parseSchema.graph with parseSchema.schema using parseSchema.shapeMap according to .
  1. If validate.schema has the form of an IRI, set schema to the result of dereferencing the IRI, treating the result as ShExC, ShExJ, or some other RDF serialization, depending on the content-type of the result, and transforming into the ShExJ abstract syntax for processing.
  2. Otherwise, if a USVString, validate.schema is treated as a string, and parsed into the ShExJ abstract syntax, using format as a hint to determine the content-type.
  3. Otherwise, if an RDF Graph, schema is transformed into the ShExJ abstract syntax for processing.
  4. If validate.schema is parsed as ShExC, and a syntax error is found, reject promise passing syntax error.
  5. If a transformation to the ShExJ abstract syntax does not result in valid ShExJ, reject promise, passing invalid schema.
schemastring
The ShEx Schema, either as a dereferencable IRI (URL), an inline data string, or an RDF Graph which is transformed into the ShExJ abstract syntax.
A set of options to configure parsing.
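The first three parseSchema steps dispatch on the kind of schema argument. A minimal Python sketch of that dispatch follows. The function name classify_schema_input is hypothetical, and the regular expression used to decide whether a string "has the form of an IRI" is an assumption made for this sketch, not a rule taken from this specification.

```python
import re
from typing import Any

# Rough test for "has the form of an IRI": a scheme, a colon, and no
# whitespace or characters that IRIREF excludes. Assumption for this sketch.
_IRI_FORM = re.compile(r'^[A-Za-z][A-Za-z0-9+.\-]*:[^\s<>"{}|^`\\]*$')


def classify_schema_input(schema: Any) -> str:
    """Return which step applies: 'iri' (1), 'string' (2) or 'graph' (3)."""
    if isinstance(schema, str):
        # Step 1 dereferences IRIs; step 2 parses inline strings.
        return "iri" if _IRI_FORM.match(schema) else "string"
    # Step 3: anything else is treated as an already-loaded RDF Graph.
    return "graph"
```

A real processor would go on to dereference, parse or transform the input and reject with syntax error or invalid schema as described in steps 4 and 5.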
parseData
Checks conformance of the given parseData.graph with parseData.data using parseData.shapeMap according to .
  1. If validate.graph has the form of an IRI, set graph to the result of dereferencing the IRI, treating the result as an RDF Graph serialization, depending on the content-type of the result. Otherwise, validate.graph either is an RDF Graph, or a processor should use other means to transform it into an RDF Graph.
datastring
The ShEx Data, either as a dereferencable IRI (URL), an inline data string, or an RDF Graph which is transformed into the ShExJ abstract syntax.
A set of options to configure parsing.
resolveShapeMap
Checks conformance of the given resolveShapeMap.graph with resolveShapeMap.data using resolveShapeMap.shapeMap according to .
  1. parse queryShapeMap string or JSON doc (resolve prefixes?).
  2. construct fixedShapeMap with each fixed pairs and replacing node selectors with their corresponding nodes.
datastring
The ShEx Data, either as a dereferencable IRI (URL), an inline data string, or an RDF Graph which is transformed into the ShExJ abstract syntax.
A set of options to configure parsing.
validate
Checks conformance of the given validate.graph with validate.schema using validate.shapeMap according to .
  1. Process schema and graph according to the requirements in , creating shapeResults, a ShapeResults dictionary based on a copy of validate.shapeMap with entries added, as necessary, for each focus node fi in focusNodes associated with http://www.w3.org/ns/shex#Start as a shapeExprLabel, a BOOL result value, depending on whether the node conforms with the referenced shape(s), and an optional message describing details of conformance. The promise is then fulfilled using shapeResults.
  2. If graph is found not to conform using schema and validate.shapeMap, reject promise, passing non-conforming graph along with shapeResults.
schema
The ShEx Schema, either as a dereferencable IRI (URL), an inline data string, or an RDF Graph which is transformed into the ShExJ abstract syntax.
graph
The graph to check conformance with, either as a dereferencable IRI to some RDF serialization, or as an accessible RDF Graph.
shapeMap
A dictionary mapping a node in validate.graph to one or more shapeExprLabel.
options
A set of options to configure the algorithms.

The rest of this follows sections 6.2 - 6.5 in the orig API.

The Application Programming Interface

This API provides a clean mechanism that enables developers to determine conformance of RDF Graphs to a ShEx Schema.

The ShEx API uses Promises to represent the result of the various asynchronous operations. Promises are defined in [[ECMASCRIPT-6.0]]. General use within specifications can be found in [[promises-guide]]. When using promises, an error is reported as a rejected promise and a result is reported as a fulfilled promise.

The ShExProcessor Interface

The ShExProcessor interface is the high-level programming structure that developers use to determine conformance of RDF Graphs to a ShEx Schema.

Implementations do not modify the input parameters. If an error is detected, the Promise is rejected, passing a ShExError with the corresponding error code.

validate
Checks conformance of the given validate.graph with validate.schema using validate.shapeMap according to .
  1. Create a new Promise promise and return it. The following steps are then executed asynchronously.
  2. If validate.schema has the form of an IRI, set schema to the result of dereferencing the IRI, treating the result as ShExC, ShExJ, or some other RDF serialization, depending on the content-type of the result, and transforming into the ShExJ abstract syntax for processing.
  3. Otherwise, if a USVString, validate.schema is treated as a string, and parsed into the ShExJ abstract syntax, using format as a hint to determine the content-type.
  4. Otherwise, if an RDF Graph, schema is transformed into the ShExJ abstract syntax for processing.
  5. If validate.schema is parsed as ShExC, and a syntax error is found, reject promise passing syntax error.
  6. If a transformation to the ShExJ abstract syntax does not result in valid ShExJ, reject promise, passing invalid schema.
  7. If validate.graph has the form of an IRI, set graph to the result of dereferencing the IRI, treating the result as an RDF Graph serialization, depending on the content-type of the result. Otherwise, validate.graph either is an RDF Graph, or a processor should use other means to transform it into an RDF Graph.
  8. Process schema and graph according to the requirements in , creating shapeResults, a ShapeResults dictionary based on a copy of validate.shapeMap with entries added, as necessary, for each focus node fi in focusNodes associated with http://www.w3.org/ns/shex#Start as a shapeExprLabel, a BOOL result value, depending on whether the node conforms with the referenced shape(s), and an optional message describing details of conformance. The promise is then fulfilled using shapeResults.
  9. If graph is found not to conform using schema and validate.shapeMap, reject promise, passing non-conforming graph along with shapeResults.
schema
The ShEx Schema, either as a dereferencable IRI (URL), an inline data string, or an RDF Graph which is transformed into the ShExJ abstract syntax.
graph
The graph to check conformance with, either as a dereferencable IRI to some RDF serialization, or as an accessible RDF Graph.
shapeMap
A dictionary mapping a node in validate.graph to one or more shapeExprLabel.
options
A set of options to configure the algorithms.
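The promise behaviour of validate (steps 1, 8 and 9) can be sketched with a Python coroutine, where returning a value plays the role of fulfilling the promise and raising an exception plays the role of rejecting it. This is only an analogy to the ECMAScript promises the interface uses. The conforms callback is a hypothetical stand-in for the actual ShEx validation algorithm, and schema and graph loading (steps 2 to 7) is assumed to have happened already.

```python
import asyncio
from typing import Any, Callable, Dict, List, Optional


class ShExError(Exception):
    """Minimal stand-in for the ShExError type, carrying an error code."""

    def __init__(self, code: str, shape_results: Optional[dict] = None):
        super().__init__(code)
        self.code = code
        self.shape_results = shape_results


async def validate(schema: Any, graph: Any,
                   shape_map: Dict[str, List[str]],
                   conforms: Callable[[Any, Any, str, str], bool]) -> dict:
    """Check each node/shape pair; return results or raise on failure."""
    shape_results: Dict[str, List[dict]] = {}
    for node, labels in shape_map.items():
        shape_results[node] = [
            {"shape": label, "result": conforms(schema, graph, node, label)}
            for label in labels
        ]
    if not all(r["result"] for rs in shape_results.values() for r in rs):
        # Step 9: reject, passing the error code along with shapeResults.
        raise ShExError("non-conforming graph", shape_results)
    return shape_results  # step 8: fulfil with shapeResults


GRAPH = [("ex:alice", "ex:name", '"Alice"')]


def has_name(schema: Any, graph: Any, node: str, label: str) -> bool:
    """Toy conformance check (assumption): the node has an ex:name triple."""
    return any(s == node and p == "ex:name" for s, p, _ in graph)


results = asyncio.run(validate(None, GRAPH, {"ex:alice": ["ex:Person"]}, has_name))
```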

The RDFGraph Type

The RDFGraph type describes an RDF Graph accessed as described in Graph access.

bgp
Performs a SPARQL Basic Graph Pattern query on the graph.

The ShExOptions Type

The ShExOptions type is used to pass various options to the ShExProcessor methods.

base
The base IRI to use when parsing validate.schema. If set, this overrides the validate.schema's IRI.
focusNodes
One or more nodes in validate.graph which are checked for conformance against start.
format
One of ShExC or ShExJ.
processingMode
If set to shex-2.0, the implementation has to produce exactly the same results as the algorithms defined in this specification. If set to another value, the ShEx processor is allowed to extend or modify the algorithms defined in this specification to enable application-specific optimizations. The definition of such optimizations is beyond the scope of this specification and thus not defined. Consequently, different implementations may implement different optimizations. Developers must not define modes beginning with shex as they are reserved for future versions of this specification.
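The option members above might be modelled as follows. This is a Python sketch under stated assumptions: the field names are snake_case renderings of the members listed here, the defaults (None and an empty list) are not specified by this document, and the two helper functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShExOptions:
    """Hypothetical model of the ShExOptions members."""
    base: Optional[str] = None             # overrides the schema's own IRI
    focus_nodes: List[str] = field(default_factory=list)
    format: Optional[str] = None           # "ShExC" or "ShExJ"
    processing_mode: Optional[str] = None  # e.g. "shex-2.0"


def must_follow_spec_exactly(options: ShExOptions) -> bool:
    """Only the shex-2.0 mode forbids extending or modifying the algorithms."""
    return options.processing_mode == "shex-2.0"


def is_reserved_mode(mode: str) -> bool:
    """Modes beginning with "shex" are reserved for future versions."""
    return mode.startswith("shex")
```

An application defining its own optimization mode would pick a name for which is_reserved_mode returns False.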

The ShapeResults Type

The ShapeResults type is used for reporting the result of conformance checking of an RDF Graph against a schema.

ShapeResults is a dictionary mapping a node in validate.graph to a sequence of ShapeResult entries.

shape
A shapeExprLabel in validate.schema against which node conformance was checked.
result
The result of the conformance check.
reason
An optional message describing the result of the conformance check.
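For illustration, a ShapeResults value could look like the Python sketch below: a dictionary keyed by node, each value a list of entries with the shape, result and reason members described above. The helper nonconforming and the sample IRIs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class ShapeResult:
    shape: str                    # shapeExprLabel the node was checked against
    result: bool                  # outcome of the conformance check
    reason: Optional[str] = None  # optional message describing the outcome


ShapeResults = Dict[str, List[ShapeResult]]


def nonconforming(results: ShapeResults) -> List[Tuple[str, str]]:
    """List the (node, shape) pairs whose conformance check failed."""
    return [(node, r.shape)
            for node, entries in results.items()
            for r in entries
            if not r.result]


sample: ShapeResults = {
    "http://a.example/n1": [ShapeResult("http://a.example/S1", True)],
    "http://a.example/n2": [
        ShapeResult("http://a.example/S1", False, "missing :p1"),
        ShapeResult("http://a.example/S2", True),
    ],
}
```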

Error Handling

This section describes the datatype definitions used within the ShEx API for error handling.

ShExError

The ShExError type is used to report processing errors.

code
a string representing the particular error type, as described in the various algorithms in this document.
message
an optional error message containing additional debugging information. The specific contents of error messages are outside the scope of this specification.

ShExErrorCode

The ShExErrorCode represents the collection of valid ShEx error codes.

invalid schema
The validate.schema argument to validate is invalid. @gkellogg: This could use some improvement.
non-conforming graph
The validate.graph is found not to conform with validate.schema.
syntax error
Parsing validate.schema as ShExC failed with a syntax error.
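The error types above map naturally onto an enumeration and an exception. The Python sketch below is one possible rendering, assuming that a rejected promise corresponds to a raised exception. The class layout is hypothetical, while the three code strings are those listed in this section.

```python
from enum import Enum
from typing import Optional


class ShExErrorCode(Enum):
    INVALID_SCHEMA = "invalid schema"
    NON_CONFORMING_GRAPH = "non-conforming graph"
    SYNTAX_ERROR = "syntax error"


class ShExError(Exception):
    """Carries a required error code and an optional debugging message."""

    def __init__(self, code: ShExErrorCode, message: Optional[str] = None):
        super().__init__(message if message is not None else code.value)
        self.code = code
        self.message = message


def describe(error: ShExError) -> str:
    """Format an error for logging, e.g. 'syntax error: unexpected }'."""
    if error.message is None:
        return error.code.value
    return f"{error.code.value}: {error.message}"
```

Callers would branch on the code member rather than on the message, since message contents are outside the scope of this specification.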

ShapeMap syntax

ShapeMaps can be easily transmitted and understood with a specialized syntax.

Relative and prefixed IRIs in the node position are resolved against the application-defined prefix map and base URL for the data. Likewise, schema IRI forms are resolved against the Schema PREFIX and namespace.

... status, reason, appInfo ...

d:n1 @ s:S1, # d: prefix from data, s: from schema
"foo"^^xsd:string@START!           # "foo" did not match the start shape.
  /"missing :p1"                   # The reason given is "missing :p1".
  $"appinfo":{"myextra1":["..."]} ,# The application provides structural data.
"chat"@en-fr@<http://...S3>?, # validate a literal
{FOCUS :p2 "abcd"@en-us}@START, # validate subjects of :p2 "abcd"
{_ :p3 FOCUS}@START # validate all objects of :p3
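To give a feel for how this syntax might be read, the Python sketch below parses a deliberately small subset of it: comma-separated pairs whose node is an IRI in angle brackets or a prefixed name, whose shape is an IRI, a prefixed name or START, and which may end in a "!" or "?" status marker. Literals, triple patterns, reasons, appinfo and comments are not handled, the regular expressions are simplifications of the terminals in the grammar, and the function name parse_shape_map is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Simplified terms: <IRI> or prefix:local (far narrower than the grammar).
_TERM = r'(?:<[^<>\s]*>|[\w\-]*:[\w\-]+)'
_PAIR = re.compile(
    r'\s*(?P<node>' + _TERM + r')'
    r'\s*@\s*(?P<shape>' + _TERM + r'|START)'
    r'\s*(?P<status>[!?])?\s*'
)

Pair = Tuple[str, str, Optional[str]]  # (node, shape, status marker or None)


def parse_shape_map(text: str) -> List[Pair]:
    """Parse 'node@shape[!|?]' pairs separated by commas."""
    pairs: List[Pair] = []
    pos = 0
    while True:
        m = _PAIR.match(text, pos)
        if m is None:
            raise ValueError(f"cannot parse pair at offset {pos}")
        pairs.append((m.group("node"), m.group("shape"), m.group("status")))
        pos = m.end()
        if pos == len(text):
            return pairs
        if text[pos] != ",":
            raise ValueError(f"expected ',' at offset {pos}")
        pos += 1  # skip the separator and read the next pair


parsed = parse_shape_map(
    "d:n1 @ s:S1, <http://a.example/n2>@START!, d:n3@<http://a.example/S3>?"
)
```

The sketch returns terms exactly as written. Resolving prefixed and relative IRIs against the data and schema prefix maps, as described above, would be a separate step.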

ShapeMap grammar

[1]    shapeMap    ::=   pair (',' pair)*;
[2]    pair    ::=   nodeSelector shapeSelector status? reason? jsonAttributes?
[3]    nodeSelector    ::=   objectTerm | triplePattern
[4]    subjectTerm    ::=   iri | BLANK_NODE_LABEL
[5]    objectTerm    ::=   subjectTerm | literal
[6]    triplePattern    ::=   '{' "FOCUS" iri (objectTerm | '_') '}'
| '{' (subjectTerm | '_') iri "FOCUS" '}'
[7]    shapeSelector    ::=   '@' (iri | "START") | ATSTART | ATPNAME_NS | ATPNAME_LN
[8]    status    ::=   '!' | '?'
[9]    reason    ::=   '/' string
[10]    jsonAttributes    ::=   '$' '"appinfo"' ':' jsonValue
[11]    jsonValue    ::=   'false' | 'null' | 'true' | jsonObject | jsonArray | DOUBLE | STRING_LITERAL2;
[12]    jsonObject    ::=   '{' (jsonMember (',' jsonMember)*)? '}';
[13]    jsonMember    ::=   STRING_LITERAL2 ':' jsonValue;
[14]    jsonArray    ::=   '[' (jsonValue (',' jsonValue)*)? ']';
[13t]    literal    ::=    rdfLiteral | numericLiteral | booleanLiteral
[16t]    numericLiteral    ::=    INTEGER | DECIMAL | DOUBLE
[65]    rdfLiteral    ::=    langString | string ("^^" iri)?
[134s]    booleanLiteral    ::=    "true" | "false"
[135s]    string    ::=       STRING_LITERAL1 | STRING_LITERAL_LONG1
| STRING_LITERAL2 | STRING_LITERAL_LONG2
[66]    langString    ::=       LANG_STRING_LITERAL1 | LANG_STRING_LITERAL_LONG1
| LANG_STRING_LITERAL2 | LANG_STRING_LITERAL_LONG2
[136s]    iri    ::=    IRIREF | prefixedName
[137s]    prefixedName    ::=    PNAME_LN | PNAME_NS

Terminals

[18t]    <IRIREF>    ::=    "<" ([^#0000- <>\"{}|^`\\] | UCHAR)* ">"
[140s]    <PNAME_NS>    ::=    PN_PREFIX? ":"
[141s]    <PNAME_LN>    ::=    PNAME_NS PN_LOCAL
[70]    <ATPNAME_NS>    ::=    "@" PN_PREFIX? ":"
[71]    <ATPNAME_LN>    ::=    "@" PNAME_NS PN_LOCAL
[142s]    <BLANK_NODE_LABEL>    ::=    "_:" (PN_CHARS_U | [0-9]) ((PN_CHARS | ".")* PN_CHARS)?
[15]    <AT_PNAME_NS>    ::=    '@' PNAME_NS
[16]    <AT_PNAME_LN>    ::=    '@' PNAME_LN
[17]    <AT_START>    ::=    "@START"
This terminal has precedence over LANGTAG
[145s]    <LANGTAG>    ::=    "@" ([a-zA-Z])+ ("-" ([a-zA-Z0-9])+)*
[19t]    <INTEGER>    ::=    [+-]? [0-9]+
[20t]    <DECIMAL>    ::=    [+-]? [0-9]* "." [0-9]+
[21t]    <DOUBLE>    ::=    [+-]? ([0-9]+ "." [0-9]* EXPONENT | "."? [0-9]+ EXPONENT)
[155s]    <EXPONENT>    ::=    [eE] [+-]? [0-9]+
[156s]    <STRING_LITERAL1>    ::=    "'" ([^'\\\n\r] | ECHAR | UCHAR)* "'"
[157s]    <STRING_LITERAL2>    ::=    '"' ([^\"\\\n\r] | ECHAR | UCHAR)* '"'
[158s]    <STRING_LITERAL_LONG1>    ::=    "'''" ( ("'" | "''")? ([^\\'\\] | ECHAR | UCHAR) )* "'''"
[159s]    <STRING_LITERAL_LONG2>    ::=    '"""' ( ('"' | '""')? ([^\"\\] | ECHAR | UCHAR) )* '"""'
[73]    <LANG_STRING_LITERAL1>    ::=    "'" ([^'\\\n\r] | ECHAR | UCHAR)* "'" LANGTAG
[74]    <LANG_STRING_LITERAL2>    ::=    '"' ([^\"\\\n\r] | ECHAR | UCHAR)* '"' LANGTAG
[75]    <LANG_STRING_LITERAL_LONG1>    ::=    "'''" ( ("'" | "''")? ([^\\'\\] | ECHAR | UCHAR) )* "'''" LANGTAG
[76]    <LANG_STRING_LITERAL_LONG2>    ::=    '"""' ( ('"' | '""')? ([^\"\\] | ECHAR | UCHAR) )* '"""' LANGTAG
[26t]    <UCHAR>    ::=       "\\u" HEX HEX HEX HEX
| "\\U" HEX HEX HEX HEX HEX HEX HEX HEX
[160s]    <ECHAR>    ::=    "\\" [tbnrf\\\"\\']
[164s]    <PN_CHARS_BASE>    ::=       [A-Z] | [a-z]
| [#00C0-#00D6] | [#00D8-#00F6] | [#00F8-#02FF]
| [#0370-#037D] | [#037F-#1FFF]
| [#200C-#200D] | [#2070-#218F] | [#2C00-#2FEF]
| [#3001-#D7FF] | [#F900-#FDCF] | [#FDF0-#FFFD]
| [#10000-#EFFFF]
[165s]    <PN_CHARS_U>    ::=    PN_CHARS_BASE | "_"
[167s]    <PN_CHARS>    ::=       PN_CHARS_U | "-" | [0-9]
| [#00B7] | [#0300-#036F] | [#203F-#2040]
[168s]    <PN_PREFIX>    ::=    PN_CHARS_BASE ( (PN_CHARS | ".")* PN_CHARS )?
[169s]    <PN_LOCAL>    ::=    (PN_CHARS_U | ":" | [0-9] | PLX) ( (PN_CHARS | "." | ":" | PLX)* (PN_CHARS | ":" | PLX) )?
[170s]    <PLX>    ::=    PERCENT | PN_LOCAL_ESC
[171s]    <PERCENT>    ::=    "%" HEX HEX
[172s]    <HEX>    ::=    [0-9] | [A-F] | [a-f]
[173s]    <PN_LOCAL_ESC>    ::=    "\\" ( "_" | "~" | "." | "-" | "!" | "$" | "&" | "'" | "(" | ")" | "*" | "+" | "," | ";" | "=" | "/" | "?" | "#" | "@" | "%" )
[98]    PASSED TOKENS    ::=       [ \t\r\n]+
| "#" [^\r\n]*
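The note that AT_START has precedence over LANGTAG matters because the text "@START" matches both terminals. The Python sketch below shows one way a tokenizer could resolve this, assuming the common lexer convention of preferring the longest match and breaking ties by declaration order. That convention and the function name classify_at_token are assumptions of this sketch, not statements of this grammar.

```python
import re
from typing import Optional, Tuple

# Declaration order encodes precedence: AT_START is listed before LANGTAG.
_TOKENS = [
    ("AT_START", re.compile(r"@START")),
    ("LANGTAG", re.compile(r"@[a-zA-Z]+(?:-[a-zA-Z0-9]+)*")),
]


def classify_at_token(text: str, pos: int = 0) -> Optional[Tuple[str, str]]:
    """Return (terminal name, matched text) for the token at pos, if any."""
    best: Optional[Tuple[str, str]] = None
    for name, pattern in _TOKENS:
        m = pattern.match(text, pos)
        # A strictly longer match wins; on a tie the earlier terminal stays.
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best
```

Under these assumptions "@START" is read as a shape selector, while "@en-fr" is read as a language tag.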