This document defines the procedures and rules to be applied when converting tabular data into JSON. Tabular data may be complemented with metadata annotations that describe its structure, the meaning of its content and how it may form part of a collection of interrelated tabular data. This document specifies the effect of this metadata on the resulting JSON.

The CSV on the Web Working Group was chartered to produce Recommendations for "Access methods for CSV Metadata", "Metadata vocabulary for CSV data" and "Mapping mechanism to transforming CSV into various Formats (e.g., RDF, JSON, or XML)". This document aims to satisfy the JSON variant of the mapping Recommendation.

Introduction

This document describes the processing of tabular data to create a set of nested objects that MUST be serialized as JSON [[!RFC7159]].

The conversion of CSV content to JSON is intended for web developers who need not care about the complexities of RDF [[!rdf11-concepts]]. Where the formality of RDF is required, [[!csv2rdf]] provides the procedures for mapping from CSV content to RDF which may be serialized to [[json-ld]].

The [[!tabular-data-model]] defines an annotated tabular data model consisting of tables, columns, rows and cells, enriched with annotations that describe the structure of the tabular data and the meaning of its content. A table group is a collection of tables published as a single atomic unit.

The conversion procedure described in this specification operates on the tabular data. This specification does not specify the processes needed to convert CSV-encoded data into tabular data form. Please refer to [[!tabular-data-model]] for details of parsing tabular data.

Conversion applications MUST provide at least two modes of operation: standard and minimal.

Standard mode conversion frames the information gleaned from the cells of the tabular data with details of the rows, tables and a table group within which that information is provided.

Minimal mode conversion includes only the information gleaned from the cells of the tabular data within the output.

Standard and minimal conversion are described normatively below.

Conversion applications MAY offer additional implementation specific conversion modes.

Conversion specifications, as defined in [[!tabular-metadata]] MAY be used to specify how tabular data can be transformed into another format using a script or template. Such a conversion specification MAY use the JSON output described in this specification as input.

The conversion procedure described in this specification is considered to be entirely textual. There is no requirement on conversion applications to check the semantic consistency of the data during the conversion, nor validate the output against JSON syntax rules. Downstream applications SHOULD be aware of the potential for syntax errors and take appropriate action.

Tabular data MUST conform to the description from [[!tabular-data-model]]. In particular note that each row MUST contain the same number of cells (although some of these cells may be empty). Given this constraint, not all CSV-encoded data can be considered to be tabular data. As such, the conversion procedure described in this specification cannot be applied to all CSV files.
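The equal-cell-count constraint can be illustrated with a small pre-check. This is only a sketch that uses Python's standard csv module; the function name has_uniform_rows and the sample inputs are hypothetical, and the authoritative parsing rules remain those of [[!tabular-data-model]].

```python
import csv
import io


def has_uniform_rows(csv_text):
    """Return True if every row parsed from csv_text has the same number of cells."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Collect the distinct row lengths; a regular table has at most one.
    return len({len(row) for row in rows}) <= 1


# An empty cell still counts as a cell, so this input is regular.
regular = "a,b,c\n1,2,3\n4,,6\n"
# The second line has only two cells, so this input is not tabular data.
ragged = "a,b,c\n1,2\n4,5,6\n"

print(has_uniform_rows(regular))  # True
print(has_uniform_rows(ragged))   # False
```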

Converting Tabular Data to JSON

The procedures for converting tabular data into JSON are described below for both standard and minimal modes.

Algorithm terms

aboutUrl
The aboutUrl is the evaluation of the aboutUrl property of the current cell as defined in URI template properties in [[!tabular-metadata]].
annotated table
The annotated table is defined in [[!tabular-data-model]] as describing a particular table and its metadata.
array
An array is defined in JSON ([[!RFC7159]]) as an ordered sequence of zero or more values, where a value is a string, number, boolean, null, object, or array.
cell
A cell is defined in [[!tabular-data-model]] as the intersection of a row and a column within a table.
cell errors
Cell errors are defined in [[!tabular-data-model]] as a (possibly empty) list of validation errors generated while parsing the literal content of a cell to generate the semantic value.
cell value
A cell value is defined in [[!tabular-data-model]] as the semantic value of the cell; this MAY be null or, in the case that the cell specifies a separator property, a sequence of values.
column
A column is defined in [[!tabular-data-model]] as a vertical arrangement of cells within a table.
common properties
The common properties, defined in Section 3.3 Common Properties of [[!tabular-metadata]], may be specified for tables and table groups.
identifier
The identifier is the evaluation of the @id property for the current resource. As defined in [[!tabular-data-model]], the identifier is null if the @id property is undefined. The identifier MAY be applied to either a table group or a table.
name
In the context of this specification, name is used as defined in JSON ([[!RFC7159]]); that is, that name is a string that provides a unique key within a set of name-value pairs within a JSON object.
notes
A list of notes, as defined in [[!tabular-data-model]], attached to an annotated table or table group using the notes property. This may be an empty list.
object
An object is defined in JSON ([[!RFC7159]]) as an unordered collection of zero or more name-value pairs, where a name is a string and a value is a string, number, boolean, null, object, or array.
propertyUrl
The propertyUrl is the evaluation of the propertyUrl property of the current cell as defined in URI template properties in [[!tabular-metadata]].
row
The row is defined in [[!tabular-data-model]] as a horizontal arrangement of cells within a table.
row number
A row number is defined in [[!tabular-data-model]] as the position of the row within the table, starting from 1.
row source number
A row source number is defined in [[!tabular-data-model]] as the position of the row within the source CSV+ file. Provision of the row source number is dependent on parsing applications and may be reported as null.
subject
Within this algorithm, a subject is the resource that the value of a given cell refers to. This may be specified using the aboutUrl property.
table group
The table group is defined in [[!tabular-data-model]] as comprising a set of annotated tables and a set of annotations that relate to those tables.
table group description
The table group description object as defined in [[!tabular-data-model]].
valueUrl
The valueUrl is the evaluation of the valueUrl property of the current cell as defined in URI template properties in [[!tabular-metadata]].

Generating JSON

A conformant JSON conversion application MUST produce output conforming to this algorithm according to the chosen mode of conversion: standard or minimal.

Where an annotated table is defined in isolation (e.g. in the absence of a table group description), a default table group description is provided with a single resources annotation that refers to that table.

Minimal mode

The steps in the algorithm defined here apply to minimal mode.

  1. Insert an empty array A into the JSON output. The objects containing the name-value pairs associated with the cell values will be subsequently inserted into this array.

  2. Each table is processed sequentially in the order they are referenced in the table group description. For each table where the value of property suppressOutput is false:

    1. Each row within the table is processed sequentially in order. For each row in the current table:

      1. Generate a sequence of objects, S1 to Sn, each of which corresponds to a subject described by the current row, as described in the Generating Objects section.

        The subject(s) described by each row are determined according to the aboutUrl property for each cell in the current row. Where aboutUrl is undefined, a default subject for the row is used.

      2. As described in the Generating Nested Objects section, process the sequence of objects, S1 to Sn, to produce a new sequence of root objects, SR1 to SRm, that MAY include nested objects.

        A row MAY describe multiple interrelated subjects, for example where the valueUrl property for one cell matches the aboutUrl property for another cell in the same row.

      3. Insert each root object, SR1 to SRm, into array A.
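The minimal mode steps above can be sketched as follows. This is an illustrative outline, not a conforming implementation: the dictionary layout of table_group, the helper objects_for_row and the sample URLs are all assumptions, and the helper stands in for the object-generation and nesting steps by assuming that no aboutUrl, propertyUrl or valueUrl is set, so each row has one default subject.

```python
import json


def objects_for_row(row, columns):
    """Simplified stand-in for object generation: one default subject per row."""
    obj = {}
    for column, value in zip(columns, row):
        # Skip suppressed columns and null cell values.
        if column.get("suppressOutput", False) or value is None:
            continue
        obj[column["name"]] = value
    # A subject with nothing to output produces no object.
    return [obj] if obj else []


def convert_minimal(table_group):
    output = []  # array A
    for table in table_group["tables"]:
        if table.get("suppressOutput", False):
            continue  # tables with suppressOutput set to true are skipped
        for row in table["rows"]:
            output.extend(objects_for_row(row, table["columns"]))
    return output


# Hypothetical table group: the second table is suppressed entirely.
table_group = {
    "tables": [
        {
            "url": "http://example.org/a.csv",
            "columns": [{"name": "id", "suppressOutput": True}, {"name": "label"}],
            "rows": [["1", "first"], ["2", None]],
        },
        {
            "url": "http://example.org/b.csv",
            "suppressOutput": True,
            "columns": [{"name": "x"}],
            "rows": [["ignored"]],
        },
    ]
}

minimal = convert_minimal(table_group)
print(json.dumps(minimal))  # [{"label": "first"}]
```

The second row of the first table yields no object because its only non-suppressed cell is null.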

Standard mode

The steps in the algorithm defined here apply to standard mode.

  1. Insert an empty object G into the JSON output which is associated with the table group.

  2. If the table group has an identifier IG, insert the following name-value pair into object G:

    name
    @id
    value
    IG
  3. Insert any notes and common properties specified for the table group into object G according to the rules provided in the JSON-LD to JSON section.

  4. Insert the following name-value pair into object G:

    name
    table
    value
    AT

    where AT is an array into which the objects describing the annotated tables will be subsequently inserted.

  5. Each table is processed sequentially in the order they are referenced in the table group description.

    For each table where the value of property suppressOutput is false:

    1. Insert an empty object T into the array AT to represent the table.

    2. If the table has an identifier IT, insert the following name-value pair into object T:

      name
      @id
      value
      IT
    3. Specify the source CSV+ file URL for the current table based on the value of property url; insert the following name-value pair into object T:

      name
      url
      value
      URL
    4. Insert any notes and common properties specified for the table into object T according to the rules provided in the JSON-LD to JSON section.

      All other annotations for the table are ignored during the conversion, including information about table schemas and the column descriptions specified therein, dialect descriptions, foreign key definitions, etc.

    5. Insert the following name-value pair into object T:

      name
      row
      value
      AR

      where AR is an array into which the objects describing the rows will be subsequently inserted.

    6. Each row within the table is processed sequentially in order. For each row in the current table:

      1. Insert an empty object R into the array AR to represent the row.

      2. Specify the row number n for the row; insert the following name-value pair into object R:

        name
        rownum
        value
        n
      3. Specify the row source number nsource for the row within the source CSV+ file URL using a fragment-identifier as specified in [[RFC7111]]; if row source number is not null, insert the following name-value pair into object R:

        name
        url
        value
        URL#row=nsource
      4. Insert the following name-value pair into object R:

        name
        describes
        value
        A

        where A is an array. The objects containing the name-value pairs associated with the cell values will be subsequently inserted into this array.

      5. Generate a sequence of objects, S1 to Sn, each of which corresponds to a subject described by the current row, as described in the Generating Objects section.

        The subject(s) described by each row are determined according to the aboutUrl property for each cell in the current row. Where aboutUrl is undefined, a default subject for the row is used.

      6. As described in the Generating Nested Objects section, process the sequence of objects, S1 to Sn, to produce a new sequence of root objects, SR1 to SRm, that MAY include nested objects.

        A row MAY describe multiple interrelated subjects, for example where the valueUrl property for one cell matches the aboutUrl property for another cell in the same row.

      7. Insert each root object, SR1 to SRm, into array A.
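The framing that standard mode adds around the cell-derived objects can be sketched as below. The input layout (each row given as a pair of row source number and cell values), the helper objects_for_row and the sample URL are assumptions made for illustration; notes and common properties are omitted, and the helper again assumes a single default subject per row.

```python
import json


def objects_for_row(cells, columns):
    """Simplified stand-in for object generation: one default subject per row."""
    obj = {}
    for column, value in zip(columns, cells):
        if column.get("suppressOutput", False) or value is None:
            continue
        obj[column["name"]] = value
    return [obj] if obj else []


def convert_standard(table_group):
    group = {}  # object G
    if table_group.get("@id") is not None:
        group["@id"] = table_group["@id"]
    # Notes and common properties of the table group would be inserted here.
    group["table"] = []  # array AT
    for table in table_group["tables"]:
        if table.get("suppressOutput", False):
            continue
        t = {}  # object T
        if table.get("@id") is not None:
            t["@id"] = table["@id"]
        t["url"] = table["url"]
        # Notes and common properties of the table would be inserted here.
        t["row"] = []  # array AR
        for number, (source_number, cells) in enumerate(table["rows"], start=1):
            r = {"rownum": number}  # object R
            if source_number is not None:
                # Fragment identifier for the row in the source file.
                r["url"] = f"{table['url']}#row={source_number}"
            r["describes"] = objects_for_row(cells, table["columns"])
            t["row"].append(r)
        group["table"].append(t)
    return group


# Hypothetical input: the second row has no row source number.
sample_group = {
    "tables": [
        {
            "url": "http://example.org/sample.csv",
            "columns": [{"name": "code"}, {"name": "label"}],
            "rows": [(2, ["AD", "Andorra"]), (None, ["AE", "United Arab Emirates"])],
        }
    ]
}

standard = convert_standard(sample_group)
print(json.dumps(standard, indent=2))
```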

Generating Objects

The steps in the algorithm defined here apply to both standard and minimal modes.

This algorithm generates a sequence of objects, S1 to Sn, each of which corresponds to a subject described by the current row. The algorithm inserts name-value pairs into Si depending on the cell values as outlined in the following steps.

  1. Determine the unique subjects for the current row. The subject(s) described by each row are determined according to the aboutUrl property for each cell in the current row. A default subject for the row is used for any cells where aboutUrl is undefined.

  2. For each subject that the current row describes where at least one of the cells that refers to that subject has a value or valueUrl that is not null, and is associated with a column where the value of property suppressOutput has value false:

    1. Create an empty object Si to represent the subject i.

      (i is the index number with values from 1 to n, where n is the number of subjects for the row)

      Subject i is identified according to the aboutUrl property of its associated cells: IS. For a default subject where aboutUrl is not specified by its cells, IS is null.

    2. If the identifier for subject i, IS, is not null, then insert the following name-value pair into object Si:

      name
      @id
      value
      IS
    3. Each cell referring to subject i is then processed sequentially according to the order of the columns.

      For each cell referring to subject i, where the value of property suppressOutput for the column associated with that cell is false, insert a name-value pair into object Si as described below:

      1. If the value of propertyUrl for the cell is not null, then name N takes the value of propertyUrl.

        Else, name N takes the value of the name property for the column associated with the cell.

      2. If the valueUrl for the current cell is not null, then insert the following name-value pair into object Si:

        name
        N
        value
        Vurl

        where Vurl is the value of the valueUrl property for the current cell, expressed as a string in the JSON output.

      3. Else, if the cell specifies a separator property and the cell value is not an empty sequence, then the cell value provides a sequence of values for inclusion within the JSON output; insert an array Av containing each value V of the sequence into object Si:

        name
        N
        value
        Av

        Each of the values V derived from the sequence MUST be expressed in the JSON output according to the datatype property of the cell, as defined in the Interpreting datatypes section.

        Since arrays are implicitly ordered in JSON, the ordered property, if specified, has no effect on the JSON output.

      4. Else, if the cell value is not null, then the cell value provides a single value V for inclusion within the JSON output; insert the following name-value pair into object Si:

        name
        N
        value
        V

        Value V derived from the cell value MUST be expressed in the JSON output according to the datatype property of the cell, as defined in the Interpreting datatypes section.

    4. If name N occurs more than once within object Si, the name-value pairs from each occurrence of name N MUST be compacted to form a single name-value pair with name N and whose value is an array containing all values from each of those name-value pairs.
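The object-generation steps above can be sketched as follows. The cell dictionaries and their keys (about_url, property_url, column_name, value_url, value, has_separator, suppress_output) are a hypothetical in-memory model, not part of this specification. The sketch treats property_url as null unless it was explicitly set, and it flattens array values when compacting repeated names; both are assumptions of this sketch.

```python
def generate_objects(cells):
    """Build one object per subject described by the cells of a single row."""
    # Step 1: unique subjects in order of first appearance (None = default subject).
    subjects = []
    for cell in cells:
        about = cell.get("about_url")
        if about not in subjects:
            subjects.append(about)

    objects = []
    for subject in subjects:
        own = [c for c in cells
               if c.get("about_url") == subject
               and not c.get("suppress_output", False)]
        # Step 2: skip subjects that have nothing to output.
        if not any(c.get("value") is not None or c.get("value_url") is not None
                   for c in own):
            continue
        occurrences = {}  # name N -> list of values, one entry per occurrence
        for cell in own:
            name = cell.get("property_url") or cell["column_name"]
            if cell.get("value_url") is not None:
                value = cell["value_url"]
            elif cell.get("has_separator") and cell.get("value"):
                value = list(cell["value"])  # array Av
            elif cell.get("value") is not None:
                value = cell["value"]
            else:
                continue  # null cell value: no name-value pair
            occurrences.setdefault(name, []).append(value)

        obj = {} if subject is None else {"@id": subject}
        for name, values in occurrences.items():
            if len(values) == 1:
                obj[name] = values[0]
                continue
            # Step 4: compact repeated names into a single array.
            merged = []
            for value in values:
                if isinstance(value, list):
                    merged.extend(value)
                else:
                    merged.append(value)
            obj[name] = merged
        objects.append(obj)
    return objects


SUBJECT = "http://example.org/tree-ops-ext#gid-6"
row_cells = [
    {"column_name": "GID", "about_url": SUBJECT, "value": "6", "suppress_output": True},
    {"column_name": "on_street", "about_url": SUBJECT, "value": "ADDISON AV"},
    {"column_name": "dbh", "about_url": SUBJECT, "value": 29},
    {"column_name": "comments", "about_url": SUBJECT, "has_separator": True,
     "value": ["cavity or decay", "trunk decay"]},
    {"column_name": "protected", "about_url": SUBJECT, "value": True},
]
row_objects = generate_objects(row_cells)
```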

Generating Nested Objects

The steps in the algorithm defined herein apply to both standard and minimal modes.

Where the current row describes multiple subjects, it MAY be possible to organise the objects associated with those subjects such that some objects are nested within others; e.g. where the valueUrl property for one cell matches the aboutUrl property for another cell in the same row.

This algorithm considers a sequence of objects, S1 to Sn, generated according to the Generating Objects section, each of which corresponds to a subject described by the current row. It generates a new sequence of root objects, SR1 to SRm, that MAY include nested objects.

Where the current row describes only a single subject, this algorithm may be bypassed as no nesting is possible. In such a case, the root object SR1 is identical to the original object S1.

This nesting algorithm is based on the interrelationships between subjects described within a given row that are specified using the valueUrl property. Cell values expressing the identity of a subject in the current row (e.g. as a simple literal) will be ignored by this algorithm.

The algorithm uses the following terms:

child
If two vertices are connected in a tree, the one which is further away from the root of the tree is referred to as the child of the other.
descendant
A vertex N is a descendant of a vertex M if either N is a child of M, or there are vertices V1,…,Vk such that V1=M, Vk=N, and Vi+1 is a child of Vi for each i from 1 to k-1.
edge
One of the main constituents of graphs; in such data structures edges are used to establish relationships among vertices. In the context of this algorithm, edges are expressed in JSON using a name-value pair whose value is another object or an array of objects.
forest
A collection of disjoint trees. For the purpose of this algorithm, the order of the trees is important, i.e., forests can also be viewed as a sequence of roots.
graph
Data structure consisting of vertices (or "nodes") and edges. See, for example, [[Knuth]] for further details.
node
Synonym of vertex.
root
A dedicated vertex in a tree; a root is not the child of any vertex.
tree
A tree (or rooted tree) is a connected, acyclic graph where one vertex has been designated as the root, in which case the edges have a natural orientation towards or away from the root.
vertex
One of the main constituents of graphs; in such data structures a vertex usually holds further information or data. In the context of this algorithm, vertices are used to represent the JSON objects.

The nesting algorithm is defined as follows:

  1. For all cells in the current row, determine the valueUrls, Vurl, that occur only once. The list of these uniquely occurring valueUrls is referred to as the URL-list.

  2. Create an empty forest F. Vertices in the trees of this forest represent the subjects described by the current row.

  3. For each object Si in the sequence S1 to Sn:

    1. Determine the identity of object Si: IS. If present in object Si, the name-value pair with name @id provides the value of IS. Else, object Si is not explicitly identified and IS is null.

    2. Check whether there is a vertex N in forest F that represents object Si. If none of the existing vertices in forest F represent object Si, then insert a new tree into forest F whose root is a vertex N that represents object Si and has identity IS.

    3. For all cells associated with the current object Si (e.g. whose aboutUrl property matches IS):

      1. If the valueUrl property of the current cell is defined and its value, Vurl, appears in the URL-list, then check each of the other objects in the sequence S1 to Sn to determine if Vurl identifies one of those objects.

        For object Sj, if the name-value pair with name @id is present and its value matches Vurl, then:

        1. If the root of the tree containing vertex N is a vertex that represents object Sj, then object Si is already a descendant of object Sj; no further action can be taken for this instance of Vurl.

          This clause in the algorithm prevents circular loops being created.

          Furthermore, because the URL-list contains valueUrls that occur only once for the current row, object Si cannot be a descendant of an intermediate vertex in the tree.

        2. Else, if there is a root vertex M in forest F that represents object Sj, then set vertex M as a child of vertex N and remove vertex M from the list of roots in forest F (e.g. the tree rooted by M becomes a sub-tree of N).

        3. Else, create a new vertex M that represents object Sj as a child of vertex N.

  4. Each vertex in forest F represents an object in the original sequence of objects S1 to Sn and is associated with a subject described by the current row. Rearrange objects S1 to Sn such that they mirror the structure of the trees in forest F.

    If vertex M, representing object Si, is a child of vertex N, representing object Sj, then the name-value pair in object Sj associated with the edge relating M and N MUST be modified such that the (literal) value, Vurl, from that name-value pair is replaced by object Si, thus creating a nested object.

    Objects represented by root vertices are referred to as root objects.

  5. Return the sequence of root objects, SR1 to SRm.

An implementation may be able to optimize the algorithm by skipping branches (e.g. if URL-list is empty) or by other means.
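The nesting algorithm can be sketched as follows. The forest is held as a list of root indices plus a child-to-parent map, which is one possible representation rather than a required one. The cell dictionaries (about_url, value_url, property_url, column_name), the function names and the example URLs are hypothetical, and the cells passed in are assumed to be those whose output is not suppressed.

```python
from collections import Counter


def replace_value(obj, name, url, child):
    """Replace the literal url held under name in obj with the nested child object."""
    current = obj.get(name)
    if current == url:
        obj[name] = child
    elif isinstance(current, list):
        obj[name] = [child if v == url else v for v in current]


def nest_objects(objects, cells):
    # Step 1: the URL-list holds valueUrls that occur exactly once in the row.
    counts = Counter(c["value_url"] for c in cells if c.get("value_url") is not None)
    url_list = {url for url, n in counts.items() if n == 1}

    by_id = {o["@id"]: i for i, o in enumerate(objects) if "@id" in o}
    roots = []    # forest F, viewed as an ordered sequence of roots
    parent = {}   # child index -> parent index
    seen = set()  # indices of objects that already have a vertex in F

    def root_of(i):
        while i in parent:
            i = parent[i]
        return i

    for i, obj in enumerate(objects):
        if i not in seen:
            seen.add(i)
            roots.append(i)
        identity = obj.get("@id")
        for cell in cells:
            url = cell.get("value_url")
            if cell.get("about_url") != identity or url not in url_list:
                continue
            j = by_id.get(url)
            if j is None or j == i:
                continue
            if root_of(i) == j:
                continue  # Si already descends from Sj; nesting would create a loop
            if j in roots:
                roots.remove(j)  # the tree rooted at Sj becomes a sub-tree of Si
            elif j in seen:
                continue  # already nested elsewhere; not expected for unique URLs
            else:
                seen.add(j)
            parent[j] = i
            name = cell.get("property_url") or cell["column_name"]
            replace_value(obj, name, url, objects[j])
    return [objects[r] for r in roots]


EVENT = "http://example.org/events#event-1"
PLACE = "http://example.org/events#place-1"
subjects = [
    {"@id": EVENT, "name": "Concert", "location": PLACE},
    {"@id": PLACE, "name": "Town Hall"},
]
row_cells = [
    {"about_url": EVENT, "column_name": "name"},
    {"about_url": EVENT, "column_name": "location", "value_url": PLACE},
    {"about_url": PLACE, "column_name": "name"},
]
root_objects = nest_objects(subjects, row_cells)
```

In this example the object for the place is nested inside the object for the event, so a single root object is returned.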

Interpreting datatypes

Cell values are expressed in the JSON output according to the cell's datatype property. The relationship between the value of the datatype property and the primitive types supported by JSON (as specified in [[!RFC7159]]) is provided in the table below.

Instances of JSON reserved characters within string values MUST be escaped as defined in [[!RFC7159]].

JSON has no native support for expressing language information; therefore the lang property has no effect on the JSON output.

A cell's format property is irrelevant to the conversion procedure defined in this specification; the cell value has already been parsed from the contents of the cell according to the format property.

Where the contents of the cell cannot be parsed, or other validation errors occur, cell errors will be provided. It is an implementation decision to determine how conversion applications should proceed in the event that cell errors are encountered.

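| datatype | JSON primitive type | Remarks |
| --- | --- | --- |
| number | number | |
| binary | string | binary is considered to be equivalent to xsd:base64Binary |
| datetime | string | |
| any | string | |
| xml | string | |
| html | string | |
| json | string | |
| anyAtomicType | string | |
| anyURI | string | |
| base64Binary | string | |
| boolean | boolean | |
| date | string | |
| dateTime | string | |
| dateTimeStamp | string | |
| decimal | number | |
| integer | number | |
| long | number | |
| int | number | |
| short | number | |
| byte | number | |
| nonNegativeInteger | number | |
| positiveInteger | number | |
| unsignedLong | number | |
| unsignedInt | number | |
| unsignedShort | number | |
| unsignedByte | number | |
| nonPositiveInteger | number | |
| negativeInteger | number | |
| double | number | |
| duration | string | |
| dayTimeDuration | string | |
| yearMonthDuration | string | |
| float | number | |
| gDay | string | |
| gMonth | string | |
| gMonthDay | string | |
| gYear | string | |
| gYearMonth | string | |
| hexBinary | string | |
| QName | string | |
| string | string | |
| normalizedString | string | |
| token | string | |
| language | string | |
| Name | string | |
| NCName | string | |
| time | string | |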
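The mapping in the table above can be expressed as a small lookup. This is a sketch only: the function names are hypothetical, falling back to string for datatypes not listed in the table is a choice of this sketch, and special numeric values that JSON cannot express are not handled.

```python
import json
from datetime import date

# Datatypes that the table above maps to the JSON number type.
NUMERIC_DATATYPES = {
    "number", "decimal", "integer", "long", "int", "short", "byte",
    "nonNegativeInteger", "positiveInteger", "unsignedLong", "unsignedInt",
    "unsignedShort", "unsignedByte", "nonPositiveInteger", "negativeInteger",
    "double", "float",
}


def json_primitive_type(datatype):
    """Return the JSON primitive type name for a datatype from the table."""
    if datatype in NUMERIC_DATATYPES:
        return "number"
    if datatype == "boolean":
        return "boolean"
    return "string"  # assumption: unlisted datatypes are treated as strings


def to_json_value(value, datatype):
    """Coerce an already-parsed cell value to the matching Python/JSON type."""
    kind = json_primitive_type(datatype)
    if kind == "number":
        return value if isinstance(value, (int, float)) else float(value)
    if kind == "boolean":
        return bool(value)
    return str(value)


# Values taken from row R1 of the tree operations example.
encoded = json.dumps({
    "dbh": to_json_value(11, "integer"),
    "protected": to_json_value(False, "boolean"),
    "inventory_date": to_json_value(date(2010, 10, 18), "date"),
})
print(encoded)
```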

JSON-LD to JSON

This section defines a mechanism for transforming the [[json-ld]] Dialect used for common properties and notes into plain-old JSON.

Name-value pairs from notes and common properties annotations are generally copied verbatim from the metadata description subject to the exceptions below:

  1. Name-value pairs whose value is an object using the [[json-ld]] keyword @value, for example:

    name
    N
    value
    { "@value": "V" }

    are transformed to:

    name
    N
    value
    V

    Name-value pairs occurring within the value object that use [[json-ld]] keywords @language and @type are ignored.

  2. Name-value pairs whose value is an object using the [[json-ld]] keyword @id to coerce a string-value to be interpreted as an IRI, for example:

    name
    N
    value
    { "@id": "Vurl" }

    are transformed to:

    name
    N
    value
    Vurl

Terms defined within the RDFa 1.1 Initial Context ([[rdfa-core]]) are not expanded during the transformation.
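The two exceptions above can be sketched as a recursive transformation. The function name is hypothetical. Applying the rules to nested objects and arrays, and treating only objects whose sole member is @id as IRI coercions, are interpretations made by this sketch. The @language member in the sample input is added purely for illustration.

```python
def jsonld_to_json(value):
    """Simplify a JSON-LD value from notes or common properties into plain JSON."""
    if isinstance(value, list):
        return [jsonld_to_json(item) for item in value]
    if isinstance(value, dict):
        if "@value" in value:
            # Rule 1: keep the literal; @language and @type are ignored.
            return value["@value"]
        if set(value) == {"@id"}:
            # Rule 2: an @id used only to mark a string as an IRI.
            return value["@id"]
        return {name: jsonld_to_json(item) for name, item in value.items()}
    return value


common_properties = {
    "dc:title": {"@value": "Tree Operations", "@language": "en"},
    "dc:keywords": ["tree", "street", "maintenance"],
    "dc:publisher": [{
        "schema:name": "Example Municipality",
        "schema:url": {"@id": "http://example.org"},
    }],
    "dc:license": {"@id": "http://opendefinition.org/licenses/cc-by/"},
}
plain = jsonld_to_json(common_properties)
```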

Examples

The examples become progressively more complex; readers of this specification are advised to work through them in sequential order.

Simple example

This example comprises a single annotated table containing information about countries: country code, position (latitude, longitude) and name. Whilst the input CSV+ file, published at http://example.org/countries.csv, includes a header line, no further metadata annotations are given. The CSV+ file is provided below:

        

The annotated table generated from parsing the CSV+ file is shown below and provides the basis for the conversion to JSON.

Annotations for the resulting table T, with 4 columns and 3 rows, are shown below:

Core annotations: url, columns, rows.

| id | url | columns | rows |
| --- | --- | --- | --- |
| T | http://example.org/countries.csv | C1, C2, C3, C4 | R1, R2, R3 |

Annotations for the columns, rows and cells in table T are shown in the tables below.

Column annotations:

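Core annotations: table, number, source number, cells. Annotations: name, title.

| id | table | number | source number | cells | name | title |
| --- | --- | --- | --- | --- | --- | --- |
| C1 | T | 1 | 1 | C1.1, C2.1, C3.1 | countryCode | countryCode |
| C2 | T | 2 | 2 | C1.2, C2.2, C3.2 | latitude | latitude |
| C3 | T | 3 | 3 | C1.3, C2.3, C3.3 | longitude | longitude |
| C4 | T | 4 | 4 | C1.4, C2.4, C3.4 | name | name |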

Row annotations:

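Core annotations: table, number, source number, cells.

| id | table | number | source number | cells |
| --- | --- | --- | --- | --- |
| R1 | T | 1 | 2 | C1.1, C1.2, C1.3, C1.4 |
| R2 | T | 2 | 3 | C2.1, C2.2, C2.3, C2.4 |
| R3 | T | 3 | 4 | C3.1, C3.2, C3.3, C3.4 |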

Cell annotations:

Core annotations: table, column, row, string value, value, errors. Annotations: propertyUrl.

| id | table | column | row | string value | value | errors | propertyUrl |
| --- | --- | --- | --- | --- | --- | --- | --- |
| C1.1 | T | C1 | R1 | "AD" | "AD" | | <http://example.org/countries.csv#countryCode> |
| C1.2 | T | C2 | R1 | "42.546245" | "42.546245" | | <http://example.org/countries.csv#latitude> |
| C1.3 | T | C3 | R1 | "1.601554" | "1.601554" | | <http://example.org/countries.csv#longitude> |
| C1.4 | T | C4 | R1 | "Andorra" | "Andorra" | | <http://example.org/countries.csv#name> |
| C2.1 | T | C1 | R2 | "AE" | "AE" | | <http://example.org/countries.csv#countryCode> |
| C2.2 | T | C2 | R2 | "23.424076" | "23.424076" | | <http://example.org/countries.csv#latitude> |
| C2.3 | T | C3 | R2 | "53.847818" | "53.847818" | | <http://example.org/countries.csv#longitude> |
| C2.4 | T | C4 | R2 | "United Arab Emirates" | "United Arab Emirates" | | <http://example.org/countries.csv#name> |
| C3.1 | T | C1 | R3 | "AF" | "AF" | | <http://example.org/countries.csv#countryCode> |
| C3.2 | T | C2 | R3 | "33.93911" | "33.93911" | | <http://example.org/countries.csv#latitude> |
| C3.3 | T | C3 | R3 | "67.709953" | "67.709953" | | <http://example.org/countries.csv#longitude> |
| C3.4 | T | C4 | R3 | "Afghanistan" | "Afghanistan" | | <http://example.org/countries.csv#name> |

As the value of propertyUrl has not been set within the metadata description it defaults to the URI Template (see [[RFC6570]]) #{[column-name]}, where [column-name] is the value of the name property for the column associated with the cell. For example, the value of propertyUrl for all cells in column C1 ("name": "countryCode") is http://example.org/countries.csv#countryCode.
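The default described above can be approximated in a few lines. This sketch does not implement [[RFC6570]] in general: it handles only the single-variable template discussed here, percent-encodes the column name with urllib.parse.quote, and resolves the resulting fragment against the table URL. The function name is hypothetical.

```python
from urllib.parse import quote, urljoin


def default_property_url(table_url, column_name):
    """Expand the default template with the column name and resolve it against the table URL."""
    # Percent-encode everything except unreserved characters.
    fragment = "#" + quote(column_name, safe="")
    return urljoin(table_url, fragment)


print(default_property_url("http://example.org/countries.csv", "countryCode"))
# http://example.org/countries.csv#countryCode
```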

Minimal mode output for this example is provided below:

        

The aboutUrl property has not been set for cells in table T ({ "url": "http://example.org/countries.csv"}) - cells in a given row where aboutUrl has not been specified are assumed to refer to the same subject and so the name-value pairs associated with the cell values of that row occur within the same object.

Given that the propertyUrl has not been explicitly set for cells in table T ({ "url": "http://example.org/countries.csv"}), the simplified name is used in the name-value pairs; e.g. countryCode rather than http://example.org/countries.csv#countryCode.
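For a table like this one, with no metadata beyond the header line, minimal mode reduces to one object per row keyed by column name. The sketch below shows that shape. The CSV text is reconstructed from the cell annotations listed earlier and may differ in quoting from the published file, and treating empty strings as null is an assumption based on default parsing behaviour. Cell values remain strings because no datatype is declared.

```python
import csv
import io
import json

# Reconstructed from the cell annotations of table T; not the published file itself.
CSV_TEXT = (
    "countryCode,latitude,longitude,name\n"
    "AD,42.546245,1.601554,Andorra\n"
    "AE,23.424076,53.847818,United Arab Emirates\n"
    "AF,33.93911,67.709953,Afghanistan\n"
)


def minimal_output(csv_text):
    """Return one object per row, omitting cells whose value is empty (null)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {name: value for name, value in row.items() if value != ""}
        for row in reader
    ]


countries = minimal_output(CSV_TEXT)
print(json.dumps(countries, indent=2))
```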

Standard mode output for this example is provided below:

        

Even though the table was defined in isolation, the table is wrapped in a table group.

The name-value pair with name url provides reference to the original CSV+ file and to specific rows therein.

The row number is provided for each row using name-value pair with name rownum.

The object containing the name-value pairs associated with the cell values of a row is related to the object for that row using the name-value pair with name describes.

Example with single table and rich annotations

This example is based on Use Case #11 - City of Palo Alto Tree Data and comprises a single annotated table describing an inventory of tree maintenance operations. The input CSV+ file, published at http://example.org/tree-ops-ext.csv, and the associated metadata description http://example.org/tree-ops-ext.csv-metadata.json are provided below:

        
        

The notes annotation in the metadata description uses the Open Annotation data model currently under development within the Web Annotations Working Group. This is purely illustrative; no constraints are placed on the value of the notes annotation.

The annotated table generated from parsing the CSV+ file and associated metadata is shown below and provides the basis for the conversion to JSON.

Annotations for the resulting table T, with 9 columns and 3 rows, are shown below:

Core annotations:

| id | url | columns | rows |
| --- | --- | --- | --- |
| T | http://example.org/tree-ops-ext.csv | C1, C2, C3, C4, C5, C6, C7, C8, C9 | R1, R2, R3 |

Annotations:

| annotation | value |
| --- | --- |
| @id | <http://example.org/tree-ops-ext> |
| dc:title | "Tree Operations" |
| dc:keywords | ["tree", "street", "maintenance"] |
| dc:publisher | [{ "schema:name": "Example Municipality", "schema:url": { "@id": "http://example.org" } }] |
| dc:license | <http://opendefinition.org/licenses/cc-by/> |
| dc:modified | "2010-12-31" |
| notes | [{ "@type": "oa:Annotation", ... }] |
| primaryKey | C1 |

The value of the notes annotation has been shortened for clarity in the table above.

Annotations for the columns, rows and cells in table T are shown in the tables below.

Column annotations:

Core annotations: table, number, source number, cells. Annotations: name, title, required, suppressOutput, dc:description.

| id | table | number | source number | cells | name | title | required | suppressOutput | dc:description |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C1 | T | 1 | 1 | C1.1, C2.1, C3.1 | GID | GID, Generic Identifier | true | true | An identifier for the operation on a tree. |
| C2 | T | 2 | 2 | C1.2, C2.2, C3.2 | on_street | On Street | | | The street that the tree is on. |
| C3 | T | 3 | 3 | C1.3, C2.3, C3.3 | species | Species | | | The species of the tree. |
| C4 | T | 4 | 4 | C1.4, C2.4, C3.4 | trim_cycle | Trim Cycle | | | The operation performed on the tree. |
| C5 | T | 5 | 5 | C1.5, C2.5, C3.5 | dbh | Diameter at Breast Ht | | | Diameter at Breast Height (DBH) of the tree (in feet), measured 4.5ft above ground. |
| C6 | T | 6 | 6 | C1.6, C2.6, C3.6 | inventory_date | Inventory Date | | | The date of the operation that was performed. |
| C7 | T | 7 | 7 | C1.7, C2.7, C3.7 | comments | Comments | | | Supplementary comments relating to the operation or tree. |
| C8 | T | 8 | 8 | C1.8, C2.8, C3.8 | protected | Protected | | | Indication (YES / NO) whether the tree is subject to a protection order. |
| C9 | T | 9 | 9 | C1.9, C2.9, C3.9 | kml | KML | | | KML-encoded description of tree location. |

In this example, output for column C1 (GID) is not required; note the suppressOutput annotation on this column.

Row annotations:

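Core annotations: table, number, source number, cells.

| id | table | number | source number | cells |
| --- | --- | --- | --- | --- |
| R1 | T | 1 | 2 | C1.1, C1.2, C1.3, C1.4, C1.5, C1.6, C1.7, C1.8, C1.9 |
| R2 | T | 2 | 3 | C2.1, C2.2, C2.3, C2.4, C2.5, C2.6, C2.7, C2.8, C2.9 |
| R3 | T | 3 | 4 | C3.1, C3.2, C3.3, C3.4, C3.5, C3.6, C3.7, C3.8, C3.9 |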

Cell annotations:

Core annotations: table, column, row, string value, value, errors. Annotations: lang, datatype, format, default, aboutUrl.

| id | table | column | row | string value | value | errors | lang | datatype | format | default | aboutUrl |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| C1.1 | T | C1 | R1 | "1" | "1" | | | string | | | <http://example.org/tree-ops-ext#gid-1> |
| C1.2 | T | C2 | R1 | "ADDISON AV" | "ADDISON AV" | | | string | | | <http://example.org/tree-ops-ext#gid-1> |
| C1.3 | T | C3 | R1 | "Celtis australis" | "Celtis australis" | | | string | | | <http://example.org/tree-ops-ext#gid-1> |
| C1.4 | T | C4 | R1 | "Large Tree Routine Prune" | "Large Tree Routine Prune" | | en | string | | | <http://example.org/tree-ops-ext#gid-1> |
| C1.5 | T | C5 | R1 | "11" | 11 | | | integer | | | <http://example.org/tree-ops-ext#gid-1> |
| C1.6 | T | C6 | R1 | "10/18/2010" | 2010-10-18 | | | date | M/d/yyyy | | <http://example.org/tree-ops-ext#gid-1> |
| C1.7 | T | C7 | R1 | "" | null | | | string | | | <http://example.org/tree-ops-ext#gid-1> |
| C1.8 | T | C8 | R1 | "" | false | | | boolean | YES\|NO | "NO" | <http://example.org/tree-ops-ext#gid-1> |
| C1.9 | T | C9 | R1 | "<Point><coordinates>-122.156485,37.440963</coordinates></Point>" | "<Point><coordinates>-122.156485,37.440963</coordinates></Point>" | | | xml | | | <http://example.org/tree-ops-ext#gid-1> |
| C2.1 | T | C1 | R2 | "2" | "2" | | | string | | | <http://example.org/tree-ops-ext#gid-2> |
| C2.2 | T | C2 | R2 | "EMERSON ST" | "EMERSON ST" | | | string | | | <http://example.org/tree-ops-ext#gid-2> |
| C2.3 | T | C3 | R2 | "Liquidambar styraciflua" | "Liquidambar styraciflua" | | | string | | | <http://example.org/tree-ops-ext#gid-2> |
| C2.4 | T | C4 | R2 | "Large Tree Routine Prune" | "Large Tree Routine Prune" | | en | string | | | <http://example.org/tree-ops-ext#gid-2> |
| C2.5 | T | C5 | R2 | "11" | 11 | | | integer | | | <http://example.org/tree-ops-ext#gid-2> |
| C2.6 | T | C6 | R2 | "6/2/2010" | 2010-06-02 | | | date | M/d/yyyy | | <http://example.org/tree-ops-ext#gid-2> |
| C2.7 | T | C7 | R2 | "" | null | | | string | | | <http://example.org/tree-ops-ext#gid-2> |
| C2.8 | T | C8 | R2 | "" | false | | | boolean | YES\|NO | "NO" | <http://example.org/tree-ops-ext#gid-2> |
| C2.9 | T | C9 | R2 | "<Point><coordinates>-122.156749,37.440958</coordinates></Point>" | "<Point><coordinates>-122.156749,37.440958</coordinates></Point>" | | | xml | | | <http://example.org/tree-ops-ext#gid-2> |
| C3.1 | T | C1 | R3 | "6" | "6" | | | string | | | <http://example.org/tree-ops-ext#gid-6> |
| C3.2 | T | C2 | R3 | "ADDISON AV" | "ADDISON AV" | | | string | | | <http://example.org/tree-ops-ext#gid-6> |
| C3.3 | T | C3 | R3 | "Robinia pseudoacacia" | "Robinia pseudoacacia" | | | string | | | <http://example.org/tree-ops-ext#gid-6> |
| C3.4 | T | C4 | R3 | "Large Tree Routine Prune" | "Large Tree Routine Prune" | | en | string | | | <http://example.org/tree-ops-ext#gid-6> |
| C3.5 | T | C5 | R3 | "29" | 29 | | | integer | | | <http://example.org/tree-ops-ext#gid-6> |
| C3.6 | T | C6 | R3 | "6/1/2010" | 2010-06-01 | | | date | M/d/yyyy | | <http://example.org/tree-ops-ext#gid-6> |
| C3.7 | T | C7 | R3 | "cavity or decay; trunk decay; codominant leaders; included bark; large leader or limb decay; previous failure root damage; root decay; beware of BEES" | "cavity or decay", "trunk decay", "codominant leaders", "included bark", "large leader or limb decay", "previous failure root damage", "root decay", "beware of BEES" | | | string | | | <http://example.org/tree-ops-ext#gid-6> |
| C3.8 | T | C8 | R3 | "YES" | true | | | boolean | YES\|NO | "NO" | <http://example.org/tree-ops-ext#gid-6> |
| C3.9 | T | C9 | R3 | "<Point><coordinates>-122.156299,37.441151</coordinates></Point>" | "<Point><coordinates>-122.156299,37.441151</coordinates></Point>" | | | xml | | | <http://example.org/tree-ops-ext#gid-6> |

For brevity, the propertyUrl is not shown in the table of cell annotations. Where not explicitly set, the value of propertyUrl defaults to the URI Template (see [[RFC6570]]) #{[column-name]}, where [column-name] is the value of the name property for the column associated with the cell. For example, the value of propertyUrl for all cells in column C2 ("name": "on_street") is http://example.org/tree-ops-ext.csv#on_street.
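The default described above can be sketched as follows. This is a minimal illustration, not the normative URI Template expansion: the function name default_property_url is hypothetical, and the percent-encoding step is an assumption about how simple template expansion treats characters outside the unreserved set.

```python
from urllib.parse import quote


def default_property_url(table_url: str, column_name: str) -> str:
    """Build the default propertyUrl for a cell from its table URL and column name.

    Hypothetical helper: it appends the column name as a fragment identifier,
    percent-encoding anything that is not an unreserved URI character.
    """
    fragment = quote(column_name, safe="")
    return f"{table_url}#{fragment}"


if __name__ == "__main__":
    # Column C2 of the tree-ops example has "name": "on_street".
    print(default_property_url("http://example.org/tree-ops-ext.csv", "on_street"))
```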

Minimal mode output for this example is provided below:

        

The subject described by each row is explicitly defined using the aboutUrl property; e.g. the subject of row R1 is http://example.org/tree-ops-ext#gid-1.

Output for column C1 ({ "name": "GID" }) is not included as column property suppressOutput has value true.

Cells C1.7 and C2.7 (rows R1 and R2; column, { "name": "comments" }) have null values - no output is included for these cells.

Cell C3.7 (row R3; column, { "name": "comments" }) contains a sequence of values; these values are included in an array.
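The three observations above (the subject taken from aboutUrl, suppressed columns omitted, null cells skipped and sequences emitted as arrays) can be sketched as below. This is a simplified illustration rather than the conversion procedure itself: the tuple layout and the function name minimal_objects are assumptions, and only the three columns named in this section (GID, on_street and comments) are included, for rows R2 and R3.

```python
import json

# Hypothetical in-memory cells: (aboutUrl, column name, value, suppressOutput).
CELLS = [
    ("http://example.org/tree-ops-ext#gid-2", "GID", "2", True),
    ("http://example.org/tree-ops-ext#gid-2", "on_street", "EMERSON ST", False),
    ("http://example.org/tree-ops-ext#gid-2", "comments", None, False),
    ("http://example.org/tree-ops-ext#gid-6", "GID", "6", True),
    ("http://example.org/tree-ops-ext#gid-6", "on_street", "ADDISON AV", False),
    ("http://example.org/tree-ops-ext#gid-6", "comments",
     ["cavity or decay", "trunk decay", "codominant leaders", "included bark",
      "large leader or limb decay", "previous failure root damage",
      "root decay", "beware of BEES"], False),
]


def minimal_objects(cells):
    """Group cells by subject, skipping suppressed columns and null values."""
    subjects = {}
    for about_url, name, value, suppressed in cells:
        obj = subjects.setdefault(about_url, {"@id": about_url})
        if suppressed or value is None:
            continue
        # A Python list is serialized as a JSON array.
        obj[name] = value
    return list(subjects.values())


if __name__ == "__main__":
    print(json.dumps(minimal_objects(CELLS), indent=2))
```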

Standard mode output for this example is provided below:

        

Table T ({ "url": "http://example.org/tree-ops-ext.csv"}) has been explicitly identified: { "@id": "<http://example.org/tree-ops-ext>"}.

Common properties and notes specified for table T ({ "url": "http://example.org/tree-ops-ext.csv"}) are included in the output.

Example with single table and using virtual columns to produce multiple subjects per row

This example uses a single annotated table describing a listing of music events. Each row from the CSV+ file corresponds to three resources: the music event itself, the location where that event occurs and the offer to sell tickets for that event. The goal is to convert the CSV content into schema.org markup that a search engine such as Google can use to index music events. Details of how Google expects this information to be structured can be found here.

The input CSV+ file, published at http://example.org/events-listing.csv, and the associated metadata description http://example.org/events-listing.csv-metadata.json are provided below:

        
        

The CSV to JSON translation is limited to providing one statement, or triple, per column in the table. The target schema.org markup requires 10 statements to describe each event. As the base CSV+ file contains 5 columns, an additional 5 virtual columns have been added in order to provide for the full complement of statements - including the relationships between the 3 resources (event, location and offer) described by each row of the table. Note that the virtual property is set to true for these virtual columns.

Furthermore, note that no attempt is made to reconcile between locations or offers that may be associated with more than one event; every row in the table will create both a new location resource and offer resource in addition to the event resource. If considered necessary, applications such as OpenRefine may be used to identify and reconcile duplicate location resources once the JSON output has been generated.

The annotated table generated from parsing the CSV+ file and associated metadata is shown below and provides the basis for the conversion to JSON.

Annotations for the resulting table T, with 10 columns and 2 rows, are shown below:

id | core annotations | annotations
 | url | columns | rows
T | http://example.org/events-listing.csv | C1, C2, C3, C4, C5, C6, C7, C8, C9, C10 | R1, R2

Annotations for the columns, rows and cells in table T are shown in the tables below.

Column annotations:

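id | core annotations | annotations
 | table | number | source number | cells | name | title | virtual
C1 | T | 1 | 1 | C1.1, C2.1 | name | Name |
C2 | T | 2 | 2 | C1.2, C2.2 | start-date | Start Date |
C3 | T | 3 | 3 | C1.3, C2.3 | location-name | Location Name |
C4 | T | 4 | 4 | C1.4, C2.4 | location-address | Location Address |
C5 | T | 5 | 5 | C1.5, C2.5 | ticket-url | Ticket Url |
C6 | T | 6 | 6 | C1.6, C2.6 | type-event | | true
C7 | T | 7 | 7 | C1.7, C2.7 | type-place | | true
C8 | T | 8 | 8 | C1.8, C2.8 | type-offer | | true
C9 | T | 9 | 9 | C1.9, C2.9 | location | | true
C10 | T | 10 | 10 | C1.10, C2.10 | offers | | true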

Row annotations:

id | core annotations
 | table | number | source number | cells
R1 | T | 1 | 2 | C1.1, C1.2, C1.3, C1.4, C1.5, C1.6, C1.7, C1.8, C1.9, C1.10
R2 | T | 2 | 3 | C2.1, C2.2, C2.3, C2.4, C2.5, C2.6, C2.7, C2.8, C2.9, C2.10

Cell annotations:

id | core annotations | annotations
 | table | column | row | string value | value | errors | datatype | format | aboutUrl | propertyUrl | valueUrl
C1.1 | T | C1 | R1 | "B.B. King" | "B.B. King" | | string | | <http://example.org/events-listing.csv#event-1> | schema:name |
C1.2 | T | C2 | R1 | "2014-04-12T19:30" | 2014-04-12T19:30:00 | | datetime | yyyy-MM-ddTHH:mm:ss | <http://example.org/events-listing.csv#event-1> | schema:startDate |
C1.3 | T | C3 | R1 | "Lupo’s Heartbreak Hotel" | "Lupo’s Heartbreak Hotel" | | string | | <http://example.org/events-listing.csv#place-1> | schema:name |
C1.4 | T | C4 | R1 | "79 Washington St., Providence, RI" | "79 Washington St., Providence, RI" | | string | | <http://example.org/events-listing.csv#place-1> | schema:address |
C1.5 | T | C5 | R1 | "https://www.etix.com/ticket/1771656" | <https://www.etix.com/ticket/1771656> | | anyURI | | <http://example.org/events-listing.csv#offer-1> | schema:url |
C1.6 | T | C6 | R1 | "" | null | | string | | <http://example.org/events-listing.csv#event-1> | rdf:type | schema:MusicEvent
C1.7 | T | C7 | R1 | "" | null | | string | | <http://example.org/events-listing.csv#place-1> | rdf:type | schema:Place
C1.8 | T | C8 | R1 | "" | null | | string | | <http://example.org/events-listing.csv#offer-1> | rdf:type | schema:Offer
C1.9 | T | C9 | R1 | "" | null | | string | | <http://example.org/events-listing.csv#event-1> | schema:location | <http://example.org/events-listing.csv#place-1>
C1.10 | T | C10 | R1 | "" | null | | string | | <http://example.org/events-listing.csv#event-1> | schema:offers | <http://example.org/events-listing.csv#offer-1>
C2.1 | T | C1 | R2 | "B.B. King" | "B.B. King" | | string | | <http://example.org/events-listing.csv#event-2> | schema:name |
C2.2 | T | C2 | R2 | "2014-04-13T20:00" | 2014-04-13T20:00:00 | | datetime | yyyy-MM-ddTHH:mm:ss | <http://example.org/events-listing.csv#event-2> | schema:startDate |
C2.3 | T | C3 | R2 | "Lynn Auditorium" | "Lynn Auditorium" | | string | | <http://example.org/events-listing.csv#place-2> | schema:name |
C2.4 | T | C4 | R2 | "Lynn, MA, 01901" | "Lynn, MA, 01901" | | string | | <http://example.org/events-listing.csv#place-2> | schema:address |
C2.5 | T | C5 | R2 | "http://frontgatetickets.com/venue.php?id=11766" | <http://frontgatetickets.com/venue.php?id=11766> | | anyURI | | <http://example.org/events-listing.csv#offer-2> | schema:url |
C2.6 | T | C6 | R2 | "" | null | | string | | <http://example.org/events-listing.csv#event-2> | rdf:type | schema:MusicEvent
C2.7 | T | C7 | R2 | "" | null | | string | | <http://example.org/events-listing.csv#place-2> | rdf:type | schema:Place
C2.8 | T | C8 | R2 | "" | null | | string | | <http://example.org/events-listing.csv#offer-2> | rdf:type | schema:Offer
C2.9 | T | C9 | R2 | "" | null | | string | | <http://example.org/events-listing.csv#event-2> | schema:location | <http://example.org/events-listing.csv#place-2>
C2.10 | T | C10 | R2 | "" | null | | string | | <http://example.org/events-listing.csv#event-2> | schema:offers | <http://example.org/events-listing.csv#offer-2>

Minimal mode output for this example is provided below:

        

Three resources are defined for each row within the table; event, location and offer - therefore three objects are created for each row.

Each column explicitly defines both aboutUrl and propertyUrl properties which are inherited by the column's cells.

Columns C6, C7 and C8 ({ "name": "type-event"}, { "name": "type-place"} and { "name": "type-offer"}) define the semantic types of the resources described by each row: schema:MusicEvent, schema:Place and schema:Offer respectively. Note that rdf:type is converted to the name @type (as used in [[json-ld]]) by this conversion application.

Column C9 ({ "name": "location"}) uses the aboutUrl and valueUrl to assert the relationship between the event and location resources.

Column C10 ({ "name": "offers"}) uses the aboutUrl and valueUrl to assert the relationship between the event and offer resources.
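The grouping described in the notes above can be sketched as follows for row R1. This is an illustrative simplification under stated assumptions: the tuple layout and the function name subjects_for_row are hypothetical, property names are kept in the compact form shown in the cell annotations, and the sketch emits three sibling objects without attempting to nest the location or offer inside the event.

```python
import json

EVENT = "http://example.org/events-listing.csv#event-1"
PLACE = "http://example.org/events-listing.csv#place-1"
OFFER = "http://example.org/events-listing.csv#offer-1"

# Hypothetical in-memory cells for row R1: (aboutUrl, propertyUrl, value, valueUrl).
ROW_R1 = [
    (EVENT, "schema:name", "B.B. King", None),
    (EVENT, "schema:startDate", "2014-04-12T19:30:00", None),
    (PLACE, "schema:name", "Lupo’s Heartbreak Hotel", None),
    (PLACE, "schema:address", "79 Washington St., Providence, RI", None),
    (OFFER, "schema:url", "https://www.etix.com/ticket/1771656", None),
    (EVENT, "rdf:type", None, "schema:MusicEvent"),
    (PLACE, "rdf:type", None, "schema:Place"),
    (OFFER, "rdf:type", None, "schema:Offer"),
    (EVENT, "schema:location", None, PLACE),
    (EVENT, "schema:offers", None, OFFER),
]


def subjects_for_row(cells):
    """Create one object per distinct aboutUrl, in order of first appearance."""
    subjects = {}
    for about_url, property_url, value, value_url in cells:
        obj = subjects.setdefault(about_url, {"@id": about_url})
        # rdf:type is emitted under the name @type.
        name = "@type" if property_url == "rdf:type" else property_url
        if value_url is not None:
            obj[name] = value_url      # valueUrl takes the place of the cell value
        elif value is not None:
            obj[name] = value          # null cells without a valueUrl are skipped
    return list(subjects.values())


if __name__ == "__main__":
    print(json.dumps(subjects_for_row(ROW_R1), indent=2, ensure_ascii=False))
```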

Standard mode output for this example is provided below:

        

The resources described by each row are explicitly defined using the aboutUrl property; in this case there are three resources per row (event, location and offer). The objects containing the name-value pairs associated with the cell values of a row are related to the object for that row using the name-value pair with name describes.
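As a rough sketch of this framing, the subject objects of a row can be wrapped in a row-level object under the name describes. The function name row_object is hypothetical, and the rownum name used for the row number is an assumption here; the full set of row-level name-value pairs is defined by the conversion procedure elsewhere in this specification.

```python
import json


def row_object(row_number, subjects):
    """Wrap the subject objects of one row in a row-level object."""
    # "rownum" is an assumed name; "describes" is the name given in the text.
    return {"rownum": row_number, "describes": list(subjects)}


if __name__ == "__main__":
    base = "http://example.org/events-listing.csv"
    subjects = [
        {"@id": base + "#event-1", "schema:name": "B.B. King"},
        {"@id": base + "#place-1", "schema:address": "79 Washington St., Providence, RI"},
        {"@id": base + "#offer-1", "schema:url": "https://www.etix.com/ticket/1771656"},
    ]
    print(json.dumps(row_object(1, subjects), indent=2))
```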

Example with table group comprising three interrelated tables

This example is based on Use Case #4 - Publication of public sector roles and salaries and uses three annotated tables published within a table group. Information about senior roles and junior roles within a government department are published in CSV format by each department. These are validated against a centrally published schema to ensure that all the data published by departments is consistent. Additionally, a list of professions is also published centrally, providing a controlled vocabulary against which departmental submissions are validated.

The input CSV+ files and associated metadata descriptions are provided below:

        
        
        
        
        
        

In this example, the resource gov.uk/professions.csv is identified using a relative URL to host http://example.org. In reality this resource would be published centrally by government and served from some remote host. Similarly, the metadata description resource metadata.json would also be centrally published. Government departments seeking to validate their role and salary data would download a copy of this metadata description and place it, without modification, in the same directory as their CSV+ files, whose names MUST match those specified in the metadata description: senior-roles.csv and junior-roles.csv.

The table group generated from parsing the CSV+ files and associated metadata is shown below and provides the basis for the conversion to JSON.

Annotations for the table group G and the three tables Ta, Tb, and Tc are shown below.

Table Group annotations:

idcore annotationsannotations
resources
GTa, Tb, Tc@typeTableGroup

Table annotations:

id | core annotations | annotations
 | url | columns | rows | primaryKey | suppressOutput | foreignKeys: columns | foreignKeys: reference
Ta | http://example.org/gov.uk/professions.csv | Ca1 | Ra1, Ra2, Ra3, Ra4 | Ca1 | true | |
Tb | http://example.org/senior-roles.csv | Cb1, Cb2, Cb3, Cb4, Cb5, Cb6 | Rb1, Rb2 | Cb1 | | Cb5 | Cb1
 | | | | | | Cb6 | Ca1
Tc | http://example.org/junior-roles.csv | Cc1, Cc2, Cc3, Cc4, Cc5, Cc6, Cc7 | Rc1, Rc2 | | | Cc1 | Cb1
 | | | | | | Cc7 | Ca1

In this example, output for the centrally published list of professions, table Ta (http://example.org/gov.uk/professions.csv), is not required; only information from the departmental submissions is to be translated to JSON. Note the suppressOutput annotation on this table.
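The foreignKeys annotations listed in the table annotations above (Cb5 and Cc1 referencing Cb1; Cb6 and Cc7 referencing Ca1) express the validation constraints mentioned in the introduction to this example. A minimal sketch of such a check is given below; it is not part of the conversion procedure, the function name missing_references is hypothetical, and treating null cells as satisfying the constraint is an assumption of this sketch.

```python
def missing_references(referencing_values, referenced_values):
    """Return the referencing values that have no match in the referenced column."""
    allowed = set(referenced_values)
    # Assumption: a null (None) cell does not violate the constraint.
    return [v for v in referencing_values if v is not None and v not in allowed]


# Cell values taken from the annotated tables of this example.
PROFESSIONS = ["Finance", "Information Technology", "Operational Delivery", "Policy"]  # Ca1
SENIOR_REFS = ["90115", "90334"]                                    # Cb1
SENIOR_REPORTS_TO = ["90334", None]                                 # Cb5 (Cb2.5 is null)
SENIOR_PROFESSIONS = ["Finance", "Policy"]                          # Cb6
JUNIOR_REPORTS_TO = ["90115", "90115"]                              # Cc1
JUNIOR_PROFESSIONS = ["Operational Delivery", "Operational Delivery"]  # Cc7

if __name__ == "__main__":
    checks = {
        "Cb5 -> Cb1": missing_references(SENIOR_REPORTS_TO, SENIOR_REFS),
        "Cb6 -> Ca1": missing_references(SENIOR_PROFESSIONS, PROFESSIONS),
        "Cc1 -> Cb1": missing_references(JUNIOR_REPORTS_TO, SENIOR_REFS),
        "Cc7 -> Ca1": missing_references(JUNIOR_PROFESSIONS, PROFESSIONS),
    }
    for name, missing in checks.items():
        print(name, "ok" if not missing else f"missing: {missing}")
```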

Annotations for the columns, rows and cells in tables Ta, Tb and Tc are shown in the tables below.

Column annotations:

id | core annotations | annotations
 | table | number | source number | cells | name | title | required
Ca1 | Ta | 1 | 1 | Ca1.1, Ca2.1, Ca3.1, Ca4.1 | name | Profession | true
Cb1 | Tb | 1 | 1 | Cb1.1, Cb2.1 | ref | Post Unique Reference | true
Cb2 | Tb | 2 | 2 | Cb1.2, Cb2.2 | name | Name |
Cb3 | Tb | 3 | 3 | Cb1.3, Cb2.3 | grade | Grade |
Cb4 | Tb | 4 | 4 | Cb1.4, Cb2.4 | job | Job Title |
Cb5 | Tb | 5 | 5 | Cb1.5, Cb2.5 | reportsTo | Reports to Senior Post |
Cb6 | Tb | 6 | 6 | Cb1.6, Cb2.6 | profession | Profession |
Cc1 | Tc | 1 | 1 | Cc1.1, Cc2.1 | reportsToSenior | Reporting Senior Post |
Cc2 | Tc | 2 | 2 | Cc1.2, Cc2.2 | grade | Grade |
Cc3 | Tc | 3 | 3 | Cc1.3, Cc2.3 | min_pay | Payscale Minimum (£) |
Cc4 | Tc | 4 | 4 | Cc1.4, Cc2.4 | max_pay | Payscale Maximum (£) |
Cc5 | Tc | 5 | 5 | Cc1.5, Cc2.5 | job | Generic Job Title |
Cc6 | Tc | 6 | 6 | Cc1.6, Cc2.6 | number | Number of Posts (FTE) |
Cc7 | Tc | 7 | 7 | Cc1.7, Cc2.7 | profession | Profession |

Row annotations:

id | core annotations
 | table | number | source number | cells
Ra1 | Ta | 1 | 2 | Ca1.1
Ra2 | Ta | 2 | 3 | Ca2.1
Ra3 | Ta | 3 | 4 | Ca3.1
Ra4 | Ta | 4 | 5 | Ca4.1
Rb1 | Tb | 1 | 2 | Cb1.1, Cb1.2, Cb1.3, Cb1.4, Cb1.5, Cb1.6
Rb2 | Tb | 2 | 3 | Cb2.1, Cb2.2, Cb2.3, Cb2.4, Cb2.5, Cb2.6
Rc1 | Tc | 1 | 2 | Cc1.1, Cc1.2, Cc1.3, Cc1.4, Cc1.5, Cc1.6, Cc1.7
Rc2 | Tc | 2 | 3 | Cc2.1, Cc2.2, Cc2.3, Cc2.4, Cc2.5, Cc2.6, Cc2.7

Cell annotations:

id | core annotations | annotations
 | table | column | row | string value | value | errors | datatype | aboutUrl | propertyUrl | valueUrl
Ca1.1 | Ta | Ca1 | Ra1 | "Finance" | "Finance" | | string | | |
Ca2.1 | Ta | Ca1 | Ra2 | "Information Technology" | "Information Technology" | | string | | |
Ca3.1 | Ta | Ca1 | Ra3 | "Operational Delivery" | "Operational Delivery" | | string | | |
Ca4.1 | Ta | Ca1 | Ra4 | "Policy" | "Policy" | | string | | |
Cb1.1 | Tb | Cb1 | Rb1 | "90115" | "90115" | | string | <http://example.org/senior-roles.csv#post-90115> | dc:identifier |
Cb1.2 | Tb | Cb2 | Rb1 | "Steve Egan" | "Steve Egan" | | string | <http://example.org/senior-roles.csv#post-90115> | foaf:name |
Cb1.3 | Tb | Cb3 | Rb1 | "SCS1A" | "SCS1A" | | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/grade> |
Cb1.4 | Tb | Cb4 | Rb1 | "Deputy Chief Executive" | "Deputy Chief Executive" | | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/job> |
Cb1.5 | Tb | Cb5 | Rb1 | "90334" | "90334" | | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/reportsTo> | <http://example.org/senior-roles.csv#post-90334>
Cb1.6 | Tb | Cb6 | Rb1 | "Finance" | "Finance" | | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/profession> |
Cb2.1 | Tb | Cb1 | Rb2 | "90334" | "90334" | | string | <http://example.org/senior-roles.csv#post-90334> | dc:identifier |
Cb2.2 | Tb | Cb2 | Rb2 | "Sir Alan Langlands" | "Sir Alan Langlands" | | string | <http://example.org/senior-roles.csv#post-90334> | foaf:name |
Cb2.3 | Tb | Cb3 | Rb2 | "SCS4" | "SCS4" | | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/grade> |
Cb2.4 | Tb | Cb4 | Rb2 | "Chief Executive" | "Chief Executive" | | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/job> |
Cb2.5 | Tb | Cb5 | Rb2 | "xx" | null | | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/reportsTo> |
Cb2.6 | Tb | Cb6 | Rb2 | "Policy" | "Policy" | | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/profession> |
Cc1.1 | Tc | Cc1 | Rc1 | "90115" | "90115" | | string | | <http://example.org/def/reportsTo> | <http://example.org/senior-roles.csv#post-90115>
Cc1.2 | Tc | Cc2 | Rc1 | "4" | "4" | | string | | <http://example.org/def/grade> |
Cc1.3 | Tc | Cc3 | Rc1 | "17426" | 17426 | | integer | | <http://example.org/def/min_pay> |
Cc1.4 | Tc | Cc4 | Rc1 | "20002" | 20002 | | integer | | <http://example.org/def/max_pay> |
Cc1.5 | Tc | Cc5 | Rc1 | "Administrator" | "Administrator" | | string | | <http://example.org/def/job> |
Cc1.6 | Tc | Cc6 | Rc1 | "8.67" | 8.67 | | number | | <http://example.org/def/number-of-posts> |
Cc1.7 | Tc | Cc7 | Rc1 | "Operational Delivery" | "Operational Delivery" | | string | | <http://example.org/def/profession> |
Cc2.1 | Tc | Cc1 | Rc2 | "90115" | "90115" | | string | | <http://example.org/def/reportsTo> | <http://example.org/senior-roles.csv#post-90115>
Cc2.2 | Tc | Cc2 | Rc2 | "5" | "5" | | string | | <http://example.org/def/grade> |
Cc2.3 | Tc | Cc3 | Rc2 | "19546" | 19546 | | integer | | <http://example.org/def/min_pay> |
Cc2.4 | Tc | Cc4 | Rc2 | "22478" | 22478 | | integer | | <http://example.org/def/max_pay> |
Cc2.5 | Tc | Cc5 | Rc2 | "Administrator" | "Administrator" | | string | | <http://example.org/def/job> |
Cc2.6 | Tc | Cc6 | Rc2 | "0.5" | 0.5 | | number | | <http://example.org/def/number-of-posts> |
Cc2.7 | Tc | Cc7 | Rc2 | "Operational Delivery" | "Operational Delivery" | | string | | <http://example.org/def/profession> |

Notice that valueUrl is not specified for cell Cb2.5 because the cell value is null and the virtual property of column Cb5 is not specified.
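The rule illustrated by cell Cb2.5 can be sketched as below: a valueUrl is produced only when the cell value is not null or the column is virtual. The function name cell_value_url is hypothetical, and the Python format string with a {value} placeholder is a stand-in assumed for illustration, not the URI Template actually used in the metadata description.

```python
def cell_value_url(value, template, virtual=False):
    """Return the valueUrl for a cell, or None when no valueUrl applies."""
    if template is None:
        return None
    if value is None and not virtual:
        # Null cell in a non-virtual column: no valueUrl (as for cell Cb2.5).
        return None
    return template.format(value=value)


# Assumed stand-in for the valueUrl template of column Cb5.
REPORTS_TO = "http://example.org/senior-roles.csv#post-{value}"

if __name__ == "__main__":
    print(cell_value_url("90334", REPORTS_TO))                       # cell Cb1.5
    print(cell_value_url(None, REPORTS_TO))                          # cell Cb2.5
    print(cell_value_url(None, "schema:MusicEvent", virtual=True))   # a virtual column
```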

Minimal mode output for this example is provided below:

        

Prefixes defined within the RDFa 1.1 Initial Context ([[rdfa-core]]) are not expanded; e.g. dc: for <http://purl.org/dc/terms/>.

Output for table Ta ({ "url": "http://example.org/gov.uk/professions.csv" }) is not included as property suppressOutput has value true.

The propertyUrl is specified for all cells in tables Tb and Tc.

Columns Cb5 and Cc1 ({ "name": "reportsTo" } and { "name": "reportsToSenior" }) use the aboutUrl, propertyUrl and valueUrl properties to assert the relationship between the given post and the senior post it reports to for the cells therein. However, since senior posts and junior posts are described in different tables, it is not possible to create nested objects for this particular case.
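The notes above can be sketched together for a table group. This is a simplified illustration, not the normative procedure: the dictionary layout and the function name minimal_output are assumptions, only a few cells of row Rc1 are included, and the reference to the senior post is emitted as a plain URL string rather than a nested object.

```python
import json

# Hypothetical in-memory table group; each cell is
# (aboutUrl, propertyUrl, value, valueUrl).
TABLES = [
    {"url": "http://example.org/gov.uk/professions.csv", "suppressOutput": True,
     "rows": [[(None, None, "Finance", None)]]},
    {"url": "http://example.org/junior-roles.csv", "suppressOutput": False,
     "rows": [[
         (None, "http://example.org/def/reportsTo", "90115",
          "http://example.org/senior-roles.csv#post-90115"),
         (None, "http://example.org/def/grade", "4", None),
         (None, "http://example.org/def/min_pay", 17426, None),
     ]]},
]


def minimal_output(tables):
    """Collect one object per subject per row, skipping suppressed tables."""
    objects = []
    for table in tables:
        if table.get("suppressOutput"):
            continue  # e.g. table Ta, the centrally published professions list
        for row in table["rows"]:
            subjects = {}
            for about_url, property_url, value, value_url in row:
                # Cells without an aboutUrl share one object that has no @id.
                initial = {} if about_url is None else {"@id": about_url}
                obj = subjects.setdefault(about_url, initial)
                if value_url is not None:
                    obj[property_url] = value_url
                elif value is not None:
                    obj[property_url] = value
            objects.extend(subjects.values())
    return objects


if __name__ == "__main__":
    print(json.dumps(minimal_output(TABLES), indent=2))
```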

Standard mode output for this example is provided below: