This document defines the procedures and rules to be applied when converting tabular data into JSON. Tabular data may be complemented with metadata annotations that describe its structure, the meaning of its content and how it may form part of a collection of interrelated tabular data. This document specifies the effect of this metadata on the resulting JSON.
The CSV on the Web Working Group was chartered to produce Recommendations for "Access methods for CSV Metadata", "Metadata vocabulary for CSV data" and "Mapping mechanism to transforming CSV into various Formats (e.g., RDF, JSON, or XML)". This document aims to satisfy the JSON variant of the mapping Recommendation.
This document describes the processing of tabular data to create a set of nested objects that MUST be serialized as JSON [[!RFC7159]].
The conversion of CSV content to JSON is intended for web developers who need not care about the complexities of RDF [[!rdf11-concepts]]. Where the formality of RDF is required, [[!csv2rdf]] provides the procedures for mapping from CSV content to RDF which may be serialized to [[json-ld]].
The [[!tabular-data-model]] defines an annotated tabular data model consisting of tables, columns, rows and cells, enriched with annotations that describe the structure of the tabular data and the meaning of its content. A table group is a collection of tables published as a single atomic unit.
The conversion procedure described in this specification operates on the tabular data. This specification does not specify the processes needed to convert CSV-encoded data into tabular data form. Please refer to [[!tabular-data-model]] for details of parsing tabular data.
Conversion applications MUST provide at least two modes of operation: standard and minimal.
Standard mode conversion frames the information gleaned from the cells of the tabular data with details of the rows, tables and a table group within which that information is provided.
Minimal mode conversion includes only the information gleaned from the cells of the tabular data within the output.
Standard and minimal conversion are described normatively below.
Conversion applications MAY offer additional implementation specific conversion modes.
Conversion specifications, as defined in [[!tabular-metadata]] MAY be used to specify how tabular data can be transformed into another format using a script or template. Such a conversion specification MAY use the JSON output described in this specification as input.
The conversion procedure described in this specification is considered to be entirely textual. There is no requirement on conversion applications to check the semantic consistency of the data during the conversion, nor validate the output against JSON syntax rules. Downstream applications SHOULD be aware of the potential for syntax errors and take appropriate action.
Tabular data MUST conform to the description from [[!tabular-data-model]]. In particular note that each row MUST contain the same number of cells (although some of these cells may be empty). Given this constraint, not all CSV-encoded data can be considered to be tabular data. As such, the conversion procedure described in this specification cannot be applied to all CSV files.
The procedures for converting tabular data into JSON are described below for both standard and minimal modes.
aboutUrl
property of the current cell as defined in URI template properties in [[!tabular-metadata]].null
or, in the case that the cell specifies a separator
property, a sequence of values.@id
property for the current resource. As defined in [[!tabular-data-model]], the identifier is null
if the @id
property is undefined. The identifier MAY be applied to either a table group or a table.notes
property. This may be an empty list.propertyUrl
property of the current cell as defined in URI template properties in [[!tabular-metadata]].null
.valueUrl
property of the current cell as defined in URI template properties in [[!tabular-metadata]].A conformant JSON conversion application MUST produce output conforming to this algorithm according to the chosen mode of conversion: standard or minimal.
Where an annotated table is defined in isolation (e.g. in the absence of a table group description), a default table group description is provided with a single resources
annotation that refers to that table.
The steps in the algorithm defined here apply to minimal mode.
Insert an empty array A into the JSON output. The objects containing the name-value pairs associated with the cell values will be subsequently inserted into this array.
Each table is processed sequentially in the order they are referenced in the table group description. For each table where the value of property suppressOutput
is false:
Each row within the table is processed sequentially in order. For each row in the current table:
Generate a sequence of objects, S1 to Sn, each of which corresponds to a subject described by the current row, as described in .
The subject(s) described by each row are determined according to the aboutUrl property for each cell in the current row. Where aboutUrl is undefined, a default subject for the row is used.
As described in , process the sequence of objects, S1 to Sn, to produce a new sequence of root objects, SR1 to SRm, that MAY include nested objects.
A row MAY describe multiple interrelated subjects; where the valueUrl property for one cell matches the aboutUrl property for another cell in the same row.
The steps in the algorithm defined here apply to standard mode.
Insert an empty object G into the JSON output which is associated with the table group.
If the table group has an identifier IG; insert the following name-value pair into object G:
@id
Insert any notes and common properties specified for the table group into object G according to the rules provided in .
Insert the following name-value pair into object G:
table
where AT is an array into which the objects describing the annotated tables will be subsequently inserted.
Each table is processed sequentially in the order they are referenced in the table group description.
For each table where the value of property suppressOutput
is false:
Insert an empty object T into the array AT to represent the table.
If the table has an identifier IT; insert the following name-value pair into object T:
@id
Specify the source CSV+ file URL for the current table based on the value of property url
; insert the following name-value pair into object T:
url
Insert any notes and common properties specified for the table into object T. according to the rules provided in .
All other annotations for the table are ignored during the conversion; including information about table schemas and column descriptions specified therein, dialect descriptions, foreign-key-definitions etc.
Insert the following name-value pair into object T:
row
where AR is an array into which the objects describing the rows will be subsequently inserted.
Each row within the table is processed sequentially in order. For each row in the current table:
Insert an empty object R into the array AR to represent the row.
Specify the row number n for the row; insert the following name-value pair into object R:
rownum
Specify the row source number nsource for the row within the source CSV+ file URL using a fragment-identifier as specified in [[RFC7111]]; if row source number is not null
, insert the following name-value pair into object R:
url
#row=
nsourceInsert the following name-value pair into object R:
describes
where A is an array. The objects containing the name-value pairs associated with the cell values will be subsequently inserted into this array.
Generate a sequence of objects, S1 to Sn, each of which corresponds to a subject described by the current row, as described in .
The subject(s) described by each row are determined according to the aboutUrl property for each cell in the current row. Where aboutUrl is undefined, a default subject for the row is used.
As described in , process the sequence of objects, S1 to Sn, to produce a new sequence of root objects, SR1 to SRm, that MAY include nested objects.
A row MAY describe multiple interrelated subjects; where the valueUrl property for one cell matches the aboutUrl property for another cell in the same row.
The steps in the algorithm defined here apply to both standard and minimal modes.
This algorithm generates a sequence of objects, S1 to Sn, each of which corresponds to a subject described by the current row. The algorithm inserts name-value pairs into Si depending on the cell values as outlined in the following steps.
Determine the unique subjects for the current row. The subject(s) described by each row are determined according to the aboutUrl property for each cell in the current row. A default subject for the row is used for any cells where aboutUrl is undefined.
For each subject that the current row describes where at least one of the cells that refers to that subject has a value or valueUrl that is not null
, and is associated with a column where the value of property suppressOutput
has value false:
Create an empty object Si to represent the subject i.
(i is the index number with values from 1 to n, where n is the number of subjects for the row)
Subject i is identified according to the aboutUrl property of its associated cells: IS. For a default subject where aboutUrl is not specified by its cells, IS is null
.
If the identifier for subject i, IS, is not null
, then insert the following name-value pair into object Si:
@id
Each cell referring to subject i is then processed sequentially according to the order of the columns.
For each cell referring to subject i, where the value of property suppressOutput
for the column associated with that cell is false, insert a name-value pair into object Si as described below:
If the value of propertyUrl for the cell is not null
, then name N takes the value of propertyUrl.
Else, name N takes the value of the name
property for the column associated with the cell.
If the valueUrl for the current cell is not null
, then insert the following name-value pair into object Si:
where Vurl is the value of valueUrl property for the current cell, is expressed as a string in the JSON output.
Else, if the cell specifies a separator
property and the cell value is not an empty sequence, then the cell value provides a sequence of values for inclusion within the JSON output; insert an array Av containing each value V of the sequence into object Si:
Each of the values V derived from the sequence MUST be expressed in the JSON output according to the datatype
property of the cell as defined below: .
Since arrays are implicitly ordered in JSON, the ordered
property, if specified, has no effect on the JSON output.
Else, if the cell value is not null
, then the cell value provides a single value V for inclusion within the JSON output; insert the following name-value pair into object Si:
Value V derived from the cell values MUST be expressed in the JSON output according to the datatype
property of the cell as defined below: .
If name N occurs more than once within object Si, the name-value pairs from each occurrence of name N MUST be compacted to form a single name-value pair with name N and whose value is an array containing all values from each of those name-value pairs.
The steps in the algorithm defined herein apply to both standard and minimal modes.
Where the current row describes multiple subjects, it MAY be possible to organise the objects associated with those subjects such that some objects are nested within others; e.g. where the valueUrl property for one cell matches the aboutUrl property for another cell in the same row.
This algorithm considers a sequence of objects generated according to , S1 to Sn, each of which corresponds to a subject described by the current row. It generates a new sequence of root objects, SR1 to SRm, that MAY include nested objects.
Where the current row describes only a single subject, this algorithm may be bypassed as no nesting is possible. In such a case, the root object SR1 is identical to the original object S1.
This nesting algorithm is based on the interrelationships between subjects described within a given row that are specified using the valueUrl property. Cell values expressing the identity of a subject in the current row (e.g. as a simple literal) will be ignored by this algorithm.
The algorithm uses the following terms:
The nesting algorithm is defined as follows:
For all cells in the current row, determine the valueUrls, Vurl, that occur only once. The list of these uniquely occurring valueUrls is referred to as the URL-list.
Create an empty forest F. Vertices in the trees of this forest represent the subjects described by the current row.
For each object Si in the sequence S1 to Sn:
Determine the identity of object Si: IS. If present in object Si, the name-value pair with name @id
provides the value of IS. Else, object Si is not explicitly identified and IS is null
.
Check whether there is a vertex N in forest F that represents object Si. If none of the existing vertices in forest F represent object Si, then insert a new tree into forest F whose root is a vertex N that represents object Si and has identity IS.
For all cells associated with the current object Si (e.g. whose aboutUrl property matches IS):
If the valueUrl property of the current cell is defined and its value, Vurl, appears in the URL-list, then check each of the other objects in the sequence S1 to Sn to determine if Vurl identifies one of those objects.
For object Sj, if the name-value pair with name @id
is present and its value matches Vurl, then:
If the root of the tree containing vertex N is a vertex that represents object Sj, then object Si is already a descendant of object Sj; no further action can be taken for this instance of Vurl.
Else, if there is a root vertex M in forest F that represents object Sj, then set vertex M as a child of vertex N and remove vertex M from the list of roots in forest F (e.g. the tree rooted by M becomes a sub-tree of N).
Else, create a new vertex M that represents object Sj as a child of vertex N.
Each vertex in forest F represents an object in the original sequence of objects S1 to Sn and is associated with a subject described by the current row. Rearrange objects S1 to Sn such that they mirror the structure of the trees in forest F.
If vertex M, representing object Si, is a child of vertex N, representing object Sj, then the name-value pair in object Sj associated with the edge relating M and N MUST be modifed such that the (literal) value, Vurl, from that name-value pair is replaced by object Si thus creating a nested object.
Objects represented by root vertices are referred to as root objects.
An implementation may be able to optimize the algorithm by skipping branches (e.g. if URL-list is empty) or by other means.
Cell values are expressed in the JSON output according to the cell's datatype
property. The relationship between the value of the datatype
property and the primitive types supported by JSON (as specified in [[!RFC7159]]) is provided in the table below.
Instances of JSON reserved characters within string values MUST be escaped as defined in [[!RFC7159]].
JSON has no native support for expressing language information; therefore the lang
property has no effect on the JSON outut.
A cell's format
property is irrelevant to the conversion procedure defined in this specification; the cell value has already been parsed from the contents the cell according to the format
property.
Where the contents of the cell cannot be parsed, or other validation errors occur, cell errors will be provided. It is an implementation decision to determine how conversion applications should proceed in the event that cell errors are encountered.
datatype | JSON primitive type | Remarks |
---|---|---|
number | number | |
binary | string | binary is considered to be equivalent to xsd:base64Binary |
datetime | string | |
any | string | |
xml | string | |
html | string | |
json | string | |
anyAtomicType | string | |
anyURI | string | |
base64Binary | string | |
boolean | boolean | |
date | string | |
dateTime | string | |
dateTimeStamp | string | |
decimal | number | |
integer | number | |
long | number | |
int | number | |
short | number | |
byte | number | |
nonNegativeInteger | number | |
positiveInteger | number | |
unsignedLong | number | |
unsignedInt | number | |
unsignedShort | number | |
unsignedByte | number | |
nonPositiveInteger | number | |
negativeInteger | number | |
double | number | |
duration | string | |
dayTimeDuration | string | |
yearMonthDuration | string | |
float | number | |
gDay | string | |
gMonth | string | |
gMonthDay | string | |
gYear | string | |
gYearMonth | string | |
hexBinary | string | |
QName | string | |
string | string | |
normalizedString | string | |
token | string | |
language | string | |
Name | string | |
NCName | string | |
time | string |
This section defines a mechanism for transforming the [[json-ld]] Dialect used for common properties and notes
into plain-old JSON.
Name-value pairs from notes and common properties annotations are generally copied verbatim from the metadata description subject to the exceptions below:
Name-value pairs whose value is an object using the [[json-ld]] keyword @value
, for example:
{ "@value": "
V" }
are transformed to:
Name-value pairs occurring within the value object that use [[json-ld]] keywords @language
and @type
are ignored.
Name-value pairs whose value is an object using the [[json-ld]] keyword @id
to coerce a string-value to be interpreted as an IRI, for example:
{ "@id": "
Vurl" }
are transformed to:
Terms defined within the RDFa 1.1 Initial Context ([[rdfa-core]]) are not expanded during the transformation.
Each of the examples expresses more complex conversions - it is recommended that readers of this specification work through the examples in sequential order.
This example comprises a single annotated table containing information attributes about countries; country code, position (latitude, longitude) and name. Whilst the input CSV+ file, published at http://example.org/countries.csv
, includes a header line, no further metadata annotations are given. The CSV+ file is provided below:
The annotated table generated from parsing the CSV+ file is shown below and provides the basis for the conversion to JSON.
Annotations for the resulting table T, with 4 columns and 3 rows, are shown below:
id | core annotations | annotations | |||
---|---|---|---|---|---|
url | columns | rows | |||
T | http://example.org/countries.csv | C1, C2, C3, C4 | R1, R2, R3 |
Annotations for the columns, rows and cells in table T are shown in the tables below.
Column annotations:
id | core annotations | annotations | ||||
---|---|---|---|---|---|---|
table | number | source number | cells | name | title | |
C1 | T | 1 | 1 | C1.1, C2.1, C3.1 | countryCode | countryCode |
C2 | T | 2 | 2 | C1.2, C2.2, C3.2 | latitude | latitude |
C3 | T | 3 | 3 | C1.3, C2.3, C3.3 | longitude | longitude |
C4 | T | 4 | 4 | C1.4, C2.4, C3.4 | name | name |
Row annotations:
id | core annotations | |||
---|---|---|---|---|
table | number | source number | cells | |
R1 | T | 1 | 2 | C1.1, C1.2, C1.3, C1.4 |
R2 | T | 2 | 3 | C2.1, C2.2, C2.3, C2.4 |
R3 | T | 3 | 4 | C3.1, C3.2, C3.3, C3.4 |
Cell annotations:
id | core annotations | annotations | |||||
---|---|---|---|---|---|---|---|
table | column | row | string value | value | errors | propertyUrl | |
C1.1 | T | C1 | R1 | "AD" | "AD" | <http://example.org/countries.csv#countryCode> | |
C1.2 | T | C2 | R1 | "42.546245" | "42.546245" | <http://example.org/countries.csv#latitude> | |
C1.3 | T | C3 | R1 | "1.601554" | "1.601554" | <http://example.org/countries.csv#longitude> | |
C1.4 | T | C4 | R1 | "Andorra" | "Andorra" | <http://example.org/countries.csv#name> | |
C2.1 | T | C1 | R2 | "AE" | "AE" | <http://example.org/countries.csv#countryCode> | |
C2.2 | T | C2 | R2 | "23.424076" | "23.424076" | <http://example.org/countries.csv#latitude> | |
C2.3 | T | C3 | R2 | "53.847818" | "53.847818" | <http://example.org/countries.csv#longitude> | |
C2.4 | T | C4 | R2 | "United Arab Emirates" | "United Arab Emirates" | <http://example.org/countries.csv#name> | |
C3.1 | T | C1 | R3 | "AF" | "AF" | <http://example.org/countries.csv#countryCode> | |
C3.2 | T | C2 | R3 | "33.93911" | "33.93911" | <http://example.org/countries.csv#latitude> | |
C3.3 | T | C3 | R3 | "67.709953" | "67.709953" | <http://example.org/countries.csv#longitude> | |
C3.4 | T | C4 | R3 | "Afghanistan" | "Afghanistan" | <http://example.org/countries.csv#name> |
As the value of propertyUrl
has not been set within the metadata description it defaults to the URI Template (see [[RFC6570]]) #{[column-name]}
, where [column-name]
is the value of the name
property for the column associated with the cell. For example, the value of propertyUrl
for all cells in column C1 ("name": "countryCode"
) is http://example.org/countries.csv#countryCode
.
Minimal mode output for this example is provided below:
The aboutUrl
property has not been set for cells in table T ({ "url": "http://example.org/countries.csv"}
) - cells in a given row where aboutUrl
has not been specified are assumed to refer to the same subject and so the name-value pairs associated with the cell values of that row occur within the same object.
Given that the propertyUrl
has not been explicitly set for cells in table T ({ "url": "http://example.org/countries.csv"}
), the simplified name is used in the name-value pairs; e.g. countryCode
rather than http://example.org/countries.csv#countryCode
Standard mode output for this example is provided below:
Even though the table was defined in isolation, the table is wrapped in a table group.
The name-value pair with name url
provides reference to the original CSV+ file and to specific rows therein.
The row number is provided for each row using name-value pair with name rownum
.
The object containing the name-values pairs associated with the cell values of a row are related to the object for that row using the name-value pair with name describes
.
This example is based on Use Case #11 - City of Palo Alto Tree Data and comprises a single annotated table describing an inventory of tree maintenance operations. The input CSV+ file, published at http://example.org/tree-ops-ext.csv
, and the associated metadata description http://example.org/tree-ops-ext.csv-metadata.json
are provided below:
The notes
annotation in the metadata description uses the Open Annotation data model currently under development within the Web Annotations Working Group. This is purely illustrative; no constraints are placed on the value of the notes
annotation.
The annotated table generated from parsing the CSV+ file and associated metadata is shown below and provides the basis for the conversion to JSON.
Annotations for the resulting table T, with 9 columns and 3 rows, are shown below:
id | core annotations | annotations | |||
---|---|---|---|---|---|
url | columns | rows | |||
T | http://example.org/tree-ops-ext.csv | C1, C2, C3, C4, C5, C6, C7, C8, C9 | R1, R2, R3 | @id | <http://example.org/tree-ops-ext> |
dc:title | "Tree Operations" | ||||
dc:keywords | ["tree", "street", "maintenance"] | ||||
dc:publisher | [{ "schema:name": "Example Municipality", "schema:url": { "@id": "http://example.org" } }] | ||||
dc:license | <http://opendefinition.org/licenses/cc-by/> | ||||
dc:modified | "2010-12-31" | ||||
notes | [{ "@type": "oa:Annotation", ... }] | ||||
primaryKey | C1 |
The value of the notes
annotation has been shortened for clarity in the table above.
Annotations for the columns, rows and cells in table T are shown in the tables below.
Column annotations:
id | core annotations | annotations | |||||||
---|---|---|---|---|---|---|---|---|---|
table | number | source number | cells | name | title | required | suppressOutput | dc:description | |
C1 | T | 1 | 1 | C1.1, C2.1, C3.1 | GID | GID , Generic Identifier | true | true | An identifier for the operation on a tree. |
C2 | T | 2 | 2 | C1.2, C2.2, C3.2 | on_street | On Street | The street that the tree is on. | ||
C3 | T | 3 | 3 | C1.3, C2.3, C3.3 | species | Species | The species of the tree. | ||
C4 | T | 4 | 4 | C1.4, C2.4, C3.4 | trim_cycle | Trim Cycle | The operation performed on the tree. | ||
C5 | T | 5 | 5 | C1.5, C2.5, C3.5 | dbh | Diameter at Breast Ht | Diameter at Breast Height (DBH) of the tree (in feet), measured 4.5ft above ground. | ||
C6 | T | 6 | 6 | C1.6, C2.6, C3.6 | inventory_date | Inventory Date | The date of the operation that was performed. | ||
C7 | T | 7 | 7 | C1.7, C2.7, C3.7 | comments | Comments | Supplementary comments relating to the operation or tree. | ||
C8 | T | 8 | 8 | C1.8, C2.8, C3.8 | protected | Protected | Indication (YES / NO) whether the tree is subject to a protection order. | ||
C9 | T | 9 | 9 | C1.9, C2.9, C3.9 | kml | KML | KML-encoded description of tree location. |
In this example, output for column C1 (GID
) is not required; note the suppressOutput
annotation on this column.
Row annotations:
id | core annotations | |||
---|---|---|---|---|
table | number | source number | cells | |
R1 | T | 1 | 2 | C1.1, C1.2, C1.3, C1.4, C1.5, C1.6, C1.7, C1.8, C1.9 |
R2 | T | 2 | 3 | C2.1, C2.2, C2.3, C2.4, C2.5, C2.6, C2.7, C2.8, C2.9 |
R3 | T | 3 | 4 | C3.1, C3.2, C3.3, C3.4, C3.5, C3.6, C3.7, C3.8, C3.9 |
Cell annotations:
id | core annotations | annotations | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
table | column | row | string value | value | errors | lang | datatype | format | default | aboutUrl | |
C1.1 | T | C1 | R1 | "1" | "1" | string | http://example.org/tree-ops-ext#gid-1 | ||||
C1.2 | T | C2 | R1 | "ADDISON AV" | "ADDISON AV" | string | <http://example.org/tree-ops-ext#gid-1> | ||||
C1.3 | T | C3 | R1 | "Celtis australis" | "Celtis australis" | string | <http://example.org/tree-ops-ext#gid-1> | ||||
C1.4 | T | C4 | R1 | "Large Tree Routine Prune" | "Large Tree Routine Prune" | en | string | <http://example.org/tree-ops-ext#gid-1> | |||
C1.5 | T | C5 | R1 | "11" | 11 | integer | <http://example.org/tree-ops-ext#gid-1> | ||||
C1.6 | T | C6 | R1 | "10/18/2010" | 2010-10-18 | date | M/d/yyyy | <http://example.org/tree-ops-ext#gid-1> | |||
C1.7 | T | C7 | R1 | "" | null | string | <http://example.org/tree-ops-ext#gid-1> | ||||
C1.8 | T | C8 | R1 | "" | false | boolean | YES|NO | "NO" | <http://example.org/tree-ops-ext#gid-1> | ||
C1.9 | T | C9 | R1 | "<Point><coordinates>-122.156485,37.440963</coordinates></Point>" | "<Point><coordinates>-122.156485,37.440963</coordinates></Point>" | xml | <http://example.org/tree-ops-ext#gid-1> | ||||
C2.1 | T | C1 | R2 | "2" | "2" | string | <http://example.org/tree-ops-ext#gid-2> | ||||
C2.2 | T | C2 | R2 | "EMERSON ST" | "EMERSON ST" | string | <http://example.org/tree-ops-ext#gid-2> | ||||
C2.3 | T | C3 | R2 | "Liquidambar styraciflua" | "Liquidambar styraciflua" | string | <http://example.org/tree-ops-ext#gid-2> | ||||
C2.4 | T | C4 | R2 | "Large Tree Routine Prune" | "Large Tree Routine Prune" | en | string | <http://example.org/tree-ops-ext#gid-2> | |||
C2.5 | T | C5 | R2 | "11" | 11 | integer | <http://example.org/tree-ops-ext#gid-2> | ||||
C2.6 | T | C6 | R2 | "6/2/2010" | 2010-06-02 | date | M/d/yyyy | <http://example.org/tree-ops-ext#gid-2> | |||
C2.7 | T | C7 | R2 | "" | null | string | <http://example.org/tree-ops-ext#gid-2> | ||||
C2.8 | T | C8 | R2 | "" | false | boolean | YES|NO | "NO" | <http://example.org/tree-ops-ext#gid-2> | ||
C2.9 | T | C9 | R2 | "<Point><coordinates>-122.156749,37.440958</coordinates></Point>" | "<Point><coordinates>-122.156749,37.440958</coordinates></Point>" | xml | <http://example.org/tree-ops-ext#gid-2> | ||||
C3.1 | T | C1 | R3 | "6" | "6" | string | <http://example.org/tree-ops-ext#gid-6> | ||||
C3.2 | T | C2 | R3 | "ADDISON AV" | "ADDISON AV" | string | <http://example.org/tree-ops-ext#gid-6> | ||||
C3.3 | T | C3 | R3 | "Robinia pseudoacacia" | "Robinia pseudoacacia" | string | <http://example.org/tree-ops-ext#gid-6> | ||||
C3.4 | T | C4 | R3 | "Large Tree Routine Prune" | "Large Tree Routine Prune" | en | string | <http://example.org/tree-ops-ext#gid-6> | |||
C3.5 | T | C5 | R3 | "29" | 29 | integer | <http://example.org/tree-ops-ext#gid-6> | ||||
C3.6 | T | C6 | R3 | "6/1/2010" | 2010-06-01 | date | M/d/yyyy | <http://example.org/tree-ops-ext#gid-6> | |||
C3.7 | T | C7 | R3 | "cavity or decay; trunk decay; codominant leaders; included bark; large leader or limb decay; previous failure root damage; root decay; beware of BEES" | "cavity or decay" , "trunk decay" , "codominant leaders" , "included bark" , "large leader or limb decay" , "previous failure root damage" , "root decay" , "beware of BEES" | string | <http://example.org/tree-ops-ext#gid-6> | ||||
C3.8 | T | C8 | R3 | "YES" | true | boolean | YES|NO | "NO" | <http://example.org/tree-ops-ext#gid-6> | ||
C3.9 | T | C9 | R3 | "<Point><coordinates>-122.156299,37.441151</coordinates></Point>" | "<Point><coordinates>-122.156299,37.441151</coordinates></Point>" | xml | <http://example.org/tree-ops-ext#gid-6> |
For brevity, the propertyUrl
is not shown in the table of cell annotations. Where not explicitly set, the value of propertyUrl
defaults to the URI Template (see [[RFC6570]]) #{[column-name]}
, where [column-name]
is the value of the name
property for the column associated with the cell. For example, the value of propertyUrl
for all cells in column C2 ("name": "on_street"
) is http://example.org/tree-ops-ext.csv#on_street
.
Minimal mode output for this example is provided below:
The subject described by each row is explcitly defined using the aboutUrl
property; e.g. the subject of row R1 is http://example.org/tree-ops-ext#gid-1
.
Output for column C1 ({ "name": "GID" }
) is not included as column property suppressOutput
has value true.
Cells C1.7 and C2.7 (rows R1 and R2; column, { "name": "comments" }
) have null
values - no output is included for these cells.
Cell C3.7 (row R3; column, { "name": "comments" }
) contains a sequence of values; the set of values are included in an array.
Standard mode output for this example is provided below:
Table T ({ "url": "http://example.org/tree-ops-ext.csv"}
) has been explicitly identified: { "@id": "<http://exmple.org/tree-ops-ext>"}
.
Common properties and notes specified for table T ({ "url": "http://example.org/tree-ops-ext.csv"}
) are included in the output.
This example uses a single annotated table describing a listing of music events. Each row from the CSV+ file corresponds to three resources; the music event itself, the location where that event occurs and the offer to sell tickets for that event. The goal is to convert the CSV content into schema.org markup that a search engine such as Googlecan use to index music events. Details of how Google expects this information to be structured can be found here.
The input CSV+ file, published at http://example.org/events-listing.csv
, and the associated metadata description http://example.org/events-listing.csv-metadata.json
are provided below:
The CSV to JSON translation is limited to providing one statement, or triple, per column in the table. The target schema.org
markup requires 10 statements to describe each event. As the base CSV+ file contains 5 columns, an additional 5 virtual columns have been added in order to provide for the full complement of statements - including the relationships between the 3 resources (event, location and offer) described by each row of the table. Note that the virtual
property is set to true for these virtual columns.
Furthermore, note that no attempt is made to reconcile between locations or offers that may be associated with more than one event; every row in the table will create both a new location resource and offer resource in addition to the event resource. If considered necessary, applications such as OpenRefine may be used to identify and reconcile duplicate location resources once the JSON output has been generated.
The annotated table generated from parsing the CSV+ file and associated metadata is shown below and provides the basis for the conversion to JSON.
Annotations for the resulting table T, with 10 columns and 2 rows, are shown below:
id | core annotations | annotations | |||
---|---|---|---|---|---|
url | columns | rows | |||
T | http://example.org/events-listing.csv | C1, C2, C3, C4, C5, C6, C7, C8, C9, C10 | R1, R2 |
Annotations for the columns, rows and cells in table T are shown in the tables below.
Column annotations:
id | core annotations | annotations | |||||
---|---|---|---|---|---|---|---|
table | number | source number | cells | name | title | virtual | |
C1 | T | 1 | 1 | C1.1, C2.1 | name | Name | |
C2 | T | 2 | 2 | C1.2, C2.2 | start-date | Start Date | |
C3 | T | 3 | 3 | C1.3, C2.3 | location-name | Location Name | |
C4 | T | 4 | 4 | C1.4, C2.4 | location-address | Location Address | |
C5 | T | 5 | 5 | C1.5, C2.5 | ticket-url | Ticket Url | |
C6 | T | 6 | 6 | C1.6, C2.6 | type-event | true | |
C7 | T | 7 | 7 | C1.7, C2.7 | type-place | true | |
C8 | T | 8 | 8 | C1.8, C2.8 | type-offer | true | |
C9 | T | 9 | 9 | C1.9, C2.9 | location | true | |
C10 | T | 10 | 10 | C1.10, C2.10 | offers | true |
Row annotations:
id | core annotations | |||
---|---|---|---|---|
table | number | source number | cells | |
R1 | T | 1 | 2 | C1.1, C1.2, C1.3, C1.4, C1.5, C1.6, C1.7, C1.8, C1.9, C1.10 |
R2 | T | 2 | 3 | C2.1, C2.2, C2.3, C2.4, C2.5, C2.6, C2.7, C2.8, C2.9, C2.10 |
Cell annotations:
id | core annotations | annotations | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
table | column | row | string value | value | errors | datatype | format | aboutUrl | propertyUrl | valueUrl | |
C1.1 | T | C1 | R1 | "B.B. King" | "B.B. King" | string | <http://example.org/events-listing.csv#event-1> | schema:name | |||
C1.2 | T | C2 | R1 | "2014-04-12T19:30" | 2014-04-12T19:30:00 | datetime | yyyy-MM-ddTHH:mm:ss | <http://example.org/events-listing.csv#event-1> | schema:startDate | ||
C1.3 | T | C3 | R1 | "Lupo’s Heartbreak Hotel" | "Lupo’s Heartbreak Hotel" | string | <http://example.org/events-listing.csv#place-1> | schema:name | |||
C1.4 | T | C4 | R1 | "79 Washington St., Providence, RI" | "79 Washington St., Providence, RI" | string | <http://example.org/events-listing.csv#place-1> | schema:address | |||
C1.5 | T | C5 | R1 | "https://www.etix.com/ticket/1771656" | <https://www.etix.com/ticket/1771656> | anyURI | <http://example.org/events-listing.csv#offer-1> | schema:url | |||
C1.6 | T | C6 | R1 | "" | null | string | <http://example.org/events-listing.csv#event-1> | rdf:type | schema:MusicEvent | ||
C1.7 | T | C7 | R1 | "" | null | string | <http://example.org/events-listing.csv#place-1> | rdf:type | schema:Place | ||
C1.8 | T | C8 | R1 | "" | null | string | <http://example.org/events-listing.csv#offer-1> | rdf:type | schema:Offer | ||
C1.9 | T | C9 | R1 | "" | null | string | <http://example.org/events-listing.csv#event-1> | schema:location | <http://example.org/events-listing.csv#place-1> | ||
C1.10 | T | C10 | R1 | "" | null | string | <http://example.org/events-listing.csv#event-1> | schema:offers | <http://example.org/events-listing.csv#offer-1> | ||
C2.1 | T | C1 | R2 | "B.B. King" | "B.B. King" | string | <http://example.org/events-listing.csv#event-2> | schema:name | |||
C2.2 | T | C2 | R2 | "2014-04-13T20:00" | 2014-04-13T20:00:00 | datetime | yyyy-MM-ddTHH:mm:ss | <http://example.org/events-listing.csv#event-2> | schema:startDate | ||
C2.3 | T | C3 | R2 | "Lynn Auditorium" | "Lynn Auditorium" | string | <http://example.org/events-listing.csv#place-2> | schema:name | |||
C2.4 | T | C4 | R2 | "Lynn, MA, 01901" | "Lynn, MA, 01901" | string | <http://example.org/events-listing.csv#place-2> | schema:address | |||
C2.5 | T | C5 | R2 | "http://frontgatetickets.com/venue.php?id=11766" | <http://frontgatetickets.com/venue.php?id=11766> | anyURI | <http://example.org/events-listing.csv#offer-2> | schema:url | |||
C2.6 | T | C6 | R2 | "" | null | string | <http://example.org/events-listing.csv#event-2> | rdf:type | schema:MusicEvent | ||
C2.7 | T | C7 | R2 | "" | null | string | <http://example.org/events-listing.csv#place-2> | rdf:type | schema:Place | ||
C2.8 | T | C8 | R2 | "" | null | string | <http://example.org/events-listing.csv#offer-2> | rdf:type | schema:Offer | ||
C2.9 | T | C9 | R2 | "" | null | string | <http://example.org/events-listing.csv#event-2> | schema:location | <http://example.org/events-listing.csv#place-2> | ||
C2.10 | T | C10 | R2 | "" | null | string | <http://example.org/events-listing.csv#event-2> | schema:offers | <http://example.org/events-listing.csv#offer-2> |
Minimal mode output for this example is provided below:
Three resources are defined for each row within the table; event, location and offer - therefore three objects are created for each row.
Each column explicitly defines both aboutUrl
and propertyUrl
properties which are inherited by the column's cells.
Columns C6, C7 and C8 ({ "name": "type-event"}
, { "name": "type-place"}
and { "name": "type-offer"}
) define the semantic types of the resources described by each row: schema:MusicEvent
, schema:Place
and schema:Offer
respectively—noting that the use of rdf:type
is converted to the name @type
(as used in [[json-ld]]) by this conversion application.
Column C9 ({ "name": "location"}
) uses the aboutUrl
and valueUrl
to assert the relationship between the event and location resources.
Column C10 ({ "name": "offer"}
) uses the aboutUrl
and valueUrl
to assert the relationship between the event and offer resources.
Standard mode output for this example is provided below:
The resources described by each row are explcitly defined using the aboutUrl
property - in this case three resources per row (event, location and offer); the objects containing the name-values pairs associated with the cell values of a row are related to the object for each subject in that row using the name-value pair with name describes
.
This example is based on Use Case #4 - Publication of public sector roles and salaries and uses three annotated tables published within a table group. Information about senior roles and junior roles within a government department are published in CSV format by each department. These are validated against a centrally published schema to ensure that all the data published by departments is consistent. Additionally, a list of professions is also published centrally, providing a controlled vocabulary against which departmental submissions are validated.
The input CSV+ files and associated metadata descriptions are provided below:
In this example, the resource gov.uk/professions.csv
is identified using a relative URL to host http://example.org
. In reality this resource would be published centrally by government and served from some remote host. Similarly, the metadata description resource metadata.json
would be also be centrally published. Government departments seeking to validate their role and salary data would download a copy of this metadata description and place it, without modification, in the same directory as their CSV+ files whose names MUST match those specified in the metadata description; senior-roles.csv
and junior-roles.csv
.
The table group generated from parsing the CSV+ files and associated metadata is shown below and provides the basis for the conversion to JSON.
Annotations for the table group G and the three tables Ta, Tb, and Tc are shown below.
Table Group annotations:
id | core annotations | annotations | |
---|---|---|---|
resources |
|||
G | Ta, Tb, Tc | @type | TableGroup |
Table annotations:
id | core annotations | annotations | |||||
---|---|---|---|---|---|---|---|
url | columns | rows | primaryKey | suppressOutput | foreignKeys | ||
columns | reference | ||||||
Ta | http://example.org/gov.uk/professions.csv | Ca1 | Ra1, Ra2, Ra3, Ra4 | Ca1 | true | ||
Tb | http://example.org/senior-roles.csv | Cb1, Cb2, Cb3, Cb4, Cb5, Cb6 | Rb1, Rb2 | Cb1 | Cb5 | Cb1 | |
Cb6 | Ca1 | ||||||
Tc | http://example.org/junior-roles.csv | Cc1, Cc2, Cc3, Cc4, Cc5, Cc6, Cc7 | Rc1, Rc2 | Cc1 | Cb1 | ||
Cc7 | Ca1 |
In this example, output for the centrally published list of professions, table Ta (http://example.org/gov.uk/professions.csv
), is not required; only information from the departmental submissions is to be translated to JSON. Note the suppressOutput
annotation on this table.
Annotations for the columns, rows and cells in table T are shown in the tables below.
Column annotations:
id | core annotations | annotations | |||||
---|---|---|---|---|---|---|---|
table | number | source number | cells | name | title | required | |
Ca1 | Ta | 1 | 1 | Ca1.1, Ca2.1, Ca3.1, Ca4.1 | name | Profession | true |
Cb1 | Tb | 1 | 1 | Cb1.1, Cb2.1 | ref | Post Unique Reference | true |
Cb2 | Tb | 2 | 2 | Cb1.2, Cb2.2 | name | Name | |
Cb3 | Tb | 3 | 3 | Cb1.3, Cb2.3 | grade | Grade | |
Cb4 | Tb | 4 | 4 | Cb1.4, Cb2.4 | job | Job Title | |
Cb5 | Tb | 5 | 5 | Cb1.5, Cb2.5 | reportsTo | Reports to Senior Post | |
Cb6 | Tb | 6 | 6 | Cb1.6, Cb2.6 | profession | Profession | |
Cc1 | Tc | 1 | 1 | Cc1.1, Cc2.1 | reportsToSenior | Reporting Senior Post | |
Cc2 | Tc | 2 | 2 | Cc1.2, Cc2.2 | grade | Grade | |
Cc3 | Tc | 3 | 3 | Cc1.3, Cc2.3 | min_pay | Payscale Minimum (£) | |
Cc4 | Tc | 4 | 4 | Cc1.4, Cc2.4 | max_pay | Payscale Maximum (£) | |
Cc5 | Tc | 5 | 5 | Cc1.5, Cc2.5 | job | Generic Job Title | |
Cc6 | Tc | 6 | 6 | Cc1.6, Cc2.6 | number | Number of Posts (FTE) | |
Cc7 | Tc | 7 | 7 | Cc1.7, Cc2.7 | profession | Profession |
Row annotations:
id | core annotations | |||
---|---|---|---|---|
table | number | source number | cells | |
Ra1 | Ta | 1 | 2 | Ca1.1 |
Ra2 | Ta | 2 | 3 | Ca2.1 |
Ra3 | Ta | 3 | 4 | Ca3.1 |
Ra4 | Ta | 4 | 5 | Ca4.1 |
Rb1 | Tb | 1 | 2 | Cb1.1, Cb1.2, Cb1.3, Cb1.4, Cb1.5, Cb1.6 |
Rb2 | Tb | 2 | 3 | Cb2.1, Cb2.2, Cb2.3, Cb2.4, Cb2.5, Cb2.6 |
Rc1 | Tc | 1 | 2 | Cc1.1, Cc1.2, Cc1.3, Cc1.4, Cc1.5, Cc1.6, Cc1.7 |
Rc2 | Tc | 2 | 3 | Cc2.1, Cc2.2, Cc2.3, Cc2.4, Cc2.5, Cc2.6, Cc2.7 |
Cell annotations:
id | core annotations | annotations | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
table | column | row | string value | value | errors | datatype | aboutUrl | propertyUrl | valueUrl | ||
Ca1.1 | Ta | Ca1 | Ra1 | "Finance" | "Finance" | string | |||||
Ca2.1 | Ta | Ca1 | Ra2 | "Information Technology" | "Information Techology" | string | |||||
Ca3.1 | Ta | Ca1 | Ra3 | "Operational Delivery" | "Operational Delivery" | string | |||||
Ca4.1 | Ta | Ca1 | Ra4 | "Policy" | "Policy" | string | |||||
Cb1.1 | Tb | Cb1 | Rb1 | "90115" | "90115" | string | <http://example.org/senior-roles.csv#post-90115> | dc:identifier | |||
Cb1.2 | Tb | Cb2 | Rb1 | "Steve Egan" | "Steve Egan" | string | <http://example.org/senior-roles.csv#post-90115> | foaf:name | |||
Cb1.3 | Tb | Cb3 | Rb1 | "SCS1A" | "SCS1A" | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/grade> | |||
Cb1.4 | Tb | Cb4 | Rb1 | "Deputy Chief Executive" | "Deputy Chief Executive" | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/job> | |||
Cb1.5 | Tb | Cb5 | Rb1 | "90334" | "90334" | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/reportsTo> | <http://example.org/senior-roles.csv#post-90334> | ||
Cb1.6 | Tb | Cb6 | Rb1 | "Finance" | "Finance" | string | <http://example.org/senior-roles.csv#post-90115> | <http://example.org/def/profession> | |||
Cb2.1 | Tb | Cb1 | Rb2 | "90334" | "90334" | string | <http://example.org/senior-roles.csv#post-90334> | dc:identifier | |||
Cb2.2 | Tb | Cb2 | Rb2 | "Sir Alan Langlands" | "Sir Alan Langlands" | string | <http://example.org/senior-roles.csv#post-90334> | foaf:name | |||
Cb2.3 | Tb | Cb3 | Rb2 | "SCS4" | "SCS4" | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/grade> | |||
Cb2.4 | Tb | Cb4 | Rb2 | "Chief Executive" | "Chief Executive" | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/job> | |||
Cb2.5 | Tb | Cb5 | Rb2 | "xx" | null | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/reportsTo> | |||
Cb2.6 | Tb | Cb6 | Rb2 | "Policy" | "Policy" | string | <http://example.org/senior-roles.csv#post-90334> | <http://example.org/def/profession> | |||
Cc1.1 | Tc | Cc1 | Rc1 | "90115" | "90115" | string | <http://example.org/def/reportsTo> | <http://example.org/senior-roles.csv#post-90115> | |||
Cc1.2 | Tc | Cc2 | Rc1 | "4" | "4" | string | <http://example.org/def/grade> | ||||
Cc1.3 | Tc | Cc3 | Rc1 | "17426" | 17426 | integer | <http://example.org/def/min_pay> | ||||
Cc1.4 | Tc | Cc4 | Rc1 | "20002" | 20002 | integer | <http://example.org/def/max_pay> | ||||
Cc1.5 | Tc | Cc5 | Rc1 | "Administrator" | "Administrator" | string | <http://example.org/def/job> | ||||
Cc1.6 | Tc | Cc6 | Rc1 | "8.67" | 8.67 | number | <http://example.org/def/number-of-posts> | ||||
Cc1.7 | Tc | Cc7 | Rc1 | "Operational Delivery" | "Operational Delivery" | string | <http://example.org/def/profession> | ||||
Cc2.1 | Tc | Cc1 | Rc2 | "90115" | "90115" | string | <http://example.org/def/reportsTo> | <http://example.org/senior-roles.csv#post-90115> | |||
Cc2.2 | Tc | Cc2 | Rc2 | "5" | "5" | string | <http://example.org/def/grade> | ||||
Cc2.3 | Tc | Cc3 | Rc2 | "19546" | 19546 | integer | <http://example.org/def/min_pay> | ||||
Cc2.4 | Tc | Cc4 | Rc2 | "22478" | 22478 | integer | <http://example.org/def/max_pay> | ||||
Cc2.5 | Tc | Cc5 | Rc2 | "Administrator" | "Administrator" | string | <http://example.org/def/job> | ||||
Cc2.6 | Tc | Cc6 | Rc2 | "0.5" | 0.5 | number | <http://example.org/def/number-of-posts> | ||||
Cc2.7 | Tc | Cc7 | Rc2 | "Operational Delivery" | "Operational Delivery" | string | <http://example.org/def/profession> |
Notice that valueUrl is not specified for cell Cb2.5 because the cell value is null
and the virtual
property of column Cb5 is not specified.
Minimal mode output for this example is provided below:
Prefixes defined within the RDFa 1.1 Initial Context ([[rdfa-core]]) are not expanded; e.g. dc:
for <http://purl.org/dc/terms/>.
Output for table Ta ({ "url": "http://example.org/gov.uk/professions.csv" }
) is not included as property suppressOutput
has value true.
The propertyUrl
is specified for all cells in tables Tb and Tc.
Columns Cb5 and Cc1 ({ "name": "reportsTo" }
and { "name": "reportsToSenior" }
) use the aboutUrl
, propertyUrl
and valueUrl
properties to assert the relationship between the given post and the senior post it reports to for the cells therein. However, since senior posts and junior posts are described in different tables so it is not possible to create nested objects for this particular case.
Standard mode output for this example is provided below: