Building JSON-LD APIs: Best Practices

Developers share a common problem: they want a simple but extensible way to create an API for a web service that gets the job done, doesn't design them into a corner, and allows developers to easily interact with their service without reinventing the wheel. JSON-LD [[json-ld]] has become an important solution, as it bridges the gap between formally described data and the more colloquial JSON interfaces used in APIs from numerous providers. This guide attempts to define certain best practices for publishing data using JSON-LD, and interacting with such services.

Introduction

Coming up with a data format for your API is a common problem. It can be hard to choose between different data representations and to decide what names to pick, and even harder if you want to leave room for extensibility. How do you make all these decisions? How do you make your API easy to use so people can use short strings to reference common things, but URLs to enable people to come up with their own so it isn't limiting? How can you make it easy for other people to add their own data in and make it interoperable? How do you consume data from other similar apps? There are technologies that can help you do this. EXAMPLES. Now, it isn't perfect – sometimes it won't solve your problem, but it could solve a lot of them.

The use of JSON on the web has grown immensely in the last decade, particularly with the explosion of APIs that eschew XML in favor of what is considered to be a more developer friendly format which is directly compatible with JavaScript. As a result, different sites have chosen their own proprietary representations for interacting with their sites, sometimes described using frameworks such as [[swagger]] which imply a particular URI composition for interacting with their services. This practice leads to vendor-specific semantic silos, where the meaning of a particular JSON document makes sense only by programming directly to the API documentation for a given service.

show examples from GitHub, Twitter, …?

As services grow, they often introduce incompatible changes leading to a Version 2 or Version 3 of their API, requiring developers to update client code to properly handle JSON documents. In many cases, even small changes can lead to incompatibilities. Additionally, composing information from multiple APIs becomes problematic, due to namespace or document format conventions that may differ between API endpoints. Moreover, the same principles are often repeated across different endpoints using arbitrary identifiers (name, email, website, etc.); the community needs to learn to stop repeating itself (DRY concept) and reuse common conventions, although this does not necessarily have to mean using exactly the same identifiers within the JSON itself (see JSON-LD Context).

This Note proposes to outline a number of best practices for API designers and JSON developers based on the principles of separation of data model from syntax, the use of discoverable identifiers describing document contents, and general organizing principles that allow documents to be machine understandable (that is, interpreted as JSON-LD using Linked Data, RDF, and RDFS vocabulary and data model principles).

Key among these is the notion of vocabulary re-use, so that each endpoint does not need to separately describe the properties and structure of their JSON documents. Schema.org provides a great example of doing this, and includes an extension mechanism that may already be familiar to API designers.

This note expands on Data on the Web Best Practices [[dwbp]].

Notes

dlongley: all of these things are just common dev things ... and if people saw JSON-LD as a solution to those problems instead of this thing that they don't understand at all and is maybe related to something scary they heard of once then we could get people using it and saying "oh hey, this will solve our problem, let's use it" ... and then we can finally get things modeled properly and hooked up to everything else (be that RDFa stuff or whatever!) we need that to be the document we point people to – and it needs to be a super popular primer that shows how to use this tech to solve basic dev data definition and integration problems, no Linked Data nothing. once they understand that first part ... that it will just help with interop and figuring out how to define things in a nice way ... the rest will just be gravy.

gkellogg: Sure, I’ve designed some JSON APIs that use JSON-LD but in a way that makes it accessible for JSON programmers not needing to know or care about the “-LD” part. Note that schema.org has sort of tackled the problem, themselves, with their Action class, with examples on how to poke things using JSON-LD as a descriptor embedded in websites or emails.

The key is that payloads need to look like reasonable JSON, but also be self describing as RDF for those who care. Affording REST operations is where it gets more challenging (see Hydra).

dlongley: I don't think we'd involve REST (or HTTP) at all in this document. This is just about the actual payloads/messages being passed around, not the protocol it travels in. This is just about telling people how they ought to model their data in a sane way and how others can extend that model easily. We don't want to focus on whether these messages are being passed to a function, travelling over HTTP, or using some other protocol.

gkellogg: We need to scope out who the target audience is, what the use cases are, and what kind of document we want to produce (i.e., spec for a specific API, general best practices on designing APIs for JSON developers using JSON-LD, or what).

Also, note that Markus is engaged in his long and drawn-out problem of “solving” the self describing API mechanism with Hydra. This becomes more important when payloads really describe linked data and you need to understand how to communicate REST operations, performing paging, and deal with collections from an API.

dlongley: So this came up because of what people are doing with the payment request API in the Web Payments WG right now. They are looking at designing what information gets passed into a JS API for requesting payment. That information is pretty much modeled as a payment request message that can list a number of different payment methods that a merchant supports along with price information, etc.

The basic idea for the API is that the browser can read the supported payment methods and show the user a list of payment apps they have installed that match those methods. Then the browser hands the message off to those apps and they can do custom things based on the payment methods.

The group is spending a lot of time trying to decide things like whether or not payment method IDs will be short strings or URLs -- and if they are both, how we could support them or map them as needed. The group also isn't sure how to make this stuff extensible so people can define their own payment methods and put whatever machine-readable information they want to into payment-method specific sections of the message.

It never dawned on them that there's a technology that can help them with this stuff already.

We need to show people that there's a technology that can help them solve these problems in an interoperable, decentralized way -- that people can model their information and use a syntax that can carry all of that stuff together in a single message that doesn't require them to update the spec whenever something new comes along.

This is just a general modeling problem for people who do all of their modeling with JSON/JavaScript Objects. Another, related, general problem is how people go about writing applications that consume JSON data from multiple similar sources. Demonstrating to them the power of JSON-LD compaction by showing them that they can recompact data from N sources to a context their application understands would be very useful.

We want this to be a semi-informal, colloquial document that is just like "Hey, a common problem devs have to solve is deciding what to name their properties in the JSON they pass to their APIs. Sometimes you want to even let other people extend your JSON -- but how can you do that in a sane way? Sometimes you want your application to consume JSON that someone else wrote, but it's a little different. How can you solve that problem in a general way? There's a technology that can help you do this and it's called JSON-LD."

gkellogg: I don’t think we can do this without talking about the strong data model RDF provides, and how JSON is a projection of that model. This includes the ability for the publisher to shape the data for the intended consumers, but also allows consumers to re-shape the data for their own purposes. Common things are simple, complex things are possible.

Also, any discussion of API needs to consider REST and container semantics. This could include a survey of different techniques, but really should include LDP and Hydra as well.

Seems to me this is a document describing different use cases and best practices for managing them. But, it does need a focus, which I presume to be either payments, products, or credentials.

Vocabulary is important, and we can’t ignore schema.org, and its Action sub-vocabulary in particular. But it clearly needs to consider application-specific vocabularies for the important use cases.

What group would likely publish this note? There is a Data on the Web Best Practices WG, which this would seem to suit.

Resource Representation

Publish data using developer friendly JSON

JSON [[json]] is the most popular format for publishing data through APIs; developers like it, it is easy to parse, and it is supported natively in most programming languages.

Use well-known identifiers when describing data

By sticking to basic JSON data expression, and providing a JSON-LD Context, all keys used within a JSON document can have unambiguous meaning, as they bind to URLs which describe their meaning.
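As a minimal sketch of what this binding looks like, the Python snippet below builds an ordinary JSON payload whose keys are mapped to schema.org URLs by an inline context. The payload values and the `resolve_keys` helper are hypothetical, and the helper handles only a flat context with plain string term definitions; it is not a JSON-LD processor, and a real application would rely on a conforming library instead.

```python
import json

# Hypothetical payload: ordinary JSON keys plus a context binding each key to a URL.
doc = {
    "@context": {
        "name": "http://schema.org/name",
        "email": "http://schema.org/email",
    },
    "name": "Jane Doe",
    "email": "jane@example.org",
}


def resolve_keys(document):
    """Replace each key with the URL its context maps it to.

    Illustration only: supports a flat, inline context with plain string
    term definitions, which is a small subset of JSON-LD.
    """
    context = document.get("@context", {})
    return {
        context.get(key, key): value
        for key, value in document.items()
        if key != "@context"
    }


resolved = resolve_keys(doc)
print(json.dumps(resolved, indent=2))
```

A client that never looks at the context still sees plain `name` and `email` keys; a client that does can tell that they mean the same thing as the identically bound keys of another service.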

Cache JSON-LD Contexts

While most uses of JSON-LD should not require a client to change the data representation, JSON-LD does allow the use of various algorithms to re-shape a JSON-LD document. These require the use of the JSON-LD Context, which is typically represented using a link to a remote document. Because it is remote, processing time can be severely impacted by the time it takes to retrieve this context. Services providing a JSON-LD Context SHOULD set HTTP cache-control headers to allow liberal caching of such contexts, and clients SHOULD attempt to use a locally cached version of these documents. Typically, libraries used to process JSON-LD documents should do this for you. (See also [[json-ld-best-practice-caching]].)
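The client side of this advice can be sketched as a small in-memory cache. Everything here is an assumption made for illustration: the `CachingContextLoader` class, the example URL, and the `fake_fetch` stand-in for an HTTP request are hypothetical, and the sketch ignores cache expiry, which a production client would take from the HTTP cache-control headers.

```python
import json


class CachingContextLoader:
    """Fetch each remote context at most once per process (hypothetical helper)."""

    def __init__(self, fetch):
        # `fetch` is any callable that takes a URL and returns the context as text.
        self._fetch = fetch
        self._cache = {}

    def load(self, url):
        # Only go to the network when the context has not been seen before.
        if url not in self._cache:
            self._cache[url] = json.loads(self._fetch(url))
        return self._cache[url]


# Stand-in for an HTTP request so the sketch runs offline; it records every call.
calls = []


def fake_fetch(url):
    calls.append(url)
    return '{"@context": {"name": "http://schema.org/name"}}'


loader = CachingContextLoader(fake_fetch)
first = loader.load("https://example.org/context.jsonld")
second = loader.load("https://example.org/context.jsonld")
print(len(calls))
```

Passing the fetch function in as a parameter keeps the cache independent of any particular HTTP library and makes it easy to test.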

Use a top-level object

JSON documents may be in the form of an object, or an array of objects. For most purposes, developers need a single entrypoint, so the JSON should be in the form of a single top-level object.
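One way to follow this practice, sketched under assumptions: the `members` key, the example URLs, and the `as_top_level_object` helper below are hypothetical names chosen for illustration, not terms defined by JSON-LD.

```python
import json

# Hypothetical list of results, for example rows returned by a database query.
items = [
    {"id": "https://example.org/people/1", "name": "Jane Doe"},
    {"id": "https://example.org/people/2", "name": "John Roe"},
]


def as_top_level_object(members, context_url):
    """Wrap an array of results in a single object so clients have one entry point."""
    return {
        "@context": context_url,
        "members": members,
    }


response = as_top_level_object(items, "https://example.org/context.jsonld")
print(json.dumps(response, indent=2))
```

The wrapper also gives the service a place to attach the context once, and to add further top-level keys later without changing the shape clients already depend on.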

Use native values

When possible, property values should use native JSON datatypes such as numbers (integer, decimal and floating point) and booleans (true and false).
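The difference can be illustrated with a short sketch; the product record and its key names are hypothetical.

```python
import json

# Discouraged: every value is a string, so clients must guess and convert types.
stringly_typed = {"name": "Widget", "price": "19.99", "quantity": "3", "inStock": "true"}

# Preferred: numbers and booleans use native JSON datatypes.
native = {"name": "Widget", "price": 19.99, "quantity": 3, "inStock": True}

# A round trip through JSON preserves the native types without extra parsing code.
serialized = json.dumps(native)
decoded = json.loads(serialized)
print(serialized)
```

With native values, a consumer can compare `decoded["quantity"]` numerically or test `decoded["inStock"]` directly, instead of writing per-field string conversions.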

Assume arrays are unordered

JSON specifies that the values in an array are ordered; however, in many cases arrays are also used for values which are unordered. Unless specified within the JSON-LD Context, multiple array values should be presumed to be unordered. (See Lists and Sets in [[json-ld]].)
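In JSON-LD, a term is marked as ordered by giving it `"@container": "@list"` in the context. The sketch below assumes a hypothetical vocabulary at `http://example.org/vocab#` and a hypothetical `ordered_terms` helper that inspects only a flat, inline context; it shows which arrays a consumer may treat as ordered and is not a substitute for a JSON-LD processor.

```python
import json

doc = {
    "@context": {
        # No container declared: values of "tags" are presumed unordered.
        "tags": "http://example.org/vocab#tags",
        # Declared as a list: the order of "steps" is significant.
        "steps": {"@id": "http://example.org/vocab#steps", "@container": "@list"},
    },
    "tags": ["json", "linked-data"],
    "steps": ["mix", "bake", "serve"],
}


def ordered_terms(document):
    """Return the terms whose context definition declares an ordered list container."""
    context = document.get("@context", {})
    return sorted(
        term
        for term, definition in context.items()
        if isinstance(definition, dict) and definition.get("@container") == "@list"
    )


print(ordered_terms(doc))
```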

Provide one or more types for JSON objects

Identify objects with a unique identifier

External references should use datatyped term

Referenced objects provided inline should be

Introduction

Terminology

Notes

Resource Representation

Serializing Large Collections

Reuse Vocabularies

Describe API affordances