Health Check Response Format for HTTP APIs


This document proposes a service health check response format for HTTP APIs.

1. Introduction

The vast majority of modern APIs driving data to web and mobile applications use HTTP [RFC7230] as their protocol. The health and uptime of these APIs determine the availability of the applications themselves. In distributed systems built with a number of APIs, understanding the health status of the APIs and making corresponding decisions, such as failover or circuit breaking, are essential for providing highly available solutions.

A wide variety of operational software relies on the ability to read the health check responses of APIs. However, there is currently no standard for health check responses, so most applications either rely on the basic information conveyed by HTTP status codes [RFC7231] or use task-specific formats.

Using task-specific or application-specific formats creates significant challenges, preventing any meaningful interoperability across different implementations and between different tools.

Standardizing a format for health checks can provide any of a number of benefits, including:

This document defines a JSON-based [RFC8259] “health check” format that APIs can use as a standard point for the health information they offer. Having a well-defined format for this purpose promotes good practice and tooling.

2. Notational Conventions

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in [RFC2119].

3. API Health Response

An API Health Response Format (or, interchangeably, “health check response”) uses the format described in JSON [RFC8259] and has the media type “application/”.

Its content consists of a single mandatory root field (“status”) and several optional fields:

For example:

  GET /health HTTP/1.1
  Accept: application/

  HTTP/1.1 200 OK
  Content-Type: application/
  Cache-Control: max-age=3600
  Connection: close

  {
    "status": "pass",
    "version": "1",
    "release_id": "1.2.2",
    "uptime": "1209600.245",
    "connections": 25,
    "notes": [""],
    "output": "",
    "details": [
      {
        "componentId": "dfd6cf2b-1b6e-4412-a0b8-f6f7797a60d2",
        "componentName": "Cassandra",
        "componentType": "datastore",
        "metricName": "responseTime",
        "metricValue": 250,
        "metricUnit": "milliseconds",
        "status": "pass",
        "time": "2018-01-17T03:36:48Z",
        "output": ""
      },
      {
        "componentId": "dfd6cf2b-1b6e-4412-a0b8-f6f7797a60d2",
        "componentName": "Cassandra",
        "type": "datastore",
        "metricName": "connections",
        "metricValue": 75,
        "status": "warn",
        "time": "2018-01-17T03:36:48Z",
        "output": ""
      },
      {
        "componentId": "6fd416e0-8920-410f-9c7b-c479000f7227",
        "componentName": "cpu",
        "type": "system",
        "metricName": "utilization",
        "metricValue": 85,
        "metricUnit": "percent",
        "status": "warn",
        "time": "2018-01-17T03:36:48Z",
        "output": ""
      },
      {
        "componentId": "6fd416e0-8920-410f-9c7b-c479000f7227",
        "componentName": "memory",
        "type": "system",
        "node": 1,
        "metricName": "utilization",
        "metricValue": 8.5,
        "metricUnit": "gb",
        "status": "warn",
        "time": "2018-01-17T03:36:48Z",
        "output": ""
      },
      {
        "componentId": "6fd416e0-8920-410f-9c7b-c479000f7227",
        "componentName": "memory",
        "node": 2,
        "type": "system",
        "metricName": "utilization",
        "metricValue": 5500,
        "metricUnit": "mb",
        "status": "pass",
        "time": "2018-01-17T03:36:48Z",
        "output": ""
      }
    ],
    "links": [
      {"rel": "about", "uri": ""},
      {"rel": "", "uri": ""}
    ],
    "serviceID": "f03e522f-1f44-4062-9b55-9587f91c9c41",
    "description": "health of authz service"
  }
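As a sketch of how a consumer might check the basic shape of such a response, the Python below parses a body and verifies that the mandatory “status” field is present. It assumes the status values “pass”, “warn”, and “fail”; only the first two appear in the example above, so “fail” is an assumption, as is the choice to require “details” and “links”, when present, to be arrays of objects. The function name parse_health_response is hypothetical.

```python
import json
from typing import Any

# Assumed status values: "pass" and "warn" appear in the example response;
# "fail" is an assumption added for illustration.
KNOWN_STATUSES = {"pass", "warn", "fail"}


def parse_health_response(text: str) -> dict[str, Any]:
    """Parse a health check response body and check its basic shape.

    Raises ValueError if the body is not a JSON object, if the mandatory
    "status" field is missing or unrecognized, or if "details"/"links"
    are present but are not arrays of objects.
    """
    doc = json.loads(text)  # raises json.JSONDecodeError (a ValueError)
    if not isinstance(doc, dict):
        raise ValueError("health check response must be a JSON object")
    status = doc.get("status")
    if not isinstance(status, str) or status not in KNOWN_STATUSES:
        raise ValueError(f"missing or unrecognized status: {status!r}")
    # Optional array fields: if present, every entry must be an object.
    for field in ("details", "links"):
        entries = doc.get(field, [])
        if not isinstance(entries, list) or not all(isinstance(e, dict) for e in entries):
            raise ValueError(f'"{field}" must be an array of objects')
    return doc
```

Because json.JSONDecodeError is a subclass of ValueError, a caller can handle both malformed JSON and a malformed health document with a single `except ValueError` clause.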

4. Details Object

The following fields MAY appear in the details objects of the response, and the following rules SHOULD be applied to them.
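This section does not state how the statuses of individual details objects relate to the top-level “status”. One plausible rule, shown here only as an assumption, is to report the most severe component status, ordering “pass” below “warn” below “fail” (with “fail” itself assumed, since it does not appear in the example):

```python
from collections.abc import Iterable, Mapping

# Assumed severity ordering; the document does not define an aggregation rule.
SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


def worst_status(details: Iterable[Mapping[str, object]]) -> str:
    """Return the most severe "status" among details objects ("pass" if none).

    Entries without a recognized status are treated as "fail" so that
    malformed component reports are not silently ignored (a design choice
    of this sketch, not a rule from the document).
    """
    worst = "pass"
    for entry in details:
        status = entry.get("status")
        if not isinstance(status, str) or status not in SEVERITY:
            status = "fail"
        if SEVERITY[status] > SEVERITY[worst]:
            worst = status
    return worst
```

In the example response the top-level “status” is “pass” even though several details report “warn”, so implementations may well use a different rule; this roll-up is one option, not the document’s.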

5. Security Considerations

Care needs to be exercised when reporting health information: malicious actors could use it to orchestrate attacks. In some cases, health check endpoints may need to require authentication and enforce role-based access control.
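One way to act on this advice is to return a minimal view to unprivileged callers. The sketch below assumes a hypothetical role name, health:read-details, and a policy in which callers without it see only the mandatory “status” field; neither the role nor the policy comes from this document.

```python
from typing import Any

# Hypothetical role granting access to the full health check response.
DETAIL_ROLE = "health:read-details"


def redact_for_caller(response: dict[str, Any], roles: set[str]) -> dict[str, Any]:
    """Return the view of a health check response that a caller may see."""
    if DETAIL_ROLE in roles:
        return dict(response)  # shallow copy of the full response
    # Minimal view: enough for a load balancer's pass/fail decision, while
    # hiding component names, versions, metrics, and links.
    return {"status": response["status"]}
```

Keeping “status” in the minimal view preserves the basic use case of failover decisions while withholding the detail that is most useful for reconnaissance.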

6. IANA Considerations

6.1. Media Type Registration

TODO: application/ is being submitted for registration per [RFC6838]

Appendix A. Acknowledgements

Thanks to Mike Amundsen, Erik Wilde, Justin Bachorik and Randall Randall for their suggestions and feedback, and to Mark Nottingham for his blueprint for easily authoring RFCs.

Appendix B. Creating and Serving Health Responses

When making a health check endpoint available, there are a few things to keep in mind:
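A minimal sketch of serving a health check response with Python’s standard-library http.server follows. Several pieces are assumptions: the media type is elided in this document, so application/json stands in for it; mapping “fail” (an assumed status value) to 503 (Service Unavailable) and “pass”/“warn” to 200 is illustrative; and current_health is a hypothetical probe. The Cache-Control max-age header lets clients reuse the response for its freshness lifetime.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in media type: the registered type is elided in this document.
MEDIA_TYPE = "application/json"
# Assumed mapping from health status to HTTP status code.
HTTP_CODE = {"pass": 200, "warn": 200, "fail": 503}


def render_health(health: dict, max_age: int = 60) -> tuple[int, dict[str, str], bytes]:
    """Serialize a health response into (HTTP status code, headers, body)."""
    body = json.dumps(health).encode("utf-8")
    headers = {
        "Content-Type": MEDIA_TYPE,
        "Cache-Control": f"max-age={max_age}",
        "Content-Length": str(len(body)),
    }
    # Unknown or missing status is reported as unavailable.
    return HTTP_CODE.get(health.get("status"), 503), headers, body


def current_health() -> dict:
    # Hypothetical probe; a real service would check its dependencies here.
    return {"status": "pass", "version": "1"}


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        code, headers, body = render_health(current_health())
        self.send_response(code)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), HealthHandler).serve_forever()
```

Keeping render_health free of I/O makes the response format easy to test separately from the server.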

Appendix C. Consuming Health Check Responses

Clients might use health check responses in a variety of ways.

Note that the health check response is a “living” document; links from the health check response MUST NOT be assumed to be valid beyond the freshness lifetime of the health check response, as per HTTP’s caching model [RFC7234].

As a result, clients ought to cache the health check response (as per [RFC7234]) to avoid fetching it before every interaction, which would otherwise be required.
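The caching advice above can be sketched as a small client-side cache that refetches the health check response only after its freshness lifetime expires. This is a simplified model of [RFC7234]: it honors only the token form of the max-age directive, ignores the Age header and other directives such as no-cache, and treats a response without max-age as immediately stale. The fetch callable, returning the Cache-Control header value and the parsed body, is hypothetical.

```python
import re
import time
from typing import Callable, Optional

# Hypothetical fetch: returns (Cache-Control header value, parsed body).
Fetch = Callable[[], tuple[str, dict]]

_MAX_AGE = re.compile(r"(?:^|,)\s*max-age\s*=\s*(\d+)\s*(?:,|$)", re.IGNORECASE)


def parse_max_age(cache_control: str) -> int:
    """Return the max-age directive in seconds, or 0 if it is absent."""
    match = _MAX_AGE.search(cache_control or "")
    return int(match.group(1)) if match else 0


class HealthCache:
    """Cache a health check response for its (simplified) freshness lifetime."""

    def __init__(self, fetch: Fetch, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._clock = clock
        self._body: Optional[dict] = None
        self._expires = 0.0

    def get(self) -> dict:
        now = self._clock()
        if self._body is None or now >= self._expires:
            cache_control, body = self._fetch()
            self._body = body
            self._expires = now + parse_max_age(cache_control)
        return self._body

    def invalidate(self) -> None:
        """Drop the cached copy so the next get() fetches a fresh one."""
        self._body = None
```

Injecting the clock keeps the cache deterministic under test; production code can rely on the default time.monotonic, which is unaffected by wall-clock adjustments.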

Likewise, a client encountering a 404 (Not Found) on a link is encouraged to obtain a fresh copy of the health check response, to ensure that it is up to date.
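The 404 recovery described above might look like the following. get_health and fetch_link are hypothetical callables standing in for a cached health check client (called with True to force a fresh copy) and an HTTP request returning a status code. The sketch retries exactly once with a freshly fetched response, so a persistently missing resource still surfaces as a 404 instead of looping.

```python
from typing import Callable, Optional


def find_link(health: dict, rel: str) -> Optional[str]:
    """Return the "uri" of the first link with the given "rel", if any."""
    for link in health.get("links", []):
        if link.get("rel") == rel:
            return link.get("uri")
    return None


def follow_link(
    get_health: Callable[[bool], dict],
    fetch_link: Callable[[str], int],
    rel: str,
) -> int:
    """Follow a link from the health check response, refreshing it once on 404."""
    uri = find_link(get_health(False), rel)
    # A missing or empty "uri" is treated like a 404.
    status = fetch_link(uri) if uri else 404
    if status == 404:
        # The cached response may be stale: fetch a fresh copy and retry once.
        uri = find_link(get_health(True), rel)
        status = fetch_link(uri) if uri else 404
    return status
```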

