
Speech to Text

Service Overview

The IBM Speech to Text service provides a Representational State Transfer (REST) Application Programming Interface (API) that enables you to add IBM’s speech transcription capabilities to your applications. The service also supports an asynchronous HTTP interface for transcribing audio via non-blocking calls. The service transcribes speech from various languages and audio formats to text with low latency. The service supports transcription of the following languages: Brazilian Portuguese, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, UK English, and US English. For most languages, the service supports two sampling rates, broadband and narrowband.

API Overview

The Speech to Text service provides the following endpoints:

  • /v1/models returns information about the models (languages and sampling rates) available for transcription.

  • /v1/sessions provides a collection of methods that let a client maintain a long, multi-turn exchange, or session, with the service, or establish multiple parallel conversations with a particular instance of the service.

  • /v1/recognize (sessionless) includes a single method that provides a simple means of transcribing audio without the overhead of establishing and maintaining a session, but it lacks some of the capabilities available with sessions.

  • /v1/register_callback (asynchronous) offers a single method that registers, or white-lists, a callback URL for use with methods of the asynchronous HTTP interface.

  • /v1/recognitions (asynchronous) provides a set of non-blocking methods for submitting, querying, and deleting jobs for recognition requests with the asynchronous HTTP interface.

API Usage

The following general information pertains to the transcription of audio:

  • You can pass the audio to be transcribed as a one-shot delivery or in streaming mode. With one-shot delivery, you pass all of the audio data to the service at one time. With streaming mode, you send audio data to the service in chunks over a persistent connection. If your data consists of multiple parts, you must stream the data. To use streaming, you must pass the Transfer-Encoding request header with a value of chunked. Both forms of data transmission impose a limit of 100 MB of total data for transcription. A brief sketch of both transmission modes follows this list.

  • You can use methods of the session-based, sessionless, or asynchronous HTTP interfaces to pass audio data to the service. All interfaces let you send the data via the body of the request; the session-based and sessionless methods also let you pass data in the form of one or more audio files as multipart form data. With the former approach, you control the transcription via a collection of request headers and query parameters. With the latter, you control the transcription primarily via JSON metadata sent as form data.

  • The service also offers a WebSocket interface as an alternative to its HTTP interfaces. The WebSocket interface supports efficient implementation, lower latency, and higher throughput. The interface establishes a persistent connection with the service, eliminating the need for session-based calls from the HTTP interface.

  • By default, all Watson services log requests and their results. Data is collected only to improve the Watson services. If you do not want to share your data, set the header parameter X-Watson-Learning-Opt-Out to true for each request. Data is collected for any request that omits this header.
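
For illustration, the following sketch shows both transmission modes with a sessionless recognition request, using Python and the third-party requests library; the host URL, credentials, and audio file name are placeholder assumptions rather than values defined by this documentation. Passing a generator as the request body is what makes requests send the data with Transfer-Encoding: chunked, and the X-Watson-Learning-Opt-Out header from the last point above is set on both calls.

import requests

URL = "https://stream.watsonplatform.net/speech-to-text/api/v1/recognize"  # assumed host; substitute your instance
AUTH = ("username", "password")            # placeholder service credentials
HEADERS = {
    "Content-Type": "audio/flac",
    "X-Watson-Learning-Opt-Out": "true",   # opt out of request logging
}

# One-shot delivery: the complete audio file is sent as a single request body.
with open("audio.flac", "rb") as audio:
    print(requests.post(URL, headers=HEADERS, data=audio.read(), auth=AUTH).json())

# Streaming mode: passing a generator as the body makes requests send the data
# with the Transfer-Encoding: chunked header.
def audio_chunks(path, chunk_size=8192):
    with open(path, "rb") as audio:
        while True:
            chunk = audio.read(chunk_size)
            if not chunk:
                break
            yield chunk

print(requests.post(URL, headers=HEADERS, data=audio_chunks("audio.flac"), auth=AUTH).json())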

For more information about using the Speech to Text service and the various interfaces it supports, see Using the Speech to Text service.

models

Retrieves the models available for the service
GET /speech-to-text/api/v1/models

Returns a list of all models available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models
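
A minimal request sketch in Python with the requests library (the host and credentials are placeholder assumptions):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"   # assumed host
response = requests.get(BASE + "/v1/models", auth=("username", "password"))
for model in response.json()["models"]:
    print(model["name"], model["language"], model["rate"])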
Response  200

OK.

Schema
{
  "description": "Information about the available models.",
  "required": [
    "models"
  ],
  "properties": {
    "models": {
      "type": "array",
      "items": {
        "required": [
          "name",
          "rate",
          "url",
          "language",
          "description"
        ],
        "properties": {
          "name": {
            "description": "Name of the model for use as an identifier in calls to the service (for example, `en-US_BroadbandModel`).",
            "type": "string"
          },
          "language": {
            "description": "Language identifier for the model (for example, `en-US`).",
            "type": "string"
          },
          "rate": {
            "description": "Sampling rate (minimum acceptable rate for audio) used by the model in Hertz.",
            "type": "integer",
            "format": "int32"
          },
          "url": {
            "description": "URI for the model.",
            "type": "string"
          },
          "sessions": {
            "description": "URI for the model for use with the POST `/v1/sessions` method.",
            "type": "string"
          },
          "description": {
            "description": "Brief description of the model.",
            "type": "string"
          }
        }
      }
    }
  }
}
Response  406

Not Acceptable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  415

Unsupported Media Type.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}

Retrieves information about the model
GET /speech-to-text/api/v1/models/{model_id}

Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.

Example URI

GET /speech-to-text/api/v1/models/model_id
URI Parameters
model_id
string (required) 

The identifier of the desired model in the form of its name from the output of GET /v1/models.
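
A sketch of the same call for a single model (the host, credentials, and the chosen model name are placeholder assumptions; the model name comes from the output of GET /v1/models):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"   # assumed host
model = requests.get(BASE + "/v1/models/en-US_BroadbandModel",
                     auth=("username", "password")).json()
print(model["rate"], "-", model["description"])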

Response  200

OK.

Schema
{
  "required": [
    "name",
    "rate",
    "url",
    "language",
    "description"
  ],
  "properties": {
    "name": {
      "description": "Name of the model for use as an identifier in calls to the service (for example, `en-US_BroadbandModel`).",
      "type": "string"
    },
    "language": {
      "description": "Language identifier for the model (for example, `en-US`).",
      "type": "string"
    },
    "rate": {
      "description": "Sampling rate (minimum acceptable rate for audio) used by the model in Hertz.",
      "type": "integer",
      "format": "int32"
    },
    "url": {
      "description": "URI for the model.",
      "type": "string"
    },
    "sessions": {
      "description": "URI for the model for use with the POST `/v1/sessions` method.",
      "type": "string"
    },
    "description": {
      "description": "Brief description of the model.",
      "type": "string"
    }
  }
}
Response  404

Not Found. Model not found.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  406

Not Acceptable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  415

Unsupported Media Type.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}

sessions

Creates a session
POST /speech-to-text/api/v1/sessions

Creates a session and locks recognition requests to that session's engine. You can use the session for multiple recognition requests so that each request is processed with the same Speech to Text engine. The operation returns a cookie in the Set-Cookie response header; pass that cookie with each request that uses the session.

The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.

Example URI

POST /speech-to-text/api/v1/sessions
URI Parameters
model
string (required) 

The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).
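
A sketch of session creation in Python (the host and credentials are placeholder assumptions). A requests.Session object is used so that the cookie returned in the Set-Cookie header is stored and sent automatically with later calls that use the session:

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"   # assumed host
http = requests.Session()                    # keeps the Set-Cookie value for later requests
http.auth = ("username", "password")         # placeholder credentials

response = http.post(BASE + "/v1/sessions", params={"model": "en-US_BroadbandModel"})
response.raise_for_status()                  # expect 201 Created
session = response.json()
print(session["session_id"], session["recognize"])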

Request
Schema
{
  "type": "string"
}
Response  201

Created.

Schema
{
  "required": [
    "session_id",
    "new_session_uri",
    "recognize",
    "observe_result",
    "recognizeWS"
  ],
  "properties": {
    "session_id": {
      "description": "Identifier for the new session.",
      "type": "string"
    },
    "new_session_uri": {
      "description": "URI for the new session.",
      "type": "string"
    },
    "recognize": {
      "description": "URI for REST recognition requests.",
      "type": "string"
    },
    "observe_result": {
      "description": "URI for REST results observers.",
      "type": "string"
    },
    "recognizeWS": {
      "description": "URI for WebSocket recognition requests. Needed only for working with the WebSocket interface.",
      "type": "string"
    }
  }
}
Request
Schema
{
  "type": "string"
}
Response  406

Not Acceptable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Request
Schema
{
  "type": "string"
}
Response  415

Unsupported Media Type.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Request
Schema
{
  "type": "string"
}
Response  503

Service Unavailable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}

Deletes the specified session
DELETE /speech-to-text/api/v1/sessions/{session_id}

Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.

Example URI

DELETE /speech-to-text/api/v1/sessions/session_id
URI Parameters
session_id
string (required) 

The ID of the session to be deleted.
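
A deletion sketch (the host, credentials, and session ID are placeholders); the same cookie-carrying requests.Session used to create the session issues the DELETE:

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"   # assumed host
http = requests.Session()                    # must carry the cookie set when the session was created
http.auth = ("username", "password")         # placeholder credentials

session_id = "REPLACE_WITH_SESSION_ID"       # value returned by POST /v1/sessions
response = http.delete(BASE + "/v1/sessions/" + session_id)
print(response.status_code)                  # 204 on success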

Response  204

No Content.

Response  400

Bad Request. Cookie must be set.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  404

Not Found. 'session_id' not found.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  406

Not Acceptable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}

Observes results for a recognition task within a session
GET /speech-to-text/api/v1/sessions/{session_id}/observe_result

Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.

Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.

Example URI

GET /speech-to-text/api/v1/sessions/session_id/observe_result
URI Parameters
session_id
string (required) 

The ID of the session whose results you want to observe.

sequence_id
string (required) 

The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.

interim_results
string (required) 

If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.
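
A polling sketch (the host, credentials, and session ID are placeholders). In practice, the GET is issued while a matching POST recognize request with sequence_id=1 is in flight, for example from a second thread; interim results then arrive as a stream of JSON objects:

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"   # assumed host
http = requests.Session()                    # carries the session cookie
http.auth = ("username", "password")         # placeholder credentials

session_id = "REPLACE_WITH_SESSION_ID"
response = http.get(BASE + "/v1/sessions/" + session_id + "/observe_result",
                    params={"sequence_id": 1, "interim_results": "true"},
                    stream=True)
for line in response.iter_lines():
    if line:                                 # each chunk is one SpeechRecognitionEvent
        print(line.decode("utf-8"))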

Response  200

OK.

Schema
{
  "required": [
    "results",
    "result_index"
  ],
  "properties": {
    "results": {
      "description": "The results array consists of zero or more final results followed by zero or one interim result. The final results are guaranteed not to change; the interim result may be replaced by zero or more final results (followed by zero or one interim result). The service periodically sends updates to the result list, with the `result_index` set to the lowest index in the array that has changed.",
      "type": "array",
      "items": {
        "required": [
          "final",
          "alternatives"
        ],
        "properties": {
          "final": {
            "description": "If `true`, the result for this utterance is not updated further.",
            "type": "boolean"
          },
          "alternatives": {
            "description": "Array of alternative transcripts.",
            "type": "array",
            "items": {
              "required": [
                "transcript"
              ],
              "properties": {
                "transcript": {
                  "description": "Transcription of the audio.",
                  "type": "string"
                },
                "confidence": {
                  "description": "Confidence score of the transcript in the range of 0 to 1. Available only for the best alternative and only in results marked as final.",
                  "type": "number",
                  "format": "double",
                  "minimum": 0,
                  "maximum": 1
                },
                "timestamps": {
                  "description": "Time alignments for each word from transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end time in hundredths of seconds. Example: `[[\"hello\",0.0,1.2],[\"world\",1.2,2.5]]`. Available only for the best alternative.",
                  "type": "array",
                  "items": {
                    "type": "string"
                  }
                },
                "word_confidence": {
                  "description": "Confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0 to 1. Example: `[[\"hello\",0.95],[\"world\",0.866]]`. Available only for the best alternative and only in results marked as final.",
                  "type": "array",
                  "items": {
                    "type": "string"
                  }
                }
              }
            }
          },
          "keywords_result": {
            "description": "Dictionary (or associative array) whose keys are the strings specified for `keywords` if both that parameter and `keywords_threshold` are specified. A keyword for which no matches are found is omitted from the array.",
            "required": [
              "keyword"
            ],
            "properties": {
              "keyword": {
                "description": "List of each keyword entered via the `keywords` parameter and, for each keyword, an array of `KeywordResult` objects that provides information about its occurrences in the input audio. The keys of the list are the actual keyword strings. A keyword for which no matches are spotted in the input is omitted from the array.",
                "type": "array",
                "items": {
                  "required": [
                    "normalized_text",
                    "start_time",
                    "end_time",
                    "confidence"
                  ],
                  "properties": {
                    "normalized_text": {
                      "description": "Specified keyword normalized to the spoken phrase that matched in the audio input.",
                      "type": "string"
                    },
                    "start_time": {
                      "description": "Start time in hundredths of seconds of the keyword match.",
                      "type": "number",
                      "format": "double"
                    },
                    "end_time": {
                      "description": "End time in hundredths of seconds of the keyword match.",
                      "type": "number",
                      "format": "double"
                    },
                    "confidence": {
                      "description": "Confidence score of the keyword match in the range of 0 to 1.",
                      "type": "number",
                      "format": "double",
                      "minimum": 0,
                      "maximum": 1
                    }
                  }
                }
              }
            }
          },
          "word_alternatives": {
            "description": "Array of word alternative hypotheses found for words of the input audio if `word_alternatives_threshold` is not null.",
            "type": "array",
            "items": {
              "required": [
                "start_time",
                "end_time",
                "alternatives"
              ],
              "properties": {
                "start_time": {
                  "description": "Start time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
                  "type": "number",
                  "format": "double"
                },
                "end_time": {
                  "description": "End time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
                  "type": "number",
                  "format": "double"
                },
                "alternatives": {
                  "description": "Array of word alternative hypotheses for a word from the input audio.",
                  "type": "array",
                  "items": {
                    "required": [
                      "confidence",
                      "word"
                    ],
                    "properties": {
                      "confidence": {
                        "description": "Confidence score of the word alternative hypothesis.",
                        "type": "number",
                        "format": "double",
                        "minimum": 0,
                        "maximum": 1
                      },
                      "word": {
                        "description": "Word alternative hypothesis for a word from the input audio.",
                        "type": "string"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "result_index": {
      "description": "An index that indicates the change point in the `results` array.",
      "type": "integer",
      "format": "int32"
    },
    "warnings": {
      "description": "An array of warning messages about invalid query parameters or JSON fields included with the request. Each warning includes a descriptive message and a list of invalid argument strings. For example, a message such as `\"Unknown arguments:\"` or `\"Unknown url query arguments:\"` followed by a list of the form `\"invalid_arg_1, invalid_arg_2.\"` The request succeeds despite the warnings.",
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  }
}
Response  400

Bad Request. User input error (for example, audio not matching the specified format) or an inactivity timeout occurred. If an existing session is closed, session_closed is set to true.

Schema
{
  "required": [
    "error",
    "code",
    "code_description",
    "session_closed"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    },
    "session_closed": {
      "description": "Specifies the value `true` if the active session is closed as a result of the problem.",
      "type": "boolean"
    }
  }
}
Response  404

Not Found. The 'session_id' was not found, or the specified 'sequence_id' does not match the sequence ID of the recognition task.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  406

Not Acceptable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  408

Request Timeout. The session was closed due to 30 seconds of inactivity (session timeout). session_closed is set to true.

Schema
{
  "required": [
    "error",
    "code",
    "code_description",
    "session_closed"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    },
    "session_closed": {
      "description": "Specifies the value `true` if the active session is closed as a result of the problem.",
      "type": "boolean"
    }
  }
}
Response  413

Payload Too Large. The session was closed because the input stream is larger than the currently supported data limit (100 MB). session_closed is set to true.

Schema
{
  "required": [
    "error",
    "code",
    "code_description",
    "session_closed"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    },
    "session_closed": {
      "description": "Specifies the value `true` if the active session is closed as a result of the problem.",
      "type": "boolean"
    }
  }
}
Response  415

Unsupported Media Type.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  500

Internal Server Error. The session is destroyed with session_closed set to true. Future requests that use this session return HTTP response code 404.

Schema
{
  "required": [
    "error",
    "code",
    "code_description",
    "session_closed"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    },
    "session_closed": {
      "description": "Specifies the value `true` if the active session is closed as a result of the problem.",
      "type": "boolean"
    }
  }
}

Checks whether a session is ready to accept a new recognition task
GET /speech-to-text/api/v1/sessions/{session_id}/recognize

Provides a way to check whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be initialized before the session can accept another recognition request with the POST recognize method.

Example URI

GET /speech-to-text/api/v1/sessions/session_id/recognize
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.
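
A sketch of the readiness check (the host, credentials, and session ID are placeholders):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"   # assumed host
http = requests.Session()                    # carries the session cookie
http.auth = ("username", "password")         # placeholder credentials

session_id = "REPLACE_WITH_SESSION_ID"
state = http.get(BASE + "/v1/sessions/" + session_id + "/recognize").json()
if state["session"]["state"] == "initialized":
    print("The session can accept a new recognition request.")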

Response  200

OK.

Schema
{
  "required": [
    "session"
  ],
  "properties": {
    "session": {
      "description": "Description of the state and possible actions for the current session.",
      "required": [
        "state",
        "model",
        "recognize",
        "observe_result",
        "recognizeWS"
      ],
      "properties": {
        "state": {
          "description": "State of the session. The state must be `initialized` to perform a new recognition request on the session.",
          "type": "string"
        },
        "model": {
          "description": "URI for information about the model that is used with the session.",
          "type": "string"
        },
        "recognize": {
          "description": "URI for REST recognition requests.",
          "type": "string"
        },
        "observe_result": {
          "description": "URI for REST results observers.",
          "type": "string"
        },
        "recognizeWS": {
          "description": "URI for WebSocket recognition requests. Needed only for working with the WebSocket interface.",
          "type": "string"
        }
      }
    }
  }
}
Response  404

Not Found. 'session_id' not found.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  406

Not Acceptable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  415

Unsupported Media Type.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}

Sends audio for speech recognition within a session
POST /speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:

  • Required: session_id, Content-Type, and body

  • Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: session_id, Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{“part_content_type”:“audio/flac”,“data_parts_count”:1,“continuous”:true,“inactivity_timeout”:-1}"

Example URI

POST /speech-to-text/api/v1/sessions/session_id/recognize?sequence_id=&continuous=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
session_id
string (required) 

The ID of the session for the recognition task.

sequence_id
string (required) 

Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.

continuous
string (required) 

Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error and with session_closed set to true. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.
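
A sketch of a non-multipart, session-based request (the host, credentials, session ID, and audio file are placeholders). The audio is streamed from a file object as the request body, and a sequence_id is set so that a parallel observe_result call can poll for interim results:

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"   # assumed host
http = requests.Session()                    # carries the session cookie
http.auth = ("username", "password")         # placeholder credentials

session_id = "REPLACE_WITH_SESSION_ID"
params = {
    "sequence_id": 1,                        # matches the sequence_id used by observe_result
    "continuous": "true",
    "timestamps": "true",
    "max_alternatives": 3,
}
with open("audio.flac", "rb") as audio:      # placeholder audio file
    response = http.post(BASE + "/v1/sessions/" + session_id + "/recognize",
                         params=params,
                         headers={"Content-Type": "audio/flac"},
                         data=audio)
print(response.json()["results"])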

Request
Body
{
  "metadata": "Hello, world!",
  "upload": "Hello, world!"
}
Schema
{
  "type": "array",
  "items": {
    "type": "string",
    "format": "byte"
  }
}
Response  200

OK.

Schema
{
  "required": [
    "results",
    "result_index"
  ],
  "properties": {
    "results": {
      "description": "The results array consists of zero or more final results followed by zero or one interim result. The final results are guaranteed not to change; the interim result may be replaced by zero or more final results (followed by zero or one interim result). The service periodically sends updates to the result list, with the `result_index` set to the lowest index in the array that has changed.",
      "type": "array",
      "items": {
        "required": [
          "final",
          "alternatives"
        ],
        "properties": {
          "final": {
            "description": "If `true`, the result for this utterance is not updated further.",
            "type": "boolean"
          },
          "alternatives": {
            "description": "Array of alternative transcripts.",
            "type": "array",
            "items": {
              "required": [
                "transcript"
              ],
              "properties": {
                "transcript": {
                  "description": "Transcription of the audio.",
                  "type": "string"
                },
                "confidence": {
                  "description": "Confidence score of the transcript in the range of 0 to 1. Available only for the best alternative and only in results marked as final.",
                  "type": "number",
                  "format": "double",
                  "minimum": 0,
                  "maximum": 1
                },
                "timestamps": {
                  "description": "Time alignments for each word from transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end time in hundredths of seconds. Example: `[[\"hello\",0.0,1.2],[\"world\",1.2,2.5]]`. Available only for the best alternative.",
                  "type": "array",
                  "items": {
                    "type": "string"
                  }
                },
                "word_confidence": {
                  "description": "Confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0 to 1. Example: `[[\"hello\",0.95],[\"world\",0.866]]`. Available only for the best alternative and only in results marked as final.",
                  "type": "array",
                  "items": {
                    "type": "string"
                  }
                }
              }
            }
          },
          "keywords_result": {
            "description": "Dictionary (or associative array) whose keys are the strings specified for `keywords` if both that parameter and `keywords_threshold` are specified. A keyword for which no matches are found is omitted from the array.",
            "required": [
              "keyword"
            ],
            "properties": {
              "keyword": {
                "description": "List of each keyword entered via the `keywords` parameter and, for each keyword, an array of `KeywordResult` objects that provides information about its occurrences in the input audio. The keys of the list are the actual keyword strings. A keyword for which no matches are spotted in the input is omitted from the array.",
                "type": "array",
                "items": {
                  "required": [
                    "normalized_text",
                    "start_time",
                    "end_time",
                    "confidence"
                  ],
                  "properties": {
                    "normalized_text": {
                      "description": "Specified keyword normalized to the spoken phrase that matched in the audio input.",
                      "type": "string"
                    },
                    "start_time": {
                      "description": "Start time in hundredths of seconds of the keyword match.",
                      "type": "number",
                      "format": "double"
                    },
                    "end_time": {
                      "description": "End time in hundredths of seconds of the keyword match.",
                      "type": "number",
                      "format": "double"
                    },
                    "confidence": {
                      "description": "Confidence score of the keyword match in the range of 0 to 1.",
                      "type": "number",
                      "format": "double",
                      "minimum": 0,
                      "maximum": 1
                    }
                  }
                }
              }
            }
          },
          "word_alternatives": {
            "description": "Array of word alternative hypotheses found for words of the input audio if `word_alternatives_threshold` is not null.",
            "type": "array",
            "items": {
              "required": [
                "start_time",
                "end_time",
                "alternatives"
              ],
              "properties": {
                "start_time": {
                  "description": "Start time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
                  "type": "number",
                  "format": "double"
                },
                "end_time": {
                  "description": "End time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
                  "type": "number",
                  "format": "double"
                },
                "alternatives": {
                  "description": "Array of word alternative hypotheses for a word from the input audio.",
                  "type": "array",
                  "items": {
                    "required": [
                      "confidence",
                      "word"
                    ],
                    "properties": {
                      "confidence": {
                        "description": "Confidence score of the word alternative hypothesis.",
                        "type": "number",
                        "format": "double",
                        "minimum": 0,
                        "maximum": 1
                      },
                      "word": {
                        "description": "Word alternative hypothesis for a word from the input audio.",
                        "type": "string"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "result_index": {
      "description": "An index that indicates the change point in the `results` array.",
      "type": "integer",
      "format": "int32"
    },
    "warnings": {
      "description": "An array of warning messages about invalid query parameters or JSON fields included with the request. Each warning includes a descriptive message and a list of invalid argument strings. For example, a message such as `\"Unknown arguments:\"` or `\"Unknown url query arguments:\"` followed by a list of the form `\"invalid_arg_1, invalid_arg_2.\"` The request succeeds despite the warnings.",
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  }
}
Request
Body
{
  "metadata": "Hello, world!",
  "upload": "Hello, world!"
}
Schema
{
  "type": "array",
  "items": {
    "type": "string",
    "format": "byte"
  }
}
Response  400

Bad Request. User input error (for example, audio not matching the specified format), the session is in the wrong state, or an inactivity timeout occurred. If an existing session is closed, session_closed is set to true.

Schema
{
  "required": [
    "error",
    "code",
    "code_description",
    "session_closed"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    },
    "session_closed": {
      "description": "Specifies the value `true` if the active session is closed as a result of the problem.",
      "type": "boolean"
    }
  }
}
Request
Body
{
  "metadata": "Hello, world!",
  "upload": "Hello, world!"
}
Schema
{
  "type": "array",
  "items": {
    "type": "string",
    "format": "byte"
  }
}
Response  404

Not Found. 'session_id' not found.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Request
Body
{
  "metadata": "Hello, world!",
  "upload": "Hello, world!"
}
Schema
{
  "type": "array",
  "items": {
    "type": "string",
    "format": "byte"
  }
}
Response  406

Not Acceptable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Request
Body
{
  "metadata": "Hello, world!",
  "upload": "Hello, world!"
}
Schema
{
  "type": "array",
  "items": {
    "type": "string",
    "format": "byte"
  }
}
Response  408

Request Timeout. The session was closed due to 30 seconds of inactivity (session timeout). session_closed is set to true.

Schema
{
  "required": [
    "error",
    "code",
    "code_description",
    "session_closed"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    },
    "session_closed": {
      "description": "Specifies the value `true` if the active session is closed as a result of the problem.",
      "type": "boolean"
    }
  }
}
Request
Body
{
  "metadata": "Hello, world!",
  "upload": "Hello, world!"
}
Schema
{
  "type": "array",
  "items": {
    "type": "string",
    "format": "byte"
  }
}
Response  413

Payload Too Large. Session closed because the input stream is larger than 100 MB. session_closed is set to true.

Schema
{
  "required": [
    "error",
    "code",
    "code_description",
    "session_closed"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    },
    "session_closed": {
      "description": "Specifies the value `true` if the active session is closed as a result of the problem.",
      "type": "boolean"
    }
  }
}
Request
Body
{
  "metadata": "Hello, world!",
  "upload": "Hello, world!"
}
Schema
{
  "type": "array",
  "items": {
    "type": "string",
    "format": "byte"
  }
}
Response  415

Unsupported Media Type.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Request
Body
{
  "metadata": "Hello, world!",
  "upload": "Hello, world!"
}
Schema
{
  "type": "array",
  "items": {
    "type": "string",
    "format": "byte"
  }
}
Response  500

Internal Server Error. The session is destroyed with session_closed set to true. Future requests that use this session return HTTP response code 404.

Schema
{
  "required": [
    "error",
    "code",
    "code_description",
    "session_closed"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    },
    "session_closed": {
      "description": "Specifies the value `true` if the active session is closed as a result of the problem.",
      "type": "boolean"
    }
  }
}
Request
HideShow
Body
{
  "metadata": "Hello, world!",
  "upload": "Hello, world!"
}
Schema
{
  "type": "array",
  "items": {
    "type": "string",
    "format": "byte"
  }
}
Response  503

Service Unavailable. The session is already processing a request; concurrent requests are not allowed on the same session. The session remains alive after this error.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}

sessionless

Sends audio for speech recognition in sessionless mode
POST /speech-to-text/api/v1/recognize

Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.

Streaming mode

For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the Transfer-Encoding header to chunked to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.

Non-multipart requests

For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:

  • Required: Content-Type and body

  • Optional: Transfer-Encoding, model, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting

Multipart requests

For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:

  • Required: Content-Type, metadata, and multipart

  • Optional: Transfer-Encoding and model

An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).

metadata="{\"part_content_type\":\"audio/flac\",\"data_parts_count\":1,\"continuous\":true,\"inactivity_timeout\":-1}"
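
For illustration, a sketch of a multipart request built with the Python requests library (the host, credentials, and file name are placeholder assumptions). The metadata field is sent as a JSON form part, and the audio file follows as the upload part; requests sets the multipart Content-Type header automatically:

import json
import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"   # assumed host
metadata = {
    "part_content_type": "audio/flac",
    "data_parts_count": 1,
    "continuous": True,
    "inactivity_timeout": -1,
}
parts = [
    ("metadata", (None, json.dumps(metadata), "application/json")),
    ("upload", ("audio-file1.flac", open("audio-file1.flac", "rb"), "audio/flac")),
]
response = requests.post(BASE + "/v1/recognize",
                         auth=("username", "password"),          # placeholder credentials
                         params={"model": "en-US_BroadbandModel"},
                         files=parts)
print(response.json())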

Example URI

POST /speech-to-text/api/v1/recognize
URI Parameters
model
string (required) 

The identifier of the model to be used for the recognition request (use GET /v1/models for a list of available models).

continuous
string (required) 

Non-multipart only: If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

Non-multipart only: If true, confidence measure per word is returned.

timestamps
string (required) 

Non-multipart only: If true, time alignment for each word is returned.

profanity_filter
string (required) 

Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.
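
A sketch of a sessionless, non-multipart request that exercises several of the query parameters above (the host, credentials, and audio file are placeholder assumptions):

import requests

BASE = "https://stream.watsonplatform.net/speech-to-text/api"   # assumed host
params = {
    "model": "en-US_BroadbandModel",
    "continuous": "true",
    "timestamps": "true",
    "word_confidence": "true",
    "smart_formatting": "true",
}
with open("audio.flac", "rb") as audio:      # placeholder audio file
    response = requests.post(BASE + "/v1/recognize",
                             auth=("username", "password"),      # placeholder credentials
                             params=params,
                             headers={"Content-Type": "audio/flac"},
                             data=audio)
for result in response.json()["results"]:
    print(result["alternatives"][0]["transcript"])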

Request
Body
{
  "metadata": "Hello, world!",
  "upload": "Hello, world!"
}
Schema
{
  "type": "array",
  "items": {
    "type": "string",
    "format": "byte"
  }
}
Response  200

OK.

Schema
{
  "required": [
    "results",
    "result_index"
  ],
  "properties": {
    "results": {
      "description": "The results array consists of zero or more final results followed by zero or one interim result. The final results are guaranteed not to change; the interim result may be replaced by zero or more final results (followed by zero or one interim result). The service periodically sends updates to the result list, with the `result_index` set to the lowest index in the array that has changed.",
      "type": "array",
      "items": {
        "required": [
          "final",
          "alternatives"
        ],
        "properties": {
          "final": {
            "description": "If `true`, the result for this utterance is not updated further.",
            "type": "boolean"
          },
          "alternatives": {
            "description": "Array of alternative transcripts.",
            "type": "array",
            "items": {
              "required": [
                "transcript"
              ],
              "properties": {
                "transcript": {
                  "description": "Transcription of the audio.",
                  "type": "string"
                },
                "confidence": {
                  "description": "Confidence score of the transcript in the range of 0 to 1. Available only for the best alternative and only in results marked as final.",
                  "type": "number",
                  "format": "double",
                  "minimum": 0,
                  "maximum": 1
                },
                "timestamps": {
                  "description": "Time alignments for each word from transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end time in hundredths of seconds. Example: `[[\"hello\",0.0,1.2],[\"world\",1.2,2.5]]`. Available only for the best alternative.",
                  "type": "array",
                  "items": {
                    "type": "string"
                  }
                },
                "word_confidence": {
                  "description": "Confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0 to 1. Example: `[[\"hello\",0.95],[\"world\",0.866]]`. Available only for the best alternative and only in results marked as final.",
                  "type": "array",
                  "items": {
                    "type": "string"
                  }
                }
              }
            }
          },
          "keywords_result": {
            "description": "Dictionary (or associative array) whose keys are the strings specified for `keywords` if both that parameter and `keywords_threshold` are specified. A keyword for which no matches are found is omitted from the array.",
            "required": [
              "keyword"
            ],
            "properties": {
              "keyword": {
                "description": "List of each keyword entered via the `keywords` parameter and, for each keyword, an array of `KeywordResult` objects that provides information about its occurrences in the input audio. The keys of the list are the actual keyword strings. A keyword for which no matches are spotted in the input is omitted from the array.",
                "type": "array",
                "items": {
                  "required": [
                    "normalized_text",
                    "start_time",
                    "end_time",
                    "confidence"
                  ],
                  "properties": {
                    "normalized_text": {
                      "description": "Specified keyword normalized to the spoken phrase that matched in the audio input.",
                      "type": "string"
                    },
                    "start_time": {
                      "description": "Start time in hundredths of seconds of the keyword match.",
                      "type": "number",
                      "format": "double"
                    },
                    "end_time": {
                      "description": "End time in hundredths of seconds of the keyword match.",
                      "type": "number",
                      "format": "double"
                    },
                    "confidence": {
                      "description": "Confidence score of the keyword match in the range of 0 to 1.",
                      "type": "number",
                      "format": "double",
                      "minimum": 0,
                      "maximum": 1
                    }
                  }
                }
              }
            }
          },
          "word_alternatives": {
            "description": "Array of word alternative hypotheses found for words of the input audio if `word_alternatives_threshold` is not null.",
            "type": "array",
            "items": {
              "required": [
                "start_time",
                "end_time",
                "alternatives"
              ],
              "properties": {
                "start_time": {
                  "description": "Start time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
                  "type": "number",
                  "format": "double"
                },
                "end_time": {
                  "description": "End time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
                  "type": "number",
                  "format": "double"
                },
                "alternatives": {
                  "description": "Array of word alternative hypotheses for a word from the input audio.",
                  "type": "array",
                  "items": {
                    "required": [
                      "confidence",
                      "word"
                    ],
                    "properties": {
                      "confidence": {
                        "description": "Confidence score of the word alternative hypothesis.",
                        "type": "number",
                        "format": "double",
                        "minimum": 0,
                        "maximum": 1
                      },
                      "word": {
                        "description": "Word alternative hypothesis for a word from the input audio.",
                        "type": "string"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    },
    "result_index": {
      "description": "An index that indicates the change point in the `results` array.",
      "type": "integer",
      "format": "int32"
    },
    "warnings": {
      "description": "An array of warning messages about invalid query parameters or JSON fields included with the request. Each warning includes a descriptive message and a list of invalid argument strings. For example, a message such as `\"Unknown arguments:\"` or `\"Unknown url query arguments:\"` followed by a list of the form `\"invalid_arg_1, invalid_arg_2.\"` The request succeeds despite the warnings.",
      "type": "array",
      "items": {
        "type": "string"
      }
    }
  }
}
Response  400

Bad Request. User input error (for example, audio that does not match the specified format) or inactivity timeout (no speech detected).

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  406

Not Acceptable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  408

Request Timeout. Request failed due to inactivity (no audio data sent) for 30 seconds.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  413

Payload Too Large. Request failed because the input stream is larger than 100 MB.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  415

Unsupported Media Type.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  500

Internal Server Error.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  503

Service Unavailable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}

asynchronous

Registers a callback URL for use with the asynchronous interface
POST/speech-to-text/api/v1/register_callback

Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a GET request from the service, after which the service responds with response code 201 to the original registration request.

The service sends only a single GET request to the callback URL; the request includes a random alphanumeric challenge string from the service. If, within 5 seconds, the service does not receive a response with response code 200 and a body that echoes the challenge string, it does not white-list the URL and responds to the registration request with response code 400. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.

Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.

If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of a random challenge string in its response to the request. It sends the signature in the X-Callback-Signature header of its GET request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.
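
The signature check can be reproduced on the receiving side. The Python sketch below is a minimal illustration rather than a reference implementation: it assumes the X-Callback-Signature header carries a base64-encoded HMAC-SHA1 digest keyed with the user secret, and the query-parameter name used for the challenge string is hypothetical.

import base64
import hashlib
import hmac

def signature_is_valid(user_secret: str, payload: bytes, header_value: str) -> bool:
    """Recompute the HMAC-SHA1 signature over the payload and compare it with the
    value received in the X-Callback-Signature header (assumed to be base64-encoded)."""
    digest = hmac.new(user_secret.encode("utf-8"), payload, hashlib.sha1).digest()
    expected = base64.b64encode(digest).decode("ascii")
    return hmac.compare_digest(expected, header_value)

# During URL verification the callback endpoint must answer the service's GET request
# with response code 200 and a body that echoes the challenge string, for example:
#
#   challenge = query_params["challenge_string"]            # parameter name assumed
#   if signature_is_valid(USER_SECRET, challenge.encode("utf-8"),
#                         headers["X-Callback-Signature"]):
#       return 200, challenge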

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/register_callback
URI Parameters
callback_url
string (required) 

An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the X-Callback-Signature header to verify the origin of the request.

user_secret
string (required) 

A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the X-Callback-Signature header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.
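
The registration call itself is a single POST that carries the two query parameters above. The Python sketch below is illustrative only; the host name, credentials, callback URL, and secret are placeholders.

import requests

response = requests.post(
    "https://stream.watsonplatform.net/speech-to-text/api/v1/register_callback",  # placeholder host
    params={
        "callback_url": "https://example.com/stt/notifications",
        "user_secret": "my-callback-secret",
    },
    auth=("{username}", "{password}"),
)

# 201 indicates the URL was newly white-listed; 200 indicates it was already registered.
print(response.status_code, response.json())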

Request
Schema
{
  "type": "string"
}
Response  200

OK. The callback was already registered (white-listed). The status included in the response is `already created`.

Schema
{
  "required": [
    "status",
    "url"
  ],
  "properties": {
    "status": {
      "description": "The current status of the job: `created` if the callback URL was successfully white-listed as a result of the call or `already created` if the URL was already white-listed.",
      "type": "string"
    },
    "url": {
      "description": "The callback URL that is successfully registered.",
      "type": "string"
    }
  }
}
Response  201

Created. The callback was successfully registered (white-listed). The status included in the response is `created`.

Schema
{
  "required": [
    "status",
    "url"
  ],
  "properties": {
    "status": {
      "description": "The current status of the job: `created` if the callback URL was successfully white-listed as a result of the call or `already created` if the URL was already white-listed.",
      "type": "string"
    },
    "url": {
      "description": "The callback URL that is successfully registered.",
      "type": "string"
    }
  }
}
Response  400

Bad Request. The callback registration failed. The request was missing a required parameter or specified an invalid argument; the client sent an invalid response to the service’s GET request during the registration process; or the client failed to respond to the service’s request before the five-second timeout.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  503

Service Unavailable. The service is currently unavailable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}

Checks the status of all asynchronous jobs
GET/speech-to-text/api/v1/recognitions

Returns the status and ID of all outstanding jobs associated with the service credentials with which it is called. If a job was created with a callback URL and a user token, the method also returns the user token for the job. To obtain the results for a job whose status is completed, use the GET recognitions/{id} method. A job and its results remain available until you delete them with the DELETE recognitions/{id} method or until the job’s time to live expires, whichever comes first.
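
A status sweep over all current jobs can look like the following Python sketch; the host name and credentials are placeholders.

import requests

response = requests.get(
    "https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions",  # placeholder host
    auth=("{username}", "{password}"),
)

for job in response.json().get("recognitions", []):
    # Each entry reports at least the job ID and its current status; user_token
    # is present only for jobs created with a callback URL and a user token.
    print(job["id"], job["status"], job.get("user_token", ""))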

Note: This method is currently a beta release that supports US English only.

Example URI

GET /speech-to-text/api/v1/recognitions
Response  200

OK.

Schema
{
  "required": [
    "recognitions"
  ],
  "properties": {
    "recognitions": {
      "description": "An array of objects that provides the status for each of the user's current jobs. The array is empty if the user has no current jobs.",
      "type": "array",
      "items": {
        "required": [
          "id",
          "status"
        ],
        "properties": {
          "id": {
            "description": "The ID of the job.",
            "type": "string"
          },
          "status": {
            "description": "The current status of the job: `waiting`: The service is preparing the job for processing; the service always returns this status when the job is initially created. `processing`: The service is actively processing the job. `completed`: The service has finished processing the job; if the job specified a callback URL and the event `recognitions.completed_with_results`, the service sent the results with the callback notification; otherwise, use the `GET recognitions/{id}` method to retrieve the results. `failed`: The job failed.",
            "type": "string"
          },
          "url": {
            "description": "For a `POST /v1/recognitions` request, the URL to use to request information about the job with the `GET recognitions/{id}` method.",
            "type": "string"
          },
          "user_token": {
            "description": "For a `GET /v1/recognitions` request, the user token associated with the job if the job was created with a callback URL and a user token.",
            "type": "string"
          }
        }
      }
    }
  }
}
Response  503

Service Unavailable. The service is currently unavailable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}

Creates a job for an asynchronous recognition request
POST/speech-to-text/api/v1/recognitions{?events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}

Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:

  • By callback notification: Include the callback_url query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the events and user_token query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.

  • By polling the service: Omit the callback_url, events, and user_token query parameters. You must then use the GET recognitions or GET recognitions/{id} methods to check the status of the job, using the latter to retrieve the results when the job is complete.

The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the results_ttl parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS GET recognitions/{id} method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.

The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.
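
As a concrete illustration of the polling approach, the Python sketch below creates a job without a callback URL and then checks its status until it completes. The host name, credentials, audio file, and polling interval are placeholders.

import time
import requests

base = "https://stream.watsonplatform.net/speech-to-text/api"  # placeholder host
auth = ("{username}", "{password}")

# Create the job; with no callback_url, results must be retrieved by polling.
with open("sample.wav", "rb") as audio:
    job = requests.post(
        f"{base}/v1/recognitions",
        params={"timestamps": "true", "results_ttl": "60"},
        headers={"Content-Type": "audio/wav"},
        data=audio,
        auth=auth,
    ).json()

# Poll GET /v1/recognitions/{id} until the job completes or fails.
while True:
    status = requests.get(f"{base}/v1/recognitions/{job['id']}", auth=auth).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(10)

print(status)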

Note: This method is currently a beta release that supports US English only.

Example URI

POST /speech-to-text/api/v1/recognitions?events=&user_token=&results_ttl=&model=&continuous=&inactivity_timeout=&keywords=&keywords_threshold=&max_alternatives=&word_alternatives_threshold=&word_confidence=&timestamps=&profanity_filter=&smart_formatting=
URI Parameters
callback_url
string (required) 

A URL to which callback notifications are to be sent. The URL must already be successfully white-listed by using the `POST register_callback` method. Omit the parameter to poll the service for job completion and results. You can include the same callback URL with any number of job creation requests. Use the `user_token` query parameter to specify a unique user-specified string with each job to differentiate the callback notifications for the jobs.

events
string (required) 

If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are: recognitions.started generates a callback notification when the service begins to process the job. recognitions.completed generates a callback notification when the job is complete; you must use the GET recognitions/{id} method to retrieve the results before they time out or are deleted. recognitions.completed_with_results generates a callback notification when the job is complete; the notification includes the results of the request. recognitions.failed generates a callback notification if the service experiences an error while processing the job. Omit the parameter to subscribe to the default events: recognitions.started, recognitions.completed, and recognitions.failed. The recognitions.completed and recognitions.completed_with_results events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.

user_token
string (required) 

If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.

results_ttl
string (required) 

The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.

model
string (required) 

The identifier of the model to be used for the recognition request. Currently, only en-US-BroadbandModel (the default) is supported.

continuous
string (required) 

If true, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.

inactivity_timeout
string (required) 

The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use -1 for infinity. See also the continuous parameter.

keywords
string (required) 

Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.

keywords_threshold
string (required) 

Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.

max_alternatives
string (required) 

Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.

word_alternatives_threshold
string (required) 

Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.

word_confidence
string (required) 

If true, confidence measure per word is returned.

timestamps
string (required) 

If true, time alignment for each word is returned.

profanity_filter
string (required) 

If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.

smart_formatting
string (required) 

If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.
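
For the callback-driven variant, the same creation call carries the callback_url, events, and user_token parameters described above. The following Python sketch is illustrative only; the host name, credentials, callback URL, and token values are placeholders, and the callback URL must already have been white-listed with POST register_callback.

import requests

with open("sample.wav", "rb") as audio:
    response = requests.post(
        "https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions",  # placeholder host
        params={
            "callback_url": "https://example.com/stt/notifications",  # previously white-listed
            "events": "recognitions.completed_with_results,recognitions.failed",
            "user_token": "job-42",      # lets you correlate notifications with your own records
            "results_ttl": "120",
        },
        headers={"Content-Type": "audio/wav"},
        data=audio,
        auth=("{username}", "{password}"),
    )

print(response.status_code, response.json())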

Request
Schema
{
  "type": "array",
  "items": {
    "type": "string",
    "format": "byte"
  }
}
Response  201

Created. The job was successfully created.

Schema
{
  "required": [
    "id",
    "status"
  ],
  "properties": {
    "id": {
      "description": "The ID of the job.",
      "type": "string"
    },
    "status": {
      "description": "The current status of the job: `waiting`: The service is preparing the job for processing; the service always returns this status when the job is initially created. `processing`: The service is actively processing the job. `completed`: The service has finished processing the job; if the job specified a callback URL and the event `recognitions.completed_with_results`, the service sent the results with the callback notification; otherwise, use the `GET recognitions/{id}` method to retrieve the results. `failed`: The job failed.",
      "type": "string"
    },
    "url": {
      "description": "For a `POST /v1/recognitions` request, the URL to use to request information about the job with the `GET recognitions/{id}` method.",
      "type": "string"
    },
    "user_token": {
      "description": "For a `GET /v1/recognitions` request, the user token associated with the job if the job was created with a callback URL and a user token.",
      "type": "string"
    }
  }
}
Response  400

Bad Request. The request specified an invalid argument. For example, the request specified a callback URL that has not been white-listed, the events or user_token parameter without also specifying a callback URL, or both the recognitions.completed and recognitions.completed_with_results events.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  503

Service Unavailable. The service is currently unavailable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}

Deletes the specified asynchronous job
DELETE/speech-to-text/api/v1/recognitions/{id}

Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.
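
Deleting a job is a single DELETE on its ID, as in the Python sketch below; the host name, credentials, and job ID are placeholders.

import requests

job_id = "{job_id}"  # placeholder job ID
response = requests.delete(
    f"https://stream.watsonplatform.net/speech-to-text/api/v1/recognitions/{job_id}",  # placeholder host
    auth=("{username}", "{password}"),
)

# 204 confirms deletion; 404 means the job ID was not found.
print(response.status_code)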

Note: This method is currently a beta release that supports US English only.

Example URI

DELETE /speech-to-text/api/v1/recognitions/id
URI Parameters
id
string (required) 

The ID of the job that is to be deleted.

Response  204

No Content. The job was successfully deleted.

Response  404

Not Found. The specified job ID was not found.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  503

Service Unavailable. The service is currently unavailable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}

Checks the status of the specified asynchronous job
GET/speech-to-text/api/v1/recognitions/{id}

Returns information about the specified job. The response always includes the status of the job. If the status is completed, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.

You can use the method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the recognitions.completed_with_results event, and you can retrieve the results multiple times for as long as they remain available.
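
The Python sketch below checks a single job and, once it is completed, walks the nested results structure shown in the schema that follows to print the best transcript for each final result. The host name, credentials, and job ID are placeholders.

import requests

base = "https://stream.watsonplatform.net/speech-to-text/api"  # placeholder host
job_id = "{job_id}"                                            # placeholder job ID

job = requests.get(f"{base}/v1/recognitions/{job_id}", auth=("{username}", "{password}")).json()

if job["status"] == "completed":
    # `results` is an array containing a single SpeechRecognitionEvent object.
    for event in job["results"]:
        for result in event["results"]:
            print(result["alternatives"][0]["transcript"])
else:
    print("Job status:", job["status"])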

Note: This method is currently a beta release that supports US English only.

Example URI

GET /speech-to-text/api/v1/recognitions/id
URI Parameters
id
string (required) 

The ID of the job whose status is to be checked.

Response  200

OK.

Schema
{
  "required": [
    "status"
  ],
  "properties": {
    "status": {
      "description": "The current status of the job: `waiting`: The service is preparing the job for processing; the service also returns this status when the job is initially created. `processing`: The service is actively processing the job. `completed`: The service has finished processing the job; if the job specified a callback URL and the event `recognitions.completed_with_results`, the service sent the results with the callback notification; otherwise, use the `GET recognitions/{id}` method to retrieve the results. `failed`: The job failed.",
      "type": "string"
    },
    "id": {
      "description": "If the status is not `completed`, the ID of the job.",
      "type": "string"
    },
    "results": {
      "description": "If the status is `completed`, the results of the recognition request as an array that includes a single instance of a `SpeechRecognitionEvent` object.",
      "type": "array",
      "items": {
        "required": [
          "results",
          "result_index"
        ],
        "properties": {
          "results": {
            "description": "The results array consists of zero or more final results followed by zero or one interim result. The final results are guaranteed not to change; the interim result may be replaced by zero or more final results (followed by zero or one interim result). The service periodically sends updates to the result list, with the `result_index` set to the lowest index in the array that has changed.",
            "type": "array",
            "items": {
              "required": [
                "final",
                "alternatives"
              ],
              "properties": {
                "final": {
                  "description": "If `true`, the result for this utterance is not updated further.",
                  "type": "boolean"
                },
                "alternatives": {
                  "description": "Array of alternative transcripts.",
                  "type": "array",
                  "items": {
                    "required": [
                      "transcript"
                    ],
                    "properties": {
                      "transcript": {
                        "description": "Transcription of the audio.",
                        "type": "string"
                      },
                      "confidence": {
                        "description": "Confidence score of the transcript in the range of 0 to 1. Available only for the best alternative and only in results marked as final.",
                        "type": "number",
                        "format": "double",
                        "minimum": 0,
                        "maximum": 1
                      },
                      "timestamps": {
                        "description": "Time alignments for each word from transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end time in hundredths of seconds. Example: `[[\"hello\",0.0,1.2],[\"world\",1.2,2.5]]`. Available only for the best alternative.",
                        "type": "array",
                        "items": {
                          "type": "string"
                        }
                      },
                      "word_confidence": {
                        "description": "Confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0 to 1. Example: `[[\"hello\",0.95],[\"world\",0.866]]`. Available only for the best alternative and only in results marked as final.",
                        "type": "array",
                        "items": {
                          "type": "string"
                        }
                      }
                    }
                  }
                },
                "keywords_result": {
                  "description": "Dictionary (or associative array) whose keys are the strings specified for `keywords` if both that parameter and `keywords_threshold` are specified. A keyword for which no matches are found is omitted from the array.",
                  "required": [
                    "keyword"
                  ],
                  "properties": {
                    "keyword": {
                      "description": "List of each keyword entered via the `keywords` parameter and, for each keyword, an array of `KeywordResult` objects that provides information about its occurrences in the input audio. The keys of the list are the actual keyword strings. A keyword for which no matches are spotted in the input is omitted from the array.",
                      "type": "array",
                      "items": {
                        "required": [
                          "normalized_text",
                          "start_time",
                          "end_time",
                          "confidence"
                        ],
                        "properties": {
                          "normalized_text": {
                            "description": "Specified keyword normalized to the spoken phrase that matched in the audio input.",
                            "type": "string"
                          },
                          "start_time": {
                            "description": "Start time in hundredths of seconds of the keyword match.",
                            "type": "number",
                            "format": "double"
                          },
                          "end_time": {
                            "description": "End time in hundredths of seconds of the keyword match.",
                            "type": "number",
                            "format": "double"
                          },
                          "confidence": {
                            "description": "Confidence score of the keyword match in the range of 0 to 1.",
                            "type": "number",
                            "format": "double",
                            "minimum": 0,
                            "maximum": 1
                          }
                        }
                      }
                    }
                  }
                },
                "word_alternatives": {
                  "description": "Array of word alternative hypotheses found for words of the input audio if `word_alternatives_threshold` is not null.",
                  "type": "array",
                  "items": {
                    "required": [
                      "start_time",
                      "end_time",
                      "alternatives"
                    ],
                    "properties": {
                      "start_time": {
                        "description": "Start time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
                        "type": "number",
                        "format": "double"
                      },
                      "end_time": {
                        "description": "End time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
                        "type": "number",
                        "format": "double"
                      },
                      "alternatives": {
                        "description": "Array of word alternative hypotheses for a word from the input audio.",
                        "type": "array",
                        "items": {
                          "required": [
                            "confidence",
                            "word"
                          ],
                          "properties": {
                            "confidence": {
                              "description": "Confidence score of the word alternative hypothesis.",
                              "type": "number",
                              "format": "double",
                              "minimum": 0,
                              "maximum": 1
                            },
                            "word": {
                              "description": "Word alternative hypothesis for a word from the input audio.",
                              "type": "string"
                            }
                          }
                        }
                      }
                    }
                  }
                }
              }
            }
          },
          "result_index": {
            "description": "An index that indicates the change point in the `results` array.",
            "type": "integer",
            "format": "int32"
          },
          "warnings": {
            "description": "An array of warning messages about invalid query parameters or JSON fields included with the request. Each warning includes a descriptive message and a list of invalid argument strings. For example, a message such as `\"Unknown arguments:\"` or `\"Unknown url query arguments:\"` followed by a list of the form `\"invalid_arg_1, invalid_arg_2.\"` The request succeeds despite the warnings.",
            "type": "array",
            "items": {
              "type": "string"
            }
          }
        }
      }
    }
  }
}
Response  404

Not Found. The specified job ID was not found.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}
Response  503

Service Unavailable. The service is currently unavailable.

Schema
{
  "required": [
    "error",
    "code",
    "code_description"
  ],
  "properties": {
    "error": {
      "description": "Description of the problem.",
      "type": "string"
    },
    "code": {
      "description": "HTTP response code.",
      "type": "integer",
      "format": "int32"
    },
    "code_description": {
      "description": "Response message.",
      "type": "string"
    }
  }
}