GET /speech-to-text/api/v1/models
Returns a list of all models available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.
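The following minimal Python sketch shows one way a client might use this listing: it filters a models response by language and by the rate field, which is the minimum sampling rate the model accepts. The sample body is hypothetical. It only mirrors the field names of the 200 schema, and the host in its URLs is a placeholder.

```python
import json

# Hypothetical response body shaped like the 200 schema; not a real listing.
SAMPLE = """{"models": [
  {"name": "en-US_BroadbandModel", "language": "en-US", "rate": 16000,
   "url": "https://example.com/v1/models/en-US_BroadbandModel",
   "description": "US English broadband model."},
  {"name": "en-US_NarrowbandModel", "language": "en-US", "rate": 8000,
   "url": "https://example.com/v1/models/en-US_NarrowbandModel",
   "description": "US English narrowband model."}
]}"""


def usable_models(body: str, language: str, audio_rate_hz: int) -> list[str]:
    """Return names of models for `language` whose minimum sampling rate
    (the `rate` field) does not exceed the sampling rate of your audio."""
    models = json.loads(body)["models"]
    return [
        m["name"]
        for m in models
        if m["language"] == language and m["rate"] <= audio_rate_hz
    ]


print(usable_models(SAMPLE, "en-US", 8000))  # ['en-US_NarrowbandModel']
```

Audio sampled at 8 kHz rules out the broadband model here because that model's minimum rate is 16000 Hz.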
200
OK.
{
"description": "Information about the available models.",
"required": [
"models"
],
"properties": {
"models": {
"type": "array",
"items": {
"required": [
"name",
"rate",
"url",
"language",
"description"
],
"properties": {
"name": {
"description": "Name of the model for use as an identifier in calls to the service (for example, `en-US_BroadbandModel`).",
"type": "string"
},
"language": {
"description": "Language identifier for the model (for example, `en-US`).",
"type": "string"
},
"rate": {
"description": "Sampling rate (minimum acceptable rate for audio) used by the model in Hertz.",
"type": "integer",
"format": "int32"
},
"url": {
"description": "URI for the model.",
"type": "string"
},
"sessions": {
"description": "URI for the model for use with the POST `/v1/sessions` method.",
"type": "string"
},
"description": {
"description": "Brief description of the model.",
"type": "string"
}
}
}
}
}
}
406
Not Acceptable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
415
Unsupported Media Type.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
GET /speech-to-text/api/v1/models/{model_id}
Returns information about a single specified model that is available for use with the service. The information includes the name of the model, whether it pertains to broadband or narrowband audio, and its minimum sampling rate in Hertz, among other things.
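Here is a small Python sketch for this method. It builds the request path from a model name and checks a returned model object against the fields that the 200 schema marks as required. The path prefix comes from the heading above. Percent-encoding the ID is a defensive choice and is not documented behavior.

```python
import json
from urllib.parse import quote

# Fields that the 200 schema for this method lists as required.
REQUIRED = ("name", "rate", "url", "language", "description")


def model_path(model_id: str) -> str:
    """Build the path for GET /v1/models/{model_id}, percent-encoding the ID."""
    return "/speech-to-text/api/v1/models/" + quote(model_id, safe="")


def missing_fields(body: str) -> list[str]:
    """Return the required fields that are absent from a model object."""
    model = json.loads(body)
    return [field for field in REQUIRED if field not in model]
```

The optional sessions field is not checked, because the schema does not require it.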
string
(required) The identifier of the desired model in the form of its name from the output of GET /v1/models.
200
OK.
{
"required": [
"name",
"rate",
"url",
"language",
"description"
],
"properties": {
"name": {
"description": "Name of the model for use as an identifier in calls to the service (for example, `en-US_BroadbandModel`).",
"type": "string"
},
"language": {
"description": "Language identifier for the model (for example, `en-US`).",
"type": "string"
},
"rate": {
"description": "Sampling rate (minimum acceptable rate for audio) used by the model in Hertz.",
"type": "integer",
"format": "int32"
},
"url": {
"description": "URI for the model.",
"type": "string"
},
"sessions": {
"description": "URI for the model for use with the POST `/v1/sessions` method.",
"type": "string"
},
"description": {
"description": "Brief description of the model.",
"type": "string"
}
}
}
404
Not Found. Model not found.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
406
Not Acceptable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
415
Unsupported Media Type.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
POST /speech-to-text/api/v1/sessions
Creates a session and locks recognition requests to that session's engine. You can use the session for multiple recognition requests so that each request is processed by the same Speech to Text engine. The response returns a session cookie in its Set-Cookie header; send that cookie in the Cookie header of each request that uses this session.
The session expires after 30 seconds of inactivity. Use a GET request on the session_id to prevent the session from expiring.
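The following Python sketch uses only the standard library to handle the session cookie. It builds the create-session request, converts the returned Set-Cookie value into a Cookie header, and builds follow-up requests that carry that cookie. Three details are assumptions: the host is a placeholder, the query-parameter name model is not given in this section, and the cookie name in the usage example is hypothetical.

```python
from http.cookies import SimpleCookie
from urllib.parse import urlencode
from urllib.request import Request

BASE = "https://example.com/speech-to-text/api"  # placeholder host


def create_session_request(model: str) -> Request:
    """Build POST /v1/sessions; 'model' as the query-parameter name is an assumption."""
    query = urlencode({"model": model})
    return Request(f"{BASE}/v1/sessions?{query}", data=b"", method="POST")


def cookie_header(set_cookie: str) -> str:
    """Convert a Set-Cookie response value into a Cookie request-header value."""
    jar = SimpleCookie()
    jar.load(set_cookie)
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


def session_request(url: str, cookie: str, method: str = "GET") -> Request:
    """Build a follow-up request (for example, a keep-alive GET) that carries the cookie."""
    return Request(url, method=method, headers={"Cookie": cookie})


# Usage with a hypothetical cookie value:
cookie = cookie_header("SESSIONID=abc123; Path=/")
ping = session_request(f"{BASE}/v1/sessions/abc/recognize", cookie)
```

If you issue a GET like ping at intervals comfortably shorter than 30 seconds, the session should not expire while it is idle.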
string
(required) The identifier of the model to be used by the new session (use GET /v1/models or GET /v1/models/{model_id} for information about available models).
{
"type": "string"
}
201
Created.
{
"required": [
"session_id",
"new_session_uri",
"recognize",
"observe_result",
"recognizeWS"
],
"properties": {
"session_id": {
"description": "Identifier for the new session.",
"type": "string"
},
"new_session_uri": {
"description": "URI for the new session.",
"type": "string"
},
"recognize": {
"description": "URI for REST recognition requests.",
"type": "string"
},
"observe_result": {
"description": "URI for REST results observers.",
"type": "string"
},
"recognizeWS": {
"description": "URI for WebSocket recognition requests. Needed only for working with the WebSocket interface.",
"type": "string"
}
}
}
{
"type": "string"
}
406
Not Acceptable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
{
"type": "string"
}
415
Unsupported Media Type.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
{
"type": "string"
}
503
Service Unavailable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
DELETE /speech-to-text/api/v1/sessions/{session_id}
Deletes an existing session and its engine. You cannot send requests to a session after it is deleted.
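Here is a minimal Python sketch of the delete call. It builds the DELETE request with the session cookie and maps the status codes listed for this method to short outcomes. The base URL and the cookie value are placeholders.

```python
from urllib.request import Request

# Outcomes for the status codes documented for this method.
DELETE_OUTCOMES = {
    204: "deleted",
    400: "cookie missing",
    404: "unknown session_id",
    406: "not acceptable",
}


def delete_session_request(base: str, session_id: str, cookie: str) -> Request:
    """Build DELETE /v1/sessions/{session_id}, carrying the session cookie."""
    return Request(f"{base}/v1/sessions/{session_id}", method="DELETE",
                   headers={"Cookie": cookie})


def describe_delete(status: int) -> str:
    """Translate a response status into a short, human-readable outcome."""
    return DELETE_OUTCOMES.get(status, f"unexpected status {status}")
```

A 400 here most likely means the Cookie header was omitted. The session is not necessarily gone in that case.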
string
(required) The ID of the session to be deleted.
204
No Content.
400
Bad Request. Cookie must be set.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
404
Not Found. 'session_id' not found.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
406
Not Acceptable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
GET /speech-to-text/api/v1/sessions/{session_id}/observe_result
Requests results for a recognition task within the specified session. You can submit multiple requests for the same recognition task. To see interim results, set the query parameter interim_results=true.
Specify a sequence ID (with the sequence_id query parameter) that matches the sequence ID of a recognition request to see results for that recognition task. A request with a sequence ID can arrive before, during, or after the matching recognition request, but it must arrive no later than 30 seconds after the recognition completes to avoid a session timeout (status code 408). Send multiple requests for the sequence ID with a maximum gap of 30 seconds to avoid the timeout. Omit the sequence ID to observe results for an ongoing recognition task; if no recognition is ongoing, the method returns results for the next recognition task regardless of whether it specifies a sequence ID.
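This short Python sketch shows the query options for this method. It builds the observe_result URL and adds sequence_id and interim_results only when they are wanted. The base URL is a placeholder. Encoding booleans as the strings true and false follows the interim_results=true form used in this section.

```python
from typing import Optional
from urllib.parse import urlencode


def observe_url(base: str, session_id: str, sequence_id: Optional[int] = None,
                interim_results: bool = False) -> str:
    """Build a GET .../observe_result URL. Leaving sequence_id unset observes
    the ongoing recognition task or, if none is running, the next one."""
    params = {}
    if sequence_id is not None:
        params["sequence_id"] = str(sequence_id)
    if interim_results:
        params["interim_results"] = "true"
    url = f"{base}/v1/sessions/{session_id}/observe_result"
    return f"{url}?{urlencode(params)}" if params else url
```

To respect the 30-second gap rule, re-issue a request built this way before 30 seconds have passed since the previous one ended.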
string
(required) The ID of the session whose results you want to observe.
string
(optional) The sequence ID of the recognition task whose results you want to observe. Omit the parameter to obtain results either for an ongoing recognition, if any, or for the next recognition task regardless of whether it specifies a sequence ID.
string
(optional) If true, interim results are returned as a stream of JSON objects; each object represents a single SpeechRecognitionEvent. If false, the response is a single SpeechRecognitionEvent with final results only.
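With interim_results set to true, the response body holds several JSON objects in a row, and a plain json.loads call would reject it. The Python sketch below splits such a body into individual events using the standard decoder's raw_decode. It assumes the objects are separated only by whitespace, because this section does not specify the exact framing.

```python
import json
from typing import Iterator


def iter_events(text: str) -> Iterator[dict]:
    """Yield each SpeechRecognitionEvent object from a body that contains
    several JSON objects one after another."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```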
200
OK.
{
"required": [
"results",
"result_index"
],
"properties": {
"results": {
"description": "The results array consists of zero or more final results followed by zero or one interim result. The final results are guaranteed not to change; the interim result may be replaced by zero or more final results (followed by zero or one interim result). The service periodically sends updates to the result list, with the `result_index` set to the lowest index in the array that has changed.",
"type": "array",
"items": {
"required": [
"final",
"alternatives"
],
"properties": {
"final": {
"description": "If `true`, the result for this utterance is not updated further.",
"type": "boolean"
},
"alternatives": {
"description": "Array of alternative transcripts.",
"type": "array",
"items": {
"required": [
"transcript"
],
"properties": {
"transcript": {
"description": "Transcription of the audio.",
"type": "string"
},
"confidence": {
"description": "Confidence score of the transcript in the range of 0 to 1. Available only for the best alternative and only in results marked as final.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
},
"timestamps": {
"description": "Time alignments for each word from transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end time in hundredths of seconds. Example: `[[\"hello\",0.0,1.2],[\"world\",1.2,2.5]]`. Available only for the best alternative.",
"type": "array",
"items": {
"type": "string"
}
},
"word_confidence": {
"description": "Confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0 to 1. Example: `[[\"hello\",0.95],[\"world\",0.866]]`. Available only for the best alternative and only in results marked as final.",
"type": "array",
"items": {
"type": "string"
}
}
}
}
},
"keywords_result": {
"description": "Dictionary (or associative array) whose keys are the strings specified for `keywords` if both that parameter and `keywords_threshold` are specified. A keyword for which no matches are found is omitted from the array.",
"required": [
"keyword"
],
"properties": {
"keyword": {
"description": "List of each keyword entered via the `keywords` parameter and, for each keyword, an array of `KeywordResult` objects that provides information about its occurrences in the input audio. The keys of the list are the actual keyword strings. A keyword for which no matches are spotted in the input is omitted from the array.",
"type": "array",
"items": {
"required": [
"normalized_text",
"start_time",
"end_time",
"confidence"
],
"properties": {
"normalized_text": {
"description": "Specified keyword normalized to the spoken phrase that matched in the audio input.",
"type": "string"
},
"start_time": {
"description": "Start time in hundredths of seconds of the keyword match.",
"type": "number",
"format": "double"
},
"end_time": {
"description": "End time in hundredths of seconds of the keyword match.",
"type": "number",
"format": "double"
},
"confidence": {
"description": "Confidence score of the keyword match in the range of 0 to 1.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
}
}
}
}
}
},
"word_alternatives": {
"description": "Array of word alternative hypotheses found for words of the input audio if `word_alternatives_threshold` is not null.",
"type": "array",
"items": {
"required": [
"start_time",
"end_time",
"alternatives"
],
"properties": {
"start_time": {
"description": "Start time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
"type": "number",
"format": "double"
},
"end_time": {
"description": "End time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
"type": "number",
"format": "double"
},
"alternatives": {
"description": "Array of word alternative hypotheses for a word from the input audio.",
"type": "array",
"items": {
"required": [
"confidence",
"word"
],
"properties": {
"confidence": {
"description": "Confidence score of the word alternative hypothesis.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
},
"word": {
"description": "Word alternative hypothesis for a word from the input audio.",
"type": "string"
}
}
}
}
}
}
}
}
}
},
"result_index": {
"description": "An index that indicates the change point in the `results` array.",
"type": "integer",
"format": "int32"
},
"warnings": {
"description": "An array of warning messages about invalid query parameters or JSON fields included with the request. Each warning includes a descriptive message and a list of invalid argument strings. For example, a message such as `\"Unknown arguments:\"` or `\"Unknown url query arguments:\"` followed by a list of the form `\"invalid_arg_1, invalid_arg_2.\"` The request succeeds despite the warnings.",
"type": "array",
"items": {
"type": "string"
}
}
}
}
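Per the results description above, each update sets result_index to the lowest changed position. One reading of that rule is that a client keeps every earlier entry and replaces the tail with the event's results. The Python sketch below implements that interpretation and then joins the final transcripts. It takes the first alternative of each result to be the best one, which is an assumption.

```python
def apply_update(results: list, event: dict) -> list:
    """Keep entries before result_index and replace the rest with the
    event's results (final results, then at most one interim result)."""
    i = event["result_index"]
    return results[:i] + event["results"]


def final_transcript(results: list) -> str:
    """Join the first alternative of every result marked final."""
    return " ".join(r["alternatives"][0]["transcript"].strip()
                    for r in results if r["final"])


# Hypothetical sequence of events: an interim guess, its final form, new speech.
events = [
    {"result_index": 0, "results": [{"final": False, "alternatives": [{"transcript": "hello wor"}]}]},
    {"result_index": 0, "results": [{"final": True, "alternatives": [{"transcript": "hello world "}]}]},
    {"result_index": 1, "results": [{"final": False, "alternatives": [{"transcript": "how are"}]}]},
]
acc: list = []
for event in events:
    acc = apply_update(acc, event)
print(final_transcript(acc))  # hello world
```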
400
Bad Request. User input error (for example, audio not matching the specified format) or an inactivity timeout occurred. If an existing session is closed, session_closed is set to true.
{
"required": [
"error",
"code",
"code_description",
"session_closed"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
},
"session_closed": {
"description": "Specifies the value `true` if the active session is closed as a result of the problem.",
"type": "boolean"
}
}
}
404
Not Found. The 'session_id' was not found, or a specified sequence_id does not match the sequence ID of the recognition task.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
406
Not Acceptable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
408
Request Timeout. The session was closed after 30 seconds of inactivity (session timeout); session_closed is set to true.
{
"required": [
"error",
"code",
"code_description",
"session_closed"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
},
"session_closed": {
"description": "Specifies the value `true` if the active session is closed as a result of the problem.",
"type": "boolean"
}
}
}
413
Payload Too Large. The session was closed because the input stream is larger than the currently supported data limit (100 MB per session); session_closed is set to true.
{
"required": [
"error",
"code",
"code_description",
"session_closed"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
},
"session_closed": {
"description": "Specifies the value `true` if the active session is closed as a result of the problem.",
"type": "boolean"
}
}
}
415
Unsupported Media Type.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
500
Internal Server Error. The session is destroyed and session_closed is set to true. Future requests that use this session return HTTP response code 404.
{
"required": [
"error",
"code",
"code_description",
"session_closed"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
},
"session_closed": {
"description": "Specifies the value `true` if the active session is closed as a result of the problem.",
"type": "boolean"
}
}
}
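The error list above determines when a client must create a new session. Codes 408, 413, and 500 always close the session, while a 400 closes it only when the body sets session_closed to true. The Python sketch below encodes those rules. A 404 is deliberately not treated as a closure, because it can also mean a sequence_id mismatch.

```python
import json

# Statuses documented as always closing the session for this method.
ALWAYS_CLOSED = {408, 413, 500}


def session_closed(status: int, body: str) -> bool:
    """Decide from an error response whether the session is gone."""
    if status in ALWAYS_CLOSED:
        return True
    if status != 400:
        return False
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return False
    return isinstance(payload, dict) and payload.get("session_closed") is True
```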
GET /speech-to-text/api/v1/sessions/{session_id}/recognize
Checks whether the specified session can accept another recognition request. Concurrent recognition tasks during the same session are not allowed. The returned state must be initialized before you can send another recognition request with the POST recognize method.
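Here is a Python sketch of a polling loop for this check. The fetch argument is a hypothetical callable that performs the GET and returns the response body. Any state other than initialized counts as not ready. The default interval stays well below the 30-second inactivity timeout, so the polling also keeps the session alive.

```python
import json
import time
from typing import Callable


def ready_for_recognition(body: str) -> bool:
    """True when a GET .../recognize body reports the state 'initialized'."""
    return json.loads(body)["session"]["state"] == "initialized"


def wait_until_ready(fetch: Callable[[], str], interval: float = 5.0,
                     attempts: int = 6) -> bool:
    """Poll until the session is ready or the attempts run out."""
    for attempt in range(attempts):
        if ready_for_recognition(fetch()):
            return True
        if attempt < attempts - 1:
            time.sleep(interval)  # well under the 30-second inactivity timeout
    return False
```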
string
(required) The ID of the session for the recognition task.
200
OK.
{
"required": [
"session"
],
"properties": {
"session": {
"description": "Description of the state and possible actions for the current session.",
"required": [
"state",
"model",
"recognize",
"observe_result",
"recognizeWS"
],
"properties": {
"state": {
"description": "State of the session. The state must be `initialized` to perform a new recognition request on the session.",
"type": "string"
},
"model": {
"description": "URI for information about the model that is used with the session.",
"type": "string"
},
"recognize": {
"description": "URI for REST recognition requests.",
"type": "string"
},
"observe_result": {
"description": "URI for REST results observers.",
"type": "string"
},
"recognizeWS": {
"description": "URI for WebSocket recognition requests. Needed only for working with the WebSocket interface.",
"type": "string"
}
}
}
}
}
404
Not Found. 'session_id' not found.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
406
Not Acceptable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
415
Unsupported Media Type.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
POST /speech-to-text/api/v1/sessions/{session_id}/recognize{?sequence_id,continuous,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}
Sends audio and returns transcription results for a session-based recognition request. By default, returns only the final results; to see interim results, set the query parameter interim_results=true in a GET request to the observe_result method before this POST request finishes. To enable polling by the observe_result method for large audio requests, specify an integer with the sequence_id query parameter for non-multipart requests or with the sequence_id parameter of the JSON metadata for multipart requests. The service imposes a data size limit of 100 MB per session. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.
For requests to transcribe more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set Transfer-Encoding to chunked to use streaming mode. In streaming mode, the server closes the session (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the session (status code 400) if no speech is detected for inactivity_timeout seconds of audio (not processing time); use the inactivity_timeout parameter to change the default of 30 seconds.
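The following Python sketch shows streaming mode with the standard http.client module. It reads audio from a file-like object in fixed-size pieces and sends each piece as one HTTP chunk, using request(..., encode_chunked=True) together with an explicit Transfer-Encoding: chunked header. The host, path, cookie, and chunk size are placeholders you would supply. For live audio, the source must produce data often enough that no 30-second gap occurs.

```python
import http.client
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, size: int = 8192) -> Iterator[bytes]:
    """Yield the audio in fixed-size pieces; each becomes one HTTP chunk."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk


def send_streaming(host: str, path: str, cookie: str, audio: BinaryIO,
                   content_type: str = "audio/flac") -> int:
    """POST audio with Transfer-Encoding: chunked and return the status code."""
    conn = http.client.HTTPSConnection(host)
    try:
        conn.request("POST", path, body=iter_chunks(audio),
                     headers={"Content-Type": content_type, "Cookie": cookie,
                              "Transfer-Encoding": "chunked"},
                     encode_chunked=True)
        return conn.getresponse().status
    finally:
        conn.close()


# Offline check of the chunking step only (no network call):
pieces = list(iter_chunks(io.BytesIO(b"abcdefghij"), 4))
```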
For non-multipart requests, you specify all parameters of the request as a path parameter, request headers, and query parameters. You provide the audio as the body of the request. Use the following parameters:
Required: session_id, Content-Type, and body
Optional: Transfer-Encoding, sequence_id, continuous, inactivity_timeout, keywords, keywords_threshold, max_alternatives, word_alternatives_threshold, word_confidence, timestamps, profanity_filter, and smart_formatting
For multipart requests, you specify a few parameters of the request via a path parameter and as request headers, but you specify most parameters as multipart form data in the form of JSON metadata, in which only part_content_type is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:
Required: session_id, Content-Type, metadata, and multipart
Optional: Transfer-Encoding
An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).
metadata='{"part_content_type":"audio/flac","data_parts_count":1,"continuous":true,"inactivity_timeout":-1}'
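This Python sketch assembles a multipart/form-data body by hand: a JSON metadata part followed by one audio part per file. Producing the metadata with json.dumps guarantees valid JSON with straight quotes. The form-field names metadata and upload are assumptions taken from the example form fields in this section. Setting data_parts_count to match the attached audio is a consistency choice made by this sketch.

```python
import json
import uuid


def build_multipart(metadata: dict, audio_parts: list[bytes],
                    audio_type: str = "audio/flac") -> tuple[bytes, str]:
    """Return (body, content_type) for a multipart recognition request."""
    if "part_content_type" not in metadata:
        raise ValueError("metadata must include part_content_type")
    # Keep data_parts_count consistent with the audio actually attached.
    meta = dict(metadata, data_parts_count=len(audio_parts))
    boundary = uuid.uuid4().hex.encode("ascii")
    parts = [(b'form-data; name="metadata"', b"application/json",
              json.dumps(meta).encode("utf-8"))]
    parts += [(b'form-data; name="upload"', audio_type.encode("ascii"), audio)
              for audio in audio_parts]
    chunks = []
    for disposition, ctype, payload in parts:
        chunks.append(b"--" + boundary + b"\r\n"
                      + b"Content-Disposition: " + disposition + b"\r\n"
                      + b"Content-Type: " + ctype + b"\r\n\r\n"
                      + payload + b"\r\n")
    chunks.append(b"--" + boundary + b"--\r\n")
    return b"".join(chunks), "multipart/form-data; boundary=" + boundary.decode("ascii")
```

Send the returned content type as the request's Content-Type header so that the server can find the part boundaries.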
string
(required) The ID of the session for the recognition task.
string
(optional) Non-multipart only: Sequence ID of this recognition task in the form of a user-specified integer. If omitted, no sequence ID is associated with the recognition task.
string
(optional) Non-multipart only: If true, multiple final results representing consecutive phrases separated by long pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.
string
(optional) Non-multipart only: The number of seconds of audio (not processing time) after which, if no speech is detected, the session is closed with status code 400. The default is 30 seconds; use -1 for infinity. See also the continuous parameter.
string
(optional) Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.
string
(optional) Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.
string
(optional) Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
string
(optional) Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.
string
(optional) Non-multipart only: If true, a confidence measure for each word is returned.
string
(optional) Non-multipart only: If true, time alignment for each word is returned.
string
(optional) Non-multipart only: If true (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to false to return results with no censoring. Applies to US English transcription only.
string
(optional) Non-multipart only: If true, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If false (the default), no formatting is performed. Applies to US English transcription only.
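Below is a Python sketch that builds the query string for a non-multipart request. It enforces the documented constraints on the client side: both thresholds must lie between 0 and 1 inclusive, and keywords_threshold requires at least one keyword. Booleans are sent as true or false. Sending the keywords array as a comma-separated value is an assumption, and rejecting unknown names locally is a design choice, because the service itself only returns warnings for them.

```python
from urllib.parse import urlencode

ALLOWED = {"sequence_id", "continuous", "inactivity_timeout", "keywords",
           "keywords_threshold", "max_alternatives",
           "word_alternatives_threshold", "word_confidence", "timestamps",
           "profanity_filter", "smart_formatting"}


def _encode(value) -> str:
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (list, tuple)):
        return ",".join(value)  # assumption: comma-separated keyword list
    return str(value)


def recognize_query(**params) -> str:
    """Validate the optional recognize parameters and encode them."""
    unknown = set(params) - ALLOWED
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    for name in ("keywords_threshold", "word_alternatives_threshold"):
        value = params.get(name)
        if value is not None and not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1 inclusive")
    if params.get("keywords_threshold") is not None and not params.get("keywords"):
        raise ValueError("keywords_threshold requires at least one keyword")
    return urlencode({k: _encode(v) for k, v in params.items() if v is not None})
```

Parameters passed as None are left out, which lets the service apply its documented defaults.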
{
"metadata": "Hello, world!",
"upload": "Hello, world!"
}
{
"type": "array",
"items": {
"type": "string",
"format": "byte"
}
}
200
OK.
{
"required": [
"results",
"result_index"
],
"properties": {
"results": {
"description": "The results array consists of zero or more final results followed by zero or one interim result. The final results are guaranteed not to change; the interim result may be replaced by zero or more final results (followed by zero or one interim result). The service periodically sends updates to the result list, with the `result_index` set to the lowest index in the array that has changed.",
"type": "array",
"items": {
"required": [
"final",
"alternatives"
],
"properties": {
"final": {
"description": "If `true`, the result for this utterance is not updated further.",
"type": "boolean"
},
"alternatives": {
"description": "Array of alternative transcripts.",
"type": "array",
"items": {
"required": [
"transcript"
],
"properties": {
"transcript": {
"description": "Transcription of the audio.",
"type": "string"
},
"confidence": {
"description": "Confidence score of the transcript in the range of 0 to 1. Available only for the best alternative and only in results marked as final.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
},
"timestamps": {
"description": "Time alignments for each word from transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end time in hundredths of seconds. Example: `[[\"hello\",0.0,1.2],[\"world\",1.2,2.5]]`. Available only for the best alternative.",
"type": "array",
"items": {
"type": "string"
}
},
"word_confidence": {
"description": "Confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0 to 1. Example: `[[\"hello\",0.95],[\"world\",0.866]]`. Available only for the best alternative and only in results marked as final.",
"type": "array",
"items": {
"type": "string"
}
}
}
}
},
"keywords_result": {
"description": "Dictionary (or associative array) whose keys are the strings specified for `keywords` if both that parameter and `keywords_threshold` are specified. A keyword for which no matches are found is omitted from the array.",
"required": [
"keyword"
],
"properties": {
"keyword": {
"description": "List of each keyword entered via the `keywords` parameter and, for each keyword, an array of `KeywordResult` objects that provides information about its occurrences in the input audio. The keys of the list are the actual keyword strings. A keyword for which no matches are spotted in the input is omitted from the array.",
"type": "array",
"items": {
"required": [
"normalized_text",
"start_time",
"end_time",
"confidence"
],
"properties": {
"normalized_text": {
"description": "Specified keyword normalized to the spoken phrase that matched in the audio input.",
"type": "string"
},
"start_time": {
"description": "Start time in hundredths of seconds of the keyword match.",
"type": "number",
"format": "double"
},
"end_time": {
"description": "End time in hundredths of seconds of the keyword match.",
"type": "number",
"format": "double"
},
"confidence": {
"description": "Confidence score of the keyword match in the range of 0 to 1.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
}
}
}
}
}
},
"word_alternatives": {
"description": "Array of word alternative hypotheses found for words of the input audio if `word_alternatives_threshold` is not null.",
"type": "array",
"items": {
"required": [
"start_time",
"end_time",
"alternatives"
],
"properties": {
"start_time": {
"description": "Start time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
"type": "number",
"format": "double"
},
"end_time": {
"description": "End time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
"type": "number",
"format": "double"
},
"alternatives": {
"description": "Array of word alternative hypotheses for a word from the input audio.",
"type": "array",
"items": {
"required": [
"confidence",
"word"
],
"properties": {
"confidence": {
"description": "Confidence score of the word alternative hypothesis.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
},
"word": {
"description": "Word alternative hypothesis for a word from the input audio.",
"type": "string"
}
}
}
}
}
}
}
}
}
},
"result_index": {
"description": "An index that indicates the change point in the `results` array.",
"type": "integer",
"format": "int32"
},
"warnings": {
"description": "An array of warning messages about invalid query parameters or JSON fields included with the request. Each warning includes a descriptive message and a list of invalid argument strings. For example, a message such as `\"Unknown arguments:\"` or `\"Unknown url query arguments:\"` followed by a list of the form `\"invalid_arg_1, invalid_arg_2.\"` The request succeeds despite the warnings.",
"type": "array",
"items": {
"type": "string"
}
}
}
}
400
Bad Request. User input error (for example, audio not matching the specified format), the session is in the wrong state, or an inactivity timeout occurred. If an existing session is closed, session_closed
is set to true
.
{
"required": [
"error",
"code",
"code_description",
"session_closed"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
},
"session_closed": {
"description": "Specifies the value `true` if the active session is closed as a result of the problem.",
"type": "boolean"
}
}
}
404
Not Found. 'session_id' not found
.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
406
Not Acceptable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
408
Request Timeout. The session was closed because of inactivity (session timeout) for 30 seconds. `session_closed` is set to `true`.
{
"required": [
"error",
"code",
"code_description",
"session_closed"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
},
"session_closed": {
"description": "Specifies the value `true` if the active session is closed as a result of the problem.",
"type": "boolean"
}
}
}
413
Payload Too Large. The session was closed because the input stream is larger than 100 MB. `session_closed` is set to `true`.
{
"required": [
"error",
"code",
"code_description",
"session_closed"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
},
"session_closed": {
"description": "Specifies the value `true` if the active session is closed as a result of the problem.",
"type": "boolean"
}
}
}
415
Unsupported Media Type.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
500
Internal Server Error. The session is destroyed, and `session_closed` is set to `true`. Future requests that use this session return HTTP response code 404.
{
"required": [
"error",
"code",
"code_description",
"session_closed"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
},
"session_closed": {
"description": "Specifies the value `true` if the active session is closed as a result of the problem.",
"type": "boolean"
}
}
}
503
Service Unavailable. The session is already processing a request. Concurrent requests are not allowed on the same session. The session remains alive after this error.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
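The error responses above determine whether a session can be reused. The following Python sketch shows one way a client might map them to an action; the function name `session_action` and its three return values are hypothetical, not part of the service API.

```python
import json


def session_action(status_code: int, body: str) -> str:
    """Map a session error response to 'retry', 'recreate', or 'fail'.

    Hypothetical helper; the rules follow the response descriptions above.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        payload = {}
    if not isinstance(payload, dict):
        payload = {}

    if status_code == 503:
        # The session was busy with another request but remains alive.
        return "retry"
    if status_code == 404:
        # Unknown session_id, for example after a 500 destroyed the session.
        return "recreate"
    if payload.get("session_closed") is True:
        # 400, 408, 413, and 500 can close the session; open a new one.
        return "recreate"
    # The session is still open, but this request failed; surface the error.
    return "fail"
```

Recreating a session after a 413 helps only if the audio is also shortened or split, because the same input would exceed the 100 MB limit again.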
/speech-to-text/api/v1/recognize
Sends audio and returns transcription results for a sessionless recognition request. Returns only the final results; to enable interim results, use session-based requests or the WebSocket API. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.
For requests to transcribe audio with more than one audio file (multipart requests) or to transcribe live audio as it becomes available, you must set the `Transfer-Encoding` header to `chunked` to use streaming mode. In streaming mode, the server closes the connection (status code 408) if the service receives no data chunk for 30 seconds and the service has no audio to transcribe for 30 seconds. The server also closes the connection (status code 400) if no speech is detected for `inactivity_timeout` seconds of audio (not processing time); use the `inactivity_timeout` parameter to change the default of 30 seconds.
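A minimal sketch of streaming mode using only the Python standard library, assuming a host name and an `Authorization` header value that you supply (both are placeholders here). It sets `Transfer-Encoding: chunked` explicitly and lets `http.client` frame each piece of audio as one chunk; the 8 KB chunk size is an arbitrary choice.

```python
import http.client
from typing import Iterator


def audio_chunks(audio: bytes, chunk_size: int = 8192) -> Iterator[bytes]:
    """Yield the audio in fixed-size pieces; each becomes one HTTP chunk."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def stream_recognize(host: str, audio: bytes, content_type: str,
                     authorization: str) -> http.client.HTTPResponse:
    """POST audio to the sessionless recognize method in streaming mode.

    `host` and `authorization` are placeholders that you must supply.
    """
    conn = http.client.HTTPSConnection(host, timeout=60)
    headers = {
        "Content-Type": content_type,  # for example, "audio/flac"
        "Authorization": authorization,
        "Transfer-Encoding": "chunked",
    }
    # Because Transfer-Encoding is set above, encode_chunked=True tells
    # http.client to wrap each yielded bytes object in chunk framing.
    conn.request("POST", "/speech-to-text/api/v1/recognize",
                 body=audio_chunks(audio), headers=headers,
                 encode_chunked=True)
    return conn.getresponse()
```

For live audio, replace `audio_chunks(audio)` with a generator that yields data as it is captured, keeping each gap shorter than the 30-second limit described above.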
For non-multipart requests, you specify all parameters of the request as a collection of request headers and query parameters, and you provide the audio as the body of the request. Use the following parameters:
Required: `Content-Type` and body
Optional: `Transfer-Encoding`, `model`, `continuous`, `inactivity_timeout`, `keywords`, `keywords_threshold`, `max_alternatives`, `word_alternatives_threshold`, `word_confidence`, `timestamps`, `profanity_filter`, and `smart_formatting`
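For a non-multipart request, the optional settings travel as query parameters. The helper below is a hypothetical sketch that drops unset values so the service defaults apply, renders booleans as `true`/`false`, and, as an assumption, joins a keyword list with commas.

```python
from urllib.parse import urlencode

RECOGNIZE_PATH = "/speech-to-text/api/v1/recognize"


def recognize_url(params: dict) -> str:
    """Build the path and query string for a non-multipart recognize request."""
    query = {}
    for name, value in params.items():
        if value is None:
            continue  # omitted parameters fall back to the service defaults
        if isinstance(value, bool):
            value = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            value = ",".join(value)  # assumption: keywords as a comma-separated list
        query[name] = value
    return RECOGNIZE_PATH + ("?" + urlencode(query) if query else "")
```

For example, `recognize_url({"model": "en-US_BroadbandModel", "continuous": True})` yields a path ending in `?model=en-US_BroadbandModel&continuous=true`.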
For multipart requests, you specify a few parameters of the request as request headers and a query parameter, but you specify most parameters as multipart form data in the form of JSON metadata, in which only `part_content_type` is required. You then specify the audio files for the request as subsequent parts of the form data. Use the following parameters:
Required: `Content-Type`, `metadata`, and `multipart`
Optional: `Transfer-Encoding` and `model`
An example of the multipart metadata for the first part of a series of FLAC files follows. This first part of the request is sent as JSON. The remaining parts are one or more audio files (the example sends only a single audio file).
metadata='{"part_content_type":"audio/flac","data_parts_count":1,"continuous":true,"inactivity_timeout":-1}'
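The same metadata can be generated rather than typed by hand, which avoids malformed JSON. This sketch assembles a `multipart/form-data` body with the standard library; the form field name `upload` for the audio parts and the per-part `filename` values are assumptions, not taken from this reference.

```python
import json
import uuid
from typing import List, Tuple


def build_multipart(audio_files: List[bytes], part_content_type: str,
                    **options) -> Tuple[bytes, str]:
    """Return (body, Content-Type header value) for a multipart recognize request."""
    boundary = "stt-" + uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    metadata = {"part_content_type": part_content_type,
                "data_parts_count": len(audio_files), **options}

    lines = [
        delimiter,
        b'Content-Disposition: form-data; name="metadata"',
        b"Content-Type: application/json",
        b"",
        json.dumps(metadata).encode("utf-8"),  # always valid JSON
    ]
    for index, audio in enumerate(audio_files):
        lines += [
            delimiter,
            # Assumption: audio parts are sent under the field name "upload".
            f'Content-Disposition: form-data; name="upload"; filename="part{index}"'.encode("ascii"),
            f"Content-Type: {part_content_type}".encode("ascii"),
            b"",
            audio,
        ]
    lines.append(delimiter + b"--")  # closing boundary
    body = b"\r\n".join(lines) + b"\r\n"
    return body, "multipart/form-data; boundary=" + boundary
```

Send the returned body with the returned string as the request's `Content-Type` header.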
string
(optional) The identifier of the model to be used for the recognition request (use `GET /v1/models` for a list of available models).
string
(optional) Non-multipart only: If `true`, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.
string
(optional) Non-multipart only: The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use `-1` for infinity. See also the `continuous` parameter.
string
(optional) Non-multipart only: Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.
string
(optional) Non-multipart only: Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.
string
(optional) Non-multipart only: Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
string
(optional) Non-multipart only: Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.
string
(optional) Non-multipart only: If `true`, a confidence measure for each word is returned.
string
(optional) Non-multipart only: If `true`, time alignment for each word is returned.
string
(optional) Non-multipart only: If `true` (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to `false` to return results with no censoring. Applies to US English transcription only.
string
(optional) Non-multipart only: If `true`, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If `false` (the default), no formatting is performed. Applies to US English transcription only.
{
"type": "array",
"items": {
"type": "string",
"format": "byte"
}
}
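Two of the parameter rules above are easy to check on the client before sending a request: thresholds must lie between 0 and 1 inclusive, and `keywords_threshold` needs at least one keyword. The hypothetical helper below returns a list of problems instead of raising, so callers can report all of them at once.

```python
from typing import List, Optional, Sequence


def check_recognize_options(keywords: Optional[Sequence[str]] = None,
                            keywords_threshold: Optional[float] = None,
                            word_alternatives_threshold: Optional[float] = None) -> List[str]:
    """Return human-readable problems with keyword and alternative settings."""
    problems = []
    for name, value in (("keywords_threshold", keywords_threshold),
                        ("word_alternatives_threshold", word_alternatives_threshold)):
        if value is not None and not 0 <= value <= 1:
            problems.append(f"{name} must be between 0 and 1 inclusive")
    if keywords_threshold is not None and not keywords:
        problems.append("keywords_threshold requires at least one keyword")
    if keywords and keywords_threshold is None:
        # Not rejected by the service, but no keyword spotting happens
        # when the threshold keeps its default (null) value.
        problems.append("keywords are ignored unless keywords_threshold is also set")
    return problems
```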
200
OK.
{
"required": [
"results",
"result_index"
],
"properties": {
"results": {
"description": "The results array consists of zero or more final results followed by zero or one interim result. The final results are guaranteed not to change; the interim result may be replaced by zero or more final results (followed by zero or one interim result). The service periodically sends updates to the result list, with the `result_index` set to the lowest index in the array that has changed.",
"type": "array",
"items": {
"required": [
"final",
"alternatives"
],
"properties": {
"final": {
"description": "If `true`, the result for this utterance is not updated further.",
"type": "boolean"
},
"alternatives": {
"description": "Array of alternative transcripts.",
"type": "array",
"items": {
"required": [
"transcript"
],
"properties": {
"transcript": {
"description": "Transcription of the audio.",
"type": "string"
},
"confidence": {
"description": "Confidence score of the transcript in the range of 0 to 1. Available only for the best alternative and only in results marked as final.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
},
"timestamps": {
"description": "Time alignments for each word from transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end time in hundredths of seconds. Example: `[[\"hello\",0.0,1.2],[\"world\",1.2,2.5]]`. Available only for the best alternative.",
"type": "array",
"items": {
"type": "string"
}
},
"word_confidence": {
"description": "Confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0 to 1. Example: `[[\"hello\",0.95],[\"world\",0.866]]`. Available only for the best alternative and only in results marked as final.",
"type": "array",
"items": {
"type": "string"
}
}
}
}
},
"keywords_result": {
"description": "Dictionary (or associative array) whose keys are the strings specified for `keywords` if both that parameter and `keywords_threshold` are specified. A keyword for which no matches are found is omitted from the array.",
"required": [
"keyword"
],
"properties": {
"keyword": {
"description": "List of each keyword entered via the `keywords` parameter and, for each keyword, an array of `KeywordResult` objects that provides information about its occurrences in the input audio. The keys of the list are the actual keyword strings. A keyword for which no matches are spotted in the input is omitted from the array.",
"type": "array",
"items": {
"required": [
"normalized_text",
"start_time",
"end_time",
"confidence"
],
"properties": {
"normalized_text": {
"description": "Specified keyword normalized to the spoken phrase that matched in the audio input.",
"type": "string"
},
"start_time": {
"description": "Start time in hundredths of seconds of the keyword match.",
"type": "number",
"format": "double"
},
"end_time": {
"description": "End time in hundredths of seconds of the keyword match.",
"type": "number",
"format": "double"
},
"confidence": {
"description": "Confidence score of the keyword match in the range of 0 to 1.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
}
}
}
}
}
},
"word_alternatives": {
"description": "Array of word alternative hypotheses found for words of the input audio if `word_alternatives_threshold` is not null.",
"type": "array",
"items": {
"required": [
"start_time",
"end_time",
"alternatives"
],
"properties": {
"start_time": {
"description": "Start time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
"type": "number",
"format": "double"
},
"end_time": {
"description": "End time in hundredths of seconds of the word from the input audio that corresponds to the word alternatives.",
"type": "number",
"format": "double"
},
"alternatives": {
"description": "Array of word alternative hypotheses for a word from the input audio.",
"type": "array",
"items": {
"required": [
"confidence",
"word"
],
"properties": {
"confidence": {
"description": "Confidence score of the word alternative hypothesis.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
},
"word": {
"description": "Word alternative hypothesis for a word from the input audio.",
"type": "string"
}
}
}
}
}
}
}
}
}
},
"result_index": {
"description": "An index that indicates the change point in the `results` array.",
"type": "integer",
"format": "int32"
},
"warnings": {
"description": "An array of warning messages about invalid query parameters or JSON fields included with the request. Each warning includes a descriptive message and a list of invalid argument strings. For example, a message such as `\"Unknown arguments:\"` or `\"Unknown url query arguments:\"` followed by a list of the form `\"invalid_arg_1, invalid_arg_2.\"` The request succeeds despite the warnings.",
"type": "array",
"items": {
"type": "string"
}
}
}
}
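A sketch of reading a successful response that follows the schema above. It assumes that the first entry in `alternatives` is the best alternative (the schema reports confidence and timestamps only for the best one) and skips any result whose `final` field is not `true`. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def final_transcript(response: dict) -> str:
    """Join the best alternative of every final result into one string."""
    pieces = []
    for result in response.get("results", []):
        if result.get("final") and result.get("alternatives"):
            # Assumption: alternatives[0] is the best alternative.
            pieces.append(result["alternatives"][0]["transcript"].strip())
    return " ".join(pieces)


def keyword_hits(response: dict) -> Dict[str, List[Tuple[float, float, float]]]:
    """Collect (start_time, end_time, confidence) for each spotted keyword."""
    hits: Dict[str, List[Tuple[float, float, float]]] = {}
    for result in response.get("results", []):
        for keyword, matches in result.get("keywords_result", {}).items():
            hits.setdefault(keyword, []).extend(
                (m["start_time"], m["end_time"], m["confidence"]) for m in matches)
    return hits
```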
400
Bad Request. User input error (for example, audio not matching the specified format) or inactivity timeout (no speech detected).
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
406
Not Acceptable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
408
Request Timeout. The request failed because of inactivity (no audio data sent) for 30 seconds.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
413
Payload Too Large. The request failed because the input stream is larger than 100 MB.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
415
Unsupported Media Type.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
500
Internal Server Error.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
503
Service Unavailable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
/speech-to-text/api/v1/register_callback
Registers a callback URL with the service for use with subsequent asynchronous recognition requests. The service attempts to register, or white-list, the callback URL. To be registered successfully, the callback URL must respond to a `GET` request from the service, after which the service responds with response code 201 to the original registration request.
The service sends only a single `GET` request to the callback URL. If the service does not receive a response with a response code of 200 and a body that echoes a random alphanumeric challenge string from the service within 5 seconds, it does not white-list the URL; it sends response code 400 in response to the registration request. If the requested callback URL is already white-listed, the service responds to the registration request with response code 200.
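The registration handshake requires the callback URL to echo the challenge string. A minimal, framework-independent sketch follows; the query parameter name `challenge_string` is an assumption about how the service passes the challenge, so confirm it against the requests your endpoint actually receives.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit


def verification_response(request_target: str) -> Tuple[int, str]:
    """Return (status, body) for the service's one-time verification GET."""
    query = parse_qs(urlsplit(request_target).query)
    # Assumption: the challenge arrives in a query parameter named challenge_string.
    challenge = query.get("challenge_string", [""])[0]
    if not challenge:
        return 400, ""
    # Echo the challenge unchanged with status 200 so the service white-lists the URL.
    return 200, challenge
```

Whatever web framework hosts the callback URL, the handler must return this response within the 5-second window, because the service sends only one verification request.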
Once you successfully register a callback URL, you can use it with an indefinite number of recognition requests. You can register a maximum of 20 callback URLs in a one-hour span of time.
If you specify a user secret with the request, the service uses it as a key to calculate an HMAC-SHA1 signature of the random challenge string that it sends during registration. It sends the signature in the `X-Callback-Signature` header of its `GET` request to the URL during registration. It also uses the secret to calculate a signature over the payload of every callback notification that uses the URL. The signature provides authentication and data integrity for HTTP communications.
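On the receiving side, the `X-Callback-Signature` header can be checked by recomputing the HMAC-SHA1 value with the same secret. This sketch assumes that the header carries the Base64-encoded digest of the raw request body (or of the challenge string during registration); the encoding is not stated in this reference, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def callback_signature(secret: str, message: bytes) -> str:
    """HMAC-SHA1 of `message` keyed by the user secret, Base64-encoded (assumed encoding)."""
    digest = hmac.new(secret.encode("utf-8"), message, hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def is_authentic(secret: str, message: bytes, header_value: str) -> bool:
    """Compare against the X-Callback-Signature header in constant time."""
    expected = callback_signature(secret, message).encode("ascii")
    return hmac.compare_digest(expected, header_value.encode("utf-8"))
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading characters of a forged signature were correct.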
Note: This method is currently a beta release that supports US English only.
string
(required) An HTTP or HTTPS URL to which callback notifications are to be sent. To be white-listed, the URL must successfully echo the challenge string during URL verification. During verification, the client can also check the signature that the service sends in the `X-Callback-Signature` header to verify the origin of the request.
string
(optional) A user-specified string that the service uses to generate the HMAC-SHA1 signature that it sends via the `X-Callback-Signature` header. The service includes the header during URL verification and with every notification sent to the callback URL. It calculates the signature over the payload of the notification. If you omit the parameter, the service does not send the header.
{
"type": "string"
}
200
OK. The callback was already registered (white-listed). The status included in the response is `already created`.
{
"required": [
"status",
"url"
],
"properties": {
"status": {
"description": "The current status of the callback registration: `created` if the callback URL was successfully white-listed as a result of the call or `already created` if the URL was already white-listed.",
"type": "string"
},
"url": {
"description": "The callback URL that is successfully registered.",
"type": "string"
}
}
}
201
Created. The callback was successfully registered (white-listed). The status included in the response is `created`.
{
"required": [
"status",
"url"
],
"properties": {
"status": {
"description": "The current status of the callback registration: `created` if the callback URL was successfully white-listed as a result of the call or `already created` if the URL was already white-listed.",
"type": "string"
},
"url": {
"description": "The callback URL that is successfully registered.",
"type": "string"
}
}
}
400
Bad Request. The callback registration failed. The request was missing a required parameter or specified an invalid argument; the client sent an invalid response to the service's `GET` request during the registration process; or the client failed to respond to the service's request before the five-second timeout.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
503
Service Unavailable. The service is currently unavailable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
/speech-to-text/api/v1/recognitions
Returns the status and ID of all outstanding jobs associated with the service credentials with which it is called. If a job was created with a callback URL and a user token, the method also returns the user token for the job. To obtain the results for a job whose status is `completed`, use the `GET recognitions/{id}` method. A job and its results remain available until you delete them with the `DELETE recognitions/{id}` method or until the job's time to live expires, whichever comes first.
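A sketch of processing the `GET recognitions` response: it groups job IDs by status and builds the `GET recognitions/{id}` paths for completed jobs. The path prefix mirrors the paths in this reference; the function names are hypothetical.

```python
from typing import Dict, List

RECOGNITIONS_PATH = "/speech-to-text/api/v1/recognitions"


def jobs_by_status(response: dict) -> Dict[str, List[str]]:
    """Group job IDs under their current status."""
    grouped: Dict[str, List[str]] = {}
    for job in response.get("recognitions", []):
        grouped.setdefault(job["status"], []).append(job["id"])
    return grouped


def result_paths(response: dict) -> List[str]:
    """Paths to request with GET for every job whose results are ready."""
    return [f"{RECOGNITIONS_PATH}/{job['id']}"
            for job in response.get("recognitions", [])
            if job["status"] == "completed"]
```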
Note: This method is currently a beta release that supports US English only.
200
OK.
{
"required": [
"recognitions"
],
"properties": {
"recognitions": {
"description": "An array of objects that provides the status for each of the user's current jobs. The array is empty if the user has no current jobs.",
"type": "array",
"items": {
"required": [
"id",
"status"
],
"properties": {
"id": {
"description": "The ID of the job.",
"type": "string"
},
"status": {
"description": "The current status of the job: `waiting`: The service is preparing the job for processing; the service always returns this status when the job is initially created. `processing`: The service is actively processing the job. `completed`: The service has finished processing the job; if the job specified a callback URL and the event `recognitions.completed_with_results`, the service sent the results with the callback notification; otherwise, use the `GET recognitions/{id}` method to retrieve the results. `failed`: The job failed.",
"type": "string"
},
"url": {
"description": "For a `POST /v1/recognitions` request, the URL to use to request information about the job with the `GET recognitions/{id}` method.",
"type": "string"
},
"user_token": {
"description": "For a `GET /v1/recognitions` request, the user token associated with the job if the job was created with a callback URL and a user token.",
"type": "string"
}
}
}
}
}
}
503
Service Unavailable. The service is currently unavailable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
/speech-to-text/api/v1/recognitions{?callback_url,events,user_token,results_ttl,model,continuous,inactivity_timeout,keywords,keywords_threshold,max_alternatives,word_alternatives_threshold,word_confidence,timestamps,profanity_filter,smart_formatting}
Creates a job for a new asynchronous recognition request. The job is owned by the user whose service credentials are used to create it. How you learn the status and results of a job depends on the parameters you include with the job creation request:
By callback notification: Include the `callback_url` query parameter to specify a URL to which the service is to send callback notifications when the status of the job changes. Optionally, you can also include the `events` and `user_token` query parameters to subscribe to specific events and to specify a string that is to be included with each notification for the job.
By polling the service: Omit the `callback_url`, `events`, and `user_token` query parameters. You must then use the `GET recognitions` or `GET recognitions/{id}` methods to check the status of the job, using the latter to retrieve the results when the job is complete.
The two approaches are not mutually exclusive. You can poll the service for job status or obtain results from the service manually even if you include a callback URL. In both cases, you can include the `results_ttl` parameter to specify how long the results are to remain available after the job is complete. Note that using the HTTPS `GET recognitions/{id}` method to retrieve results is more secure than receiving them via callback notification over HTTP because it provides confidentiality in addition to authentication and data integrity.
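When polling, a client typically repeats `GET recognitions/{id}` until the status is `completed` or `failed`. In the sketch below, `fetch_status` stands for whatever function performs that request and returns the parsed JSON; the 5-second interval and 10-minute timeout are arbitrary defaults, not service limits.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_status: Callable[[], dict], interval: float = 5.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> dict:
    """Poll until the job reaches a terminal status, then return its JSON."""
    deadline = clock() + timeout
    while True:
        job = fetch_status()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job.get('id')} is still {job['status']}")
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be tested without real delays. Keep the polling timeout well within the job's `results_ttl`, because results that are not retrieved in time expire.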
The method supports the same basic parameters as all HTTP REST and WebSocket recognition requests; it does not support interim results or multipart data. The service imposes a data size limit of 100 MB. It automatically detects the endianness of the incoming audio and, for audio that includes multiple channels, downmixes the audio to one-channel mono during transcoding.
Note: This method is currently a beta release that supports US English only.
string
(required) specified string with each job to differentiate the callback notifications for the jobs.
string
(optional) If the job includes a callback URL, a comma-separated list of notification events to which to subscribe. Valid events are:
`recognitions.started` generates a callback notification when the service begins to process the job.
`recognitions.completed` generates a callback notification when the job is complete; you must use the `GET recognitions/{id}` method to retrieve the results before they time out or are deleted.
`recognitions.completed_with_results` generates a callback notification when the job is complete; the notification includes the results of the request.
`recognitions.failed` generates a callback notification if the service experiences an error while processing the job.
Omit the parameter to subscribe to the default events: `recognitions.started`, `recognitions.completed`, and `recognitions.failed`. The `recognitions.completed` and `recognitions.completed_with_results` events are incompatible; you can specify only one of the two events. If the job does not include a callback URL, omit the parameter.
string
(optional) If the job includes a callback URL, a user-specified string that the service is to include with each callback notification for the job; the token allows the user to maintain an internal mapping between jobs and notification events. If the job does not include a callback URL, omit the parameter.
string
(optional) The number of minutes for which the results are to be available after the job has finished. If not delivered via a callback, the results must be retrieved within this time. Omit the parameter to use a time to live of one week. The parameter is valid with or without a callback URL.
string
(optional) The identifier of the model to be used for the recognition request. Currently, only `en-US_BroadbandModel` (the default) is supported.
string
(optional) If `true`, multiple final results that represent consecutive phrases separated by pauses are returned. Otherwise, recognition ends after the first “end of speech” incident is detected.
string
(optional) The time in seconds after which, if only silence (no speech) is detected in submitted audio, the connection is closed with a 400 error. Useful for stopping audio submission from a live microphone when a user simply walks away. Use `-1` for infinity. See also the `continuous` parameter.
string
(optional) Array of keyword strings to spot in the audio. Each keyword string can include one or more tokens. Keywords are spotted only in the final hypothesis, not in interim results. Omit the parameter or specify an empty array if you do not need to spot keywords.
string
(optional) Confidence value that is the lower bound for spotting a keyword. A word is considered to match a keyword if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No keyword spotting is performed if the default value (null) is used. If you specify a threshold, you must also specify one or more keywords.
string
(optional) Maximum number of alternative transcripts to be returned. By default, a single transcription is returned.
string
(optional) Confidence value that is the lower bound for identifying a hypothesis as a possible word alternative (also known as “Confusion Networks”). An alternative word is considered if its confidence is greater than or equal to the threshold. Specify a probability between 0 and 1 inclusive. No alternative words are computed if the default value (null) is used.
string
(optional) If `true`, a confidence measure for each word is returned.
string
(optional) If `true`, time alignment for each word is returned.
string
(optional) If `true` (the default), filters profanity from all output except for keyword results by replacing inappropriate words with a series of asterisks. Set the parameter to `false` to return results with no censoring. Applies to US English transcription only.
string
(optional) If `true`, converts dates, times, series of digits and numbers, phone numbers, currency values, and Internet addresses into more readable, conventional representations in the final transcript of a recognition request. If `false` (the default), no formatting is performed. Applies to US English transcription only.
{
"type": "array",
"items": {
"type": "string",
"format": "byte"
}
}
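The constraints on `events` and `user_token` described above, which the service reports as 400 errors, can be enforced before the job is created. This hypothetical helper builds the job-creation query parameters and raises `ValueError` for the combinations that the 400 response below lists.

```python
from typing import Dict, Optional, Sequence

VALID_EVENTS = {"recognitions.started", "recognitions.completed",
                "recognitions.completed_with_results", "recognitions.failed"}


def job_query(callback_url: Optional[str] = None,
              events: Optional[Sequence[str]] = None,
              user_token: Optional[str] = None,
              results_ttl: Optional[int] = None,
              **recognition) -> Dict[str, object]:
    """Build query parameters for POST /v1/recognitions, rejecting invalid combinations."""
    if callback_url is None and (events or user_token):
        raise ValueError("events and user_token require a callback URL")
    if events:
        unknown = set(events) - VALID_EVENTS
        if unknown:
            raise ValueError(f"unknown events: {sorted(unknown)}")
        if {"recognitions.completed", "recognitions.completed_with_results"} <= set(events):
            raise ValueError("specify only one of recognitions.completed and "
                             "recognitions.completed_with_results")
    params = {"callback_url": callback_url,
              "events": ",".join(events) if events else None,
              "user_token": user_token,
              "results_ttl": results_ttl,
              **recognition}
    # Omitted parameters fall back to the service defaults.
    return {name: value for name, value in params.items() if value is not None}
```

The helper cannot check whether the callback URL is white-listed; only the service knows that.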
201
Created. The job was successfully created.
{
"required": [
"id",
"status"
],
"properties": {
"id": {
"description": "The ID of the job.",
"type": "string"
},
"status": {
"description": "The current status of the job: `waiting`: The service is preparing the job for processing; the service always returns this status when the job is initially created. `processing`: The service is actively processing the job. `completed`: The service has finished processing the job; if the job specified a callback URL and the event `recognitions.completed_with_results`, the service sent the results with the callback notification; otherwise, use the `GET recognitions/{id}` method to retrieve the results. `failed`: The job failed.",
"type": "string"
},
"url": {
"description": "For a `POST /v1/recognitions` request, the URL to use to request information about the job with the `GET recognitions/{id}` method.",
"type": "string"
},
"user_token": {
"description": "For a `GET /v1/recognitions` request, the user token associated with the job if the job was created with a callback URL and a user token.",
"type": "string"
}
}
}
400
Bad Request. The request specified an invalid argument. For example, the request specified a callback URL that has not been white-listed, specified the `events` or `user_token` parameter without also specifying a callback URL, or specified both the `recognitions.completed` and `recognitions.completed_with_results` events.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
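The argument rules behind this 400 response can be partly checked before a request is sent. A minimal sketch, assuming a hypothetical `check_recognition_args` helper and a `callback_url` argument name; whether a callback URL is whitelisted can only be verified by the service, so that case is not covered.

```python
from typing import List, Optional, Sequence

BOTH_COMPLETED = {"recognitions.completed", "recognitions.completed_with_results"}


def check_recognition_args(
    callback_url: Optional[str] = None,
    events: Optional[Sequence[str]] = None,
    user_token: Optional[str] = None,
) -> List[str]:
    """Return client-detectable problems that would produce a 400 Bad Request."""
    problems = []
    if callback_url is None:
        # events and user_token are meaningful only with a callback URL.
        if events:
            problems.append("events requires a callback URL")
        if user_token is not None:
            problems.append("user_token requires a callback URL")
    # The two completion events are mutually exclusive.
    if events and BOTH_COMPLETED <= set(events):
        problems.append(
            "specify only one of recognitions.completed and "
            "recognitions.completed_with_results"
        )
    return problems
```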
503
Service Unavailable. The service is currently unavailable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
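Every error response in this reference shares the same `error`, `code`, and `code_description` body. The sketch below wraps that body in an exception type; the class name, the `retryable` property, and the policy of retrying only on 503 are assumptions of this example, not documented service behavior.

```python
import json


class ServiceError(Exception):
    """Error body shared by the error responses in this reference (hypothetical type)."""

    def __init__(self, error: str, code: int, code_description: str) -> None:
        super().__init__(f"{code} {code_description}: {error}")
        self.error = error  # description of the problem
        self.code = code  # HTTP response code
        self.code_description = code_description  # response message

    @property
    def retryable(self) -> bool:
        # 503 signals that the service is temporarily unavailable, so a later
        # retry may succeed; 400 and 404 point at the request itself.
        return self.code == 503


def parse_error(body: str) -> ServiceError:
    """Turn an error response body into an exception object (not raised here)."""
    data = json.loads(body)
    return ServiceError(data["error"], int(data["code"]), data["code_description"])
```

A caller might `raise parse_error(body)` after a non-2xx response and consult `retryable` to decide whether to back off and try again.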
/speech-to-text/api/v1/recognitions/{id}
Deletes the specified job regardless of its current state. If you delete an active job, the service cancels the job without producing results. Once you delete a job, its results are no longer available. The service automatically deletes a job and its results when the time to live for the results expires. You must submit the request with the service credentials of the user who created the job.
Note: This method is currently a beta release that supports US English only.
string
(required) The ID of the job that is to be deleted.
204
No Content. The job was successfully deleted.
404
Not Found. The specified job ID was not found.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
503
Service Unavailable. The service is currently unavailable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
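A sketch of preparing the delete request with the standard library. The base URL is a placeholder, the helper name is hypothetical, and authentication is omitted; as noted above, the request must carry the credentials of the user who created the job.

```python
from urllib.parse import quote
from urllib.request import Request


def build_delete_request(base_url: str, job_id: str) -> Request:
    """Build (but do not send) DELETE /v1/recognitions/{id} (hypothetical helper).

    base_url is assumed to end with /speech-to-text/api; credentials of the
    job's creator must be added before sending (not shown).
    """
    # Percent-encode the job ID so it is safe as a single path segment.
    return Request(f"{base_url}/v1/recognitions/{quote(job_id, safe='')}", method="DELETE")


req = build_delete_request("https://stt.example.invalid/speech-to-text/api", "4bd734c0")
print(req.get_method(), req.full_url)
# DELETE https://stt.example.invalid/speech-to-text/api/v1/recognitions/4bd734c0
```

When sent with `urllib.request.urlopen`, a 204 indicates success, while a 404 surfaces as an `HTTPError`.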
/speech-to-text/api/v1/recognitions/{id}
Returns information about the specified job. The response always includes the status of the job. If the status is `completed`, the response includes the results of the recognition request; otherwise, the response includes the job ID. You must submit the request with the service credentials of the user who created the job.
You can use this method to retrieve the results of any job, regardless of whether it was submitted with a callback URL and the `recognitions.completed_with_results` event, and you can retrieve the results multiple times for as long as they remain available.
Note: This method is currently a beta release that supports US English only.
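Without a callback URL, a client polls this method until the job finishes. A minimal sketch, assuming a hypothetical `fetch_status` callable that performs the GET request and returns the decoded JSON; the interval and timeout defaults are arbitrary placeholders, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a recognition job until its status is completed or failed."""
    waited = 0.0
    while True:
        job = fetch_status()
        if job["status"] in ("completed", "failed"):
            return job
        if waited >= timeout:
            raise TimeoutError(f"job still {job['status']} after {waited:.0f} s")
        sleep(interval)
        waited += interval
```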
string
(required) The ID of the job whose status is to be checked.
200
OK.
{
"required": [
"status"
],
"properties": {
"status": {
"description": "The current status of the job: `waiting`: The service is preparing the job for processing; the service also returns this status when the job is initially created. `processing`: The service is actively processing the job. `completed`: The service has finished processing the job; if the job specified a callback URL and the event `recognitions.completed_with_results`, the service sent the results with the callback notification; otherwise, use the `GET /v1/recognitions/{id}` method to retrieve the results. `failed`: The job failed.",
"type": "string"
},
"id": {
"description": "If the status is not `completed`, the ID of the job.",
"type": "string"
},
"results": {
"description": "If the status is `completed`, the results of the recognition request as an array that includes a single instance of a `SpeechRecognitionEvent` object.",
"type": "array",
"items": {
"required": [
"results",
"result_index"
],
"properties": {
"results": {
"description": "The results array consists of zero or more final results followed by zero or one interim result. The final results are guaranteed not to change; the interim result may be replaced by zero or more final results (followed by zero or one interim result). The service periodically sends updates to the result list, with the `result_index` set to the lowest index in the array that has changed.",
"type": "array",
"items": {
"required": [
"final",
"alternatives"
],
"properties": {
"final": {
"description": "If `true`, the result for this utterance is not updated further.",
"type": "boolean"
},
"alternatives": {
"description": "Array of alternative transcripts.",
"type": "array",
"items": {
"required": [
"transcript"
],
"properties": {
"transcript": {
"description": "Transcription of the audio.",
"type": "string"
},
"confidence": {
"description": "Confidence score of the transcript in the range of 0 to 1. Available only for the best alternative and only in results marked as final.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
},
"timestamps": {
"description": "Time alignments for each word from transcript as a list of lists. Each inner list consists of three elements: the word followed by its start and end times in seconds. Example: `[[\"hello\",0.0,1.2],[\"world\",1.2,2.5]]`. Available only for the best alternative.",
"type": "array",
"items": {
"type": "string"
}
},
"word_confidence": {
"description": "Confidence score for each word of the transcript as a list of lists. Each inner list consists of two elements: the word and its confidence score in the range of 0 to 1. Example: `[[\"hello\",0.95],[\"world\",0.866]]`. Available only for the best alternative and only in results marked as final.",
"type": "array",
"items": {
"type": "string"
}
}
}
}
},
"keywords_result": {
"description": "Dictionary (or associative array) whose keys are the strings specified for `keywords` if both that parameter and `keywords_threshold` are specified. A keyword for which no matches are found is omitted from the dictionary.",
"required": [
"keyword"
],
"properties": {
"keyword": {
"description": "List of each keyword entered via the `keywords` parameter and, for each keyword, an array of `KeywordResult` objects that provides information about its occurrences in the input audio. The keys of the list are the actual keyword strings. A keyword for which no matches are spotted in the input is omitted from the array.",
"type": "array",
"items": {
"required": [
"normalized_text",
"start_time",
"end_time",
"confidence"
],
"properties": {
"normalized_text": {
"description": "Specified keyword normalized to the spoken phrase that matched in the audio input.",
"type": "string"
},
"start_time": {
"description": "Start time in seconds of the keyword match.",
"type": "number",
"format": "double"
},
"end_time": {
"description": "End time in seconds of the keyword match.",
"type": "number",
"format": "double"
},
"confidence": {
"description": "Confidence score of the keyword match in the range of 0 to 1.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
}
}
}
}
}
},
"word_alternatives": {
"description": "Array of word alternative hypotheses found for words of the input audio if `word_alternatives_threshold` is not null.",
"type": "array",
"items": {
"required": [
"start_time",
"end_time",
"alternatives"
],
"properties": {
"start_time": {
"description": "Start time in seconds of the word from the input audio that corresponds to the word alternatives.",
"type": "number",
"format": "double"
},
"end_time": {
"description": "End time in seconds of the word from the input audio that corresponds to the word alternatives.",
"type": "number",
"format": "double"
},
"alternatives": {
"description": "Array of word alternative hypotheses for a word from the input audio.",
"type": "array",
"items": {
"required": [
"confidence",
"word"
],
"properties": {
"confidence": {
"description": "Confidence score of the word alternative hypothesis in the range of 0 to 1.",
"type": "number",
"format": "double",
"minimum": 0,
"maximum": 1
},
"word": {
"description": "Word alternative hypothesis for a word from the input audio.",
"type": "string"
}
}
}
}
}
}
}
}
}
},
"result_index": {
"description": "An index that indicates the change point in the `results` array.",
"type": "integer",
"format": "int32"
},
"warnings": {
"description": "An array of warning messages about invalid query parameters or JSON fields included with the request. Each warning includes a descriptive message and a list of invalid argument strings. For example, a message such as `\"Unknown arguments:\"` or `\"Unknown url query arguments:\"` followed by a list of the form `\"invalid_arg_1, invalid_arg_2.\"` The request succeeds despite the warnings.",
"type": "array",
"items": {
"type": "string"
}
}
}
}
}
}
}
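Once the status is `completed`, the nested `results` arrays above can be flattened into a transcript and per-word timings. This sketch assumes the first entry in `alternatives` is the best alternative (the one that carries `timestamps`), uses hypothetical helper names, and reports times exactly as the service returns them.

```python
from typing import List, Tuple


def _final_best_alternatives(job: dict) -> List[dict]:
    """Return the first alternative of every final result in a completed job."""
    if job.get("status") != "completed":
        raise ValueError("results are available only when the status is completed")
    best = []
    # The job's results array holds a single SpeechRecognitionEvent object.
    for event in job.get("results", []):
        for result in event.get("results", []):
            if result.get("final") and result.get("alternatives"):
                best.append(result["alternatives"][0])
    return best


def final_transcript(job: dict) -> str:
    """Join the best transcript of each final result with single spaces."""
    return " ".join(alt["transcript"].strip() for alt in _final_best_alternatives(job))


def word_timings(job: dict) -> List[Tuple[str, float, float]]:
    """Collect (word, start, end) triples from the best alternatives' timestamps."""
    timings = []
    for alt in _final_best_alternatives(job):
        for word, start, end in alt.get("timestamps", []):
            timings.append((word, float(start), float(end)))
    return timings
```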
404
Not Found. The specified job ID was not found.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}
503
Service Unavailable. The service is currently unavailable.
{
"required": [
"error",
"code",
"code_description"
],
"properties": {
"error": {
"description": "Description of the problem.",
"type": "string"
},
"code": {
"description": "HTTP response code.",
"type": "integer",
"format": "int32"
},
"code_description": {
"description": "Response message.",
"type": "string"
}
}
}