POST /v1/ai/speech

Specify either speed or length, not both. A request that sets both results in a 500 server error, because the desired speed might not match the desired length.
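Because a request that sets both parameters fails on the server, it can help to reject that combination on the client first. The following is a minimal sketch, not part of the API; the helper name pacing_fields is hypothetical, and it assumes form values are sent as strings.

```python
from typing import Optional


def pacing_fields(speed: Optional[float] = None,
                  length: Optional[float] = None) -> dict:
    """Return the form fields for speed or length, refusing to send both.

    Hypothetical helper: the name and the string conversion are
    illustrative assumptions, not something the API defines.
    """
    if speed is not None and length is not None:
        # Sending both would produce a 500 server error, so fail early.
        raise ValueError("Specify either speed or length, not both.")
    fields = {}
    if speed is not None:
        fields["speed"] = str(speed)
    if length is not None:
        fields["length"] = str(length)
    return fields
```

Calling pacing_fields(speed=1.5) returns a dictionary with only the speed field, and calling it with no arguments returns an empty dictionary so the server defaults apply.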

The output of this POST request is a JSON object from which you must extract and decode the base64-encoded audio data. Here is an example of how to do so in your terminal:

jq -r '.audio' lmnt-output.json | base64 --decode > lmnt-audio-output.mp3

The file format of your audio output depends on the format specified in the initial request (this example assumes format=mp3).
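The same extraction can be done in Python with only the standard library. This is a sketch under the assumption that the response body has already been read into a string; the function names extract_audio and save_audio are hypothetical.

```python
import base64
import json


def extract_audio(response_text: str) -> bytes:
    """Decode the base64 'audio' field of a speech response into raw bytes."""
    payload = json.loads(response_text)
    return base64.b64decode(payload["audio"])


def save_audio(response_text: str, out_path: str) -> int:
    """Write the decoded audio to out_path and return the byte count.

    The file extension should match the format used in the request,
    for example .mp3 when format=mp3.
    """
    audio = extract_audio(response_text)
    with open(out_path, "wb") as out_file:
        out_file.write(audio)
    return len(audio)
```

For a response saved as lmnt-output.json, reading the file and passing its contents to save_audio with the path lmnt-audio-output.mp3 mirrors the jq command above.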

Authorizations

X-API-Key
string
header
required

Your API key; get it from your LMNT account page.
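As an illustration of how the key is passed, the sketch below builds (but does not send) a multipart/form-data POST request using only the Python standard library. The base URL https://api.lmnt.com is an assumption that this page does not state, the function name build_speech_request is hypothetical, and field values are expected to be plain strings.

```python
import urllib.request
import uuid


def build_speech_request(api_key: str, fields: dict,
                         base_url: str = "https://api.lmnt.com"):
    """Build a POST request for /v1/ai/speech with the X-API-Key header.

    base_url is an assumed default; replace it with the host given in
    your account documentation. Values are inserted as-is, so this
    sketch does not handle file uploads or values containing CRLF.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        # Each form field is one part, separated by the boundary marker.
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/ai/speech",
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

The returned object can be passed to urllib.request.urlopen to send it; the voice and text fields described in the Body section are the only required entries in fields.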

Body

multipart/form-data
voice
string
required

The voice id of the voice to use for synthesis; voice ids can be retrieved by calls to List voices or Voice info.

text
string
required

The text to synthesize; max 5000 characters per request (including spaces).
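Longer inputs have to be split across several requests. One possible approach, sketched here as an assumption rather than a recommended method, is to break the text on whitespace so that no chunk exceeds the limit; the function name split_text is hypothetical.

```python
MAX_CHARS = 5000  # per-request limit stated above, spaces included


def split_text(text: str, limit: int = MAX_CHARS) -> list:
    """Split text into chunks of at most `limit` characters.

    Splits on whitespace, so runs of spaces and newlines collapse to a
    single space. A single word longer than the limit is cut mid-word.
    """
    chunks = []
    current = ""
    for word in text.split():
        while len(word) > limit:
            # Flush what we have, then hard-split the oversized word.
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as the text field of its own request. Splitting at sentence boundaries instead would likely give more natural prosody at the joins.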

language
string
default: en

The desired language of the synthesized speech, as a two-letter ISO 639-1 code. One of de, en, es, fr, pt, zh, ko, hi. Does not work with professional clones or the blizzard model.

model
string
default: aurora

The model to use for synthesis. One of aurora (default) or blizzard. Learn more about models here.

format
string
default: mp3

The file format of the synthesized audio output; one of aac, mp3, mulaw, raw, or wav.

conversational
boolean
default: false

Set this to true to generate conversational-style speech rather than reading-style speech. Does not work with the blizzard model.

sample_rate
number
default: 24000

The desired output sample rate in Hz; one of 8000, 16000, or 24000. Defaults to 24000 for all formats except mulaw, which defaults to 8000.
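A client that needs to know the effective sample rate before decoding (for example, when handling raw or mulaw output) could mirror the default rule locally. This is a sketch of that rule only; the function name resolve_sample_rate is hypothetical.

```python
from typing import Optional

ALLOWED_SAMPLE_RATES = (8000, 16000, 24000)


def resolve_sample_rate(fmt: str = "mp3",
                        sample_rate: Optional[int] = None) -> int:
    """Return the sample rate that applies to a request.

    When sample_rate is omitted, mulaw defaults to 8000 Hz and every
    other format defaults to 24000 Hz, as described above.
    """
    if sample_rate is None:
        return 8000 if fmt == "mulaw" else 24000
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(
            f"sample_rate must be one of {ALLOWED_SAMPLE_RATES}, got {sample_rate}"
        )
    return sample_rate
```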

speed
number
default: 1.0

The talking speed of the generated speech, a floating point value between 0.25 (slow) and 2.0 (fast).

length
number

Produce speech of this length in seconds; maximum 300.0 (5 minutes). Does not work with the blizzard model.
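Several parameters above (language, conversational, and length) are documented as not working with the blizzard model. A client-side check along the following lines can flag those combinations before a request is sent. It is a sketch with a hypothetical function name, and it assumes that any explicitly passed language counts as a conflict, which may be stricter than the API itself.

```python
def blizzard_conflicts(fields: dict) -> list:
    """List the fields in a request that do not work with the blizzard model.

    Returns an empty list when the model is not blizzard (aurora is the
    default when no model is given).
    """
    if fields.get("model", "aurora") != "blizzard":
        return []
    conflicts = []
    # Assumption: an explicitly set language is treated as a conflict.
    if "language" in fields:
        conflicts.append("language")
    # Accept either a bool or its string form, since form values are strings.
    if str(fields.get("conversational", "false")).lower() == "true":
        conflicts.append("conversational")
    if "length" in fields:
        conflicts.append("length")
    return conflicts
```

An empty result means none of the documented blizzard restrictions apply to the given fields.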

return_durations
string
default: false

If set to true, the response will contain a durations object; see its definition in the Response section below.

seed
integer

Seed used to specify a different take; defaults to random (see here for more details).

Response

200 - application/json
audio
string
required

The base64-encoded audio file; the format is determined by the format parameter.

seed
integer
required

The seed used to generate this speech; can be used to replicate this output take (assuming the same text is resynthesized with this seed number; see here for more details).

durations
object[]

A JSON object outlining the spoken duration of each synthesized input element (words and non-words like spaces, punctuation, etc.). See an example of this object for the input string "Hello world!"
