POST /v1/ai/speech

Specify either speed or length, not both. A request that sets both results in a 500 server error, because the desired speed might not match the desired length.
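Because sending both parameters fails on the server, it can help to catch the conflict before the request goes out. The following is a minimal client-side sketch; the function name check_speed_and_length is hypothetical and not part of any LMNT SDK.

```python
from typing import Optional


def check_speed_and_length(speed: Optional[float] = None,
                           length: Optional[float] = None) -> None:
    """Raise ValueError if both speed and length are supplied."""
    if speed is not None and length is not None:
        raise ValueError("Specify either speed or length, not both.")


# Either parameter alone (or neither) passes silently.
check_speed_and_length(speed=1.25)
check_speed_and_length(length=12.0)
```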

The output of this POST request is a JSON object from which you must extract and decode the base64-encoded audio data. Here is an example of how to do so in your terminal:

jq -r '.audio' lmnt-output.json | base64 --decode > lmnt-audio-output.mp3

The file format of your audio output depends on the format specified in the initial request (this example assumes format=mp3).
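The same extraction can be done in Python with only the standard library. This is a sketch, not an official client: decode_audio is a hypothetical helper name, and the sample payload is a fabricated stand-in for a real response body, used only to show the round trip.

```python
import base64
import json


def decode_audio(response_text: str) -> bytes:
    """Return the raw audio bytes from a JSON response body."""
    payload = json.loads(response_text)
    return base64.b64decode(payload["audio"])


# Stand-in response body (not real audio) to illustrate the decoding step.
sample = json.dumps({
    "audio": base64.b64encode(b"not-real-audio").decode("ascii"),
    "seed": 1,
})
audio_bytes = decode_audio(sample)
```

With a real response saved as lmnt-output.json, the bytes returned by decode_audio can be written to a file opened in binary mode ("wb"), using the extension that matches the requested format.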

Authorizations

X-API-Key
string
header
required

Your API key; get it from your LMNT account page.

Body

multipart/form-data
text
string
required

The text to synthesize; max 5000 characters per request (including spaces)

voice
string
required

The voice id of the voice to use for synthesis; voice ids can be retrieved by calls to List voices or Voice info

conversational
boolean
default: false

Set this to true to generate conversational-style speech rather than reading-style speech. Does not work with the blizzard model.

format
enum<string>
default: mp3

The file format of the synthesized audio output

Available options: aac, mp3, mulaw, raw, wav

language
enum<string>
default: en

The desired language of the synthesized speech, as a two-letter ISO 639-1 code. Does not work with professional clones or the blizzard model.

Available options: de, en, es, fr, pt, zh, ko, hi

length
number

Produce speech of this length in seconds; maximum 300.0 (5 minutes). Does not work with the blizzard model.

Required range: x < 300

model
enum<string>
default: aurora

The model to use for synthesis. One of aurora (default) or blizzard. Learn more about models here.

Available options: aurora, blizzard

return_durations
boolean
default: false

If set to true, the response contains a durations object.

sample_rate
enum<number>
default: 24000

The desired output sample rate in Hz

Available options: 8000, 16000, 24000

seed
integer

Seed used to specify a different take; defaults to random

speed
number
default: 1

The talking speed of the generated speech, a floating point value between 0.25 (slow) and 2.0 (fast).

Required range: 0.25 < x < 2
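To make the request shape concrete, here is a sketch that assembles the multipart/form-data body and the X-API-Key header with the Python standard library, without sending anything. Several details are assumptions rather than facts from this page: the host in API_URL (only the path /v1/ai/speech is given here), the helper name build_speech_request, the placeholder voice id, and the choice to serialize booleans as "true"/"false".

```python
import uuid
import urllib.request

# Assumed host; this page only documents the path /v1/ai/speech.
API_URL = "https://api.lmnt.com/v1/ai/speech"


def build_speech_request(api_key: str, fields: dict) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST request."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        if isinstance(value, bool):
            # Assumption: booleans are sent as lowercase strings.
            value = "true" if value else "false"
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")  # closing boundary
    body = "".join(parts).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "X-API-Key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


request = build_speech_request(
    "YOUR_API_KEY",
    {"text": "Hello world!", "voice": "example-voice-id", "format": "mp3"},
)
```

Passing the resulting object to urllib.request.urlopen would send it; the response body is then the JSON object described under Response.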

Response

200 - application/json
audio
string
required

The base64-encoded audio file; the format is determined by the format parameter.

seed
integer
required

The seed used to generate this speech; can be used to replicate this output take (assuming the same text is resynthesized with this seed number, see here for more details).

durations
object[]

A JSON object outlining the spoken duration of each synthesized input element (words and non-words like spaces, punctuation, etc.). See an example of this object for the input string "Hello world!"
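Since the response always carries the seed that was used, a take can be reproduced by sending the same fields again with that seed pinned. The sketch below shows one way to do that; replay_fields is a hypothetical helper, the voice id is a placeholder, and the response body is a fabricated stand-in rather than real API output.

```python
import json


def replay_fields(original_fields: dict, response_text: str) -> dict:
    """Return a copy of the request fields with the response's seed pinned."""
    seed = json.loads(response_text)["seed"]
    fields = dict(original_fields)  # leave the caller's dict untouched
    fields["seed"] = seed
    return fields


original = {"text": "Hello world!", "voice": "example-voice-id"}
stand_in_response = json.dumps({"audio": "", "seed": 12345})
pinned = replay_fields(original, stand_in_response)
```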
