Generate speech
Synthesizes speech from a text string and returns the audio data as a binary stream.
Want to stream timestamps with your speech? Check out the streaming WebSocket endpoint and examples using the SDKs in the synchronizing timing guide.
Authorizations
Your API key; get it from your LMNT account page.
Body
The voice id of the voice to use for synthesis; voice ids can be retrieved by calls to List voices or Voice info.
The text to synthesize; max 5000 characters per request (including spaces)
The model to use for synthesis. One of aurora (default) or blizzard. Learn more about models here.
Available options: aurora, blizzard
The desired language of the synthesized speech, as a two-letter ISO 639-1 code. Does not work with professional clones or with the blizzard model.
Available options: de, en, es, fr, pt, zh, ko, hi
The file format of the synthesized audio output.
Available options: aac, mp3, mulaw, raw, wav
The desired output sample rate in Hz.
Available options: 8000, 16000, 24000
The talking speed of the generated speech, a floating point value between 0.25 (slow) and 2.0 (fast).
Required range: 0.25 < x < 2
Seed used to specify a different take; defaults to random
Set this to true to generate conversational-style speech rather than reading-style speech. Does not work with the blizzard model.
Produce speech of this length in seconds; maximum 300.0 (5 minutes). Does not work with the blizzard model.
Required range: x < 300
Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns.
Required range: 0 < x < 1
Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles.
Required range: x > 0
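The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the official SDK. The field names (`text`, `speed`, `length`, `top_p`, `temperature`, `model`, `language`, `conversational`) are assumptions, since this section does not show them, and the sketch accepts the boundary values because the listed ranges do not make clear whether the bounds are inclusive.

```python
import re

MAX_TEXT_CHARS = 5000  # per-request limit stated above (spaces included)


def split_text(text, limit=MAX_TEXT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def check_params(params):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    text = params.get("text", "")
    if len(text) > MAX_TEXT_CHARS:
        problems.append(f"text has {len(text)} characters; limit is {MAX_TEXT_CHARS}")
    # Assumed field name -> (low, high); None means no upper bound is documented.
    bounds = {
        "speed": (0.25, 2.0),
        "length": (0.0, 300.0),
        "top_p": (0.0, 1.0),
        "temperature": (0.0, None),
    }
    for name, (low, high) in bounds.items():
        value = params.get(name)
        if value is None:
            continue  # optional field left unset
        if value < low or (high is not None and value > high):
            problems.append(f"{name}={value} is outside the documented range")
    # Options documented above as not working with the blizzard model.
    if params.get("model") == "blizzard":
        for name in ("language", "conversational", "length"):
            if params.get(name):
                problems.append(f"{name} does not work with the blizzard model")
    return problems
```

Text longer than the limit can be passed through `split_text` and synthesized one chunk at a time; how the resulting audio segments are joined depends on the output format and is left out of this sketch.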
Response
The response is of type file.
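Because the response is a binary stream rather than JSON, it is usually written straight to disk. The sketch below uses only the Python standard library. The endpoint URL, the `X-API-Key` header name, and the JSON field names (`voice`, `text`, `format`) are assumptions that this section does not confirm, so check them against the full reference before relying on them.

```python
import json
import urllib.request

# Assumed values, not stated in this section: endpoint URL and header name.
ENDPOINT = "https://api.lmnt.com/v1/ai/speech/bytes"
API_KEY_HEADER = "X-API-Key"


def build_request(api_key, voice, text, **options):
    """Build (but do not send) a POST request with a JSON body."""
    body = {"voice": voice, "text": text, **options}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={API_KEY_HEADER: api_key, "Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(request, path, chunk_size=64 * 1024):
    """Send the request and stream the binary response body to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        while chunk := response.read(chunk_size):
            out.write(chunk)
    return path
```

A call might look like `synthesize_to_file(build_request(api_key, voice_id, "Hello!", format="mp3"), "hello.mp3")`, where `api_key` and `voice_id` are placeholders for your own values. Reading the body in chunks avoids holding a long clip in memory all at once.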