Synthesize speech
Synthesizes speech from a text string. Returns binary audio data in one of many supported audio formats. This simplified version of synthesis can be directly used in HTML5 audio tags.
Specify either speed or length, not both. A request that sets both results in a 500 server error, because the desired speed might not match the desired length.
For more detailed output such as the duration of each spoken word, use the Speech POST request.
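Because a request that sets both speed and length fails on the server, it can help to catch the conflict before sending. The following is a minimal client-side sketch, not part of the API itself: the function name `check_pacing` is hypothetical, the range limits are taken from the parameter descriptions on this page, and the lower bound of zero for length is an assumption.

```python
from typing import Optional


def check_pacing(speed: Optional[float] = None,
                 length: Optional[float] = None) -> None:
    """Raise ValueError for pacing options the request is documented to reject.

    Hypothetical helper: mirrors the documented rules on the client side.
    """
    # The two options are mutually exclusive according to the docs.
    if speed is not None and length is not None:
        raise ValueError("specify either speed or length, not both")
    # Documented speed range: 0.25 (slow) to 2.0 (fast).
    if speed is not None and not 0.25 <= speed <= 2.0:
        raise ValueError("speed must be between 0.25 and 2.0")
    # Documented maximum length: 300.0 seconds. The > 0 bound is an assumption.
    if length is not None and not 0 < length <= 300.0:
        raise ValueError("length must be greater than 0 and at most 300.0")
```

Calling `check_pacing(speed=1.2)` or `check_pacing(length=30.0)` returns silently, while `check_pacing(speed=1.2, length=30.0)` raises `ValueError`.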
Query Parameters
Your API key; get it from your LMNT account page.
The voice id of the voice to use for synthesis; voice ids can be retrieved by calls to List voices or Voice info.
The text to synthesize; max 5000 characters per request (including spaces).
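Text longer than the limit has to be sent as several requests. One possible way to split it is sketched below using the standard-library `textwrap` module. The helper name `split_text` is hypothetical, and the sketch assumes the limit counts characters the same way Python's `len` does, which this page does not confirm.

```python
import textwrap

MAX_CHARS = 5000  # documented per-request limit, spaces included


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text at whitespace into chunks of at most `limit` characters.

    Hypothetical helper. Note that textwrap normalizes newlines to spaces
    and drops the whitespace at chunk boundaries.
    """
    return textwrap.wrap(text, width=limit)
```

Each returned chunk can then be synthesized with its own request. Splitting at sentence boundaries would likely sound more natural, but is left out to keep the sketch short.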
The desired language of the synthesized speech. Two letter ISO 639-1 code. One of de, en, es, fr, pt, zh, ko, hi. Does not work with professional clones and the blizzard model.
The model to use for synthesis. One of aurora (default) or blizzard. Learn more about models here.
The file format of the synthesized audio output, one of aac, mp3, mulaw, raw, wav.
Set this to true to generate conversational-style speech rather than reading-style speech. Does not work with the blizzard model.
The desired output sample rate in Hz, one of: 8000, 16000, 24000; defaults to 24000 for all formats except mulaw, which defaults to 8000.
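The default sample rate depends on the output format, which matters when decoding raw or mulaw audio on the client. The rule can be sketched as a small lookup. The function name `resolve_sample_rate` is hypothetical, and the sketch only mirrors the documented defaults; it does not query the API.

```python
from typing import Optional

SUPPORTED_RATES = (8000, 16000, 24000)  # documented sample rates in Hz


def resolve_sample_rate(audio_format: str,
                        sample_rate: Optional[int] = None) -> int:
    """Return the sample rate the output is documented to use.

    Hypothetical helper: applies the documented default when no rate is given.
    """
    if sample_rate is None:
        # mulaw defaults to 8000 Hz; every other format defaults to 24000 Hz.
        return 8000 if audio_format == "mulaw" else 24000
    if sample_rate not in SUPPORTED_RATES:
        raise ValueError(f"sample rate must be one of {SUPPORTED_RATES}")
    return sample_rate
```

For example, `resolve_sample_rate("mulaw")` gives 8000 and `resolve_sample_rate("wav")` gives 24000, while an explicit supported rate is passed through unchanged.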
The talking speed of the generated speech, a floating point value between 0.25 (slow) and 2.0 (fast).
Produce speech of this length in seconds; maximum 300.0 (5 minutes). Does not work with the blizzard model.
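Since all of the options above travel as query parameters, a request URL can be assembled with the standard-library `urllib.parse.urlencode`. The sketch below is illustrative only: the parameter names in the usage example (`voice`, `text`, `format`, `conversational`, `speed`) are assumptions, because this page does not show the actual names, and the endpoint URL is deliberately left out. Serializing booleans as lowercase `true`/`false` is also an assumption.

```python
from urllib.parse import urlencode


def build_query(**options) -> str:
    """Encode the given options as a URL query string.

    Hypothetical helper: parameter names are supplied by the caller and
    are not confirmed by this page. Options set to None are omitted.
    """
    params = {}
    for name, value in options.items():
        if value is None:
            continue  # leave unset options out so server defaults apply
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed boolean encoding
        params[name] = str(value)
    return urlencode(params)


# Usage with assumed parameter names:
query = build_query(voice="my-voice-id", text="Hello world",
                    format="mp3", conversational=True, speed=1.5, length=None)
```

The resulting string can be appended after a `?` to the request URL. Because `length` is `None` here, it is dropped, which keeps the request within the rule that speed and length are not sent together.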
Response
The response is of type object.