GET /v1/ai/speech

Specify either speed or length, not both. A request that sets both results in a 500 server error, because the desired speed might not match the desired length.

For more detailed output such as the duration of each spoken word, use the Speech POST request.
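The speed/length rule above can be enforced on the client before a request is sent. The following is a minimal sketch, not an official client: the function name `build_speech_url` is hypothetical, and the base URL is an assumption because this section gives only the path.

```python
from urllib.parse import urlencode

# Assumption: the host is not stated in this section; replace it if yours differs.
BASE_URL = "https://api.lmnt.com"


def build_speech_url(api_key, voice, text, speed=None, length=None, **extra):
    """Build a GET URL for /v1/ai/speech, refusing speed and length together."""
    if speed is not None and length is not None:
        # The endpoint answers such requests with a 500, so fail early instead.
        raise ValueError("specify either speed or length, not both")

    # The three required query parameters.
    params = {"X-API-Key": api_key, "voice": voice, "text": text}
    if speed is not None:
        params["speed"] = speed
    if length is not None:
        params["length"] = length
    # Any other optional parameters (format, language, seed, ...) pass through.
    params.update(extra)

    return f"{BASE_URL}/v1/ai/speech?{urlencode(params)}"
```

Calling `build_speech_url(key, voice_id, "Hello", speed=1.2)` returns a URL with the text and speed encoded as query parameters; passing both `speed` and `length` raises `ValueError` before any network traffic happens.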

Query Parameters

X-API-Key
string
required

Your API key; get it from your LMNT account page.

voice
string
required

The voice id of the voice to use for synthesis; voice ids can be retrieved by calls to List voices or Voice info.

text
string
required

The text to synthesize; max 5000 characters per request (including spaces).

format
string

The file format of the synthesized audio output, one of aac, mp3, mulaw, raw, or wav; defaults to mp3.

language
string

The desired language of the synthesized speech, given as a two-letter ISO 639-1 code. One of de, en, es, fr, pt, zh, ko, hi. Does not work with professional clones.

conversational
boolean

Set this to true to generate conversational-style speech rather than reading-style speech; defaults to false.

sample_rate
number

The desired output sample rate in Hz, one of 8000, 16000, or 24000; defaults to 24000 for all formats except mulaw, which defaults to 8000.

speed
number

The talking speed of the generated speech, a floating point value between 0.25 (slow) and 2.0 (fast); defaults to 1.0.

length
number

Produce speech of this length in seconds; maximum 300.0 (5 minutes).

seed
integer

Seed used to specify a different take; defaults to random (see here for more details).
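The limits listed for the parameters above can be checked locally before sending a request. This is a sketch under stated assumptions: `check_params` and `default_sample_rate` are hypothetical helper names, and the checks cover only the constraints written in this section, not any server-side validation.

```python
FORMATS = {"aac", "mp3", "mulaw", "raw", "wav"}
LANGUAGES = {"de", "en", "es", "fr", "pt", "zh", "ko", "hi"}
SAMPLE_RATES = {8000, 16000, 24000}


def default_sample_rate(fmt="mp3"):
    """Documented default: 8000 Hz for mulaw, 24000 Hz for every other format."""
    return 8000 if fmt == "mulaw" else 24000


def check_params(params):
    """Return a list of problems found in a dict of query parameters."""
    errors = []
    for name in ("X-API-Key", "voice", "text"):
        if not params.get(name):
            errors.append(f"{name} is required")
    if len(params.get("text") or "") > 5000:
        errors.append("text exceeds 5000 characters")
    if "format" in params and params["format"] not in FORMATS:
        errors.append("format must be one of aac, mp3, mulaw, raw, wav")
    if "language" in params and params["language"] not in LANGUAGES:
        errors.append("language is not one of the listed codes")
    if "sample_rate" in params and params["sample_rate"] not in SAMPLE_RATES:
        errors.append("sample_rate must be 8000, 16000, or 24000")
    if "speed" in params and not 0.25 <= params["speed"] <= 2.0:
        errors.append("speed must be between 0.25 and 2.0")
    if "length" in params and params["length"] > 300.0:
        errors.append("length must not exceed 300.0 seconds")
    if "speed" in params and "length" in params:
        errors.append("specify either speed or length, not both")
    return errors
```

An empty list means the parameters satisfy every limit documented here; the server may still reject a request for other reasons, such as an unknown voice id.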

Response

200 - */*

The response is of type object.
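As an illustration of consuming the response, the sketch below downloads the body and writes it to a file. It assumes the 200 response body is the synthesized audio in the requested format, which this section does not state explicitly; `save_speech` is a hypothetical helper name, and the `opener` argument exists only so the function can be exercised without network access.

```python
from urllib.request import urlopen


def save_speech(url, path, opener=urlopen):
    """Fetch `url` and write the response body to `path`; return the byte count."""
    # With the default opener, urlopen raises urllib.error.HTTPError on
    # non-2xx statuses, such as the 500 returned when both speed and
    # length are set.
    with opener(url) as response:
        audio = response.read()
    with open(path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Given a full request URL for this endpoint, `save_speech(url, "speech.mp3")` stores the result locally; choose a file extension that matches the `format` parameter you requested.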