This is the reference for the v1 LMNT Python SDK. The v2 SDK has a different API and is not compatible with this reference.
Speech
The Speech class is your primary touch-point. Instantiate a Speech object with your API key. When you are done with a Speech instance, make sure to clean up by calling its close() method. Alternatively, use it as an async context manager, which will call close() for you.
Rather than passing the api_key argument directly, we recommend using python-dotenv to add LMNT_API_KEY="My API Key" to your .env file so that your API key is not stored in source control.
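A minimal setup sketch, assuming the v1 package exposes Speech from lmnt.api (check your installed version's import path) and that LMNT_API_KEY is set in a .env file:

```python
import asyncio

from dotenv import load_dotenv  # python-dotenv
from lmnt.api import Speech     # assumed v1 import path

async def main():
    load_dotenv()  # loads LMNT_API_KEY from .env into the environment
    # As an async context manager, Speech calls close() for you on exit.
    async with Speech() as speech:
        voices = await speech.list_voices()
        print(f'{len(voices)} voices available')

asyncio.run(main())
```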
list_voices
async list_voices(starred=False, owner='all')
Returns the voices available for use in speech synthesis calls.
Parameters
- starred: if True, only return starred voices.
- owner: specify which voices to return. Choose from system, me, or all.
Return value
A list of voice metadata objects.
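For example, to list only the voices you own and have starred (a sketch; the 'id' and 'name' keys are assumed from the voice metadata object):

```python
import asyncio
from lmnt.api import Speech  # assumed v1 import path

async def show_starred_voices():
    async with Speech() as speech:
        # Only return voices you own that you have starred
        voices = await speech.list_voices(starred=True, owner='me')
        for v in voices:
            print(v['id'], v['name'])  # keys assumed from the metadata object

asyncio.run(show_starred_voices())
```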
voice_info
async voice_info(voice_id)
Returns details of a specific voice.
Parameters
- voice_id: the id of the voice to retrieve. If you don't know the id, you can get it from list_voices().
Return value
The voice metadata object (a dictionary containing details of the voice).
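A usage sketch, assuming a voice id previously obtained from list_voices():

```python
import asyncio
from lmnt.api import Speech  # assumed v1 import path

async def inspect_voice(voice_id):
    async with Speech() as speech:
        info = await speech.voice_info(voice_id)
        print(info)  # the voice metadata dictionary

asyncio.run(inspect_voice('your-voice-id'))  # placeholder id
```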
create_voice
async create_voice(name, enhance, filenames, type='instant', gender=None, description=None)
Creates a new voice from a set of audio files. Returns the voice metadata object.
Parameters
- name: the name of the voice.
- enhance: for unclean audio with background noise, applies processing to attempt to improve quality. Not on by default, as it can also degrade quality in some circumstances.
- filenames: a list of filenames to use for the voice.
- type: the type of voice to create. Must be one of instant or professional.
- gender: the gender of the voice, e.g. male, female, or nonbinary. For categorization purposes.
- description: a description of the voice.
Return value
The voice metadata object.
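A sketch of creating an instant clone from two local recordings (the filenames are placeholders):

```python
import asyncio
from lmnt.api import Speech  # assumed v1 import path

async def clone_voice():
    async with Speech() as speech:
        voice = await speech.create_voice(
            name='my-voice',
            enhance=False,  # leave off unless the source audio is noisy
            filenames=['sample1.wav', 'sample2.wav'],  # placeholder paths
            type='instant',
            gender='female',
            description='Test voice cloned from two samples',
        )
        print(voice)  # the new voice metadata object

asyncio.run(clone_voice())
```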
update_voice
async update_voice(voice_id, **kwargs)
Updates metadata for a specific voice. A voice that is not owned by you can only have its starred field updated. Only provided fields will be changed.
Parameters
- voice_id: the id of the voice to update. If you don't know the id, you can get it from list_voices().
- name: the name of the voice.
- starred: whether the voice is starred by you.
- gender: the gender of the voice, e.g. male, female, or nonbinary. For categorization purposes.
- description: a description of the voice.
Return value
The updated voice metadata object.
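A sketch that stars and renames a voice; the keyword names (starred, name) follow the fields listed above:

```python
import asyncio
from lmnt.api import Speech  # assumed v1 import path

async def star_and_rename(voice_id):
    async with Speech() as speech:
        # Only the provided fields are changed
        updated = await speech.update_voice(voice_id, starred=True, name='renamed-voice')
        print(updated)  # the updated voice metadata object

asyncio.run(star_and_rename('your-voice-id'))  # placeholder id
```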
delete_voice
async delete_voice(voice_id)
Deletes a voice and cancels any pending operations on it. The voice must be owned by you. Cannot be undone.
Parameters
- voice_id: the id of the voice to delete. If you don't know the id, you can get it from list_voices().
Return value
A success or error message.
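A usage sketch (the id is a placeholder; remember this cannot be undone):

```python
import asyncio
from lmnt.api import Speech  # assumed v1 import path

async def remove_voice(voice_id):
    async with Speech() as speech:
        result = await speech.delete_voice(voice_id)  # cannot be undone
        print(result)  # success or error message

asyncio.run(remove_voice('your-voice-id'))  # placeholder id
```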
close
async close()
Releases resources associated with this instance.
synthesize
async synthesize(text, voice, **kwargs)
Synthesizes speech for a supplied text string.
Parameters
- text: the text to synthesize.
- voice: which voice to render; the id can be found using the list_voices call.
- model: the model to use for synthesis. One of aurora (default) or blizzard.
- format: the desired output audio format. One of aac, mp3, or wav; defaults to mp3 (24kHz 16-bit mono).
- language: the desired language of the synthesized speech. Two-letter ISO 639-1 code. One of de, en, es, fr, pt, zh, ko, hi. Does not work with professional clones or the blizzard model.
- return_durations: whether to include word durations detail in the response.
- return_seed: whether to include the seed used for synthesis in the response.
- sample_rate: the desired output sample rate in Hz. One of 8000, 16000, 24000; defaults to 24000 for all formats except mulaw, which defaults to 8000.
- seed: the seed used to specify a different take; defaults to a random value.
Return value
The synthesized audio, encoded in the requested format, as a bytes object.
A list of text duration objects; only returned if return_durations is True.
The seed used for synthesis; only returned if return_seed is True.
Notes
- The mp3 bitrate is 96kbps.
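A sketch that synthesizes a sentence and writes the MP3 to disk; it assumes the response is a dictionary with an 'audio' key holding the bytes, plus 'durations' when requested:

```python
import asyncio
from lmnt.api import Speech  # assumed v1 import path

async def save_speech():
    async with Speech() as speech:
        result = await speech.synthesize(
            'Hello from LMNT!',
            'your-voice-id',      # placeholder voice id
            format='mp3',
            return_durations=True,
        )
        # Response keys ('audio', 'durations') are assumptions
        with open('output.mp3', 'wb') as f:
            f.write(result['audio'])
        print(result['durations'])

asyncio.run(save_speech())
```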
synthesize_streaming
async synthesize_streaming(voice, return_extras=False, **kwargs)
Creates a new, full-duplex streaming session. You can use the returned session object to concurrently stream text content to the server
and receive speech data from the server.
Parameters
- voice: which voice to render; the id can be found using the list_voices call.
- format: the desired output audio format. One of:
  - mp3: 96kbps MP3 audio. This format is useful for applications that need to play the audio directly to the user.
  - raw: 16-bit little-endian linear PCM audio. This format is useful for applications that need to process the audio further, such as adding effects or mixing multiple audio streams.
  - ulaw: 8-bit G711 µ-law audio with a WAV header. This format is most useful for telephony applications.
- language: the desired language of the synthesized speech. Two-letter ISO 639-1 code. One of de, en, es, fr, pt, zh, ko, hi.
- sample_rate: the desired output audio sample rate. One of:
  - 24000: 24kHz audio, useful for applications that need high-quality audio.
  - 16000: 16kHz audio, useful for applications that need to save bandwidth.
  - 8000: 8kHz audio, most useful for telephony applications and µ-law encoding.
- return_extras: whether to return extra data (durations data and warnings) with each audio chunk.
Return value
A StreamingSynthesisConnection instance, which you can use to stream data.
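A full-duplex sketch: one task appends text while another reads audio chunks. The connection-object methods shown (append_text, finish, and async iteration yielding dictionaries with an 'audio' key) are assumptions; check the StreamingSynthesisConnection reference for the exact surface:

```python
import asyncio
from lmnt.api import Speech  # assumed v1 import path

async def stream_demo():
    async with Speech() as speech:
        connection = await speech.synthesize_streaming(
            'your-voice-id',  # placeholder voice id
            format='raw',
            sample_rate=16000,
        )

        async def writer():
            # Stream text to the server as it becomes available
            for text in ['Hello, ', 'streaming ', 'world.']:
                await connection.append_text(text)  # method name assumed
            await connection.finish()               # method name assumed

        async def reader():
            # Concurrently receive audio chunks from the server
            audio = b''
            async for message in connection:
                audio += message['audio']  # key assumed
            print(f'received {len(audio)} bytes of PCM audio')

        await asyncio.gather(writer(), reader())

asyncio.run(stream_demo())
```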