Speech
Reference for the Speech class in the Python SDK
The `Speech` class is your primary touch-point. Instantiate a `Speech` object with your API key.
When you're done with the `Speech` instance, make sure to clean up by calling its `close()` method. Alternatively, you can use this class as an async context manager, which will call `close()` for you:
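A minimal sketch, assuming the published package's `lmnt.api` import path and that the constructor falls back to the `LMNT_API_KEY` environment variable when no key is passed:

```python
import asyncio
from lmnt.api import Speech  # import path assumed from the published `lmnt` package

async def main():
    # The async context manager calls close() for you when the block exits.
    async with Speech() as speech:  # falls back to the LMNT_API_KEY environment variable
        voices = await speech.list_voices()
        print(f'{len(voices)} voices available')

asyncio.run(main())
```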
While you can provide an `api_key` argument, we recommend using python-dotenv to add `LMNT_API_KEY="My API Key"` to your `.env` file so that your API key is not stored in source control.
list_voices
async list_voices(starred=False, owner='all')
Returns the voices available for use in speech synthesis calls.
Parameters
- `starred`: If `True`, only return starred voices. Defaults to `False`.
- `owner`: Specify which voices to return. One of `system`, `me`, or `all`. Defaults to `all`.
Return value
A list of voice metadata objects.
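For example, inside the `async with Speech() as speech:` block shown above (the `name` and `id` keys are assumptions about the metadata object's shape):

```python
# List only the voices you own
voices = await speech.list_voices(owner='me')
for voice in voices:
    print(voice['name'], voice['id'])  # key names assumed from the voice metadata object
```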
voice_info
async voice_info(voice_id)
Returns details of a specific voice.
Parameters
- `voice_id`: The id of the voice to query. If you don't know the id, you can get it from `list_voices()`.
Return value
The voice metadata object: a dictionary containing details of the voice.
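For example, inside the same async context (the voice id is illustrative):

```python
voice = await speech.voice_info('lily')  # 'lily' is a hypothetical voice id
print(voice)
```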
create_voice
async create_voice(name, enhance, filenames, type='instant', gender=None, description=None)
Creates a new voice from a set of audio files. Returns the voice metadata object.
Parameters
- `name`: The name of the voice.
- `enhance`: For unclean audio with background noise, applies processing to attempt to improve quality. Not on by default, as it can also degrade quality in some circumstances.
- `filenames`: A list of filenames to use for the voice.
- `type`: The type of voice to create. Must be one of `instant` or `professional`. Defaults to `instant`.
- `gender`: The gender of the voice, e.g. `male`, `female`, `nonbinary`. For categorization purposes.
- `description`: A description of the voice.
Return value
The voice metadata object.
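A sketch, assuming two local recordings exist at the illustrative paths below; run inside the async context shown earlier:

```python
# Create an instant voice from two recordings; enhance is off for clean audio
voice = await speech.create_voice(
    'my-new-voice',
    False,  # enhance
    ['sample1.mp3', 'sample2.mp3'],  # illustrative filenames
    gender='female',
    description='An instant voice cloned from two samples.',
)
print(voice)
```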
update_voice
async update_voice(voice_id, **kwargs)
Updates metadata for a specific voice. A voice that is not owned by you can only have its `starred` field updated. Only provided fields will be changed.
Parameters
- `voice_id`: The id of the voice to update. If you don't know the id, you can get it from `list_voices()`.
- `name`: The name of the voice.
- `starred`: Whether the voice is starred by you.
- `gender`: The gender of the voice, e.g. `male`, `female`, `nonbinary`. For categorization purposes.
- `description`: A description of the voice.
Return value
The updated voice metadata object.
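For example, starring a voice you don't own (only its `starred` field may be updated; the voice id is illustrative):

```python
updated = await speech.update_voice('lily', starred=True)
print(updated)
```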
delete_voice
async delete_voice(voice_id)
Deletes a voice and cancels any pending operations on it. The voice must be owned by you. Cannot be undone.
Parameters
- `voice_id`: The id of the voice to delete. If you don't know the id, you can get it from `list_voices()`.
Return value
A success or error message.
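For example (the id is hypothetical and must refer to a voice you own; deletion cannot be undone):

```python
result = await speech.delete_voice('my-new-voice')
print(result)
```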
close
async close()
Releases resources associated with this instance.
synthesize
async synthesize(text, voice, **kwargs)
Synthesizes speech for a supplied text string.
Parameters
- `text`: The text to synthesize.
- `voice`: Which voice to render; the id can be found using the `list_voices` call.
- `format`: One of `aac`, `mp3`, `wav`; defaults to `mp3` (24kHz 16-bit mono).
- `language`: The desired language of the synthesized speech. Two-letter ISO 639-1 code. One of `de`, `en`, `es`, `fr`, `pt`, `zh`, `ko`, `hi`.
- `length`: Produce speech of this length in seconds; maximum 300.0 (5 minutes).
- `return_durations`: Whether to include word duration detail in the response.
- `return_seed`: Whether to include the seed used for synthesis in the response.
- `sample_rate`: The desired output sample rate in Hz, one of `8000`, `16000`, `24000`; defaults to `24000` for all formats except `mulaw`, which defaults to `8000`.
- `speed`: Floating point value between 0.25 (slow) and 2.0 (fast).
- `seed`: The seed used to specify a different take; defaults to a random value.
Return value
- The synthesized audio encoded in the requested format as a bytes object.
- A list of text duration objects. Only returned if `return_durations` is `True`.
- The seed used for synthesis. Only returned if `return_seed` is `True`.
Notes
- The `mp3` bitrate is 96kbps.
- The `length` parameter specifies how long you want the output speech to be. We will automatically speed up / slow down the speech as needed to fit this length.
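A minimal end-to-end sketch (the voice id and the `audio`/`durations` keys of the returned dictionary are assumptions; check your SDK version's return shape):

```python
import asyncio
from lmnt.api import Speech

async def main():
    async with Speech() as speech:
        result = await speech.synthesize(
            'Hello, world.',
            'lily',              # illustrative voice id
            format='mp3',
            return_durations=True,
        )
        with open('hello.mp3', 'wb') as f:
            f.write(result['audio'])  # audio bytes; key name assumed
        print(result['durations'])    # word timing detail

asyncio.run(main())
```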
synthesize_streaming
async synthesize_streaming(voice, return_extras=False, **kwargs)
Creates a new, full-duplex streaming session. You can use the returned session object to concurrently stream text content to the server and receive speech data from the server.
Parameters
- `voice`: Which voice to render; the id can be found using the `list_voices` call.
- `speed`: The speed of the speech. Floating point value between 0.25 (slow) and 2.0 (fast).
- `format`: The desired output audio format. One of:
  - `mp3`: 96kbps MP3 audio. This format is useful for applications that need to play the audio directly to the user.
  - `raw`: 16-bit little-endian linear PCM audio. This format is useful for applications that need to process the audio further, such as adding effects or mixing multiple audio streams.
  - `ulaw`: 8-bit G711 µ-law audio with a WAV header. This format is most useful for telephony applications.
- `language`: The desired language of the synthesized speech. Two-letter ISO 639-1 code. One of `de`, `en`, `es`, `fr`, `pt`, `zh`, `ko`, `hi`.
- `sample_rate`: The desired output audio sample rate. One of:
  - `24000`: 24kHz audio. This sample rate is useful for applications that need high-quality audio.
  - `16000`: 16kHz audio. This sample rate is useful for applications that need to save bandwidth.
  - `8000`: 8kHz audio. This sample rate is most useful for telephony applications and µ-law encoding.
- `return_extras`: Whether to return extra data (durations data and warnings) with each audio chunk.
Return value
A `StreamingSynthesisConnection` instance, which you can use to stream data.
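A full-duplex sketch; the `append_text()`, `finish()`, and async-iteration interface of the connection object follows the SDK's published examples and should be treated as an assumption:

```python
import asyncio
from lmnt.api import Speech

async def main():
    async with Speech() as speech:
        conn = await speech.synthesize_streaming('lily')  # illustrative voice id

        async def writer():
            # Stream text to the server as it becomes available.
            for chunk in ('Hello, ', 'streaming ', 'world.'):
                await conn.append_text(chunk)
            await conn.finish()  # signal that no more text is coming

        async def reader():
            # Concurrently receive speech data from the server.
            with open('stream.mp3', 'wb') as f:
                async for message in conn:
                    f.write(message['audio'])  # key name assumed

        await asyncio.gather(writer(), reader())

asyncio.run(main())
```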