Speech
Reference for the Speech class in the Python SDK
The Speech class is your primary touch-point. Instantiate a Speech object with your API key:

from lmnt.api import Speech
speech = Speech('LMNT_API_KEY')
When you’re done with the Speech instance, make sure to clean up by calling its close() method.

await speech.close()
Alternatively, you can use this class as an async context manager, which will call close() for you:

async with Speech('LMNT_API_KEY') as speech:
    pass
While you can provide an api_key argument, we recommend using python-dotenv to add LMNT_API_KEY="My API Key" to your .env file so that your API key is not stored in source control.
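Putting these pieces together, here is a minimal sketch that loads the key with python-dotenv and uses the async context manager. It assumes both the lmnt and python-dotenv packages are installed:

```python
import asyncio
import os

async def main():
    # Assumes `pip install lmnt python-dotenv` and a .env file
    # containing LMNT_API_KEY="My API Key".
    from dotenv import load_dotenv
    from lmnt.api import Speech

    load_dotenv()  # copies LMNT_API_KEY from .env into os.environ
    async with Speech(os.environ['LMNT_API_KEY']) as speech:
        voices = await speech.list_voices()
        print(f'{len(voices)} voices available')

# To run: asyncio.run(main())
```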
list_voices
async list_voices(starred=False, owner='all')
Returns the voices available for use in speech synthesis calls.
voices = await speech.list_voices()
Parameters
starred: If true, only return starred voices. Defaults to false.
owner: Specify which voices to return. Choose from system, me, or all. Defaults to all.
Return value
A list of voice metadata objects. Here’s a sample object:
[
{
"name": "Curtis",
"id": "curtis",
"state": "ready",
"owner": "system",
"starred": false,
"gender": "male",
"description": "Curtis' voice carries the seasoned timbre of middle age, filled with warmth, curiosity, and a hint of wisdom gained over the years."
}
]
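Since each entry is a plain dictionary, you can filter the returned list with ordinary Python. A small sketch using the sample shape above — the data here is hard-coded, not fetched, and the second voice (including its state value) is hypothetical:

```python
# Filter a list of voice metadata dictionaries down to usable voice ids.
voices = [
    {"name": "Curtis", "id": "curtis", "state": "ready",
     "owner": "system", "starred": False, "gender": "male"},
    # Hypothetical entry: a not-yet-ready voice owned by you.
    {"name": "Draft", "id": "draft-1", "state": "training",
     "owner": "me", "starred": True, "gender": None},
]

ready_ids = [v["id"] for v in voices if v["state"] == "ready"]
starred = [v["name"] for v in voices if v["starred"]]
print(ready_ids, starred)  # ['curtis'] ['Draft']
```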
voice_info
async voice_info(voice_id)
Returns details of a specific voice.
voice = await speech.voice_info('voice_id')
Parameters
voice_id: The id of the voice to retrieve. If you don’t know the id, you can get it from list_voices().
Return value
The voice metadata object (a dictionary containing details of the voice). Here’s a sample object:
{
"name": "Curtis",
"id": "curtis",
"state": "ready",
"owner": "system",
"starred": false,
"gender": "male",
"description": "Curtis' voice carries the seasoned timbre of middle age, filled with warmth, curiosity, and a hint of wisdom gained over the years."
}
create_voice
async create_voice(name, enhance, filenames, type='instant', gender=None, description=None)
Creates a new voice from a set of audio files. Returns the voice metadata object.
voice = await speech.create_voice('new-voice', True, ['file1.mp3', 'file2.mp3'])
Parameters
name: The name of the voice.
enhance: For unclean audio with background noise, applies processing to attempt to improve quality. Not on by default as it can also degrade quality in some circumstances.
filenames: A list of filenames to use for the voice.
type: The type of voice to create. Must be one of instant or professional. Defaults to instant.
gender: The gender of the voice, e.g. male, female, nonbinary. For categorization purposes.
description: A description of the voice.
Return value
The voice metadata object. Here’s a sample object:
{
"id": "123444566422",
"name": "new-voice",
"owner": "me",
"state": "ready",
"starred": false,
"description": "Totam necessitatibus saepe repudiandae perferendis. Tempora iure provident. Consequatur debitis assumenda. Earum debitis cum.",
"type": "instant",
"gender": "male"
}
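A newly created voice may not report state "ready" immediately (an assumption — this page doesn't document the intermediate states). One way to wait, sketched with the voice_info() call described above; the helper name wait_until_ready is ours, not part of the SDK:

```python
import asyncio

async def wait_until_ready(speech, voice_id, poll_seconds=5.0, timeout=600.0):
    """Poll voice_info() until the voice reports state 'ready'.

    Assumes a not-yet-ready voice reports some other state value;
    the exact intermediate states are not documented on this page.
    """
    deadline = asyncio.get_event_loop().time() + timeout
    while asyncio.get_event_loop().time() < deadline:
        voice = await speech.voice_info(voice_id)
        if voice["state"] == "ready":
            return voice
        await asyncio.sleep(poll_seconds)
    raise TimeoutError(f"voice {voice_id} not ready after {timeout}s")
```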
update_voice
async update_voice(voice_id, **kwargs)
Updates metadata for a specific voice. A voice that is not owned by you can only have its starred field updated. Only provided fields will be changed.
updated_voice = await speech.update_voice('voice_id', name='new-voice', starred=True)
Parameters
voice_id: The id of the voice to update. If you don’t know the id, you can get it from list_voices().
name: The name of the voice.
starred: Whether the voice is starred by you.
gender: The gender of the voice, e.g. male, female, nonbinary. For categorization purposes.
description: A description of the voice.
Return value
The updated voice metadata object.
delete_voice
async delete_voice(voice_id)
Deletes a voice and cancels any pending operations on it. The voice must be owned by you. Cannot be undone.
await speech.delete_voice('voice_id')
Parameters
voice_id: The id of the voice to delete. If you don’t know the id, you can get it from list_voices().
Return value
A success or error message. Here’s a sample object:
{
"success": "true"
}
close
async close()
Releases resources associated with this instance.
await speech.close()
synthesize
async synthesize(text, voice, **kwargs)
Synthesizes speech for a supplied text string.
synth = await speech.synthesize('Hello world!', 'voice_id')
Parameters
text: The text to synthesize.
voice: Which voice to render; id is found using the list_voices call.
format: The file format of the synthesized audio; one of aac, mp3, wav. Defaults to mp3 (24kHz 16-bit mono).
language: The desired language of the synthesized speech. Two letter ISO 639-1 code. One of de, en, es, fr, pt, zh.
length: Produce speech of this length in seconds; maximum 300.0 (5 minutes).
return_durations: Whether to include word durations detail in the response.
return_seed: Whether to include the seed used for synthesis in the response.
sample_rate: The desired output sample rate in Hz, one of: 8000, 16000, 24000; defaults to 24000 for all formats except mulaw, which defaults to 8000.
speed: The speed of the speech. Floating point value between 0.25 (slow) and 2.0 (fast).
seed: The seed used to specify a different take; defaults to a random value.
Return value
audio: The synthesized audio encoded in the requested format as a bytes object.
durations: A list of text duration objects. Only returned if return_durations is True.
seed: The seed used for synthesis. Only returned if return_seed is True.
Here is the schema for the return value:
{
"audio": binary-audio-file,
"durations": [
{
"text": "string",
"start": 0,
"duration": 0
}
...
],
"seed": "int"
}
Notes
- The mp3 bitrate is 96kbps.
- The length parameter specifies how long you want the output speech to be. We will automatically speed up / slow down the speech as needed to fit this length.
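A common pattern is to write the audio bytes straight to disk and keep the durations for captioning. A sketch built on the return shape above; the helper name synthesize_to_file is ours, not part of the SDK:

```python
import asyncio

async def synthesize_to_file(speech, text, voice_id, path='output.mp3'):
    # synthesize() resolves to a dict with an 'audio' bytes object,
    # plus 'durations' when return_durations=True is passed.
    synth = await speech.synthesize(text, voice_id, return_durations=True)
    with open(path, 'wb') as f:
        f.write(synth['audio'])
    return synth.get('durations', [])

# Usage (inside an event loop):
#   durations = await synthesize_to_file(speech, 'Hello world!', 'curtis')
```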
synthesize_streaming
async synthesize_streaming(voice, return_extras=False, **kwargs)
Creates a new, full-duplex streaming session. You can use the returned session object to concurrently stream text content to the server and receive speech data from the server.
connection = await speech.synthesize_streaming('voice_id')
Parameters
voice: Which voice to render; id can be found using the list_voices call.
speed: The speed of the speech. Floating point value between 0.25 (slow) and 2.0 (fast).
format: The desired output audio format. One of:
- mp3: 96kbps MP3 audio. This format is useful for applications that need to play the audio directly to the user.
- raw: 16-bit little-endian linear PCM audio. This format is useful for applications that need to process the audio further, such as adding effects or mixing multiple audio streams.
- ulaw: 8-bit G711 µ-law audio with a WAV header. This format is most useful for telephony applications.
language: The desired language of the synthesized speech. Two letter ISO 639-1 code. One of de, en, es, fr, pt, zh.
sample_rate: The desired output audio sample rate. One of:
- 24000: 24kHz audio. This sample rate is useful for applications that need high-quality audio.
- 16000: 16kHz audio. This sample rate is useful for applications that need to save bandwidth.
- 8000: 8kHz audio. This sample rate is most useful for telephony applications and µ-law encoding.
return_extras: Whether to return extra data (durations data and warnings) with each audio chunk.
Return value
A StreamingSynthesisConnection instance, which you can use to stream data.
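A full-duplex session typically runs a writer task and a reader task concurrently. The sketch below assumes the connection exposes append_text() and finish() methods and yields message dictionaries with an 'audio' key when iterated — check the StreamingSynthesisConnection reference for the exact interface:

```python
import asyncio

async def stream_speech(speech, voice_id, chunks):
    connection = await speech.synthesize_streaming(voice_id)

    async def writer():
        # Push text incrementally, then signal that no more text is coming.
        for chunk in chunks:
            await connection.append_text(chunk)
        await connection.finish()

    async def reader():
        # Collect audio as it arrives from the server.
        audio = bytearray()
        async for message in connection:
            audio.extend(message['audio'])
        return bytes(audio)

    _, audio = await asyncio.gather(writer(), reader())
    return audio
```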