The Speech class is your primary touch-point.

Instantiate a Speech object with your API key:

from lmnt.api import Speech

speech = Speech('LMNT_API_KEY')

When you’re done with the Speech instance, make sure to clean up by calling its close() method.

await speech.close()

Alternatively, you can use this class as an async context manager, which will call close() for you:

async with Speech('LMNT_API_KEY') as speech:
  pass

While you can provide an api_key argument, we recommend using python-dotenv to add LMNT_API_KEY="My API Key" to your .env file so that your API key is not stored in source control.
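As a minimal sketch of that recommendation, the helper below reads the key from the environment. The name load_api_key is hypothetical and not part of the SDK. It assumes python-dotenv may or may not be installed, and falls back to variables already present in the process environment.

```python
import os


def load_api_key(env_var="LMNT_API_KEY"):
    """Return the API key from the environment (hypothetical helper)."""
    try:
        # python-dotenv loads entries from a local .env file into os.environ
        # without overriding variables that are already set.
        from dotenv import load_dotenv
        load_dotenv()
    except ImportError:
        # python-dotenv is optional in this sketch; use the existing environment.
        pass

    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set; add it to your .env file")
    return key
```

You could then pass the result to the constructor, for example Speech(load_api_key()), instead of embedding the key in source.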


list_voices

async list_voices(starred=False, owner='all')

Returns the voices available for use in speech synthesis calls.

voices = await speech.list_voices()

Parameters

starred
bool
default:
"false"

If true, only return starred voices.

owner
str
default:
"all"

Specify which voices to return. Choose from system, me, or all.

Return value

A list of voice metadata objects. Here’s a sample object:

[
  {
    "name": "Morgan",
    "id": "morgan",
    "state": "ready",
    "owner": "system",
    "starred": false,
    "gender": "F",
    "description": "UK. Young adult. Conversational"
  }
]
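Since the return value is a plain list of metadata objects, it can be filtered with ordinary Python. The sketch below assumes each entry is a dictionary shaped like the sample above; ready_voice_ids is a hypothetical helper, and state values other than "ready" are not documented here.

```python
def ready_voice_ids(voices, owner=None):
    """Return ids of voices in the 'ready' state, optionally for one owner."""
    return [
        v["id"]
        for v in voices
        if v.get("state") == "ready" and (owner is None or v.get("owner") == owner)
    ]


# The sample object from above, written as Python literals (false -> False).
sample = [
    {"name": "Morgan", "id": "morgan", "state": "ready", "owner": "system",
     "starred": False, "gender": "F",
     "description": "UK. Young adult. Conversational"},
]
print(ready_voice_ids(sample))              # ['morgan']
print(ready_voice_ids(sample, owner="me"))  # []
```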

voice_info

async voice_info(voice_id)

Returns details of a specific voice.

voice = await speech.voice_info('voice_id')

Parameters

voice_id
str
required

The id of the voice to look up. If you don’t know the id, you can get it from list_voices().

Return value

The voice metadata object, a dictionary containing details of the voice. Here’s a sample object:

  {
    "name": "Morgan",
    "id": "morgan",
    "state": "ready",
    "owner": "system",
    "starred": false,
    "gender": "F",
    "description": "UK. Young adult. Conversational"
  }
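One possible use of voice_info is to confirm that a voice is usable before synthesizing with it. This is a sketch only: require_ready_voice is a hypothetical helper, speech is assumed to be a Speech instance, and treating any state other than "ready" as unusable is an assumption drawn from the sample object.

```python
async def require_ready_voice(speech, voice_id):
    """Fetch a voice's metadata and fail early if it is not ready."""
    voice = await speech.voice_info(voice_id)
    state = voice.get("state")
    if state != "ready":
        # Assumption: only "ready" voices can be used for synthesis.
        raise RuntimeError(f"voice {voice_id!r} is in state {state!r}")
    return voice
```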

create_voice

async create_voice(name, enhance, filenames, type='instant', gender=None, description=None)

Creates a new voice from a set of audio files. Returns the voice metadata object.

voice = await speech.create_voice('new-voice', True, ['file1.mp3', 'file2.mp3'])

Parameters

name
str
required

The name of the voice.

enhance
bool
required

For unclean audio with background noise, applies processing to attempt to improve quality. Not on by default as it can also degrade quality in some circumstances.

filenames
[str]
required

A list of filenames to use for the voice.

type
str
default:
"instant"

The type of voice to create. Must be one of instant or professional.

gender
str
default:
"None"

The gender of the voice, e.g. male, female, nonbinary. For categorization purposes.

description
str
default:
"None"

A description of the voice.

Return value

The voice metadata object. Here’s a sample object:

{
    "id": "123444566422",
    "name": "new-voice",
    "owner": "me",
    "state": "ready",
    "starred": false,
    "description": "Totam necessitatibus saepe repudiandae perferendis. Tempora iure provident. Consequatur debitis assumenda. Earum debitis cum.",
    "type": "instant",
    "gender": "male"
}
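Because create_voice takes filenames, it can help to check that the files exist before making the call. The wrapper below is a hypothetical sketch, not part of the SDK; it assumes speech is a Speech instance and passes the arguments in the positional order shown in the signature above.

```python
from pathlib import Path


async def create_voice_from_files(speech, name, filenames, enhance=False):
    """Verify the audio files exist locally, then create the voice."""
    missing = [f for f in filenames if not Path(f).is_file()]
    if missing:
        raise FileNotFoundError(f"audio files not found: {missing}")
    # Positional order follows the documented signature: name, enhance, filenames.
    return await speech.create_voice(name, enhance, list(filenames))
```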

update_voice

async update_voice(voice_id, **kwargs)

Updates metadata for a specific voice. A voice that is not owned by you can only have its starred field updated. Only provided fields will be changed.

updated_voice = await speech.update_voice('voice_id', name='new-voice', starred=True)

Parameters

voice_id
str
required

The id of the voice to update. If you don’t know the id, you can get it from list_voices().

name
str

The name of the voice.

starred
bool

Whether the voice is starred by you.

gender
str

The gender of the voice, e.g. male, female, nonbinary. For categorization purposes.

description
str

A description of the voice.

Return value

The updated voice metadata object.
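Because only provided fields are changed, one way to build the keyword arguments is to drop anything left unset. The helper changed_fields below is hypothetical and only assembles a dictionary; it assumes None means "leave this field alone".

```python
def changed_fields(name=None, starred=None, gender=None, description=None):
    """Return only the fields that were explicitly provided."""
    candidates = {
        "name": name,
        "starred": starred,
        "gender": gender,
        "description": description,
    }
    # Keep False (e.g. starred=False) but skip fields left as None.
    return {key: value for key, value in candidates.items() if value is not None}


print(changed_fields(starred=True))  # {'starred': True}
```

The result can then be unpacked into the call, for example await speech.update_voice('voice_id', **changed_fields(starred=True)).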


delete_voice

async delete_voice(voice_id)

Deletes a voice and cancels any pending operations on it. The voice must be owned by you. Cannot be undone.

await speech.delete_voice('voice_id')

Parameters

voice_id
str
required

The id of the voice to delete. If you don’t know the id, you can get it from list_voices().

Return value

A success or error message. Here’s a sample object:

{
    "success": "true"
}
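Note that the sample reports success as the string "true" rather than a boolean. A cautious way to interpret the result, assuming it is a dictionary shaped like the sample, is sketched below; deletion_succeeded is a hypothetical helper, and the shape of an error message is not documented here.

```python
def deletion_succeeded(result):
    """Return True if the delete_voice result reports success."""
    # Accept both the string form shown in the sample and a real boolean.
    return result.get("success") in ("true", True)


print(deletion_succeeded({"success": "true"}))  # True
```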

close

async close()

Releases resources associated with this instance.

await speech.close()
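If you are not using the async context manager, a try/finally block is one way to make sure close() runs even when an error occurs. In this sketch, run_then_close is a hypothetical helper, and work is any coroutine function that accepts the Speech instance.

```python
async def run_then_close(speech, work):
    """Run `work(speech)` and always release the instance afterwards."""
    try:
        return await work(speech)
    finally:
        # Runs on success and on failure, mirroring `async with`.
        await speech.close()
```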

synthesize

async synthesize(text, voice, **kwargs)

Synthesizes speech for a supplied text string.

synth = await speech.synthesize('Hello world!', 'voice_id')

Parameters

text
str
required

The text to synthesize.

voice
str
required

The voice to render. You can find the id using the list_voices call.

model
str
default:
"aurora"

The model to use for synthesis. One of aurora (default) or blizzard. Learn more about models here.

format
str
default:
"mp3"

The desired output audio format. One of aac, mp3, or wav. Defaults to mp3 (24kHz 16-bit mono).

language
str
default:
"en"

The desired language of the synthesized speech. Two letter ISO 639-1 code. One of de, en, es, fr, pt, zh, ko, hi. Does not work with professional clones or the blizzard model.

length
float

Produce speech of this length in seconds; maximum 300.0 (5 minutes). Does not work with the blizzard model.

return_durations
bool
default:
"false"

Whether to include word durations detail in the response.

return_seed
bool
default:
"false"

Whether to include the seed used for synthesis in the response.

sample_rate
int
default:
"24000"

The desired output sample rate in Hz, one of: 8000, 16000, 24000; defaults to 24000 for all formats except mulaw which defaults to 8000.

speed
float
default:
"1.0"

Floating point value between 0.25 (slow) and 2.0 (fast).

seed
int

The seed used to specify a different take. Defaults to a random value.
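The numeric limits above can be checked on the client before a request is sent. The function below is a hypothetical sketch, not part of the SDK: it covers only speed, length, and sample_rate, and it assumes the documented bounds are inclusive.

```python
def check_synthesize_options(**options):
    """Raise ValueError for options outside the documented ranges."""
    speed = options.get("speed")
    if speed is not None and not 0.25 <= speed <= 2.0:
        raise ValueError("speed must be between 0.25 and 2.0")

    length = options.get("length")
    if length is not None:
        if length > 300.0:
            raise ValueError("length must be at most 300.0 seconds")
        if options.get("model") == "blizzard":
            raise ValueError("length does not work with the blizzard model")

    sample_rate = options.get("sample_rate")
    if sample_rate is not None and sample_rate not in (8000, 16000, 24000):
        raise ValueError("sample_rate must be 8000, 16000, or 24000")

    # Return the options unchanged so they can be unpacked into a call.
    return options
```

A call might then look like await speech.synthesize('Hello world!', 'voice_id', **check_synthesize_options(speed=1.2)).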

Return value

audio
bytes

The synthesized audio encoded in the requested format as a bytes object.

durations
list of duration objects

A list of text duration objects. Only returned if return_durations is True.

seed
int

The seed used for synthesis. Only returned if return_seed is True.

Here is the schema for the return value:

{
  "audio": binary-audio-file,
  "durations": [
    {
      "text": "string",
      "start": 0,
      "duration": 0
    }
    ...
  ],
  "seed": "int"
}
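Given that schema, the audio bytes can be written straight to a file. The sketch below assumes the return value behaves like a dictionary with the keys shown above; save_synthesis is a hypothetical helper, and the units of start and duration are not stated on this page.

```python
def save_synthesis(synthesis, path):
    """Write the audio to `path` and return (text, start, end) per duration."""
    with open(path, "wb") as f:
        f.write(synthesis["audio"])

    # "durations" is only present when return_durations was requested.
    return [
        (d["text"], d["start"], d["start"] + d["duration"])
        for d in synthesis.get("durations", [])
    ]
```

For example, after synth = await speech.synthesize('Hello world!', 'voice_id'), calling save_synthesis(synth, 'hello.mp3') would write the default mp3 output to disk.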

Notes

  • The mp3 bitrate is 96kbps.
  • The length parameter specifies how long you want the output speech to be. We will automatically speed up / slow down the speech as needed to fit this length.

synthesize_streaming

async synthesize_streaming(voice, return_extras=False, **kwargs)

Creates a new, full-duplex streaming session. You can use the returned session object to concurrently stream text content to the server and receive speech data from the server.

connection = await speech.synthesize_streaming('voice_id')

Parameters

voice
str
required

Which voice to render; id can be found using the list_voices call.

speed
float
default:
"1.0"

The speed of the speech. Floating point value between 0.25 (slow) and 2.0 (fast).

format
str
default:
"mp3"

The desired output audio format. One of:

  • mp3: 96kbps MP3 audio. This format is useful for applications that need to play the audio directly to the user.
  • raw: 16-bit little-endian linear PCM audio. This format is useful for applications that need to process the audio further, such as adding effects or mixing multiple audio streams.
  • ulaw: 8-bit G711 µ-law audio with a WAV header. This format is most useful for telephony applications.

language
str
default:
"en"

The desired language of the synthesized speech. Two letter ISO 639-1 code. One of de, en, es, fr, pt, zh, ko, hi.

sample_rate
int
default:
"24000"

The desired output audio sample rate. One of:

  • 24000: 24kHz audio. This sample rate is useful for applications that need high-quality audio.
  • 16000: 16kHz audio. This sample rate is useful for applications that need to save bandwidth.
  • 8000: 8kHz audio. This sample rate is most useful for telephony applications and µ-law encoding.

return_extras
bool
default:
"false"

Whether to return extra data (durations data and warnings) with each audio chunk.

Return value

A StreamingSynthesisConnection instance, which you can use to stream data.
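The full-duplex pattern can be sketched as two concurrent tasks: one sends text while the other collects audio. This page does not list the connection's methods, so the names used here (append_text, finish, async iteration yielding messages with an "audio" key) are assumptions to verify against the StreamingSynthesisConnection reference. stream_text_to_audio is a hypothetical helper.

```python
import asyncio


async def stream_text_to_audio(connection, chunks):
    """Send text chunks and gather the returned audio concurrently."""

    async def writer():
        for chunk in chunks:
            # Assumed method: queue more text for synthesis.
            await connection.append_text(chunk)
        # Assumed method: signal that no more text will be sent.
        await connection.finish()

    async def reader():
        audio = bytearray()
        # Assumed protocol: the connection is an async iterator of messages.
        async for message in connection:
            audio.extend(message["audio"])
        return bytes(audio)

    # Run both directions at once; gather returns results in argument order.
    _, audio = await asyncio.gather(writer(), reader())
    return audio
```

Running the writer and reader together, rather than one after the other, is what lets audio start arriving before all of the text has been sent.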
