Speech
Reference for the Speech class in the Python SDK
The `Speech` class is your primary touch-point. Instantiate a `Speech` object with your API key.
When you're done with the `Speech` instance, make sure to clean up by calling its `close()` method. Alternatively, you can use this class as an async context manager, which will call `close()` for you:
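A minimal sketch, assuming the published package's `lmnt.api` import path and that the constructor falls back to the `LMNT_API_KEY` environment variable when no key is passed:

```python
import asyncio
from lmnt.api import Speech  # import path assumed from the published `lmnt` package

async def main():
    # The async context manager calls close() for you when the block exits.
    async with Speech() as speech:  # falls back to the LMNT_API_KEY environment variable
        voices = await speech.list_voices()
        print(f'{len(voices)} voices available')

asyncio.run(main())
```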
While you can provide an `api_key` argument, we recommend using python-dotenv to add `LMNT_API_KEY="My API Key"` to your `.env` file so that your API key is not stored in source control.
list_voices
async list_voices(starred=False, owner='all')
Returns the voices available for use in speech synthesis calls.
Parameters
- `starred`: If `True`, only return starred voices. Defaults to `False`.
- `owner`: Specify which voices to return. One of `system`, `me`, or `all`. Defaults to `all`.
Return value
A list of voice metadata objects.
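For example, inside the `async with Speech() as speech:` block shown above (the `name` and `id` keys are assumptions about the metadata object's shape):

```python
# List only the voices you own
voices = await speech.list_voices(owner='me')
for voice in voices:
    print(voice['name'], voice['id'])  # key names assumed from the voice metadata object
```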
voice_info
async voice_info(voice_id)
Returns details of a specific voice.
Parameters
- `voice_id`: The id of the voice to query. If you don't know the id, you can get it from `list_voices()`.
Return value
The voice metadata object: a dictionary containing details of the voice.
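For example, inside the same async context (the voice id is illustrative):

```python
voice = await speech.voice_info('lily')  # 'lily' is a hypothetical voice id
print(voice)
```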
create_voice
async create_voice(name, enhance, filenames, type='instant', gender=None, description=None)
Creates a new voice from a set of audio files. Returns the voice metadata object.
Parameters
- `name`: The name of the voice.
- `enhance`: For unclean audio with background noise, applies processing to attempt to improve quality. Not on by default, as it can also degrade quality in some circumstances.
- `filenames`: A list of filenames to use for the voice.
- `type`: The type of voice to create. Must be one of `instant` or `professional`. Defaults to `instant`.
- `gender`: The gender of the voice, e.g. `male`, `female`, `nonbinary`. For categorization purposes.
- `description`: A description of the voice.
Return value
The voice metadata object.
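A sketch, assuming two local recordings exist at the illustrative paths below; run inside the async context shown earlier:

```python
# Create an instant voice from two recordings; enhance is off for clean audio
voice = await speech.create_voice(
    'my-new-voice',
    False,  # enhance
    ['sample1.mp3', 'sample2.mp3'],  # illustrative filenames
    gender='female',
    description='An instant voice cloned from two samples.',
)
print(voice)
```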
update_voice
async update_voice(voice_id, **kwargs)
Updates metadata for a specific voice. A voice that is not owned by you can only have its `starred` field updated. Only provided fields will be changed.
Parameters
- `voice_id`: The id of the voice to update. If you don't know the id, you can get it from `list_voices()`.
- `name`: The name of the voice.
- `starred`: Whether the voice is starred by you.
- `gender`: The gender of the voice, e.g. `male`, `female`, `nonbinary`. For categorization purposes.
- `description`: A description of the voice.
Return value
The updated voice metadata object.
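For example, starring a voice you don't own (only its `starred` field may be updated; the voice id is illustrative):

```python
updated = await speech.update_voice('lily', starred=True)
print(updated)
```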
delete_voice
async delete_voice(voice_id)
Deletes a voice and cancels any pending operations on it. The voice must be owned by you. Cannot be undone.
Parameters
- `voice_id`: The id of the voice to delete. If you don't know the id, you can get it from `list_voices()`.
Return value
A success or error message.
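For example (the id is hypothetical and must refer to a voice you own; deletion cannot be undone):

```python
result = await speech.delete_voice('my-new-voice')
print(result)
```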
close
async close()
Releases resources associated with this instance.
synthesize
async synthesize(text, voice, **kwargs)
Synthesizes speech for a supplied text string.
Parameters
- `text`: The text to synthesize.
- `voice`: Which voice to render; the id can be found using the `list_voices` call.
- `format`: One of `aac`, `mp3`, `wav`; defaults to `mp3` (24kHz 16-bit mono).
- `language`: The desired language of the synthesized speech. Two-letter ISO 639-1 code. One of `de`, `en`, `es`, `fr`, `pt`, `zh`, `ko`, `hi`.
- `length`: Produce speech of this length in seconds; maximum 300.0 (5 minutes).
- `return_durations`: Whether to include word duration detail in the response.
- `return_seed`: Whether to include the seed used for synthesis in the response.
- `sample_rate`: The desired output sample rate in Hz, one of `8000`, `16000`, `24000`; defaults to `24000` for all formats except `mulaw`, which defaults to `8000`.
- `speed`: Floating point value between 0.25 (slow) and 2.0 (fast).
- `seed`: The seed used to specify a different take; defaults to a random value.
Return value
- The synthesized audio encoded in the requested format as a bytes object.
- A list of text duration objects. Only returned if `return_durations` is `True`.
- The seed used for synthesis. Only returned if `return_seed` is `True`.
Notes
- The `mp3` bitrate is 96kbps.
- The `length` parameter specifies how long you want the output speech to be. We will automatically speed up / slow down the speech as needed to fit this length.
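A minimal end-to-end sketch (the voice id and the `audio`/`durations` keys of the returned dictionary are assumptions; check your SDK version's return shape):

```python
import asyncio
from lmnt.api import Speech

async def main():
    async with Speech() as speech:
        result = await speech.synthesize(
            'Hello, world.',
            'lily',              # illustrative voice id
            format='mp3',
            return_durations=True,
        )
        with open('hello.mp3', 'wb') as f:
            f.write(result['audio'])  # audio bytes; key name assumed
        print(result['durations'])    # word timing detail

asyncio.run(main())
```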
synthesize_streaming
async synthesize_streaming(voice, return_extras=False, **kwargs)
Creates a new, full-duplex streaming session. You can use the returned session object to concurrently stream text content to the server and receive speech data from the server.
Parameters
- `voice`: Which voice to render; the id can be found using the `list_voices` call.
- `speed`: The speed of the speech. Floating point value between 0.25 (slow) and 2.0 (fast).
- `format`: The desired output audio format. One of:
  - `mp3`: 96kbps MP3 audio. This format is useful for applications that need to play the audio directly to the user.
  - `raw`: 16-bit little-endian linear PCM audio. This format is useful for applications that need to process the audio further, such as adding effects or mixing multiple audio streams.
  - `ulaw`: 8-bit G711 µ-law audio with a WAV header. This format is most useful for telephony applications.
- `language`: The desired language of the synthesized speech. Two-letter ISO 639-1 code. One of `de`, `en`, `es`, `fr`, `pt`, `zh`, `ko`, `hi`.
- `sample_rate`: The desired output audio sample rate. One of:
  - `24000`: 24kHz audio. This sample rate is useful for applications that need high-quality audio.
  - `16000`: 16kHz audio. This sample rate is useful for applications that need to save bandwidth.
  - `8000`: 8kHz audio. This sample rate is most useful for telephony applications and µ-law encoding.
- `return_extras`: Whether to return extra data (durations data and warnings) with each audio chunk.
Return value
A `StreamingSynthesisConnection` instance, which you can use to stream data.
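A full-duplex sketch; the `append_text()`, `finish()`, and async-iteration interface of the connection object follows the SDK's published examples and should be treated as an assumption:

```python
import asyncio
from lmnt.api import Speech

async def main():
    async with Speech() as speech:
        conn = await speech.synthesize_streaming('lily')  # illustrative voice id

        async def writer():
            # Stream text to the server as it becomes available.
            for chunk in ('Hello, ', 'streaming ', 'world.'):
                await conn.append_text(chunk)
            await conn.finish()  # signal that no more text is coming

        async def reader():
            # Concurrently receive speech data from the server.
            with open('stream.mp3', 'wb') as f:
                async for message in conn:
                    f.write(message['audio'])  # key name assumed

        await asyncio.gather(writer(), reader())

asyncio.run(main())
```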