LMNT offers two primary surfaces for building with our models, each suited to different use cases.
| | Speech API | Speech Sessions API |
|---|---|---|
| What it is | Turns text into speech | Streams text in from an LLM and returns speech in realtime |
| Best for | Content with pre-written text, such as voiceovers, localization, narration, advertisements, and audiobooks | Turning your favorite LLM into a realtime voice agent, and keeping your voices consistent as you upgrade LLMs |
| Learn more | Speech API docs | Speech Sessions API docs |
This guide covers common patterns for working with the Speech API: generating speech, getting word timestamps, and making the generated speech sound more human. For complete API specifications, see the Speech API reference.
Basic speech generation
```python
import asyncio

from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt()
    # Stream the generated audio straight to disk as it arrives.
    async with client.speech.with_streaming_response.generate(
        text=(
            "Uhh, did you see the weather in Palo Alto tomorrow? "
            "Yeah, can't believe it's gonna rain, dude. Like what?"
        ),
        voice='leah',
    ) as response:
        await response.stream_to_file('hello.mp3')


asyncio.run(main())
```

Speech generation with word timestamps
The Speech API can return word-level timestamps alongside the audio, which you can use to sync the generated speech with subtitles, lip movement, or other modalities.
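If you are building subtitles, word timestamps map naturally onto the SubRip (SRT) format. The following is a minimal plain-Python sketch, not part of the LMNT SDK: `WordTimestamp`, `format_srt_time`, `to_srt`, and `word_at` are hypothetical helpers. It assumes each timestamp exposes `start` (in seconds) and `text`, as in the `generate_detailed` example in this section, and that you supply the total audio length yourself; it does not assume the API returns an end time for each word.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class WordTimestamp:
    # Hypothetical stand-in for one entry of `response.timestamps`.
    # Assumes `start` is in seconds and `text` is the spoken word.
    text: str
    start: float


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(words, audio_duration: float, words_per_cue: int = 6) -> str:
    """Group words into numbered SRT cues.

    A cue starts at its first word's start time and ends where the next
    cue begins. The last cue ends at `audio_duration`, an input you provide
    (for example, the length of the decoded audio file).
    """
    cues = []
    for i in range(0, len(words), words_per_cue):
        group = words[i:i + words_per_cue]
        next_index = i + words_per_cue
        end = words[next_index].start if next_index < len(words) else audio_duration
        text = " ".join(w.text.strip() for w in group)
        cues.append(
            f"{len(cues) + 1}\n"
            f"{format_srt_time(group[0].start)} --> {format_srt_time(end)}\n"
            f"{text}\n"
        )
    # SRT separates cues with a blank line.
    return "\n".join(cues)


def word_at(words, t: float):
    """Return the most recent word started at or before time t, or None."""
    starts = [w.start for w in words]
    i = bisect_right(starts, t) - 1  # binary search over sorted start times
    return words[i] if i >= 0 else None


# Illustrative, made-up timings; with real output, pass response.timestamps.
words = [
    WordTimestamp("Uhh,", 0.0),
    WordTimestamp("did", 0.42),
    WordTimestamp("you", 0.61),
    WordTimestamp("see", 0.75),
]
print(to_srt(words, audio_duration=1.2, words_per_cue=2))
print(word_at(words, 0.5))
```

`word_at` uses a binary search over start times, which keeps per-frame lookups cheap if you highlight the current word or drive lip movement during playback. Note that it keeps returning the last word after speech ends; clamp `t` to the audio length if that matters for your use case.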
```python
import asyncio
import base64

from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt()
    # Request the audio together with per-word timestamps.
    response = await client.speech.generate_detailed(
        text=(
            "Uhh, did you see the weather in Palo Alto tomorrow? "
            "Yeah, can't believe it's gonna rain, dude. Like what?"
        ),
        voice='leah',
        format='mp3',
        return_timestamps=True,
    )

    # The audio comes back base64-encoded; decode it before writing to disk.
    with open('hello.mp3', 'wb') as f:
        f.write(base64.b64decode(response.audio))

    # Print each word's start time (in seconds) and its text.
    for t in response.timestamps or []:
        print(f'{t.start:.3f}s {t.text!r}')


asyncio.run(main())
```

Generating conversational speech
Conversational speech (talking to someone) sounds quite different from read speech (such as an audiobook), and your text prompt has a big impact on the speech the model generates.
See our text prompting guide to learn more.