LMNT offers two primary surfaces to build with our models, each suited for different use cases.
| | Speech API | Speech Sessions API |
|---|---|---|
| What it is | Turn text into speech | Stream text in from an LLM, realtime speech out |
| Best for | Content with preproduced text like voiceovers, localization, narration, audiobooks, etc. | Turning your favorite LLM into a realtime voice agent, and keeping your voices consistent as you upgrade LLMs |
| Learn more | Speech API docs | Speech Sessions API docs |
This guide covers common patterns for working with the Speech Sessions API: streaming text in from your LLM and speech out, when to flush, how to handle user interruptions, and how to get your LLM to produce conversational text. For complete API specifications, see the Speech Sessions API reference.
Basic text streaming in and speech streaming out
```python
import asyncio

from anthropic import AsyncAnthropic
from lmnt import AsyncLmnt

DEFAULT_PROMPT = 'Read me an excerpt of a short sci-fi story in the public domain.'
VOICE_ID = 'elowen'


async def main():
    client = AsyncLmnt()
    connection = await client.speech.sessions.create(voice=VOICE_ID)
    t1 = asyncio.create_task(reader_task(connection))
    t2 = asyncio.create_task(writer_task(connection))
    await asyncio.gather(t1, t2)


async def reader_task(connection):
    """Streams audio data from LMNT and writes it to `output.mp3`."""
    with open('output.mp3', 'wb') as f:
        async for message in connection:
            f.write(message.audio)


async def writer_task(connection):
    """Streams text from Claude to LMNT."""
    client = AsyncAnthropic()
    async with client.messages.stream(
        model='claude-sonnet-4-6',
        max_tokens=1024,
        messages=[{'role': 'user', 'content': DEFAULT_PROMPT}],
    ) as stream:
        async for text in stream.text_stream:
            await connection.append_text(text)
            print(text, end='', flush=True)

    # After `finish` is called, the server will close the connection
    # when it has finished generating speech.
    await connection.finish()


asyncio.run(main())
```

Flushing when your LLM has finished a turn
As you stream text in, your speech session buffers a small amount of text while it waits for enough context to generate natural-sounding speech.
Call flush the moment your LLM is done streaming text. The speech session immediately
generates speech for any remaining buffered text, and the connection stays open, ready for the next turn.
```python
async for chunk in llm_stream:
    text = extract_text(chunk)
    if text:
        await connection.append_text(text)
await connection.flush()
```

If you forget to call flush (or finish), the last bit of text the LLM
produced will sit in the buffer indefinitely.
Be careful about when you send flush: flushing at arbitrary points mid-turn,
rather than at the end of a turn, can make your speech sound less natural.
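The flush-per-turn discipline can be sketched as a loop over turns: append text as it streams, flush once at the end of each turn, and finish only when the conversation is over. This is a minimal, runnable sketch using stand-in fakes (`FakeConnection` and `fake_llm_stream` are ours, not part of the LMNT or Anthropic SDKs) so the control flow is visible without a live session; the real calls are `append_text`, `flush`, and `finish` as in the examples above.

```python
import asyncio


class FakeConnection:
    """Stand-in for an LMNT speech session that records calls."""

    def __init__(self):
        self.calls = []

    async def append_text(self, text):
        self.calls.append(('append', text))

    async def flush(self):
        self.calls.append(('flush',))

    async def finish(self):
        self.calls.append(('finish',))


async def fake_llm_stream(prompt):
    # Stand-in for an LLM text stream: yields a few text chunks.
    for chunk in prompt.split():
        yield chunk + ' '


async def speak_turns(connection, prompts):
    for prompt in prompts:
        async for text in fake_llm_stream(prompt):
            await connection.append_text(text)
        # End of this turn: flush so buffered text is spoken now,
        # while the connection stays open for the next turn.
        await connection.flush()
    # No more turns: finish, and the server closes the connection
    # once it has generated the remaining speech.
    await connection.finish()


conn = FakeConnection()
asyncio.run(speak_turns(conn, ['hello there', 'goodbye now']))
print([c[0] for c in conn.calls])
# → ['append', 'append', 'flush', 'append', 'append', 'flush', 'finish']
```

The key property: exactly one flush per turn, at the turn boundary, and finish only once, after the final turn.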
Getting your LLM to sound conversational
LLMs default to formal, structured responses that sound robotic when spoken aloud.
Prompting them with explicit guidance has the biggest impact on how natural your speech sounds: tell the model that its response will be spoken aloud, that contractions and filler words belong, and that bulleted lists and headers don't.
See our LLM prompting guide for a detailed breakdown and a copy-pasteable prompt template to use as a starting point.
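As a rough sketch of that guidance (the wording below is ours, not the template from the prompting guide), a system prompt for a voice agent might look like:

```python
# Illustrative system prompt for voice output. This is an example of the
# kind of guidance described above, not an official LMNT template.
VOICE_SYSTEM_PROMPT = """\
Your responses will be spoken aloud by a text-to-speech system.
- Write the way people talk: contractions, short sentences, and the
  occasional "well" or "you know" where it feels natural.
- No bulleted lists, headers, markdown, or emoji; they can't be spoken.
- Spell out numbers, dates, and abbreviations the way you'd say them.
"""

print(VOICE_SYSTEM_PROMPT)
```

With the Anthropic client from the first example, you'd pass this as the `system` parameter of `client.messages.stream(...)`; other LLM APIs have an equivalent system or instructions field.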