LMNT offers two primary surfaces to build with our models, each suited for different use cases.
| | Speech API | Speech Sessions API |
|---|---|---|
| What it is | Turn text into speech | Stream text in from an LLM, realtime speech out |
| Best for | Content with preproduced text like voiceovers, localization, narration, audiobooks, etc. | Turning your favorite LLM into a realtime voice agent, and keeping your voices consistent as you upgrade LLMs |
| Learn more | Speech API docs | Speech Sessions API docs |
This guide covers common patterns for working with the Speech Sessions API: streaming text in from your LLM and speech out, when to flush, how to handle user interruptions, and how to get your LLM to produce conversational text. For complete API specifications, see the Speech Sessions API reference.
Basic text streaming in and speech streaming out
```python
import asyncio

from anthropic import AsyncAnthropic
from lmnt import AsyncLmnt

DEFAULT_PROMPT = 'Read me an excerpt of a short sci-fi story in the public domain.'
VOICE_ID = 'elowen'


async def main():
    client = AsyncLmnt()
    connection = await client.speech.sessions.create(voice=VOICE_ID)
    t1 = asyncio.create_task(reader_task(connection))
    t2 = asyncio.create_task(writer_task(connection))
    await asyncio.gather(t1, t2)


async def reader_task(connection):
    """Streams audio data from LMNT and writes it to `output.mp3`."""
    with open('output.mp3', 'wb') as f:
        async for message in connection:
            f.write(message.audio)


async def writer_task(connection):
    """Streams text from Claude to LMNT."""
    client = AsyncAnthropic()
    async with client.messages.stream(
        model='claude-sonnet-4-6',
        max_tokens=1024,
        messages=[{'role': 'user', 'content': DEFAULT_PROMPT}],
    ) as stream:
        async for text in stream.text_stream:
            await connection.append_text(text)
            print(text, end='', flush=True)

    # After `finish` is called, the server will close the connection
    # when it has finished generating speech.
    await connection.finish()


asyncio.run(main())
```

Flushing when your LLM has finished a turn
As you stream text in, your speech session buffers a small amount of text while it waits for enough context to generate natural-sounding speech.
Call flush the moment your LLM is done streaming text. The speech session immediately
generates speech for any remaining buffered text, and the connection stays open, ready for the next turn.
```python
async for chunk in llm_stream:
    text = extract_text(chunk)
    if text:
        await connection.append_text(text)
await connection.flush()
```

If you forget to call flush (or finish), the last bit of text the LLM
produced will sit in the buffer indefinitely.
Be careful about when you send flush: flushing at arbitrary points mid-turn,
rather than at the end of a turn, can make your speech sound less natural.
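The flush-per-turn discipline can be sketched as a loop over turns: append text as it streams, flush once at the end of each turn, and finish only when the conversation is over. This is a minimal, runnable sketch using stand-in fakes (`FakeConnection` and `fake_llm_stream` are ours, not part of the LMNT or Anthropic SDKs) so the control flow is visible without a live session; the real calls are `append_text`, `flush`, and `finish` as in the examples above.

```python
import asyncio


class FakeConnection:
    """Stand-in for an LMNT speech session that records calls."""

    def __init__(self):
        self.calls = []

    async def append_text(self, text):
        self.calls.append(('append', text))

    async def flush(self):
        self.calls.append(('flush',))

    async def finish(self):
        self.calls.append(('finish',))


async def fake_llm_stream(prompt):
    # Stand-in for an LLM text stream: yields a few text chunks.
    for chunk in prompt.split():
        yield chunk + ' '


async def speak_turns(connection, prompts):
    for prompt in prompts:
        async for text in fake_llm_stream(prompt):
            await connection.append_text(text)
        # End of this turn: flush so buffered text is spoken now,
        # while the connection stays open for the next turn.
        await connection.flush()
    # No more turns: finish, and the server closes the connection
    # once it has generated the remaining speech.
    await connection.finish()


conn = FakeConnection()
asyncio.run(speak_turns(conn, ['hello there', 'goodbye now']))
print([c[0] for c in conn.calls])
# → ['append', 'append', 'flush', 'append', 'append', 'flush', 'finish']
```

The key property: exactly one flush per turn, at the turn boundary, and finish only once, after the final turn.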
Getting your LLM to sound conversational
LLMs default to formal, structured responses that sound robotic when spoken aloud.
Prompting them with explicit guidance has the biggest impact on how natural your speech sounds: tell the model that its response will be spoken aloud, that contractions and filler words belong, and that bulleted lists and headers don't.
See our LLM prompting guide for a detailed breakdown and a copy-pasteable prompt template to use as a starting point.
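As a rough sketch of that guidance (the wording below is ours, not the template from the prompting guide), a system prompt for a voice agent might look like:

```python
# Illustrative system prompt for voice output. This is an example of the
# kind of guidance described above, not an official LMNT template.
VOICE_SYSTEM_PROMPT = """\
Your responses will be spoken aloud by a text-to-speech system.
- Write the way people talk: contractions, short sentences, and the
  occasional "well" or "you know" where it feels natural.
- No bulleted lists, headers, markdown, or emoji; they can't be spoken.
- Spell out numbers, dates, and abbreviations the way you'd say them.
"""

print(VOICE_SYSTEM_PROMPT)
```

With the Anthropic client from the first example, you'd pass this as the `system` parameter of `client.messages.stream(...)`; other LLM APIs have an equivalent system or instructions field.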