In this quickstart, we’ll create a voice AI agent using LiveKit that can have real-time conversations with users. This example demonstrates how to integrate LMNT text-to-speech into LiveKit’s Agents framework.

Set up your project

Create a project directory

mkdir livekit-lmnt-agent && cd livekit-lmnt-agent

Set up a virtual environment

python -m venv venv
source venv/bin/activate
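
On Windows, the activation command is different:

venv\Scripts\activate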

Install dependencies

pip install "livekit-agents[lmnt,deepgram,openai,silero,turn-detector]" python-dotenv
The quotes keep shells like zsh from interpreting the square brackets.

Configure the environment

Create a file named .env in your project directory and add:
LMNT_API_KEY=your_lmnt_api_key
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your_livekit_api_key
LIVEKIT_API_SECRET=your_livekit_api_secret
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
Replace the placeholder values with your actual credentials.
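
Optionally, sanity-check that these values load before wiring up the agent. A minimal sketch using python-dotenv:
from dotenv import load_dotenv
import os

load_dotenv()

# Warn about any key that didn't load from .env
for key in ("LMNT_API_KEY", "LIVEKIT_URL", "LIVEKIT_API_KEY",
            "LIVEKIT_API_SECRET", "DEEPGRAM_API_KEY", "OPENAI_API_KEY"):
    if not os.getenv(key):
        print(f"Missing {key} in .env")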

Create the agent

Create a file named agent.py:
from dotenv import load_dotenv
from livekit import agents
from livekit.agents import AgentSession, Agent
from livekit.plugins import (
    openai,
    lmnt,
    deepgram,
    silero,
)
from livekit.plugins.turn_detector.multilingual import MultilingualModel

load_dotenv()


class VoiceAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "You are a helpful voice assistant. "
                "Keep your responses concise and conversational. "
                "Avoid using punctuation that doesn't translate well to speech."
            )
        )


async def entrypoint(ctx: agents.JobContext):
    await ctx.connect()  # Join the LiveKit room before starting the session

    session = AgentSession(
        stt=deepgram.STT(model="nova-2", language="en-US"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=lmnt.TTS(
            voice="leah",   # Voice ID from LMNT library
        ),
        vad=silero.VAD.load(),  # Voice activity detection
        turn_detection=MultilingualModel(),  # Contextual turn detection
        preemptive_generation=True,  # Preemptive generation for faster response times
    )

    await session.start(
        room=ctx.room,
        agent=VoiceAssistant(),
    )

    await session.generate_reply(
        instructions="Greet the user and ask how you can help them today."
    )


if __name__ == "__main__":
    agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint))

Run the agent

Start your agent:
python agent.py dev
The agent will connect to your LiveKit server and wait for participants to join rooms. When someone joins a room, the agent will automatically start a conversation.
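
The Silero VAD and turn detector plugins rely on locally downloaded model files. If the agent complains about missing models on the first run, recent livekit-agents releases can fetch them ahead of time:
python agent.py download-files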

Understanding the code

Let’s examine the key components:

Agent class definition

class VoiceAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "You are a helpful voice assistant. "
                "Keep your responses concise and conversational. "
                "Avoid using punctuation that doesn't translate well to speech."
            )
        )
The agent class defines the personality and behavior of your voice assistant.
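
Beyond instructions, the Agent class is also where you can expose tools for the LLM to call mid-conversation. A minimal sketch, assuming the function_tool decorator from recent livekit-agents releases (get_weather and its body are hypothetical placeholders):

from livekit.agents import Agent, RunContext, function_tool


class VoiceAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(instructions="You are a helpful voice assistant.")

    @function_tool
    async def get_weather(self, context: RunContext, city: str) -> str:
        """Look up the current weather for a city."""
        # Hypothetical placeholder: call a real weather service here
        return f"It's sunny in {city} today."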

LMNT TTS configuration

tts=lmnt.TTS(
    model="blizzard",           # High-quality TTS model
    voice="leah",               # Voice ID from LMNT library
    language="en",              # ISO 639-1 language code
    temperature=0.7,            # Speech expressiveness (0.3-1.0)
    top_p=0.9,                  # Speech generation stability
)
The LMNT TTS service supports these parameters (an example configuration follows the list):
  • model: TTS model (default: “blizzard”)
  • voice: Voice ID from LMNT’s voice library
  • language: Two-letter ISO 639-1 language code
  • temperature: Controls expressiveness; lower values (around 0.3) give neutral speech, higher values (up to 1.0) a more dynamic range
  • top_p: Controls stability; lower values favor consistency, higher values favor flexibility
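
For example, the same voice could be tuned for flat, predictable narration or for livelier conversation (the values are illustrative, not recommendations):

# Neutral, consistent delivery
tts = lmnt.TTS(voice="leah", temperature=0.3, top_p=0.5)

# Expressive, more dynamic delivery
tts = lmnt.TTS(voice="leah", temperature=1.0, top_p=0.9)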

Agent session pipeline

session = AgentSession(
    stt=deepgram.STT(model="nova-2", language="en-US"),  # Speech-to-text
    llm=openai.LLM(model="gpt-4o-mini"),                 # Language model
    tts=lmnt.TTS(...),                                   # Text-to-speech with LMNT
    vad=silero.VAD.load(),                               # Voice activity detection
    turn_detection=MultilingualModel(),                  # Contextual turn detection
    preemptive_generation=True,                          # Start replying before the user's turn fully ends
)
This creates a complete STT-LLM-TTS pipeline with:
  • Speech recognition with Deepgram’s Nova-2 model
  • Language generation with OpenAI’s GPT-4o-mini
  • Voice synthesis with LMNT
  • Voice activity detection and contextual turn detection for natural turn-taking
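
Each stage is a pluggable component, so you can swap one model without touching the rest of the pipeline. For example (the alternate LLM name here is only an example; check each provider’s docs for availability):

session = AgentSession(
    stt=deepgram.STT(model="nova-2", language="en-US"),
    llm=openai.LLM(model="gpt-4o"),  # Larger model: higher quality, higher latency
    tts=lmnt.TTS(voice="leah"),
    vad=silero.VAD.load(),
    turn_detection=MultilingualModel(),
)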

Customize your agent

Try these modifications to enhance your agent:
  • Swap voice for a different voice ID from LMNT’s voice library
  • Tune temperature and top_p to move between neutral and expressive delivery
  • Rewrite the instructions to give the assistant a different personality
  • Change the greeting passed to generate_reply
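
For instance, a storytelling variant might pair a new persona with more expressive speech (a sketch; the values are illustrative):

class StorytellerAssistant(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions=(
                "You are an enthusiastic storyteller. "
                "Tell short, vivid stories and keep each turn under a minute."
            )
        )

Then raise the TTS expressiveness to match in entrypoint():

tts=lmnt.TTS(voice="leah", temperature=1.0),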

Testing your agent

To test your agent:
  1. Make sure your LiveKit server is running
  2. Clone LiveKit’s frontend example and run it with your LiveKit credentials (or test without a frontend; see the note after this list)
  3. Join a room - your agent will automatically connect and start the conversation
  4. Speak naturally and experience real-time voice interactions
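
For quick iteration without a frontend, recent livekit-agents releases also include a console mode that runs the agent locally against your microphone and speakers:
python agent.py console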

Next steps