LMNT is an API for text-to-speech and voice cloning. Welcome!

Introduction

LMNT

Environment setup

Text-to-speech example

How to use the speech session API to stream text to the server and receive synthesized speech in real time.

Speech session example

Learn more about our flagship model: Blizzard

Models

Voice cloning

Optimizing latency

Synchronizing timing

LMNT supports multiple languages. This guide explains how to use them.

Languages

Learn how to use LMNT in your Vercel apps.

Vercel

Vercel guide

ElevenLabs -> LMNT

Migrating from ElevenLabs to LMNT

PlayHT -> LMNT

Migrating from PlayHT to LMNT

Generates speech from text and streams the audio as binary chunks as they are generated.

This is the recommended endpoint for most text-to-speech use cases. You can either stream the chunks for low-latency playback or collect all chunks to get the complete audio file.


Generate speech stream
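
The two ways of consuming the streamed chunks described above can be sketched without any network code. This is a minimal illustration, not LMNT's SDK: `play_as_received`, `collect_audio`, and `fake_stream` are hypothetical names, and the byte strings stand in for whatever chunks an HTTP client would yield.

```python
from typing import Callable, Iterable


def play_as_received(chunks: Iterable[bytes], sink: Callable[[bytes], None]) -> int:
    """Hand each chunk to `sink` as soon as it arrives; return total bytes seen."""
    total = 0
    for chunk in chunks:
        sink(chunk)  # e.g. write to an audio device for low-latency playback
        total += len(chunk)
    return total


def collect_audio(chunks: Iterable[bytes]) -> bytes:
    """Join every chunk into one bytes object holding the complete audio."""
    return b"".join(chunks)


# Stand-in for a streamed response body (assumption: real chunks come from the HTTP client).
fake_stream = [b"RIFF", b"....", b"WAVE"]

played = []
bytes_played = play_as_received(iter(fake_stream), played.append)
full_audio = collect_audio(iter(fake_stream))
```

The first function suits playback that should start before synthesis finishes; the second suits saving a complete file.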

Generates speech from text and returns a JSON object that contains a **base64-encoded audio string** and optionally word-level durations (timestamps).
This endpoint waits for synthesis to complete before responding, so it is not ideal for latency-sensitive applications.


Generate speech (detailed)
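
Because the detailed endpoint returns base64 text rather than raw bytes, the audio has to be decoded before it can be played or saved. The sketch below shows that step on a locally built sample. The field names `audio` and `durations`, and the shape of each duration entry, are assumptions made for illustration; check the endpoint reference for the actual schema.

```python
import base64
import json
from typing import Any, Tuple


def decode_detailed_response(body: str) -> Tuple[bytes, Any]:
    """Parse a JSON body and return (raw audio bytes, durations or None)."""
    payload = json.loads(body)
    audio = base64.b64decode(payload["audio"])  # "audio" key is an assumption
    durations = payload.get("durations")        # optional, so it may be absent
    return audio, durations


# Hand-built sample standing in for a real response body.
sample_body = json.dumps({
    "audio": base64.b64encode(b"\x00\x01\x02\x03").decode("ascii"),
    "durations": [{"text": "hi", "start": 0.0, "duration": 0.25}],
})
audio_bytes, word_durations = decode_detailed_response(sample_body)
```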

Speech session (WebSocket)

LMNT offers three distinct speech synthesis endpoints, each optimized for different use cases and integration patterns. Choose the right endpoint based on your text availability, latency requirements, and metadata needs.

Which endpoint should I use?

LMNT Speech API Endpoints - Usage Guide
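
One way to read the three selection criteria is as a small decision function. This is an interpretation of the endpoint descriptions on this page, not an official LMNT rule: `choose_endpoint` is a hypothetical helper, and the order in which the checks are applied is an assumption.

```python
def choose_endpoint(text_arrives_incrementally: bool, needs_word_durations: bool) -> str:
    """Return the name of the endpoint that best fits the stated needs."""
    if text_arrives_incrementally:
        # Text is still being produced, so it must be streamed to the server.
        return "Speech session (WebSocket)"
    if needs_word_durations:
        # Word-level durations are only described for the JSON response.
        return "Generate speech (detailed)"
    # Full text is known up front and only audio is needed.
    return "Generate speech stream"


choice = choose_endpoint(text_arrives_incrementally=False, needs_word_durations=False)
```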

Submits a request to create a voice given configuration data and some source audio.

Create voice

Returns a list of voices available to you.

List voices

Voice info

Updates metadata for a specific voice. Only provided fields will be changed.

Update voice
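
The "only provided fields will be changed" behavior is a partial update. A minimal local model of it, assuming hypothetical field names (`name`, `description`) and helper names that are not part of the LMNT API:

```python
from typing import Any, Dict, Optional


def build_update_payload(**fields: Optional[Any]) -> Dict[str, Any]:
    """Keep only the fields the caller actually set (None means 'leave unchanged')."""
    return {key: value for key, value in fields.items() if value is not None}


def apply_update(voice: Dict[str, Any], payload: Dict[str, Any]) -> Dict[str, Any]:
    """Local stand-in for the server: overwrite provided fields, keep the rest."""
    return {**voice, **payload}


voice = {"name": "Narrator", "description": "Calm"}
payload = build_update_payload(name="Narrator v2", description=None)
updated = apply_update(voice, payload)
```

Dropping unset fields before sending avoids accidentally overwriting existing metadata with empty values.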

Deletes a voice and cancels any pending operations on it. Cannot be undone.

Delete voice

Account info

Integrate LMNT with your existing workflows.

Overview

SDKs

Reference

Record of changes across our models and API surfaces.

Product Updates

Synthesizes speech from a text string and returns the audio data as a binary stream.

Generate speech

Synthesizes speech from a text string and provides advanced information about the synthesis. **Returns a JSON object** that contains a base64-encoded audio file, the seed used in speech generation, and optionally an object detailing the duration of each spoken word.

Synthesizes speech from a text string and returns the audio data as a binary stream. Uses query parameters instead of a form body. This simplified version of synthesis can be directly used in HTML5 audio tags.

Generate speech (simple)
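
Since the simple variant takes its inputs as query parameters, a request can be expressed as a single URL and dropped into an audio tag. The sketch below shows only the URL-building and escaping steps. `BASE_URL` is a placeholder rather than the real endpoint, the parameter names `text` and `voice` are assumptions, and authentication is left out entirely.

```python
from html import escape
from urllib.parse import urlencode

BASE_URL = "https://api.example.com/speech"  # hypothetical placeholder, not the real endpoint


def audio_tag(text: str, voice: str) -> str:
    """Build an HTML5 audio tag whose src carries the request as query parameters."""
    query = urlencode({"text": text, "voice": voice})  # percent-encodes unsafe characters
    url = f"{BASE_URL}?{query}"
    return f'<audio controls src="{escape(url)}"></audio>'  # escape & for HTML attributes


tag = audio_tag("Hello, world!", "demo")
```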

Stream text to our servers and receive synthesized speech in real time.

Converts speech from one voice to another.

Convert voice

Reference for the Speech class in the Python SDK

Speech

Reference for the StreamingSynthesisConnection class in the Python SDK

StreamingSynthesisConnection

Reference for the Speech class in the Node.js SDK v1

Reference for the StreamingSynthesisConnection class in the Node.js SDK v1

Our API provides access to two models: Aurora and Blizzard. This page explains how to choose the right model for your needs.

Overview

Getting started

Guides

Integrations

Migrations

Introduction

Python SDK

Node.js SDK

Unity SDK

REST API