LMNT is an API for text-to-speech and voice cloning. Welcome!

Introduction

Synthesizes speech from a text string. **Returns binary audio data** in one of the supported audio formats. This simplified form of synthesis can be used directly in HTML5 audio tags.

Synthesize speech

Synthesizes speech from a text string and provides advanced information about the synthesis. **Returns a JSON object** that contains a base64-encoded audio file, the seed used in speech generation, and optionally an object detailing the duration of each spoken word.
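As a rough illustration of handling such a JSON response, the sketch below decodes a base64-encoded audio payload and reads the seed and the optional per-word durations. The field names `audio`, `seed`, and `durations` are assumptions made for this example, not confirmed names from the API; check the endpoint reference for the actual schema.

```python
import base64
import json


def decode_synthesis_response(payload: str):
    """Parse a JSON synthesis response and return (audio_bytes, seed, durations).

    The keys "audio", "seed", and "durations" are hypothetical names
    chosen for this sketch.
    """
    data = json.loads(payload)
    # The audio file arrives base64-encoded inside the JSON object.
    audio_bytes = base64.b64decode(data["audio"])
    seed = data.get("seed")
    # Word durations are optional, so fall back to None when absent.
    durations = data.get("durations")
    return audio_bytes, seed, durations


# Stand-in response built locally; a real response would come from the API.
sample = json.dumps({
    "audio": base64.b64encode(b"fake-audio-bytes").decode("ascii"),
    "seed": 1234,
})
audio, seed, durations = decode_synthesis_response(sample)
```

The decoded bytes can then be written to a file or passed to an audio player in whatever format was requested.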

Stream text to our servers and receive synthesized speech in real-time.

Streaming WebSocket

Streaming speech synthesis

Submits a request to create a voice given configuration data and some source audio.

Create voice

Returns a list of voices available to you.

List voices

Voice info

Updates metadata for a specific voice. Only provided fields will be changed.

Update voice
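The "only provided fields will be changed" behavior is a partial update. A minimal local sketch of that idea follows; the `apply_voice_update` helper and the `name` and `description` fields are hypothetical, and this models the semantics only, not the API call itself.

```python
def apply_voice_update(voice: dict, changes: dict) -> dict:
    """Return a copy of `voice` with only the keys present in `changes` replaced.

    Hypothetical helper for illustration; fields absent from `changes`
    keep their existing values.
    """
    updated = dict(voice)    # copy so the original record is untouched
    updated.update(changes)  # overwrite only the provided fields
    return updated


# Field names here are assumptions made for the example.
voice = {"name": "Narrator", "description": "Calm voice"}
result = apply_voice_update(voice, {"name": "Storyteller"})
```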

Deletes a voice and cancels any pending operations on it. Cannot be undone.

Delete voice

Integrate LMNT with your existing workflows.

Overview

Reference for the Speech class in the Python SDK

Speech

Reference for the StreamingSynthesisConnection class in the Python SDK

StreamingSynthesisConnection

Reference for the Speech class in the Node.js SDK

Reference for the StreamingSynthesisConnection class in the Node.js SDK

Reference

Account info

Environment setup

Text-to-speech example

Streaming example

Optimizing latency

Optimizing quality (voice cloning)

Overriding pronunciations

Synchronizing timing

LMNT supports multiple languages. Learn how to use them here.

Languages

Learn how to use LMNT in your Vercel apps.

Overview

Getting started

Guides

Integrations

Migrations

Introduction

Python SDK

Node.js SDK

Unity SDK

REST API