# LMNT Developer Documentation

This file provides an overview of the LMNT API documentation and developer resources.

## Root URL

LMNT Docs https://docs.lmnt.com

## Build

### First steps

---

# Intro to LMNT

URL: https://docs.lmnt.com/intro

LMNT provides fast, lifelike, and affordable speech models. Our models excel at latency, voice cloning, accents, styles, languages, and more.

---

Looking to play around with LMNT? Visit [the Playground](https://app.lmnt.com).

LMNT offers two primary surfaces to build with our models, each suited for different use cases.

| | Speech API | Speech Sessions API |
| --- | --- | --- |
| **What it is** | Turn text into speech | Stream text in from an LLM, realtime speech out |
| **Best for** | Content with preproduced text like voiceovers, localization, narration, advertisements, audiobooks, etc. | Turning your favorite LLM into a realtime voice agent, and keeping your voices consistent as you upgrade LLMs |
| **Learn more** | [Speech API docs](/build-with-lmnt/speech-api) | [Speech Sessions API docs](/build-with-lmnt/speech-sessions-api) |

## Recommended path for new developers

Follow these steps to go from zero to a working LMNT integration.

* Set up your environment, install an SDK, and generate your first speech with LMNT. [Go to the Quickstart](/quickstart)
* Learn the core request structure, how to get word timestamps, and how to generate conversational speech. [Read the Speech API guide](/build-with-lmnt/speech-api)
* Discover what LMNT can do, including voice cloning, accents, and languages. [Browse the features overview](/build-with-lmnt/overview)

---

## Develop with LMNT

Tools to help you build and scale your applications with LMNT.

* Generate speech & clone voices in your browser.
* Explore the full LMNT API and client SDK documentation.
* Best practices for steering voice and text prompts to generate the speech you want.

---

## Key capabilities

Create custom voices from 5–10 seconds of reference speech.
Generate speech in 31 languages with native code-switching.

---

# Get started with LMNT

URL: https://docs.lmnt.com/quickstart

Make your first API call to LMNT and build a simple storyteller.

---

## Prerequisites

* An [LMNT account](https://app.lmnt.com)
* An [API key](https://app.lmnt.com/settings/api)

## Call the API

Get your API key from the [Playground](https://app.lmnt.com/settings/api) and set it as an environment variable:

```sh
export LMNT_API_KEY=your-api-key
```

To persist the key across shell sessions, add the line to your shell profile (such as `~/.zshrc` or `~/.bashrc`).

```sh
pip install lmnt
```

Save this as `quickstart.py`:

```python
from lmnt import Lmnt

client = Lmnt()

text = (
    'The lazy yellow dog was caught by the slow red fox '
    'as he lay sleeping in the sun.'
)

with client.speech.with_streaming_response.generate(
    text=text,
    voice='leah',
) as response:
    response.stream_to_file('story.mp3')
```

```sh
python quickstart.py
```

Example output

```text
wrote ~32 KB to story.mp3
```

Get your API key from the [Playground](https://app.lmnt.com/settings/api) and set it as an environment variable:

```sh
export LMNT_API_KEY=your-api-key
```

To persist the key across shell sessions, add the line to your shell profile (such as `~/.zshrc` or `~/.bashrc`).

```sh
npm install lmnt-node
```

Save this as `quickstart.ts`:

```typescript
import { createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';
import Lmnt from 'lmnt-node';

const client = new Lmnt();

const text =
  'The lazy yellow dog was caught by the slow red fox ' +
  'as he lay sleeping in the sun.';

const response = await client.speech.generate({
  text,
  voice: 'leah',
}).asResponse();

await pipeline(response.body, createWriteStream('story.mp3'));
```

```sh
npx tsx quickstart.ts
```

Example output

```text
wrote ~32 KB to story.mp3
```

Get your API key from the [Playground](https://app.lmnt.com/settings/api) and set it as an environment variable:

```sh
export LMNT_API_KEY=your-api-key
```

To persist the key across shell sessions, add the line to your shell profile (such as `~/.zshrc` or `~/.bashrc`).

Save this as `quickstart.sh`:

```sh
curl --request POST \
  --url https://api.lmnt.com/v1/ai/speech/bytes \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'Content-Type: application/json' \
  --data '{
    "text": "The lazy yellow dog was caught by the slow red fox as he lay sleeping in the sun.",
    "voice": "leah"
  }' \
  --output story.mp3
```

```sh
bash quickstart.sh
```

Example output

```text
wrote ~32 KB to story.mp3
```

## Next steps

You've made your first API call. Next, learn the Speech API patterns you'll use in every LMNT integration.

* Learn how to get word timestamps, generate conversational speech, and apply other core patterns.

Once you're comfortable with the basics, explore further:

* Browse all LMNT capabilities
* Reference documentation for Python & TypeScript client libraries

### Building with LMNT

---

# Features overview

URL: https://docs.lmnt.com/build-with-lmnt/overview

Explore LMNT's advanced features and capabilities.

---

LMNT's capabilities are organized into two main areas:

* **Model capabilities:** Control how LMNT generates speech that matches the feeling you want.
* **Streaming & realtime:** Realtime serving for latency-sensitive use cases like voice agents.

If you're new, familiarize yourself with [model capabilities](#model-capabilities) first.

## Model capabilities

Ways to steer the model to generate speech that matches the feeling you're looking for.

| Feature | Description |
| --- | --- |
| [Voice cloning](/build-with-lmnt/voice-cloning) | Create custom voices from 5–10 seconds of reference speech. |
| [Accents](/build-with-lmnt/accents) | Steer the accent of generated speech. |
| [Languages](/build-with-lmnt/languages) | Generate speech in 31 languages with native code-switching. |
| [Word timestamps](/build-with-lmnt/word-timestamps) | Get precise per-word timing to sync subtitles, lip movement, and other modalities. |

## Streaming & realtime

Ways to meet latency deadlines for your specific use cases.

| Feature | Description |
| --- | --- |
| [Speech API](/build-with-lmnt/speech-api) | Streams generated speech to you. Full text must be known ahead of time. |
| [Speech Sessions API](/build-with-lmnt/speech-sessions-api) | Stream in text from an LLM, and LMNT streams speech back to you.
Great for voice agents. | --- # Using the Speech API URL: https://docs.lmnt.com/build-with-lmnt/speech-api Practical patterns for using the Speech API effectively. --- LMNT offers two primary surfaces to build with our models, each suited for different use cases. | | Speech API | Speech Sessions API | | --- | --- | --- | | **What it is** | Turn text into speech | Stream text in from an LLM, realtime speech out | | **Best for** | Content with preproduced text like voiceovers, localization, narration, advertisements, audiobooks, etc | Turning your favorite LLM into a realtime voice agent, and keeping your voices consistent as you upgrade LLMs | | **Learn more** | [Speech API docs](/build-with-lmnt/speech-api) | [Speech Sessions API docs](/build-with-lmnt/speech-sessions-api) | This guide covers common patterns for working with the Speech API, including speech generation, getting word timestamps, and making the generated speech sound more human. For complete API specifications, see the [Speech API reference](/api/speech/generate). ## Basic speech generation ```python Python import asyncio from lmnt import AsyncLmnt async def main(): client = AsyncLmnt() async with client.speech.with_streaming_response.generate( text=( "Uhh, did you see the weather in Palo Alto tomorrow? " "Yeah, can't believe it's gonna rain, dude. Like what?" ), voice='leah', ) as response: await response.stream_to_file('hello.mp3') asyncio.run(main()) ``` ```typescript TypeScript import { createWriteStream } from 'fs'; import { pipeline } from 'stream/promises'; import Lmnt from 'lmnt-node'; const client = new Lmnt(); const response = await client.speech.generate({ text: "Uhh, did you see the weather in Palo Alto tomorrow? " + "Yeah, can't believe it's gonna rain, dude. 
Like what?", voice: 'leah', }).asResponse(); await pipeline(response.body, createWriteStream('hello.mp3')); ``` ```sh cURL curl --request POST \ --url https://api.lmnt.com/v1/ai/speech/bytes \ --header "X-API-Key: $LMNT_API_KEY" \ --header 'lmnt-version: 1.1' \ --header 'Content-Type: application/json' \ --data '{ "text": "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can'\''t believe it'\''s gonna rain, dude. Like what?", "voice": "leah" }' \ --output hello.mp3 ``` ## Speech generation with word timestamps The Speech API allows you to get exact word timestamps when you need to sync your generated speech with subtitles, lip movement, or other modalities. ```python Python import asyncio import base64 from lmnt import AsyncLmnt async def main(): client = AsyncLmnt() response = await client.speech.generate_detailed( text=( "Uhh, did you see the weather in Palo Alto tomorrow? " "Yeah, can't believe it's gonna rain, dude. Like what?" ), voice='leah', format='mp3', return_timestamps=True, ) with open('hello.mp3', 'wb') as f: f.write(base64.b64decode(response.audio)) for t in response.timestamps or []: print(f'{t.start:.3f}s {t.text!r}') asyncio.run(main()) ``` ```typescript TypeScript import { writeFileSync } from 'fs'; import Lmnt from 'lmnt-node'; const client = new Lmnt(); const response = await client.speech.generateDetailed({ text: "Uhh, did you see the weather in Palo Alto tomorrow? " + "Yeah, can't believe it's gonna rain, dude. Like what?", voice: 'leah', format: 'mp3', return_timestamps: true, }); writeFileSync('hello.mp3', Buffer.from(response.audio, 'base64')); for (const t of response.timestamps ?? []) { console.log(`${t.start.toFixed(3)}s ${JSON.stringify(t.text)}`); } ``` ```sh cURL curl --request POST \ --url https://api.lmnt.com/v1/ai/speech \ --header "X-API-Key: $LMNT_API_KEY" \ --header 'lmnt-version: 1.1' \ --header 'Content-Type: application/json' \ --data '{ "text": "Uhh, did you see the weather in Palo Alto tomorrow? 
Yeah, can'\''t believe it'\''s gonna rain, dude. Like what?", "voice": "leah", "format": "mp3", "return_timestamps": true }' \ | jq -r .audio | base64 -d > hello.mp3 ``` ## Generating conversational speech Conversational speech (talking to someone) feels quite different than read speech (audiobooks), and your text prompt has a big impact on the speech generated by the model. See our [text prompting guide](/prompt-engineering/text-prompting) to learn more. --- # Using the Speech Sessions API URL: https://docs.lmnt.com/build-with-lmnt/speech-sessions-api Practical patterns for building realtime speech experiences using your favorite LLM + LMNT. --- LMNT offers two primary surfaces to build with our models, each suited for different use cases. | | Speech API | Speech Sessions API | | --- | --- | --- | | **What it is** | Turn text into speech | Stream text in from an LLM, realtime speech out | | **Best for** | Content with preproduced text like voiceovers, localization, narration, advertisements, audiobooks, etc | Turning your favorite LLM into a realtime voice agent, and keeping your voices consistent as you upgrade LLMs | | **Learn more** | [Speech API docs](/build-with-lmnt/speech-api) | [Speech Sessions API docs](/build-with-lmnt/speech-sessions-api) | This guide covers common patterns for working with the Speech Sessions API, including streaming text from your LLM & streaming speech out, when to flush, handling user interruptions, and getting your LLM to produce conversational text. For complete API specifications, see the [Speech Sessions API reference](/api/speech-sessions/create). ## Basic text streaming in and speech streaming out ```python Python import asyncio from anthropic import AsyncAnthropic from lmnt import AsyncLmnt DEFAULT_PROMPT = 'Read me an excerpt of a short sci-fi story in the public domain.' 
VOICE_ID = 'elowen' async def main(): client = AsyncLmnt() session = await client.speech.sessions.create(voice=VOICE_ID) await asyncio.gather(reader_task(session), writer_task(session)) await session.close() async def reader_task(session): """Streams audio data from LMNT and writes it to `output.mp3`.""" with open('output.mp3', 'wb') as f: async for message in session: if message.type == 'audio': f.write(message.audio) async def writer_task(session): """Streams text from Claude to LMNT.""" anthropic = AsyncAnthropic() async with anthropic.messages.stream( model='claude-sonnet-4-6', max_tokens=1024, messages=[{'role': 'user', 'content': DEFAULT_PROMPT}], ) as stream: async for text in stream.text_stream: await session.send_text(text) print(text, end='', flush=True) # After `send_finish` is called, the server closes the session # once it has finished generating speech. await session.send_finish() asyncio.run(main()) ``` ```typescript TypeScript import Lmnt from 'lmnt-node'; import Anthropic from '@anthropic-ai/sdk'; import { createWriteStream } from 'fs'; const DEFAULT_PROMPT = 'Read me an excerpt of a short sci-fi story in the public domain.'; const VOICE_ID = 'elowen'; const main = async () => { const lmnt = new Lmnt(); const session = lmnt.speech.sessions.create({ voice: VOICE_ID }); const anthropic = new Anthropic(); const writerTask = async () => { // Streams text from Claude to LMNT. const stream = anthropic.messages.stream({ model: 'claude-sonnet-4-6', max_tokens: 1024, messages: [{ role: 'user', content: DEFAULT_PROMPT }], }); for await (const text of stream.textStream) { session.sendText(text); process.stdout.write(text); } // After `sendFinish` is called, the server closes the session // once it has finished generating speech. session.sendFinish(); }; const readerTask = async () => { // Streams audio data from LMNT and writes it to `output.mp3`. 
const audioFile = createWriteStream('output.mp3'); for await (const message of session) { if (message.type === 'audio') { audioFile.write(message.audio); } } }; await Promise.all([writerTask(), readerTask()]); session.close(); }; main(); ``` ```python Python import asyncio from google import genai from lmnt import AsyncLmnt DEFAULT_PROMPT = 'Read me an excerpt of a short sci-fi story in the public domain.' VOICE_ID = 'elowen' async def main(): client = AsyncLmnt() session = await client.speech.sessions.create(voice=VOICE_ID) await asyncio.gather(reader_task(session), writer_task(session)) await session.close() async def reader_task(session): """Streams audio data from LMNT and writes it to `output.mp3`.""" with open('output.mp3', 'wb') as f: async for message in session: if message.type == 'audio': f.write(message.audio) async def writer_task(session): """Streams text from Gemini to LMNT.""" ai = genai.Client() response = await ai.aio.models.generate_content_stream( model='gemini-2.5-flash', contents=DEFAULT_PROMPT, ) async for chunk in response: if not chunk.text: continue await session.send_text(chunk.text) print(chunk.text, end='', flush=True) # After `send_finish` is called, the server closes the session # once it has finished generating speech. await session.send_finish() asyncio.run(main()) ``` ```typescript TypeScript import Lmnt from 'lmnt-node'; import { GoogleGenAI } from '@google/genai'; import { createWriteStream } from 'fs'; const DEFAULT_PROMPT = 'Read me an excerpt of a short sci-fi story in the public domain.'; const VOICE_ID = 'elowen'; const main = async () => { const lmnt = new Lmnt(); const session = lmnt.speech.sessions.create({ voice: VOICE_ID }); const ai = new GoogleGenAI(); const writerTask = async () => { // Streams text from Gemini to LMNT. 
const response = await ai.models.generateContentStream({ model: 'gemini-2.5-flash', contents: DEFAULT_PROMPT, }); for await (const chunk of response) { const text = chunk.text || ''; if (!text) continue; session.sendText(text); process.stdout.write(text); } // After `sendFinish` is called, the server closes the session // once it has finished generating speech. session.sendFinish(); }; const readerTask = async () => { // Streams audio data from LMNT and writes it to `output.mp3`. const audioFile = createWriteStream('output.mp3'); for await (const message of session) { if (message.type === 'audio') { audioFile.write(message.audio); } } }; await Promise.all([writerTask(), readerTask()]); session.close(); }; main(); ``` ```python Python import asyncio from lmnt import AsyncLmnt from openai import AsyncOpenAI DEFAULT_PROMPT = 'Read me an excerpt of a short sci-fi story in the public domain.' VOICE_ID = 'elowen' async def main(): client = AsyncLmnt() session = await client.speech.sessions.create(voice=VOICE_ID) await asyncio.gather(reader_task(session), writer_task(session)) await session.close() async def reader_task(session): """Streams audio data from LMNT and writes it to `output.mp3`.""" with open('output.mp3', 'wb') as f: async for message in session: if message.type == 'audio': f.write(message.audio) async def writer_task(session): """Streams text from OpenAI to LMNT.""" openai = AsyncOpenAI() response = await openai.chat.completions.create( model='gpt-4o-mini', messages=[{'role': 'user', 'content': DEFAULT_PROMPT}], stream=True, ) async for chunk in response: if (not chunk.choices[0] or not chunk.choices[0].delta or not chunk.choices[0].delta.content): continue content = chunk.choices[0].delta.content await session.send_text(content) print(content, end='', flush=True) # After `send_finish` is called, the server closes the session # once it has finished generating speech. 
await session.send_finish() asyncio.run(main()) ``` ```typescript TypeScript import Lmnt from 'lmnt-node'; import OpenAI from 'openai'; import { createWriteStream } from 'fs'; const DEFAULT_PROMPT = 'Read me an excerpt of a short sci-fi story in the public domain.'; const VOICE_ID = 'elowen'; const main = async () => { const lmnt = new Lmnt(); const session = lmnt.speech.sessions.create({ voice: VOICE_ID }); const openai = new OpenAI(); const writerTask = async () => { // Streams text from OpenAI to LMNT. const chatStream = await openai.chat.completions.create({ model: 'gpt-4o-mini', messages: [{ role: 'user', content: DEFAULT_PROMPT }], stream: true, }); for await (const part of chatStream) { const text = part.choices[0]?.delta?.content || ''; if (!text) continue; session.sendText(text); process.stdout.write(text); } // After `sendFinish` is called, the server closes the session // once it has finished generating speech. session.sendFinish(); }; const readerTask = async () => { // Streams audio data from LMNT and writes it to `output.mp3`. const audioFile = createWriteStream('output.mp3'); for await (const message of session) { if (message.type === 'audio') { audioFile.write(message.audio); } } }; await Promise.all([writerTask(), readerTask()]); session.close(); }; main(); ``` ## Flushing when your LLM has finished a turn As you stream text in, your speech session buffers a small amount of text while it waits for enough context to generate natural-sounding speech. Call `flush` the moment your LLM is done streaming text. The speech session generates all remaining text immediately and the connection stays open, ready for the next turn. 
```python Python
async for chunk in llm_stream:
    text = extract_text(chunk)
    if text:
        await session.send_text(text)
await session.send_flush()
```

```typescript TypeScript
for await (const chunk of llmStream) {
  const text = extractText(chunk);
  if (text) {
    session.sendText(text);
  }
}
session.sendFlush();
```

If you forget to call `flush` (or `finish`), the last bit of text the LLM produced will sit in the buffer indefinitely.

Be careful when you send `flush`. If you send `flush` at arbitrary points instead of at the end of a turn, your speech may sound less natural.

## Getting your LLM to sound conversational

LLMs default to formal, structured responses that sound robotic when spoken aloud. Prompting them with explicit guidance (that the response will be spoken, that contractions and filler words belong, that bulleted lists and headers don't) has the biggest impact on how natural your speech sounds. See our [LLM prompting guide](/prompt-engineering/llm-prompting) for a detailed breakdown and a copy-pasteable prompt template to use as a starting point.

---

# Agentic coding tools

URL: https://docs.lmnt.com/build-with-lmnt/agentic-coding-tools

Set up Claude Code, Codex, Augment Code, and other agentic coding tools to write LMNT integrations with up-to-date context.

---

## Direct your agent to our LLM-friendly docs view

The key is to get your agent cross-referencing the docs when it needs to write LMNT-related code. We've structured our `llms.txt` so your agent can progressively load context as necessary and save you tokens.

Your agent should know where it stores its memories, so who better to ask?

```text
Please save this knowledge in my CLAUDE.md, AGENTS.md, or equivalent project memory. Whenever a task involves LMNT, first read https://docs.lmnt.com/llms.txt and follow the links there to whichever docs are relevant.
```

```text
Read https://docs.lmnt.com/llms.txt before answering. Follow the links there to whichever docs are relevant to the task.
``` ## Fun prompts to try To give you inspiration, here are some prompts we've had fun with: ```text Read https://docs.lmnt.com/llms.txt before answering. Follow the links there to whichever docs are relevant to the task. Create a rust app that reads the latest top 3 headlines in a newscaster style from https://text.npr.org/ using the 'brandon' voice. ``` ```text Read https://docs.lmnt.com/llms.txt before answering. Follow the links there to whichever docs are relevant to the task. Build a streaming voice agent: pipe GPT-4 into LMNT Speech Sessions with sub-300ms time-to-first-audio. ``` ```text Read https://docs.lmnt.com/llms.txt before answering. Follow the links there to whichever docs are relevant to the task. Clone my voice from a 30-second sample, then turn every Markdown post in my blog into a narrated MP3. ``` ```text Read https://docs.lmnt.com/llms.txt before answering. Follow the links there to whichever docs are relevant to the task. Make a Spanish flashcard CLI that pronounces each word with a native Madrid accent. ``` ```text Read https://docs.lmnt.com/llms.txt before answering. Follow the links there to whichever docs are relevant to the task. Generate a bedtime-story app: kid types a topic, gets back a soothing audio story. ``` --- # Optimizing latency URL: https://docs.lmnt.com/build-with-lmnt/optimizing-latency Tips for getting latency down as low as possible with LMNT's speech models --- ## Best practices Get speech chunks as soon as they're generated. Your reader can write or play them immediately instead of waiting for the whole speech generation to finish first. ```python Python # Use with_streaming_response.generate to get speech as it's generated. with client.speech.with_streaming_response.generate( text=( "Uhh, did you see the weather in Palo Alto tomorrow? " "Yeah, can't believe it's gonna rain, dude. Like what?" 
), voice='leah', ) as response: response.stream_to_file('hello.mp3') ``` ```typescript TypeScript // .asResponse() exposes the raw streaming body to pipe to a file or player. const response = await client.speech.generate({ text: "Uhh, did you see the weather in Palo Alto tomorrow? " + "Yeah, can't believe it's gonna rain, dude. Like what?", voice: 'leah', }).asResponse(); await pipeline(response.body, createWriteStream('hello.mp3')); ``` ```sh cURL # Streaming is on by default; --output writes chunks as they arrive. curl --request POST \ --url https://api.lmnt.com/v1/ai/speech/bytes \ --header "X-API-Key: $LMNT_API_KEY" \ --header 'lmnt-version: 1.1' \ --header 'Content-Type: application/json' \ --data '{"text": "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can'\''t believe it'\''s gonna rain, dude. Like what?", "voice": "leah"}' \ --output hello.mp3 ``` Use asynchronous tasks so handling one chunk doesn't block receiving the next. For example, play or write each chunk in one task while the stream from LMNT continues feeding more in. Our primary servers are located in the United States. Although we support streaming worldwide, users in the U.S. are most likely to experience the lowest latency. If you have any specific geographic constraints, reach out (hello@lmnt.com). If you know what language the text is in, specify it in the `language` parameter. This will skip language detection and generate the speech faster. Speech Sessions are always in streaming mode. Make sure your LLM is in streaming mode too and forward each text fragment to the session immediately. 
```python Python async with anthropic.messages.stream( model='claude-sonnet-4-6', max_tokens=1024, messages=[{'role': 'user', 'content': prompt}], ) as stream: async for text in stream.text_stream: await session.send_text(text) ``` ```typescript TypeScript const stream = anthropic.messages.stream({ model: 'claude-sonnet-4-6', max_tokens: 1024, messages: [{ role: 'user', content: prompt }], }); for await (const text of stream.textStream) { session.sendText(text); } ``` ```python Python response = await ai.aio.models.generate_content_stream( model='gemini-2.5-flash', contents=prompt, ) async for chunk in response: if chunk.text: await session.send_text(chunk.text) ``` ```typescript TypeScript const response = await ai.models.generateContentStream({ model: 'gemini-2.5-flash', contents: prompt, }); for await (const chunk of response) { if (chunk.text) { session.sendText(chunk.text); } } ``` ```python Python response = await openai.chat.completions.create( model='gpt-4o-mini', messages=[{'role': 'user', 'content': prompt}], stream=True, ) async for chunk in response: content = chunk.choices[0].delta.content if content: await session.send_text(content) ``` ```typescript TypeScript const chatStream = await openai.chat.completions.create({ model: 'gpt-4o-mini', messages: [{ role: 'user', content: prompt }], stream: true, }); for await (const part of chatStream) { const text = part.choices[0]?.delta?.content || ''; if (text) { session.sendText(text); } } ``` Use asynchronous tasks to stream data concurrently. One writes LLM text fragments into the session, the other reads the generated speech out. See the [Speech Session API guide](/build-with-lmnt/speech-sessions-api). Our primary servers are located in the United States. Although we support streaming worldwide, users in the U.S. are most likely to experience the lowest latency. If you have any specific geographic constraints, reach out (hello@lmnt.com). 
If you know what language the text is in, specify it in the `language` parameter. This will skip language detection and generate the speech faster.

### Model capabilities

---

# Voice cloning

URL: https://docs.lmnt.com/build-with-lmnt/voice-cloning

Create voices to use with LMNT's models from 5–10 seconds of reference speech.

---

Voice cloning is the way you create and save voice prompts to use with LMNT's models. LMNT does the hard work to ensure your prompts are ready to serve your traffic with low latency at scale.

## Crafting a good voice prompt

Treat the reference speech like you'd treat a prompt to an LLM: clear, focused, and representative of the style of output you want. See our [voice prompting guide](/prompt-engineering/voice-prompting).

## Creating a voice

If you're trying things out or only creating a handful of voices, it's easiest to use our [Playground](https://app.lmnt.com/). Otherwise, upload your reference speech prompt through the Voice API and you'll get back a voice object with an `id` you can use in any speech call.
```python Python import asyncio import sys from lmnt import AsyncLmnt async def main(): client = AsyncLmnt() with open(sys.argv[1], 'rb') as audio: voice = await client.voices.create( name='my-voice', file=audio, ) print(f'Created voice: {voice.id}') asyncio.run(main()) ``` ```typescript TypeScript import { createReadStream } from 'fs'; import Lmnt from 'lmnt-node'; const client = new Lmnt(); const voice = await client.voices.create({ name: 'my-voice', file: createReadStream(process.argv[2]), }); console.log(`Created voice: ${voice.id}`); ``` ```sh cURL curl --request POST \ --url https://api.lmnt.com/v1/ai/voice \ --header "X-API-Key: $LMNT_API_KEY" \ --header 'lmnt-version: 1.1' \ --form 'name=my-voice' \ --form 'file=@reference.mp3' ``` ## Next steps Best practices for creating great voice prompts Shape pronunciation, pacing, and emphasis with your input text --- # Accents URL: https://docs.lmnt.com/build-with-lmnt/accents LMNT's speech models can generate any accent; plus - your options to change accents on the fly --- ## Accents from voice prompts LMNT's models excel at voice cloning, so the accents in your voice prompts carry forward to your generated speech. You don't have to do anything special - this works out of the box. Learn more about how to craft great voice prompts in our [Voice prompting guide](/prompt-engineering/voice-prompting). ## Changing accents You can prompt the model to change accents on the fly by changing the `language` option. This results in speech that sounds distinctly like your voice prompt, but also now with an accent of someone who has been speaking that language for a long time. ```python Python import asyncio from lmnt import AsyncLmnt async def main(): client = AsyncLmnt() text = 'Hello, and uh, welcome to the show.' 
for language in ['en', 'fr', 'de']: async with client.speech.with_streaming_response.generate( text=text, voice='leah', language=language, ) as response: await response.stream_to_file(f'{language}.mp3') asyncio.run(main()) ``` ```typescript TypeScript import { createWriteStream } from 'fs'; import { pipeline } from 'stream/promises'; import Lmnt from 'lmnt-node'; const client = new Lmnt(); const text = 'Hello, and uh, welcome to the show.'; for (const language of ['en', 'fr', 'de'] as const) { const response = await client.speech.generate({ text, voice: 'leah', language, }).asResponse(); await pipeline(response.body, createWriteStream(`${language}.mp3`)); } ``` ```sh cURL TEXT="Hello, and uh, welcome to the show." for LANG in en fr de; do curl --request POST \ --url https://api.lmnt.com/v1/ai/speech/bytes \ --header "X-API-Key: $LMNT_API_KEY" \ --header 'lmnt-version: 1.1' \ --header 'Content-Type: application/json' \ --data "{\"text\": \"$TEXT\", \"voice\": \"leah\", \"language\": \"$LANG\"}" \ --output "$LANG.mp3" done ``` The output: - `en.mp3` → spoken by a native English speaker. - `fr.mp3` → spoken in English with a French accent. - `de.mp3` → spoken in English with a German accent. ## Next steps Generate speech in any of the languages LMNT supports Best practices for crafting voice prompts that carry the accent you want --- # Languages URL: https://docs.lmnt.com/build-with-lmnt/languages LMNT's speech models fluently speak 31 languages with native code-switching. --- ## How native language prompting works By default, the model looks at your text and guesses the native language for your generated speech. But you can explicitly prompt the model for more control. This shapes the accent and how the model handles foreign words. ```python Python import asyncio from lmnt import AsyncLmnt async def main(): client = AsyncLmnt() text = 'Hello, and uh, welcome to the show.' 
for language in ['en', 'fr', 'de']: async with client.speech.with_streaming_response.generate( text=text, voice='leah', language=language, ) as response: await response.stream_to_file(f'{language}.mp3') asyncio.run(main()) ``` ```typescript TypeScript import { createWriteStream } from 'fs'; import { pipeline } from 'stream/promises'; import Lmnt from 'lmnt-node'; const client = new Lmnt(); const text = 'Hello, and uh, welcome to the show.'; for (const language of ['en', 'fr', 'de'] as const) { const response = await client.speech.generate({ text, voice: 'leah', language, }).asResponse(); await pipeline(response.body, createWriteStream(`${language}.mp3`)); } ``` ```sh cURL TEXT="Hello, and uh, welcome to the show." for LANG in en fr de; do curl --request POST \ --url https://api.lmnt.com/v1/ai/speech/bytes \ --header "X-API-Key: $LMNT_API_KEY" \ --header 'lmnt-version: 1.1' \ --header 'Content-Type: application/json' \ --data "{\"text\": \"$TEXT\", \"voice\": \"leah\", \"language\": \"$LANG\"}" \ --output "$LANG.mp3" done ``` The output: - `en.mp3` → spoken by a native English speaker. - `fr.mp3` → spoken in English with a French accent. - `de.mp3` → spoken in English with a German accent. ## Code switching Code switching is mixing text from two or more of the model's supported languages into the same text prompt, and is fully supported. ```python Python import asyncio from lmnt import AsyncLmnt async def main(): client = AsyncLmnt() text = 'Bonjour! Did you know that mariposa means butterfly in Spanish?' for language in ['en', 'fr', 'es']: async with client.speech.with_streaming_response.generate( text=text, voice='leah', language=language, ) as response: await response.stream_to_file(f'{language}.mp3') asyncio.run(main()) ``` ```typescript TypeScript import { createWriteStream } from 'fs'; import { pipeline } from 'stream/promises'; import Lmnt from 'lmnt-node'; const client = new Lmnt(); const text = 'Bonjour! 
Did you know that mariposa means butterfly in Spanish?'; for (const language of ['en', 'fr', 'es'] as const) { const response = await client.speech.generate({ text, voice: 'leah', language, }).asResponse(); await pipeline(response.body, createWriteStream(`${language}.mp3`)); } ``` ```sh cURL TEXT="Bonjour! Did you know that mariposa means butterfly in Spanish?" for LANG in en fr es; do curl --request POST \ --url https://api.lmnt.com/v1/ai/speech/bytes \ --header "X-API-Key: $LMNT_API_KEY" \ --header 'lmnt-version: 1.1' \ --header 'Content-Type: application/json' \ --data "{\"text\": \"$TEXT\", \"voice\": \"leah\", \"language\": \"$LANG\"}" \ --output "$LANG.mp3" done ``` The output: - `en.mp3` → English speaker — `Bonjour` and `mariposa` sound foreign. - `fr.mp3` → French speaker — `butterfly` and `mariposa` sound foreign. - `es.mp3` → Spanish speaker — `Bonjour` and `butterfly` sound foreign. ## Use native scripts Our models have been trained on native scripts for all supported languages. For languages with non-Latin scripts, write your text in the language's native script. Romanized or transliterated text may not always be pronounced as you'd expect. 
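As a rough pre-flight check, you could flag text that looks romanized before you send it. The sketch below is not part of the LMNT SDK: `EXPECTED_SCRIPTS` and `looks_romanized` are hypothetical names, the mapping covers only a handful of the supported languages, and the check relies on Unicode character names from Python's standard `unicodedata` module.

```python
import unicodedata

# Hypothetical mapping from language code to the Unicode script prefixes we
# expect to see in native text. Illustrative only; extend it for your languages.
EXPECTED_SCRIPTS = {
    'ar': ('ARABIC',),
    'hi': ('DEVANAGARI',),
    'ja': ('HIRAGANA', 'KATAKANA', 'CJK'),
    'ko': ('HANGUL',),
    'ru': ('CYRILLIC',),
    'th': ('THAI',),
    'zh': ('CJK',),
}


def looks_romanized(text: str, language: str) -> bool:
    """Return True if text has letters but none from the language's native script."""
    scripts = EXPECTED_SCRIPTS.get(language)
    if scripts is None:
        return False  # Latin-script or unmapped language: nothing to check
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False  # digits or punctuation only
    # Unicode character names start with the script name,
    # e.g. 'HIRAGANA LETTER KO' or 'CYRILLIC SMALL LETTER PE'.
    return not any(unicodedata.name(ch, '').startswith(scripts) for ch in letters)


print(looks_romanized('konnichiwa', 'ja'))  # True
print(looks_romanized('こんにちは', 'ja'))  # False
```

The check only asks whether at least one native-script letter is present, so code-switched text such as `Привет, Leah` still passes for `ru`.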
## Supported languages | Language | Code | |------------|------| | Arabic | `ar` | | Assamese | `as` | | Bengali | `bn` | | Chinese | `zh` | | Czech | `cs` | | Danish | `da` | | Dutch | `nl` | | English | `en` | | Finnish | `fi` | | French | `fr` | | German | `de` | | Hindi | `hi` | | Indonesian | `id` | | Italian | `it` | | Japanese | `ja` | | Korean | `ko` | | Malayalam | `ml` | | Marathi | `mr` | | Polish | `pl` | | Portuguese | `pt` | | Russian | `ru` | | Slovak | `sk` | | Spanish | `es` | | Swedish | `sv` | | Tamil | `ta` | | Telugu | `te` | | Thai | `th` | | Turkish | `tr` | | Ukrainian | `uk` | | Urdu | `ur` | | Vietnamese | `vi` | ## Next steps Pair a language with the right accent for a natural sound Turn your voice prompt into a saved voice, ready for low latency generation at scale --- # Word timestamps URL: https://docs.lmnt.com/build-with-lmnt/word-timestamps LMNT's models return word timestamps, enabling you to sync subtitles, lip movement, other modalities, and more with your generated speech. --- If you're producing video content, you often want to show subtitles. Use LMNT to get exact word timing with precisely the words being spoken, instead of relying on external subtitle providers that try to guess and may confuse similar sounding words. ## Getting timestamps with the Speech API ```python Python import asyncio from lmnt import AsyncLmnt async def main(): client = AsyncLmnt() response = await client.speech.generate_detailed( text=( "Uhh, did you see the weather in Palo Alto tomorrow? " "Yeah, can't believe it's gonna rain, dude. Like what?" ), voice='leah', return_timestamps=True, ) for chunk in response.timestamps or []: print(f'"{chunk.text}" starts at {chunk.start:.3f}s and lasts for {chunk.duration:.3f}s') asyncio.run(main()) ``` ```typescript TypeScript import Lmnt from 'lmnt-node'; const client = new Lmnt(); const response = await client.speech.generateDetailed({ text: "Uhh, did you see the weather in Palo Alto tomorrow? 
" + "Yeah, can't believe it's gonna rain, dude. Like what?", voice: 'leah', return_timestamps: true, }); for (const chunk of response.timestamps ?? []) { console.log(`"${chunk.text}" starts at ${chunk.start.toFixed(3)}s and lasts for ${chunk.duration.toFixed(3)}s`); } ``` ```sh cURL curl --request POST \ --url https://api.lmnt.com/v1/ai/speech \ --header "X-API-Key: $LMNT_API_KEY" \ --header 'lmnt-version: 1.1' \ --header 'Content-Type: application/json' \ --data '{ "text": "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can'\''t believe it'\''s gonna rain, dude. Like what?", "voice": "leah", "return_timestamps": true }' ``` ## Getting timestamps with the Speech Sessions API In the Speech Sessions API, word timestamps currently take longer to arrive than the generated speech. The generated speech continues to stream to you in realtime. ```python Python import asyncio from lmnt import AsyncLmnt async def main(): client = AsyncLmnt() session = await client.speech.sessions.create( voice='leah', return_timestamps=True, ) await session.send_text("Uhh, did you see the weather in Palo Alto tomorrow? ") await session.send_text("Yeah, can't believe it's gonna rain, dude. Like what?") await session.send_finish() async for message in session: if message.type == 'timestamps': for chunk in message.timestamps or []: print(f'"{chunk.text}" starts at {chunk.start:.3f}s and lasts for {chunk.duration:.3f}s') asyncio.run(main()) ``` ```typescript TypeScript import Lmnt from 'lmnt-node'; const lmnt = new Lmnt(); const session = lmnt.speech.sessions.create({ voice: 'leah', return_timestamps: true, }); session.sendText("Uhh, did you see the weather in Palo Alto tomorrow? "); session.sendText("Yeah, can't believe it's gonna rain, dude. Like what?"); session.sendFinish(); for await (const message of session) { if (message.type === 'timestamps') { for (const chunk of message.timestamps ?? 
[]) { console.log(`"${chunk.text}" starts at ${chunk.start.toFixed(3)}s and lasts for ${chunk.duration.toFixed(3)}s`); } } } ``` ```sh websockets python3 <<'EOF' import asyncio, json, os, websockets async def main(): async with websockets.connect('wss://api.lmnt.com/v1/ai/speech/stream') as ws: await ws.send(json.dumps({ 'type': 'init', 'X-API-Key': os.environ['LMNT_API_KEY'], 'lmnt-version': '1.1', 'voice': 'leah', 'return_timestamps': True, })) await ws.send(json.dumps({ 'type': 'text', 'text': "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?", })) await ws.send(json.dumps({'type': 'finish'})) async for msg in ws: if isinstance(msg, str): data = json.loads(msg) if data.get('type') == 'timestamps': for ts in data['timestamps']: print(f'"{ts["text"]}" starts at {ts["start"]:.3f}s and lasts for {ts["duration"]:.3f}s') asyncio.run(main()) EOF ``` ### Prompt engineering --- # Prompt engineering overview URL: https://docs.lmnt.com/prompt-engineering/overview --- ## Before prompt engineering This guide assumes that you have: 1. A clear definition of the feeling you want generated for your use case. 2. Some reference examples of speech that fit your criteria. If not, we highly suggest you spend some time establishing those first. For voice-related guidance on LMNT's latest models, start here. For text-related guidance on LMNT's latest models, start here. --- ## When to prompt engineer This guide focuses on criteria that are controllable through voice and text prompt engineering. Not every success criteria or failing eval is best solved by prompt engineering — for example, latency can sometimes be fixed by ensuring your integration is streaming all the way to your users. --- ## How to prompt engineer All prompting techniques across voice and text — from accent to style to feeling — are covered in this living reference. 
See [Voice prompting](/prompt-engineering/voice-prompting) and [Text prompting](/prompt-engineering/text-prompting) to get started. --- --- # Voice prompting URL: https://docs.lmnt.com/prompt-engineering/voice-prompting Comprehensive guide to voice prompt engineering for LMNT's latest models. --- The right voice prompt is the difference between generated speech that is "just ok" and "wow". This living reference covers the elements you should think about and control for when crafting your voice prompts. Find some examples of speech online that give the feeling you're looking for, and use them as a reference as you walk through this guide. --- ## Acoustic basics ### Environment Are there background noises? Is there music? Low frequency hums from household appliances? It's best to pull your voice prompt from speech recorded in an environment that matches the one you want the model to produce. But if that's not possible, you can attempt to clean up in post processing. ### Room size and shape You can hear the room size and shape in an audio recording. A bathroom sounds different from a closet which sounds different from a cathedral. This is the room's impulse response: the pattern of reflections it adds to sound made inside. Hard surfaces bounce audio back, soft ones absorb it, and your ear stitches the reflections into a sense of space. If you want the model to produce speech with a studio sound, prompt with a studio-like recording. If you want the model to produce speech that sounds like it's in a hall, prompt with a recording in a hall. A blanket fort is an easy way to approximate an acoustically-treated room if you don't have access to a professional recording studio. ### Clipping Clipping is what happens when sound exceeds what the microphone can capture: the peaks get sliced off, and the result sounds harsh, crackly, and blown out. Think a weather reporter shouting into a hurricane. Or deep fried memes. 
When this happens, information is gone, and no amount of post-processing will bring it back. If you don't want clipping you need a different recording.

### Sample rate

A recording's sample rate caps the frequencies it can carry. There's [math behind it](https://en.wikipedia.org/wiki/Nyquist_rate), but the main thing to remember is that the maximum recordable frequency is half the sample rate.

This has implications for your voice prompts:

* Human speech generally goes up to ~10kHz. This means you need at least a 20kHz sample rate to capture the higher frequency details of sounds like 's' and 'f'.
* If your voice prompt was recorded with a lower sample rate like 8kHz or 16kHz, it'll come out sounding muffled, thin, and tinny.

The model will accept a voice prompt with any sample rate, but you'll get best results with 24kHz or higher.

Be careful relying on an audio file's reported sample rate: the audio inside may have originally been recorded at a lower sample rate. When that happens you'll see a wall of missing higher frequencies in the file's spectrogram.

---

## Vocal basics

When deciding on your voice prompt, it's important to think about who's going to be listening to your generated speech, and what feeling you want to inspire in them. Here are some elements to think about.

### Age

Do you want a young voice? An old voice?

### Gender

Do you want a female voice? A male voice? A non-binary voice?

### Accents

Where is the voice from? Which language(s) does the speaker speak natively?

### Texture and tone

Are they a valley-girl with vocal fry? Are they a weathered rural voice in a political ad that implies "you can trust me, this politician is one of you too"?

### Emotions

What emotions do you need?

### Distinct brand voice vs localized voices

If you're generating speech in multiple languages, do you want a distinct voice shared across all languages? Or do you want different local voices for each language?

---

## Prosody basics

Prosody is the musical component of speech, including rhythm, stress, intonation, and tempo. The model uses the prosody of the speech in your voice prompt (in addition to your text prompt) to help determine the prosody in the speech it generates.

### Spontaneous vs read speech

The biggest and most obvious example of prosody you're familiar with is spontaneous speech vs read speech.

#### Read speech

Read speech is what you hear in audiobooks, scripted ads, and news broadcasts: someone reading words off a page. They can look ahead and know where they're going, so the pacing is even, the intonation is predictable, and there are no ums and uhs. It sounds polished and performed.

Prompt with read speech when you want narration, voiceover, or anything that should feel composed and authoritative.

#### Spontaneous speech

Spontaneous speech is what you hear in podcasts, interviews, and ordinary conversation. Pacing is uneven, intonation is more dynamic, and the signs of thinking-on-the-fly show up: ums, restarts, breaths, and hesitations.

Prompt with spontaneous speech when you want the output to feel conversational: a voice assistant, a character, a friend talking.

If your voice prompt has disfluencies like "um" and "uhh" in it, the model is much more likely to automatically add disfluencies even if the text prompt does not explicitly include them.

In a recording session and want spontaneous speech? Don't give a script. Have a conversation around a topic instead.

### Use voice prompts from your use case

Use the voice prompt to show the model speech that feels right for your use case. The model uses the text prompt as context to figure out the prosody you want, but the easier you make it for the model to know exactly what you want, the better results you'll get.

Some examples:

* If you're building a customer support agent, use speech from a real support interaction.
* If you're creating voiceovers for ads, use speech that feels like a voiceover from an ad.
* If you're creating a dramatic narrator, use speech from a dramatic narration.

Close your eyes, imagine your use case, and listen to your voice prompt. If you immediately feel the association, you probably have a good prompt.

---

## Next steps

Shape pronunciation, pacing, and emphasis with your input text

Turn your voice prompt into a saved voice, ready for low latency generation at scale

---

# Text prompting
URL: https://docs.lmnt.com/prompt-engineering/text-prompting

Comprehensive guide to text prompt engineering for LMNT's latest models

---

In general, you can put any text into LMNT's speech models and get something usable out. But if you spend a little bit of time improving your text prompt to fit your use case, your generated speech comes out next level.

This living reference covers the elements you should think about and control for when crafting your text prompts. Find some examples of speech online that give the feeling you're looking for, and use them as a reference as you walk through this guide.

We've bundled up the core guidance in this prompting guide into a starter prompt you can tune as you go.

---

## Text prompting basics

### Punctuation

Use punctuation where you want to explicitly direct pacing. Imagine you're writing a script you want the model to perform.

### Paragraph breaks

Use paragraph breaks to indicate larger, paragraph-level pauses. The model will pause appropriately, usually a little bit longer than it would between sentences.

---

## Prompting for speaking style

### Spontaneous vs read speech

These days when people say generated speech feels robotic, they're not usually talking about the acoustic quality or even the vocal quality. They're usually saying that the generated speech feels out of context. The biggest wrong-context feeling comes from mixing up spontaneous speech vs read speech.

#### Read speech

Read speech is what you hear in audiobooks, scripted ads, and news broadcasts: someone reading words off a page.
They can look ahead and know where they're going, so the pacing is even, the intonation is predictable, and there are no ums and uhs. It sounds polished and performed. #### Spontaneous speech Spontaneous speech is what you hear in podcasts, interviews, and ordinary conversation. Pacing is uneven, intonation more dynamic, and the signs of thinking-on-the-fly show up: ums, restarts, breaths, and hesitations. ### Prompting for spontaneous speech #### Use contractions and casual language Use words like `don't` instead of `do not`, or `I'll` instead of `I will`. #### Use filler words When people need time to think of the next thing to say, they give themselves more time by adding filler words. Add filler words like `um`, `uh`, `well`, `you know`, `I mean`, etc to your prompts. #### Signal pauses to think and hesitations Use `...`, `,` and other punctuation to interrupt the flow. #### Use natural transitions When something requires a mental context switch, add filler sentences in addition to filler words. For example, `So, um... the thing about...` or `Well, actually, that's a great question.` #### Keep text short & light A conversation is a back and forth. People generally don't monologue at each other. --- ## Written form vs spoken form escape hatch Language is written differently than it's spoken. For example, `$1` is actually spoken as `one dollar`. In general, the model does a pretty good job at translating written form to spoken form. But if you're running into trouble, try converting your text into a more explicit spoken form. 
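One way to do that conversion is a small preprocessing pass over your text before it reaches the model. The sketch below handles only email addresses; `emails_to_spoken` and `EMAIL_RE` are hypothetical names rather than anything from the LMNT SDK, and the pattern is deliberately simple, so treat it as a starting point and not a complete address parser.

```python
import re

# Simple pattern for illustration; it does not cover every valid address.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')


def _spell_email(match: re.Match) -> str:
    """Rewrite one matched address as 'local at domain dot tld'."""
    local, domain = match.group(0).split('@', 1)
    return f"{local.replace('.', ' dot ')} at {domain.replace('.', ' dot ')}"


def emails_to_spoken(text: str) -> str:
    """Replace every email address in text with an explicit spoken form."""
    return EMAIL_RE.sub(_spell_email, text)


print(emails_to_spoken('Reach me at alice@example.com.'))
# Reach me at alice at example dot com.
```

The same pattern of "match the written form, substitute the spoken form" extends to other cases you find the model stumbling on.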
### Phone numbers `1-800-555-1234` is spoken more like `one eight hundred; five five five; one two three four` ### Email addresses `alice@example.com` is spoken more like `alice at example dot com` --- ## Next steps Pair your text prompt with a voice prompt that matches your use case Prompt your LLM so its output reads naturally when spoken aloud --- # Getting LLMs to sound human URL: https://docs.lmnt.com/prompt-engineering/llm-prompting Transform robotic LLM responses into natural, engaging speech --- When you connect an LLM to LMNT, the quality of the spoken output depends heavily on how you prompt the LLM. Humans don't talk in walls of text, but LLMs like to produce formal, structured walls of text that sound robotic when spoken aloud. ## When prompting your LLM

Tell the LLM that its responses will be spoken aloud. This has the biggest impact on naturalness.

LLMs tend to avoid contractions and hesitations by default. Explicitly instruct them to use these patterns.

Guide the LLM on when to use filler words like "um" and "well" to sound more natural without overusing them.

Add explicit instructions for how to handle other difficult-to-pronounce text like phone numbers, if your use case needs it.
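If you'd rather not rely on the LLM alone, the same formatting can be applied in code before the text reaches LMNT. This is a minimal sketch that follows the spoken format used in this guide (digits spelled individually, semicolons between groups, `1-800` read as "one eight hundred"). `phone_to_spoken` is a hypothetical helper, not part of the LMNT SDK, and it assumes the number is written with separators between its groups.

```python
import re

DIGIT_WORDS = ['zero', 'one', 'two', 'three', 'four',
               'five', 'six', 'seven', 'eight', 'nine']


def _spell_digits(group: str) -> str:
    """Spell each digit of a group individually: '555' -> 'five five five'."""
    return ' '.join(DIGIT_WORDS[int(d)] for d in group)


def phone_to_spoken(number: str) -> str:
    """Convert a phone number with separators into a TTS-friendly spoken form."""
    # Split on anything that is not a digit: parentheses, hyphens, dots, spaces.
    groups = re.findall(r'\d+', number)
    spoken = []
    # Special case: a toll-free 1-800 prefix reads as "one eight hundred".
    if groups[:2] == ['1', '800']:
        spoken.append('one eight hundred')
        groups = groups[2:]
    spoken.extend(_spell_digits(g) for g in groups)
    # Semicolons mark the pause points between groups.
    return '; '.join(spoken)


print(phone_to_spoken('(555) 123-4567'))
# five five five; one two three; four five six seven
print(phone_to_spoken('1-800-555-1234'))
# one eight hundred; five five five; one two three four
```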

## Sample prompt template Here's a prompt template that you can copy and customize for your use case: ```text wrap Pretend you are a {{insert role}} doing {{insert task}} [SPEAKING STYLE] Your responses will be spoken aloud by a TTS system. Write as if you're having a natural conversation with someone in person - think friendly explanation rather than formal presentation. [NATURAL SPEECH PATTERNS] Use contractions and casual language ("I'll" not "I will") Include natural fillers and hesitations when appropriate: "um," "uh," "well," "so," "let me think," "you know," "I mean" Use thoughtful pauses (...) when you'd naturally pause Use natural transitions between ideas [WHEN TO USE FILLERS] When introducing a complex topic: "So, um... the thing about..." When you need a moment to think: "Let me see... I'd say..." When clarifying or correcting: "Well, actually, what I mean is..." When transitioning topics: "Now, um... moving on to..." [AVOID] Overusing any single filler Formal written language ("furthermore," "in conclusion") Perfect, polished sentences that sound robotic [INSTRUCTIONS] {{insert detailed instructions}} [FINAL CHECK] Before responding, read your answer aloud in your head - does it sound like natural human speech? ``` ## Snippets for difficult-to-pronounce text Some text is difficult to pronounce as-is, like phone numbers. To help the LLM handle these cases, paste these snippets into your prompt as needed. 
### Phone numbers ```text wrap [PHONE NUMBER FORMATTING] When mentioning phone numbers, you MUST format them for optimal TTS pronunciation: - Convert standard phone numbers by spelling out digits individually - REMOVE all original parentheses, hyphens, periods, and spaces used for grouping - Insert semicolons (;) to mark natural pause points between logical groups of numbers (e.g., area code; prefix; line number) - SPECIAL CASE: If the number starts with 1-800, write it as "one eight hundred" - Example: "(555) 123-4567" -> "five five five; one two three; four five six seven" - Example: "1-800-555-1234" -> "one eight hundred; five five five; one two three four" ``` ## Before and after example **Without prompting** > "I apologize for the inconvenience you are experiencing with your account. > Please navigate to the account settings page and verify that your payment > information is current and accurate." **With conversational prompting** > "Oh, that's definitely frustrating - I totally get why you'd be concerned > about this. Let me help you sort this out. So, first thing we should check > is... let's take a look at your payment info in settings. Sometimes it's just > a card that needs updating, you know?" ## Common issues and solutions Adjust the role and reduce fillers while keeping contractions and natural transitions. Add more context about the specific conversation scenario and emphasize reading responses aloud. --- ## Next steps Learn more about the details of text prompting in general. Craft a voice prompt that matches the tone, environment, and prosody of your use case ### Integrations #### LiveKit --- # LiveKit + LMNT URL: https://docs.lmnt.com/integrations/livekit/introduction Build production-grade multimodal voice AI agents with LiveKit, a realtime framework for voice, video, and data streaming applications. --- # What is LiveKit? 
[LiveKit](https://docs.livekit.io/agents/) is a realtime framework for building production-grade multimodal and voice AI agents. It enables Python or Node.js programs to participate as full participants in LiveKit rooms, processing realtime audio, video, and data streams with WebRTC reliability. LiveKit has a built-in [LMNT integration](https://docs.livekit.io/agents/integrations/tts/lmnt/) that makes it easy to create high-performance voice AI agents with LMNT voices. The integration is optimized for realtime voice applications and supports voice clones, multilingual speech, and low-latency streaming. Try our interactive demo to experience a voice agent powered by LiveKit and LMNT. ## Key Features * **Production-Grade Performance**: LiveKit is designed for production deployments with WebRTC for reliable, low-latency communication. The framework handles network resilience, scalability, and real-time audio processing automatically. * **Multimodal AI Agents**: Build agents that can process voice, video, and text simultaneously. LiveKit supports comprehensive STT-LLM-TTS pipelines with state-of-the-art turn detection for natural conversations. * **Tool Use & Multi-Agent Handoff**: Compatible with any Large Language Model (LLM) for tool use and supports complex multi-agent workflows where different agents can handle specialized tasks. * **Open Source**: Fully open source under Apache 2.0 license, providing transparency and flexibility for customization. ## Next Steps Build your first LiveKit agent with LMNT speech generation in just a few minutes. --- # Quickstart URL: https://docs.lmnt.com/integrations/livekit/quickstart Build and deploy your first LiveKit agent using LMNT for speech generation --- In this quickstart, we'll create a voice AI agent using LiveKit that can have real-time conversations with users. This example demonstrates how to integrate LMNT into LiveKit's multimodal agent framework. 
## Set up your project ### Create a project directory ```bash mkdir livekit-lmnt-agent && cd livekit-lmnt-agent ``` ### Set up a virtual environment ```bash python -m venv venv source venv/bin/activate ``` ```bash python -m venv venv source venv/Scripts/activate # If using Git Bash # OR .\venv\Scripts\activate # If using Command Prompt # OR .\venv\Scripts\Activate.ps1 # If using PowerShell ``` ### Install dependencies ```bash pip install livekit-agents[lmnt,deepgram,openai,silero,turn-detector] python-dotenv ``` ## Configure the environment Create a file named `.env` in your project directory and add: ```env LMNT_API_KEY=your_lmnt_api_key LIVEKIT_URL=wss://your-livekit-server.com LIVEKIT_API_KEY=your_livekit_api_key LIVEKIT_API_SECRET=your_livekit_api_secret DEEPGRAM_API_KEY=your_deepgram_api_key OPENAI_API_KEY=your_openai_api_key ``` Replace the placeholder values with your actual API keys: - Get your LMNT API key from the [LMNT playground](https://app.lmnt.com/account) - Set up a LiveKit server or use [LiveKit Cloud](https://cloud.livekit.io) - Get your Deepgram API key from [Deepgram Console](https://console.deepgram.com/) - Get your OpenAI API key from [OpenAI Platform](https://platform.openai.com/api-keys) ## Create the agent Create a file named `agent.py`: ```python from dotenv import load_dotenv from livekit import agents from livekit.agents import AgentSession, Agent from livekit.plugins import ( openai, lmnt, deepgram, silero, ) from livekit.plugins.turn_detector.multilingual import MultilingualModel load_dotenv() class VoiceAssistant(Agent): def __init__(self) -> None: super().__init__( instructions=( "You are a helpful voice assistant. " "Keep your responses concise and conversational. " "Avoid using punctuation that doesn't translate well to speech." 
) ) async def entrypoint(ctx: agents.JobContext): session = AgentSession( stt=deepgram.STT(model="nova-2", language="en-US"), llm=openai.LLM(model="gpt-4o-mini"), tts=lmnt.TTS( voice="leah", # Voice ID from LMNT library ), vad=silero.VAD.load(), # Voice activity detection turn_detection=MultilingualModel(), # Contextual turn detection preemptive_generation=True, # Preemptive generation for faster response times ) await session.start( room=ctx.room, agent=VoiceAssistant(), ) await session.generate_reply( instructions="Greet the user and ask how you can help them today." ) if __name__ == "__main__": agents.cli.run_app(agents.WorkerOptions(entrypoint_fnc=entrypoint)) ``` ## Run the agent Start your agent: ```bash python agent.py dev ``` The agent will connect to your LiveKit server and wait for participants to join rooms. When someone joins a room, the agent will automatically start a conversation. ## Understanding the code Let's examine the key components: ### Agent Class Definition ```python class VoiceAssistant(Agent): def __init__(self) -> None: super().__init__( instructions=( "You are a helpful voice assistant. " "Keep your responses concise and conversational. " "Avoid using punctuation that doesn't translate well to speech." ) ) ``` The agent class defines the personality and behavior of your voice assistant. 
### LMNT TTS Configuration ```python tts=lmnt.TTS( model="blizzard", # High-quality TTS model voice="leah", # Voice ID from LMNT library language="en", # ISO 639-1 language code temperature=0.7, # Speech expressiveness (0.3-1.0) top_p=0.9, # Speech generation stability ) ``` The LMNT TTS service supports these parameters: - `model`: TTS model (default: "blizzard") - `voice`: Voice ID from [LMNT's voice library](https://app.lmnt.com/voice-library) - `language`: Two-letter ISO 639-1 language code - `temperature`: Controls expressiveness - lower values (0.3) for neutral speech, higher (1.0) for dynamic range - `top_p`: Controls stability - lower values for consistency, higher for flexibility ### Agent Session Pipeline ```python session = AgentSession( stt=deepgram.STT(model="nova-2"), # Speech-to-text llm=openai.LLM(model="gpt-4o-mini"), # Language model tts=lmnt.TTS(...), # Text-to-speech with LMNT vad=silero.VAD.load(), # Voice activity detection turn_detection=MultilingualModel(), # Contextual turn detection preemptive_generation=True, # Preemptive generation for faster response times ) ``` This creates a complete STT-LLM-TTS pipeline with: - Speech recognition with Deepgram Nova-2 model - Language generation with OpenAI GPT-4o-mini - Speech generation with LMNT - Voice activity detection for turn-taking ## Customize your agent Try these modifications to enhance your agent: Update the voice ID to use a different LMNT voice: ```python tts=lmnt.TTS( model="blizzard", voice="morgan", # British female voice language="en", temperature=0.7, top_p=0.9, ) ``` Find more voices at [LMNT's voice library](https://app.lmnt.com/voice-library). Configure the TTS for different languages: ```python tts=lmnt.TTS( model="blizzard", voice="your_voice_id", language="es", # Spanish temperature=0.7, top_p=0.9, ) ``` Update your agent instructions to match the target language. 
Adjust expressiveness and stability: ```python tts=lmnt.TTS( model="blizzard", voice="leah", language="en", temperature=0.4, # More neutral speech top_p=0.7, # More consistent delivery ) ``` - `temperature`: 0.3 (neutral) to 1.0 (expressive) - `top_p`: Lower values for consistency, higher for flexibility Modify the agent instructions to change behavior: ```python class CustomerServiceAgent(Agent): def __init__(self) -> None: super().__init__( instructions=( "You are a helpful customer service agent for a tech company. " "Be friendly, professional, and concise. " "Always ask how you can help and provide clear solutions." ) ) ``` ## Testing your agent To test your agent: 1. Make sure your LiveKit server is running 2. Clone LiveKit's [frontend example](https://github.com/livekit-examples/agent-starter-react) and run it with your livekit room credentials 3. Join a room - your agent will automatically connect and start the conversation 4. Speak naturally and experience real-time voice interactions ## Next steps Learn more about LiveKit Agents framework and advanced features Browse and test different voices for your agent #### Pipecat --- # Pipecat + LMNT URL: https://docs.lmnt.com/integrations/pipecat/introduction Create an end-to-end conversational voice agent with Pipecat, an open-source Python framework built for real-time voice interactions. --- # What is Pipecat? [Pipecat](https://docs.pipecat.ai/getting-started/overview) is a framework designed for building real-time multimodal AI agents. It provides a flexible pipeline architecture that makes it easy to integrate various components like speech recognition, language models, and text-to-speech (TTS) systems. Pipecat has a built-in [LMNT integration](https://docs.pipecat.ai/server/services/tts/lmnt), making it easy to create an end-to-end conversational pipeline around LMNT. In just a few minutes, you can set up an agent using one of your unique voice clones. 
Try our interactive demo to experience a voice agent powered by Pipecat and LMNT.

## Key Features for TTS Integration

* **Real-time Voice Processing**: Pipecat is optimized for low-latency voice interactions, making it ideal for natural-sounding TTS applications. Pipecat processes responses as they stream in and supports interruption, creating fluid, natural interactions without noticeable delays.
* **Modular Pipeline Architecture**: Easily integrate LMNT TTS with a variety of other services. Pipecat makes it easy to mix-and-match with a number of integrated LLMs and STT providers, so you can build exactly what you need.
* **Cloud Deployment**: Deploy your TTS-enabled agents to [Pipecat Cloud](https://docs.pipecat.daily.co/introduction) for scalable, production-ready voice applications. With this cost-effective managed solution, you can deploy in a matter of minutes.
* **WebRTC Support**: Built-in support for real-time audio streaming via WebRTC.
* **Multimodal Agents**: Support for combining voice, video, and other modalities in a single agent, enabling rich interactive experiences.

## Common Use Cases

* **Voice Assistants**: Create conversational AI assistants with natural-sounding voices, including clones
* **Interactive Voice Response (IVR)**: Build automated phone systems with dynamic voice responses
* **Voice-Enabled Applications**: Add voice capabilities to web and mobile applications
* **Multilingual Voice Agents**: Support multiple languages and accents

---

# Installation & setup
URL: https://docs.lmnt.com/integrations/pipecat/installation

Get Pipecat and its required services installed on your machine

---

## Prerequisites

Pipecat requires Python 3.10 or higher. To check your Python version:

```bash
python --version
```

You must also have an LMNT API key. Follow the steps [here](https://app.lmnt.com/settings/api).

**Text-to-Speech**: Converts text to natural-sounding speech.

## Accounts you'll need

These are the services you need for the quickstart.
You can later swap these out as needed. **Speech-to-Text**: Converts audio to text in realtime. **LLM Inference**: Generates streaming text responses based on user input. Explore Pipecat's full list of [supported services](https://docs.pipecat.ai/server/services/supported-services) for more integration options. ## Setting up your project ### Create a virtual environment We recommend using a virtual environment to manage your dependencies: ```bash mkdir pipecat-project cd pipecat-project python3 -m venv env ``` Activate the virtual environment based on your operating system: ```bash source env/bin/activate ``` ```bash source env/Scripts/activate # If using Git Bash # OR .\env\Scripts\activate # If using Command Prompt # OR .\env\Scripts\Activate.ps1 # If using PowerShell ``` ### Install Pipecat The `pipecat-ai` Python module uses optional dependencies to keep your installation lightweight. This approach lets you include only the specific AI libraries you need for your project. To install Pipecat with support for the recommended services above, use this command: ```bash pip install "pipecat-ai[webrtc,deepgram,openai,lmnt]" ``` You can add this to your `requirements.txt` file or include any combination of [supported integrations](https://docs.pipecat.ai/server/services/supported-services) based on your needs. ## Next Steps Now that you have everything set up, proceed to the Quickstart guide to build your first Pipecat application with LMNT. --- # Quickstart URL: https://docs.lmnt.com/integrations/pipecat/quickstart Build and run your first Pipecat application using LMNT --- In this quickstart, we'll create a simple conversational bot that greets users when they join and exits when they leave. This example demonstrates the core components of a Pipecat application with a streamlined setup. 
## Set up your project

### Create a project directory

```bash
mkdir pipecat-quickstart && cd pipecat-quickstart
```

### Set up a virtual environment

macOS/Linux:

```bash
python3 -m venv env
source env/bin/activate
```

Windows:

```bash
python -m venv env
source env/Scripts/activate # If using Git Bash
# OR
.\env\Scripts\activate # If using Command Prompt
# OR
.\env\Scripts\Activate.ps1 # If using PowerShell
```

### Download the example files

macOS/Linux:

```bash
curl -O https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/foundational/07k-interruptible-lmnt.py
curl -O https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/foundational/run.py
```

Windows (PowerShell):

```powershell
curl.exe -O https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/foundational/07k-interruptible-lmnt.py
curl.exe -O https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/foundational/run.py
```

Alternatively, download these files manually and save them to your project directory:

* [07k-interruptible-lmnt.py](https://raw.githubusercontent.com/pipecat-ai/pipecat/main/examples/foundational/07k-interruptible-lmnt.py)
* [run.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/foundational/run.py)

### Install dependencies

```bash
pip install fastapi uvicorn python-dotenv "pipecat-ai[webrtc,deepgram,openai,lmnt]" pipecat-ai-small-webrtc-prebuilt
```

## Configure the environment

Create a `.env` file with your API keys. The first command creates the file and the following commands append to it:

```bash
echo "LMNT_API_KEY=your_lmnt_api_key" > .env
echo "DEEPGRAM_API_KEY=your_deepgram_api_key" >> .env
echo "OPENAI_API_KEY=your_openai_api_key" >> .env
```

Alternatively, create a file named `.env` in your project directory by hand and add:

```env
LMNT_API_KEY=your_lmnt_api_key
DEEPGRAM_API_KEY=your_deepgram_api_key
OPENAI_API_KEY=your_openai_api_key
```

Replace `your_{service}_api_key` with the actual API keys you created during the [installation
step](/integrations/pipecat/installation).

## Run the example

Start the bot with this command:

```bash
python 07k-interruptible-lmnt.py
```

You'll see a URL (typically [http://localhost:7860](http://localhost:7860)) in the console output. Open this URL in your browser to join the session. Try having a conversation with the bot!

## Understanding the code

Let's examine the key LMNT component of `07k-interruptible-lmnt.py`:

```python
# Initialize LMNT's text-to-speech service
# Using a pre-selected British female voice
# You can find other voices at https://app.lmnt.com/voice-library
tts = LMNTTTSService(
    api_key=os.getenv("LMNT_API_KEY"),
    voice_id="morgan",  # British Lady
)
```

## Customize the example

Try these simple modifications to enhance your bot:

Visit [LMNT's voice library](https://app.lmnt.com/voice-library) to find a different voice. Then update the `voice_id` parameter:

```python
tts = LMNTTTSService(
    api_key=os.getenv("LMNT_API_KEY"),
    voice_id="your_new_voice_id",  # Replace with a new voice ID
)
```

Change the language the bot generates speech in. Make sure the LLM you use can produce text in that language.

```python
tts = LMNTTTSService(
    api_key=os.getenv("LMNT_API_KEY"),
    voice_id="your_new_voice_id",  # Replace with a new voice ID
    language="zh",  # Use Chinese
)
```

## Next steps

Now that you have seen how to get a simple bot running, proceed to the Pipecat Cloud quickstart to see an example deployment.

---

# Pipecat Cloud + LMNT Quickstart

URL: https://docs.lmnt.com/integrations/pipecat/cloud-quickstart

Deploy your first Pipecat Cloud agent with LMNT TTS

---

## Prerequisites

### System requirements

* Python 3.10+
* Linux, macOS, or Windows Subsystem for Linux (WSL)
* [Docker](https://www.docker.com) installed on your system
* Access to a container registry (e.g., Docker Hub, GitHub Container Registry, etc.)

### Install Docker

Docker Desktop is recommended for development environments as it provides a comprehensive GUI and tools.
On macOS, download and follow the [official installation guide](https://docs.docker.com/desktop/install/mac-install). Using Homebrew (`brew install docker`) only installs the Docker CLI, not the Docker Engine needed to build and run containers.

On Windows:

1. For Windows 10/11 Pro, Enterprise, or Education:
   * Download [Docker Desktop for Windows](https://docs.docker.com/desktop/install/windows-install/)
2. For Windows Home or older versions:
   * First install [WSL 2 (Windows Subsystem for Linux)](https://learn.microsoft.com/en-us/windows/wsl/install)
   * Then install [Docker Desktop for Windows](https://docs.docker.com/desktop/install/windows-install/)

On Linux, Docker provides official installation guides for various distributions:

* [Ubuntu](https://docs.docker.com/engine/install/ubuntu/)
* [Debian](https://docs.docker.com/engine/install/debian/)
* [RHEL](https://docs.docker.com/engine/install/rhel/)
* [Fedora](https://docs.docker.com/engine/install/fedora/)
* [CentOS](https://docs.docker.com/engine/install/centos/)
* [SLES](https://docs.docker.com/engine/install/sles/)

After installation, you may want to [configure Docker to run without sudo](https://docs.docker.com/engine/install/linux-postinstall/#manage-docker-as-a-non-root-user).

### Set up a container registry

If you don't already have a Docker Hub account, [create one for free](https://hub.docker.com/signup). While other container registries will work, this guide uses Docker Hub in its examples.

After creating your account, log in via your terminal:

```bash
docker login
```

Learn about different Docker login options in the [official documentation](https://docs.docker.com/reference/cli/docker/login/).

With Docker Hub, you don't need to create a repository in advance. When you push your first image later in this guide, a repository is created automatically with the name you specify.

Take note of your Docker Hub username. You'll need it when pushing your agent image in later steps.
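
Your Docker Hub username becomes the first component of the image reference used when tagging and pushing later in this guide, in the form `[your-username]/[image-name]:[version-number]`. As a small sketch, the hypothetical helper below (not part of Docker or Pipecat Cloud) assembles that string and rejects empty or whitespace-containing parts:

```python
def image_reference(username: str, image_name: str, version: str) -> str:
    """Build a Docker Hub image reference: username/image-name:version.

    Hypothetical helper for illustration only.
    """
    parts = {"username": username, "image_name": image_name, "version": version}
    for label, value in parts.items():
        # Reject empty values and any whitespace in a component.
        if not value or any(ch.isspace() for ch in value):
            raise ValueError(f"invalid {label}: {value!r}")
    return f"{username}/{image_name}:{version}"


print(image_reference("your-username", "lmnt-agent", "0.1"))
# prints: your-username/lmnt-agent:0.1
```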
### Create service accounts

The Pipecat Cloud starter agent uses OpenAI for LLM inference and LMNT for text-to-speech. You will need API keys for both services.

* [OpenAI API key](https://platform.openai.com/settings/organization/api-keys)
* [LMNT API key](https://app.lmnt.com/account)

## Create your starter project

A bare-bones voice AI agent template is available to help you get started. You'll need to set up your Python environment and authenticate with Pipecat Cloud before initializing the starter project.

First, create a new directory for your project:

```bash
mkdir pipecat-cloud-starter && cd pipecat-cloud-starter
```

### Configure your Python environment

We recommend using a virtual environment to isolate your project dependencies. Either of the following options works.

[uv](https://github.com/astral-sh/uv) is a fast, modern Python package installer and environment manager.

```bash
# Install uv
pip install uv

# Create a virtual environment
uv venv

# Activate the environment
source .venv/bin/activate # On Unix/macOS
.venv\Scripts\activate # On Windows
```

Can't `pip install uv`? Follow the [uv installation instructions](https://docs.astral.sh/uv/getting-started/installation/) instead.

Alternatively, use Python's built-in `venv` module:

```bash
# Create a virtual environment
python -m venv .venv

# Activate the environment
source .venv/bin/activate # On Unix/macOS
.venv\Scripts\activate # On Windows
```

### Set up Pipecat Cloud

Create an account at [pipecat.daily.co](https://pipecat.daily.co).

You must have valid billing information associated with your account in order to deploy agents.

With your environment activated, install the Pipecat Cloud CLI:

```bash
pip install pipecatcloud
```

The Pipecat Cloud CLI can be used with either `pipecatcloud` or `pcc` as the command prefix.

Run `pcc auth login` to authenticate via your browser. If this doesn't work, try `python -m pipecatcloud auth login`.

### Initialize the starter project

Now, clone the starter project.
```bash
git clone https://github.com/lmnt-com/pipecat-cloud-starter.git
```

This command downloads a template that includes everything you need to build and deploy your first voice AI agent.

**Project structure**

The starter project includes these key files:

* `bot.py`: Python entry-point containing your Pipecat agent pipeline
* `Dockerfile`: Dockerfile for building the agent container
* `requirements.txt`: Python dependencies used by your agent code
* `pcc-deploy.toml`: Pipecat Cloud deployment configuration file (optional)

### Configure to run locally (optional)

You can test your agent locally before deploying to Pipecat Cloud:

```bash
# Set environment variables with your API keys
export LMNT_API_KEY="your_lmnt_key"
export DAILY_API_KEY="your_daily_key"
export OPENAI_API_KEY="your_openai_key"
```

Your `DAILY_API_KEY` can be found at [https://pipecat.daily.co](https://pipecat.daily.co) under `Settings`, in the `Daily (WebRTC)` tab.

First install requirements:

```bash
pip install -r requirements.txt
```

Then, launch the `bot.py` script locally:

```bash
LOCAL_RUN=1 python bot.py
```

## Deploy the agent

Pipecat Cloud expects a built Docker image that includes the agent code and all dependencies.

### Build and push your Docker image

**Build your Docker image**

From within your project directory, run Docker build:

```shell
docker build --platform=linux/arm64 -t lmnt-agent:latest .
```

This command builds a Docker image named `lmnt-agent` with the tag `latest` from the current directory. The `--platform=linux/arm64` flag is required as Pipecat Cloud runs on ARM64 architecture.

**Tag your Docker image**

Using the Docker image you just built, tag it with your Docker Hub username and a version number. The tag should be `[your-username]/[image-name]:[version-number]`.
```shell
docker tag lmnt-agent:latest your-username/lmnt-agent:0.1
```

**Push your Docker image**

After tagging your image, you can push it to your Docker Hub registry:

```shell
docker push your-username/lmnt-agent:0.1
```

While in beta, Pipecat Cloud requires that your agent image is pushed to your own repository, such as Docker Hub. Both public and private repositories are supported.

### Add secrets

[Secrets](https://docs.pipecat.daily.co/agents/secrets) are a secure way to manage sensitive information such as API keys, passwords, and other credentials.

The starter project requires the following API keys:

* [OpenAI](https://www.openai.com) API key
* [LMNT](https://app.lmnt.com/account) API key
* Daily API key (automatically provided through your Pipecat Cloud account)

Pipecat Cloud organizes your secrets into secret sets. This allows you to re-use the same set of secrets across multiple agents within your organization.

**Creating a secret set from a file (recommended)**

The starter project includes an `env.example` file that you can use as a template. Create a copy of this file and add your actual API keys:

```bash
# Copy the example file
cp env.example .env

# Edit the file with your API keys
# LMNT_API_KEY=your_lmnt_key
# OPENAI_API_KEY=your_openai_key
```

Then, create a secret set from this file:

```bash
pcc secrets set lmnt-agent-secrets --file .env
```

If you prefer, you can also create secrets directly via the command line:

```shell
pcc secrets set lmnt-agent-secrets \
  LMNT_API_KEY=your_lmnt_key \
  OPENAI_API_KEY=your_openai_key
```

For more information on managing secrets, please see [Secrets](https://docs.pipecat.daily.co/agents/secrets).

### Create a deployment

The CLI `deploy` command requires three key pieces of information:

1. The name of your agent on Pipecat Cloud
2. The repository and tag of your Docker image
3. The secret set to use for environment variables

Let's deploy the agent using your pushed image:

```shell
pcc deploy lmnt-agent your-username/lmnt-agent:0.1 --secrets lmnt-agent-secrets
```

When you run this command, you'll be asked to confirm your deployment configuration before proceeding.

The starter project includes a `pcc-deploy.toml` file that already has the agent name, image reference, and secret set configured. If you're using this file, you can simply run `pcc deploy` without additional arguments. See the [Deployments](https://docs.pipecat.daily.co/agents/deploy#using-pcc-deploy-toml) section to learn more.

If your repository is private, you can give the deploy command access to it by specifying an image pull secret via the `--credentials` flag:

```shell
# Create an image pull secret (you'll be prompted for credentials)
pcc secrets image-pull-secret pull-secret https://index.docker.io/v1/

# Then use it in your deployment
pcc deploy lmnt-agent your-username/lmnt-agent:0.1 --credentials pull-secret
```

Learn more about [Image Pull Secrets](https://docs.pipecat.daily.co/agents/secrets#image-pull-secrets) including how to create them for Docker Hub.

For more deployment configuration options, see the [deploy reference docs](https://docs.pipecat.daily.co/cli-reference/deploy).

### Check the status of your deployment

Assuming the deployment was successful, you can check the status of your agent using the CLI:

```shell
# Deployment status
pcc agent status lmnt-agent

# List deployment history
pcc agent deployments lmnt-agent
```

## Scale the deployment

Right now, your deployment uses the default runtime configuration. This means that your agent defaults to "scale-to-zero", with no minimum agent instances to serve on-demand session requests.

If you were to attempt to connect with your agent now, it's likely you'd encounter a cold start while the agent spins up. Cold starts typically take around 10 seconds.
To avoid this, you can scale your deployment to a minimum of one instance:

```shell
pcc deploy lmnt-agent your-username/lmnt-agent:0.1 --min-agents 1
```

This will provide you with one warm instance ready to serve any active sessions.

By default, idle agent instances are maintained for 5 minutes before being terminated when using scale-to-zero. For more information, please see [Scaling](https://docs.pipecat.daily.co/agents/scaling).

Setting `--min-agents` to 1 or greater will incur charges even when the agent is not in use.

## Start an active session

Now that your agent has been deployed, you can start an active session to interact with it.

Creating active sessions requires passing a valid API key for your namespace or organization. This is used to authenticate the request and prevent unauthorized access.

### Create a public access key

```shell
pcc organizations keys create
```

Running this command will ask if you'd like to set this key as your default. Doing so will associate the created key with your current namespace, allowing you to omit the `--api-key` flag in future requests.

Next, set the key to be used with your agent:

```shell
pcc organizations keys use
```

When prompted, select your agent from the list.

### Talk to your agent

The starter project is configured to use [Daily](https://www.daily.co) as a WebRTC transport. Pipecat Cloud has a direct integration with Daily, meaning you are issued a Daily API key when you create an account.

```shell
pcc agent start lmnt-agent --use-daily
```

This command will start a new active session with your agent. Since you are using Daily, you will be given a URL in the terminal to open in your browser. This will open a new tab where you can interact with your agent.

When using the Daily key associated with your Pipecat Cloud account, your Daily voice minutes for one human and one bot are free. Additional charges apply for features like recording, transcription, and PSTN/SIP.
See [Daily's pricing page](https://www.daily.co/pricing) for more information.

In addition to the CLI approach shown above, you can start sessions programmatically:

#### Use the REST API

```bash
curl --location --request POST 'https://api.pipecat.daily.co/v1/public/lmnt-agent/start' \
  --header 'Authorization: Bearer YOUR_PUBLIC_API_KEY' \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "createDailyRoom": true,
    "body": {"custom": "data"}
  }'
```

#### Use the Python SDK

```python
import asyncio

from pipecatcloud.exception import AgentStartError
from pipecatcloud.session import Session, SessionParams

API_KEY = "YOUR_PUBLIC_API_KEY"  # Replace with your actual API key


async def main():
    try:
        # Create session object
        session = Session(
            agent_name="lmnt-agent",
            api_key=API_KEY,
            params=SessionParams(
                use_daily=True,  # Optional: Creates a Daily room
                daily_room_properties={"start_video_off": False},
                data={"key": "value"},
            ),
        )

        # Start the session
        response = await session.start()

        # Get Daily room URL
        daily_url = f"{response['dailyRoom']}?t={response['dailyToken']}"
        print(f"Join Daily room: {daily_url}")
    except AgentStartError as e:
        print(f"Error starting agent: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")


# Run the async function
if __name__ == "__main__":
    asyncio.run(main())
```

These approaches are particularly useful when integrating agent sessions into your own applications, such as web interfaces or backend services.

For more information on starting sessions, see [Active Sessions](https://docs.pipecat.daily.co/agents/active-sessions).

## Monitor and Troubleshoot

Once your agent is deployed, you can use the following commands to monitor its status and troubleshoot any issues:

```shell
# Check deployment status
pcc agent status lmnt-agent

# View deployment logs
pcc agent logs lmnt-agent
```

These commands provide visibility into your agent's operation and can help diagnose problems if your agent isn't functioning as expected.

## Next Steps

Congratulations!
You have successfully deployed your first agent with LMNT TTS to Pipecat Cloud. Read on for more details about scaling your deployment and other useful integrations.

#### Vapi

---

# Vapi + LMNT

URL: https://docs.lmnt.com/integrations/vapi/introduction

Create an end-to-end conversational voice agent with Vapi, a powerful platform for building and serving voice AI applications with real-time interactions.

---

# What is Vapi?

[Vapi](https://docs.vapi.ai/quickstart/dashboard) is a comprehensive platform for deploying heavily-customizable voice AI agents (called _assistants_) that support live, two-way conversations. Vapi provides a complete solution for creating voice assistants with control over speech-to-text (STT), language models (LLM), and text-to-speech (TTS) components. As a fully-managed system, Vapi even stores your call logs so you can track how well your agents are performing.

Vapi has a built-in [LMNT integration](https://docs.vapi.ai/providers/voice/imnt), allowing you to use your custom LMNT voices on their platform. Just provide Vapi with your LMNT API key, and your voices will be selectable when creating assistants.

## Key Features for TTS Integration

* **Real-time Conversations**: Support for live, two-way conversations with low latency and natural interaction. Vapi handles interruptions (letting users cut in), endpointing (quickly detecting when the user is done speaking), emotion detection (passing user emotion to the LLM), and other features to make conversations more natural.
* **Phone Integration**: Built-in support for both inbound and outbound phone calls. Vapi will even allot you a free phone number to use.
* **Web-based Testing**: Test your voice agents directly from the Vapi dashboard, and create test suites to ensure your agents work as expected.
* **Customizable Assistant Behavior**: Fine-tune your assistant's personality, responses, and conversation flow.
* **Monitoring**: Detailed logs with audio recordings and transcripts, as well as AI-based call success evaluations.

## Common Use Cases

* **Customer Support**: Create AI-powered customer service agents that can handle inquiries and provide support.
* **Sales Outbound**: Qualify leads using customizable scripts and schedule appointments with sales representatives based on prospect responses.
* **Service Updates/Notifications**: Deliver critical alerts about service changes or account status, with options to connect to live agents if needed.
* **Appointment Reminders**: Make personalized reminder calls with appointment details and process simple rescheduling requests via voice commands.
* **Business Voice Assistants**: Conduct routine client check-ins with a branded voice identity and gather preliminary information before human handoff.

## Getting Started

To get started with Vapi and LMNT:

1. Create a Vapi account at [dashboard.vapi.ai](https://dashboard.vapi.ai).
2. Add your LMNT API key in the [provider keys tab](https://dashboard.vapi.ai/keys). Your LMNT API key can be found in the [playground](https://app.lmnt.com/account).
3. Create a new assistant using the dashboard.
4. In the `Voice Configuration` section of the assistant's settings page, select `lmnt` as the provider and choose a voice.
5. Click `publish` to save changes.
6. Test your assistant through the dashboard or by phone.

## Next Steps

Now that you have deployed a basic assistant through the platform, see how you can create a Vapi assistant using LMNT in any web app.

---

# Web calling w/ Vapi + LMNT

URL: https://docs.lmnt.com/integrations/vapi/quickstart

Make a web call to your assistant from the browser

---

## Quickstart

Here is a v0 demo of how Vapi can be integrated into a web app to create a voice assistant with an LMNT voice. This example uses the Vapi web SDK to configure a new assistant and start a call with it.
To try the demo live and edit it, follow these steps:

1. Grab your **public** API key from the Vapi [dashboard](https://dashboard.vapi.ai/org/api-keys). Make sure your [LMNT API key](https://app.lmnt.com/account) is configured in the [Vapi dashboard](https://dashboard.vapi.ai/keys) as well!
2. Sign into v0 and select **Fork** to create your own copy of the app.
3. In `settings -> environment variables`, create a new environment variable called `NEXT_PUBLIC_VAPI_API_KEY` with your Vapi public API key as the value.

### Understanding the code

Let's examine the code in `page.tsx` that configures LMNT.

```typescript
// Line 228
const assistantOptions = {
  name: "Pizza Assistant",
  firstMessage: "Vappy's Pizzeria speaking, how can I help you?",
  transcriber: {
    provider: "deepgram",
    model: "nova-2",
    language: "en-US",
  },
  voice: {
    provider: "lmnt",
    voiceId: "amy",
  },
  ...
```

Calls to the Vapi assistant are made with `vapi.start(assistantOptions)`. When defining `assistantOptions`, LMNT is selected and configured through the `voice` property. Try changing the `voiceId` to start a call with a different voice! LMNT voice IDs can be found in the LMNT playground's [voice library](https://app.lmnt.com/voice-library).

## Next Steps

Read on for more information on Vapi's browser-based capabilities.

---

# Vercel guide

URL: https://docs.lmnt.com/integrations/vercel

Learn how to use LMNT in your Vercel apps.

---

[Vercel](https://vercel.com) is a cloud platform for static sites and serverless functions. It's an easy way to host your LMNT speech-enabled apps. This guide covers adding the integration to your Vercel account, retrieving your LMNT API key, and using LMNT **Speech** and **Voice** in your application.

# Adding the integration

1. Sign into your Vercel account and navigate to the [LMNT Vercel integration page](https://vercel.com/integrations/lmnt).
2. Click the `Add integration` button.
3. Select the Vercel project(s) you want to integrate with LMNT.
4. Sign into your LMNT account in the popup to complete the integration.

You're all set! You'll now find an `LMNT_API_KEY` environment variable in your Vercel project(s).

# Accessing your LMNT API key

In requests to our servers, you'll need to reference the LMNT API key that was added by the integration. Here are some examples of how to do so by runtime language:

```python Python
import os

api_key = os.environ.get('LMNT_API_KEY')
```

```typescript TypeScript
const apiKey = process.env.LMNT_API_KEY;
```

```go Go
apiKey := os.Getenv("LMNT_API_KEY")
```

```ruby Ruby
api_key = ENV['LMNT_API_KEY']
```

Learn more about environment variables in Vercel [here](https://vercel.com/docs/environment-variables).

# Using LMNT

With your key available as an environment variable, you can now either call the LMNT REST and WebSocket APIs directly or use our SDKs, referenced here:

* Explore our REST API reference.
* Try the WebSocket API to stream audio/text.

# Adding the integration to additional Vercel projects

First, give the integration access permissions to your project(s):

1. Sign into your Vercel account and navigate to the `Integrations` tab.
2. Select `Manage` on the LMNT integration.
3. In your LMNT integration page, select `Manage Access` on the right-hand side of the title row.
4. Select the projects that you would like to add the LMNT integration to (or select `All Projects`).

Second, configure the integration to add the `LMNT_API_KEY` environment variable to the permitted projects:

5. Back in the LMNT integration page, select `Configure` on the right-hand side of the title row.
6. Select the projects that you would like to set up with an LMNT API key environment variable.

Even if you initially gave the LMNT integration access to all your Vercel projects, a new project you create will not get the `LMNT_API_KEY` environment variable automatically. You will need to configure the integration to add the variable to new projects (steps 5 and 6 above).
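
Because the `LMNT_API_KEY` variable only exists in projects the integration has been configured for, it can be worth failing fast when it is missing. The sketch below assumes a Python runtime. `require_lmnt_api_key` is a hypothetical helper name and is not part of the LMNT SDK or Vercel.

```python
import os


def require_lmnt_api_key(env=None) -> str:
    """Return the LMNT API key, or raise a descriptive error if it is absent.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    # Treat a blank value the same as a missing one.
    api_key = env.get("LMNT_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError(
            "LMNT_API_KEY is not set. Configure the LMNT integration for this "
            "Vercel project to add it."
        )
    return api_key
```

The returned value can then be passed as `api_key` when constructing the `Lmnt` client from the Python SDK.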
### Release notes

---

# LMNT platform

URL: https://docs.lmnt.com/release-notes/overview

Record of changes across our models and API surfaces.

---

* We've shipped an updated version of Blizzard 2.0 to further improve speaker similarity & pronunciation edge cases.
* We've released API 1.1, which cleans up legacy options from our earliest models, overhauls the speech sessions protocol, adds a structured error envelope for all errors, and adds a `request-id` header on every response.
* We've improved speech sessions reset latency, which helps when handling user interrupts.
* We've shipped updated Python and TypeScript SDKs that support API 1.1.
* We've released Blizzard 2.0, which supports 7 new languages (Assamese, Bengali, Danish, Malayalam, Marathi, Tamil, and Telugu), improves latency, and improves pronunciation on edge cases.
* We've overhauled our docs with improved guides, clarity, and walkthroughs.
* We've improved speech generation time-to-first-byte globally.
* We've released Blizzard 1.4, which improves audio quality and fixes edge cases where certain voices could start unprompted screaming.
* We've reduced latency when requesting word timestamps.
* We've improved voice clone background noise handling.
* We've improved voice clone creation latency.
* We've released Blizzard 1.3, with new support for Czech, Finnish, and Slovak.
* We've released Blizzard 1.2, with new support for Arabic.
* We've released v2 of our Python SDK.
* We've added .webm & .opus output to speech generation and speech sessions.
* We've added support for Urdu.
* We've retired professional voice clones. Instant voice clone quality now meets our quality bar to replace professional clones.
* We've added .ogg format support for voice cloning.
* We've made Blizzard our default model.
* We've released Blizzard 1.1.
* We've improved word timestamp accuracy.
* We've added support for Indonesian, Dutch, Polish, Swedish, Thai, Ukrainian, and Vietnamese.
* We've released Blizzard 1.0 with support for English, Spanish, Portuguese, French, German, Chinese, Korean, Hindi, Japanese, Russian, Italian, and Turkish.
* We've released v2.1 of our TypeScript SDK, with speech session support.
* We've released v2 of our TypeScript SDK.
* We've released our upcoming model Blizzard for early English-only preview in the [Playground](https://app.lmnt.com/playground) and in the API.

## Models

### Models

---

# Models overview

URL: https://docs.lmnt.com/models/overview

LMNT creates state-of-the-art speech models. This guide introduces the available models and compares their performance.

---

## Choosing a model

LMNT's main line of models is the Blizzard family. The current version is **Blizzard 2.0**. Blizzard receives regular updates, and we regularly offer preview versions so that enterprise customers can test new features before they roll out more widely.

If you're ready to get started, [learn how to make your first API call](/quickstart).

### Feature support

| Feature | Blizzard 2.0 |
| --- | :---: |
| API ID | `blizzard` |
| Voice cloning | Yes |
| Languages | 31 |
| Accent control | Yes |
| Word timestamps | Yes |
| Streaming | Yes |
| Speech sessions | Yes |

---

## Get started with LMNT

If you're ready to explore generating speech with LMNT and integrating it into your applications, here are good places to start:

* Explore LMNT's features & development flow.
* Learn how to make your first API call.
* Generate speech and clone voices in your browser.

If you have any questions or need assistance, don't hesitate to reach out in the [Discord community](https://discord.gg/8ZE5ka4nHg).

## Client SDKs

### Client SDKs

---

# Client SDKs

URL: https://docs.lmnt.com/api/client-sdks

Official SDKs for using LMNT with Python and TypeScript.

---

LMNT provides official client SDKs to make it easier to work with the LMNT API. Each SDK provides idiomatic interfaces, type safety, and built-in support for streaming, retries, and error handling.
* **Python**: Sync and async clients with type hints and streaming helpers
* **TypeScript**: Node.js, Deno, Bun, and browser support

## Quick installation

Python:

```bash
pip install lmnt
```

TypeScript:

```bash
npm install lmnt-node
```

## Quick start

```python Python
import asyncio
import os

from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )
    async with client.speech.with_streaming_response.generate(
        text="Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?",
        voice='leah',
    ) as response:
        await response.stream_to_file('hello.mp3')


asyncio.run(main())
```

```typescript TypeScript
import { createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const response = await client.speech.generate({
  text: "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?",
  voice: 'leah',
}).asResponse();

await pipeline(response.body, createWriteStream('hello.mp3'));
```

## Requirements

| SDK | Minimum version |
| ---------- | -------------------------- |
| Python | 3.10+ |
| TypeScript | 4.5+ (Node.js 20+) |

## GitHub repositories

- [lmnt-python](https://github.com/lmnt-com/lmnt-python)
- [lmnt-node](https://github.com/lmnt-com/lmnt-node)

---

# Python SDK

URL: https://docs.lmnt.com/api/sdks/python

Install and configure the LMNT Python SDK with sync and async client support.

---

The LMNT Python SDK provides convenient access to the LMNT REST API from any Python application. It supports synchronous and asynchronous clients with streaming.

For per-method API documentation with code examples, see the [API Reference](/api/overview). This page covers Python-specific SDK features and configuration.
## Installation

```bash
pip install lmnt
```

## Requirements

Python 3.10 or later is required.

## Usage

```python
import os

from lmnt import Lmnt

client = Lmnt(
    api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
)

response = client.speech.generate(
    text="Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?",
    voice='leah',
)
```

Consider using [python-dotenv](https://pypi.org/project/python-dotenv/) to add `LMNT_API_KEY="my-lmnt-api-key"` to your `.env` file so that your API key isn't stored in source control.

## Async usage

Import `AsyncLmnt` instead of `Lmnt` and `await` each call:

```python
import asyncio
import os

from lmnt import AsyncLmnt

client = AsyncLmnt(
    api_key=os.environ.get('LMNT_API_KEY'),
)


async def main() -> None:
    response = await client.speech.generate(
        text="Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?",
        voice='leah',
    )


asyncio.run(main())
```

Functionality between the synchronous and asynchronous clients is otherwise identical.

### Using aiohttp for better concurrency

By default, the async client uses `httpx`. For improved concurrency you can use `aiohttp` as the HTTP backend.

Install with the `aiohttp` extra (quoted so that shells such as zsh don't interpret the brackets):

```bash
pip install "lmnt[aiohttp]"
```

Then pass `DefaultAioHttpClient()` when constructing the client:

```python
import asyncio

from lmnt import AsyncLmnt, DefaultAioHttpClient


async def main() -> None:
    async with AsyncLmnt(
        api_key='My API Key',
        http_client=DefaultAioHttpClient(),
    ) as client:
        response = await client.speech.generate(
            text="Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?",
            voice='leah',
        )


asyncio.run(main())
```

## Streaming responses

`speech.generate` returns audio bytes.
To stream them as they're produced — instead of buffering the full response in memory — use `with_streaming_response`: ```python from lmnt import Lmnt client = Lmnt() with client.speech.with_streaming_response.generate( text="Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?", voice='leah', ) as response: response.stream_to_file('hello.mp3') ``` The async client uses the same interface: ```python import asyncio from lmnt import AsyncLmnt client = AsyncLmnt() async def main() -> None: async with client.speech.with_streaming_response.generate( text="Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?", voice='leah', ) as response: await response.stream_to_file('hello.mp3') asyncio.run(main()) ``` The context manager is required so that the response is reliably closed. ## File uploads Request parameters that correspond to file uploads can be passed as `bytes`, a [`PathLike`](https://docs.python.org/3/library/os.html#os.PathLike), or a tuple of `(filename, contents, media_type)`. ```python from pathlib import Path from lmnt import Lmnt client = Lmnt() client.voices.create( name='My Voice', file=Path('sample.wav'), ) ``` The async client uses the same interface. If you pass a `PathLike`, the file contents are read asynchronously automatically. ## Handling errors When the library is unable to connect to the API (for example, a network issue or timeout), a subclass of `lmnt.APIConnectionError` is raised. When the API returns a non-success status code (4xx or 5xx), a subclass of `lmnt.APIStatusError` is raised, with `status_code` and `response` properties. All errors inherit from `lmnt.APIError`. ```python import lmnt from lmnt import Lmnt client = Lmnt() try: client.speech.generate( text="Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. 
Like what?", voice='leah', ) except lmnt.APIConnectionError as e: print('The server could not be reached') print(e.__cause__) # an underlying Exception, likely raised within httpx except lmnt.RateLimitError as e: print('A 429 status code was received; we should back off a bit.') except lmnt.APIStatusError as e: print('Another non-200-range status code was received') print(e.status_code) print(e.response) ``` Error codes are as follows: | Status Code | Error Type | | ----------- | -------------------------- | | 400 | `BadRequestError` | | 401 | `AuthenticationError` | | 402 | `PaymentRequiredError` | | 403 | `PermissionDeniedError` | | 404 | `NotFoundError` | | 422 | `UnprocessableEntityError` | | 429 | `RateLimitError` | | >=500 | `InternalServerError` | | N/A | `APIConnectionError` | ## Retries Certain errors are automatically retried twice by default, with a short exponential backoff. Connection errors, 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 Internal errors are all retried by default. Use the `max_retries` option to configure or disable this: ```python from lmnt import Lmnt # Configure the default for all requests: client = Lmnt( max_retries=0, # default is 2 ) # Or, configure per-request: client.with_options(max_retries=5).speech.generate( text="Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?", voice='leah', ) ``` ## Timeouts By default, requests time out after 1 minute. 
Configure this with the `timeout` option, which accepts a float or an [`httpx.Timeout`](https://www.python-httpx.org/advanced/timeouts/#fine-tuning-the-configuration): ```python import httpx from lmnt import Lmnt # Configure the default for all requests: client = Lmnt( timeout=20.0, # 20 seconds (default is 1 minute) ) # More granular control: client = Lmnt( timeout=httpx.Timeout(60.0, read=5.0, write=10.0, connect=2.0), ) # Override per-request: client.with_options(timeout=5.0).speech.generate( text="Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?", voice='leah', ) ``` On timeout, an `APITimeoutError` is thrown. Requests that time out are [retried twice by default](#retries). ## Type system Nested request parameters are [TypedDicts](https://docs.python.org/3/library/typing.html#typing.TypedDict). Responses are [Pydantic models](https://docs.pydantic.dev) which provide helper methods for serializing back into JSON (`.to_json()`) or a dictionary (`.to_dict()`). Typed requests and responses provide autocomplete and documentation within your editor. To see type errors in VS Code, set `python.analysis.typeCheckingMode` to `basic`. ### Handling null vs missing fields In an API response, a field may be explicitly `null` or missing entirely; in either case, its value is `None`. You can differentiate the two with `.model_fields_set`: ```python if response.my_field is None: if 'my_field' not in response.model_fields_set: print('field was not in the response') else: print('field was null') ``` ## Advanced usage ### Accessing raw response data (e.g., headers) The "raw" response from `httpx` can be accessed by prefixing `.with_raw_response.` to any HTTP method call: ```python from lmnt import Lmnt client = Lmnt() response = client.speech.with_raw_response.generate( text="Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. 
Like what?", voice='leah', ) print(response.headers.get('X-My-Header')) speech = response.parse() # the object that `speech.generate()` would have returned print(speech) ``` ### Logging The SDK uses the standard library `logging` module. Enable logging by setting the `LMNT_LOG` environment variable: ```bash export LMNT_LOG=info ``` Use `debug` for more verbose output. ### Making custom/undocumented requests This library is typed for convenient access to the documented API. If you need to access undocumented endpoints, params, or response properties, the library can still be used. #### Undocumented endpoints To make requests to undocumented endpoints, use `client.get`, `client.post`, and other HTTP verbs. Client options like retries are still respected. ```python import httpx response = client.post( '/foo', cast_to=httpx.Response, body={'my_param': True}, ) print(response.headers.get('x-foo')) ``` #### Undocumented request params Use the `extra_query`, `extra_body`, and `extra_headers` request options to send additional parameters. #### Undocumented response properties To access undocumented response properties, read fields like `response.unknown_prop`. The full extra-fields dict is available as `response.model_extra`. 
### Configuring the HTTP client You can override the [httpx client](https://www.python-httpx.org/api/#client) — useful for proxies, custom transports, and other [advanced configuration](https://www.python-httpx.org/advanced/clients/): ```python import httpx from lmnt import Lmnt, DefaultHttpxClient client = Lmnt( # Or use the `LMNT_BASE_URL` env var base_url='http://my.test.server.example.com:8083', http_client=DefaultHttpxClient( proxy='http://my.test.proxy.example.com', transport=httpx.HTTPTransport(local_address='0.0.0.0'), ), ) ``` You can also customize per-request: ```python client.with_options(http_client=DefaultHttpxClient(...)) ``` ### Managing HTTP resources By default the library closes underlying HTTP connections whenever the client is [garbage collected](https://docs.python.org/3/reference/datamodel.html#object.__del__). You can manually close the client with `.close()` or use a context manager: ```python from lmnt import Lmnt with Lmnt() as client: # make requests here ... # HTTP client is now closed ``` ## Semantic versioning This package generally follows [SemVer](https://semver.org/spec/v2.0.0.html), though certain backwards-incompatible changes may be released as minor versions: 1. Changes that only affect static types, without breaking runtime behavior. 2. Changes to library internals which are technically public but not intended or documented for external use. 3. Changes that we don't expect to impact the vast majority of users in practice. ### Determining the installed version If you've upgraded but aren't seeing new features, your environment is likely still using an older version. Check at runtime with: ```python import lmnt print(lmnt.__version__) ``` ## Additional resources - [GitHub repository](https://github.com/lmnt-com/lmnt-python) - [API Reference](/api-reference) --- # TypeScript SDK URL: https://docs.lmnt.com/api/sdks/typescript Install and configure the LMNT TypeScript SDK for Node.js, Deno, Bun, and Edge runtimes. 
--- This library provides convenient access to the LMNT REST API from server-side TypeScript or JavaScript. For per-method API documentation with code examples, see the [API Reference](/api/overview). This page covers TypeScript-specific SDK features and configuration. ## Installation ```bash npm install lmnt-node ``` ## Requirements TypeScript >= 4.5 is supported. The following runtimes are supported: - Node.js 20 LTS or later ([non-EOL](https://endoflife.date/nodejs)) versions. - Deno v1.28.0 or higher. - Bun 1.0 or later. - Cloudflare Workers. - Vercel Edge Runtime. - Jest 28 or greater with the `"node"` environment (`"jsdom"` is not supported at this time). - Nitro v2.6 or greater. - Web browsers (up-to-date Chrome, Firefox, Safari, Edge, and more). Note that React Native is not supported at this time. If you are interested in other runtimes, please [open an issue](https://github.com/lmnt-com/lmnt-node/issues). ## Usage ```typescript import Lmnt from 'lmnt-node'; const client = new Lmnt({ apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted }); const response = await client.speech.generate({ text: "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?", voice: 'leah', }); const content = await response.blob(); console.log(content); ``` ## Request and response types This library includes TypeScript definitions for all request params and response fields. You may import and use them like so: ```typescript import Lmnt from 'lmnt-node'; const client = new Lmnt({ apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted }); const params: Lmnt.SpeechGenerateParams = { text: "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. 
Like what?", voice: 'leah' }; const response: Response = await client.speech.generate(params); ``` Documentation for each method, request param, and response field is available in docstrings and will appear on hover in most modern editors. ## File uploads Request parameters that correspond to file uploads can be passed in many different forms: - `File` (or an object with the same structure) - a `fetch` `Response` (or an object with the same structure) - an `fs.ReadStream` - the return value of the `toFile` helper ```typescript import fs from 'fs'; import Lmnt, { toFile } from 'lmnt-node'; const client = new Lmnt(); // If you have access to Node `fs` we recommend using `fs.createReadStream()`: await client.voices.create({ name: 'new-voice', file: fs.createReadStream('/path/to/file') }); // Or if you have the web `File` API you can pass a `File` instance: await client.voices.create({ name: 'new-voice', file: new File(['my bytes'], 'file') }); // You can also pass a `fetch` `Response`: await client.voices.create({ name: 'new-voice', file: await fetch('https://somesite/file') }); // Or use the `toFile` helper for `Buffer` / `Uint8Array`: await client.voices.create({ name: 'new-voice', file: await toFile(Buffer.from('my bytes'), 'file') }); await client.voices.create({ name: 'new-voice', file: await toFile(new Uint8Array([0, 1, 2]), 'file') }); ``` ## Handling errors When the library is unable to connect to the API, or if the API returns a non-success status code (4xx or 5xx), a subclass of `APIError` is thrown: ```typescript const response = await client.speech.generate({ text: "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. 
Like what?", voice: 'leah' }).catch(async (err) => { if (err instanceof Lmnt.APIError) { console.log(err.status); // 400 console.log(err.name); // BadRequestError console.log(err.headers); // {server: 'nginx', ...} } else { throw err; } }); ``` Error codes are as follows: | Status Code | Error Type | | ----------- | -------------------------- | | 400 | `BadRequestError` | | 401 | `AuthenticationError` | | 402 | `PaymentRequiredError` | | 403 | `PermissionDeniedError` | | 404 | `NotFoundError` | | 422 | `UnprocessableEntityError` | | 429 | `RateLimitError` | | >=500 | `InternalServerError` | | N/A | `APIConnectionError` | ## Retries Certain errors are automatically retried twice by default, with a short exponential backoff. Connection errors, 408 Request Timeout, 409 Conflict, 429 Rate Limit, and >=500 Internal errors are all retried by default. Use the `maxRetries` option to configure or disable this: ```typescript // Configure the default for all requests: const client = new Lmnt({ maxRetries: 0, // default is 2 }); // Or, configure per-request: await client.speech.generate( { text: "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?", voice: 'leah' }, { maxRetries: 5 }, ); ``` ## Timeouts Requests time out after 1 minute by default. Configure this with the `timeout` option: ```typescript // Configure the default for all requests: const client = new Lmnt({ timeout: 20 * 1000, // 20 seconds (default is 1 minute) }); // Override per-request: await client.speech.generate( { text: "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?", voice: 'leah' }, { timeout: 5 * 1000 }, ); ``` On timeout, an `APIConnectionTimeoutError` is thrown. Requests that time out are [retried twice by default](#retries). 
## Advanced usage ### Accessing raw Response data (e.g., headers) The "raw" `Response` returned by `fetch()` can be accessed through the `.asResponse()` method on the `APIPromise` that all methods return. `.asResponse()` returns as soon as the headers for a successful response are received and does not consume the body, so you are free to write custom parsing or streaming logic. You can also use `.withResponse()` to get the raw `Response` along with the parsed data: ```typescript const client = new Lmnt(); const response = await client.speech .generate({ text: "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?", voice: 'leah' }) .asResponse(); console.log(response.headers.get('X-My-Header')); console.log(response.statusText); const { data, response: raw } = await client.speech .generate({ text: "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?", voice: 'leah' }) .withResponse(); console.log(raw.headers.get('X-My-Header')); console.log(data); ``` ### Making custom/undocumented requests This library is typed for convenient access to the documented API. If you need to access undocumented endpoints, params, or response properties, the library can still be used. #### Undocumented endpoints To make requests to undocumented endpoints, you can use `client.get`, `client.post`, and other HTTP verbs. Options on the client, such as retries, will be respected when making these requests. ```typescript await client.post('/some/path', { body: { some_prop: 'foo' }, query: { some_query_arg: 'bar' }, }); ``` #### Undocumented request params To make requests using undocumented parameters, use `// @ts-expect-error` on the undocumented parameter. The library doesn't validate at runtime that the request matches the type, so any extra values you send will be sent as-is. 
```typescript client.foo.create({ foo: 'my_param', bar: 12, // @ts-expect-error baz is not yet public baz: 'undocumented option', }); ``` For requests with the `GET` verb, any extra params will be in the query; all other requests will send the extra param in the body. If you want to explicitly send an extra argument, you can do so with the `query`, `body`, and `headers` request options. #### Undocumented response properties To access undocumented response properties, you may access the response object with `// @ts-expect-error` on the response object, or cast the response object to the requisite type. The SDK does not validate or strip extra properties from the response. ### Customizing the fetch client By default, this library uses `node-fetch` in Node, and expects a global `fetch` function in other environments. If you would prefer to use a global, web-standards-compliant `fetch` function even in a Node environment (for example, when running Node with `--experimental-fetch` or using Next.js, which polyfills with `undici`), add the following import before your first import from `lmnt-node`: ```typescript import 'lmnt-node/shims/web'; import Lmnt from 'lmnt-node'; ``` To do the inverse, add `import 'lmnt-node/shims/node'`. You can also provide a custom `fetch` function when instantiating the client, useful for inspecting or altering the `Request` or `Response` before/after each request: ```typescript import { fetch } from 'undici'; import Lmnt from 'lmnt-node'; const client = new Lmnt({ fetch: async (url, init) => { console.log('About to make a request', url, init); const response = await fetch(url, init); console.log('Got response', response); return response; }, }); ``` If given a `DEBUG=true` environment variable, this library logs all requests and responses automatically. This is intended for debugging only and may change in the future without notice. 
### Configuring an HTTP(S) Agent (e.g., for proxies) By default, this library uses a stable agent for all http/https requests to reuse TCP connections, eliminating many TCP & TLS handshakes and shaving around 100ms off most requests. To disable or customize this behavior — for example, to route requests through a proxy — pass an `httpAgent`: ```typescript import http from 'http'; import { HttpsProxyAgent } from 'https-proxy-agent'; // Configure the default for all requests: const client = new Lmnt({ httpAgent: new HttpsProxyAgent(process.env.PROXY_URL), }); // Override per-request: await client.speech.generate( { text: "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can't believe it's gonna rain, dude. Like what?", voice: 'leah' }, { httpAgent: new http.Agent({ keepAlive: false }) }, ); ``` ## Semantic versioning This package generally follows [SemVer](https://semver.org/spec/v2.0.0.html), though certain backwards-incompatible changes may be released as minor versions: 1. Changes that only affect static types, without breaking runtime behavior. 2. Changes to library internals which are technically public but not intended or documented for external use. 3. Changes that we don't expect to impact the vast majority of users in practice. ## Additional resources - [GitHub repository](https://github.com/lmnt-com/lmnt-node) - [API Reference](/api-reference) ## API Reference ### Using the API --- # API overview URL: https://docs.lmnt.com/api/overview --- The LMNT API is a RESTful API at `https://api.lmnt.com` that provides programmatic access to LMNT's text-to-speech and voice cloning models. **New to LMNT?** Start with the [Quickstart](/quickstart) for a hands-on walkthrough, or jump straight to [Generate speech](/api/speech/generate) for the most common endpoint. 
## Prerequisites To use the LMNT API, you'll need: - An [LMNT account](https://app.lmnt.com/account) - An [API key](https://app.lmnt.com/settings/api) For a step-by-step walkthrough, see the [Quickstart](/quickstart). ## Available APIs The LMNT API is organized into the following groups: - **[Speech API](/api/speech/generate)**: Generate speech from text, with optional word-level timestamps. - **[Speech Sessions API](/api/speech-sessions/create)**: Stream text in progressively and receive speech as it's generated — best for real-time applications. - **[Voices API](/api/voices/list)**: Create, list, retrieve, update, and delete voices, including instant clones from your own audio. - **[Accounts API](/api/accounts/retrieve)**: Look up usage and plan information for your account. ## Authentication All requests to the LMNT API must include these headers. | Header | Value | Required | |--------|-------|----------| | `X-API-Key` | Your API key from your [LMNT settings](https://app.lmnt.com/settings/api) | Yes | | `lmnt-version` | API version (e.g. `1.0`) | Yes | If you're using one of the [Client SDKs](#client-sdks), the SDK sends these headers for you. For API versioning details, see [API versions](/api/versioning). ### Getting API keys The API is made available via the [LMNT Playground](https://app.lmnt.com). You can use the Playground to try out our models in the browser and then generate API keys in [Account Settings](https://app.lmnt.com/settings/api). ## Client SDKs LMNT provides official SDKs that simplify API integration by handling authentication, request formatting, error handling, and more. **Benefits**: - Automatic header management (`X-API-Key`, `lmnt-version`) - Type-safe request and response handling - Built-in retry logic and error handling - Streaming support - Request timeouts and connection management For a list of client SDKs and their respective installation instructions, see [Client SDKs](/api/client-sdks). 
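For calls made without an SDK, the two headers from the Authentication table have to be set by hand. The following is a minimal sketch using only the Python standard library. It assumes the `/v1/ai/speech/bytes` path used in the Generate speech example; `build_speech_request` is a hypothetical helper name, and the version string is supplied by the caller rather than assumed here.

```python
import json
import urllib.request

API_BASE = "https://api.lmnt.com"  # base URL stated in the API overview


def build_speech_request(
    text: str, voice: str, api_key: str, version: str
) -> urllib.request.Request:
    """Build (but do not send) an authenticated speech request.

    Hypothetical helper for illustration; not part of the LMNT SDK.
    """
    payload = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/ai/speech/bytes",
        data=payload,
        method="POST",
        headers={
            "X-API-Key": api_key,      # required on every request
            "lmnt-version": version,   # required on every request
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` and writing `response.read()` to a file would mirror the curl example; the SDKs perform the equivalent header handling automatically.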
## Next steps

- [Generate speech](/api/speech/generate): Send text, stream audio back. The most common endpoint.
- [Speech Sessions](/api/speech-sessions/create): Real-time speech generation with progressive text input.
- [Client SDKs](/api/client-sdks): Python and TypeScript.

---

# Errors

URL: https://docs.lmnt.com/api/errors

---

## HTTP errors

We use standard HTTP status codes to indicate the success or failure of a request:

- 400 - `invalid_request_error`: The request is malformed or contains invalid content.
- 401 - `authentication_error`: The API key is missing or unrecognized.
- 402 - `payment_required_error`: Your plan is out of credits or your subscription is past due. Check your payment details in the [Playground](https://app.lmnt.com/settings/billing).
- 403 - `permission_error`: The API key doesn't have permission to access this resource.
- 404 - `not_found_error`: The requested resource doesn't exist.
- 429 - `rate_limit_error`: Your account has hit a rate limit.
- 5xx - `internal_server_error`: Something went wrong on our end. Retry with exponential backoff.

## Error shape

Errors are always returned as JSON, with a top-level `error` object that always includes a `type` and a `message` value. The response also includes a `request_id` field for easier tracking and debugging. For example:

```json
{
  "type": "error",
  "error": {
    "type": "not_found_error",
    "message": "Voice 'foo' not found."
  },
  "request_id": "req_X7cMY9WUno4rzYSUmuigg6"
}
```

## Request id

Every API response includes a unique `request-id` header. This header contains a value such as `req_K5kw6MKtCL83A9bNGcrV3w`. When contacting support about a specific request, include this ID to help quickly resolve your issue.
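When calling the API without an SDK, the request ID can be read from either the JSON error body or the `request-id` response header. The following is a minimal sketch using only the Python standard library; the `ApiError` dataclass and the `parse_error` helper are illustrative names and are not part of the LMNT SDK.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class ApiError:
    """Illustrative container for the documented error shape."""
    status: int
    error_type: str
    message: str
    request_id: Optional[str]


def parse_error(status: int, headers: Mapping[str, str], body: bytes) -> ApiError:
    """Extract type, message, and request ID from a JSON error response."""
    payload = json.loads(body)
    error = payload["error"]  # top-level `error` object with `type` and `message`
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Prefer the body's `request_id`; fall back to the `request-id` header.
    request_id = payload.get("request_id") or lowered.get("request-id")
    return ApiError(status, error["type"], error["message"], request_id)
```

Logging the resulting `request_id` alongside the status code keeps the identifier at hand if the request later needs to be reported to support.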
The official SDKs provide this value as a property on top-level response objects, containing the value of the `request-id` header: ```python title="Python" from lmnt import Lmnt client = Lmnt() voice_response = client.voices.retrieve('leah') print(voice_response.request_id) ``` ```typescript title="TypeScript" import Lmnt from 'lmnt-node'; const client = new Lmnt(); const voice = await client.voices.retrieve('leah'); console.log(voice.request_id); ``` ## Streaming errors When you consume a streaming response — binary audio chunks from [`POST /v1/ai/speech/bytes`](/api/speech/generate) or messages from a [speech session WebSocket](/api/speech-sessions/create) — an error that occurs **after** the initial `200` (or successful WebSocket handshake) won't be delivered as a JSON error body. - For the streaming bytes endpoint, we terminate the connection and reset the underlying HTTP/2 stream. - For the WebSocket session, we send a close frame with a status code and reason describing the failure. Treat an early end-of-stream as a potential mid-flight error rather than a clean completion. ## Long requests Some networks drop idle TCP connections after a variable period, which can cause long-lived non-streaming requests to fail without a response. To generate a large amount of speech, split it into smaller requests or use one of the streaming endpoints, which deliver speech continuously and keep the connection active. ### Speech --- # Speech URL: https://docs.lmnt.com/api/speech --- {/* Generated by carbonsteel. DO NOT EDIT. */} ## Generate speech Generates speech from text and streams the audio as binary data chunks in real-time as they are generated. This is the recommended endpoint for most text-to-speech use cases. You can either stream the chunks for low-latency playback or collect all chunks to get the complete audio file. {/* @md-only */} ### Body Parameters The text to generate speech from; max 5000 characters per request (including spaces). 
The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`.

When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground.

The desired output format of the audio. If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding.

Streamable formats:

- `mp3`: 96kbps MP3 audio.
- `ulaw`: 8-bit G711 µ-law audio with a WAV header.
- `webm`: WebM format with Opus audio codec.
- `pcm_s16le`: PCM signed 16-bit little-endian audio.
- `pcm_f32le`: PCM 32-bit floating-point little-endian audio.

Non-streamable formats:

- `aac`: AAC audio codec.
- `wav`: 16-bit PCM audio in WAV container.

Allowed values: `aac`, `mp3`, `ulaw`, `wav`, `webm`, `pcm_s16le`, `pcm_f32le`

The desired language. Two-letter ISO 639-1 code. Defaults to auto language detection, but specifying the language is recommended for faster generation. Allowed values: `auto`, `ar`, `as`, `bn`, `cs`, `da`, `de`, `en`, `es`, `fi`, `fr`, `hi`, `id`, `it`, `ja`, `ko`, `ml`, `mr`, `nl`, `pl`, `pt`, `ru`, `sk`, `sv`, `ta`, `te`, `th`, `tr`, `uk`, `ur`, `vi`, `zh`

The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview). Allowed values: `blizzard`

The desired output sample rate in Hz. Defaults to `24000` for all formats except `ulaw`, which defaults to `8000`. Allowed values: 8000, 16000, 24000

Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles.

Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech.
A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns. ### Returns Returns a streaming binary response (`binary`). ### Example ```sh curl --request POST \ --url https://api.lmnt.com/v1/ai/speech/bytes \ --header "X-API-Key: $LMNT_API_KEY" \ --header 'lmnt-version: 1.1' \ --header 'Content-Type: application/json' \ --data '{ "text": "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can'\''t believe it'\''s gonna rain, dude. Like what?", "voice": "leah" }' \ --output hello.mp3 ``` {/* @end */} ## Generate speech with timestamps Generates speech from text and returns a JSON object that contains a base64-encoded audio string and optionally word-level timestamps. This endpoint waits for all speech to be generated before responding, so it is not ideal for latency-sensitive applications. {/* @md-only */} ### Body Parameters The text to generate speech from; max 5000 characters per request (including spaces). The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`. When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground. The desired output format of the audio. If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding. Streamable formats: - `mp3`: 96kbps MP3 audio. - `ulaw`: 8-bit G711 µ-law audio with a WAV header. - `webm`: WebM format with Opus audio codec. - `pcm_s16le`: PCM signed 16-bit little-endian audio. - `pcm_f32le`: PCM 32-bit floating-point little-endian audio. Non-streamable formats: - `aac`: AAC audio codec. - `wav`: 16-bit PCM audio in WAV container. Allowed values: `aac`, `mp3`, `ulaw`, `wav`, `webm`, `pcm_s16le`, `pcm_f32le` The desired language. 
Two-letter ISO 639-1 code. Defaults to auto language detection, but specifying the language is recommended for faster generation. Allowed values: `auto`, `ar`, `as`, `bn`, `cs`, `da`, `de`, `en`, `es`, `fi`, `fr`, `hi`, `id`, `it`, `ja`, `ko`, `ml`, `mr`, `nl`, `pl`, `pt`, `ru`, `sk`, `sv`, `ta`, `te`, `th`, `tr`, `uk`, `ur`, `vi`, `zh`

The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview). Allowed values: `blizzard`

If set to `true`, the response will contain a `timestamps` array describing where each input element falls in the generated audio.

The desired output sample rate in Hz. Defaults to `24000` for all formats except `ulaw`, which defaults to `8000`. Allowed values: 8000, 16000, 24000

Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles.

Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns.

### Returns

The base64-encoded audio file; the format is determined by the `format` parameter.

An array describing where each generated input element (words and non-words like spaces, punctuation, etc.) falls in the audio.

The generated input element; beginning and ending with a short silence.

The spoken duration of the generated input element, in seconds.

The start time of the generated input element, in seconds.

### Example ```sh curl --request POST \ --url https://api.lmnt.com/v1/ai/speech \ --header "X-API-Key: $LMNT_API_KEY" \ --header 'lmnt-version: 1.1' \ --header 'Content-Type: application/json' \ --data '{ "text": "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can'\''t believe it'\''s gonna rain, dude.
Like what?", "voice": "leah", "format": "mp3", "return_timestamps": true }' \ | jq -r .audio | base64 -d > hello.mp3 ``` {/* @end */}

---

# Speech (Python)

URL: https://docs.lmnt.com/api/python/speech

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

## Generate speech

Generates speech from text and streams the audio as binary data chunks in real-time as they are generated. This is the recommended endpoint for most text-to-speech use cases. You can either stream the chunks for low-latency playback or collect all chunks to get the complete audio file.

{/* @md-only */}

### Parameters

The text to generate speech from; max 5000 characters per request (including spaces).

The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`.

When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground.

The desired output format of the audio. If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding.
Streamable formats:

- `mp3`: 96kbps MP3 audio.
- `ulaw`: 8-bit G711 µ-law audio with a WAV header.
- `webm`: WebM format with Opus audio codec.
- `pcm_s16le`: PCM signed 16-bit little-endian audio.
- `pcm_f32le`: PCM 32-bit floating-point little-endian audio.

Non-streamable formats:

- `aac`: AAC audio codec.
- `wav`: 16-bit PCM audio in WAV container.

The desired language. Two letter ISO 639-1 code. Defaults to auto language detection, but specifying the language is recommended for faster generation.

The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview).

The desired output sample rate in Hz. Defaults to `24000` for all formats except `mulaw` which defaults to `8000`.

Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles.

Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns.

### Returns

Returns a streaming binary response (`bytes`).

### Example

```python
import asyncio
import os

from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )
    async with client.speech.with_streaming_response.generate(
        text=(
            "Uhh, did you see the weather in Palo Alto tomorrow? "
            "Yeah, can't believe it's gonna rain, dude. Like what?"
        ),
        voice='leah',
    ) as response:
        await response.stream_to_file('hello.mp3')


asyncio.run(main())
```

{/* @end */}

## Generate speech with timestamps

Generates speech from text and returns a JSON object that contains a base64-encoded audio string and optionally word-level timestamps.
This endpoint waits for all speech to be generated before responding, so it is not ideal for latency-sensitive applications.

{/* @md-only */}

### Parameters

The text to generate speech from; max 5000 characters per request (including spaces).

The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`.

When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground.

The desired output format of the audio. If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding.

Streamable formats:

- `mp3`: 96kbps MP3 audio.
- `ulaw`: 8-bit G711 µ-law audio with a WAV header.
- `webm`: WebM format with Opus audio codec.
- `pcm_s16le`: PCM signed 16-bit little-endian audio.
- `pcm_f32le`: PCM 32-bit floating-point little-endian audio.

Non-streamable formats:

- `aac`: AAC audio codec.
- `wav`: 16-bit PCM audio in WAV container.

The desired language. Two letter ISO 639-1 code. Defaults to auto language detection, but specifying the language is recommended for faster generation.

The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview).

If set as `true`, the response will contain a `timestamps` array describing where each input element falls in the generated audio.

The desired output sample rate in Hz. Defaults to `24000` for all formats except `mulaw` which defaults to `8000`.

Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles.

Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns.

### Returns

The base64-encoded audio file; the format is determined by the `format` parameter.

An array describing where each generated input element (words and non-words like spaces, punctuation, etc.) falls in the audio.

The generated input element; beginning and ending with a short silence.

The spoken duration of the generated input element, in seconds.

The start time of the generated input element, in seconds.

### Example

```python
import asyncio
import base64
import os

from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )
    response = await client.speech.generate_detailed(
        text=(
            "Uhh, did you see the weather in Palo Alto tomorrow? "
            "Yeah, can't believe it's gonna rain, dude. Like what?"
        ),
        voice='leah',
        format='mp3',
        return_timestamps=True,
    )
    with open('hello.mp3', 'wb') as f:
        f.write(base64.b64decode(response.audio))
    print(response.timestamps)


asyncio.run(main())
```

{/* @end */}

---

# Speech (TypeScript)

URL: https://docs.lmnt.com/api/typescript/speech

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

## Generate speech

Generates speech from text and streams the audio as binary data chunks in real-time as they are generated. This is the recommended endpoint for most text-to-speech use cases. You can either stream the chunks for low-latency playback or collect all chunks to get the complete audio file.

{/* @md-only */}

### Parameters

The text to generate speech from; max 5000 characters per request (including spaces).

The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`.

When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground.

The desired output format of the audio.
If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding.

Streamable formats:

- `mp3`: 96kbps MP3 audio.
- `ulaw`: 8-bit G711 µ-law audio with a WAV header.
- `webm`: WebM format with Opus audio codec.
- `pcm_s16le`: PCM signed 16-bit little-endian audio.
- `pcm_f32le`: PCM 32-bit floating-point little-endian audio.

Non-streamable formats:

- `aac`: AAC audio codec.
- `wav`: 16-bit PCM audio in WAV container.

The desired language. Two letter ISO 639-1 code. Defaults to auto language detection, but specifying the language is recommended for faster generation.

The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview).

The desired output sample rate in Hz. Defaults to `24000` for all formats except `mulaw` which defaults to `8000`.

Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles.

Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns.

### Returns

Returns a streaming binary response (`Response`).

### Example

```typescript
import { createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';

import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const response = await client.speech.generate({
  text:
    "Uhh, did you see the weather in Palo Alto tomorrow? " +
    "Yeah, can't believe it's gonna rain, dude. Like what?",
  voice: 'leah',
}).asResponse();

await pipeline(response.body, createWriteStream('hello.mp3'));
```

{/* @end */}

## Generate speech with timestamps

Generates speech from text and returns a JSON object that contains a base64-encoded audio string and optionally word-level timestamps. This endpoint waits for all speech to be generated before responding, so it is not ideal for latency-sensitive applications.

{/* @md-only */}

### Parameters

The text to generate speech from; max 5000 characters per request (including spaces).

The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`.

When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground.

The desired output format of the audio. If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding.

Streamable formats:

- `mp3`: 96kbps MP3 audio.
- `ulaw`: 8-bit G711 µ-law audio with a WAV header.
- `webm`: WebM format with Opus audio codec.
- `pcm_s16le`: PCM signed 16-bit little-endian audio.
- `pcm_f32le`: PCM 32-bit floating-point little-endian audio.

Non-streamable formats:

- `aac`: AAC audio codec.
- `wav`: 16-bit PCM audio in WAV container.

The desired language. Two letter ISO 639-1 code. Defaults to auto language detection, but specifying the language is recommended for faster generation.

The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview).

If set as `true`, the response will contain a `timestamps` array describing where each input element falls in the generated audio.

The desired output sample rate in Hz. Defaults to `24000` for all formats except `mulaw` which defaults to `8000`.

Influences how expressive and emotionally varied the speech becomes.
Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles.

Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns.

### Returns

The base64-encoded audio file; the format is determined by the `format` parameter.

An array describing where each generated input element (words and non-words like spaces, punctuation, etc.) falls in the audio.

The generated input element; beginning and ending with a short silence.

The spoken duration of the generated input element, in seconds.

The start time of the generated input element, in seconds.

### Example

```typescript
import { writeFileSync } from 'fs';

import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const response = await client.speech.generateDetailed({
  text:
    "Uhh, did you see the weather in Palo Alto tomorrow? " +
    "Yeah, can't believe it's gonna rain, dude. Like what?",
  voice: 'leah',
  format: 'mp3',
  return_timestamps: true,
});

writeFileSync('hello.mp3', Buffer.from(response.audio, 'base64'));
console.log(response.timestamps);
```

{/* @end */}

---

# Generate speech

URL: https://docs.lmnt.com/api/speech/generate

Generates speech from text and streams the audio as binary data chunks in real-time as they are generated. This is the recommended endpoint for most text-to-speech use cases. You can either stream the chunks for low-latency playback or collect all chunks to get the complete audio file.

---

{/* Generated by carbonsteel. DO NOT EDIT.
*/}

**post** `/v1/ai/speech/bytes`

Generates speech from text and streams the audio as binary data chunks in real-time as they are generated. This is the recommended endpoint for most text-to-speech use cases. You can either stream the chunks for low-latency playback or collect all chunks to get the complete audio file.

## Headers

Your API key; get it from your [LMNT settings](https://app.lmnt.com/settings/api).

The LMNT API version that this client was built against. Use `1.1`

## Parameters

The text to generate speech from; max 5000 characters per request (including spaces).

The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`.

When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground.

The desired output format of the audio. If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding.

Streamable formats:

- `mp3`: 96kbps MP3 audio.
- `ulaw`: 8-bit G711 µ-law audio with a WAV header.
- `webm`: WebM format with Opus audio codec.
- `pcm_s16le`: PCM signed 16-bit little-endian audio.
- `pcm_f32le`: PCM 32-bit floating-point little-endian audio.

Non-streamable formats:

- `aac`: AAC audio codec.
- `wav`: 16-bit PCM audio in WAV container.

Allowed values: `aac`, `mp3`, `ulaw`, `wav`, `webm`, `pcm_s16le`, `pcm_f32le`

The desired language. Two letter ISO 639-1 code. Defaults to auto language detection, but specifying the language is recommended for faster generation. Allowed values: `auto`, `ar`, `as`, `bn`, `cs`, `da`, `de`, `en`, `es`, `fi`, `fr`, `hi`, `id`, `it`, `ja`, `ko`, `ml`, `mr`, `nl`, `pl`, `pt`, `ru`, `sk`, `sv`, `ta`, `te`, `th`, `tr`, `uk`, `ur`, `vi`, `zh`

The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview).
Allowed values: `blizzard`

The desired output sample rate in Hz. Defaults to `24000` for all formats except `mulaw` which defaults to `8000`. Allowed values: 8000, 16000, 24000

Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles.

Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns.

## Returns

Returns a streaming binary response (`binary`).

## Example

```sh
curl --request POST \
  --url https://api.lmnt.com/v1/ai/speech/bytes \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1' \
  --header 'Content-Type: application/json' \
  --data '{
    "text": "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can'\''t believe it'\''s gonna rain, dude. Like what?",
    "voice": "leah"
  }' \
  --output hello.mp3
```

---

# Generate speech (Python)

URL: https://docs.lmnt.com/api/python/speech/generate

Generates speech from text and streams the audio as binary data chunks in real-time as they are generated. This is the recommended endpoint for most text-to-speech use cases. You can either stream the chunks for low-latency playback or collect all chunks to get the complete audio file.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```python
speech.generate(**kwargs: SpeechGenerateParams) -> bytes
```

**post** `/v1/ai/speech/bytes`

Generates speech from text and streams the audio as binary data chunks in real-time as they are generated. This is the recommended endpoint for most text-to-speech use cases. You can either stream the chunks for low-latency playback or collect all chunks to get the complete audio file.
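
The two ways of consuming the audio described above (handling chunks as they arrive versus collecting them into one file) can be pictured with plain Python. This is only a sketch under stated assumptions: `fake_audio_chunks`, `play_as_received`, and `collect_all` are hypothetical names, and the chunk generator is a stand-in for a streamed response body rather than anything from the LMNT SDK, so the snippet runs without an API key or network access.

```python
from typing import Iterable, Iterator


def fake_audio_chunks() -> Iterator[bytes]:
    """Hypothetical stand-in for a streamed response body (not an LMNT SDK call)."""
    yield b"ID3"
    yield b"\x00\x01"
    yield b"\x02\x03\x04"


def play_as_received(chunks: Iterable[bytes]) -> int:
    """Low-latency path: handle each chunk as soon as it arrives.

    Returns the number of bytes handled.
    """
    handled = 0
    for chunk in chunks:
        # A real player would hand `chunk` to an audio device or socket here.
        handled += len(chunk)
    return handled


def collect_all(chunks: Iterable[bytes]) -> bytes:
    """Complete-file path: join every chunk into a single bytes object."""
    return b"".join(chunks)


streamed_total = play_as_received(fake_audio_chunks())
complete_audio = collect_all(fake_audio_chunks())
print(streamed_total, len(complete_audio))
```

Both paths see the same bytes; the difference is only when the data becomes usable. The streaming path can start playback after the first chunk, while the collecting path has to wait for the last one.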
## Parameters

The text to generate speech from; max 5000 characters per request (including spaces).

The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`.

When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground.

The desired output format of the audio. If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding.

Streamable formats:

- `mp3`: 96kbps MP3 audio.
- `ulaw`: 8-bit G711 µ-law audio with a WAV header.
- `webm`: WebM format with Opus audio codec.
- `pcm_s16le`: PCM signed 16-bit little-endian audio.
- `pcm_f32le`: PCM 32-bit floating-point little-endian audio.

Non-streamable formats:

- `aac`: AAC audio codec.
- `wav`: 16-bit PCM audio in WAV container.

The desired language. Two letter ISO 639-1 code. Defaults to auto language detection, but specifying the language is recommended for faster generation.

The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview).

The desired output sample rate in Hz. Defaults to `24000` for all formats except `mulaw` which defaults to `8000`.

Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles.

Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns.

## Returns

Returns a streaming binary response (`bytes`).
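
The limits listed above lend themselves to a small client-side check before a request is sent. The following is a minimal sketch, not part of the SDK: `is_streamable` and `split_text` are hypothetical helpers, while the format names and the 5000-character limit are taken from the parameter descriptions above. Splitting on whitespace is an assumption made for brevity; it collapses runs of whitespace, and a sentence-aware splitter would probably produce more natural-sounding pieces.

```python
MAX_TEXT_CHARS = 5000  # per-request limit from the docs above, spaces included
STREAMABLE_FORMATS = {"mp3", "ulaw", "webm", "pcm_s16le", "pcm_f32le"}
NON_STREAMABLE_FORMATS = {"aac", "wav"}


def is_streamable(fmt: str) -> bool:
    """Return True if `fmt` is encoded chunk by chunk (hypothetical helper)."""
    if fmt not in STREAMABLE_FORMATS | NON_STREAMABLE_FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    return fmt in STREAMABLE_FORMATS


def split_text(text: str, limit: int = MAX_TEXT_CHARS) -> list[str]:
    """Greedily pack whitespace-separated words into pieces of at most `limit` characters."""
    pieces: list[str] = []
    current = ""
    for word in text.split():
        if len(word) > limit:
            raise ValueError("a single word exceeds the character limit")
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            # The current piece is full; start a new one with this word.
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


long_text = " ".join(["word"] * 2000)  # 9999 characters, too long for one request
pieces = split_text(long_text)
print(len(pieces), [len(p) for p in pieces])
print(is_streamable("mp3"), is_streamable("wav"))
```

Each piece could then be sent as its own request with the same `voice`, keeping every call under the documented limit.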
## Example

```python
import asyncio
import os

from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )
    async with client.speech.with_streaming_response.generate(
        text=(
            "Uhh, did you see the weather in Palo Alto tomorrow? "
            "Yeah, can't believe it's gonna rain, dude. Like what?"
        ),
        voice='leah',
    ) as response:
        await response.stream_to_file('hello.mp3')


asyncio.run(main())
```

---

# Generate speech (TypeScript)

URL: https://docs.lmnt.com/api/typescript/speech/generate

Generates speech from text and streams the audio as binary data chunks in real-time as they are generated. This is the recommended endpoint for most text-to-speech use cases. You can either stream the chunks for low-latency playback or collect all chunks to get the complete audio file.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```typescript
speech.generate(body: SpeechGenerateParams): Promise
```

**post** `/v1/ai/speech/bytes`

Generates speech from text and streams the audio as binary data chunks in real-time as they are generated. This is the recommended endpoint for most text-to-speech use cases. You can either stream the chunks for low-latency playback or collect all chunks to get the complete audio file.

## Parameters

The text to generate speech from; max 5000 characters per request (including spaces).

The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`.

When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground.

The desired output format of the audio. If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding.

Streamable formats:

- `mp3`: 96kbps MP3 audio.
- `ulaw`: 8-bit G711 µ-law audio with a WAV header.
- `webm`: WebM format with Opus audio codec.
- `pcm_s16le`: PCM signed 16-bit little-endian audio.
- `pcm_f32le`: PCM 32-bit floating-point little-endian audio.

Non-streamable formats:

- `aac`: AAC audio codec.
- `wav`: 16-bit PCM audio in WAV container.

The desired language. Two letter ISO 639-1 code. Defaults to auto language detection, but specifying the language is recommended for faster generation.

The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview).

The desired output sample rate in Hz. Defaults to `24000` for all formats except `mulaw` which defaults to `8000`.

Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles.

Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns.

## Returns

Returns a streaming binary response (`Response`).

## Example

```typescript
import { createWriteStream } from 'fs';
import { pipeline } from 'stream/promises';

import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const response = await client.speech.generate({
  text:
    "Uhh, did you see the weather in Palo Alto tomorrow? " +
    "Yeah, can't believe it's gonna rain, dude. Like what?",
  voice: 'leah',
}).asResponse();

await pipeline(response.body, createWriteStream('hello.mp3'));
```

---

# Generate speech with timestamps

URL: https://docs.lmnt.com/api/speech/generate-detailed

Generates speech from text and returns a JSON object that contains a base64-encoded audio string and optionally word-level timestamps.
This endpoint waits for all speech to be generated before responding, so it is not ideal for latency-sensitive applications. --- {/* Generated by carbonsteel. DO NOT EDIT. */} **post** `/v1/ai/speech` Generates speech from text and returns a JSON object that contains a base64-encoded audio string and optionally word-level timestamps. This endpoint waits for all speech to be generated before responding, so it is not ideal for latency-sensitive applications. ## Headers Your API key; get it from your [LMNT settings](https://app.lmnt.com/settings/api). The LMNT API version that this client was built against. Use `1.1` ## Parameters The text to generate speech from; max 5000 characters per request (including spaces). The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`. When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground. The desired output format of the audio. If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding. Streamable formats: - `mp3`: 96kbps MP3 audio. - `ulaw`: 8-bit G711 µ-law audio with a WAV header. - `webm`: WebM format with Opus audio codec. - `pcm_s16le`: PCM signed 16-bit little-endian audio. - `pcm_f32le`: PCM 32-bit floating-point little-endian audio. Non-streamable formats: - `aac`: AAC audio codec. - `wav`: 16-bit PCM audio in WAV container. Allowed values: `aac`, `mp3`, `ulaw`, `wav`, `webm`, `pcm_s16le`, `pcm_f32le` The desired language. Two letter ISO 639-1 code. Defaults to auto language detection, but specifying the language is recommended for faster generation. 
Allowed values: `auto`, `ar`, `as`, `bn`, `cs`, `da`, `de`, `en`, `es`, `fi`, `fr`, `hi`, `id`, `it`, `ja`, `ko`, `ml`, `mr`, `nl`, `pl`, `pt`, `ru`, `sk`, `sv`, `ta`, `te`, `th`, `tr`, `uk`, `ur`, `vi`, `zh` The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview). Allowed values: `blizzard` If set to `true`, the response will contain a `timestamps` array describing where each input element falls in the generated audio. The desired output sample rate in Hz. Defaults to `24000` for all formats except `ulaw` which defaults to `8000`. Allowed values: 8000, 16000, 24000 Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles. Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns. ## Returns The base64-encoded audio file; the format is determined by the `format` parameter. An array describing where each generated input element (words and non-words like spaces, punctuation, etc.) falls in the audio. The generated input element; beginning and ending with a short silence. The spoken duration of the generated input element, in seconds. The start time of the generated input element, in seconds. ## Example ```sh curl --request POST \ --url https://api.lmnt.com/v1/ai/speech \ --header "X-API-Key: $LMNT_API_KEY" \ --header 'lmnt-version: 1.1' \ --header 'Content-Type: application/json' \ --data '{ "text": "Uhh, did you see the weather in Palo Alto tomorrow? Yeah, can'\''t believe it'\''s gonna rain, dude.
Like what?", "voice": "leah", "format": "mp3", "return_timestamps": true }' \ | jq -r .audio | base64 -d > hello.mp3 ``` --- # Generate speech with timestamps (Python) URL: https://docs.lmnt.com/api/python/speech/generate-detailed Generates speech from text and returns a JSON object that contains a base64-encoded audio string and optionally word-level timestamps. This endpoint waits for all speech to be generated before responding, so it is not ideal for latency-sensitive applications. --- {/* Generated by carbonsteel. DO NOT EDIT. */} ```python speech.generate_detailed(**kwargs: SpeechGenerateDetailedParams) -> SpeechGenerateDetailedResponse ``` **post** `/v1/ai/speech` Generates speech from text and returns a JSON object that contains a base64-encoded audio string and optionally word-level timestamps. This endpoint waits for all speech to be generated before responding, so it is not ideal for latency-sensitive applications. ## Parameters The text to generate speech from; max 5000 characters per request (including spaces). The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`. When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground. The desired output format of the audio. If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding. Streamable formats: - `mp3`: 96kbps MP3 audio. - `ulaw`: 8-bit G711 µ-law audio with a WAV header. - `webm`: WebM format with Opus audio codec. - `pcm_s16le`: PCM signed 16-bit little-endian audio. - `pcm_f32le`: PCM 32-bit floating-point little-endian audio. Non-streamable formats: - `aac`: AAC audio codec. - `wav`: 16-bit PCM audio in WAV container. The desired language. Two letter ISO 639-1 code. 
Defaults to auto language detection, but specifying the language is recommended for faster generation. The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview). If set to `true`, the response will contain a `timestamps` array describing where each input element falls in the generated audio. The desired output sample rate in Hz. Defaults to `24000` for all formats except `ulaw` which defaults to `8000`. Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles. Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns. ## Returns The base64-encoded audio file; the format is determined by the `format` parameter. An array describing where each generated input element (words and non-words like spaces, punctuation, etc.) falls in the audio. The generated input element; beginning and ending with a short silence. The spoken duration of the generated input element, in seconds. The start time of the generated input element, in seconds. ## Example ```python import asyncio import base64 import os from lmnt import AsyncLmnt async def main(): client = AsyncLmnt( api_key=os.environ.get('LMNT_API_KEY'), # This is the default and can be omitted ) response = await client.speech.generate_detailed( text=( "Uhh, did you see the weather in Palo Alto tomorrow? " "Yeah, can't believe it's gonna rain, dude.
), voice='leah', format='mp3', return_timestamps=True, ) with open('hello.mp3', 'wb') as f: f.write(base64.b64decode(response.audio)) print(response.timestamps) asyncio.run(main()) ``` --- # Generate speech with timestamps (TypeScript) URL: https://docs.lmnt.com/api/typescript/speech/generate-detailed Generates speech from text and returns a JSON object that contains a base64-encoded audio string and optionally word-level timestamps. This endpoint waits for all speech to be generated before responding, so it is not ideal for latency-sensitive applications. --- {/* Generated by carbonsteel. DO NOT EDIT. */} ```typescript speech.generateDetailed(body: SpeechGenerateDetailedParams): Promise ``` **post** `/v1/ai/speech` Generates speech from text and returns a JSON object that contains a base64-encoded audio string and optionally word-level timestamps. This endpoint waits for all speech to be generated before responding, so it is not ideal for latency-sensitive applications. ## Parameters The text to generate speech from; max 5000 characters per request (including spaces). The voice id of the voice to use; voice ids can be retrieved by calls to `List voices` or `Voice info`. When set to true, the generated speech will also be saved to your [clip library](https://app.lmnt.com/clips) in the LMNT playground. The desired output format of the audio. If you are using a streaming endpoint, you'll generate audio faster by selecting a streamable format since chunks are encoded and returned as they're generated. For non-streamable formats, all speech will be generated before encoding. Streamable formats: - `mp3`: 96kbps MP3 audio. - `ulaw`: 8-bit G711 µ-law audio with a WAV header. - `webm`: WebM format with Opus audio codec. - `pcm_s16le`: PCM signed 16-bit little-endian audio. - `pcm_f32le`: PCM 32-bit floating-point little-endian audio. Non-streamable formats: - `aac`: AAC audio codec. - `wav`: 16-bit PCM audio in WAV container. The desired language. 
Two-letter ISO 639-1 code. Defaults to auto language detection, but specifying the language is recommended for faster generation. The model to use for speech generation. Learn more about models [here](https://docs.lmnt.com/models/overview). If set to `true`, the response will contain a `timestamps` array describing where each input element falls in the generated audio. The desired output sample rate in Hz. Defaults to `24000` for all formats except `ulaw` which defaults to `8000`. Influences how expressive and emotionally varied the speech becomes. Lower values (like 0.3) create more neutral, consistent speaking styles. Higher values (like 1.0) allow for more dynamic emotional range and speaking styles. Controls the stability of the generated speech. A lower value (like 0.3) produces more consistent, reliable speech. A higher value (like 0.9) gives more flexibility in how words are spoken, but might occasionally produce unusual intonations or speech patterns. ## Returns The base64-encoded audio file; the format is determined by the `format` parameter. An array describing where each generated input element (words and non-words like spaces, punctuation, etc.) falls in the audio. The generated input element; beginning and ending with a short silence. The spoken duration of the generated input element, in seconds. The start time of the generated input element, in seconds. ## Example ```typescript import { writeFileSync } from 'fs'; import Lmnt from 'lmnt-node'; const client = new Lmnt({ apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted }); const response = await client.speech.generateDetailed({ text: "Uhh, did you see the weather in Palo Alto tomorrow? " + "Yeah, can't believe it's gonna rain, dude.
Like what?", voice: 'leah', format: 'mp3', return_timestamps: true, }); writeFileSync('hello.mp3', Buffer.from(response.audio, 'base64')); console.log(response.timestamps); ``` ### Speech Sessions --- # Speech Sessions URL: https://docs.lmnt.com/api/speech-sessions --- {/* Generated by carbonsteel. DO NOT EDIT. */} ## Create speech session Stream text to our servers and receive generated speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront. {/* @md-only */} ### Init First message sent to server to establish session with configuration details. Discriminator identifying this message type. Your API key obtained from your account page The LMNT API version that this client was built against. The voice ID to use for speech generation, obtained from 'List voices' API The desired output format of the audio. - `mp3`: 96kbps MP3 audio. - `pcm_s16le`: PCM signed 16-bit little-endian audio. - `pcm_f32le`: PCM 32-bit floating-point little-endian audio. - `ulaw`: 8-bit G711 µ-law audio with a WAV header. - `webm`: WebM format with Opus audio codec. The desired output audio sample rate Controls whether the server will return timestamps for the generated speech ### Send Send text to the server to append into the text stream. The text you send can be split at any point. For example, sending `This is a test of the emergency broadcast system` is semantically equivalent to sending `This is a test of the eme` and `rgency broadcast system` separately. Discriminator identifying this message type. The text to generate speech from Force the server to generate speech for all buffered text in the stream. The server replies with a `flush_complete` carrying a matching `nonce` once it has finished streaming the flushed audio. Be careful when using `flush`. Our models are designed to factor in context when generating speech. When flushing the buffer at arbitrary points, your speech may sound less natural. 
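One way to keep track of outstanding flushes is to tag each `flush` with a fresh nonce and tick it off when the matching `flush_complete` arrives. The sketch below is an illustration under stated assumptions, not the SDK's implementation: the `FlushTracker` class is hypothetical, the discriminator field is assumed to be named `type` (as in the `text` and `finish` messages in the example on this page), the nonce field is assumed to be named `nonce`, and an integer is assumed to be an acceptable nonce value.

```python
import itertools
import json


class FlushTracker:
    """Hypothetical helper that pairs `flush` requests with `flush_complete` replies."""

    def __init__(self):
        self._counter = itertools.count(1)  # source of fresh nonces
        self._pending = set()               # nonces still awaiting a reply

    def flush_message(self) -> str:
        """Build a JSON `flush` message carrying a new nonce."""
        nonce = next(self._counter)
        self._pending.add(nonce)
        # Assumed message shape: {'type': 'flush', 'nonce': <int>}
        return json.dumps({'type': 'flush', 'nonce': nonce})

    def handle(self, raw: str) -> bool:
        """Return True if `raw` is a `flush_complete` for a pending nonce."""
        message = json.loads(raw)
        if message.get('type') != 'flush_complete':
            return False
        nonce = message.get('nonce')
        if nonce in self._pending:
            self._pending.discard(nonce)
            return True
        return False

    @property
    def pending(self) -> set:
        """Copy of the nonces that have not been acknowledged yet."""
        return set(self._pending)
```

With a raw WebSocket connection, the string returned by `flush_message()` would be passed to `ws.send`, and each incoming text frame would be fed to `handle`. Because of the caveat above, it is probably wise to flush only at natural pauses rather than at arbitrary points in the text.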
Discriminator identifying this message type. Client-chosen nonce; the server returns it in the matching `flush_complete`. Drop the server's buffered text without generating speech for it. The server replies with a `reset_complete` carrying the matching `nonce` once the buffer has been cleared. You do not need to wait for `reset_complete` to begin sending more text. Discriminator identifying this message type. Client-chosen nonce; the server returns it in the matching `reset_complete`. Inform the server you're done appending text to this session and want it to close when the server has finished dispatching speech. Discriminator identifying this message type. Signal the server that no more text will be sent. ### Receive First message sent by the server, confirming the session is established. Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Binary audio data returned from the server. Timestamps for the audio chunk that was just streamed, if requested in `init`. Discriminator identifying this message type. Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. Acknowledgement that a flush command has been completed. The `nonce` matches the one carried by the original flush, allowing you to determine when it has completed. Discriminator identifying this message type. The nonce that was carried by the original `flush`. Acknowledgement that a reset command has been completed. The `nonce` matches the one carried by the original reset, allowing you to discard any remaining in-flight speech before the reset. Discriminator identifying this message type. The nonce that was carried by the original `reset`. Error envelope returned by the server. Connection closes immediately afterward. Discriminator identifying this message type. Per-session request ID. 
If you're reporting an issue, include this if possible to make debugging easier. ### Example ```sh python3 <<'EOF' > hello.mp3 import asyncio, json, os, sys, websockets async def main(): async with websockets.connect('wss://api.lmnt.com/v1/ai/speech/stream') as ws: await ws.send(json.dumps({ 'type': 'init', 'X-API-Key': os.environ['LMNT_API_KEY'], 'lmnt-version': '1.1', 'voice': 'leah', 'format': 'mp3', })) await ws.send(json.dumps({'type': 'text', 'text': 'Uhh, '})) await ws.send(json.dumps({'type': 'text', 'text': 'did you '})) await ws.send(json.dumps({'type': 'text', 'text': 'see the '})) await ws.send(json.dumps({'type': 'text', 'text': 'weather in '})) await ws.send(json.dumps({'type': 'text', 'text': 'Palo Alto '})) await ws.send(json.dumps({'type': 'text', 'text': 'tomorrow? '})) await ws.send(json.dumps({'type': 'text', 'text': 'Yeah, '})) await ws.send(json.dumps({'type': 'text', 'text': "can't believe "})) await ws.send(json.dumps({'type': 'text', 'text': "it's gonna rain, "})) await ws.send(json.dumps({'type': 'text', 'text': 'dude. Like what?'})) await ws.send(json.dumps({'type': 'finish'})) async for msg in ws: if isinstance(msg, bytes): sys.stdout.buffer.write(msg) asyncio.run(main()) EOF ``` {/* @end */} Discriminator identifying this message type. Allowed values: `error` Slug identifying the error category. Allowed values: `invalid_request_error`, `authentication_error`, `permission_error`, `not_found_error`, `payment_required_error`, `rate_limit_error`, `internal_server_error` Human-readable error message. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Slug identifying the error category. Allowed values: `invalid_request_error`, `authentication_error`, `permission_error`, `not_found_error`, `payment_required_error`, `rate_limit_error`, `internal_server_error` Human-readable error message. Discriminator identifying this message type. Signal the server that no more text will be sent. 
Allowed values: `finish` Discriminator identifying this message type. Allowed values: `flush` Client-chosen nonce; the server returns it in the matching `flush_complete`. Discriminator identifying this message type. Allowed values: `flush_complete` The nonce that was carried by the original `flush`. Discriminator identifying this message type. Allowed values: `init` Your API key obtained from your account page The LMNT API version that this client was built against. The voice ID to use for speech generation, obtained from 'List voices' API The desired output format of the audio. - `mp3`: 96kbps MP3 audio. - `pcm_s16le`: PCM signed 16-bit little-endian audio. - `pcm_f32le`: PCM 32-bit floating-point little-endian audio. - `ulaw`: 8-bit G711 µ-law audio with a WAV header. - `webm`: WebM format with Opus audio codec. Allowed values: `mp3`, `pcm_s16le`, `pcm_f32le`, `ulaw`, `webm` Allowed values: `auto`, `ar`, `as`, `bn`, `cs`, `da`, `de`, `en`, `es`, `fi`, `fr`, `hi`, `id`, `it`, `ja`, `ko`, `ml`, `mr`, `nl`, `pl`, `pt`, `ru`, `sk`, `sv`, `ta`, `te`, `th`, `tr`, `uk`, `ur`, `vi`, `zh` The desired output audio sample rate Allowed values: 24000, 16000, 8000 Controls whether the server will return timestamps for the generated speech Discriminator identifying this message type. Allowed values: `ready` Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Discriminator identifying this message type. Allowed values: `reset` Client-chosen nonce; the server returns it in the matching `reset_complete`. Discriminator identifying this message type. Allowed values: `reset_complete` The nonce that was carried by the original `reset`. Discriminator identifying this message type. Allowed values: `text` The text to generate speech from The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds Discriminator identifying this message type. 
Allowed values: `timestamps` Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds --- # Speech Sessions (Python) URL: https://docs.lmnt.com/api/python/speech-sessions --- {/* Generated by carbonsteel. DO NOT EDIT. */} ## Create speech session Stream text to our servers and receive generated speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront. {/* @md-only */} ### Parameters The voice ID to use for speech generation, obtained from 'List voices' API The desired output format of the audio. - `mp3`: 96kbps MP3 audio. - `pcm_s16le`: PCM signed 16-bit little-endian audio. - `pcm_f32le`: PCM 32-bit floating-point little-endian audio. - `ulaw`: 8-bit G711 µ-law audio with a WAV header. - `webm`: WebM format with Opus audio codec. Controls whether the server will return timestamps for the generated speech The desired output audio sample rate ### Returns First message sent by the server, confirming the session is established. Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Binary audio data returned from the server. Binary audio data returned from the server Timestamps for the audio chunk that was just streamed, if requested in `init`. Discriminator identifying this message type. Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds Acknowledgement that a flush command has been completed. The `nonce` matches the one carried by the original flush, allowing you to determine when it has completed.
Discriminator identifying this message type. The nonce that was carried by the original `flush`. Acknowledgement that a reset command has been completed. The `nonce` matches the one carried by the original reset, allowing you to discard any remaining in-flight speech before the reset. Discriminator identifying this message type. The nonce that was carried by the original `reset`. Error envelope returned by the server. Connection closes immediately afterward. Discriminator identifying this message type. Slug identifying the error category. Human-readable error message. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. ### Send Send text to the server to append into the text stream. The text you send can be split at any point. For example, sending `This is a test of the emergency broadcast system` is semantically equivalent to sending `This is a test of the eme` and `rgency broadcast system` separately. Force the server to generate speech for all buffered text in the stream. The server replies with a `flush_complete` carrying a matching `nonce` once it has finished streaming the flushed audio. Be careful when using `flush`. Our models are designed to factor in context when generating speech. When flushing the buffer at arbitrary points, your speech may sound less natural. Drop the server's buffered text without generating speech for it. The server replies with a `reset_complete` carrying the matching `nonce` once the buffer has been cleared. You do not need to wait for `reset_complete` to begin sending more text. Inform the server you're done appending text to this session and want it to close when the server has finished dispatching speech. ### Receive First message sent by the server, confirming the session is established. Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. 
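The messages described in this section can be handled by branching on the message type. The sketch below is a minimal illustration rather than part of the SDK: `handle_message` and `SessionError` are hypothetical names, and it assumes each message exposes a `type` attribute and that audio messages expose their bytes as `audio`, as in the example on this page. The stand-in messages are plain `SimpleNamespace` objects so the block runs without a network connection.

```python
from types import SimpleNamespace


class SessionError(Exception):
    """Hypothetical exception raised when the server sends an `error` message."""


def handle_message(message, audio: bytearray) -> str:
    """Process one received message and return its type.

    Audio bytes are appended to `audio`. An `error` message raises
    SessionError, since the connection closes right after it.
    """
    if message.type == 'audio':
        audio.extend(message.audio)
    elif message.type == 'error':
        raise SessionError(repr(message))
    # `ready`, `timestamps`, `flush_complete`, and `reset_complete` fall
    # through here; a real client would branch on them as needed.
    return message.type


# Stand-in messages for illustration; a real loop would iterate the session.
buffer = bytearray()
for msg in [SimpleNamespace(type='ready'),
            SimpleNamespace(type='audio', audio=b'\x00\x01')]:
    handle_message(msg, buffer)
```

In an SDK-based client this function would be called as `handle_message(message, buffer)` inside `async for message in session`.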
Binary audio data returned from the server. Binary audio data returned from the server Timestamps for the audio chunk that was just streamed, if requested in `init`. Discriminator identifying this message type. Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds Acknowledgement that a flush command has been completed. The `nonce` matches the one carried by the original flush, allowing you to determine when it has completed. Discriminator identifying this message type. The nonce that was carried by the original `flush`. Acknowledgement that a reset command has been completed. The `nonce` matches the one carried by the original reset, allowing you to discard any remaining in-flight speech before the reset. Discriminator identifying this message type. The nonce that was carried by the original `reset`. Error envelope returned by the server. Connection closes immediately afterward. Discriminator identifying this message type. Slug identifying the error category. Human-readable error message. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. ### Example ```python import asyncio import os from lmnt import AsyncLmnt async def main(): client = AsyncLmnt( api_key=os.environ.get('LMNT_API_KEY'), # This is the default and can be omitted ) session = await client.speech.sessions.create( voice='leah', format='mp3', ) async def writer(): await session.send_text('Uhh, ') await session.send_text('did you ') await session.send_text('see the ') await session.send_text('weather in ') await session.send_text('Palo Alto ') await session.send_text('tomorrow? ') await session.send_text('Yeah, ') await session.send_text("can't believe ") await session.send_text("it's gonna rain, ") await session.send_text('dude. 
Like what?') await session.send_finish() async def reader(): with open('hello.mp3', 'wb') as f: async for message in session: if message.type == 'audio': f.write(message.audio) await asyncio.gather(writer(), reader()) await session.close() asyncio.run(main()) ``` {/* @end */} Discriminator identifying this message type. Slug identifying the error category. Human-readable error message. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Slug identifying the error category. Human-readable error message. Discriminator identifying this message type. Signal the server that no more text will be sent. Discriminator identifying this message type. Client-chosen nonce; the server returns it in the matching `flush_complete`. Discriminator identifying this message type. The nonce that was carried by the original `flush`. Discriminator identifying this message type. Your API key obtained from your account page The LMNT API version that this client was built against. The voice ID to use for speech generation, obtained from 'List voices' API The desired output format of the audio. - `mp3`: 96kbps MP3 audio. - `pcm_s16le`: PCM signed 16-bit little-endian audio. - `pcm_f32le`: PCM 32-bit floating-point little-endian audio. - `ulaw`: 8-bit G711 µ-law audio with a WAV header. - `webm`: WebM format with Opus audio codec. The desired output audio sample rate Controls whether the server will return timestamps for the generated speech Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Discriminator identifying this message type. Client-chosen nonce; the server returns it in the matching `reset_complete`. Discriminator identifying this message type. The nonce that was carried by the original `reset`. Discriminator identifying this message type. 
The text to generate speech from The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds Discriminator identifying this message type. Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds --- # Speech Sessions (TypeScript) URL: https://docs.lmnt.com/api/typescript/speech-sessions --- {/* Generated by carbonsteel. DO NOT EDIT. */} ## Create speech session Stream text to our servers and receive generated speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront. {/* @md-only */} ### Parameters The voice ID to use for speech generation, obtained from 'List voices' API The desired output format of the audio. - `mp3`: 96kbps MP3 audio. - `pcm_s16le`: PCM signed 16-bit little-endian audio. - `pcm_f32le`: PCM 32-bit floating-point little-endian audio. - `ulaw`: 8-bit G711 µ-law audio with a WAV header. - `webm`: WebM format with Opus audio codec. Controls whether the server will return timestamps for the generated speech The desired output audio sample rate ### Returns First message sent by the server, confirming the session is established. Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Binary audio data returned from the server. Binary audio data returned from the server Timestamps for the audio chunk that was just streamed, if requested in `init`. Discriminator identifying this message type. Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. 
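If the per-chunk reset means that start times are measured from the beginning of the audio chunk that was just streamed, a client that wants positions on a single timeline has to add the duration of all earlier chunks. The sketch below shows one way to do that, under several assumptions that this section does not confirm: timestamp objects are treated as dictionaries with hypothetical keys `text`, `start`, and `duration`; the audio is raw `pcm_s16le` (2 bytes per sample) at 24000 Hz with a single channel; and each timestamps array is paired with the chunk it describes. The function names are hypothetical.

```python
def chunk_seconds(num_bytes, sample_rate=24000, bytes_per_sample=2, channels=1):
    """Duration in seconds of a raw PCM chunk of `num_bytes` bytes."""
    return num_bytes / (sample_rate * bytes_per_sample * channels)


def to_absolute(chunks):
    """Shift chunk-relative start times onto one continuous timeline.

    `chunks` is an iterable of (audio_bytes, timestamps) pairs, where
    `timestamps` is a list of dicts with a chunk-relative 'start' key.
    """
    offset = 0.0  # total duration of the chunks seen so far
    absolute = []
    for audio, timestamps in chunks:
        for ts in timestamps:
            # Copy the entry and move its start by the running offset.
            absolute.append({**ts, 'start': ts['start'] + offset})
        offset += chunk_seconds(len(audio))
    return absolute
```

For compressed formats such as `mp3` the byte length does not map directly to a duration, so this approach would need a decoder or a different source for the chunk length.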
The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds Acknowledgement that a flush command has been completed. The `nonce` matches the one carried by the original flush, allowing you to determine when it has completed. Discriminator identifying this message type. The nonce that was carried by the original `flush`. Acknowledgement that a reset command has been completed. The `nonce` matches the one carried by the original reset, allowing you to discard any remaining in-flight speech before the reset. Discriminator identifying this message type. The nonce that was carried by the original `reset`. Error envelope returned by the server. Connection closes immediately afterward. Discriminator identifying this message type. Slug identifying the error category. Human-readable error message. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. ### Send Send text to the server to append into the text stream. The text you send can be split at any point. For example, sending `This is a test of the emergency broadcast system` is semantically equivalent to sending `This is a test of the eme` and `rgency broadcast system` separately. Force the server to generate speech for all buffered text in the stream. The server replies with a `flush_complete` carrying a matching `nonce` once it has finished streaming the flushed audio. Be careful when using `flush`. Our models are designed to factor in context when generating speech. When flushing the buffer at arbitrary points, your speech may sound less natural. Drop the server's buffered text without generating speech for it. The server replies with a `reset_complete` carrying the matching `nonce` once the buffer has been cleared. You do not need to wait for `reset_complete` to begin sending more text. 
Inform the server you're done appending text to this session and want it to close when the server has finished dispatching speech. ### Receive First message sent by the server, confirming the session is established. Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Binary audio data returned from the server. Binary audio data returned from the server Timestamps for the audio chunk that was just streamed, if requested in `init`. Discriminator identifying this message type. Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds Acknowledgement that a flush command has been completed. The `nonce` matches the one carried by the original flush, allowing you to determine when it has completed. Discriminator identifying this message type. The nonce that was carried by the original `flush`. Acknowledgement that a reset command has been completed. The `nonce` matches the one carried by the original reset, allowing you to discard any remaining in-flight speech before the reset. Discriminator identifying this message type. The nonce that was carried by the original `reset`. Error envelope returned by the server. Connection closes immediately afterward. Discriminator identifying this message type. Slug identifying the error category. Human-readable error message. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. 
### Example

```typescript
import { createWriteStream } from 'fs';
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const session = client.speech.sessions.create({
  voice: 'leah',
  format: 'mp3',
});

async function writer() {
  session.sendText('Uhh, ');
  session.sendText('did you ');
  session.sendText('see the ');
  session.sendText('weather in ');
  session.sendText('Palo Alto ');
  session.sendText('tomorrow? ');
  session.sendText('Yeah, ');
  session.sendText("can't believe ");
  session.sendText("it's gonna rain, ");
  session.sendText('dude. Like what?');
  session.sendFinish();
}

async function reader() {
  const out = createWriteStream('hello.mp3');
  for await (const message of session) {
    if (message.type === 'audio') {
      out.write(Buffer.from(message.audio as ArrayBuffer));
    }
  }
  out.end();
}

await Promise.all([writer(), reader()]);
session.close();
```

{/* @end */}

Discriminator identifying this message type. Slug identifying the error category. Human-readable error message. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Slug identifying the error category. Human-readable error message. Discriminator identifying this message type. Signal the server that no more text will be sent. Discriminator identifying this message type. Client-chosen nonce; the server returns it in the matching `flush_complete`. Discriminator identifying this message type. The nonce that was carried by the original `flush`. Discriminator identifying this message type. Your API key obtained from your account page. The LMNT API version that this client was built against. The voice ID to use for speech generation, obtained from 'List voices' API. The desired output format of the audio.

- `mp3`: 96kbps MP3 audio.
- `pcm_s16le`: PCM signed 16-bit little-endian audio.
- `pcm_f32le`: PCM 32-bit floating-point little-endian audio.
- `ulaw`: 8-bit G711 µ-law audio with a WAV header. - `webm`: WebM format with Opus audio codec. The desired output audio sample rate Controls whether the server will return timestamps for the generated speech Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Discriminator identifying this message type. Client-chosen nonce; the server returns it in the matching `reset_complete`. Discriminator identifying this message type. The nonce that was carried by the original `reset`. Discriminator identifying this message type. The text to generate speech from The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds Discriminator identifying this message type. Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds --- # Create speech session URL: https://docs.lmnt.com/api/speech-sessions/create Stream text to our servers and receive generated speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront. --- {/* Generated by carbonsteel. DO NOT EDIT. */} **wss** `wss://api.lmnt.com/v1/ai/speech/stream` Stream text to our servers and receive generated speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront. ## Init First message sent to server to establish session with configuration details. Discriminator identifying this message type. Your API key obtained from your account page The LMNT API version that this client was built against. The voice ID to use for speech generation, obtained from 'List voices' API The desired output format of the audio. - `mp3`: 96kbps MP3 audio. 
- `pcm_s16le`: PCM signed 16-bit little-endian audio. - `pcm_f32le`: PCM 32-bit floating-point little-endian audio. - `ulaw`: 8-bit G711 µ-law audio with a WAV header. - `webm`: WebM format with Opus audio codec. The desired output audio sample rate Controls whether the server will return timestamps for the generated speech ## Send Send text to the server to append into the text stream. The text you send can be split at any point. For example, sending `This is a test of the emergency broadcast system` is semantically equivalent to sending `This is a test of the eme` and `rgency broadcast system` separately. Discriminator identifying this message type. The text to generate speech from Force the server to generate speech for all buffered text in the stream. The server replies with a `flush_complete` carrying a matching `nonce` once it has finished streaming the flushed audio. Be careful when using `flush`. Our models are designed to factor in context when generating speech. When flushing the buffer at arbitrary points, your speech may sound less natural. Discriminator identifying this message type. Client-chosen nonce; the server returns it in the matching `flush_complete`. Drop the server's buffered text without generating speech for it. The server replies with a `reset_complete` carrying the matching `nonce` once the buffer has been cleared. You do not need to wait for `reset_complete` to begin sending more text. Discriminator identifying this message type. Client-chosen nonce; the server returns it in the matching `reset_complete`. Inform the server you're done appending text to this session and want it to close when the server has finished dispatching speech. Discriminator identifying this message type. Signal the server that no more text will be sent. ## Receive First message sent by the server, confirming the session is established. Discriminator identifying this message type. Per-session request ID. 
If you're reporting an issue, include this if possible to make debugging easier. Binary audio data returned from the server. Timestamps for the audio chunk that was just streamed, if requested in `init`. Discriminator identifying this message type. Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. Acknowledgement that a flush command has been completed. The `nonce` matches the one carried by the original flush, allowing you to determine when it has completed. Discriminator identifying this message type. The nonce that was carried by the original `flush`. Acknowledgement that a reset command has been completed. The `nonce` matches the one carried by the original reset, allowing you to discard any remaining in-flight speech before the reset. Discriminator identifying this message type. The nonce that was carried by the original `reset`. Error envelope returned by the server. Connection closes immediately afterward. Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier.

## Example

```sh
python3 <<'EOF' > hello.mp3
import asyncio, json, os, sys, websockets

async def main():
    async with websockets.connect('wss://api.lmnt.com/v1/ai/speech/stream') as ws:
        await ws.send(json.dumps({
            'type': 'init',
            'X-API-Key': os.environ['LMNT_API_KEY'],
            'lmnt-version': '1.1',
            'voice': 'leah',
            'format': 'mp3',
        }))
        await ws.send(json.dumps({'type': 'text', 'text': 'Uhh, '}))
        await ws.send(json.dumps({'type': 'text', 'text': 'did you '}))
        await ws.send(json.dumps({'type': 'text', 'text': 'see the '}))
        await ws.send(json.dumps({'type': 'text', 'text': 'weather in '}))
        await ws.send(json.dumps({'type': 'text', 'text': 'Palo Alto '}))
        await ws.send(json.dumps({'type': 'text', 'text': 'tomorrow? '}))
        await ws.send(json.dumps({'type': 'text', 'text': 'Yeah, '}))
        await ws.send(json.dumps({'type': 'text', 'text': "can't believe "}))
        await ws.send(json.dumps({'type': 'text', 'text': "it's gonna rain, "}))
        await ws.send(json.dumps({'type': 'text', 'text': 'dude. Like what?'}))
        await ws.send(json.dumps({'type': 'finish'}))
        async for msg in ws:
            if isinstance(msg, bytes):
                sys.stdout.buffer.write(msg)

asyncio.run(main())
EOF
```

---

# Create speech session (Python)

URL: https://docs.lmnt.com/api/python/speech-sessions/create

Stream text to our servers and receive generated speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```python
speech.sessions.create(**kwargs) -> SpeechSession
```

**wss** `wss://api.lmnt.com/v1/ai/speech/stream`

Stream text to our servers and receive generated speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront.

## Parameters

The voice ID to use for speech generation, obtained from 'List voices' API. The desired output format of the audio.

- `mp3`: 96kbps MP3 audio.
- `pcm_s16le`: PCM signed 16-bit little-endian audio.
- `pcm_f32le`: PCM 32-bit floating-point little-endian audio.
- `ulaw`: 8-bit G711 µ-law audio with a WAV header.
- `webm`: WebM format with Opus audio codec.

Controls whether the server will return timestamps for the generated speech. The desired output audio sample rate.

## Returns

First message sent by the server, confirming the session is established. Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Binary audio data returned from the server. Binary audio data returned from the server. Timestamps for the audio chunk that was just streamed, if requested in `init`. Discriminator identifying this message type.
Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds Acknowledgement that a flush command has been completed. The `nonce` matches the one carried by the original flush, allowing you to determine when it has completed. Discriminator identifying this message type. The nonce that was carried by the original `flush`. Acknowledgement that a reset command has been completed. The `nonce` matches the one carried by the original reset, allowing you to discard any remaining in-flight speech before the reset. Discriminator identifying this message type. The nonce that was carried by the original `reset`. Error envelope returned by the server. Connection closes immediately afterward. Discriminator identifying this message type. Slug identifying the error category. Human-readable error message. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. ## Send Send text to the server to append into the text stream. The text you send can be split at any point. For example, sending `This is a test of the emergency broadcast system` is semantically equivalent to sending `This is a test of the eme` and `rgency broadcast system` separately. Force the server to generate speech for all buffered text in the stream. The server replies with a `flush_complete` carrying a matching `nonce` once it has finished streaming the flushed audio. Be careful when using `flush`. Our models are designed to factor in context when generating speech. When flushing the buffer at arbitrary points, your speech may sound less natural. Drop the server's buffered text without generating speech for it. The server replies with a `reset_complete` carrying the matching `nonce` once the buffer has been cleared. You do not need to wait for `reset_complete` to begin sending more text. 
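Because you do not have to wait for `reset_complete` before sending more text, a common pattern is to interrupt the current utterance, keep writing, and drop stale audio on the reading side. The sketch below illustrates the reading side only. It uses plain dictionaries as stand-ins for session messages, and it assumes that any audio received before the matching `reset_complete` belongs to the discarded utterance. The function name `fresh_audio` is hypothetical.

```python
def fresh_audio(messages, reset_nonce):
    """Yield audio payloads that arrive after the reset_complete for reset_nonce."""
    waiting = True
    for message in messages:
        kind = message.get('type')
        if waiting:
            # Everything before the matching acknowledgement is treated as stale.
            if kind == 'reset_complete' and message.get('nonce') == reset_nonce:
                waiting = False
        elif kind == 'audio':
            yield message['audio']


# Simulated message stream; a real client would iterate over the session instead.
stream = [
    {'type': 'audio', 'audio': b'old'},
    {'type': 'reset_complete', 'nonce': 1},
    {'type': 'audio', 'audio': b'stale'},
    {'type': 'reset_complete', 'nonce': 2},
    {'type': 'audio', 'audio': b'new'},
]
kept = list(fresh_audio(stream, reset_nonce=2))
```

Matching on the nonce rather than on the first `reset_complete` matters when several resets are issued in quick succession: only the acknowledgement for the most recent one marks the point where audio becomes current again.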
Inform the server you're done appending text to this session and want it to close when the server has finished dispatching speech. ## Receive First message sent by the server, confirming the session is established. Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Binary audio data returned from the server. Binary audio data returned from the server Timestamps for the audio chunk that was just streamed, if requested in `init`. Discriminator identifying this message type. Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds Acknowledgement that a flush command has been completed. The `nonce` matches the one carried by the original flush, allowing you to determine when it has completed. Discriminator identifying this message type. The nonce that was carried by the original `flush`. Acknowledgement that a reset command has been completed. The `nonce` matches the one carried by the original reset, allowing you to discard any remaining in-flight speech before the reset. Discriminator identifying this message type. The nonce that was carried by the original `reset`. Error envelope returned by the server. Connection closes immediately afterward. Discriminator identifying this message type. Slug identifying the error category. Human-readable error message. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. 
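Since the timestamps array resets its start time for each chunk of audio, placing words on a single timeline means adding each chunk's offset yourself. The sketch below is one way to do that in Python. The key names `text`, `start`, and `duration`, the helper names, the assumption of mono audio, and the 24000 Hz sample rate are all illustrative assumptions rather than documented values. It also assumes each timestamps message describes the audio chunk received just before it.

```python
def pcm_s16le_seconds(num_bytes, sample_rate):
    """Duration of 16-bit PCM audio at 2 bytes per sample (mono is assumed)."""
    return num_bytes / (2 * sample_rate)


def to_absolute(chunks):
    """Shift per-chunk timestamps onto one timeline.

    chunks: iterable of (chunk_seconds, timestamps) pairs in arrival order.
    """
    offset = 0.0
    merged = []
    for chunk_seconds, timestamps in chunks:
        for ts in timestamps:
            merged.append({**ts, 'start': offset + ts['start']})
        offset += chunk_seconds
    return merged


chunks = [
    (pcm_s16le_seconds(48000, 24000), [{'text': 'Hello', 'start': 0.0, 'duration': 0.5}]),
    (pcm_s16le_seconds(24000, 24000), [{'text': 'world', 'start': 0.25, 'duration': 0.25}]),
]
timeline = to_absolute(chunks)
```

For compressed formats such as `mp3`, the chunk duration cannot be derived from the byte count this way, so a decoder would have to supply it.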
## Example

```python
import asyncio
import os
from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )

    session = await client.speech.sessions.create(
        voice='leah',
        format='mp3',
    )

    async def writer():
        await session.send_text('Uhh, ')
        await session.send_text('did you ')
        await session.send_text('see the ')
        await session.send_text('weather in ')
        await session.send_text('Palo Alto ')
        await session.send_text('tomorrow? ')
        await session.send_text('Yeah, ')
        await session.send_text("can't believe ")
        await session.send_text("it's gonna rain, ")
        await session.send_text('dude. Like what?')
        await session.send_finish()

    async def reader():
        with open('hello.mp3', 'wb') as f:
            async for message in session:
                if message.type == 'audio':
                    f.write(message.audio)

    await asyncio.gather(writer(), reader())
    await session.close()


asyncio.run(main())
```

---

# Create speech session (TypeScript)

URL: https://docs.lmnt.com/api/typescript/speech-sessions/create

Stream text to our servers and receive generated speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```typescript
speech.sessions.create(body: SpeechSessionParams): SpeechSession
```

**wss** `wss://api.lmnt.com/v1/ai/speech/stream`

Stream text to our servers and receive generated speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront.

## Parameters

The voice ID to use for speech generation, obtained from 'List voices' API. The desired output format of the audio.

- `mp3`: 96kbps MP3 audio.
- `pcm_s16le`: PCM signed 16-bit little-endian audio.
- `pcm_f32le`: PCM 32-bit floating-point little-endian audio.
- `ulaw`: 8-bit G711 µ-law audio with a WAV header.
- `webm`: WebM format with Opus audio codec.
Controls whether the server will return timestamps for the generated speech The desired output audio sample rate ## Returns First message sent by the server, confirming the session is established. Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Binary audio data returned from the server. Binary audio data returned from the server Timestamps for the audio chunk that was just streamed, if requested in `init`. Discriminator identifying this message type. Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds Acknowledgement that a flush command has been completed. The `nonce` matches the one carried by the original flush, allowing you to determine when it has completed. Discriminator identifying this message type. The nonce that was carried by the original `flush`. Acknowledgement that a reset command has been completed. The `nonce` matches the one carried by the original reset, allowing you to discard any remaining in-flight speech before the reset. Discriminator identifying this message type. The nonce that was carried by the original `reset`. Error envelope returned by the server. Connection closes immediately afterward. Discriminator identifying this message type. Slug identifying the error category. Human-readable error message. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. ## Send Send text to the server to append into the text stream. The text you send can be split at any point. For example, sending `This is a test of the emergency broadcast system` is semantically equivalent to sending `This is a test of the eme` and `rgency broadcast system` separately. 
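The point that text can be split anywhere is easiest to see by cutting a sentence at a fixed width and checking that the pieces concatenate back to the original. The Python sketch below does exactly that. The `split_text` helper is hypothetical, and the `{'type': 'text', 'text': ...}` message shape follows the raw WebSocket example elsewhere in this documentation.

```python
def split_text(text, size):
    """Cut text into pieces of at most `size` characters, ignoring word boundaries."""
    if size <= 0:
        raise ValueError('size must be positive')
    return [text[i:i + size] for i in range(0, len(text), size)]


sentence = 'This is a test of the emergency broadcast system'
pieces = split_text(sentence, 25)
# Each piece becomes its own text message; the server sees one continuous stream.
messages = [{'type': 'text', 'text': piece} for piece in pieces]
```

This is why tokens from an LLM can be forwarded as they arrive, without buffering to word or sentence boundaries first.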
Force the server to generate speech for all buffered text in the stream. The server replies with a `flush_complete` carrying a matching `nonce` once it has finished streaming the flushed audio. Be careful when using `flush`. Our models are designed to factor in context when generating speech. When flushing the buffer at arbitrary points, your speech may sound less natural. Drop the server's buffered text without generating speech for it. The server replies with a `reset_complete` carrying the matching `nonce` once the buffer has been cleared. You do not need to wait for `reset_complete` to begin sending more text. Inform the server you're done appending text to this session and want it to close when the server has finished dispatching speech. ## Receive First message sent by the server, confirming the session is established. Discriminator identifying this message type. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier. Binary audio data returned from the server. Binary audio data returned from the server Timestamps for the audio chunk that was just streamed, if requested in `init`. Discriminator identifying this message type. Array of timestamp objects, one per generated text token. The timestamps array resets its start time for each chunk of audio. The text segment The time at which the text starts, in seconds The spoken duration of the text segment, in seconds Acknowledgement that a flush command has been completed. The `nonce` matches the one carried by the original flush, allowing you to determine when it has completed. Discriminator identifying this message type. The nonce that was carried by the original `flush`. Acknowledgement that a reset command has been completed. The `nonce` matches the one carried by the original reset, allowing you to discard any remaining in-flight speech before the reset. Discriminator identifying this message type. The nonce that was carried by the original `reset`. 
Error envelope returned by the server. Connection closes immediately afterward. Discriminator identifying this message type. Slug identifying the error category. Human-readable error message. Per-session request ID. If you're reporting an issue, include this if possible to make debugging easier.

## Example

```typescript
import { createWriteStream } from 'fs';
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const session = client.speech.sessions.create({
  voice: 'leah',
  format: 'mp3',
});

async function writer() {
  session.sendText('Uhh, ');
  session.sendText('did you ');
  session.sendText('see the ');
  session.sendText('weather in ');
  session.sendText('Palo Alto ');
  session.sendText('tomorrow? ');
  session.sendText('Yeah, ');
  session.sendText("can't believe ");
  session.sendText("it's gonna rain, ");
  session.sendText('dude. Like what?');
  session.sendFinish();
}

async function reader() {
  const out = createWriteStream('hello.mp3');
  for await (const message of session) {
    if (message.type === 'audio') {
      out.write(Buffer.from(message.audio as ArrayBuffer));
    }
  }
  out.end();
}

await Promise.all([writer(), reader()]);
session.close();
```

### Voices

---

# Voices

URL: https://docs.lmnt.com/api/voices

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

## Create voice

Submits a request to create a voice with a supplied voice configuration and a batch of input audio data.

{/* @md-only */}

### Body Parameters

The input audio file to train the voice with, as a binary `wav`, `mp3`, `mp4`, `m4a`, or `webm` attachment.

- Max file size: 250 MB.

The display name for this voice. A text description of this voice. A tag describing the gender of this voice. Has no effect on voice creation. A list of tags to attach to this voice.

### Returns

A text description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The unique identifier of this voice.
The display name of this voice. The owner of this voice. Allowed values: `system`, `me`, `other`. Whether this voice has been starred by you or not. The state of this voice in the training pipeline (e.g., `ready`, `training`). Tags attached to this voice. The method by which this voice was created. Always `instant`. Allowed values: `instant`. A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```sh
curl --request POST \
  --url https://api.lmnt.com/v1/ai/voice \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1' \
  --form 'name=new-voice' \
  --form 'file=@input.mp3'
```

{/* @end */}

## Retrieve voice

Returns details of a specific voice.

{/* @md-only */}

### Path Parameters

The `id` of the voice, which can be retrieved by a call to `List voices`.

### Returns

A text description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The unique identifier of this voice. The display name of this voice. The owner of this voice. Allowed values: `system`, `me`, `other`. Whether this voice has been starred by you or not. The state of this voice in the training pipeline (e.g., `ready`, `training`). Tags attached to this voice. The method by which this voice was created. Always `instant`. Allowed values: `instant`. A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```sh
curl \
  --url https://api.lmnt.com/v1/ai/voice/leah \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1'
```

{/* @end */}

## Update voice

Updates metadata for a specific voice. Only provided fields will be changed.

{/* @md-only */}

### Path Parameters

The `id` of the voice, which can be retrieved by a call to `List voices`.

### Body Parameters

A description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The display name for this voice.
If `true`, adds this voice to your starred list. Replaces the tags attached to this voice with the given list.

### Returns

Voice details. A text description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The unique identifier of this voice. The display name of this voice. The owner of this voice. Allowed values: `system`, `me`, `other`. Whether this voice has been starred by you or not. The state of this voice in the training pipeline (e.g., `ready`, `training`). Tags attached to this voice. The method by which this voice was created. Always `instant`. Allowed values: `instant`. A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```sh
curl --request PUT \
  --url https://api.lmnt.com/v1/ai/voice/9c4a8f2b3e1d7c40 \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1' \
  --header 'Content-Type: application/json' \
  --data '{"name": "renamed-voice"}'
```

{/* @end */}

## Delete voice

Deletes a voice and cancels any pending operations on it. Cannot be undone.

{/* @md-only */}

### Path Parameters

The `id` of the voice, which can be retrieved by a call to `List voices`.

### Returns

### Example

```sh
curl --request DELETE \
  --url https://api.lmnt.com/v1/ai/voice/9c4a8f2b3e1d7c40 \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1'
```

{/* @end */}

## List voices

Returns a list of voices available to you.

{/* @md-only */}

### Query Parameters

If true, only returns voices that you have starred. Which owner's voices to return. Choose from `system`, `me`, or `all`.

### Returns

A text description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The unique identifier of this voice. The display name of this voice. The owner of this voice. Allowed values: `system`, `me`, `other`. Whether this voice has been starred by you or not.
The state of this voice in the training pipeline (e.g., `ready`, `training`). Tags attached to this voice. The method by which this voice was created. Always `instant`. Allowed values: `instant`. A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```sh
curl \
  --url https://api.lmnt.com/v1/ai/voice/list \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1'
```

{/* @end */}

A text description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The unique identifier of this voice. The display name of this voice. The owner of this voice. Allowed values: `system`, `me`, `other`. Whether this voice has been starred by you or not. The state of this voice in the training pipeline (e.g., `ready`, `training`). Tags attached to this voice. The method by which this voice was created. Always `instant`. Allowed values: `instant`. A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

---

# Voices (Python)

URL: https://docs.lmnt.com/api/python/voices

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

## Create voice

Return type: `Voice`.

Submits a request to create a voice with a supplied voice configuration and a batch of input audio data.

{/* @md-only */}

### Parameters

The input audio file to train the voice with, as a binary `wav`, `mp3`, `mp4`, `m4a`, or `webm` attachment.

- Max file size: 250 MB.

The display name for this voice. A text description of this voice. A tag describing the gender of this voice. Has no effect on voice creation. A list of tags to attach to this voice.

### Returns

A text description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The unique identifier of this voice. The display name of this voice. The owner of this voice. Whether this voice has been starred by you or not.
The state of this voice in the training pipeline (e.g., `ready`, `training`). Tags attached to this voice. The method by which this voice was created. Always `instant`. A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```python
import asyncio
import os
import sys
from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )

    voice = await client.voices.create(
        file=open(sys.argv[1], 'rb'),
        name='new-voice',
    )
    print(voice)


asyncio.run(main())
```

{/* @end */}

## Retrieve voice

Return type: `Voice`.

Returns details of a specific voice.

{/* @md-only */}

### Parameters

The `id` of the voice, which can be retrieved by a call to `List voices`.

### Returns

A text description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The unique identifier of this voice. The display name of this voice. The owner of this voice. Whether this voice has been starred by you or not. The state of this voice in the training pipeline (e.g., `ready`, `training`). Tags attached to this voice. The method by which this voice was created. Always `instant`. A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```python
import asyncio
import os
from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )

    voice = await client.voices.retrieve(id='leah')
    print(voice)


asyncio.run(main())
```

{/* @end */}

## Update voice

Return type: `VoiceUpdateResponse`.

Updates metadata for a specific voice. Only provided fields will be changed.

{/* @md-only */}

### Parameters

The `id` of the voice, which can be retrieved by a call to `List voices`. A description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
The display name for this voice. If `true`, adds this voice to your starred list. Replaces the tags attached to this voice with the given list.

### Returns

Voice details. A text description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The unique identifier of this voice. The display name of this voice. The owner of this voice. Whether this voice has been starred by you or not. The state of this voice in the training pipeline (e.g., `ready`, `training`). Tags attached to this voice. The method by which this voice was created. Always `instant`. A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```python
import asyncio
import os
from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )

    response = await client.voices.update(
        id='9c4a8f2b3e1d7c40',
        name='renamed-voice',
    )
    print(response)


asyncio.run(main())
```

{/* @end */}

## Delete voice

Return type: `VoiceDeleteResponse`.

Deletes a voice and cancels any pending operations on it. Cannot be undone.

{/* @md-only */}

### Parameters

The `id` of the voice, which can be retrieved by a call to `List voices`.

### Returns

### Example

```python
import asyncio
import os
from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )

    response = await client.voices.delete(id='9c4a8f2b3e1d7c40')
    print(response.success)


asyncio.run(main())
```

{/* @end */}

## List voices

Return type: `List[Voice]`.

Returns a list of voices available to you.

{/* @md-only */}

### Parameters

If true, only returns voices that you have starred. Which owner's voices to return. Choose from `system`, `me`, or `all`.

### Returns

A text description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The unique identifier of this voice.
The display name of this voice. The owner of this voice. Whether this voice has been starred by you or not. The state of this voice in the training pipeline (e.g., `ready`, `training`). Tags attached to this voice. The method by which this voice was created. Always `instant`. A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```python
import asyncio
import os
from lmnt import AsyncLmnt


async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )

    voices = await client.voices.list()
    print(voices[0])


asyncio.run(main())
```

{/* @end */}

A text description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The unique identifier of this voice. The display name of this voice. The owner of this voice. Whether this voice has been starred by you or not. The state of this voice in the training pipeline (e.g., `ready`, `training`). Tags attached to this voice. The method by which this voice was created. Always `instant`. A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

---

# Voices (TypeScript)

URL: https://docs.lmnt.com/api/typescript/voices

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

## Create voice

Submits a request to create a voice with a supplied voice configuration and a batch of input audio data.

{/* @md-only */}

### Parameters

The input audio file to train the voice with, as a binary `wav`, `mp3`, `mp4`, `m4a`, or `webm` attachment.

- Max file size: 250 MB.

The display name for this voice. A text description of this voice. A tag describing the gender of this voice. Has no effect on voice creation. A list of tags to attach to this voice.

### Returns

A text description of this voice. A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`. The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```typescript
import fs from 'fs';
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const voice = await client.voices.create({
  file: fs.createReadStream(process.argv[2]),
  name: 'new-voice',
});

console.log(voice);
```

{/* @end */}

## Retrieve voice

Returns details of a specific voice.

{/* @md-only */}

### Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.

### Returns

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```typescript
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const voice = await client.voices.retrieve('leah');

console.log(voice);
```

{/* @end */}

## Update voice

Updates metadata for a specific voice. Only provided fields will be changed.

{/* @md-only */}

### Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.
- A description of this voice.
- A tag describing the gender of this voice, e.g.
`male`, `female`, `nonbinary`.
- The display name for this voice.
- If `true`, adds this voice to your starred list.
- Replaces the tags attached to this voice with the given list.

### Returns

- Voice details
  - A text description of this voice.
  - A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
  - The unique identifier of this voice.
  - The display name of this voice.
  - The owner of this voice.
  - Whether this voice has been starred by you or not.
  - The state of this voice in the training pipeline (e.g., `ready`, `training`).
  - Tags attached to this voice.
  - The method by which this voice was created. Always `instant`.
  - A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```typescript
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const response = await client.voices.update('9c4a8f2b3e1d7c40', {
  name: 'renamed-voice',
});

console.log(response);
```

{/* @end */}

## Delete voice

Deletes a voice and cancels any pending operations on it. Cannot be undone.

{/* @md-only */}

### Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.

### Returns

### Example

```typescript
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const response = await client.voices.delete('9c4a8f2b3e1d7c40');

console.log(response.success);
```

{/* @end */}

## List voices

Returns a list of voices available to you.

{/* @md-only */}

### Parameters

- If true, only returns voices that you have starred.
- Which owner's voices to return. Choose from `system`, `me`, or `all`.

### Returns

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

### Example

```typescript
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const voices = await client.voices.list();

console.log(voices[0]);
```

{/* @end */}

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

---

# Create voice

URL: https://docs.lmnt.com/api/voices/create

Submits a request to create a voice with a supplied voice configuration and a batch of input audio data.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

**post** `/v1/ai/voice`

Submits a request to create a voice with a supplied voice configuration and a batch of input audio data.

## Headers

- `X-API-Key`: Your API key; get it from your [LMNT settings](https://app.lmnt.com/settings/api).
- `lmnt-version`: The LMNT API version that this client was built against. Use `1.1`.

## Parameters

- The input audio file to train the voice with, as a binary `wav`, `mp3`, `mp4`, `m4a`, or `webm` attachment.
  - Max file size: 250 MB.
- The display name for this voice.
- A text description of this voice.
- A tag describing the gender of this voice. Has no effect on voice creation.
- A list of tags to attach to this voice.
## Returns

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice. Allowed values: `system`, `me`, `other`.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`. Allowed values: `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

## Example

```sh
curl --request POST \
  --url https://api.lmnt.com/v1/ai/voice \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1' \
  --form 'name=new-voice' \
  --form 'file=@input.mp3'
```

---

# Create voice (Python)

URL: https://docs.lmnt.com/api/python/voices/create

Submits a request to create a voice with a supplied voice configuration and a batch of input audio data.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```python
voices.create(**kwargs: VoiceCreateParams) -> Voice
```

**post** `/v1/ai/voice`

Submits a request to create a voice with a supplied voice configuration and a batch of input audio data.

## Parameters

- The input audio file to train the voice with, as a binary `wav`, `mp3`, `mp4`, `m4a`, or `webm` attachment.
  - Max file size: 250 MB.
- The display name for this voice.
- A text description of this voice.
- A tag describing the gender of this voice. Has no effect on voice creation.
- A list of tags to attach to this voice.

## Returns

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

## Example

```python
import asyncio
import os
import sys
from lmnt import AsyncLmnt

async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )
    voice = await client.voices.create(
        file=open(sys.argv[1], 'rb'),
        name='new-voice',
    )
    print(voice)

asyncio.run(main())
```

---

# Create voice (TypeScript)

URL: https://docs.lmnt.com/api/typescript/voices/create

Submits a request to create a voice with a supplied voice configuration and a batch of input audio data.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```typescript
voices.create(body: VoiceCreateParams): Promise<Voice>
```

**post** `/v1/ai/voice`

Submits a request to create a voice with a supplied voice configuration and a batch of input audio data.

## Parameters

- The input audio file to train the voice with, as a binary `wav`, `mp3`, `mp4`, `m4a`, or `webm` attachment.
  - Max file size: 250 MB.
- The display name for this voice.
- A text description of this voice.
- A tag describing the gender of this voice. Has no effect on voice creation.
- A list of tags to attach to this voice.

## Returns

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.
## Example

```typescript
import fs from 'fs';
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const voice = await client.voices.create({
  file: fs.createReadStream(process.argv[2]),
  name: 'new-voice',
});

console.log(voice);
```

---

# List voices

URL: https://docs.lmnt.com/api/voices/list

Returns a list of voices available to you.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

**get** `/v1/ai/voice/list`

Returns a list of voices available to you.

## Headers

- `X-API-Key`: Your API key; get it from your [LMNT settings](https://app.lmnt.com/settings/api).
- `lmnt-version`: The LMNT API version that this client was built against. Use `1.1`.

## Parameters

- If true, only returns voices that you have starred.
- Which owner's voices to return. Choose from `system`, `me`, or `all`.

## Returns

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice. Allowed values: `system`, `me`, `other`.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`. Allowed values: `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

## Example

```sh
curl \
  --url https://api.lmnt.com/v1/ai/voice/list \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1'
```

---

# List voices (Python)

URL: https://docs.lmnt.com/api/python/voices/list

Returns a list of voices available to you.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```python
voices.list(**kwargs: VoiceListParams) -> List[Voice]
```

**get** `/v1/ai/voice/list`

Returns a list of voices available to you.

## Parameters

- If true, only returns voices that you have starred.
- Which owner's voices to return. Choose from `system`, `me`, or `all`.

## Returns

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

## Example

```python
import asyncio
import os
from lmnt import AsyncLmnt

async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )
    voices = await client.voices.list()
    print(voices[0])

asyncio.run(main())
```

---

# List voices (TypeScript)

URL: https://docs.lmnt.com/api/typescript/voices/list

Returns a list of voices available to you.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```typescript
voices.list(query?: VoiceListParams): Promise<Array<Voice>>
```

**get** `/v1/ai/voice/list`

Returns a list of voices available to you.

## Parameters

- If true, only returns voices that you have starred.
- Which owner's voices to return. Choose from `system`, `me`, or `all`.

## Returns

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.
## Example

```typescript
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const voices = await client.voices.list();

console.log(voices[0]);
```

---

# Retrieve voice

URL: https://docs.lmnt.com/api/voices/retrieve

Returns details of a specific voice.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

**get** `/v1/ai/voice/{id}`

Returns details of a specific voice.

## Headers

- `X-API-Key`: Your API key; get it from your [LMNT settings](https://app.lmnt.com/settings/api).
- `lmnt-version`: The LMNT API version that this client was built against. Use `1.1`.

## Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.

## Returns

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice. Allowed values: `system`, `me`, `other`.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`. Allowed values: `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

## Example

```sh
curl \
  --url https://api.lmnt.com/v1/ai/voice/leah \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1'
```

---

# Retrieve voice (Python)

URL: https://docs.lmnt.com/api/python/voices/retrieve

Returns details of a specific voice.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```python
voices.retrieve(id: str) -> Voice
```

**get** `/v1/ai/voice/{id}`

Returns details of a specific voice.

## Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.

## Returns

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.

## Example

```python
import asyncio
import os
from lmnt import AsyncLmnt

async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )
    voice = await client.voices.retrieve(id='leah')
    print(voice)

asyncio.run(main())
```

---

# Retrieve voice (TypeScript)

URL: https://docs.lmnt.com/api/typescript/voices/retrieve

Returns details of a specific voice.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```typescript
voices.retrieve(id: string): Promise<Voice>
```

**get** `/v1/ai/voice/{id}`

Returns details of a specific voice.

## Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.

## Returns

- A text description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The unique identifier of this voice.
- The display name of this voice.
- The owner of this voice.
- Whether this voice has been starred by you or not.
- The state of this voice in the training pipeline (e.g., `ready`, `training`).
- Tags attached to this voice.
- The method by which this voice was created. Always `instant`.
- A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.
## Example

```typescript
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const voice = await client.voices.retrieve('leah');

console.log(voice);
```

---

# Update voice

URL: https://docs.lmnt.com/api/voices/update

Updates metadata for a specific voice. Only provided fields will be changed.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

**put** `/v1/ai/voice/{id}`

Updates metadata for a specific voice. Only provided fields will be changed.

## Headers

- `X-API-Key`: Your API key; get it from your [LMNT settings](https://app.lmnt.com/settings/api).
- `lmnt-version`: The LMNT API version that this client was built against. Use `1.1`.

## Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.
- A description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The display name for this voice.
- If `true`, adds this voice to your starred list.
- Replaces the tags attached to this voice with the given list.

## Returns

- Voice details
  - A text description of this voice.
  - A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
  - The unique identifier of this voice.
  - The display name of this voice.
  - The owner of this voice. Allowed values: `system`, `me`, `other`.
  - Whether this voice has been starred by you or not.
  - The state of this voice in the training pipeline (e.g., `ready`, `training`).
  - Tags attached to this voice.
  - The method by which this voice was created. Always `instant`. Allowed values: `instant`.
  - A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.
## Example

```sh
curl --request PUT \
  --url https://api.lmnt.com/v1/ai/voice/9c4a8f2b3e1d7c40 \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1' \
  --header 'Content-Type: application/json' \
  --data '{"name": "renamed-voice"}'
```

---

# Update voice (Python)

URL: https://docs.lmnt.com/api/python/voices/update

Updates metadata for a specific voice. Only provided fields will be changed.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```python
voices.update(id: str, **kwargs: VoiceUpdateParams) -> VoiceUpdateResponse
```

**put** `/v1/ai/voice/{id}`

Updates metadata for a specific voice. Only provided fields will be changed.

## Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.
- A description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The display name for this voice.
- If `true`, adds this voice to your starred list.
- Replaces the tags attached to this voice with the given list.

## Returns

- Voice details
  - A text description of this voice.
  - A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
  - The unique identifier of this voice.
  - The display name of this voice.
  - The owner of this voice.
  - Whether this voice has been starred by you or not.
  - The state of this voice in the training pipeline (e.g., `ready`, `training`).
  - Tags attached to this voice.
  - The method by which this voice was created. Always `instant`.
  - A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.
## Example

```python
import asyncio
import os
from lmnt import AsyncLmnt

async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )
    response = await client.voices.update(
        id='9c4a8f2b3e1d7c40',
        name='renamed-voice',
    )
    print(response)

asyncio.run(main())
```

---

# Update voice (TypeScript)

URL: https://docs.lmnt.com/api/typescript/voices/update

Updates metadata for a specific voice. Only provided fields will be changed.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```typescript
voices.update(id: string, body: VoiceUpdateParams): Promise<VoiceUpdateResponse>
```

**put** `/v1/ai/voice/{id}`

Updates metadata for a specific voice. Only provided fields will be changed.

## Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.
- A description of this voice.
- A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
- The display name for this voice.
- If `true`, adds this voice to your starred list.
- Replaces the tags attached to this voice with the given list.

## Returns

- Voice details
  - A text description of this voice.
  - A tag describing the gender of this voice, e.g. `male`, `female`, `nonbinary`.
  - The unique identifier of this voice.
  - The display name of this voice.
  - The owner of this voice.
  - Whether this voice has been starred by you or not.
  - The state of this voice in the training pipeline (e.g., `ready`, `training`).
  - Tags attached to this voice.
  - The method by which this voice was created. Always `instant`.
  - A URL that returns a preview speech sample of this voice. The file can be played directly in a browser or audio player.
## Example

```typescript
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const response = await client.voices.update('9c4a8f2b3e1d7c40', {
  name: 'renamed-voice',
});

console.log(response);
```

---

# Delete voice

URL: https://docs.lmnt.com/api/voices/delete

Deletes a voice and cancels any pending operations on it. Cannot be undone.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

**delete** `/v1/ai/voice/{id}`

Deletes a voice and cancels any pending operations on it. Cannot be undone.

## Headers

- `X-API-Key`: Your API key; get it from your [LMNT settings](https://app.lmnt.com/settings/api).
- `lmnt-version`: The LMNT API version that this client was built against. Use `1.1`.

## Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.

## Returns

## Example

```sh
curl --request DELETE \
  --url https://api.lmnt.com/v1/ai/voice/9c4a8f2b3e1d7c40 \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1'
```

---

# Delete voice (Python)

URL: https://docs.lmnt.com/api/python/voices/delete

Deletes a voice and cancels any pending operations on it. Cannot be undone.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```python
voices.delete(id: str) -> VoiceDeleteResponse
```

**delete** `/v1/ai/voice/{id}`

Deletes a voice and cancels any pending operations on it. Cannot be undone.

## Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.

## Returns

## Example

```python
import asyncio
import os
from lmnt import AsyncLmnt

async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )
    response = await client.voices.delete(id='9c4a8f2b3e1d7c40')
    print(response.success)

asyncio.run(main())
```

---

# Delete voice (TypeScript)

URL: https://docs.lmnt.com/api/typescript/voices/delete

Deletes a voice and cancels any pending operations on it. Cannot be undone.

---

{/* Generated by carbonsteel. DO NOT EDIT.
*/}

```typescript
voices.delete(id: string): Promise<VoiceDeleteResponse>
```

**delete** `/v1/ai/voice/{id}`

Deletes a voice and cancels any pending operations on it. Cannot be undone.

## Parameters

- The `id` of the voice, which can be retrieved by a call to `List voices`.

## Returns

## Example

```typescript
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const response = await client.voices.delete('9c4a8f2b3e1d7c40');

console.log(response.success);
```

### Accounts

---

# Accounts

URL: https://docs.lmnt.com/api/accounts

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

## Retrieve account

Returns details about your account.

{/* @md-only */}

### Parameters

_No parameters._

### Returns

- The maximum number of characters per billing period allowed by your plan.
- The type of plan you are subscribed to.
- The number of characters remaining in this billing period.

### Example

```sh
curl \
  --url https://api.lmnt.com/v1/account \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1'
```

{/* @end */}

---

# Accounts (Python)

URL: https://docs.lmnt.com/api/python/accounts

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

## Retrieve account

Returns details about your account.

{/* @md-only */}

### Parameters

_No parameters._

### Returns

- The maximum number of characters per billing period allowed by your plan.
- The type of plan you are subscribed to.
- The number of characters remaining in this billing period.

### Example

```python
import asyncio
import os
from lmnt import AsyncLmnt

async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )
    response = await client.accounts.retrieve()
    print(response)

asyncio.run(main())
```

{/* @end */}

---

# Accounts (TypeScript)

URL: https://docs.lmnt.com/api/typescript/accounts

---

{/* Generated by carbonsteel. DO NOT EDIT.
*/}

## Retrieve account

Returns details about your account.

{/* @md-only */}

### Parameters

_No parameters._

### Returns

- The maximum number of characters per billing period allowed by your plan.
- The type of plan you are subscribed to.
- The number of characters remaining in this billing period.

### Example

```typescript
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const response = await client.accounts.retrieve();

console.log(response);
```

{/* @end */}

---

# Retrieve account

URL: https://docs.lmnt.com/api/accounts/retrieve

Returns details about your account.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

**get** `/v1/account`

Returns details about your account.

## Headers

- `X-API-Key`: Your API key; get it from your [LMNT settings](https://app.lmnt.com/settings/api).
- `lmnt-version`: The LMNT API version that this client was built against. Use `1.1`.

## Parameters

_No parameters._

## Returns

- The maximum number of characters per billing period allowed by your plan.
- The type of plan you are subscribed to.
- The number of characters remaining in this billing period.

## Example

```sh
curl \
  --url https://api.lmnt.com/v1/account \
  --header "X-API-Key: $LMNT_API_KEY" \
  --header 'lmnt-version: 1.1'
```

---

# Retrieve account (Python)

URL: https://docs.lmnt.com/api/python/accounts/retrieve

Returns details about your account.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```python
accounts.retrieve() -> AccountRetrieveResponse
```

**get** `/v1/account`

Returns details about your account.

## Parameters

_No parameters._

## Returns

- The maximum number of characters per billing period allowed by your plan.
- The type of plan you are subscribed to.
- The number of characters remaining in this billing period.
## Example

```python
import asyncio
import os
from lmnt import AsyncLmnt

async def main():
    client = AsyncLmnt(
        api_key=os.environ.get('LMNT_API_KEY'),  # This is the default and can be omitted
    )
    response = await client.accounts.retrieve()
    print(response)

asyncio.run(main())
```

---

# Retrieve account (TypeScript)

URL: https://docs.lmnt.com/api/typescript/accounts/retrieve

Returns details about your account.

---

{/* Generated by carbonsteel. DO NOT EDIT. */}

```typescript
accounts.retrieve(): Promise<AccountRetrieveResponse>
```

**get** `/v1/account`

Returns details about your account.

## Parameters

_No parameters._

## Returns

- The maximum number of characters per billing period allowed by your plan.
- The type of plan you are subscribed to.
- The number of characters remaining in this billing period.

## Example

```typescript
import Lmnt from 'lmnt-node';

const client = new Lmnt({
  apiKey: process.env['LMNT_API_KEY'], // This is the default and can be omitted
});

const response = await client.accounts.retrieve();

console.log(response);
```

### Support & configuration

---

# Versions

URL: https://docs.lmnt.com/api/versioning

---

When making API requests, you must send an `lmnt-version` request header. For example, `lmnt-version: 1.1`. If you're using one of our [Client SDKs](/api/client-sdks), this is handled for you automatically.

---

For any given version, we will preserve:

- Existing input parameters
- Existing output fields

However, we may do the following:

- Add additional optional inputs
- Add additional fields to outputs
- Change conditions for specific error types
- Add new variants to enum-like output values (for example, new streaming event types or new `format` values)

Generally, if you are using the API as documented in this reference, we will not break your usage.

## Version history

We always recommend using the latest API version whenever possible. Previous versions are considered deprecated and may be unavailable to new users.
- `1.1`:
  - Removed legacy options carried over from our earliest models.
  - Overhauled [speech sessions](/build-with-lmnt/speech-sessions-api) protocol:
    - `type`-discriminated control frames.
    - Distinct `flush_complete` and `reset_complete` acks.
    - New `ready` frame carrying the session's `request_id`.
  - Every response now carries a `request-id` header, surfaced as `request_id` on SDK responses and errors.
  - Errors now use a structured envelope `{ type, error: { type, message }, request_id }`.
- `1.0`: Initial version.
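
Because a version may gain extra output fields and new enum-like values over time, client code that reads the `1.1` error envelope is safest when it ignores keys it does not recognize. The sketch below is one way to do that using only the Python standard library. The names `ApiError` and `parse_error_envelope` are hypothetical helpers, not part of the LMNT SDK, and the sample payload values (including the outer `type` value) are made up for illustration; only the envelope shape `{ type, error: { type, message }, request_id }` comes from the version notes above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    """Hypothetical container for the envelope { type, error: { type, message }, request_id }."""
    error_type: str
    message: str
    request_id: Optional[str]


def parse_error_envelope(body: str) -> ApiError:
    """Parse a JSON error body, reading only the documented keys."""
    payload = json.loads(body)
    error = payload.get('error') or {}
    return ApiError(
        # Fall back to placeholders so an unexpected body does not raise KeyError.
        error_type=error.get('type', 'unknown'),
        message=error.get('message', ''),
        request_id=payload.get('request_id'),
    )


# Illustrative payload only: these values are assumptions, not real API output.
sample = json.dumps({
    'type': 'error',
    'error': {'type': 'invalid_request', 'message': 'voice not found'},
    'request_id': 'req_123',
    'extra_field': 'ignored',  # stands in for a field added later within the same version
})

parsed = parse_error_envelope(sample)
print(parsed.error_type, parsed.request_id)
```

Reading keys with `.get` rather than unpacking the whole object means additional fields pass through harmlessly, and keeping `request_id` on the parsed error makes it easy to quote when reporting a problem.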