POST /v1/ai/voice
curl --request POST \
  --url https://api.lmnt.com/v1/ai/voice \
  --header 'Content-Type: multipart/form-data' \
  --header 'X-API-Key: <api-key>' \
  --form 'metadata={"name": "new-voice", "type": "instant", "enhance": false}' \
  --form 'files=@/Users/user/file1.wav' \
  --form 'files=@/Users/user/file2.wav'

{
  "description": "a newly created voice",
  "gender": "male",
  "id": "123456789abcdef",
  "name": "new-voice",
  "owner": "me",
  "starred": false,
  "state": "ready",
  "type": "instant"
}

Professional voices require at least 5 minutes of source audio; more audio generally gives a better clone, up to a total source file size of 250 MB.

Instant voices can be cloned from as little as 5 seconds of source audio.

For more on voices in general, visit our guide.

Authorizations

X-API-Key
string
header
required

Your API key; get it from your LMNT account page.
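
As a rough illustration of how the key travels with the request, the sketch below builds (but does not send) a POST request to the endpoint on this page with the X-API-Key header set. The environment variable name LMNT_API_KEY and the helper names auth_headers and new_request are assumptions made for this example, not part of the API.

```python
import os
import urllib.request

CREATE_VOICE_URL = "https://api.lmnt.com/v1/ai/voice"  # endpoint from this page


def auth_headers(api_key=None):
    """Return the X-API-Key header, falling back to an environment variable.

    LMNT_API_KEY is an assumed variable name; this page does not define one.
    """
    key = api_key or os.environ.get("LMNT_API_KEY")
    if not key:
        raise ValueError("no API key supplied and LMNT_API_KEY is not set")
    return {"X-API-Key": key}


def new_request(body, content_type, api_key=None):
    """Build, without sending, a POST request that carries the API key."""
    headers = auth_headers(api_key)
    headers["Content-Type"] = content_type
    return urllib.request.Request(
        CREATE_VOICE_URL, data=body, headers=headers, method="POST"
    )
```

Keeping the key in the environment rather than in source code is a common choice; passing api_key explicitly overrides it.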

Body

multipart/form-data
metadata
string
required

Information about the voice you are creating; a stringified JSON object containing the following fields:

  • name required: string; The display name for this voice
  • enhance required: bool; For unclean audio with background noise, applies processing to attempt to improve quality. Default is false as this can also degrade quality in some circumstances.
  • type optional: string; The type of voice to create. Defaults to instant.
  • gender optional: string; A tag describing the gender of this voice. Has no effect on voice creation.
  • description optional: string; A text description of this voice.
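
Because metadata is sent as a stringified JSON object rather than as separate form fields, it can help to build it from a dictionary and serialize it once. The following is a minimal sketch under that reading of the field list above; build_metadata is a hypothetical helper, and the set of accepted type values is taken from the response schema on this page.

```python
import json

# Voice types listed in the response schema on this page.
VOICE_TYPES = {"instant", "professional"}


def build_metadata(name, enhance=False, voice_type="instant",
                   gender=None, description=None):
    """Return the metadata form field as a JSON string."""
    if not name:
        raise ValueError("name is required")
    if voice_type not in VOICE_TYPES:
        raise ValueError(f"type must be one of {sorted(VOICE_TYPES)}")
    metadata = {"name": name, "enhance": bool(enhance), "type": voice_type}
    # Optional fields are only included when the caller supplies them.
    if gender is not None:
        metadata["gender"] = gender
    if description is not None:
        metadata["description"] = description
    return json.dumps(metadata)
```

The returned string is what would go in the metadata part of the multipart body, in the same position as the JSON literal in the curl example.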
files
file[]
required

One or more input audio files to train the voice in the form of binary wav, mp3, mp4, m4a, or webm attachments.

  • Max attached files: 20.
  • Max total file size: 250 MB.
  • Professional voices require at least 5 minutes of source audio to train from.
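
The limits above can be checked locally before uploading. The sketch below is one possible pre-flight check, not something the API provides: check_files is a hypothetical helper, the extension test is only a stand-in for the server's own format detection, and 250 MB is assumed to mean 250,000,000 bytes (the page does not say whether it is decimal or binary). It does not measure audio duration, so the 5-minute requirement for professional voices is not covered.

```python
import os

MAX_FILES = 20                        # from the limits listed above
MAX_TOTAL_BYTES = 250 * 1000 * 1000   # assumption: decimal megabytes
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".mp4", ".m4a", ".webm"}


def check_files(paths):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if not paths:
        problems.append("at least one audio file is required")
    if len(paths) > MAX_FILES:
        problems.append(f"{len(paths)} files attached; the maximum is {MAX_FILES}")
    total = 0
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            problems.append(f"{path}: unsupported extension {ext!r}")
        if os.path.isfile(path):
            total += os.path.getsize(path)
        else:
            problems.append(f"{path}: file not found")
    if total > MAX_TOTAL_BYTES:
        problems.append(f"total size {total} bytes exceeds {MAX_TOTAL_BYTES}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.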

Response

200
application/json
OK

Voice details

id
string
required

The unique identifier of this voice.

name
string
required

The display name of this voice.

owner
enum<string>
required

The owner of this voice.

Available options: system, me, other

state
string
required

The state of this voice in the training pipeline (e.g., ready, training).

description
string | null

A text description of this voice.

gender
string

A tag describing the gender of this voice, e.g. male, female, nonbinary.

starred
boolean

Whether this voice has been starred by you or not.

type
enum<string>

The method by which this voice was created: instant or professional.

Available options: instant, professional
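
To show how the response fields fit together, here is a small sketch that parses a response body into a typed record and rejects one that lacks the required fields. The Voice dataclass and parse_voice function are illustrative names chosen for this example; the field names and which ones are required come from the schema above.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Voice:
    # Required fields in the response schema.
    id: str
    name: str
    owner: str
    state: str
    # Fields that may be absent from the response.
    description: Optional[str] = None
    gender: Optional[str] = None
    starred: Optional[bool] = None
    type: Optional[str] = None


REQUIRED_FIELDS = ("id", "name", "owner", "state")


def parse_voice(payload):
    """Parse a JSON response body into a Voice, checking required fields."""
    data = json.loads(payload)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    # Ignore any keys the schema above does not list.
    known = {f.name: data[f.name] for f in fields(Voice) if f.name in data}
    return Voice(**known)
```

A caller might then inspect the state field, for example treating "ready" as usable and "training" as still in progress, following the examples given in the state description.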
