This is the reference for the v1 LMNT Node SDK. The v2 SDK has a different API and is not compatible with this reference.
The Speech class is your primary touch-point. Instantiate a Speech object with your API key:
import Speech from 'lmnt-node'
const speech = new Speech('LMNT_API_KEY')
Alternatively, you can set the LMNT_API_KEY environment variable and omit the constructor argument.
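For example, with LMNT_API_KEY exported in your shell, construction needs no arguments (a minimal sketch of the same setup):
// Assumes `export LMNT_API_KEY=...` was run in the shell beforehand.
import Speech from 'lmnt-node'
const speech = new Speech()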
fetchVoices
async fetchVoices(options={})
Returns the voices available for use in speech synthesis calls.
const voices = await speech.fetchVoices()
Parameters
An optional object containing filter options:
starred: If true, only return starred voices. Defaults to false.
owner: Specify which voices to return. One of system, me, or all. Defaults to all.
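For example, to list only the voices you own (a sketch assuming the option names above):
const myVoices = await speech.fetchVoices({ owner: 'me' })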
Return value
A list of voice metadata objects. Here’s a sample object:
[
  {
    "name": "Morgan",
    "id": "morgan",
    "state": "ready",
    "owner": "system",
    "starred": false,
    "gender": "F",
    "description": "UK. Young adult. Conversational"
  }
]
fetchVoice
async fetchVoice(voice)
Returns the voice metadata for a single voice.
const voice = await speech.fetchVoice('morgan')
Parameters
The id of the voice to fetch. Voice ids can be retrieved from fetchVoices().
Return value
The voice metadata object. Here’s a sample object:
{
  "name": "Morgan",
  "id": "morgan",
  "state": "ready",
  "owner": "system",
  "starred": false,
  "gender": "F",
  "description": "UK. Young adult. Conversational"
}
createVoice
async createVoice(name, enhance, filenames, options={})
Creates a new voice from a set of audio files. Returns the voice metadata object.
const filenames = ['file1.wav', 'file2.wav']
const result = await speech.createVoice('new-voice', false, filenames)
Parameters
name: The name of the new voice.
enhance: For unclean audio with background noise, applies processing to attempt to improve quality. Not on by default, as it can also degrade quality in some circumstances.
filenames: A list of filenames to use for the voice.
options: Additional options for voice creation:
type: The type of voice to create. Must be one of instant or professional. Defaults to instant.
gender: The gender of the voice, e.g. male, female, nonbinary. For categorization purposes. Optional.
description: A description of the voice. Optional.
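For example, to create a voice from noisy recordings with the optional metadata fields set (a sketch assuming the option names above):
const result = await speech.createVoice('new-voice', true, filenames, {
  type: 'instant',
  gender: 'female',
  description: 'Recorded in a noisy room, so enhance is enabled'
})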
Return value
The voice metadata object. Here’s a sample object:
{
  "id": "123444566422",
  "name": "new-voice",
  "owner": "me",
  "state": "ready",
  "starred": false,
  "description": "Totam necessitatibus saepe repudiandae perferendis. Tempora iure provident. Consequatur debitis assumenda. Earum debitis cum.",
  "type": "instant",
  "gender": "male"
}
updateVoice
async updateVoice(voice, options={})
Updates metadata for a specific voice. A voice that is not owned by you can only have its starred field updated. Only provided fields will be changed.
const options = { name: 'new-voice-name', starred: true }
await speech.updateVoice('123444566422', options)
Parameters
The id of the voice to update. If you don't know the id, you can get it from fetchVoices().
The properties to update. Only provided fields will be changed:
name: The name of the voice.
starred: Whether the voice is starred by you.
gender: The gender of the voice, e.g. male, female, nonbinary. For categorization purposes.
description: A description of the voice.
Return value
The updated voice metadata object.
deleteVoice
async deleteVoice(voice)
Deletes a voice and cancels any pending operations on it. The voice must be owned by you. Cannot be undone.
await speech.deleteVoice('123444566422')
Parameters
The id of the voice to delete. If you don't know the id, you can get it from fetchVoices().
Return value
A success or error message.
synthesize
async synthesize(text, voice, options={})
Synthesizes speech for a supplied text string.
const synthesis = await speech.synthesize('Hello world!', 'morgan')
const audio = synthesis.audio
Parameters
text: The text to synthesize speech for.
voice: Which voice to render; id is found using the fetchVoices call.
options: Additional options for the synthesis request:
model: The model to use for synthesis. One of aurora (default) or blizzard.
format: The desired audio format. One of aac, mp3, wav. Defaults to mp3 (24kHz 16-bit mono).
language: The desired language of the synthesized speech. Two-letter ISO 639-1 code. One of de, en, es, fr, pt, zh, ko, hi. Does not work with professional clones or the blizzard model.
length: Produce speech of this length in seconds; maximum 300.0 (5 minutes). Does not work with the blizzard model.
return_durations: Whether to include word duration details in the response.
return_seed: Whether to include the seed used for synthesis in the response.
sample_rate: The desired output sample rate in Hz, one of 8000, 16000, 24000; defaults to 24000 for all formats except mulaw, which defaults to 8000.
speed: Floating point value between 0.25 (slow) and 2.0 (fast).
seed: The seed used to specify a different take. Defaults to a random value.
Return value
An object with the following fields:
audio: The synthesized audio, encoded in the requested format, as a Buffer object.
durations (array of duration objects): An array of text duration objects. Only returned if return_durations is true. Each object describes the duration of a chunk of text (e.g., words, punctuation, and spaces) with the following keys:
text: The text for which timing information is being reported.
start: The time at which text starts, in seconds from the start of the audio.
duration: The duration of text, in seconds.
seed: The seed used for synthesis. Only returned if return_seed is true.
Here is the schema for the return value:
{
  "audio": binary-audio-file,
  "durations": [
    {
      "text": "string",
      "start": 0,
      "duration": 0
    },
    ...
  ],
  "seed": "number"
}
Notes
The mp3 bitrate is 96kbps.
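Putting it together, here is a sketch that requests durations along with the audio (assuming the snake_case option names listed above):
const synthesis = await speech.synthesize('Hello world!', 'morgan', {
  format: 'mp3',
  return_durations: true
})
// synthesis.audio is a Buffer of MP3 data; durations align chunks of text with the audio.
for (const d of synthesis.durations) {
  console.log(`"${d.text}" starts at ${d.start}s and lasts ${d.duration}s`)
}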
synthesizeStreaming
synthesizeStreaming(voice, options={})
Creates a new, full-duplex streaming session. You can use the returned
connection object to concurrently stream text content to the server and receive
speech data from the server.
Parameters
Which voice to render; id can be found using the fetchVoices call.
Additional options for the streaming connection:
format: The desired output audio format. One of:
mp3: 96kbps MP3 audio. This format is useful for applications that need to play the audio directly to the user.
raw: 16-bit little-endian linear PCM audio. This format is useful for applications that need to process the audio further, such as adding effects or mixing multiple audio streams.
ulaw: 8-bit G711 µ-law audio with a WAV header. This format is most useful for telephony applications.
language: The desired language of the synthesized speech. Two-letter ISO 639-1 code. One of de, en, es, fr, pt, zh, ko, hi.
sample_rate: The desired output audio sample rate. One of:
24000: 24kHz audio. This sample rate is useful for applications that need high-quality audio.
16000: 16kHz audio. This sample rate is useful for applications that need to save bandwidth.
8000: 8kHz audio. This sample rate is most useful for telephony applications and µ-law encoding.
speed: The speed of the speech. Floating point value between 0.25 (slow) and 2.0 (fast).
return_extras: Whether to return extra data (durations data and warnings) with each audio chunk.
Return value
A StreamingSynthesisConnection instance, which you can use to stream data.
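The following is a minimal sketch of full-duplex usage; it assumes the connection exposes appendText() and finish() and can be consumed with for await, yielding messages with an audio field. Verify the exact interface against the StreamingSynthesisConnection documentation.
const connection = speech.synthesizeStreaming('morgan', { format: 'mp3' })

// Writer side: stream text to the server as it becomes available.
connection.appendText('Hello, streaming world!')
connection.finish() // signal that no more text will be sent

// Reader side: consume audio chunks as the server produces them.
for await (const message of connection) {
  console.log(`received ${message.audio.length} bytes of audio`)
}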