This is the reference for the v1 Lmnt Node SDK. The v2 SDK has a different API and is not compatible with this reference.
The Speech class is your primary touch-point. Instantiate a Speech object with your API key:
import Speech from 'lmnt-node'
const speech = new Speech('LMNT_API_KEY')
Alternatively, you can set the LMNT_API_KEY environment variable and omit the constructor argument.
fetchVoices
async fetchVoices(options={})
Returns the voices available for use in speech synthesis calls.
const voices = await speech.fetchVoices()
Parameters
An optional object containing filter options.
If true, only return starred voices. Defaults to false.
Specify which voices to return: one of system, me, or all. Defaults to all.
Return value
A list of voice metadata objects. Here’s a sample object:
[
  {
    "name": "Morgan",
    "id": "morgan",
    "state": "ready",
    "owner": "system",
    "starred": false,
    "gender": "F",
    "description": "UK. Young adult. Conversational"
  }
]
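As a sketch of working with this return value, the list can be filtered locally, for example to select only voices that are ready for synthesis. The sample data below mirrors the metadata shape shown above; a real call would use `await speech.fetchVoices()` instead:

```javascript
// A real call would be: const voices = await speech.fetchVoices()
// The sample array below mirrors the metadata shape shown above.
const voices = [
  { name: 'Morgan', id: 'morgan', state: 'ready', owner: 'system', starred: false },
  { name: 'my-clone', id: '123444566422', state: 'training', owner: 'me', starred: true },
];

// Keep only voices that are ready for use in synthesis calls.
const readyIds = voices.filter((v) => v.state === 'ready').map((v) => v.id);
console.log(readyIds); // [ 'morgan' ]
```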
fetchVoice
async fetchVoice(voice)
Returns the voice metadata for a single voice.
const voice = await speech.fetchVoice('morgan')
Parameters
The id of the voice to fetch. Voice ids can be retrieved from fetchVoices().
Return value
The voice metadata object. Here’s a sample object:
{
  "name": "Morgan",
  "id": "morgan",
  "state": "ready",
  "owner": "system",
  "starred": false,
  "gender": "F",
  "description": "UK. Young adult. Conversational"
}
createVoice
async createVoice(name, enhance, filenames, options={})
Creates a new voice from a set of audio files. Returns the voice metadata object.
const filenames = ['file1.wav', 'file2.wav']
const result = await speech.createVoice('new-voice', false, filenames)
Parameters
For unclean audio with background noise, applies processing to attempt to improve quality. Disabled by default, as it can also degrade quality in some circumstances.
A list of filenames to use for the voice.
The type of voice to create. Defaults to instant.
The gender of the voice, e.g. male, female, or nonbinary. For categorization purposes. Defaults to null.
A description of the voice. Defaults to null.
Return value
The voice metadata object. Here’s a sample object:
{
  "id": "123444566422",
  "name": "new-voice",
  "owner": "me",
  "state": "ready",
  "starred": false,
  "description": "Totam necessitatibus saepe repudiandae perferendis. Tempora iure provident. Consequatur debitis assumenda. Earum debitis cum.",
  "type": "instant",
  "gender": "male"
}
updateVoice
async updateVoice(voice, options={})
Updates metadata for a specific voice. A voice that is not owned by you can only have its starred field updated. Only provided fields will be changed.
const options = { name: 'new-voice-name', starred: true }
await speech.updateVoice('123444566422', options)
Parameters
The id of the voice to update. If you don’t know the id, you can get it from fetchVoices().
The properties to update. Only provided fields will be changed.
Whether the voice is starred by you.
The gender of the voice, e.g. male, female, or nonbinary. For categorization purposes.
A description of the voice.
Return value
The updated voice metadata object.
deleteVoice
async deleteVoice(voice)
Deletes a voice and cancels any pending operations on it. The voice must be owned by you. Cannot be undone.
await speech.deleteVoice('123444566422')
Parameters
The id of the voice to delete. If you don’t know the id, you can get it from fetchVoices().
Return value
A success or error message.
synthesize
async synthesize(text, voice, options={})
Synthesizes speech for a supplied text string.
const synthesis = await speech.synthesize('Hello world!', 'morgan')
const audio = synthesis.audio
Parameters
Which voice to render; its id can be found using the fetchVoices call.
Additional options for the synthesis request.
The model to use for synthesis. One of aurora (default) or blizzard. Learn more about models here.
The format of the synthesized audio: one of aac, mp3, or wav. Defaults to mp3 (24kHz 16-bit mono).
The desired language of the synthesized speech. Two letter ISO 639-1 code.
Whether to include word durations detail in the response.
Whether to include the seed used for synthesis in the response.
The desired output sample rate in Hz, one of 8000, 16000, or 24000. Defaults to 24000 for all formats except mulaw, which defaults to 8000.
The seed used to specify a different take. Defaults to a random value.
Return value
The synthesized audio encoded in the requested format as a Buffer object.
durations
array of duration objects
An array of text duration objects. Only returned if return_durations is true.
Each object describes the duration of a chunk of text (e.g., words, punctuation, and spaces) with the following keys:
The text for which timing information is being reported.
The time at which text starts, in seconds from the start of the audio.
The duration of text, in seconds.
The seed used for synthesis. Only returned if return_seed is true.
Here is the schema for the return value:
{
  "audio": <binary audio data>,
  "durations": [
    {
      "text": "string",
      "start": 0,
      "duration": 0
    },
    ...
  ],
  "seed": "number"
}
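As a sketch of consuming the durations array, each chunk's end time can be derived from start + duration, for example to drive caption or highlight timing. The sample values below are illustrative, not real API output:

```javascript
// Illustrative durations data mirroring the schema above;
// a real response includes this only when return_durations is set.
const durations = [
  { text: 'Hello', start: 0, duration: 0.4 },
  { text: ' ', start: 0.4, duration: 0.05 },
  { text: 'world!', start: 0.45, duration: 0.5 },
];

// Derive each chunk's end time, e.g. to drive caption timing.
const withEnds = durations.map((d) => ({ ...d, end: d.start + d.duration }));

// Total spoken length is the end time of the last chunk.
const totalSeconds = withEnds[withEnds.length - 1].end;
```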
Notes
The mp3 bitrate is 96kbps.
synthesizeStreaming
synthesizeStreaming(voice, options={})
Creates a new, full-duplex streaming session. You can use the returned
connection object to concurrently stream text content to the server and receive
speech data from the server.
Parameters
Which voice to render; its id can be found using the fetchVoices call.
Additional options for the streaming connection.
The desired output audio format. One of:
mp3: 96kbps MP3 audio. This format is useful for applications that need to play the audio directly to the user.
raw: 16-bit little-endian linear PCM audio. This format is useful for applications that need to process the audio further, such as adding effects or mixing multiple audio streams.
ulaw: 8-bit G711 µ-law audio with a WAV header. This format is most useful for telephony applications.
The desired language of the synthesized speech. Two letter ISO 639-1 code. One of de, en, es, fr, pt, zh, ko, or hi.
The desired output audio sample rate. One of:
24000: 24kHz audio. This sample rate is useful for applications that need high-quality audio.
16000: 16kHz audio. This sample rate is useful for applications that need to save bandwidth.
8000: 8kHz audio. This sample rate is most useful for telephony applications and µ-law encoding.
Whether to return extra data (durations data and warnings) with each audio chunk.
Return value
A StreamingSynthesisConnection instance, which you can use to stream data.
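The full-duplex pattern described above can be sketched as one task appending text while another consumes audio chunks. The connection's method names here (appendText, finish, async iteration) are assumptions about its shape rather than confirmed v1 API, so a local stub stands in for `speech.synthesizeStreaming('morgan')`:

```javascript
// Stub standing in for the StreamingSynthesisConnection returned by
// speech.synthesizeStreaming(); appendText/finish/async iteration are
// assumed method names, used only to illustrate the duplex pattern.
function makeStubConnection() {
  const queue = [];
  let done = false;
  return {
    appendText(text) { queue.push(Buffer.from(`audio(${text})`)); },
    finish() { done = true; },
    async *[Symbol.asyncIterator]() {
      while (queue.length || !done) {
        if (queue.length) yield queue.shift();
        else await new Promise((resolve) => setImmediate(resolve));
      }
    },
  };
}

async function main() {
  const conn = makeStubConnection();

  // Producer: stream text content to the server as it becomes available.
  const producer = (async () => {
    for (const part of ['Hello ', 'world!']) conn.appendText(part);
    conn.finish(); // signal that no more text is coming
  })();

  // Consumer: concurrently receive speech data from the server.
  const chunks = [];
  for await (const chunk of conn) chunks.push(chunk);
  await producer;
  return Buffer.concat(chunks);
}
```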