For noisy audio, such as recordings with background noise, applies processing that attempts to improve quality. Off by default, because it can also degrade quality in some circumstances.
filenames
string[]
required
A list of filenames to use for the voice.
options
object
type
string
The type of voice to create. Must be one of instant or professional. Defaults to instant.
gender
string
The gender of the voice, e.g. male, female, or nonbinary. Used only for categorization. Defaults to None (not set).
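As a rough sketch, the create-voice options above could be modeled in TypeScript as follows. The names `VoiceType`, `CreateVoiceOptions`, and `resolveCreateVoiceOptions` are hypothetical and not part of the SDK. The only behavior taken from this reference is that `type` must be `instant` or `professional` and defaults to `instant`, while `gender` is optional.

```typescript
// Hypothetical types mirroring the documented create-voice options.
type VoiceType = "instant" | "professional";

interface CreateVoiceOptions {
  type?: VoiceType;
  gender?: string; // e.g. "male", "female", "nonbinary"; categorization only
}

// Applies the documented default for `type` and rejects unknown voice types
// (useful when the options come from untyped JavaScript or user input).
function resolveCreateVoiceOptions(
  options: CreateVoiceOptions = {},
): { type: VoiceType; gender?: string } {
  const type = options.type ?? "instant";
  if (type !== "instant" && type !== "professional") {
    throw new Error(`Unsupported voice type: ${String(type)}`);
  }
  return options.gender === undefined ? { type } : { type, gender: options.gender };
}

console.log(resolveCreateVoiceOptions({ gender: "female" }));
```

Calling it with only `gender` yields an `instant` voice, matching the documented default.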
Returns the voice metadata object. For example:
{
  "id": "123444566422",
  "name": "new-voice",
  "owner": "me",
  "state": "ready",
  "starred": false,
  "description": "Totam necessitatibus saepe repudiandae perferendis. Tempora iure provident. Consequatur debitis assumenda. Earum debitis cum.",
  "type": "instant",
  "gender": "male"
}
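A minimal sketch of checking a returned metadata object at runtime, assuming the response has exactly the fields in the sample object above. `VoiceMetadata` and `isVoiceMetadata` are hypothetical names; `gender` is treated as optional because it defaults to None, and only the `"ready"` state appears in the sample, so other state values are not assumed.

```typescript
// Field list taken from the sample object; `gender` may be absent or null.
interface VoiceMetadata {
  id: string;
  name: string;
  owner: string;
  state: string;
  starred: boolean;
  description: string;
  type: string;
  gender?: string | null;
}

// Runtime type guard for parsed JSON whose shape is otherwise unchecked.
function isVoiceMetadata(value: unknown): value is VoiceMetadata {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const requiredStrings = ["id", "name", "owner", "state", "description", "type"];
  const gender = v["gender"];
  return (
    requiredStrings.every((key) => typeof v[key] === "string") &&
    typeof v["starred"] === "boolean" &&
    (gender === undefined || gender === null || typeof gender === "string")
  );
}

const parsed: unknown = JSON.parse(
  '{"id":"123444566422","name":"new-voice","owner":"me","state":"ready",' +
    '"starred":false,"description":"Sample description.","type":"instant","gender":"male"}',
);
if (isVoiceMetadata(parsed) && parsed.state === "ready") {
  console.log(`Voice ${parsed.name} (${parsed.id}) is ready`);
}
```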
Creates a new, full-duplex streaming session. You can use the returned connection object to concurrently stream text content to the server and receive speech data from the server.
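The full-duplex pattern (send text and receive audio at the same time) can be sketched as below. The connection's real method names are not given here, so `DuplexConnection` with `sendText`, `endInput`, and `audio` is a hypothetical interface, and `MockConnection` is an in-memory stand-in so the sketch runs without a server. The point is the structure: a producer and a consumer run concurrently and are joined with `Promise.all`.

```typescript
// Hypothetical connection shape; the SDK's actual API may differ.
interface DuplexConnection {
  sendText(text: string): Promise<void>;
  endInput(): Promise<void>;
  audio(): AsyncIterable<Uint8Array>;
}

// Mock that turns each text message into one fake audio chunk
// (a zeroed buffer as long as the text).
class MockConnection implements DuplexConnection {
  private queue: Uint8Array[] = [];
  private waiters: Array<() => void> = [];
  private ended = false;

  async sendText(text: string): Promise<void> {
    this.queue.push(new Uint8Array(text.length));
    this.wake();
  }

  async endInput(): Promise<void> {
    this.ended = true;
    this.wake();
  }

  // Releases any consumer waiting for data so it re-checks the queue.
  private wake(): void {
    const pending = this.waiters;
    this.waiters = [];
    pending.forEach((resolve) => resolve());
  }

  async *audio(): AsyncIterable<Uint8Array> {
    for (;;) {
      const chunk = this.queue.shift();
      if (chunk !== undefined) {
        yield chunk;
      } else if (this.ended) {
        return;
      } else {
        await new Promise<void>((resolve) => this.waiters.push(resolve));
      }
    }
  }
}

// Streams text and consumes audio concurrently; returns total audio bytes.
async function speak(conn: DuplexConnection, sentences: string[]): Promise<number> {
  const producer = (async () => {
    for (const sentence of sentences) await conn.sendText(sentence);
    await conn.endInput();
  })();
  const consumer = (async () => {
    let bytes = 0;
    for await (const chunk of conn.audio()) bytes += chunk.byteLength;
    return bytes;
  })();
  const [, total] = await Promise.all([producer, consumer]);
  return total;
}
```

Because the consumer starts before the producer finishes, audio for early text can be played while later text is still being sent.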
The ID of the voice to render. Voice IDs can be retrieved with the fetchVoices call.
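A small helper for picking a voice ID out of a voice list, assuming the fetchVoices result is an array of objects with at least `id` and `name`, like the voice metadata sample earlier in this reference. `VoiceSummary` and `findVoiceId` are hypothetical names, and the exact fetchVoices signature is not documented here.

```typescript
// Assumed minimal shape of each entry returned by fetchVoices.
interface VoiceSummary {
  id: string;
  name: string;
}

// Returns the ID of the first voice with the given name, or undefined.
function findVoiceId(voices: VoiceSummary[], name: string): string | undefined {
  return voices.find((voice) => voice.name === name)?.id;
}

// Usage (call shape assumed): const id = findVoiceId(await fetchVoices(), "new-voice");
console.log(findVoiceId([{ id: "123444566422", name: "new-voice" }], "new-voice"));
```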
options
object
Additional options for the streaming connection.
format
string
default: "mp3"
The desired output audio format. One of:
mp3: 96 kbps MP3 audio. This format is useful for applications that need to play the audio directly to the user.
raw: 16-bit little-endian linear PCM audio. This format is useful for applications that need to process the audio further, such as adding effects or mixing multiple audio streams.
ulaw: 8-bit G.711 µ-law audio with a WAV header. This format is most useful for telephony applications.
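For the two uncompressed formats, the bytes can be turned into 16-bit samples on the client. This is a sketch under two assumptions not stated above: the audio is mono, and each buffer holds whole samples (a real streaming client would carry a split sample over to the next chunk). The µ-law expansion is the standard G.711 decoding, and the WAV header is skipped by walking RIFF chunks to the `data` chunk rather than assuming a fixed 44-byte header. All function names are hypothetical.

```typescript
// `raw`: 16-bit little-endian signed PCM. A trailing odd byte is ignored.
function decodePcm16LE(bytes: Uint8Array): Int16Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Int16Array(Math.floor(bytes.byteLength / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true); // true = little-endian
  }
  return out;
}

// Standard G.711 µ-law expansion of one byte to a linear 16-bit sample.
function muLawToLinear(byte: number): number {
  const u = ~byte & 0xff;
  const sign = u & 0x80;
  const exponent = (u >> 4) & 0x07;
  const mantissa = u & 0x0f;
  const magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;
  return sign ? -magnitude : magnitude;
}

// Offset of the sample data inside a WAV (RIFF) buffer, or -1 if the
// "data" chunk header is not in this buffer.
function wavDataOffset(bytes: Uint8Array): number {
  const tag = (at: number) =>
    Array.from(bytes.subarray(at, at + 4), (b) => String.fromCharCode(b)).join("");
  if (bytes.length < 12 || tag(0) !== "RIFF" || tag(8) !== "WAVE") return -1;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  let offset = 12;
  while (offset + 8 <= bytes.length) {
    if (tag(offset) === "data") return offset + 8;
    const size = view.getUint32(offset + 4, true);
    offset += 8 + size + (size & 1); // RIFF chunks are padded to even sizes
  }
  return -1;
}

// `ulaw`: skip the WAV header, then expand each byte to a linear sample.
function decodeUlawWav(bytes: Uint8Array): Int16Array {
  const start = wavDataOffset(bytes);
  if (start < 0) throw new Error("WAV data chunk not found");
  return Int16Array.from(bytes.subarray(start), muLawToLinear);
}
```

`mp3` output is compressed and needs a real decoder (for example, the platform's audio APIs), so it is not covered by this sketch.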
language
string
default: "en"
The desired language of the synthesized speech, as a two-letter ISO 639-1 code. One of de, en, es, fr, pt, zh, ko, hi.
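A client-side check against the language list above might look like this. `SUPPORTED_LANGUAGES` and `toSupportedLanguage` are hypothetical names, and the trimming and lowercasing are a convenience of this sketch, not documented server behavior.

```typescript
// Language codes listed in this reference (ISO 639-1).
const SUPPORTED_LANGUAGES = ["de", "en", "es", "fr", "pt", "zh", "ko", "hi"] as const;
type SupportedLanguage = (typeof SUPPORTED_LANGUAGES)[number];

// Normalizes input such as " EN " and rejects codes not in the list.
function toSupportedLanguage(code: string): SupportedLanguage {
  const normalized = code.trim().toLowerCase();
  const match = SUPPORTED_LANGUAGES.find((lang) => lang === normalized);
  if (match === undefined) throw new Error(`Unsupported language: ${code}`);
  return match;
}

console.log(toSupportedLanguage(" EN "));
```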
sample_rate
number
default: 24000
The desired output audio sample rate. One of:
24000: 24kHz audio. This sample rate is useful for applications that need high-quality audio.
16000: 16kHz audio. This sample rate is useful for applications that need to save bandwidth.
8000: 8kHz audio. This sample rate is most useful for telephony applications and µ-law encoding.
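Sample rate and format together determine how much audio a given number of bytes represents. A sketch, assuming mono audio: `raw` uses 2 bytes per sample (16-bit), and the `ulaw` payload uses 1 byte per sample once its WAV header is removed. `durationSeconds` is a hypothetical helper; MP3 is compressed, so its duration cannot be derived from the byte count this way.

```typescript
type SampleRate = 24000 | 16000 | 8000;

// Playback duration of headerless mono sample data, in seconds.
function durationSeconds(
  byteCount: number,
  format: "raw" | "ulaw",
  sampleRate: SampleRate,
): number {
  const bytesPerSample = format === "raw" ? 2 : 1;
  return byteCount / (bytesPerSample * sampleRate);
}

console.log(durationSeconds(48000, "raw", 24000)); // 1
console.log(durationSeconds(4000, "ulaw", 8000)); // 0.5
```

The same arithmetic shows the bandwidth trade-off described above: one second of `raw` audio is 48,000 bytes at 24 kHz but 16,000 bytes at 8 kHz.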
speed
number
default: 1.0
The speed of the speech, as a floating-point value between 0.25 (slow) and 2.0 (fast).
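Validating `speed` before opening a connection avoids a round trip for an out-of-range value. This sketch assumes the documented range is inclusive at both ends; `validateSpeed` is a hypothetical helper.

```typescript
// Accepts finite values in [0.25, 2.0]; throws otherwise.
function validateSpeed(speed: number): number {
  if (!Number.isFinite(speed) || speed < 0.25 || speed > 2.0) {
    throw new RangeError(`speed must be between 0.25 and 2.0, got ${speed}`);
  }
  return speed;
}

console.log(validateSpeed(1.25));
```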
return_extras
boolean
default: false
Whether to return extra data (duration data and warnings) with each audio chunk.
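The documented defaults can be collected in one place and overlaid with caller choices. This is a sketch: `StreamingOptions`, `STREAMING_DEFAULTS`, and `withStreamingDefaults` are hypothetical names, while the field names and default values are copied from this reference. It uses `??` per field rather than object spread so that an explicit `undefined` still falls back to the default.

```typescript
type AudioFormat = "mp3" | "raw" | "ulaw";
type SampleRate = 24000 | 16000 | 8000;

interface StreamingOptions {
  format: AudioFormat;
  language: string;
  sample_rate: SampleRate;
  speed: number;
  return_extras: boolean;
}

// Defaults as documented above.
const STREAMING_DEFAULTS: StreamingOptions = {
  format: "mp3",
  language: "en",
  sample_rate: 24000,
  speed: 1.0,
  return_extras: false,
};

// Fills every option the caller left unset (or set to undefined).
function withStreamingDefaults(options: Partial<StreamingOptions> = {}): StreamingOptions {
  return {
    format: options.format ?? STREAMING_DEFAULTS.format,
    language: options.language ?? STREAMING_DEFAULTS.language,
    sample_rate: options.sample_rate ?? STREAMING_DEFAULTS.sample_rate,
    speed: options.speed ?? STREAMING_DEFAULTS.speed,
    return_extras: options.return_extras ?? STREAMING_DEFAULTS.return_extras,
  };
}

// A telephony-oriented configuration, following the format and sample-rate notes.
console.log(withStreamingDefaults({ format: "ulaw", sample_rate: 8000 }));
```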