List voices
mp3: 96kbps MP3 audio. This format is useful for applications that need to play the audio directly to the user.
raw: 16-bit little-endian linear PCM audio. This format is useful for applications that need to process the audio further, such as adding effects or mixing multiple audio streams.
ulaw: 8-bit G.711 µ-law audio with a WAV header. This format is most useful for telephony applications.
webm: WebM format with Opus audio codec. This format is ideal for web browsers and modern streaming applications that prioritize efficient bandwidth usage.

24000: 24kHz audio. This sample rate is useful for applications that need high-quality audio.
16000: 16kHz audio. This sample rate is useful for applications that need to save bandwidth.
8000: 8kHz audio. This sample rate is most useful for telephony applications and µ-law encoding.

durations, buffer_empty, and warnings. See the Receiving Extras section for more information.
flush field set to true:
buffer_empty field in the extra information. See the Receiving Extras section for more information.
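Returning to the raw format listed earlier: because it is headerless 16-bit little-endian PCM, most players will not open it until it is wrapped in a container. The sketch below uses only the Python standard library to add a WAV header and to estimate playback duration from a byte count. The helper names are hypothetical, and it assumes mono audio, which this page does not state.

```python
import io
import wave

BYTES_PER_SAMPLE = 2  # 16-bit samples, as described for the raw format


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap headerless 16-bit little-endian PCM in a WAV container.

    channels=1 (mono) is an assumption, not something the page confirms.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(BYTES_PER_SAMPLE)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_duration_seconds(num_bytes: int, sample_rate: int = 24000, channels: int = 1) -> float:
    """Estimate playback time of raw PCM from its size in bytes."""
    return num_bytes / (BYTES_PER_SAMPLE * channels * sample_rate)
```

Pass the same sample rate you requested (24000, 16000, or 8000); a mismatch makes the audio play too fast or too slow.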
flush. Our models are designed to factor in context when synthesizing audio. When flushing the buffer at arbitrary points, your speech may sound less natural.
Send a message with the eof field set to true. This will cause the server to synthesize all the text it has and then close the connection.
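As a rough illustration of the flush and eof fields, the sketch below builds the two control messages. It assumes client messages are JSON objects, which this page implies but does not show, and `control_message` is a hypothetical helper, not part of any SDK.

```python
import json


def control_message(*, flush: bool = False, eof: bool = False) -> str:
    """Serialize a control message using the flush / eof fields.

    Assumption: the server accepts JSON objects carrying these field names.
    """
    payload = {}
    if flush:
        payload["flush"] = True
    if eof:
        payload["eof"] = True
    if not payload:
        raise ValueError("set flush and/or eof")
    return json.dumps(payload)


# Force synthesis of whatever text is buffered so far.
flush_msg = control_message(flush=True)

# Synthesize all remaining text, after which the server closes the connection.
eof_msg = control_message(eof=True)
```

Since flushing at arbitrary points can make speech sound less natural, a client would typically send the flush message only at sentence or paragraph boundaries and send the eof message once, at the very end.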
Use the flush field to force the server to synthesize the text it has.
If you set the return_extras field to true in the first message, the server will also send extra information about each synthesized chunk. This information is sent as a serialized JSON object (string) and arrives before its corresponding audio chunk. The extra information includes:
Audio is sent as bytes and extra data is sent as a string.
error before closing the connection.
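One way to handle the mixed stream of audio and extras is to dispatch on the message type. The sketch below is transport-agnostic: `handle_message` and `ServerError` are hypothetical names. It also assumes that an error arrives as a JSON string message containing an `error` key, which this page only hints at.

```python
import json
from typing import List, Union


class ServerError(RuntimeError):
    """Raised when the server reports an error (assumed message shape)."""


def handle_message(message: Union[bytes, str], audio: bytearray, extras: List[dict]) -> None:
    """Route one incoming message: bytes are audio, strings are JSON."""
    if isinstance(message, (bytes, bytearray)):
        audio.extend(message)  # binary payload: append the audio chunk
        return
    data = json.loads(message)  # string payload: serialized JSON object
    if isinstance(data, dict) and "error" in data:
        raise ServerError(str(data["error"]))
    extras.append(data)  # extras precede their corresponding audio chunk


audio, extras = bytearray(), []
handle_message('{"buffer_empty": true}', audio, extras)  # illustrative extras payload
handle_message(b"\x01\x02", audio, extras)
```

Because each extras message precedes its audio chunk, a caller that needs to pair them can hold on to the most recent extras entry until the next bytes message arrives.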