Optimizing for low latency
How to generate speech from text as quickly as possible, with latency under 300 ms.
Use streaming synthesis
Streaming synthesis via our WebSocket API is the fastest way to generate speech from text. It returns audio as soon as it is available, rather than waiting for the entire clip to be generated. All of our client libraries support streaming synthesis as well.
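To illustrate why streaming reduces the time to first audio, here is a minimal sketch of the consuming side. It does not call the actual WebSocket API or any client library: `fake_stream` is a hypothetical stand-in for a streaming connection, and `play_as_received` and the `play` callback are illustrative names. The point is only that each chunk is handed to playback before the next one has been generated.

```python
from typing import Callable, Iterable, Iterator, List


def play_as_received(chunks: Iterable[bytes], play: Callable[[bytes], None]) -> int:
    """Hand each audio chunk to `play` as soon as it arrives; return total bytes."""
    total = 0
    for chunk in chunks:
        if not chunk:
            continue  # skip empty keep-alive style chunks
        play(chunk)
        total += len(chunk)
    return total


def fake_stream(events: List[str]) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming connection that yields audio chunks."""
    for i in range(3):
        events.append(f"generated {i}")
        yield bytes([i]) * 4  # 4 bytes of placeholder audio data


events: List[str] = []
total_bytes = play_as_received(
    fake_stream(events),
    lambda chunk: events.append(f"played {chunk[0]}"),
)
```

After this runs, `events` interleaves generation and playback (chunk 0 is played before chunk 1 is generated), which is the behavior that keeps perceived latency low. In a real integration the iterable would be the stream of audio messages from the connection, and `play` would write to an audio device or buffer.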
Output wav format instead of mp3
The wav format is faster to generate than mp3. The default output for speech synthesis in our SDKs and REST API is mp3, so you will need to explicitly pass in format=wav in your request to use wav format.
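As a rough sketch of setting the format explicitly, the snippet below builds request parameters with `format=wav`. Only the `format` parameter comes from this page; the `text` parameter name, the helper `build_synthesis_params`, and the choice to encode the parameters as a query string are assumptions for illustration, so check the API reference for the actual request shape.

```python
from typing import Dict
from urllib.parse import urlencode


def build_synthesis_params(text: str, audio_format: str = "wav") -> Dict[str, str]:
    """Build request parameters, defaulting to wav rather than the API default of mp3.

    `text` is an assumed parameter name; `format` is the one named in the docs.
    """
    return {"text": text, "format": audio_format}


params = build_synthesis_params("Hello there")
# One possible encoding (assumption): send the parameters as a query string.
query = urlencode(params)
```

Defaulting the helper to `wav` means callers get the faster format unless they explicitly ask for mp3.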