Optimizing for low latency

Use streaming synthesis

Streaming synthesis via our WebSocket API is the fastest way to generate speech from text. It returns audio as soon as it is available, rather than waiting for the entire utterance to be generated. All of our client libraries support streaming synthesis as well.
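The latency difference can be sketched without a live connection. The example below is a minimal illustration, not the actual client API: `simulated_audio_stream` is a hypothetical stand-in for a streaming synthesis connection, and `consume_streaming` and `consume_buffered` are illustrative helper names. It compares handing each chunk to a sink as it arrives against waiting for the full audio.

```python
import time
from typing import Callable, Iterable, Iterator


def simulated_audio_stream(n_chunks: int = 5, delay_s: float = 0.02) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming connection: yields one chunk at a time."""
    for i in range(n_chunks):
        time.sleep(delay_s)  # pretend the server needs this long to produce each chunk
        yield bytes([i]) * 320


def consume_streaming(chunks: Iterable[bytes], sink: Callable[[bytes], None]) -> float:
    """Forward each chunk to `sink` on arrival; return seconds until the first chunk."""
    start = time.perf_counter()
    first_chunk_latency = None
    for chunk in chunks:
        if first_chunk_latency is None:
            first_chunk_latency = time.perf_counter() - start
        sink(chunk)  # e.g. write to an audio device or a response socket
    if first_chunk_latency is None:
        raise ValueError("stream produced no audio")
    return first_chunk_latency


def consume_buffered(chunks: Iterable[bytes], sink: Callable[[bytes], None]) -> float:
    """Wait for the whole utterance before handing anything to `sink`."""
    start = time.perf_counter()
    audio = b"".join(chunks)  # blocks until the last chunk has been produced
    latency = time.perf_counter() - start
    sink(audio)
    return latency


received = []
streaming_latency = consume_streaming(simulated_audio_stream(), received.append)
buffered_latency = consume_buffered(simulated_audio_stream(), lambda audio: None)
print(f"first audio after {streaming_latency:.3f}s streaming, {buffered_latency:.3f}s buffered")
```

With a real connection, the loop in `consume_streaming` would iterate over messages from the WebSocket instead of the simulated generator. The point is the same: playback can start after the first chunk rather than after the last.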

Output `wav` format instead of `mp3`

The wav format is faster to generate than mp3. The default output format for speech synthesis in our SDKs and REST API is mp3, so you need to explicitly pass `format=wav` in your request to receive wav output.
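As a rough sketch of what passing `format=wav` might look like, the snippet below builds a request URL with the format set explicitly. The endpoint URL, the `text` parameter, and the choice to send `format` as a query parameter are all assumptions made for illustration; only the `format=wav` setting itself comes from the text above. Check the API reference for the real request shape.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
SYNTHESIS_URL = "https://api.example.com/v1/synthesize"


def build_synthesis_url(text: str, audio_format: str = "wav") -> str:
    """Build a synthesis request URL that names the output format explicitly."""
    if audio_format not in {"wav", "mp3"}:
        raise ValueError(f"unsupported format: {audio_format!r}")
    # Always send `format`, so the request never falls back to the mp3 default.
    query = urlencode({"text": text, "format": audio_format})
    return f"{SYNTHESIS_URL}?{query}"


url = build_synthesis_url("Hello, world")
print(url)
```

Defaulting the helper to `wav` keeps callers from silently inheriting the mp3 default.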
