Latency is the time delay between a system receiving an input and producing an output (i.e., lower latency means faster responses). Here are some tips for optimizing the latency you experience when using our model:


Use an SDK

Use one of our SDKs to connect to the streaming API. They handle the low-level details of the streaming protocol for you and are optimized for low latency.


Use the real-time streaming API

The streaming API returns audio as it is generated, rather than waiting for the full response like the non-streaming API. See the example here.
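To see why this matters for perceived latency, here is a minimal, self-contained sketch. The chunk sizes and timings are made up for illustration (they are not part of the real API): with streaming, playback can begin at the first chunk instead of after the full response.

```python
import asyncio
import time

async def fake_stream(num_chunks: int = 5, chunk_delay: float = 0.05):
    """Simulate a streaming response: yield audio chunks as they are generated."""
    for _ in range(num_chunks):
        await asyncio.sleep(chunk_delay)  # simulated generation time per chunk
        yield b"\x00" * 960               # 20 ms of 16-bit PCM at 24 kHz

async def main():
    start = time.monotonic()
    first_audio_at = None
    total_bytes = 0
    async for chunk in fake_stream():
        if first_audio_at is None:
            # With streaming, this is when playback could begin.
            first_audio_at = time.monotonic() - start
        total_bytes += len(chunk)
    total_time = time.monotonic() - start
    return first_audio_at, total_time, total_bytes

first, total, n = asyncio.run(main())
print(f"first audio after {first*1000:.0f} ms; full response after {total*1000:.0f} ms")
```

With the non-streaming API, nothing is playable until roughly `total_time`; with streaming, audio is playable at `first_audio_at`, a fraction of that.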


Use raw format

Use the raw format. It’s the fastest format we offer and returns 16-bit PCM (little-endian) audio at 24 kHz.
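Because the raw format has no container or header, decoding it is trivial. The following sketch (using only the Python standard library; the function names are our own) converts a raw buffer into signed 16-bit samples and computes its playback duration from the 24 kHz sample rate stated above.

```python
import array
import sys

SAMPLE_RATE = 24_000   # Hz, per the raw format described above
BYTES_PER_SAMPLE = 2   # 16-bit PCM

def decode_raw_pcm(raw: bytes) -> array.array:
    """Decode raw little-endian 16-bit PCM bytes into signed samples."""
    samples = array.array("h")  # 'h' = signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()      # the raw format is little-endian
    return samples

def duration_seconds(raw: bytes) -> float:
    """Playback duration of a raw PCM buffer."""
    return len(raw) / (SAMPLE_RATE * BYTES_PER_SAMPLE)

# Example: 24,000 samples (48,000 bytes) is exactly one second of audio.
buf = b"\x01\x00" * SAMPLE_RATE  # each little-endian pair decodes to 1
samples = decode_raw_pcm(buf)
print(len(samples), duration_seconds(buf))  # 24000 1.0
```

Skipping container parsing entirely is part of what makes raw the fastest option: bytes off the wire can go straight to the audio device.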


Use async tasks

Use asynchronous tasks to stream data concurrently. See the Streaming example above for a reference implementation.
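As a rough sketch of the pattern (the function names here are illustrative, not SDK calls): one task reads chunks off the connection while another consumes them, connected by a queue, so playback of chunk N overlaps with receipt of chunk N+1.

```python
import asyncio

async def receive_chunks(queue: asyncio.Queue):
    """Stand-in for reading audio chunks off the streaming connection."""
    for i in range(3):
        await asyncio.sleep(0.01)          # simulated network delay
        await queue.put(f"chunk-{i}".encode())
    await queue.put(None)                  # sentinel: stream finished

async def play_chunks(queue: asyncio.Queue) -> list:
    """Stand-in for feeding chunks to an audio device as they arrive."""
    played = []
    while (chunk := await queue.get()) is not None:
        played.append(chunk)
    return played

async def main():
    queue = asyncio.Queue()
    # Run receiving and playback concurrently instead of downloading
    # the full response before playing anything.
    _, played = await asyncio.gather(receive_chunks(queue), play_chunks(queue))
    return played

played = asyncio.run(main())
print(played)  # [b'chunk-0', b'chunk-1', b'chunk-2']
```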


Keep an open connection

If you want to synthesize speech in chunks (e.g., for a chatbot), keep the connection open and send the chunks as they become available.
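The payoff is avoiding a fresh connection handshake per chunk. The toy sketch below (a fake connection class, not the real SDK) makes the trade-off visible: connect once, then send each chatbot sentence over the same open connection as it becomes available.

```python
import asyncio

class FakeConnection:
    """Illustrative stand-in for a persistent streaming connection."""

    def __init__(self):
        self.handshakes = 0
        self.sent = []

    async def connect(self):
        self.handshakes += 1  # TLS/websocket setup would happen here

    async def send(self, text: str):
        self.sent.append(text)

async def synthesize_chunks(chunks):
    conn = FakeConnection()
    await conn.connect()       # connect once, up front
    for text in chunks:
        await conn.send(text)  # reuse the open connection for each chunk
    return conn

conn = asyncio.run(synthesize_chunks(["Hello!", "How can I help?"]))
print(conn.handshakes, conn.sent)  # 1 ['Hello!', 'How can I help?']
```

Opening a new connection per chunk would pay the handshake cost on every sentence; keeping one connection open pays it exactly once per conversation.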


Use servers in the U.S.

Our API and GPU servers are located in the United States. We support streaming worldwide, but users in the U.S. will typically experience the lowest latency. If you have specific geographic constraints, reach out.