Optimizing latency
Latency is the time delay between a system receiving an input and producing an output (i.e., lower latency means faster responses). Here are some tips for optimizing the latency you experience when using our model:
Use an SDK
Use one of our SDKs to connect to the streaming API. They handle the low-level protocol details for you and are optimized for low latency.
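For instance, a minimal sketch with the Python SDK might look like the following. The `Speech` client and `synthesize` call are patterned on the SDK's published examples; treat the exact names and parameters as assumptions and check the SDK reference for the current API.

```python
# Minimal sketch: one-shot synthesis with the Python SDK.
# Assumes the `lmnt` package; method and parameter names are illustrative.
import asyncio

from lmnt.api import Speech

async def main():
    async with Speech('YOUR_API_KEY') as speech:  # replace with your API key
        synthesis = await speech.synthesize('Hello, world.', 'lily')
        with open('hello.mp3', 'wb') as f:
            f.write(synthesis['audio'])

asyncio.run(main())
```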
Use the real-time streaming API
Connect to the real-time streaming API rather than the non-streaming API: audio is returned incrementally as it is generated, so playback can begin before the full synthesis completes. See the streaming example above.
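As a sketch of the streaming flow (the connection methods `synthesize_streaming`, `append_text`, and `finish` are assumptions patterned on the SDK; verify them against the API reference):

```python
# Sketch: receive audio incrementally over a streaming connection.
import asyncio

from lmnt.api import Speech

async def main():
    async with Speech('YOUR_API_KEY') as speech:
        connection = await speech.synthesize_streaming('lily')
        await connection.append_text('Hello from the streaming API.')
        await connection.finish()  # signal that no more text is coming
        with open('reply.raw', 'wb') as f:
            async for message in connection:
                f.write(message['audio'])  # handle each chunk on arrival

asyncio.run(main())
```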
Use raw format
Use the raw format. It’s the fastest format we offer and returns 16-bit PCM (little-endian) audio at 24 kHz.
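Raw audio arrives with no container, so your code supplies the framing. A minimal sketch using only the standard library, with the audio parameters above (the mono channel count is an assumption; confirm it against the API reference):

```python
import wave

def save_raw_pcm_as_wav(pcm: bytes, path: str) -> None:
    """Wrap headerless 16-bit little-endian PCM at 24 kHz in a WAV container."""
    with wave.open(path, 'wb') as wav:
        wav.setnchannels(1)      # assuming mono output
        wav.setsampwidth(2)      # 16-bit samples = 2 bytes each
        wav.setframerate(24000)  # 24 kHz sample rate
        wav.writeframes(pcm)     # WAV stores PCM little-endian, matching raw
```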
Use async tasks
Use asynchronous tasks to send text and receive audio concurrently instead of alternating between the two. See the Streaming example above for a reference implementation.
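Concretely, run one task that feeds text and another that drains audio, so neither blocks the other (same illustrative connection API as the sketches above):

```python
import asyncio

from lmnt.api import Speech

async def writer(connection, sentences):
    # Feed text incrementally instead of waiting for the full script.
    for sentence in sentences:
        await connection.append_text(sentence)
    await connection.finish()

async def reader(connection):
    # Drain audio chunks the moment the server produces them.
    with open('speech.raw', 'wb') as f:
        async for message in connection:
            f.write(message['audio'])

async def main():
    async with Speech('YOUR_API_KEY') as speech:
        connection = await speech.synthesize_streaming('lily')
        sentences = ['First sentence.', 'Second sentence.']
        await asyncio.gather(writer(connection, sentences), reader(connection))

asyncio.run(main())
```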
Keep an open connection
If you want to synthesize speech in chunks (e.g., for a chatbot), keep the connection open and send each chunk as it becomes available rather than reconnecting per chunk; connection setup adds avoidable round-trip overhead.
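For a chatbot, that means forwarding each piece of the reply into the already-open connection as your language model produces it (the `token_stream` generator and the `flush` call are hypothetical):

```python
async def speak_reply(connection, token_stream):
    # Reuse one open synthesis connection for the whole conversation:
    # forward each text chunk the moment the LLM emits it. Opening a
    # fresh connection per chunk would repeat the handshake cost.
    async for text_chunk in token_stream:  # hypothetical LLM output stream
        await connection.append_text(text_chunk)
    await connection.flush()  # illustrative: synthesize any buffered text
```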
Use servers in the U.S.
Our API and GPU servers are located in the United States. Although we support streaming worldwide, users in the U.S. are likely to experience the lowest latency. If you have specific geographic constraints, reach out to us at hello@lmnt.com.