Optimizing latency
Latency is the time delay between a system receiving an input and producing an output (i.e., lower latency means faster responses). Here are some tips for optimizing the latency you experience when using our model:
Handle audio chunk by chunk
Generating speech returns a stream of audio chunks. You can handle these chunks as they become available, or you can wait for the entire stream to finish before processing the audio.
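A minimal sketch of chunk-by-chunk handling with asyncio. The `synthesize_stream` generator here is a stand-in for the real API call, which is not shown in this document; the point is the `async for` loop, which lets you start playback as soon as the first chunk arrives rather than after the whole stream finishes.

```python
import asyncio

async def synthesize_stream():
    # Stand-in for the real API: yields audio chunks as they are generated.
    for chunk in (b"\x00\x01", b"\x02\x03", b"\x04\x05"):
        await asyncio.sleep(0)  # simulate waiting on the network
        yield chunk

async def main():
    received = []
    # Handle each chunk as soon as it arrives instead of waiting
    # for the entire stream to finish.
    async for chunk in synthesize_stream():
        received.append(chunk)  # e.g., write to an audio output device
    return b"".join(received)

audio = asyncio.run(main())
```

Waiting for the full stream is simpler, but the time to first audible sound grows with the length of the utterance; processing chunks as they arrive keeps that delay roughly constant.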
Use the real-time speech session API
See the example here
Use an SDK
Use one of our SDKs to create a speech session. Our SDKs are designed to handle the low-level details of the speech session API, and are optimized for low latency.
Use raw format
Use the raw format. It’s the fastest format we offer and returns 16-bit PCM (little-endian) audio at 24 kHz.
Use async tasks
Use asynchronous tasks to stream data concurrently. See the Speech session example above for a reference implementation.
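One common shape for this is a producer/consumer pair on an `asyncio.Queue`: one task receives chunks while another plays them, so playback and network I/O overlap. This is a generic sketch, not the SDK's own implementation; the producer here is a stand-in for reading from the API.

```python
import asyncio

async def produce(queue):
    # Stand-in producer: in practice this would read chunks from the API.
    for i in range(3):
        await queue.put(f"chunk-{i}".encode())
    await queue.put(None)  # sentinel: stream finished

async def consume(queue, sink):
    # Runs concurrently with the producer, so playback can begin
    # before the full stream has arrived.
    while (chunk := await queue.get()) is not None:
        sink.append(chunk)  # e.g., feed an audio output device

async def main():
    queue, sink = asyncio.Queue(), []
    await asyncio.gather(produce(queue), consume(queue, sink))
    return sink

chunks = asyncio.run(main())
```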
Keep an open connection
If you want to synthesize speech in chunks (e.g., for a chatbot), keep the connection open and send the chunks as they become available.
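The win here is avoiding a fresh connection handshake per chunk. The `SpeechSession` class below is hypothetical (the real session object comes from the SDK); it only illustrates the pattern of opening once and appending text fragments as a chatbot produces them.

```python
import asyncio

class SpeechSession:
    # Hypothetical stand-in for a persistent speech session;
    # the real object comes from the SDK and holds an open connection.
    def __init__(self):
        self.sent = []

    async def append_text(self, text):
        # Send a text fragment over the already-open connection —
        # no new handshake for each chunk.
        self.sent.append(text)

async def main():
    session = SpeechSession()  # open once, reuse for the whole conversation
    for fragment in ["Hello, ", "how can I ", "help you today?"]:
        await session.append_text(fragment)  # send as each fragment is ready
    return session.sent

sent = asyncio.run(main())
```

Reconnecting per chunk would add a round trip (or several, with TLS) before any audio for that chunk could begin.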
Use servers in the U.S.
Our API and GPU servers are located in the United States. Although we support streaming worldwide, users in the U.S. are likely to experience the lowest latency. If you have any specific geographic constraints, reach out to hello@lmnt.com.