Latency is the time delay between a system receiving an input and producing an output (i.e., lower latency means faster responses). Here are some tips for optimizing the latency you experience when using our model:

1. Handle audio chunk by chunk

Generating speech returns a stream of audio chunks. Handle these chunks as they become available rather than waiting for the entire stream to finish; that way playback can begin as soon as the first chunk arrives.
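As a rough sketch, the loop below consumes chunks as they arrive; `synthesize_stream` here is a hypothetical stand-in for whatever async iterator your synthesis call returns.

```python
import asyncio

# Hypothetical stand-in for a streaming synthesis call: in practice this would
# be the async iterator of audio chunks returned by the API or SDK.
async def synthesize_stream(text: str):
    for word in text.split():
        await asyncio.sleep(0.05)   # simulate per-chunk network delay
        yield word.encode()         # simulate a chunk of audio bytes

async def main():
    audio = bytearray()
    # Handle each chunk the moment it arrives instead of waiting for the whole
    # stream; playback (or forwarding) can start after the first chunk.
    async for chunk in synthesize_stream("hello world, this is a streaming test"):
        audio.extend(chunk)         # replace with playback or forwarding

asyncio.run(main())
```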

2. Use the real-time speech session API

See the speech session example in our documentation for a reference implementation.
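For orientation only, here is the rough shape of a real-time session over a WebSocket. The endpoint URL and message fields below are placeholders, not the documented protocol; follow the speech session example for actual usage.

```python
import asyncio
import json
import websockets  # pip install websockets

# Placeholder endpoint and message shapes; the real session API may differ.
WS_URL = "wss://api.example.com/v1/speech/session"

async def speech_session(api_key: str, text: str) -> bytes:
    audio = bytearray()
    async with websockets.connect(f"{WS_URL}?api_key={api_key}") as ws:
        await ws.send(json.dumps({"text": text}))  # send text to synthesize
        await ws.send(json.dumps({"eof": True}))   # signal end of input (assumed field)
        async for message in ws:
            if isinstance(message, bytes):         # binary frames carry audio
                audio.extend(message)
            else:
                break                              # assume a final JSON "done" message
    return bytes(audio)

# asyncio.run(speech_session("YOUR_API_KEY", "Hello from a speech session."))
```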

3. Use an SDK

Use one of our SDKs to create a speech session. Our SDKs are designed to handle the low-level details of the speech session API, and are optimized for low latency.
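A minimal sketch with the Python SDK follows, assuming a `Speech` client with `synthesize_streaming`, `append_text`, and `finish`; class and method names may differ in your version, so check the SDK reference.

```python
import asyncio
from lmnt.api import Speech  # assumed import path; check the SDK reference

async def main():
    # The SDK manages the session connection and framing for you.
    async with Speech("YOUR_API_KEY") as speech:
        connection = await speech.synthesize_streaming("lily")  # example voice id
        await connection.append_text("Hello! This sentence streams back as audio.")
        await connection.finish()                 # no more text for this session
        audio = bytearray()
        async for message in connection:          # messages arrive as audio is ready
            audio.extend(message["audio"])        # raw audio bytes per message
        # `audio` now holds the synthesized speech

asyncio.run(main())
```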

4. Use raw format

The raw format is the fastest format we offer: it returns headerless 16-bit PCM (little-endian) audio at 24 kHz, with no container to encode or decode.
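Because raw audio is headerless, you handle the sample format yourself. The sketch below wraps raw output in a WAV container for inspection, assuming mono output; `raw_bytes` stands in for audio returned in the raw format.

```python
import wave

def save_raw_pcm_as_wav(raw_bytes: bytes, path: str) -> None:
    """Wrap headerless 16-bit little-endian PCM at 24 kHz in a WAV container."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)      # assumes mono output
        wav.setsampwidth(2)      # 16-bit samples = 2 bytes each
        wav.setframerate(24000)  # 24 kHz sample rate
        wav.writeframes(raw_bytes)

# One second of silence: 24,000 samples x 2 bytes per sample.
save_raw_pcm_as_wav(b"\x00\x00" * 24000, "one_second_of_silence.wav")
```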

5. Use async tasks

Use asynchronous tasks to stream data concurrently. See the Speech session example above for a reference implementation.
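The sketch below shows the pattern with a stand-in connection object: one task appends text while another consumes audio, so the two directions overlap instead of running back-to-back.

```python
import asyncio

class FakeConnection:
    """Hypothetical stand-in for a speech session connection."""
    def __init__(self):
        self._queue: asyncio.Queue = asyncio.Queue()

    async def append_text(self, text: str):
        await asyncio.sleep(0.05)             # simulate synthesis latency
        await self._queue.put(text.encode())  # pretend this is audio

    async def finish(self):
        await self._queue.put(None)           # sentinel: no more audio

    def __aiter__(self):
        return self

    async def __anext__(self):
        chunk = await self._queue.get()
        if chunk is None:
            raise StopAsyncIteration
        return chunk

async def send_text(conn, lines):
    for line in lines:
        await conn.append_text(line)  # push text as soon as it is ready
    await conn.finish()

async def receive_audio(conn):
    async for chunk in conn:
        pass                          # play or forward each chunk here

async def main():
    conn = FakeConnection()
    # Both tasks run concurrently: audio for early text streams back while
    # later text is still being sent.
    await asyncio.gather(send_text(conn, ["Hello.", "How can I help today?"]),
                         receive_audio(conn))

asyncio.run(main())
```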

6. Keep an open connection

If you want to synthesize speech in chunks (e.g., for a chatbot), keep the connection open and send the chunks as they become available.
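As a sketch, the function below reuses one open session object (the same hypothetical shape as in the sketches above) across chatbot turns, so each reply avoids a fresh connection handshake; `replies` stands in for your dialog logic.

```python
import asyncio

async def replies():
    # Stand-in for chatbot output arriving turn by turn.
    for text in ["Hi there!", "Sure, I can help with that.", "Anything else?"]:
        await asyncio.sleep(0.1)             # simulate the chatbot thinking
        yield text

async def speak_replies(connection):
    async for reply in replies():
        await connection.append_text(reply)  # same open connection, no reconnect cost
    await connection.finish()                # close only when the conversation ends
```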

7. Use servers in the U.S.

Our API and GPU servers are located in the United States. Although we support streaming worldwide, users in the U.S. are most likely to experience the lowest latency. If you have specific geographic constraints, reach out to us at hello@lmnt.com.
