Building with LMNT

Optimizing latency

Tips for getting latency down as low as possible with LMNT's speech models

Best practices

  1. Make sure to use streaming mode

    Get speech chunks as soon as they're generated. Your reader can write or play them immediately instead of waiting for the entire generation to finish.

    from lmnt import Lmnt

    client = Lmnt()  # by default, reads your API key from the LMNT_API_KEY environment variable

    # Use with_streaming_response.generate to get speech as it's generated.
    with client.speech.with_streaming_response.generate(
        text='hello world.',
        voice='leah',
    ) as response:
        response.stream_to_file('hello.mp3')
  2. Use async tasks

    Use asynchronous tasks so handling one chunk doesn't block receiving the next. For example, play or write each chunk in one task while the stream from LMNT continues feeding more in.
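
    A minimal sketch of this pattern, assuming an async client named AsyncLmnt and that the async streamed response can be iterated with iter_bytes() (check the SDK reference for the exact names): a writer task drains a queue while the main coroutine keeps pulling chunks from LMNT.

    import asyncio
    from lmnt import AsyncLmnt

    async def main():
        client = AsyncLmnt()
        queue = asyncio.Queue()

        async def write_chunks():
            # Handle each chunk (write to disk, feed an audio player, ...)
            # without blocking the network reads happening below.
            with open('hello.mp3', 'wb') as f:
                while (chunk := await queue.get()) is not None:
                    f.write(chunk)

        writer = asyncio.create_task(write_chunks())

        async with client.speech.with_streaming_response.generate(
            text='hello world.',
            voice='leah',
        ) as response:
            # Assumption: the async streamed response exposes iter_bytes().
            async for chunk in response.iter_bytes():
                await queue.put(chunk)

        await queue.put(None)  # signal that the stream has ended
        await writer

    asyncio.run(main())

    The queue decouples receiving chunks from handling them, so a slow write or playback step never stalls the stream.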

  3. Colocate with our servers

    Our primary servers are located in the United States. Although we support streaming worldwide, users in the U.S. are most likely to experience the lowest latency.

    If you have any specific geographic constraints, reach out (hello@lmnt.com).

  4. Specify the language

    If you know what language the text is in, specify it in the language parameter. This skips language detection, so speech comes back faster.
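
    For example, a minimal sketch reusing the client from above, assuming language accepts two-letter ISO 639-1 codes (check the API reference for the supported values):

    # Passing the language up front skips detection on the server.
    with client.speech.with_streaming_response.generate(
        text='Hallo Welt.',
        voice='leah',
        language='de',  # assumption: two-letter code; confirm against the API reference
    ) as response:
        response.stream_to_file('hallo.mp3')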