Building with LMNT

Optimizing latency

Tips for getting latency down as low as possible with LMNT's speech models

Best practices

  1. Make sure to use streaming mode

    Get speech chunks as soon as they're generated. Your reader can write or play them immediately instead of waiting for the entire generation to finish.

    from lmnt import Lmnt

    client = Lmnt()  # by default, reads your API key from the LMNT_API_KEY environment variable

    # Use with_streaming_response.generate to get speech as it's generated.
    with client.speech.with_streaming_response.generate(
        text='hello world.',
        voice='leah',
    ) as response:
        response.stream_to_file('hello.mp3')
  2. Use async tasks

    Use asynchronous tasks so handling one chunk doesn't block receiving the next. For example, play or write each chunk in one task while the stream from LMNT continues feeding more in.
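
    A minimal sketch of this pattern, assuming an async client named AsyncLmnt and that the async streamed response can be iterated with iter_bytes() (check the SDK reference for the exact names): a writer task drains a queue while the main coroutine keeps pulling chunks from LMNT.

    import asyncio
    from lmnt import AsyncLmnt

    async def main():
        client = AsyncLmnt()
        queue = asyncio.Queue()

        async def write_chunks():
            # Handle each chunk (write to disk, feed an audio player, ...)
            # without blocking the network reads happening below.
            with open('hello.mp3', 'wb') as f:
                while (chunk := await queue.get()) is not None:
                    f.write(chunk)

        writer = asyncio.create_task(write_chunks())

        async with client.speech.with_streaming_response.generate(
            text='hello world.',
            voice='leah',
        ) as response:
            # Assumption: the async streamed response exposes iter_bytes().
            async for chunk in response.iter_bytes():
                await queue.put(chunk)

        await queue.put(None)  # signal that the stream has ended
        await writer

    asyncio.run(main())

    The queue decouples receiving chunks from handling them, so a slow write or playback step never stalls the stream.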

  3. Colocate with our servers

    Our primary servers are located in the United States. Although we support streaming worldwide, users in the U.S. are most likely to experience the lowest latency.

    If you have any specific geographic constraints, reach out (hello@lmnt.com).

  4. Specify the language

    If you know what language the text is in, specify it in the language parameter. This skips language detection, so speech comes back faster.
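
    For example, a minimal sketch reusing the client from above, assuming language accepts two-letter ISO 639-1 codes (check the API reference for the supported values):

    # Passing the language up front skips detection on the server.
    with client.speech.with_streaming_response.generate(
        text='Hallo Welt.',
        voice='leah',
        language='de',  # assumption: two-letter code; confirm against the API reference
    ) as response:
        response.stream_to_file('hallo.mp3')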