Handle audio chunk by chunk
Generating speech with the POST /v1/ai/speech/bytes endpoint returns a stream of audio chunks. You can handle these chunks as they become available, or wait for the entire stream to finish before processing the audio.
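A minimal sketch of the chunk-by-chunk approach. The generator below stands in for the HTTP response body of the speech endpoint; the handler name and callback are illustrative, not part of the API:

```python
import io

def handle_stream(chunks, on_chunk):
    """Process raw audio chunks as they arrive instead of
    buffering the whole response first."""
    buffer = io.BytesIO()
    for chunk in chunks:      # each chunk is a bytes object from the response body
        on_chunk(chunk)       # e.g. feed an audio player immediately
        buffer.write(chunk)   # optionally keep the full audio as well
    return buffer.getvalue()

# Simulated response body standing in for the streamed endpoint output.
fake_response = (bytes([i]) * 4 for i in range(3))
received = []
audio = handle_stream(fake_response, received.append)
```

Because playback can begin on the first chunk, perceived latency is bounded by time-to-first-chunk rather than total generation time.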
Use an SDK
Our SDKs are designed to handle the low-level details of our API, and are optimized for low latency.
Use pcm_s16le or pcm_f32le
Use the pcm_s16le or pcm_f32le format. These are the fastest formats we offer, returning raw 16-bit or 32-bit audio respectively.
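Raw PCM needs no container parsing, which is part of why it is fast, but you decode it yourself. A sketch of turning pcm_s16le bytes into normalized float samples, using only the standard library:

```python
import struct

def pcm_s16le_to_floats(raw: bytes) -> list[float]:
    """Decode little-endian signed 16-bit PCM into floats in [-1.0, 1.0)."""
    n = len(raw) // 2                                 # 2 bytes per sample
    samples = struct.unpack("<%dh" % n, raw[: n * 2])  # '<h' = little-endian int16
    return [s / 32768.0 for s in samples]

# Two samples: silence and half of full scale.
raw = struct.pack("<2h", 0, 16384)
print(pcm_s16le_to_floats(raw))  # [0.0, 0.5]
```

With pcm_f32le the samples are already little-endian 32-bit floats (`"<f"`), so no scaling step is needed.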
Use async tasks
Use asynchronous tasks to stream data concurrently. See the speech session example above for a reference implementation.
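The idea can be sketched with a producer/consumer pair on an `asyncio.Queue`: one task reads chunks (here simulated; in practice this would be the network read) while another handles them concurrently. The function names are illustrative:

```python
import asyncio

async def produce(queue):
    # Stand-in for reading audio chunks off the network.
    for i in range(3):
        await queue.put(bytes([i]) * 2)
        await asyncio.sleep(0)   # yield control, as an awaited network read would
    await queue.put(None)        # sentinel: stream finished

async def consume(queue, out):
    # Runs concurrently with produce(); handles chunks as they arrive.
    while (chunk := await queue.get()) is not None:
        out.append(chunk)

async def main():
    queue = asyncio.Queue()
    out = []
    await asyncio.gather(produce(queue), consume(queue, out))
    return out

chunks = asyncio.run(main())
```

This keeps downloading and processing overlapped instead of serialized, which is what hides network latency.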
Use servers in the U.S.
Our API and GPU servers are located in the United States, so although we support streaming worldwide, users in the U.S. are likely to experience the lowest latency. If you have specific geographic constraints, reach out (hello@lmnt.com).
Specify the language
If you know what language the text is in, specify it in the language parameter. This skips language detection and generates the speech faster.
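A sketch of a request body with the language pinned. Only `language` is taken from this section; the other field names and values are assumptions for illustration:

```python
import json

payload = {
    "text": "Bonjour tout le monde",
    "voice": "example-voice-id",   # assumed parameter name and placeholder value
    "format": "pcm_s16le",         # raw PCM, per the format tip above
    "language": "fr",              # skip auto-detection: we already know the language
}
body = json.dumps(payload)
```

An ISO language code like `"fr"` is shown here; check the API reference for the exact set of accepted values.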