Connect speech session

Stream text to our servers and receive synthesized speech in real time. This endpoint is well suited to latency-sensitive applications and to situations where you don't have all of the text up front.

WSS /v1/ai/speech/stream

Receiving audio

Once the server has received enough text, it responds with chunks of MP3 audio (mono, 96 kbps, 24 kHz sample rate). As you stream more text, the server continues to send more audio chunks. Audio chunks are sent as binary WebSocket messages.
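Because audio arrives as a series of binary chunks, a client typically appends each chunk to a buffer or file in arrival order. The minimal sketch below assumes that any text (non-binary) messages the server sends are not audio and skips them. `AudioCollector` and its methods are hypothetical names used for illustration. It also gives a rough duration estimate from the byte count, which holds only if the stream is constant-bitrate at 96 kbps: 96,000 bits per second divided by 8 is 12,000 bytes per second of audio.

```python
from typing import Union

# 96 kbps = 96,000 bits per second; 8 bits per byte gives bytes per second.
BYTES_PER_SECOND = 96_000 // 8  # 12,000


class AudioCollector:
    """Hypothetical helper that accumulates MP3 chunks from the speech stream."""

    def __init__(self) -> None:
        self._buffer = bytearray()
        self.chunks = 0

    def on_message(self, message: Union[bytes, str]) -> None:
        if isinstance(message, (bytes, bytearray)):
            # Binary messages carry MP3 audio; keep them in arrival order.
            self._buffer.extend(message)
            self.chunks += 1
        # ASSUMPTION: text messages are not audio, so they are ignored here.

    def audio(self) -> bytes:
        return bytes(self._buffer)

    def estimated_seconds(self) -> float:
        # Rough estimate; assumes a constant 96 kbps bitrate.
        return len(self._buffer) / BYTES_PER_SECOND


collector = AudioCollector()
# Placeholder messages: two fake 6,000-byte audio chunks and one text message.
for msg in [b"\xff\xfb" + b"\x00" * 5998, '{"example": "text message"}', b"\x00" * 6000]:
    collector.on_message(msg)

print(collector.chunks)               # 2
print(len(collector.audio()))         # 12000
print(collector.estimated_seconds())  # 1.0
```

In a real client you would call `on_message` from your WebSocket library's receive loop, then write `collector.audio()` to an `.mp3` file, or hand each chunk to a streaming player as it arrives if you want playback to start before the stream ends.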

To produce the most natural-sounding speech, the API waits until it has received roughly two full sentences before it starts synthesizing audio. To receive audio before this threshold is reached, use the flush field to force the server to synthesize whatever text it has received so far.
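One way a client might use the flush field is to stream text pieces as they become available without flushing, and then flush on the final piece so that any trailing text shorter than the two-sentence threshold is still synthesized. The sketch below assumes that outgoing messages are JSON objects with a `text` field and a boolean `flush` field. Only the existence of a flush field comes from this page, so check the API reference for the exact message format. `build_message` and `messages_for` are hypothetical helpers.

```python
import json
from typing import Iterable, Iterator, Optional


def build_message(text: str, flush: bool = False) -> str:
    # ASSUMPTION: JSON messages with a "text" field; the boolean type of
    # "flush" is also assumed. Only the flush field itself is documented.
    payload: dict = {"text": text}
    if flush:
        payload["flush"] = True
    return json.dumps(payload)


def messages_for(pieces: Iterable[str]) -> Iterator[str]:
    """Yield one message per text piece, flushing on the final piece.

    Uses a one-item lookahead, so it works with generators whose total
    length is not known in advance.
    """
    pending: Optional[str] = None
    for piece in pieces:
        if pending is not None:
            yield build_message(pending)
        pending = piece
    if pending is not None:
        # End of input: force the server to synthesize what it has buffered.
        yield build_message(pending, flush=True)


for m in messages_for(["Hello there. ", "This is a short reply."]):
    print(m)
# {"text": "Hello there. "}
# {"text": "This is a short reply.", "flush": true}
```

The one-item lookahead holds back each piece until the next one arrives. If your text source can signal end of input directly, you could instead send each piece immediately and attach flush to the last message when that signal arrives.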