Stream text to our servers and receive generated speech in real time. This is well suited to latency-sensitive applications and to situations where you don't have all the text up front.
Parameters
The voice ID to use for speech generation, obtained from the 'List voices' API.
The desired output format of the audio.
Controls whether the server returns timestamps for the generated speech.
The desired sample rate of the output audio.
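The four parameters above can be gathered into a single structure before a session is opened. The following is a minimal sketch only: the field names (`voice`, `format`, `return_timestamps`, `sample_rate`), the example values, and the choice of JSON as the wire encoding are all assumptions made for illustration, since this reference lists descriptions but not names or accepted values.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SessionParams:
    # Every field name here is hypothetical; check the actual API schema.
    voice: str               # voice ID obtained from the 'List voices' API
    format: str              # desired output format of the audio
    return_timestamps: bool  # whether the server should return timestamps
    sample_rate: int         # desired output sample rate (unit assumed: Hz)


def build_init_payload(params: SessionParams) -> str:
    """Serialize the session parameters as a JSON string."""
    return json.dumps(asdict(params))


# Placeholder values, not documented defaults.
payload = build_init_payload(
    SessionParams(
        voice="example-voice-id",
        format="mp3",
        return_timestamps=True,
        sample_rate=24000,
    )
)
print(payload)
```

Keeping the parameters in one typed object makes it easy to adapt the sketch if the service expects them as URL query parameters rather than as a JSON message.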
Returns
Send
Send text to the server to append into the text stream.
Force the server to generate speech for all buffered text in the stream.
Drop the server's buffered text without generating speech for it.
Inform the server that you have finished appending text to this session and want the session to close once the server has finished dispatching speech.
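The four client messages above could be built with small helper functions like the ones below. This is a sketch under stated assumptions: the JSON shapes, the `type` values (`flush`, `reset`, `finish`), and the `text` and `nonce` field names are hypothetical. The only detail taken from this reference is that flush and reset commands carry a nonce that the server echoes back in its acknowledgement.

```python
import itertools
import json
from typing import Tuple

# Monotonic counter so every flush or reset gets a distinct nonce.
_nonces = itertools.count(1)


def append_text(text: str) -> str:
    """Build a message that appends text to the stream (hypothetical shape)."""
    return json.dumps({"text": text})


def flush() -> Tuple[str, int]:
    """Build a flush command; returns the message and the nonce it carries."""
    nonce = next(_nonces)
    return json.dumps({"type": "flush", "nonce": nonce}), nonce


def reset() -> Tuple[str, int]:
    """Build a reset command; returns the message and the nonce it carries."""
    nonce = next(_nonces)
    return json.dumps({"type": "reset", "nonce": nonce}), nonce


def finish() -> str:
    """Build the message telling the server no more text will be appended."""
    return json.dumps({"type": "finish"})


flush_message, flush_nonce = flush()
print(append_text("Hello there."))
print(flush_message)
print(finish())
```

Returning the nonce alongside the serialized message lets the caller remember which flush or reset is outstanding and match it against the server's acknowledgement later.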
Receive
First message sent by the server, confirming the session is established.
Binary audio data returned from the server.
Timestamps for the audio chunk that was just streamed, if timestamps were requested when the session was initialized.
Acknowledgement that a flush command has been completed.
The nonce matches the one carried by the original flush, allowing you to determine when that flush has completed.
Acknowledgement that a reset command has been completed.
The nonce matches the one carried by the original reset, allowing you to discard any in-flight speech that was generated before the reset.
Error envelope returned by the server. The connection closes immediately afterward.
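One way to handle the server messages above is a small dispatcher that treats binary frames as audio and text frames as JSON control messages. The sketch below is illustrative, not the service's client: the `type` values (`ready`, `timestamps`, `flushed`, `reset_done`, `error`) and the `nonce` and `message` field names are hypothetical. It shows how a matching nonce can be used to drop audio that was still in flight when a reset was sent.

```python
import json
from typing import List, Optional, Set, Union


class SessionError(Exception):
    """Raised when the server sends an error envelope."""


class Receiver:
    """Routes incoming frames; all message type names are assumptions."""

    def __init__(self) -> None:
        self.ready = False
        self.pending_flushes: Set[int] = set()
        self.pending_reset: Optional[int] = None
        self.timestamps: List[dict] = []

    def sent_flush(self, nonce: int) -> None:
        # Record a flush we sent so its acknowledgement can be matched.
        self.pending_flushes.add(nonce)

    def sent_reset(self, nonce: int) -> None:
        # From now until the matching ack, incoming audio is stale.
        self.pending_reset = nonce

    def handle(self, frame: Union[bytes, str]) -> Optional[bytes]:
        """Return audio bytes that should be played, or None otherwise."""
        if isinstance(frame, bytes):
            # Binary audio: discard it while a reset is still outstanding.
            return None if self.pending_reset is not None else frame
        message = json.loads(frame)
        kind = message.get("type")
        if kind == "ready":
            self.ready = True
        elif kind == "timestamps":
            if self.pending_reset is None:
                self.timestamps.append(message)
        elif kind == "flushed":
            self.pending_flushes.discard(message.get("nonce"))
        elif kind == "reset_done":
            if message.get("nonce") == self.pending_reset:
                self.pending_reset = None
        elif kind == "error":
            # The server closes the connection right after an error.
            raise SessionError(message.get("message", "unknown error"))
        return None
```

In a real client, `handle` would be called for each frame read from the WebSocket, with `sent_flush` and `sent_reset` called whenever the corresponding command is written to the socket.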