Stream text to our servers and receive generated speech in real time. This is well suited to latency-sensitive applications and to situations where you don't have all the text up front.
Init
First message sent to the server; it establishes the session and carries its configuration details.
Send
Append text to the session's text stream.
The text you send can be split at any point. For example, sending "This is a test of the emergency broadcast system" is semantically equivalent to sending "This is a test of the eme" and "rgency broadcast system" separately.
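Because the server treats everything you send as one continuous stream, you can forward text as soon as you have it. The sketch below models that stream as a buffer that concatenates chunks verbatim; `TextStreamModel` is a hypothetical illustration of the equivalence described above, not part of the API.

```python
from typing import List


class TextStreamModel:
    """Hypothetical model of the server-side text stream: sends are concatenated verbatim."""

    def __init__(self) -> None:
        self._parts: List[str] = []

    def send(self, text: str) -> None:
        # No separator is inserted, so chunk boundaries carry no meaning.
        self._parts.append(text)

    @property
    def text(self) -> str:
        return "".join(self._parts)


one_shot = TextStreamModel()
one_shot.send("This is a test of the emergency broadcast system")

split = TextStreamModel()
split.send("This is a test of the eme")
split.send("rgency broadcast system")
```

If this model matches the server's behavior, no spaces are added between chunks, so keep any whitespace attached to the chunks you send.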
Force the server to generate speech for all buffered text in the stream.
Once the server has finished streaming the audio for the flushed text, it replies with a flush_complete message carrying the same nonce as the flush.
Use flush with care. Our models take surrounding context into account when generating speech, so flushing the buffer at arbitrary points (for example, mid-sentence) can make the speech sound less natural.
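A client typically tags each flush with a unique nonce and waits for the flush_complete that echoes it. This sketch assumes JSON text frames shaped like `{"type": "flush", "nonce": ...}` and `{"type": "flush_complete", "nonce": ...}`; those field names and the `FlushTracker` helper are hypothetical, and the actual socket I/O is left out.

```python
import json
import uuid
from typing import Optional, Set


class FlushTracker:
    """Hypothetical helper that pairs flush requests with flush_complete replies."""

    def __init__(self) -> None:
        self.pending: Set[str] = set()  # nonces sent but not yet confirmed

    def make_flush(self) -> str:
        """Build a flush frame tagged with a fresh nonce (field names are assumptions)."""
        nonce = uuid.uuid4().hex
        self.pending.add(nonce)
        return json.dumps({"type": "flush", "nonce": nonce})

    def on_message(self, raw: str) -> Optional[str]:
        """Return the nonce of a flush that just completed, or None for other messages."""
        msg = json.loads(raw)
        nonce = msg.get("nonce")
        if msg.get("type") == "flush_complete" and nonce in self.pending:
            self.pending.discard(nonce)
            return nonce
        return None


# Usage: send the frame over your socket, then feed incoming text frames to on_message.
tracker = FlushTracker()
frame = tracker.make_flush()
sent_nonce = json.loads(frame)["nonce"]
completed = tracker.on_message(json.dumps({"type": "flush_complete", "nonce": sent_nonce}))
```

Keeping a set of pending nonces, rather than a single value, lets several flushes be outstanding at once.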
Drop the server's buffered text without generating speech for it.
The server replies with a reset_complete carrying the matching nonce once the buffer has been cleared.
You do not need to wait for reset_complete to begin sending more text.
Tell the server you're done appending text to this session and that it should close the session once it has finished sending all remaining speech.
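The send-side messages above imply an ordering: init comes first, text, flush, and reset may follow in any order, and the session ends when you signal that you are done. The sketch below enforces that ordering on the client, assuming a single init per session and no further messages after the final one. The kind labels ("init", "text", "flush", "reset", "close") and the `SendSession` class are hypothetical names for the messages described in this section.

```python
from typing import List


class SendSession:
    """Hypothetical client-side guard for the order of outgoing messages."""

    MID_SESSION = {"text", "flush", "reset"}

    def __init__(self) -> None:
        self.state = "new"  # new -> open -> closing
        self.sent: List[str] = []

    def send(self, kind: str) -> None:
        if self.state == "new" and kind != "init":
            raise RuntimeError("init must be the first message")
        if self.state == "closing":
            raise RuntimeError("no messages may follow close")
        if kind == "init":
            if self.state != "new":
                raise RuntimeError("init may only be sent once")  # assumption
            self.state = "open"
        elif kind == "close":
            self.state = "closing"
        elif kind not in self.MID_SESSION:
            raise ValueError(f"unknown message kind: {kind}")
        # A real client would write the frame to the socket here.
        self.sent.append(kind)
```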
Receive
First message sent by the server, confirming the session is established.
Binary audio data returned from the server.
Timestamps for the audio chunk that was just streamed. Sent only if timestamps were requested in the init message.
Acknowledgement that a flush command has been completed.
The nonce matches the one carried by the original flush, so you can tell which flush has completed.
Acknowledgement that a reset command has been completed.
The nonce matches the one carried by the original reset, so you can discard any speech that was still in flight from before the reset.
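One way to use the reset nonce on the receiving side is to drop audio frames that arrive after you send a reset but before the matching reset_complete. This sketch assumes the server delivers messages in order, so any audio received before reset_complete belongs to text sent before the reset. It also assumes JSON control frames with `type` and `nonce` fields; `ResetFilter` and those field names are hypothetical.

```python
import json
import uuid
from typing import Optional, Union


class ResetFilter:
    """Hypothetical helper that discards pre-reset audio until reset_complete arrives."""

    def __init__(self) -> None:
        self.awaiting: Optional[str] = None  # nonce of the outstanding reset, if any

    def make_reset(self) -> str:
        # A newer reset supersedes an older one that has not completed yet.
        self.awaiting = uuid.uuid4().hex
        return json.dumps({"type": "reset", "nonce": self.awaiting})

    def keep(self, frame: Union[bytes, str]) -> bool:
        """Return True if the frame should be passed on for playback or handling."""
        if isinstance(frame, bytes):
            # Audio: drop it while a reset is still outstanding.
            return self.awaiting is None
        msg = json.loads(frame)
        if msg.get("type") == "reset_complete" and msg.get("nonce") == self.awaiting:
            self.awaiting = None
        return True
```

Because text sent right after the reset does not need to wait for reset_complete, this filter only gates audio; control frames always pass through.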
Error envelope returned by the server. The server closes the connection immediately afterward.
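The receive-side messages can be handled by a single dispatcher that separates binary audio frames from JSON control frames. In this sketch the `type` values `flush_complete` and `reset_complete` come from this page, while `session_started`, `error`, and the JSON framing itself are hypothetical placeholders, as is the `ReceiveLog` class.

```python
import json
from typing import List, Union


class ReceiveLog:
    """Hypothetical dispatcher that sorts incoming frames by kind."""

    def __init__(self) -> None:
        self.audio = bytearray()
        self.events: List[str] = []
        self.closed = False

    def handle(self, frame: Union[bytes, str]) -> None:
        if self.closed:
            raise RuntimeError("connection already closed after an error")
        if isinstance(frame, bytes):
            self.audio.extend(frame)  # binary frames carry audio
            return
        msg = json.loads(frame)
        kind = msg.get("type")
        if kind == "error":
            # The server closes the connection right after an error envelope.
            self.closed = True
            raise RuntimeError(f"server error: {msg}")
        # Session confirmation, timestamps, flush_complete, reset_complete, ...
        self.events.append(kind)
```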