Stream text to our servers and receive synthesized speech in real time. This is well suited to latency-sensitive applications and to situations where you don't have all the text up front.
Init
The first message sent to the server. It establishes the session and carries its configuration details.
Send
The text you send can be split at any point.
For example, sending "This is a test of the emergency broadcast system" is semantically equivalent to sending "This is a test of the eme" and "rgency broadcast system" separately.
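As a minimal sketch of the splitting behaviour described above, the snippet below cuts a sentence at a fixed character count, ignoring word boundaries, and checks that the pieces concatenate back to the original. `chunk_text` is a hypothetical helper introduced only for illustration; how each piece is wrapped in a message and sent to the server is not shown here.

```python
def chunk_text(text: str, size: int) -> list[str]:
    """Split text into pieces of at most `size` characters (hypothetical helper).

    Word boundaries are deliberately ignored, since the text can be
    split at any point.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


sentence = "This is a test of the emergency broadcast system"
pieces = chunk_text(sentence, 25)

# The pieces, sent in order, carry exactly the same text as the whole sentence.
assert "".join(pieces) == sentence
```

With a size of 25, the sentence splits into the same two pieces used in the example above.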
If you requested extras in your initMessage, the buffer_empty field in the extra information tells you when the server has finished synthesizing all the text it has received.
Be careful when using flush. Our models are designed to factor in context when synthesizing audio. If you flush the buffer at arbitrary points, the resulting speech may sound less natural.
Inform the server you're done appending text to this session and want it to close when the server has finished dispatching audio.
Receive
Binary audio data returned from the server
Note that the extra data JSON is always sent before the audio chunk that it corresponds to. Take care to interpret incoming data correctly. Audio is sent as bytes and extra data is sent as a string.
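The ordering rule above can be sketched as a small routine that tells the two kinds of incoming data apart by type and attaches each extra-data JSON to the audio chunk that follows it. This is an illustration under assumptions, not a client implementation: `pair_frames` is a hypothetical name, the frames are assumed to arrive as Python `bytes` (audio) or `str` (extra data), and error messages are not handled because their format is not described here.

```python
import json
from typing import Iterable, Iterator, Optional, Union

Frame = Union[bytes, str]


def pair_frames(frames: Iterable[Frame]) -> Iterator[tuple[Optional[dict], bytes]]:
    """Yield (extra, audio) pairs from a mixed stream of frames (hypothetical helper).

    Assumption: a str frame is extra-data JSON and describes the next
    bytes frame, which is an audio chunk.
    """
    pending_extra: Optional[dict] = None
    for frame in frames:
        if isinstance(frame, str):
            # Extra data arrives as a string and precedes its audio chunk.
            pending_extra = json.loads(frame)
        else:
            # Audio arrives as bytes; pair it with the extra seen just before it.
            yield pending_extra, bytes(frame)
            pending_extra = None


incoming = ['{"buffer_empty": false}', b"\x00\x01", '{"buffer_empty": true}', b"\x02"]
paired = list(pair_frames(incoming))
```

If extras were not requested, every audio chunk is simply paired with `None`.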
Error message returned by the server