Stream text to our servers and receive synthesized speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront.
Parameters
The voice ID to use for synthesis, obtained from the 'List voices' API.
The desired output format of the audio.
The desired language, as a two-letter ISO 639-1 code. Defaults to automatic language detection.
Controls whether the server returns extra information about the synthesis.
The desired output audio sample rate.
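As a concrete illustration, the parameters above could be gathered into a single initialization payload. This is a minimal sketch only: the field names (`voice_id`, `output_format`, `language`, `extras`, `sample_rate`) and defaults are illustrative assumptions, not the exact wire format.

```python
import json

def build_init_message(voice_id, output_format="mp3", language=None,
                       extras=False, sample_rate=24000):
    """Assemble a hypothetical init message for a synthesis session."""
    msg = {
        "voice_id": voice_id,            # from the 'List voices' API
        "output_format": output_format,  # desired audio format
        "extras": extras,                # request extra synthesis info
        "sample_rate": sample_rate,      # desired output sample rate
    }
    if language is not None:
        msg["language"] = language       # two-letter ISO 639-1 code;
                                         # omit for auto detection
    return json.dumps(msg)
```

Omitting `language` leaves the server to auto-detect it, matching the default described above.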
Returns
Send
The text you send can be split at any point, even mid-word or mid-sentence.
If you requested extras in your initMessage, the buffer_empty field in the extra information notifies you when the server has finished synthesizing all the text it has received so far.
Reset the current text buffer.
Inform the server that you're done appending text to this session and that it should close once the server has finished dispatching audio.
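The send-side messages above can be sketched as small JSON frames. The `type` values and field names here are hypothetical placeholders, not the documented wire format; they only show how text appending, resetting, and session completion might be expressed.

```python
import json

def text_message(chunk):
    """Append a chunk of text; chunks may be split at any point."""
    return json.dumps({"type": "text", "text": chunk})

def reset_message():
    """Reset the current text buffer."""
    return json.dumps({"type": "reset"})

def done_message():
    """Signal that no more text will be appended to this session."""
    return json.dumps({"type": "done"})

# Text can be split arbitrarily, even mid-word:
parts = ["Hel", "lo, wor", "ld."]
frames = [text_message(p) for p in parts]
```

The server reassembles the chunks in order, so arbitrary split points are safe.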
Receive
Yielded for each binary audio frame received from the server.
Note that the extra-data JSON is always sent before the audio chunk it corresponds to. Take care to interpret incoming data correctly: audio is sent as bytes, while extra data is sent as a string.
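One way to interpret incoming frames correctly is to demultiplex on frame type: binary frames are audio, text frames are JSON carrying extra information, errors, or flush/reset acknowledgements. This is a sketch under assumed payload shapes (an `"error"` key for errors, a `"type": "ack"` marker for acknowledgements); the real field names may differ.

```python
import json

def handle_frame(frame, on_audio, on_extra, on_error, on_ack):
    """Dispatch one incoming frame to the appropriate handler."""
    if isinstance(frame, (bytes, bytearray)):
        on_audio(bytes(frame))            # binary frame: raw audio
        return
    payload = json.loads(frame)           # text frame: JSON string
    if "error" in payload:                # assumed error shape
        on_error(payload["error"])
    elif payload.get("type") == "ack":    # assumed flush/reset ack shape
        on_ack(payload)
    else:
        on_extra(payload)                 # extra info, e.g. buffer_empty
```

Because the extra-data JSON precedes its audio chunk, `on_extra` fires before the `on_audio` call it describes.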
Error message returned by the server
Acknowledgement for flush/reset commands. Yielded after the server finishes synthesizing audio for the corresponding command.