Create speech session

speech.sessions.create(**kwargs) -> SpeechSession
WSS
/v1/ai/speech/stream

Stream text to our servers and receive synthesized speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront.

Parameters

voice
str
required

The voice ID to use for synthesis, obtained from the 'List voices' API.

format
Optional[Literal["mp3", "pcm_s16le", "pcm_f32le", "ulaw", "webm"]]

The desired output format of the audio.

language
Optional[Literal["auto", "ar", "de", "en", "es", "fr", "hi", "id", "it", "ja", "ko", "nl", "pl", "pt", "ru", "sv", "th", "tr", "uk", "ur", "vi", "zh"]]

The desired language. Two letter ISO 639-1 code. Defaults to auto language detection.

return_extras
Optional[bool]

Controls whether the server returns extra information about the synthesis.

sample_rate
Optional[Literal[24000, 16000, 8000]]

The desired output audio sample rate.
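As a sketch, the parameters above might be assembled like this. The voice ID is a placeholder; only the parameter names and allowed values come from this reference, and the commented-out `create` call assumes a configured `speech` client:

```python
# Hypothetical parameter set for speech.sessions.create().
# Only "voice" is required; the rest are optional.
session_params = {
    "voice": "voice-id-from-list-voices",  # from the 'List voices' API
    "format": "mp3",        # mp3, pcm_s16le, pcm_f32le, ulaw, or webm
    "language": "en",       # two-letter ISO 639-1 code; defaults to "auto"
    "return_extras": True,  # ask the server for ExtrasMessage data
    "sample_rate": 24000,   # 24000, 16000, or 8000
}

# session = speech.sessions.create(**session_params)
```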

Returns

class SpeechSession: …

Send

append_text(text: str)
None

Append text to the session for synthesis. The text you send can be split at any point.

flush()
int

Flush the session's text buffer. If you requested extras in your initMessage, the server notifies you that it has finished synthesizing all the text it has received via the buffer_empty field in the extra information.

reset()
int

Reset the current text buffer.

finish()
None

Inform the server you're done appending text to this session and want it to close when the server has finished dispatching audio.
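The send-side flow above can be sketched against a stand-in session object that exposes the documented methods. The stub below is purely illustrative (the real SpeechSession talks to the server, and the meaning of the int returned by flush() is SDK-defined):

```python
class _StubSession:
    """Minimal stand-in for SpeechSession's send side (illustration only)."""

    def __init__(self):
        self.calls = []

    def append_text(self, text: str) -> None:
        # Text can be split at any point across calls.
        self.calls.append(("append_text", text))

    def flush(self) -> int:
        self.calls.append(("flush",))
        return 0  # the int return value's meaning is defined by the SDK

    def finish(self) -> None:
        self.calls.append(("finish",))


def send_all(session, chunks):
    for chunk in chunks:
        session.append_text(chunk)
    session.flush()   # ask the server to synthesize everything buffered so far
    session.finish()  # no more text; close once all audio is dispatched


session = _StubSession()
send_all(session, ["Hel", "lo, wor", "ld."])
```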

Receive

AudioMessage
AudioMessage

Yielded for each binary audio frame received from the server.

ExtrasMessage
ExtrasMessage

Note that the extra data JSON is always sent before the audio chunk it corresponds to. Take care to interpret incoming data correctly: audio is sent as bytes, while extra data is sent as a string.

ErrorMessage
ErrorMessage

Error message returned by the server.

CompleteMessage
CompleteMessage

Acknowledgement for flush/reset commands. Yielded after the server finishes synthesizing audio for the corresponding command.
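Putting the receive side together, a dispatch loop over the four message types might look like the sketch below. The dataclasses are local stand-ins, not the SDK's real classes; note how each ExtrasMessage (a JSON string) arrives before the binary audio frame it describes:

```python
import json
from dataclasses import dataclass


@dataclass
class AudioMessage:
    audio: bytes  # binary audio frame from the server


@dataclass
class ExtrasMessage:
    data: str  # JSON string; always precedes its audio chunk


@dataclass
class ErrorMessage:
    message: str


@dataclass
class CompleteMessage:  # acknowledges a flush/reset command
    pass


def collect_audio(messages):
    """Accumulate audio bytes, pairing each chunk with the extras sent before it."""
    audio = bytearray()
    pending_extras = None
    for msg in messages:
        if isinstance(msg, ExtrasMessage):
            pending_extras = json.loads(msg.data)  # e.g. {"buffer_empty": true}
        elif isinstance(msg, AudioMessage):
            audio.extend(msg.audio)  # pending_extras describes this chunk
            pending_extras = None
        elif isinstance(msg, ErrorMessage):
            raise RuntimeError(msg.message)
        elif isinstance(msg, CompleteMessage):
            break  # flush/reset acknowledged; synthesis for it is done
    return bytes(audio)


result = collect_audio([
    ExtrasMessage(data='{"buffer_empty": false}'),
    AudioMessage(audio=b"\x00\x01"),
    ExtrasMessage(data='{"buffer_empty": true}'),
    AudioMessage(audio=b"\x02\x03"),
    CompleteMessage(),
])
```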