Create speech session

speech.sessions.create(body: SpeechSessionParams): SpeechSession
WSS
/v1/ai/speech/stream

Stream text to our servers and receive synthesized speech in real-time. Great for latency-sensitive applications and situations where you don't have all the text upfront.

Parameters

voice
string
required

The voice ID to use for synthesis, obtained from the 'List voices' API.

format
'mp3' | 'pcm_s16le' | 'pcm_f32le' | 'ulaw' | 'webm'

The desired output format of the audio.

language
'auto' | 'ar' | 'de' | 'en' | 'es' | 'fr' | 'hi' | 'id' | 'it' | 'ja' | 'ko' | 'nl' | 'pl' | 'pt' | 'ru' | 'sv' | 'th' | 'tr' | 'uk' | 'ur' | 'vi' | 'zh'

The desired language, as a two-letter ISO 639-1 code. Defaults to automatic language detection.

return_extras
boolean

Whether the server returns extra information about the synthesis.

sample_rate
24000 | 16000 | 8000

The desired output audio sample rate, in Hz.
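Assembled into a request body, the parameters above might look like the sketch below. The interface mirrors the documented parameter list; the voice ID is a placeholder, and the exact `SpeechSessionParams` field shapes in the SDK may differ.

```typescript
// Illustrative shape of the session parameters documented above.
// Field names follow the parameter list; this is not the SDK's exact type.
interface SpeechSessionParams {
  voice: string; // required; obtain a real ID from the 'List voices' API
  format?: "mp3" | "pcm_s16le" | "pcm_f32le" | "ulaw" | "webm";
  language?: string; // two-letter ISO 639-1 code, or "auto"
  return_extras?: boolean;
  sample_rate?: 24000 | 16000 | 8000;
}

// Placeholder voice ID for illustration only.
const params: SpeechSessionParams = {
  voice: "voice-id-from-list-voices",
  format: "pcm_s16le",
  language: "en",
  return_extras: true,
  sample_rate: 24000,
};
```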

Returns

class SpeechSession: …

Send

appendText(text: string)
void

Append text to be synthesized. The text you send can be split at any point.

flush()
number

Flush the current text buffer. If you requested extras (return_extras), the buffer_empty field in the extra information notifies you when the server has finished synthesizing all the text it has received.

reset()
number

Reset the current text buffer.

finish()
void

Inform the server that you're done appending text to this session and that it should close once it has finished dispatching audio.
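The send side can be sketched as follows. `SendLike` is a hypothetical stand-in mirroring the methods above (a real SpeechSession would also carry the receive side); since appendText accepts splits at any point, the helper streams text in arbitrary fixed-size chunks.

```typescript
// Hypothetical subset of the session's send surface, for illustration.
type SendLike = {
  appendText(text: string): void;
  flush(): number;
  finish(): void;
};

// Stream text in fixed-size chunks. appendText accepts splits at any
// point, so chunk boundaries need not align with words or sentences.
function streamText(session: SendLike, text: string, chunkSize = 8): void {
  for (let i = 0; i < text.length; i += chunkSize) {
    session.appendText(text.slice(i, i + chunkSize));
  }
  session.flush(); // synthesize everything buffered so far
  session.finish(); // close once all audio has been dispatched
}
```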

Receive

AudioMessage
AudioMessage

Yielded for each binary audio frame received from the server.

ExtrasMessage
ExtrasMessage

The extra data JSON is always sent before the audio chunk it corresponds to. Take care to interpret incoming data correctly: audio is sent as bytes, while extra data is sent as a string.
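Because extras arrive as JSON strings and audio as raw bytes, a receive handler can dispatch on the payload type. The message shapes below are illustrative assumptions, not the SDK's exact types:

```typescript
// Sketch: route an incoming WebSocket payload. Extras arrive as a JSON
// string; audio arrives as binary. Shapes here are illustrative only.
type Incoming =
  | { kind: "extras"; data: Record<string, unknown> }
  | { kind: "audio"; data: Uint8Array };

function routeMessage(payload: string | ArrayBuffer): Incoming {
  if (typeof payload === "string") {
    // Extra data is sent as a JSON string, ahead of its audio chunk.
    return { kind: "extras", data: JSON.parse(payload) };
  }
  // Binary frames are audio.
  return { kind: "audio", data: new Uint8Array(payload) };
}
```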

ErrorMessage
ErrorMessage

Error message returned by the server.

CompleteMessage
CompleteMessage

Acknowledgement for flush/reset commands. Yielded after the server finishes synthesizing audio for the corresponding command.
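Putting the receive side together, the yielded messages can be consumed as a stream. The sketch below assumes the messages are exposed as an async iterable with a `type` discriminant; the SDK's actual iteration surface and message shapes may differ.

```typescript
// Illustrative message union covering the four documented message kinds.
type ReceivedMessage =
  | { type: "audio"; audio: Uint8Array }
  | { type: "extras"; extras: Record<string, unknown> }
  | { type: "error"; message: string }
  | { type: "complete" };

// Collect all audio frames from a message stream into a single buffer.
async function collectAudio(
  messages: AsyncIterable<ReceivedMessage>
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  for await (const msg of messages) {
    if (msg.type === "audio") chunks.push(msg.audio);
    else if (msg.type === "error") throw new Error(msg.message);
    // extras/complete messages carry bookkeeping, not audio
  }
  // Concatenate all audio frames in arrival order.
  const out = new Uint8Array(chunks.reduce((n, c) => n + c.length, 0));
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}
```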