Create speech session

WSS
/v1/ai/speech/stream

Stream text to our servers and receive synthesized speech in real time. This is well suited to latency-sensitive applications and to situations where you don't have all the text upfront.

Init

initMessage
object

The first message sent to the server. It establishes the session and carries its configuration details.

Send

textMessage
object

The text you send can be split at any point.

For example, sending "This is a test of the emergency broadcast system" is semantically equivalent to sending "This is a test of the eme" and "rgency broadcast system" separately.
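The splitting behavior can be illustrated with a minimal sketch. The helper `split_text` and the fragment size are hypothetical and not part of the API; the sketch only shows that fragments cut at arbitrary character offsets concatenate back to the original text, so each fragment could be sent as its own textMessage.

```python
def split_text(text: str, size: int) -> list[str]:
    """Cut text into fragments of at most `size` characters, ignoring word boundaries."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


sentence = "This is a test of the emergency broadcast system"

# A size of 25 happens to cut through the middle of "emergency".
fragments = split_text(sentence, 25)

# Joined in order, the fragments are identical to the original text.
assert "".join(fragments) == sentence
```

Because the split point does not matter, a client can forward text as soon as it becomes available (for example, token by token from an upstream generator) without waiting for word or sentence boundaries.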

flushCommand
object

If you requested extras in your initMessage, the buffer_empty field in the extra information tells you when the server has finished synthesizing all the text it currently holds.

Be careful when using flush. Our models are designed to factor in context when synthesizing audio, so flushing the buffer at arbitrary points may make your speech sound less natural.
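One way to avoid flushing at arbitrary points is to flush only where the accumulated text ends a sentence. The sketch below is a rough heuristic, not a recommendation from the API itself: `plan_actions` and the `"text"` / `"flush"` labels are hypothetical names for this example and do not represent the wire format of textMessage or flushCommand.

```python
SENTENCE_ENDINGS = (".", "!", "?")


def plan_actions(fragments):
    """Return ("text", fragment) actions, adding ("flush", None) at sentence ends.

    Hypothetical planning helper: it decides *when* a flush might be
    reasonable, and leaves the actual message encoding to the caller.
    """
    actions = []
    pending = ""  # text sent since the last planned flush
    for fragment in fragments:
        actions.append(("text", fragment))
        pending += fragment
        # Only plan a flush when the pending text ends with sentence punctuation.
        if pending.rstrip().endswith(SENTENCE_ENDINGS):
            actions.append(("flush", None))
            pending = ""
    return actions


plan = plan_actions(["Hello wor", "ld. How are", " you?"])
```

Here no flush is planned after the second fragment, because "Hello world. How are" stops mid-sentence. The heuristic is deliberately simple and will misfire on abbreviations such as "Dr."; a production client would likely need a more careful boundary check.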

eofCommand
object

Informs the server that you are done appending text to this session and that the session should close once the server has finished dispatching audio.

Receive

audio
binary

Binary audio data returned from the server.

extras
object

Note that the extra data JSON is always sent before the audio chunk that it corresponds to. Audio is sent as bytes and extra data is sent as a string, so check the type of each incoming message before interpreting it.
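The ordering and type rules above can be sketched as a small pairing routine. This is an illustration under assumptions: `pair_frames` is a hypothetical helper, it assumes your WebSocket client delivers text frames as `str` and binary frames as `bytes`, and it treats every text frame as extra data. Error messages are also objects, and this section does not describe how to tell them apart from extras, so a real client would need to handle them separately.

```python
import json


def pair_frames(frames):
    """Pair each audio chunk with the extras JSON that arrived just before it.

    Returns a list of (extras_or_None, audio_bytes) tuples.
    """
    pairs = []
    pending_extras = None
    for frame in frames:
        if isinstance(frame, str):
            # Text frame: extra data, sent ahead of the audio it describes.
            pending_extras = json.loads(frame)
        elif isinstance(frame, (bytes, bytearray)):
            # Binary frame: audio. Attach the most recent extras, if any.
            pairs.append((pending_extras, bytes(frame)))
            pending_extras = None
        else:
            raise TypeError(f"unexpected frame type: {type(frame).__name__}")
    return pairs


# Simulated incoming frames; the audio payloads are placeholder bytes.
incoming = ['{"buffer_empty": false}', b"\x00\x01", b"\x02", '{"buffer_empty": true}', b"\x03"]
paired = pair_frames(incoming)
```

An audio chunk with no preceding extras is paired with `None`, which also covers sessions where extras were not requested in the initMessage.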

error
object

Error message returned by the server.