WSS
/v1/ai/speech/streamSend
Receive
Receiving audio
Once the server has received enough text, it responds with chunks of 96 kbps mono MP3 audio sampled at 24 kHz. As more text is streamed to the server, it continues to send audio chunks. The audio chunks are sent as binary data.
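As an illustration of handling the binary chunks, here is a minimal sketch in Python. It assumes your WebSocket client hands you each incoming message as either bytes (a binary frame) or str (a text frame); whether the server sends any text frames on this connection is not stated here, so skipping them is a precaution. The helper names collect_audio and estimate_seconds are hypothetical, and the duration estimate assumes a constant bitrate with 1 kbps equal to 1000 bits per second.

```python
from typing import Iterable, Union

# 96 kbps comes from the description above; 1 kbps = 1000 bits/s is an assumption.
BITRATE_BITS_PER_SECOND = 96_000
BYTES_PER_SECOND = BITRATE_BITS_PER_SECOND // 8  # 12,000 bytes of MP3 per second


def collect_audio(messages: Iterable[Union[bytes, str]]) -> bytes:
    """Concatenate binary frames into one MP3 byte string, skipping text frames."""
    audio = bytearray()
    for message in messages:
        if isinstance(message, (bytes, bytearray)):
            audio.extend(message)
    return bytes(audio)


def estimate_seconds(audio: bytes) -> float:
    """Rough playback length, assuming a constant bitrate and no extra metadata."""
    return len(audio) / BYTES_PER_SECOND
```

Because the chunks are pieces of one MP3 stream, appending them in arrival order and writing the result to a .mp3 file should generally give playable audio.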
To produce the most natural-sounding speech, our API waits for roughly two full sentences to be sent before synthesizing audio. If you want to receive audio before this threshold is met, you can use the flush field to force the server to synthesize the text it has.
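A flush request might be built as in the sketch below. Only the field name flush comes from the text above; the JSON encoding, the boolean value, and the helper name make_flush_message are assumptions for illustration, so check the send reference for this endpoint for the exact message shape.

```python
import json


def make_flush_message() -> str:
    """Build a text message asking the server to synthesize the text it has.

    Assumption: the message is a JSON object and the flush field is a boolean.
    """
    return json.dumps({"flush": True})
```

You would send the returned string as a text frame over the open WebSocket connection, then keep reading binary frames until the audio for the buffered text arrives.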