Streaming example
We’re going to use ChatGPT in this example to showcase our streaming API. This API is designed for real-time applications and is ideal for use cases like chatbots, video game characters, and voice assistants.
Prerequisites
Make sure you’ve set up your environment as described in the Environment setup page. Since we’re using OpenAI, you’ll also need to have an OpenAI API key.
Overview
At a high level, the streaming API works like this:
1. Create a streaming connection with the synthesize_streaming method (Python, Node).
2. Send text to the server using appendText and concurrently read synthesized speech from the server (example shown below).
3. The server buffers the text and synthesizes speech when it has enough.
4. Repeat step 2 until you have no more text to send.
5. Call flush or finish to signal to the server that it should synthesize speech for all of the text it still has buffered.
6. Close the connection by calling close.
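The call order above can be sketched with a small in-memory stand-in. FakeStreamingConnection, its snake_case method names, and the MIN_CHARS threshold are all assumptions made for illustration; they are not the SDK's actual classes or buffering rule. For brevity the sketch reads the audio after sending everything, whereas a real client would read concurrently.

```python
import asyncio


class FakeStreamingConnection:
    """In-memory stand-in for a streaming connection (hypothetical, not the real SDK)."""

    MIN_CHARS = 20  # made-up threshold for "enough text"

    def __init__(self):
        self._buffer = ""
        self._audio = asyncio.Queue()
        self.closed = False

    async def append_text(self, text):
        # Buffer text; "synthesize" once enough has accumulated.
        self._buffer += text
        if len(self._buffer) >= self.MIN_CHARS:
            await self._synthesize()

    async def flush(self):
        # Force synthesis of whatever is still buffered.
        await self._synthesize()

    async def finish(self):
        # Like flush, then mark the end of the audio stream.
        await self._synthesize()
        await self._audio.put(None)

    async def close(self):
        self.closed = True

    async def _synthesize(self):
        if self._buffer:
            # Fake "audio": the buffered text encoded as bytes.
            await self._audio.put(self._buffer.encode("utf-8"))
            self._buffer = ""

    def __aiter__(self):
        return self

    async def __anext__(self):
        chunk = await self._audio.get()
        if chunk is None:
            raise StopAsyncIteration
        return chunk


async def run_lifecycle(pieces):
    conn = FakeStreamingConnection()          # step 1: open the connection
    for piece in pieces:                      # steps 2-4: send text piece by piece
        await conn.append_text(piece)
    await conn.finish()                       # step 5: synthesize what is left
    audio = [chunk async for chunk in conn]   # read the synthesized chunks
    await conn.close()                        # step 6: close the connection
    return conn, audio


if __name__ == "__main__":
    _, chunks = asyncio.run(run_lifecycle(["Hello, ", "this is a streaming ", "example."]))
    print(chunks)
```

In this toy model the last piece is shorter than the threshold, so it only becomes audio because finish is called before reading.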
Concurrent streaming
We’ll use two tasks to handle the streaming data: one to read from ChatGPT and write to LMNT, and another to read from LMNT and write to a file. Both of these tasks are asynchronous and run concurrently.
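A minimal sketch of the two-task layout, using asyncio.gather. Both FakeSpeechConnection and fake_chat_stream are hypothetical stand-ins (for the speech connection and for a streamed ChatGPT response, respectively), so the example runs without network access or API keys; the real objects come from the respective SDKs and may differ in shape.

```python
import asyncio
import os
import tempfile


class FakeSpeechConnection:
    """Hypothetical stand-in for the speech connection; echoes text back as bytes."""

    def __init__(self):
        self._audio = asyncio.Queue()

    async def append_text(self, text):
        await self._audio.put(text.encode("utf-8"))  # pretend this is audio

    async def finish(self):
        await self._audio.put(None)  # sentinel: no more audio

    def __aiter__(self):
        return self

    async def __anext__(self):
        chunk = await self._audio.get()
        if chunk is None:
            raise StopAsyncIteration
        return chunk


async def fake_chat_stream():
    """Hypothetical stand-in for a streamed chat completion."""
    for token in ["Stream", "ing ", "is ", "fun."]:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield token


async def writer_task(chat_stream, conn):
    # Task 1: read text from the chat stream and send it to the speech connection.
    async for token in chat_stream:
        await conn.append_text(token)
    await conn.finish()


async def reader_task(conn, path):
    # Task 2: read synthesized audio and write it to a file.
    with open(path, "wb") as f:
        async for chunk in conn:
            f.write(chunk)


async def main():
    conn = FakeSpeechConnection()
    path = os.path.join(tempfile.mkdtemp(), "output.bin")
    # Run both tasks concurrently; gather returns once both have finished.
    await asyncio.gather(writer_task(fake_chat_stream(), conn), reader_task(conn, path))
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Because the reader runs alongside the writer, audio can be written to disk while text is still arriving from the chat stream.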
Calling flush
The server will buffer text you send via appendText and will start synthesizing speech when enough text has been received. Text will be segmented on the server at appropriate split points to produce natural-sounding speech. flush is used to signal to the server that it should start synthesizing speech with the text it has received so far.
There are typically two reasons to call flush:
- You want to control when the server synthesizes speech.
- You have no additional text to send, and you want to signal to the server that it should synthesize speech with all the text it has buffered.
The second case is most common in chatbot applications, where you want to synthesize speech for the bot and then wait for more text to arrive after user input.
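To make the buffering behavior concrete, here is a toy model of it. The ToySpeechBuffer class and its split rule (emit each complete sentence, hold back the remainder) are assumptions chosen for illustration; the server's actual segmentation logic is not described here and is likely more sophisticated.

```python
import re


class ToySpeechBuffer:
    """Toy model of server-side buffering (assumed rule, not the real server logic)."""

    def __init__(self):
        self.pending = ""       # text received but not yet synthesized
        self.synthesized = []   # segments that have been "synthesized"

    def append_text(self, text):
        self.pending += text
        # Assumed split rule: emit every complete sentence, keep the remainder buffered.
        *complete, self.pending = re.split(r"(?<=[.!?])\s+", self.pending)
        self.synthesized.extend(complete)

    def flush(self):
        # Synthesize whatever is still buffered, complete sentence or not.
        if self.pending:
            self.synthesized.append(self.pending)
            self.pending = ""


if __name__ == "__main__":
    buf = ToySpeechBuffer()
    buf.append_text("Hello there. How ")
    buf.append_text("are you")
    print(buf.synthesized, repr(buf.pending))  # trailing text is still buffered
    buf.flush()
    print(buf.synthesized, repr(buf.pending))  # flush pushed it through
```

In this model, a reply that ends without a clear split point stays buffered until flush is called, which mirrors the chatbot case described above.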
Calling finish
The finish call is similar to flush, but it also signals to the server that it should close the connection after it has finished synthesizing speech. Calling finish is optional, and provides an elegant way to break out of the reader task’s loop.
Make sure you call either flush or finish at the end of your text stream to ensure the server synthesizes all the speech you expected.
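The difference between the two calls, as seen from the reader task, can be sketched as follows. FakeConnection is a hypothetical in-memory stand-in, and the 0.1 second timeout exists only so the demonstration terminates; neither reflects the real SDK.

```python
import asyncio


class FakeConnection:
    """Hypothetical stand-in; real connections are created by the SDK."""

    def __init__(self):
        self._pending = ""
        self._audio = asyncio.Queue()

    async def append_text(self, text):
        self._pending += text

    async def flush(self):
        # Synthesize buffered text, but keep the stream open for more.
        if self._pending:
            await self._audio.put(self._pending.encode("utf-8"))
            self._pending = ""

    async def finish(self):
        # Synthesize buffered text, then end the audio stream.
        await self.flush()
        await self._audio.put(None)

    def __aiter__(self):
        return self

    async def __anext__(self):
        chunk = await self._audio.get()
        if chunk is None:
            raise StopAsyncIteration
        return chunk


async def read_all(conn):
    # The reader loop: runs until the connection signals the end of the stream.
    return [chunk async for chunk in conn]


async def demo():
    # With finish: the reader loop ends on its own.
    conn = FakeConnection()
    await conn.append_text("Goodbye.")
    await conn.finish()
    finished = await read_all(conn)

    # With flush only: audio is produced, but the reader keeps waiting for more.
    conn = FakeConnection()
    await conn.append_text("Hello.")
    await conn.flush()
    try:
        await asyncio.wait_for(read_all(conn), timeout=0.1)
        still_waiting = False
    except asyncio.TimeoutError:
        still_waiting = True
    return finished, still_waiting


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With flush alone, the reader needs some other exit condition (for example, cancelling the task), which is why finish is the more convenient choice when no further text will follow.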