We’re going to use ChatGPT in this example to showcase our streaming API. This API is designed for real-time applications and is ideal for use cases like chatbots, video game characters, and voice assistants.

Prerequisites

Make sure you’ve set up your environment as described in the Environment setup page. Since we’re using OpenAI, you’ll also need to have an OpenAI API key.

Overview

At a high level, the streaming API works like this:

  1. Create a streaming connection with the synthesize_streaming method (Python, Node).
  2. Send text to the server using appendText and concurrently read synthesized speech from the server (example shown below).
  3. The server buffers the text and synthesizes speech when it has enough.
  4. Repeat step 2 until you have no more text to send.
  5. Call flush or finish to signal to the server that it should synthesize speech for all of the text it still has buffered.
  6. Close the connection by calling close.
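The steps above can be condensed into a short sketch. This is illustrative only: synthesize_stream is a hypothetical helper name invented here, the speech argument is assumed to be an lmnt.api.Speech instance like the one in the full example below, and the audio is read sequentially for clarity rather than concurrently.

```python
import asyncio

async def synthesize_stream(speech, text_chunks, voice='lily', path='output.mp3'):
  """Condensed walk-through of steps 1-6 (sequential for clarity;
  the concurrent version below is what you'd use in practice)."""
  # 1. Create the streaming connection.
  connection = await speech.synthesize_streaming(voice)
  # 2-4. Send text until there is no more.
  for chunk in text_chunks:
    await connection.append_text(chunk)
  # 5. Ask the server to synthesize everything still buffered.
  await connection.finish()
  # Read back the synthesized audio.
  with open(path, 'wb') as f:
    async for message in connection:
      f.write(message['audio'])
  # 6. Since `finish` was called, the server closes the connection
  #    when synthesis is done, so the loop above ends on its own.
```

In a real-time application you would not wait until all text has been sent before reading audio; the next section shows how to interleave the two with concurrent tasks.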

Concurrent streaming

We’ll use two tasks to handle the streaming data: one to read from ChatGPT and write to LMNT, and another to read from LMNT and write to a file. Both of these tasks are asynchronous and run concurrently.

import asyncio
from lmnt.api import Speech
from openai import AsyncOpenAI

DEFAULT_PROMPT = 'Read me the text of a short sci-fi story in the public domain.'
VOICE_ID = 'lily'

async def main():
  async with Speech() as speech:
    connection = await speech.synthesize_streaming(VOICE_ID)
    t1 = asyncio.create_task(reader_task(connection))
    t2 = asyncio.create_task(writer_task(connection))
    await asyncio.gather(t1, t2)


async def reader_task(connection):
  """Streams audio data from LMNT and writes it to `output.mp3`."""
  with open('output.mp3', 'wb') as f:
    async for message in connection:
      f.write(message['audio'])


async def writer_task(connection):
  """Streams text from ChatGPT to LMNT."""
  client = AsyncOpenAI()
  response = await client.chat.completions.create(
      model='gpt-3.5-turbo',
      messages=[{'role': 'user', 'content': DEFAULT_PROMPT}],
      stream=True)

  async for chunk in response:
    if (not chunk.choices or
        not chunk.choices[0].delta or
        not chunk.choices[0].delta.content):
      continue
    content = chunk.choices[0].delta.content
    await connection.append_text(content)
    print(content, end='', flush=True)

  # After `finish` is called, the server will close the connection
  # when it has finished synthesizing.
  await connection.finish()


asyncio.run(main())

Calling flush

The server will buffer text you send via appendText and will start synthesizing speech when enough text has been received. Text will be segmented on the server at appropriate split points to produce natural-sounding speech. flush is used to signal to the server that it should start synthesizing speech with the text it has received so far.

There are typically two reasons to call flush:

  1. You want to control when the server synthesizes speech.
  2. You have no additional text to send, and you want to signal to the server that it should synthesize speech with all the text it has buffered.

The second case is most common in chatbot applications, where you want to synthesize speech for the bot and then wait for more text to arrive after user input.
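To make the chatbot pattern concrete, here is a minimal sketch. FakeConnection is a stand-in stub invented for this example (it is not part of the LMNT SDK) so the control flow runs without a network connection; in a real application you would use the connection returned by synthesize_streaming instead.

```python
import asyncio

class FakeConnection:
  """Stand-in for an LMNT streaming connection; records calls only."""
  def __init__(self):
    self.calls = []
  async def append_text(self, text):
    self.calls.append(('append_text', text))
  async def flush(self):
    self.calls.append(('flush', None))
  async def finish(self):
    self.calls.append(('finish', None))

async def chatbot_turn(connection, bot_tokens):
  # Stream each token of the bot's reply to the server...
  for token in bot_tokens:
    await connection.append_text(token)
  # ...then flush so this reply is synthesized now, while the
  # connection stays open for the next user turn.
  await connection.flush()

async def main():
  connection = FakeConnection()
  await chatbot_turn(connection, ['Hello', ', how can I help?'])
  await chatbot_turn(connection, ['Sure', ', one moment.'])
  # No more turns: finish tells the server to synthesize any
  # remaining buffered text and close the connection afterwards.
  await connection.finish()
  return connection.calls

calls = asyncio.run(main())
print(calls[-1])  # ('finish', None)
```

Each turn ends with a flush, and only the very last call is a finish; that is the shape most chatbot integrations follow.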

Calling finish

The finish call is similar to flush, but it also signals to the server that it should close the connection after it has finished synthesizing speech. Calling finish is optional, and provides an elegant way to break out of the reader task’s loop.

Make sure you call either flush or finish at the end of your text stream to ensure the server synthesizes all the speech you expected.