Synchronizing timing
The API can also return timing information, which lets you synchronize speech with other modalities (e.g., text, video). It is available for both standard and streaming requests. Each word and punctuation sequence is associated with a start time and a duration, both in seconds, so you can determine when each piece of text is spoken and how long it lasts. This is useful for creating captions or subtitles, or for aligning speech with other media.
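As an illustration of the captioning use case, here is a minimal sketch that turns per-token timings into SRT cues. The TimedToken shape, its field names, and the assumption that punctuation tokens carry their own trailing whitespace are hypothetical and not taken from the API response format; adapt them to the actual response.

```python
from dataclasses import dataclass


@dataclass
class TimedToken:
    # Hypothetical shape: the real response field names may differ.
    text: str
    start: float     # seconds from the start of the audio
    duration: float  # seconds


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(tokens: list[TimedToken], max_chars: int = 40) -> str:
    """Group consecutive tokens into cues of at most max_chars characters."""
    cues: list[list[TimedToken]] = []
    current: list[TimedToken] = []
    for token in tokens:
        candidate = "".join(t.text for t in current) + token.text
        # Start a new cue when adding this token would overflow the line.
        if current and len(candidate.strip()) > max_chars:
            cues.append(current)
            current = []
        current.append(token)
    if current:
        cues.append(current)

    blocks = []
    for index, cue in enumerate(cues, start=1):
        start = cue[0].start
        # A cue ends when its last token finishes: start + duration.
        end = cue[-1].start + cue[-1].duration
        text = "".join(t.text for t in cue).strip()
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}"
        )
    return "\n\n".join(blocks) + "\n"
```

Calling to_srt with the tokens parsed from a response yields a string that can be written directly to a .srt file. The max_chars limit is an arbitrary choice for this sketch; a real captioning pipeline might also split on pauses or sentence boundaries.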
Standard example
This code sample uses the return_durations option to fetch timing information and print its contents.
Code:
Response:
Streaming example
The code below shows how to fetch timing information with a streaming request. It is a simplified example; in practice, you would set up separate reader and writer tasks so that text input and synthesis output are handled concurrently (see our Streaming example).
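The reader/writer pattern mentioned above can be sketched with asyncio as follows. FakeStream is a hypothetical stand-in for a streaming connection, not part of the API; its send, finish, and receive methods and the shape of each result are assumptions made only so the sketch runs on its own.

```python
import asyncio


class FakeStream:
    """Hypothetical stand-in for a streaming synthesis connection."""

    def __init__(self) -> None:
        self._results: asyncio.Queue = asyncio.Queue()

    async def send(self, text: str) -> None:
        # Pretend the server synthesizes each chunk immediately.
        await asyncio.sleep(0)
        await self._results.put({"text": text, "audio": b"\x00" * len(text)})

    async def finish(self) -> None:
        # None marks the end of the output stream.
        await self._results.put(None)

    async def receive(self):
        while (item := await self._results.get()) is not None:
            yield item


async def write_text(stream: FakeStream, chunks: list[str]) -> None:
    """Writer task: push text chunks, then signal end of input."""
    for chunk in chunks:
        await stream.send(chunk)
    await stream.finish()


async def read_results(stream: FakeStream) -> list[dict]:
    """Reader task: collect results as they arrive."""
    return [item async for item in stream.receive()]


async def run(chunks: list[str]) -> list[dict]:
    stream = FakeStream()
    # Run both tasks concurrently so reading never blocks writing.
    _, results = await asyncio.gather(
        write_text(stream, chunks), read_results(stream)
    )
    return results
```

With a real connection, the writer would send text as it becomes available and the reader would play or store audio and timing data as soon as each result arrives, for example via asyncio.run(run(["Hello, ", "world!"])).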
Input code:
The option to return timing information in streaming requests is called return_extras. This option name is different from standard requests, where it’s called return_durations.
Output response: