Speech

Generate speech

speech.generate(body: SpeechGenerateParams): Promise<Response>
POST /v1/ai/speech/bytes

Generates speech from text and streams the audio back as binary chunks as soon as each chunk is generated.

This is the recommended endpoint for most text-to-speech use cases. You can either stream the chunks for low-latency playback or collect all chunks to get the complete audio file.
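The two consumption patterns described above can be sketched as follows. This is a minimal illustration, assuming the `Response` returned by `speech.generate` is a standard Fetch API `Response` with a readable `body`. The helper names `concatChunks` and `collectAudio`, and the `onChunk` callback, are hypothetical and not part of the SDK.

```typescript
// Join already-received chunks into one contiguous buffer.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((n, c) => n + c.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}

// Read a streamed body chunk by chunk (assumes a Fetch-style Response).
// `onChunk` is a hypothetical hook for low-latency playback; omit it to
// simply collect the complete audio.
async function collectAudio(
  response: Response,
  onChunk?: (chunk: Uint8Array) => void,
): Promise<Uint8Array> {
  if (!response.body) {
    throw new Error("response has no body to stream");
  }
  const reader = response.body.getReader();
  const chunks: Uint8Array[] = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    if (value) {
      chunks.push(value);
      onChunk?.(value); // hand the chunk to a player as soon as it arrives
    }
  }
  return concatChunks(chunks);
}
```

With a configured client (not shown here), usage might look like `const audio = await collectAudio(await client.speech.generate(params))`, where `client` and `params` are placeholders. Passing an `onChunk` callback covers the streaming case, and the returned buffer covers the complete-file case.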

Generate speech with timestamps

speech.generateDetailed(body: SpeechGenerateDetailedParams): Promise<SpeechGenerateDetailedResponse>
POST /v1/ai/speech

Generates speech from text and returns a JSON object containing a base64-encoded audio string and, optionally, word-level timestamps. This endpoint waits for synthesis to finish before responding, so it is not ideal for latency-sensitive applications.
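Because the audio arrives as a base64 string rather than raw bytes, it needs decoding before it can be played or written to disk. The sketch below shows one way to do that with the standard `atob` global. The `DetailedSpeechLike` interface and its `audio` field name are assumptions for illustration, since the actual fields of `SpeechGenerateDetailedResponse` are not listed here.

```typescript
// Hypothetical shape: the real SpeechGenerateDetailedResponse fields
// are not shown in this reference, so `audio` is an assumed name.
interface DetailedSpeechLike {
  audio: string; // base64-encoded audio
}

// Decode a base64 string into raw bytes using the standard atob global.
function decodeBase64Audio(base64: string): Uint8Array {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Convenience wrapper over the assumed response shape.
function audioBytesFrom(result: DetailedSpeechLike): Uint8Array {
  return decodeBase64Audio(result.audio);
}
```

Decoding in one pass is reasonable here because the endpoint already returns the full audio in a single response, so there is no streaming benefit to preserve.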

Models


class DurationObject: …
class SpeechRequest: …
class StreamSpeechRequest: …