tts

Resemble AI text-to-speech service implementations.

class pipecat.services.resembleai.tts.ResembleAITTSSettings

Bases: TTSSettings

Settings for ResembleAITTSService.

class pipecat.services.resembleai.tts.ResembleAITTSService(*, api_key: str, voice_id: str | None = None, url: str = 'wss://websocket.cluster.resemble.ai/stream', precision: str | None = 'PCM_16', output_format: str | None = 'wav', sample_rate: int | None = 22050, settings: ResembleAITTSSettings | None = None, **kwargs)

Bases: WebsocketTTSService

Resemble AI TTS service with WebSocket streaming and word timestamps.

Provides text-to-speech using Resemble AI’s streaming WebSocket API. Supports word-level timestamps and manages audio contexts so that multiple simultaneous synthesis requests can be handled and interrupted cleanly.

Settings: alias of ResembleAITTSSettings

__init__(*, api_key: str, voice_id: str | None = None, url: str = 'wss://websocket.cluster.resemble.ai/stream', precision: str | None = 'PCM_16', output_format: str | None = 'wav', sample_rate: int | None = 22050, settings: ResembleAITTSSettings | None = None, **kwargs)

Initialize the Resemble AI TTS service.

Parameters:

api_key – Resemble AI API key for authentication.
voice_id – Voice UUID to use for synthesis. Deprecated since version 0.0.105: use settings=ResembleAITTSService.Settings(voice=...) instead.
url – WebSocket URL for Resemble AI TTS API.
precision – Audio sample encoding (PCM_32, PCM_24, PCM_16, or MULAW). Defaults to PCM_16.
output_format – Output audio format (wav or mp3). Defaults to wav.
sample_rate – Audio sample rate (8000, 16000, 22050, 32000, or 44100). Defaults to 22050.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to the parent service.
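
The parameter list above can be turned into a small pre-flight check. The following is a minimal sketch, not part of pipecat: build_resemble_kwargs and create_service are hypothetical helper names, and this page does not say whether the service validates these values itself. The allowed values are copied from the parameter descriptions, and the voice is passed through Settings(voice=...) as the deprecation note suggests.

```python
# Allowed values as listed in the parameter descriptions above.
ALLOWED_PRECISIONS = {"PCM_32", "PCM_24", "PCM_16", "MULAW"}
ALLOWED_OUTPUT_FORMATS = {"wav", "mp3"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 32000, 44100}


def build_resemble_kwargs(api_key, *, precision="PCM_16", output_format="wav", sample_rate=22050):
    """Hypothetical helper: validate options and return constructor kwargs."""
    if not api_key:
        raise ValueError("api_key is required")
    if precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")
    return {
        "api_key": api_key,
        "precision": precision,
        "output_format": output_format,
        "sample_rate": sample_rate,
    }


def create_service(api_key, voice, **options):
    """Hypothetical helper: construct the service from validated options."""
    # Imported lazily so the validation above can be used without pipecat installed.
    from pipecat.services.resembleai.tts import ResembleAITTSService

    kwargs = build_resemble_kwargs(api_key, **options)
    # voice_id is deprecated since 0.0.105, so the voice goes through Settings.
    settings = ResembleAITTSService.Settings(voice=voice)
    return ResembleAITTSService(settings=settings, **kwargs)
```

Calling create_service("my-api-key", "my-voice-uuid", sample_rate=16000) would then build the service with a 16 kHz sample rate; both string arguments here are placeholders.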

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as the Resemble AI service supports metrics generation.

async start(frame: StartFrame)

Start the Resemble AI TTS service.

Parameters:

frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)

Stop the Resemble AI TTS service.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame)

Cancel the Resemble AI TTS service.

Parameters:

frame – The cancel frame.

async on_audio_context_interrupted(context_id: str)

Stop metrics when the bot is interrupted.

async on_audio_context_completed(context_id: str)

Stop metrics after the Resemble AI context finishes playing.

No close message is needed: Resemble AI signals completion with an audio_end message (handled in _process_messages), after which the server-side context is already closed.
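
The completion behaviour described above can be illustrated with a small sketch. It is not pipecat's implementation: ContextTracker is a hypothetical class, and the JSON field names "type" and "context_id" are assumptions made for illustration, since the actual Resemble AI message schema is not given here. The only point it shows is that an audio_end message marks the context as finished locally, with nothing sent back to the server.

```python
import json


class ContextTracker:
    """Hypothetical bookkeeping for open synthesis contexts."""

    def __init__(self):
        self.open_contexts = set()
        self.completed = []

    def open(self, context_id):
        """Record that a synthesis context has been started."""
        self.open_contexts.add(context_id)

    def handle_message(self, raw):
        """Process one server message; return True if it completed a context."""
        msg = json.loads(raw)
        # ASSUMPTION: the "type" and "context_id" field names are illustrative only.
        if msg.get("type") != "audio_end":
            return False
        context_id = msg.get("context_id")
        if context_id not in self.open_contexts:
            return False
        # The server has already closed its side, so only local state is
        # updated and no close message is sent.
        self.open_contexts.discard(context_id)
        self.completed.append(context_id)
        return True
```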

async flush_audio(context_id: str | None = None)

Flush any pending audio and finalize the current context.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using Resemble AI’s streaming API.

Parameters:

text – The text to synthesize into speech.
context_id – Unique identifier for this TTS context.

Yields:

Frame – Audio frames containing the synthesized speech.
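
Because run_tts is typed as AsyncGenerator[Frame | None, None], a consumer has to iterate it with async for and tolerate None values. The sketch below shows that pattern using stand-ins only: FakeAudioFrame and fake_run_tts are hypothetical and merely mimic the documented signature, so the example runs without pipecat or a network connection.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, Optional


@dataclass
class FakeAudioFrame:
    """Stand-in for an audio frame (hypothetical, for illustration only)."""

    audio: bytes


async def fake_run_tts(text: str, context_id: str) -> AsyncGenerator[Optional[FakeAudioFrame], None]:
    """Stand-in generator with the same shape as run_tts."""
    # The return type allows None, so yield one to exercise the consumer.
    yield None
    for word in text.split():
        yield FakeAudioFrame(audio=word.encode("utf-8"))


async def collect_audio(text: str, context_id: str) -> bytes:
    """Consume the generator, skipping None and joining the audio payloads."""
    chunks = []
    async for frame in fake_run_tts(text, context_id):
        if frame is None:
            continue
        chunks.append(frame.audio)
    return b"".join(chunks)


result = asyncio.run(collect_audio("hello world", "ctx-1"))
```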