tts

Resemble AI text-to-speech service implementations.

class pipecat.services.resembleai.tts.ResembleAITTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for ResembleAITTSService.

class pipecat.services.resembleai.tts.ResembleAITTSService(*, api_key: str, voice_id: str | None = None, url: str = 'wss://websocket.cluster.resemble.ai/stream', precision: str | None = 'PCM_16', output_format: str | None = 'wav', sample_rate: int | None = 22050, settings: ResembleAITTSSettings | None = None, **kwargs)

Bases: WebsocketTTSService

Resemble AI TTS service with WebSocket streaming and word timestamps.

Provides text-to-speech using Resemble AI’s streaming WebSocket API. It supports word-level timestamps and uses audio contexts to keep multiple simultaneous synthesis requests separate and to handle interruptions cleanly.

Settings

alias of ResembleAITTSSettings

__init__(*, api_key: str, voice_id: str | None = None, url: str = 'wss://websocket.cluster.resemble.ai/stream', precision: str | None = 'PCM_16', output_format: str | None = 'wav', sample_rate: int | None = 22050, settings: ResembleAITTSSettings | None = None, **kwargs)

Initialize the Resemble AI TTS service.

Parameters:
  • api_key – Resemble AI API key for authentication.

  • voice_id – Voice UUID to use for synthesis.

    Deprecated since version 0.0.105: Use settings=ResembleAITTSService.Settings(voice=...) instead.

  • url – WebSocket URL for Resemble AI TTS API.

  • precision – Sample encoding: a PCM bit depth (PCM_32, PCM_24, or PCM_16) or MULAW. Defaults to PCM_16.

  • output_format – Audio format (wav or mp3). Defaults to wav.

  • sample_rate – Audio sample rate in Hz (8000, 16000, 22050, 32000, or 44100). Defaults to 22050.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to the parent service.
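The parameter constraints above can be checked before the service is constructed. The following is a minimal sketch, not part of pipecat: build_kwargs and resolve_voice are hypothetical helpers, the allowed values are copied from the parameter descriptions, and the precedence rule is simplified to a None check (the real settings class uses a _NotGiven sentinel, which is not modelled here).

```python
from typing import Any, Dict, Optional

# Allowed values copied from the parameter descriptions above.
PRECISIONS = {"PCM_32", "PCM_24", "PCM_16", "MULAW"}
OUTPUT_FORMATS = {"wav", "mp3"}
SAMPLE_RATES = {8000, 16000, 22050, 32000, 44100}


def resolve_voice(voice_id: Optional[str], settings_voice: Optional[str]) -> Optional[str]:
    """Hypothetical helper: a voice given in settings wins over the deprecated voice_id."""
    return settings_voice if settings_voice is not None else voice_id


def build_kwargs(
    api_key: str,
    *,
    precision: str = "PCM_16",
    output_format: str = "wav",
    sample_rate: int = 22050,
) -> Dict[str, Any]:
    """Hypothetical helper: validate audio options and return constructor keyword arguments."""
    if precision not in PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")
    return {
        "api_key": api_key,
        "precision": precision,
        "output_format": output_format,
        "sample_rate": sample_rate,
    }


def make_service(api_key: str, voice: str):
    """Construct the service the non-deprecated way (requires pipecat to be installed)."""
    from pipecat.services.resembleai.tts import ResembleAITTSService

    return ResembleAITTSService(
        settings=ResembleAITTSService.Settings(voice=voice),
        **build_kwargs(api_key),
    )
```

Validating up front keeps an unsupported sample rate or format from surfacing only after the WebSocket connection is opened. make_service imports pipecat lazily so the helpers can be used and tested without it.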

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as the Resemble AI service supports metrics generation.

async start(frame: StartFrame)

Start the Resemble AI TTS service.

Parameters:

frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)

Stop the Resemble AI TTS service.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame)

Cancel the Resemble AI TTS service.

Parameters:

frame – The cancel frame.

async on_audio_context_interrupted(context_id: str)

Stop metrics when the bot is interrupted.

async on_audio_context_completed(context_id: str)

Stop metrics after the Resemble AI context finishes playing.

No close message is needed: Resemble AI signals completion with an audio_end message (handled in _process_messages), after which the server-side context is already closed.
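The completion behaviour described above can be illustrated with a small bookkeeping sketch. Everything here is an assumption for illustration: ContextTracker is a hypothetical class, and the message dictionary shape ({"type": "audio_end"}) is invented. Only the audio_end message name comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ContextTracker:
    """Hypothetical stand-in for per-request audio context bookkeeping."""

    open_contexts: Set[str] = field(default_factory=set)
    close_messages_sent: List[str] = field(default_factory=list)

    def start(self, context_id: str) -> None:
        # A new synthesis request opens a context.
        self.open_contexts.add(context_id)

    def handle_message(self, context_id: str, message: Dict[str, str]) -> None:
        # Assumed message shape; audio_end means the server has closed the context.
        if message.get("type") == "audio_end":
            self.open_contexts.discard(context_id)

    def on_completed(self, context_id: str) -> None:
        # Only a context the server has not already closed would need a close message.
        if context_id in self.open_contexts:
            self.close_messages_sent.append(context_id)
            self.open_contexts.discard(context_id)


tracker = ContextTracker()
tracker.start("ctx-1")
tracker.handle_message("ctx-1", {"type": "audio_end"})
tracker.on_completed("ctx-1")
```

Because audio_end arrives before the completion callback, the context is already gone by the time on_completed runs and no close message is queued.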

async flush_audio(context_id: str | None = None)

Flush any pending audio and finalize the current context.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using Resemble AI’s streaming API.

Parameters:
  • text – The text to synthesize into speech.

  • context_id – Unique identifier for this TTS context.

Yields:

Frame – Audio frames containing the synthesized speech.
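Since run_tts is an async generator typed AsyncGenerator[Frame | None, None], a consumer has to iterate it with async for and tolerate None items. The sketch below shows that contract with a stand-in generator. fake_run_tts and collect_audio are hypothetical names, and plain bytes take the place of pipecat Frame objects. In a pipeline the base service normally drives run_tts itself, so this illustrates only the generator shape.

```python
import asyncio
from typing import AsyncGenerator, List, Optional


async def fake_run_tts(text: str, context_id: str) -> AsyncGenerator[Optional[bytes], None]:
    """Stand-in for run_tts: yields placeholder chunks, or None when nothing is ready."""
    yield None  # nothing to emit yet, for example while waiting on the server
    for word in text.split():
        yield word.encode("utf-8")  # placeholder bytes, not real audio


async def collect_audio(text: str, context_id: str) -> List[bytes]:
    """Iterate the generator and keep only the non-None items."""
    chunks: List[bytes] = []
    async for frame in fake_run_tts(text, context_id):
        if frame is None:
            continue  # the signature allows None, so skip it explicitly
        chunks.append(frame)
    return chunks


chunks = asyncio.run(collect_audio("hello world", "ctx-1"))
```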