tts
Resemble AI text-to-speech service implementations.
- class pipecat.services.resembleai.tts.ResembleAITTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>)[source]
Bases:
TTSSettings
Settings for ResembleAITTSService.
- class pipecat.services.resembleai.tts.ResembleAITTSService(*, api_key: str, voice_id: str | None = None, url: str = 'wss://websocket.cluster.resemble.ai/stream', precision: str | None = 'PCM_16', output_format: str | None = 'wav', sample_rate: int | None = 22050, settings: ResembleAITTSSettings | None = None, **kwargs)[source]
Bases:
WebsocketTTSService
Resemble AI TTS service with WebSocket streaming and word timestamps.
Provides text-to-speech using Resemble AI’s streaming WebSocket API. Supports word-level timestamps and audio context management for handling multiple simultaneous synthesis requests with proper interruption support.
- Settings
alias of
ResembleAITTSSettings
- __init__(*, api_key: str, voice_id: str | None = None, url: str = 'wss://websocket.cluster.resemble.ai/stream', precision: str | None = 'PCM_16', output_format: str | None = 'wav', sample_rate: int | None = 22050, settings: ResembleAITTSSettings | None = None, **kwargs)[source]
Initialize the Resemble AI TTS service.
- Parameters:
api_key – Resemble AI API key for authentication.
voice_id – Voice UUID to use for synthesis. Deprecated since version 0.0.105: Use settings=ResembleAITTSService.Settings(voice=...) instead.
url – WebSocket URL for Resemble AI TTS API.
precision – PCM bit depth (PCM_32, PCM_24, PCM_16, or MULAW).
output_format – Audio format (wav or mp3).
sample_rate – Audio sample rate (8000, 16000, 22050, 32000, or 44100). Defaults to 22050.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to the parent service.
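The parameter constraints above can be checked before the service is constructed. The following is a minimal sketch, not part of pipecat: `build_tts_kwargs` and `resolve_voice` are hypothetical helpers, and the allowed values are copied from the parameter descriptions above. `resolve_voice` only illustrates the documented precedence rule (a value given through settings wins over the deprecated `voice_id`); the library's own resolution logic may differ in detail.

```python
from typing import Any, Optional

# Allowed values as listed in the parameter descriptions above.
ALLOWED_PRECISIONS = {"PCM_32", "PCM_24", "PCM_16", "MULAW"}
ALLOWED_FORMATS = {"wav", "mp3"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 22050, 32000, 44100}


def build_tts_kwargs(
    api_key: str,
    precision: str = "PCM_16",
    output_format: str = "wav",
    sample_rate: int = 22050,
) -> dict[str, Any]:
    """Hypothetical helper: validate arguments and return constructor kwargs."""
    if precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate!r}")
    return {
        "api_key": api_key,
        "precision": precision,
        "output_format": output_format,
        "sample_rate": sample_rate,
    }


def resolve_voice(
    voice_id: Optional[str], settings_voice: Optional[str]
) -> Optional[str]:
    """Hypothetical illustration: the settings value takes precedence
    over the deprecated voice_id when both are provided."""
    return settings_voice if settings_voice is not None else voice_id
```

With pipecat installed, the validated dictionary could then be unpacked into the constructor together with the non-deprecated voice setting, for example `ResembleAITTSService(**kwargs, settings=ResembleAITTSService.Settings(voice="<voice-uuid>"))`.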
- can_generate_metrics() → bool[source]
Check if this service can generate processing metrics.
- Returns:
True, as Resemble AI service supports metrics generation.
- async start(frame: StartFrame)[source]
Start the Resemble AI TTS service.
- Parameters:
frame – The start frame containing initialization parameters.
- async stop(frame: EndFrame)[source]
Stop the Resemble AI TTS service.
- Parameters:
frame – The end frame.
- async cancel(frame: CancelFrame)[source]
Cancel the Resemble AI TTS service.
- Parameters:
frame – The cancel frame.
- async on_audio_context_interrupted(context_id: str)[source]
Stop metrics when the bot is interrupted.
- async on_audio_context_completed(context_id: str)[source]
Stop metrics after the Resemble AI context finishes playing.
No close message is needed: Resemble AI signals completion with an audio_end message (handled in _process_messages), after which the server-side context is already closed.
- async flush_audio(context_id: str | None = None)[source]
Flush any pending audio and finalize the current context.
- async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None][source]
Generate speech from text using Resemble AI’s streaming API.
- Parameters:
text – The text to synthesize into speech.
context_id – Unique identifier for this TTS context.
- Yields:
Frame – Audio frames containing the synthesized speech.
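The return type `AsyncGenerator[Frame | None, None]` means a consumer iterates with `async for` and should expect `None` among the yielded items. In a running pipeline the framework presumably drives `run_tts` itself, so the sketch below only illustrates the shape of that contract. `fake_run_tts` is a hypothetical stand-in that yields `bytes` in place of real `Frame` objects, and `collect_audio` is likewise not part of pipecat.

```python
import asyncio
from typing import AsyncGenerator, Optional


async def fake_run_tts(
    text: str, context_id: str
) -> AsyncGenerator[Optional[bytes], None]:
    """Hypothetical stand-in for run_tts: yields one fake chunk per word,
    then a None, mirroring the `Frame | None` item type."""
    for word in text.split():
        yield word.encode("utf-8")
    yield None


async def collect_audio(text: str, context_id: str) -> list[bytes]:
    """Consume the generator, skipping None items."""
    chunks: list[bytes] = []
    async for frame in fake_run_tts(text, context_id):
        if frame is None:
            continue  # nothing to play for this item
        chunks.append(frame)
    return chunks
```

Calling `asyncio.run(collect_audio("hello world", "ctx-1"))` runs the consumer to completion and returns the non-`None` chunks in order.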