tts
XTTS text-to-speech service implementation.
This module integrates with the Coqui XTTS streaming server, deployed locally with Docker, for text-to-speech synthesis.
- pipecat.services.xtts.tts.language_to_xtts_language(language: Language) → str | None
Convert a Language enum to XTTS language code.
- Parameters:
language – The Language enum value to convert.
- Returns:
The corresponding XTTS language code, or None if not supported.
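The conversion described above can be pictured as a table lookup that falls back to None. The following is a minimal sketch only: the `Language` enum is a trimmed stand-in for pipecat's own enum, and the table `_XTTS_CODES` and the function name `to_xtts_code` are hypothetical, not the module's actual mapping.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    """Stand-in for pipecat's Language enum (assumption, trimmed for the example)."""

    EN = "en"
    FR = "fr"
    ZH = "zh"
    SW = "sw"


# Hypothetical mapping; the real module defines its own table.
_XTTS_CODES = {
    Language.EN: "en",
    Language.FR: "fr",
    Language.ZH: "zh-cn",
}


def to_xtts_code(language: Language) -> Optional[str]:
    """Return the XTTS code for `language`, or None when it is not in the table."""
    return _XTTS_CODES.get(language)
```

Returning None rather than raising lets the caller decide how to handle an unsupported language, for example by falling back to a default.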
- class pipecat.services.xtts.tts.XTTSTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>)
Bases: TTSSettings
Settings for XTTSService.
- class pipecat.services.xtts.tts.XTTSService(*, voice_id: str | None = None, base_url: str, aiohttp_session: ClientSession, language: Language = Language.EN, sample_rate: int | None = None, settings: XTTSTTSSettings | None = None, **kwargs)
Bases: TTSService
Coqui XTTS text-to-speech service.
Provides text-to-speech synthesis using a locally running Coqui XTTS streaming server. Supports multiple languages and voice cloning through studio speakers configuration.
- Settings
alias of XTTSTTSSettings
- __init__(*, voice_id: str | None = None, base_url: str, aiohttp_session: ClientSession, language: Language = Language.EN, sample_rate: int | None = None, settings: XTTSTTSSettings | None = None, **kwargs)
Initialize the XTTS service.
- Parameters:
voice_id – ID of the voice/speaker to use for synthesis.
Deprecated since version 0.0.105: Use settings=XTTSService.Settings(voice=...) instead.
base_url – Base URL of the XTTS streaming server.
aiohttp_session – HTTP session for making requests to the server.
language – Language for synthesis. Defaults to English.
Deprecated since version 0.0.106: Use settings=XTTSService.Settings(language=...) instead.
sample_rate – Audio sample rate. If None, uses the default.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to parent TTSService.
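The precedence rule for `settings` versus the deprecated keyword arguments can be illustrated with a small stand-alone sketch. Everything here is an assumption made for illustration: `SketchSettings` is a reduced stand-in for XTTSTTSSettings, `NOT_GIVEN` mimics the `_NotGiven` sentinel seen in the signature, and `resolve` is a hypothetical helper, not the service's actual merge logic.

```python
from dataclasses import dataclass
from typing import Optional, Union


class _NotGiven:
    """Sentinel type marking a field the caller did not set."""


NOT_GIVEN = _NotGiven()


@dataclass
class SketchSettings:
    """Stand-in for XTTSTTSSettings (hypothetical, reduced to two fields)."""

    voice: Union[str, None, _NotGiven] = NOT_GIVEN
    language: Union[str, None, _NotGiven] = NOT_GIVEN


def resolve(
    voice_id: Optional[str] = None,
    language: str = "en",
    settings: Optional[SketchSettings] = None,
) -> SketchSettings:
    """Merge deprecated keyword arguments with a settings object."""
    # Start from the deprecated keyword arguments.
    resolved = SketchSettings(voice=voice_id, language=language)
    if settings is not None:
        # Every field explicitly given in settings overrides the deprecated value.
        if not isinstance(settings.voice, _NotGiven):
            resolved.voice = settings.voice
        if not isinstance(settings.language, _NotGiven):
            resolved.language = settings.language
    return resolved
```

With this sketch, `resolve(voice_id="old", settings=SketchSettings(voice="new"))` keeps the voice from `settings`, while the language, which `settings` left unset, still comes from the keyword argument.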
- can_generate_metrics() → bool
Check if this service can generate processing metrics.
- Returns:
True, as the XTTS service supports metrics generation.
- language_to_service_language(language: Language) → str | None
Convert a Language enum to XTTS service language format.
- Parameters:
language – The language to convert.
- Returns:
The XTTS-specific language code, or None if not supported.
- async start(frame: StartFrame)
Start the XTTS service and load studio speakers.
- Parameters:
frame – The start frame containing initialization parameters.
- async run_tts(text: str, context_id: str) → AsyncGenerator[Frame, None]
Generate speech from text using XTTS streaming server.
- Parameters:
text – The text to synthesize into speech.
context_id – The context ID for tracking audio frames.
- Yields:
Frame – Audio frames containing the synthesized speech.
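The general shape of a streaming `run_tts`, an async generator that turns an incoming byte stream into audio frames tagged with a context ID, might look like the sketch below. It is not the service's implementation: `AudioChunk`, `fake_stream`, and `stream_frames` are hypothetical names, the byte source stands in for the server's HTTP response body, and the 24000 Hz sample rate and 4-byte chunk size are assumptions chosen to keep the example small.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, AsyncIterator, List


@dataclass
class AudioChunk:
    """Stand-in for an audio frame (hypothetical)."""

    audio: bytes
    sample_rate: int
    context_id: str


async def fake_stream() -> AsyncIterator[bytes]:
    """Stand-in for the server's streamed response body."""
    for part in (b"\x00" * 5, b"\x01" * 6, b"\x02" * 3):
        yield part


async def stream_frames(
    source: AsyncIterator[bytes],
    context_id: str,
    sample_rate: int = 24000,  # assumed rate for the example
    chunk_size: int = 4,  # tiny on purpose; real chunks would be far larger
) -> AsyncGenerator[AudioChunk, None]:
    """Re-chunk an arbitrary byte stream into fixed-size audio frames."""
    buffer = bytearray()
    async for part in source:
        buffer.extend(part)
        # Emit as many full chunks as the buffer currently holds.
        while len(buffer) >= chunk_size:
            yield AudioChunk(bytes(buffer[:chunk_size]), sample_rate, context_id)
            del buffer[:chunk_size]
    # Flush whatever is left once the stream ends.
    if buffer:
        yield AudioChunk(bytes(buffer), sample_rate, context_id)


async def collect() -> List[AudioChunk]:
    """Drain the generator into a list, as a consumer of run_tts might."""
    return [frame async for frame in stream_frames(fake_stream(), "ctx-1")]
```

Buffering before yielding keeps frame sizes uniform regardless of how the network splits the response, and carrying `context_id` on every frame lets downstream consumers associate audio with the request that produced it.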