tts

Deepgram text-to-speech service implementation.

This module provides integration with Deepgram’s text-to-speech API for generating speech from text using various voice models.

class pipecat.services.deepgram.tts.DeepgramTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for DeepgramTTSService and DeepgramHttpTTSService.
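The `_NotGiven` defaults in the signature suggest that a field left unset is treated differently from a field explicitly set to None. The following is a minimal sketch of that idea, not Pipecat's implementation: `NOT_GIVEN`, `SketchTTSSettings`, and `apply_update` are hypothetical stand-ins, and the merge rule (only given fields overwrite) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


class _NotGivenType:
    """Hypothetical sentinel type: marks a field the caller did not set."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGivenType()


@dataclass
class SketchTTSSettings:
    """Stand-in for a settings object; not the real DeepgramTTSSettings."""

    model: Any = field(default_factory=lambda: NOT_GIVEN)
    voice: Any = field(default_factory=lambda: NOT_GIVEN)
    language: Any = field(default_factory=lambda: NOT_GIVEN)
    extra: Dict[str, Any] = field(default_factory=dict)


def apply_update(current: SketchTTSSettings, update: SketchTTSSettings) -> Dict[str, Any]:
    """Copy only the fields that were explicitly given; return what changed."""
    changed: Dict[str, Any] = {}
    for name in ("model", "voice", "language"):
        value = getattr(update, name)
        if value is not NOT_GIVEN:  # None counts as an explicit value
            setattr(current, name, value)
            changed[name] = value
    current.extra.update(update.extra)
    return changed
```

Under this assumption, an update that sets only `voice` leaves `model` and `language` untouched, and an explicit `voice=None` is still applied.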

class pipecat.services.deepgram.tts.DeepgramTTSService(*, api_key: str, voice: str | None = None, base_url: str = 'wss://api.deepgram.com', sample_rate: int | None = None, encoding: str = 'linear16', mip_opt_out: bool | None = None, settings: DeepgramTTSSettings | None = None, **kwargs)

Bases: WebsocketTTSService

Deepgram WebSocket-based text-to-speech service.

Provides real-time text-to-speech synthesis using Deepgram’s WebSocket API. Supports streaming audio generation with interruption handling via the Clear message for conversational AI use cases.

Settings

alias of DeepgramTTSSettings

SUPPORTED_ENCODINGS = ('linear16', 'mulaw', 'alaw')

__init__(*, api_key: str, voice: str | None = None, base_url: str = 'wss://api.deepgram.com', sample_rate: int | None = None, encoding: str = 'linear16', mip_opt_out: bool | None = None, settings: DeepgramTTSSettings | None = None, **kwargs)

Initialize the Deepgram WebSocket TTS service.

Parameters:
  • api_key – Deepgram API key for authentication.

  • voice

    Voice model to use for synthesis.

    Deprecated since version 0.0.105: Use settings=DeepgramTTSService.Settings(voice=...) instead.

  • base_url – WebSocket base URL for Deepgram API. Defaults to “wss://api.deepgram.com”.

  • sample_rate – Audio sample rate in Hz. If None, uses service default.

  • encoding – Audio encoding format. Defaults to “linear16”. Must be one of SUPPORTED_ENCODINGS.

  • mip_opt_out – Opt out of the Deepgram Model Improvement Program. See https://dpgr.am/deepgram-mip for pricing impacts before setting to True.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to the parent WebsocketTTSService class.

Raises:

ValueError – If encoding is not in SUPPORTED_ENCODINGS.
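The encoding check described above can be sketched as follows. This is an illustration only: `check_encoding` is a hypothetical helper, and the error message wording is an assumption. Only the tuple of supported values and the ValueError come from this page.

```python
SUPPORTED_ENCODINGS = ("linear16", "mulaw", "alaw")


def check_encoding(encoding: str = "linear16") -> str:
    """Return the encoding unchanged, or raise ValueError if it is unsupported."""
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(
            f"Unsupported encoding {encoding!r}; expected one of {SUPPORTED_ENCODINGS}"
        )
    return encoding
```

Validating at construction time means a typo such as 'linear-16' fails immediately rather than when the first audio is requested.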

can_generate_metrics() → bool

Check if the service can generate metrics.

Returns:

True, as the Deepgram WebSocket TTS service supports metrics generation.

async start(frame: StartFrame)

Start the Deepgram WebSocket TTS service.

Parameters:

frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)

Stop the Deepgram WebSocket TTS service.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame)

Cancel the Deepgram WebSocket TTS service.

Parameters:

frame – The cancel frame.

async on_audio_context_interrupted(context_id: str)

Send a Clear message to Deepgram when an audio context is interrupted.

The Clear message clears Deepgram’s internal text buffer and stops Deepgram from sending further audio, so that a new response can be generated.

Parameters:

context_id – The ID of the audio context that was interrupted.

async flush_audio(context_id: str | None = None)

Flush any pending audio synthesis by sending a Flush command.

Call this when the LLM finishes a complete response, to force Deepgram to generate audio from the text remaining in its internal buffer.
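To make the Clear and Flush behaviour concrete, here is a minimal sketch of the JSON control messages involved. The message shapes ("Speak" with a text field, "Flush", "Clear") reflect my understanding of Deepgram's streaming TTS protocol and are not confirmed by this page. The helper names are hypothetical and are not part of DeepgramTTSService.

```python
import json
from typing import Iterable, List


def speak_message(text: str) -> str:
    """Queue text in Deepgram's internal buffer (assumed message shape)."""
    return json.dumps({"type": "Speak", "text": text})


def flush_message() -> str:
    """Ask Deepgram to synthesize whatever text is buffered (assumed shape)."""
    return json.dumps({"type": "Flush"})


def clear_message() -> str:
    """Drop buffered text and stop audio after an interruption (assumed shape)."""
    return json.dumps({"type": "Clear"})


def messages_for_response(chunks: Iterable[str], interrupted: bool = False) -> List[str]:
    """Build the message sequence for one LLM response.

    A completed response ends with Flush; an interrupted one ends with Clear.
    """
    messages = [speak_message(chunk) for chunk in chunks]
    messages.append(clear_message() if interrupted else flush_message())
    return messages
```

In this sketch each string would be sent as one WebSocket text message. Whether the real service batches or reorders them is not stated here.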

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using Deepgram’s WebSocket TTS API.

Parameters:
  • text – The text to synthesize into speech.

  • context_id – The context ID for tracking audio frames.

Yields:

Frame – Audio frames containing the synthesized speech, plus start/stop frames.
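The async-generator shape of run_tts can be illustrated with a self-contained sketch. Everything here is a stand-in: `StartedFrame`, `AudioFrame`, `StoppedFrame`, and `fake_run_tts` are hypothetical and do not come from Pipecat. The start, audio, stop ordering and the 24000 Hz sample rate are assumptions, and no network call is made.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class StartedFrame:
    context_id: str


@dataclass
class AudioFrame:
    audio: bytes
    sample_rate: int
    context_id: str


@dataclass
class StoppedFrame:
    context_id: str


async def fake_run_tts(text: str, context_id: str) -> AsyncGenerator[object, None]:
    """Yield a start frame, one fake audio frame per word, then a stop frame."""
    yield StartedFrame(context_id)
    for word in text.split():
        # Stand-in for synthesized audio: one silent 16-bit sample per character.
        yield AudioFrame(audio=b"\x00\x00" * len(word), sample_rate=24000, context_id=context_id)
    yield StoppedFrame(context_id)


async def collect(text: str, context_id: str) -> List[object]:
    """Drain the generator the way a caller would, with `async for`."""
    return [frame async for frame in fake_run_tts(text, context_id)]
```

A caller could run `asyncio.run(collect("hello there", "ctx-1"))` to obtain the frames in order. Carrying the context_id on every frame is what lets a consumer discard audio that belongs to an interrupted context.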

class pipecat.services.deepgram.tts.DeepgramHttpTTSService(*, api_key: str, voice: str | None = None, aiohttp_session: ClientSession, base_url: str = 'https://api.deepgram.com', sample_rate: int | None = None, encoding: str = 'linear16', mip_opt_out: bool | None = None, settings: DeepgramTTSSettings | None = None, **kwargs)

Bases: TTSService

Deepgram HTTP text-to-speech service.

Provides text-to-speech synthesis using Deepgram’s HTTP TTS API. Supports various voice models and audio encoding formats with configurable sample rates.

Settings

alias of DeepgramTTSSettings

__init__(*, api_key: str, voice: str | None = None, aiohttp_session: ClientSession, base_url: str = 'https://api.deepgram.com', sample_rate: int | None = None, encoding: str = 'linear16', mip_opt_out: bool | None = None, settings: DeepgramTTSSettings | None = None, **kwargs)

Initialize the Deepgram HTTP TTS service.

Parameters:
  • api_key – Deepgram API key for authentication.

  • voice

    Voice model to use for synthesis.

    Deprecated since version 0.0.105: Use settings=DeepgramHttpTTSService.Settings(voice=...) instead.

  • aiohttp_session – Shared aiohttp session for HTTP requests with connection pooling.

  • base_url – Custom base URL for Deepgram API. Defaults to “https://api.deepgram.com”.

  • sample_rate – Audio sample rate in Hz. If None, uses service default.

  • encoding – Audio encoding format. Defaults to “linear16”.

  • mip_opt_out – Opt out of the Deepgram Model Improvement Program. See https://dpgr.am/deepgram-mip for pricing impacts before setting to True.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to parent TTSService class.
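The precedence rule for the deprecated voice parameter can be sketched as below. This is an assumption-laden illustration, not the library's code: `resolve_voice` and `DEFAULT_VOICE` are hypothetical, and the warning text and fallback behaviour are guesses. Only the rule that settings values take precedence comes from this page.

```python
import warnings
from typing import Optional

DEFAULT_VOICE = "example-default-voice"  # hypothetical placeholder, not a real model name


def resolve_voice(voice: Optional[str] = None, settings_voice: Optional[str] = None) -> str:
    """Pick the effective voice: settings first, then the deprecated argument."""
    if voice is not None:
        warnings.warn(
            "The 'voice' argument is deprecated; pass it via settings instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    if settings_voice is not None:
        return settings_voice  # settings take precedence over the deprecated argument
    if voice is not None:
        return voice
    return DEFAULT_VOICE
```

Resolving the value in one place keeps old call sites working while steering new code toward the settings object.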

can_generate_metrics() → bool

Check if the service can generate metrics.

Returns:

True, as the Deepgram HTTP TTS service supports metrics generation.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using Deepgram’s HTTP TTS API.

Parameters:
  • text – The text to synthesize into speech.

  • context_id – The context ID for tracking audio frames.

Yields:

Frame – Audio frames containing the synthesized speech, plus start/stop frames.
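For orientation, the following sketch shows how the constructor parameters might map onto a single HTTP request. The `/v1/speak` path, the query parameter names, and the `Token` authorization scheme reflect my understanding of Deepgram's REST API rather than anything stated on this page. `build_speak_request` is a hypothetical helper, the 24000 Hz default is an assumption, and sending mip_opt_out as a query parameter is also an assumption.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode


def build_speak_request(
    api_key: str,
    text: str,
    model: str,
    base_url: str = "https://api.deepgram.com",
    encoding: str = "linear16",
    sample_rate: int = 24000,
    mip_opt_out: Optional[bool] = None,
) -> Tuple[str, Dict[str, str], Dict[str, str]]:
    """Return (url, headers, json_body) for one synthesis request."""
    params = {"model": model, "encoding": encoding, "sample_rate": sample_rate}
    if mip_opt_out is not None:
        # Only include the flag when the caller set it explicitly.
        params["mip_opt_out"] = str(mip_opt_out).lower()
    url = f"{base_url.rstrip('/')}/v1/speak?{urlencode(params)}"
    headers = {
        "Authorization": f"Token {api_key}",
        "Content-Type": "application/json",
    }
    return url, headers, {"text": text}
```

The returned triple could then be passed to the shared aiohttp session, for example `session.post(url, headers=headers, json=body)`, with the response body read in chunks and wrapped in audio frames.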