tts

Deepgram text-to-speech service for AWS SageMaker.

This module provides a Pipecat TTS service that connects to Deepgram models deployed on AWS SageMaker endpoints. It uses HTTP/2 bidirectional streaming for low-latency, real-time speech synthesis, with support for interruptions and streaming audio output.

Bases: TTSSettings

Settings for DeepgramSageMakerTTSService.

class pipecat.services.deepgram.sagemaker.tts.DeepgramSageMakerTTSService(*, endpoint_name: str, region: str, voice: str | None = None, sample_rate: int | None = None, encoding: str = 'linear16', settings: DeepgramSageMakerTTSSettings | None = None, **kwargs)

Bases: TTSService

Deepgram text-to-speech service for AWS SageMaker.

Provides real-time speech synthesis using Deepgram models deployed on AWS SageMaker endpoints. It uses HTTP/2 bidirectional streaming for low-latency audio generation and supports interruptions via the Clear message.

Requirements:

AWS credentials configured (via environment variables, AWS CLI, or instance metadata)
A deployed SageMaker endpoint with a Deepgram TTS model: https://developers.deepgram.com/docs/deploy-amazon-sagemaker
pipecat-ai[sagemaker] installed

Example:

tts = DeepgramSageMakerTTSService(
    endpoint_name="my-deepgram-tts-endpoint",
    region="us-east-2",
    settings=DeepgramSageMakerTTSService.Settings(
        voice="aura-2-helena-en",
    )
)

Settings: alias of DeepgramSageMakerTTSSettings

__init__(*, endpoint_name: str, region: str, voice: str | None = None, sample_rate: int | None = None, encoding: str = 'linear16', settings: DeepgramSageMakerTTSSettings | None = None, **kwargs)

Initialize the Deepgram SageMaker TTS service.

Parameters:

endpoint_name – Name of the SageMaker endpoint with a Deepgram TTS model deployed (e.g., “my-deepgram-tts-endpoint”).
region – AWS region where the endpoint is deployed (e.g., “us-east-2”).
voice – Voice model to use for synthesis. Defaults to “aura-2-helena-en”. Deprecated since version 0.0.105: Use settings=DeepgramSageMakerTTSService.Settings(voice=...) instead.
sample_rate – Audio sample rate in Hz. If None, uses the value from StartFrame.
encoding – Audio encoding format. Defaults to “linear16”.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to the parent TTSService.
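
The precedence between the deprecated voice argument and settings can be illustrated with a small standalone sketch. It does not import Pipecat: SketchSettings and resolve_voice are hypothetical names that only mirror the documented behaviour (default voice, deprecated argument, settings taking precedence), not the library's actual implementation. Falling back to the deprecated argument when settings leaves voice unset is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_VOICE = "aura-2-helena-en"  # default voice documented above


@dataclass
class SketchSettings:
    """Hypothetical stand-in for DeepgramSageMakerTTSSettings."""

    voice: Optional[str] = None


def resolve_voice(
    voice: Optional[str] = None,
    settings: Optional[SketchSettings] = None,
) -> str:
    """Resolve the effective voice: settings > deprecated argument > default."""
    resolved = DEFAULT_VOICE
    if voice is not None:
        # Deprecated keyword argument overrides the default.
        resolved = voice
    if settings is not None and settings.voice is not None:
        # A value set in settings takes precedence over the deprecated argument.
        resolved = settings.voice
    return resolved
```

With this sketch, passing both voice="a" and settings=SketchSettings(voice="b") resolves to "b", matching the rule that settings values take precedence.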

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns: True, as the Deepgram SageMaker TTS service supports metrics generation.

async start(frame: StartFrame)

Start the Deepgram SageMaker TTS service.

Parameters: frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)

Stop the Deepgram SageMaker TTS service.

Parameters: frame – The end frame.

async cancel(frame: CancelFrame)

Cancel the Deepgram SageMaker TTS service.

Parameters: frame – The cancel frame.

async on_audio_context_interrupted(context_id: str)

Called when an audio context is cancelled due to an interruption.

Parameters: context_id – The ID of the audio context that was interrupted, or None if no context was active at the time.

async flush_audio(context_id: str | None = None)

Flush any pending audio synthesis by sending a Flush command.

This should be called when the LLM finishes a complete response to force generation of audio from Deepgram’s internal text buffer.
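
As a rough illustration of the control messages involved, the sketch below builds the JSON payloads for text, flush, and clear. It assumes the SageMaker deployment accepts the same message shapes as Deepgram's streaming TTS WebSocket API (Speak, Flush, Clear); how this service actually frames them on the HTTP/2 stream is not shown here, and all helper names are hypothetical.

```python
import json
from typing import Iterable, List


def speak_message(text: str) -> str:
    """Payload that appends text to the model's internal text buffer."""
    return json.dumps({"type": "Speak", "text": text})


def flush_message() -> str:
    """Payload that forces audio generation for the buffered text."""
    return json.dumps({"type": "Flush"})


def clear_message() -> str:
    """Payload that discards buffered text, e.g. after an interruption."""
    return json.dumps({"type": "Clear"})


def messages_for_response(chunks: Iterable[str]) -> List[str]:
    """Turn streamed LLM text chunks into Speak messages plus one final Flush.

    Whitespace-only chunks are skipped; no Flush is sent if nothing was spoken.
    """
    messages = [speak_message(chunk) for chunk in chunks if chunk.strip()]
    if messages:
        messages.append(flush_message())
    return messages
```

Sending a single Flush at the end of a complete response, rather than after every chunk, lets the model synthesize over longer spans of text.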

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using Deepgram TTS on SageMaker.

Parameters:

text – The text to synthesize into speech.
context_id – The context ID for tracking audio frames.

Yields:

Frame – TTSStartedFrame, then None (audio comes asynchronously via the response processor).
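
The yield pattern described above, where the generator emits a start marker and then None while audio arrives through a separate reader, can be sketched with plain asyncio. This is a toy model under stated assumptions, not the service's code: SketchTTS, StartedMarker, and the echoing "endpoint" are all hypothetical stand-ins.

```python
import asyncio
from typing import AsyncGenerator, List, Optional, Tuple


class StartedMarker:
    """Hypothetical stand-in for TTSStartedFrame."""

    def __init__(self, context_id: str) -> None:
        self.context_id = context_id


class SketchTTS:
    """Toy service: run_tts sends text; a separate task collects audio."""

    def __init__(self) -> None:
        self.audio_out: List[Tuple[str, bytes]] = []
        self._incoming: Optional[asyncio.Queue] = None

    async def start(self) -> asyncio.Task:
        # Create the queue inside the running loop, then start the reader.
        self._incoming = asyncio.Queue()
        return asyncio.create_task(self._response_processor())

    async def _response_processor(self) -> None:
        # Stands in for the background reader of the streaming response.
        while True:
            item = await self._incoming.get()
            if item is None:  # sentinel: stream closed
                break
            self.audio_out.append(item)

    async def _send_text(self, text: str, context_id: str) -> None:
        # Simulated endpoint: "synthesizes" by echoing the text as bytes.
        await self._incoming.put((context_id, text.encode("utf-8")))

    async def stop(self) -> None:
        await self._incoming.put(None)

    async def run_tts(
        self, text: str, context_id: str
    ) -> AsyncGenerator[Optional[StartedMarker], None]:
        yield StartedMarker(context_id)
        await self._send_text(text, context_id)
        yield None  # audio is not yielded here; the reader task delivers it


async def demo() -> Tuple[list, list]:
    tts = SketchTTS()
    reader = await tts.start()
    yielded = [item async for item in tts.run_tts("hello", "ctx-1")]
    await tts.stop()
    await reader
    return yielded, tts.audio_out
```

Running asyncio.run(demo()) returns the two yielded items (a StartedMarker and None) alongside the audio collected by the reader task, which shows that the generator itself never carries audio.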