stt

Deepgram speech-to-text service for AWS SageMaker.

This module provides a Pipecat STT service that connects to Deepgram models deployed on AWS SageMaker endpoints. It uses HTTP/2 bidirectional streaming for low-latency, real-time transcription and supports interim results, multiple languages, and other Deepgram features such as diarization, smart formatting, and redaction.

class pipecat.services.deepgram.sagemaker.stt.DeepgramSageMakerSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, language: Language | str | None | _NotGiven = <factory>, detect_entities: bool | _NotGiven = <factory>, diarize: bool | _NotGiven = <factory>, dictation: bool | _NotGiven = <factory>, endpointing: Any | _NotGiven = <factory>, interim_results: bool | _NotGiven = <factory>, keyterm: Any | _NotGiven = <factory>, keywords: Any | _NotGiven = <factory>, numerals: bool | _NotGiven = <factory>, profanity_filter: bool | _NotGiven = <factory>, punctuate: bool | _NotGiven = <factory>, redact: Any | _NotGiven = <factory>, replace: Any | _NotGiven = <factory>, search: Any | _NotGiven = <factory>, smart_format: bool | _NotGiven = <factory>, utterance_end_ms: int | None | _NotGiven = <factory>)

Bases: DeepgramSTTSettings

Settings for the Deepgram SageMaker STT service.

Inherits all fields from DeepgramSTTService.Settings.

class pipecat.services.deepgram.sagemaker.stt.DeepgramSageMakerSTTService(*, endpoint_name: str, region: str, encoding: str = 'linear16', channels: int = 1, multichannel: bool = False, sample_rate: int | None = None, mip_opt_out: bool | None = None, live_options: LiveOptions | None = None, settings: DeepgramSageMakerSTTSettings | None = None, ttfs_p99_latency: float | None = 0.35, **kwargs)

Bases: STTService

Deepgram speech-to-text service for AWS SageMaker.

Provides real-time speech recognition using Deepgram models deployed on AWS SageMaker endpoints. Uses HTTP/2 bidirectional streaming for low-latency transcription with support for interim results, speaker diarization, and multiple languages.

Requirements:

Example:

from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService

stt = DeepgramSageMakerSTTService(
    endpoint_name="my-deepgram-endpoint",
    region="us-east-2",
    settings=DeepgramSageMakerSTTService.Settings(
        model="nova-3",
        language="en",
        interim_results=True,
        punctuate=True,
    ),
)

Settings

alias of DeepgramSageMakerSTTSettings

__init__(*, endpoint_name: str, region: str, encoding: str = 'linear16', channels: int = 1, multichannel: bool = False, sample_rate: int | None = None, mip_opt_out: bool | None = None, live_options: LiveOptions | None = None, settings: DeepgramSageMakerSTTSettings | None = None, ttfs_p99_latency: float | None = 0.35, **kwargs)

Initialize the Deepgram SageMaker STT service.

Parameters:
  • endpoint_name – Name of the SageMaker endpoint with Deepgram model deployed (e.g., “my-deepgram-nova-3-endpoint”).

  • region – AWS region where the endpoint is deployed (e.g., “us-east-2”).

  • encoding – Audio encoding format. Defaults to “linear16”.

  • channels – Number of audio channels. Defaults to 1.

  • multichannel – Transcribe each audio channel independently. Defaults to False.

  • sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate.

  • mip_opt_out – Whether to opt out of the Deepgram Model Improvement Program.

  • live_options – Legacy configuration options.

    Deprecated since version 0.0.105: Use settings=DeepgramSageMakerSTTService.Settings(...) for runtime-updatable fields and direct init parameters for connection-level config.

  • settings – Runtime-updatable settings. When provided alongside live_options, settings values take precedence (applied after the live_options merge).

  • ttfs_p99_latency – Expected P99 latency, in seconds, from the end of speech to the final transcript. Defaults to 0.35; override it with a value measured for your deployment. See https://github.com/pipecat-ai/stt-benchmark

  • **kwargs – Additional arguments passed to the parent STTService.
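The precedence rule between settings and the deprecated live_options can be pictured as layered dictionary merging, where each later layer overrides the earlier ones. This is a minimal sketch assuming plain dicts and a base layer of built-in defaults; `merge_options` and the `NOT_GIVEN` sentinel are hypothetical and do not reproduce Pipecat's internal merge code.

```python
from __future__ import annotations

from typing import Any

NOT_GIVEN = object()  # hypothetical "field not set" sentinel


def merge_options(
    defaults: dict[str, Any],
    live_options: dict[str, Any] | None,
    settings: dict[str, Any] | None,
) -> dict[str, Any]:
    """Merge config layers; later layers win (defaults < live_options < settings)."""
    merged = dict(defaults)  # copy so the caller's defaults are not mutated
    if live_options:
        # Legacy layer: applied first, so anything in settings can override it.
        merged.update(live_options)
    if settings:
        # Only fields the caller actually set override the earlier layers.
        merged.update({k: v for k, v in settings.items() if v is not NOT_GIVEN})
    return merged
```

For example, if live_options sets `language="en"` and settings sets `language="es"`, the merged result uses `"es"`, while a settings field left as `NOT_GIVEN` keeps the value from live_options or the defaults.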

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as the Deepgram SageMaker service supports metrics generation.

async start(frame: StartFrame)

Start the Deepgram SageMaker STT service.

Parameters:

frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)

Stop the Deepgram SageMaker STT service.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame)

Cancel the Deepgram SageMaker STT service.

Parameters:

frame – The cancel frame.
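In Pipecat, an EndFrame signals an orderly shutdown, while a CancelFrame requests an immediate stop. A minimal asyncio sketch of how a streaming client might honor that difference, assuming a simple outgoing audio queue; `SketchStreamingClient` and its methods are hypothetical, only simulate network writes, and are not the service's real implementation.

```python
from __future__ import annotations

import asyncio
from typing import Optional


class SketchStreamingClient:
    """Hypothetical client showing graceful stop vs. immediate cancel."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue[Optional[bytes]] = asyncio.Queue()
        self._sender: Optional[asyncio.Task] = None
        self.sent: list[bytes] = []

    async def start(self) -> None:
        # Open the (simulated) stream and begin forwarding queued audio.
        self._sender = asyncio.create_task(self._send_loop())

    async def _send_loop(self) -> None:
        while True:
            chunk = await self._queue.get()
            if chunk is None:  # end-of-stream marker queued by stop()
                return
            await asyncio.sleep(0)  # stand-in for a network write
            self.sent.append(chunk)

    async def send(self, chunk: bytes) -> None:
        await self._queue.put(chunk)

    async def stop(self) -> None:
        # Graceful (EndFrame-like): flush everything already queued, then close.
        await self._queue.put(None)
        if self._sender:
            await self._sender

    async def cancel(self) -> None:
        # Immediate (CancelFrame-like): abandon queued audio and tear down.
        if self._sender:
            self._sender.cancel()
            try:
                await self._sender
            except asyncio.CancelledError:
                pass  # expected: the sender task was cancelled on purpose
```

Here stop() delivers every chunk queued before it was called, whereas cancel() may drop queued audio, trading completeness for a fast shutdown.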

async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None]

Send audio data to Deepgram for transcription.

Parameters:

audio – Raw audio bytes to transcribe.

Yields:

Frame – None. Transcription results are delivered through the HTTP/2 bidirectional stream callbacks rather than yielded by this method.
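Because run_stt only forwards audio and yields None, transcripts reach the pipeline through a separate receive path. The sketch below shows that split with asyncio. `FakeBidiStream`, its one-transcript-per-second behavior, and `linear16_chunk_bytes` are hypothetical illustrations, not the SageMaker or Pipecat API. The helper also shows the chunk-size arithmetic for linear16 audio: samples are 16-bit (2 bytes), so 20 ms of 16 kHz mono audio is 16000 × 1 × 2 × 0.02 = 640 bytes.

```python
from __future__ import annotations

import asyncio
from typing import AsyncGenerator, Callable, Optional


def linear16_chunk_bytes(sample_rate: int, channels: int, duration_ms: int) -> int:
    """Bytes in a chunk of 16-bit (2-byte) PCM audio."""
    return sample_rate * channels * 2 * duration_ms // 1000


class FakeBidiStream:
    """Hypothetical stand-in for an HTTP/2 bidirectional stream."""

    def __init__(self, on_transcript: Callable[[str], None]) -> None:
        self._on_transcript = on_transcript
        self._received = 0

    async def write(self, audio: bytes) -> None:
        # Pretend the server emits one transcript per 1 s of 16 kHz mono audio.
        self._received += len(audio)
        if self._received >= linear16_chunk_bytes(16000, 1, 1000):
            self._received = 0
            # Deliver the result later from the event loop, as a receive task would.
            asyncio.get_running_loop().call_soon(self._on_transcript, "hello")


async def run_stt(stream: FakeBidiStream, audio: bytes) -> AsyncGenerator[Optional[str], None]:
    # Forward audio upstream; results arrive via the callback, not from here.
    await stream.write(audio)
    yield None
```

Keeping sending and receiving on separate paths means the audio input is never blocked waiting for a transcript, which is what allows interim results to stream back while audio is still being sent.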

async process_frame(frame: Frame, direction: FrameDirection)

Process frames with Deepgram SageMaker-specific handling.

Parameters:
  • frame – The frame to process.

  • direction – The direction of frame processing.