stt

Deepgram Flux speech-to-text service for AWS SageMaker (HTTP/2 BiDi transport).

class pipecat.services.deepgram.flux.sagemaker.stt.DeepgramFluxSageMakerSTTSettings

Bases: DeepgramFluxSTTSettings

Settings for the Deepgram Flux SageMaker STT service.

Inherits all fields from DeepgramFluxSTTSettings.

class pipecat.services.deepgram.flux.sagemaker.stt.DeepgramFluxSageMakerSTTService(*, endpoint_name: str, region: str, encoding: str = 'linear16', sample_rate: int | None = None, mip_opt_out: bool | None = None, tag: list | None = None, should_interrupt: bool = True, settings: DeepgramFluxSageMakerSTTSettings | None = None, **kwargs)[source]

Bases: DeepgramFluxSTTBase

Deepgram Flux speech-to-text service for AWS SageMaker.

Provides real-time speech recognition using Deepgram Flux models deployed on AWS SageMaker endpoints. Audio and results travel over an HTTP/2 bidirectional stream for low-latency transcription, and Flux reports turn boundaries through its own events (StartOfTurn, EndOfTurn, EagerEndOfTurn, TurnResumed).

Unlike the Nova-based SageMaker STT service, Flux handles turn detection natively, so no external VAD is needed for turn boundaries. Use ExternalUserTurnStrategies in your pipeline.

Requirements:

AWS credentials configured (via environment variables, AWS CLI, or instance metadata)
A deployed SageMaker endpoint with a Deepgram Flux model

Event handlers available:

on_connected: Called when the SageMaker session is established
on_disconnected: Called when the session is closed
on_connection_error: Called on connection failure
on_start_of_turn: Deepgram Flux detected start of speech
on_end_of_turn: Deepgram Flux detected end of turn
on_eager_end_of_turn: Deepgram Flux predicted end of turn
on_turn_resumed: User resumed speaking after EagerEndOfTurn
on_update: Interim transcript update during a turn
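The turn events listed above arrive in a predictable order: a start of turn, zero or more eager end-of-turn predictions (each of which may be cancelled by a turn-resumed event), and finally a confirmed end of turn. The following is a minimal sketch of how application code might fold those events into a simple state. TurnTracker and its method names are hypothetical and are not part of pipecat. The arguments each handler receives are not documented on this page, so the sketch uses plain method calls rather than real handler signatures; on the actual service the handlers would presumably be registered through pipecat's usual event_handler decorator.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTracker:
    """Hypothetical helper that folds Flux turn events into a simple state."""

    state: str = "idle"
    speculative: bool = False
    completed: list[str] = field(default_factory=list)

    def start_of_turn(self) -> None:
        # Corresponds to on_start_of_turn: the user began speaking.
        self.state = "speaking"
        self.speculative = False

    def eager_end_of_turn(self) -> None:
        # Corresponds to on_eager_end_of_turn: the end of the turn is only
        # predicted, so any work started now is provisional.
        self.speculative = True

    def turn_resumed(self) -> None:
        # Corresponds to on_turn_resumed: the user kept talking, so
        # provisional work should be discarded.
        self.speculative = False

    def end_of_turn(self, transcript: str) -> None:
        # Corresponds to on_end_of_turn: the turn is confirmed as finished.
        self.state = "idle"
        self.speculative = False
        self.completed.append(transcript)


tracker = TurnTracker()
tracker.start_of_turn()
tracker.eager_end_of_turn()   # prediction, later cancelled
tracker.turn_resumed()
tracker.eager_end_of_turn()   # second prediction, confirmed below
tracker.end_of_turn("book a table for two")
print(tracker.state, tracker.completed)
```

Keeping the speculative flag separate from the turn state makes it easy to start downstream work early on an eager prediction and to throw it away if the user resumes.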

Example:

from pipecat.services.deepgram.flux.sagemaker.stt import DeepgramFluxSageMakerSTTService

stt = DeepgramFluxSageMakerSTTService(
    endpoint_name="my-deepgram-flux-endpoint",
    region="us-east-2",
    settings=DeepgramFluxSageMakerSTTService.Settings(
        model="flux-general-en",
        eot_threshold=0.7,
        eager_eot_threshold=0.5,
    ),
)

Settings: alias of DeepgramFluxSageMakerSTTSettings

__init__(*, endpoint_name: str, region: str, encoding: str = 'linear16', sample_rate: int | None = None, mip_opt_out: bool | None = None, tag: list | None = None, should_interrupt: bool = True, settings: DeepgramFluxSageMakerSTTSettings | None = None, **kwargs)[source]

Initialize the Deepgram Flux SageMaker STT service.

Parameters:

endpoint_name – Name of the SageMaker endpoint on which the Deepgram Flux model is deployed (e.g., “my-deepgram-flux-endpoint”).
region – AWS region where the endpoint is deployed (e.g., “us-east-2”).
encoding – Audio encoding format. Defaults to “linear16”.
sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate.
mip_opt_out – Opt out of the Deepgram Model Improvement Program.
tag – Tags to label requests for identification during usage reporting.
should_interrupt – Whether to interrupt the bot when Flux detects that the user is speaking. Defaults to True.
settings – Runtime-updatable settings.
**kwargs – Additional arguments passed to the parent STTService.
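Because endpoint_name and region are deployment-specific, it can be convenient to keep them out of the source code. The following is a minimal sketch that collects the constructor's keyword arguments from environment variables. The helper name flux_sagemaker_kwargs and the variable names FLUX_SAGEMAKER_ENDPOINT and FLUX_SAMPLE_RATE are assumptions made for illustration; AWS_REGION is a variable commonly set for AWS tooling, but nothing on this page says the service reads it by itself.

```python
import os
from typing import Any, Mapping, Optional


def flux_sagemaker_kwargs(env: Optional[Mapping[str, str]] = None) -> dict[str, Any]:
    """Build constructor kwargs from environment-style settings (hypothetical helper)."""
    env = os.environ if env is None else env
    kwargs: dict[str, Any] = {
        # Both values are required, so a missing variable raises KeyError.
        "endpoint_name": env["FLUX_SAGEMAKER_ENDPOINT"],  # assumed variable name
        "region": env["AWS_REGION"],
        "encoding": "linear16",        # documented default, stated explicitly
        "should_interrupt": True,      # documented default, stated explicitly
    }
    rate = env.get("FLUX_SAMPLE_RATE")  # assumed variable name
    if rate:
        # Omitting sample_rate leaves it as None, i.e. the pipeline sample rate.
        kwargs["sample_rate"] = int(rate)
    return kwargs


example = flux_sagemaker_kwargs(
    {"FLUX_SAGEMAKER_ENDPOINT": "my-deepgram-flux-endpoint", "AWS_REGION": "us-east-2"}
)
print(example)
```

With pipecat installed, the result could then be passed as DeepgramFluxSageMakerSTTService(**flux_sagemaker_kwargs()).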

async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None][source]

Send audio data to Deepgram Flux for transcription.

Parameters:

audio – Raw audio bytes to transcribe.

Yields:

Frame – None (transcription results arrive via BiDi stream callbacks).
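With the default “linear16” encoding, each sample is a 16-bit (2-byte) signed integer, so the size in bytes of an audio chunk follows directly from the sample rate and the chunk duration. The following is a minimal sketch of that arithmetic. The helper chunk_linear16 is hypothetical, the 80 ms chunk length is an arbitrary illustrative value, and mono audio is assumed; in a pipecat pipeline the audio normally arrives already framed, so this only illustrates the sizes involved.

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # linear16 is 16-bit signed PCM


def chunk_linear16(
    audio: bytes, sample_rate: int, chunk_ms: int = 80, channels: int = 1
) -> Iterator[bytes]:
    """Split raw linear16 PCM into chunks of roughly chunk_ms milliseconds."""
    samples_per_chunk = sample_rate * chunk_ms // 1000
    chunk_bytes = samples_per_chunk * BYTES_PER_SAMPLE * channels
    if chunk_bytes <= 0:
        raise ValueError("sample_rate and chunk_ms must give a non-empty chunk")
    for start in range(0, len(audio), chunk_bytes):
        # The final chunk may be shorter than chunk_bytes.
        yield audio[start:start + chunk_bytes]


# One second of silence at 16 kHz mono: 16000 samples * 2 bytes = 32000 bytes.
one_second = bytes(16000 * BYTES_PER_SAMPLE)
chunks = list(chunk_linear16(one_second, sample_rate=16000))
print(len(chunks), len(chunks[0]), len(chunks[-1]))  # prints "13 2560 1280"
```

At 16 kHz, an 80 ms chunk holds 1280 samples, or 2560 bytes, so one second of audio yields twelve full chunks plus one half-length chunk.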