stt

Deepgram Flux speech-to-text service for AWS SageMaker (HTTP/2 BiDi transport).

class pipecat.services.deepgram.flux.sagemaker.stt.DeepgramFluxSageMakerSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>, eager_eot_threshold: float | None | _NotGiven = <factory>, eot_threshold: float | None | _NotGiven = <factory>, eot_timeout_ms: int | None | _NotGiven = <factory>, keyterm: list | _NotGiven = <factory>, min_confidence: float | None | _NotGiven = <factory>, language_hints: list[Language] | None | _NotGiven = <factory>)

Bases: DeepgramFluxSTTSettings

Settings for the Deepgram Flux SageMaker STT service.

Inherits all fields from DeepgramFluxSTTSettings.

class pipecat.services.deepgram.flux.sagemaker.stt.DeepgramFluxSageMakerSTTService(*, endpoint_name: str, region: str, encoding: str = 'linear16', sample_rate: int | None = None, mip_opt_out: bool | None = None, tag: list | None = None, should_interrupt: bool = True, settings: DeepgramFluxSageMakerSTTSettings | None = None, **kwargs)

Bases: DeepgramFluxSTTBase

Deepgram Flux speech-to-text service for AWS SageMaker.

Provides real-time speech recognition using Deepgram Flux models deployed on AWS SageMaker endpoints. Uses HTTP/2 bidirectional streaming for low-latency transcription with advanced turn detection (StartOfTurn, EndOfTurn, EagerEndOfTurn, TurnResumed).

Unlike the Nova-based SageMaker STT service, Flux handles turn detection natively, so no external VAD is needed for turn boundaries. Use ExternalUserTurnStrategies in your pipeline.

Requirements:

  • AWS credentials configured (via environment variables, AWS CLI, or instance metadata)

  • A deployed SageMaker endpoint with a Deepgram Flux model

Event handlers available:

  • on_connected: Called when the SageMaker session is established

  • on_disconnected: Called when the session is closed

  • on_connection_error: Called on connection failure

  • on_start_of_turn: Deepgram Flux detected start of speech

  • on_end_of_turn: Deepgram Flux detected end of turn

  • on_eager_end_of_turn: Deepgram Flux predicted end of turn

  • on_turn_resumed: User resumed speaking after EagerEndOfTurn

  • on_update: Interim transcript update during a turn
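The turn events listed above imply an ordering, which the following minimal sketch models as a small state machine. It does not use pipecat. The `TurnState` enum, the `TRANSITIONS` table, and the `advance` and `replay` helpers are hypothetical names introduced here for illustration, and the transition order is an assumption inferred from the handler descriptions, not taken from the service implementation.

```python
from enum import Enum


class TurnState(Enum):
    IDLE = "idle"                # no active user turn
    SPEAKING = "speaking"        # user turn in progress
    EAGER_ENDED = "eager_ended"  # end of turn predicted, not yet confirmed


# Assumed transitions, keyed by (current state, Flux event name).
TRANSITIONS = {
    (TurnState.IDLE, "StartOfTurn"): TurnState.SPEAKING,
    (TurnState.SPEAKING, "EagerEndOfTurn"): TurnState.EAGER_ENDED,
    (TurnState.EAGER_ENDED, "TurnResumed"): TurnState.SPEAKING,
    (TurnState.SPEAKING, "EndOfTurn"): TurnState.IDLE,
    (TurnState.EAGER_ENDED, "EndOfTurn"): TurnState.IDLE,
}


def advance(state, event):
    """Return the next state; events with no listed transition leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def replay(events):
    """Feed a sequence of event names through the machine and record each state."""
    state = TurnState.IDLE
    history = []
    for event in events:
        state = advance(state, event)
        history.append(state)
    return history
```

In this sketch, interim updates do not change the turn state, and a TurnResumed event returns a predicted end of turn to the speaking state. This is the case in which work started speculatively on EagerEndOfTurn would typically be discarded.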

Example:

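from pipecat.services.deepgram.flux.sagemaker.stt import DeepgramFluxSageMakerSTTService

# The endpoint name and region are placeholders for your own deployment.
stt = DeepgramFluxSageMakerSTTService(
    endpoint_name="my-deepgram-flux-endpoint",
    region="us-east-2",
    settings=DeepgramFluxSageMakerSTTService.Settings(
        model="flux-general-en",
        eot_threshold=0.7,
        eager_eot_threshold=0.5,
    ),
)

Settings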

alias of DeepgramFluxSageMakerSTTSettings

__init__(*, endpoint_name: str, region: str, encoding: str = 'linear16', sample_rate: int | None = None, mip_opt_out: bool | None = None, tag: list | None = None, should_interrupt: bool = True, settings: DeepgramFluxSageMakerSTTSettings | None = None, **kwargs)

Initialize the Deepgram Flux SageMaker STT service.

Parameters:
  • endpoint_name – Name of the SageMaker endpoint with a Deepgram Flux model deployed (e.g., “my-deepgram-flux-endpoint”).

  • region – AWS region where the endpoint is deployed (e.g., “us-east-2”).

  • encoding – Audio encoding format. Defaults to “linear16”.

  • sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate.

  • mip_opt_out – Opt out of the Deepgram Model Improvement Program.

  • tag – Tags to label requests for identification during usage reporting.

  • should_interrupt – Whether to interrupt the bot when Flux detects that the user is speaking. Defaults to True.

  • settings – Runtime-updatable settings.

  • **kwargs – Additional arguments passed to the parent STTService.

async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None]

Send audio data to Deepgram Flux for transcription.

Parameters:

audio – Raw audio bytes to transcribe.

Yields:

Frame – None. Transcription results arrive through the BiDi stream callbacks, not through this generator.
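
With the default “linear16” encoding, the raw audio bytes are 16-bit signed little-endian PCM, so each sample takes two bytes. The sketch below shows how a chunk size follows from the sample rate and how float samples map to that byte layout. The helpers `chunk_size_bytes` and `floats_to_linear16` are hypothetical and not part of pipecat, and mono audio is an assumption.

```python
import struct

BYTES_PER_SAMPLE = 2  # linear16: 16-bit signed PCM


def chunk_size_bytes(sample_rate, chunk_ms, channels=1):
    """Number of bytes in a chunk of the given duration (mono assumed by default)."""
    samples_per_channel = sample_rate * chunk_ms // 1000
    return samples_per_channel * channels * BYTES_PER_SAMPLE


def floats_to_linear16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack("<%dh" % len(ints), *ints)
```

For example, a 20 ms mono chunk at 16000 Hz holds 320 samples, so it takes 640 bytes.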