stt
Deepgram speech-to-text service for AWS SageMaker.
This module provides a Pipecat STT service that connects to Deepgram models deployed on AWS SageMaker endpoints. It uses HTTP/2 bidirectional streaming for low-latency, real-time transcription and supports interim results, multiple languages, and other Deepgram features.
- class pipecat.services.deepgram.sagemaker.stt.DeepgramSageMakerSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, language: Language | str | None | _NotGiven = <factory>, detect_entities: bool | _NotGiven = <factory>, diarize: bool | _NotGiven = <factory>, dictation: bool | _NotGiven = <factory>, endpointing: Any | _NotGiven = <factory>, interim_results: bool | _NotGiven = <factory>, keyterm: Any | _NotGiven = <factory>, keywords: Any | _NotGiven = <factory>, numerals: bool | _NotGiven = <factory>, profanity_filter: bool | _NotGiven = <factory>, punctuate: bool | _NotGiven = <factory>, redact: Any | _NotGiven = <factory>, replace: Any | _NotGiven = <factory>, search: Any | _NotGiven = <factory>, smart_format: bool | _NotGiven = <factory>, utterance_end_ms: int | None | _NotGiven = <factory>)[source]
Bases: DeepgramSTTSettings
Settings for the Deepgram SageMaker STT service.
Inherits all fields from DeepgramSTTService.Settings.
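Every field in the signature above accepts a `_NotGiven` value and defaults to a factory. A common reason for this pattern is to tell "the caller never set this field" apart from "the caller explicitly set it to None". The following is a minimal, standard-library sketch of that idea. The names `SketchSettings`, `NOT_GIVEN`, and `given_fields` are hypothetical and written for illustration; they are not Pipecat's implementation.

```python
from dataclasses import dataclass, field, fields
from typing import Any


class _NotGiven:
    """Hypothetical sentinel type marking a field the caller never set."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class SketchSettings:
    """Illustrative stand-in for a settings class (not the Pipecat class)."""

    model: Any = field(default_factory=lambda: NOT_GIVEN)
    language: Any = field(default_factory=lambda: NOT_GIVEN)
    utterance_end_ms: Any = field(default_factory=lambda: NOT_GIVEN)

    def given_fields(self) -> dict[str, Any]:
        """Return only the fields the caller set, including explicit None."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if not isinstance(getattr(self, f.name), _NotGiven)
        }


# language is left unset; utterance_end_ms is explicitly set to None.
settings = SketchSettings(model="nova-3", utterance_end_ms=None)
print(settings.given_fields())
```

In this sketch an explicit `None` survives as a real value, while untouched fields are dropped, so a partial settings object can be applied without clobbering values it never mentioned.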
- class pipecat.services.deepgram.sagemaker.stt.DeepgramSageMakerSTTService(*, endpoint_name: str, region: str, encoding: str = 'linear16', channels: int = 1, multichannel: bool = False, sample_rate: int | None = None, mip_opt_out: bool | None = None, live_options: LiveOptions | None = None, settings: DeepgramSageMakerSTTSettings | None = None, ttfs_p99_latency: float | None = 0.35, **kwargs)[source]
Bases: STTService
Deepgram speech-to-text service for AWS SageMaker.
Provides real-time speech recognition using Deepgram models deployed on AWS SageMaker endpoints. It uses HTTP/2 bidirectional streaming for low-latency transcription and supports interim results, speaker diarization, and multiple languages.
Requirements:
AWS credentials configured (via environment variables, AWS CLI, or instance metadata)
A deployed SageMaker endpoint with Deepgram model: https://developers.deepgram.com/docs/deploy-amazon-sagemaker
Example:
    from pipecat.services.deepgram.sagemaker.stt import DeepgramSageMakerSTTService

    stt = DeepgramSageMakerSTTService(
        endpoint_name="my-deepgram-endpoint",
        region="us-east-2",
        settings=DeepgramSageMakerSTTService.Settings(
            model="nova-3",
            language="en",
            interim_results=True,
            punctuate=True,
        ),
    )
- Settings
alias of DeepgramSageMakerSTTSettings
- __init__(*, endpoint_name: str, region: str, encoding: str = 'linear16', channels: int = 1, multichannel: bool = False, sample_rate: int | None = None, mip_opt_out: bool | None = None, live_options: LiveOptions | None = None, settings: DeepgramSageMakerSTTSettings | None = None, ttfs_p99_latency: float | None = 0.35, **kwargs)[source]
Initialize the Deepgram SageMaker STT service.
- Parameters:
endpoint_name – Name of the SageMaker endpoint with Deepgram model deployed (e.g., “my-deepgram-nova-3-endpoint”).
region – AWS region where the endpoint is deployed (e.g., “us-east-2”).
encoding – Audio encoding format. Defaults to “linear16”.
channels – Number of audio channels. Defaults to 1.
multichannel – Transcribe each audio channel independently. Defaults to False.
sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate.
mip_opt_out – Opt out of the Deepgram model improvement program.
live_options – Legacy configuration options.
Deprecated since version 0.0.105: Use settings=DeepgramSageMakerSTTService.Settings(...) for runtime-updatable fields and direct init parameters for connection-level config.
settings – Runtime-updatable settings. When provided alongside live_options, settings values take precedence (applied after the live_options merge).
ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment. See https://github.com/pipecat-ai/stt-benchmark
**kwargs – Additional arguments passed to the parent STTService.
- can_generate_metrics() bool[source]
Check if this service can generate processing metrics.
- Returns:
True, as Deepgram SageMaker service supports metrics generation.
- async start(frame: StartFrame)[source]
Start the Deepgram SageMaker STT service.
- Parameters:
frame – The start frame containing initialization parameters.
- async stop(frame: EndFrame)[source]
Stop the Deepgram SageMaker STT service.
- Parameters:
frame – The end frame.
- async cancel(frame: CancelFrame)[source]
Cancel the Deepgram SageMaker STT service.
- Parameters:
frame – The cancel frame.
- async run_stt(audio: bytes) AsyncGenerator[Frame | None, None][source]
Send audio data to Deepgram for transcription.
- Parameters:
audio – Raw audio bytes to transcribe.
- Yields:
Frame – None. Transcription results arrive through bidirectional (BiDi) stream callbacks rather than from this generator.
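The shape described for run_stt, where the generator only forwards audio and yields None while transcripts arrive on a separate path, can be sketched with standard-library asyncio. Everything in the example is a stand-in: `SketchStreamingSTT`, `receive_loop`, and the in-memory queue are hypothetical and simulate the stream, they do not reflect the service's internals or the SageMaker API.

```python
import asyncio
from typing import AsyncGenerator, Callable


class SketchStreamingSTT:
    """Illustrative stand-in for a streaming STT service (not Pipecat's)."""

    def __init__(self, on_transcript: Callable[[str], None]) -> None:
        self._on_transcript = on_transcript
        # In-memory queue standing in for the outgoing half of the stream.
        self._outgoing: asyncio.Queue = asyncio.Queue()

    async def run_stt(self, audio: bytes) -> AsyncGenerator[None, None]:
        """Forward audio to the stream; no transcript is yielded here."""
        await self._outgoing.put(audio)
        yield None

    async def receive_loop(self) -> None:
        """Simulated response handler that reports results via callback."""
        while True:
            chunk = await self._outgoing.get()
            if chunk is None:  # close marker
                break
            self._on_transcript(f"transcript for {len(chunk)} bytes")

    async def close(self) -> None:
        """Signal the receive loop to finish."""
        await self._outgoing.put(None)


async def main() -> list[str]:
    received: list[str] = []
    stt = SketchStreamingSTT(on_transcript=received.append)
    receiver = asyncio.create_task(stt.receive_loop())

    for chunk in (b"\x00" * 320, b"\x00" * 640):
        async for frame in stt.run_stt(chunk):
            assert frame is None  # nothing useful comes back from run_stt

    await stt.close()
    await receiver
    return received


results = asyncio.run(main())
print(results)
```

The caller never reads a transcript from `run_stt`; the results only show up in the list filled by the callback, which mirrors the split between sending audio and receiving transcription events.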
- async process_frame(frame: Frame, direction: FrameDirection)[source]
Process frames with Deepgram SageMaker-specific handling.
- Parameters:
frame – The frame to process.
direction – The direction of frame processing.