stt

Deepgram speech-to-text service implementation.

Bases: object

Deepgram live transcription options.

Compatibility wrapper that mirrors the LiveOptions class removed in deepgram-sdk v6.

Deprecated since version 0.0.105: Use settings=DeepgramSTTService.Settings(...) for runtime-updatable fields and direct __init__ parameters for connection-level config instead.

Initialize live transcription options.

Parameters:

callback – Callback URL for async transcription delivery.
callback_method – HTTP method to use for the callback ("GET" or "POST").
channels – Number of audio channels.
detect_entities – Enable named entity detection.
diarize – Enable speaker diarization.
dictation – Enable dictation mode (converts commands to punctuation).
encoding – Audio encoding (e.g. "linear16").
endpointing – Endpointing sensitivity in ms, or False to disable.
extra – Additional key-value metadata to attach to the transcription (str or list).
interim_results – Whether to emit interim transcriptions.
keyterm – Keyterms to boost (str or list of str).
keywords – Keywords to boost (str or list of str).
language – BCP-47 language tag (e.g. "en-US").
mip_opt_out – Opt out of model improvement program.
model – Deepgram model name (e.g. "nova-3-general").
multichannel – Enable per-channel transcription for multi-channel audio.
numerals – Convert spoken numbers to numerals.
profanity_filter – Filter profanity from transcripts.
punctuate – Add punctuation to transcripts.
redact – Redact sensitive information (str or list of redaction types).
replace – Word replacement rules (str or list).
sample_rate – Audio sample rate in Hz.
search – Search terms to highlight (str or list of str).
smart_format – Apply smart formatting to transcripts.
tag – Custom billing tag (str or list of str).
utterance_end_ms – Silence duration in ms before an utterance-end event.
version – Model version (e.g. "latest").
**kwargs – Any additional Deepgram query parameters.

to_dict() → dict

Return a dict of all non-None options.
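The non-None filtering described for to_dict() can be illustrated with a small stand-in. This is a sketch only, not Pipecat's implementation: LiveOptionsSketch and the custom_param keyword are hypothetical names, and storing extra keyword arguments as attributes is an assumption.

```python
from typing import Any, Optional


class LiveOptionsSketch:
    """Hypothetical stand-in for the documented options wrapper (not Pipecat's code)."""

    def __init__(
        self,
        *,
        model: Optional[str] = None,
        language: Optional[str] = None,
        interim_results: Optional[bool] = None,
        endpointing: Any = None,
        **kwargs: Any,
    ) -> None:
        self.model = model
        self.language = language
        self.interim_results = interim_results
        self.endpointing = endpointing
        # Assumption: extra query parameters are kept as plain attributes.
        for key, value in kwargs.items():
            setattr(self, key, value)

    def to_dict(self) -> dict:
        # Keep only options that were set; False survives, None does not.
        return {k: v for k, v in vars(self).items() if v is not None}


# custom_param is a made-up extra parameter used only for illustration.
options = LiveOptionsSketch(model="nova-3-general", endpointing=False, custom_param="x")
print(options.to_dict())
```

Filtering on `is not None` rather than on truthiness matters here, because endpointing accepts False as a meaningful value.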

Bases: STTSettings

Settings for DeepgramSTTService.

model and language are inherited from STTSettings / ServiceSettings. Additional Deepgram connection parameters can be passed through extra, which is also inherited.

Parameters:

detect_entities – Enable named entity detection.
diarize – Enable speaker diarization.
dictation – Enable dictation mode (converts commands to punctuation).
endpointing – Endpointing sensitivity in ms, or False to disable.
interim_results – Whether to emit interim transcriptions.
keyterm – Keyterms to boost (str or list of str).
keywords – Keywords to boost (str or list of str).
numerals – Convert spoken numbers to numerals.
profanity_filter – Filter profanity from transcripts.
punctuate – Add punctuation to transcripts.
redact – Redact sensitive information (str or list of redaction types).
replace – Word replacement rules (str or list).
search – Search terms to highlight (str or list of str).
smart_format – Apply smart formatting to transcripts.
utterance_end_ms – Silence duration in ms before an utterance-end event.

detect_entities: bool | _NotGiven

diarize: bool | _NotGiven

dictation: bool | _NotGiven

endpointing: Any | _NotGiven

interim_results: bool | _NotGiven

keyterm: Any | _NotGiven

keywords: Any | _NotGiven

numerals: bool | _NotGiven

profanity_filter: bool | _NotGiven

punctuate: bool | _NotGiven

redact: Any | _NotGiven

replace: Any | _NotGiven

search: Any | _NotGiven

smart_format: bool | _NotGiven

utterance_end_ms: int | None | _NotGiven
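The `| _NotGiven` part of each field type suggests a sentinel that separates "field not supplied" from an explicit value such as None. The sketch below shows that general pattern under stated assumptions: _NotGivenSketch, NOT_GIVEN, SettingsSketch, and given_fields are hypothetical names, and Pipecat's own sentinel may behave differently.

```python
from dataclasses import dataclass, fields
from typing import Union


class _NotGivenSketch:
    """Hypothetical sentinel type meaning 'the caller did not set this field'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGivenSketch()


@dataclass
class SettingsSketch:
    """Stand-in with three of the documented fields (not Pipecat's class)."""

    punctuate: Union[bool, _NotGivenSketch] = NOT_GIVEN
    smart_format: Union[bool, _NotGivenSketch] = NOT_GIVEN
    utterance_end_ms: Union[int, None, _NotGivenSketch] = NOT_GIVEN

    def given_fields(self) -> dict:
        # None is a legitimate value, so compare against the sentinel by identity.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not NOT_GIVEN
        }


update = SettingsSketch(punctuate=True, utterance_end_ms=None)
print(update.given_fields())
```

With this pattern a partial update can carry utterance_end_ms=None as a deliberate change while leaving smart_format untouched.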

class pipecat.services.deepgram.stt.DeepgramSTTService(*, api_key: str, base_url: str = '', encoding: str = 'linear16', channels: int = 1, multichannel: bool = False, sample_rate: int | None = None, callback: str | None = None, callback_method: str | None = None, tag: Any | None = None, mip_opt_out: bool | None = None, live_options: LiveOptions | None = None, addons: dict | None = None, settings: DeepgramSTTSettings | None = None, ttfs_p99_latency: float | None = 0.35, **kwargs)

Bases: STTService

Deepgram speech-to-text service.

Provides real-time speech recognition using Deepgram’s WebSocket API. Supports configurable models, languages, and various audio processing options.

Settings: alias of DeepgramSTTSettings

__init__(*, api_key: str, base_url: str = '', encoding: str = 'linear16', channels: int = 1, multichannel: bool = False, sample_rate: int | None = None, callback: str | None = None, callback_method: str | None = None, tag: Any | None = None, mip_opt_out: bool | None = None, live_options: LiveOptions | None = None, addons: dict | None = None, settings: DeepgramSTTSettings | None = None, ttfs_p99_latency: float | None = 0.35, **kwargs)

Initialize the Deepgram STT service.

Parameters:

api_key – Deepgram API key for authentication.
base_url – Custom Deepgram API base URL.
encoding – Audio encoding format. Defaults to "linear16".
channels – Number of audio channels. Defaults to 1.
multichannel – Transcribe each audio channel independently. Defaults to False.
sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate.
callback – Callback URL for async transcription delivery.
callback_method – HTTP method for the callback ("GET" or "POST").
tag – Custom billing tag.
mip_opt_out – Opt out of Deepgram model improvement program.
live_options – Legacy configuration options. Deprecated since version 0.0.105: Use settings=DeepgramSTTService.Settings(...) for runtime-updatable fields and direct __init__ parameters for connection-level config instead.
addons – Additional Deepgram features to enable.
settings – Runtime-updatable settings. When provided alongside live_options, settings values take precedence (applied after the live_options merge).
ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment. See https://github.com/pipecat-ai/stt-benchmark
**kwargs – Additional arguments passed to the parent STTService.
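The precedence rule for live_options and settings can be pictured as two successive dictionary updates. The sketch below is an illustration, not Pipecat's merge code: merge_options and the plain dicts are hypothetical stand-ins for the service's internal handling, and the default values are invented for the example.

```python
from typing import Optional


def merge_options(
    defaults: dict,
    live_options: Optional[dict] = None,
    settings: Optional[dict] = None,
) -> dict:
    """Merge option sources so that later sources win (hypothetical helper)."""
    merged = dict(defaults)            # start from service defaults
    merged.update(live_options or {})  # legacy options override defaults
    merged.update(settings or {})      # settings are applied last and win
    return merged


result = merge_options(
    defaults={"model": "nova-3-general", "punctuate": True},
    live_options={"model": "legacy-model", "diarize": True},
    settings={"model": "settings-model"},
)
print(result)
```

Keys that appear only in live_options (diarize here) still reach the merged result; only conflicting keys are overridden by settings.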

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns: True, as the Deepgram service supports metrics generation.

async start(frame: StartFrame)

Start the Deepgram STT service.

Parameters: frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)

Stop the Deepgram STT service.

Parameters: frame – The end frame.

async cancel(frame: CancelFrame)

Cancel the Deepgram STT service.

Parameters: frame – The cancel frame.

async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None]

Send audio data to Deepgram for transcription.

Parameters: audio – Raw audio bytes to transcribe.
Yields: Frame – Always None; transcription results arrive via WebSocket callbacks.
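The "send audio, yield None" shape of run_stt can be sketched with an async generator. Everything below is a stand-in written under assumptions: FakeConnection and SketchSTT are hypothetical classes, and the real service sends audio over a Deepgram WebSocket rather than appending to a list.

```python
import asyncio
from typing import AsyncGenerator, List, Optional


class FakeConnection:
    """Hypothetical stand-in for a streaming connection; records sent audio."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    async def send(self, audio: bytes) -> None:
        self.sent.append(audio)


class SketchSTT:
    """Minimal stand-in for a streaming STT service (not Pipecat's class)."""

    def __init__(self, connection: FakeConnection) -> None:
        self._connection = connection

    async def run_stt(self, audio: bytes) -> AsyncGenerator[Optional[object], None]:
        # Forward the audio; transcripts would arrive later through callbacks.
        await self._connection.send(audio)
        # Nothing to return synchronously, so yield None once.
        yield None


async def demo() -> tuple:
    connection = FakeConnection()
    stt = SketchSTT(connection)
    yielded = [item async for item in stt.run_stt(b"\x00\x01")]
    return connection.sent, yielded


sent, yielded = asyncio.run(demo())
print(sent, yielded)
```

The generator yields no transcription frames itself, which matches the description that results are delivered separately through WebSocket callbacks.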

async process_frame(frame: Frame, direction: FrameDirection)

Process frames with Deepgram-specific handling.

Parameters:

frame – The frame to process.
direction – The direction of frame processing.