stt
Deepgram speech-to-text service implementation.
- class pipecat.services.deepgram.stt.LiveOptions(*, callback: str | None = None, callback_method: str | None = None, channels: int | None = None, detect_entities: bool | None = None, diarize: bool | None = None, dictation: bool | None = None, encoding: str | None = None, endpointing: Any | None = None, extra: Any | None = None, interim_results: bool | None = None, keyterm: Any | None = None, keywords: Any | None = None, language: str | None = None, mip_opt_out: bool | None = None, model: str | None = None, multichannel: bool | None = None, numerals: bool | None = None, profanity_filter: bool | None = None, punctuate: bool | None = None, redact: Any | None = None, replace: Any | None = None, sample_rate: int | None = None, search: Any | None = None, smart_format: bool | None = None, tag: Any | None = None, utterance_end_ms: int | None = None, version: str | None = None, **kwargs)[source]
Bases: object
Deepgram live transcription options.
Compatibility wrapper that mirrors the LiveOptions class removed in deepgram-sdk v6.
Deprecated since version 0.0.105: Use settings=DeepgramSTTService.Settings(...) for runtime-updatable fields and direct __init__ parameters for connection-level config instead.
- __init__(*, callback: str | None = None, callback_method: str | None = None, channels: int | None = None, detect_entities: bool | None = None, diarize: bool | None = None, dictation: bool | None = None, encoding: str | None = None, endpointing: Any | None = None, extra: Any | None = None, interim_results: bool | None = None, keyterm: Any | None = None, keywords: Any | None = None, language: str | None = None, mip_opt_out: bool | None = None, model: str | None = None, multichannel: bool | None = None, numerals: bool | None = None, profanity_filter: bool | None = None, punctuate: bool | None = None, redact: Any | None = None, replace: Any | None = None, sample_rate: int | None = None, search: Any | None = None, smart_format: bool | None = None, tag: Any | None = None, utterance_end_ms: int | None = None, version: str | None = None, **kwargs)[source]
Initialize live transcription options.
- Parameters:
callback – Callback URL for async transcription delivery.
callback_method – HTTP method to use for the callback ("GET" or "POST").
channels – Number of audio channels.
detect_entities – Enable named entity detection.
diarize – Enable speaker diarization.
dictation – Enable dictation mode (converts commands to punctuation).
encoding – Audio encoding (e.g. "linear16").
endpointing – Endpointing sensitivity in ms, or False to disable.
extra – Additional key-value metadata to attach to the transcription (str or list).
interim_results – Whether to emit interim transcriptions.
keyterm – Keyterms to boost (str or list of str).
keywords – Keywords to boost (str or list of str).
language – BCP-47 language tag (e.g. "en-US").
mip_opt_out – Opt out of model improvement program.
model – Deepgram model name (e.g. "nova-3-general").
multichannel – Enable per-channel transcription for multi-channel audio.
numerals – Convert spoken numbers to numerals.
profanity_filter – Filter profanity from transcripts.
punctuate – Add punctuation to transcripts.
redact – Redact sensitive information (str or list of redaction types).
replace – Word replacement rules (str or list).
sample_rate – Audio sample rate in Hz.
search – Search terms to highlight (str or list of str).
smart_format – Apply smart formatting to transcripts.
tag – Custom billing tag (str or list of str).
utterance_end_ms – Silence duration in ms before an utterance-end event.
version – Model version (e.g. "latest").
**kwargs – Any additional Deepgram query parameters.
- class pipecat.services.deepgram.stt.DeepgramSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, ~typing.Any]=<factory>, language: Language | str | None | _NotGiven = <factory>, detect_entities: bool | _NotGiven = <factory>, diarize: bool | _NotGiven = <factory>, dictation: bool | _NotGiven = <factory>, endpointing: Any | _NotGiven = <factory>, interim_results: bool | _NotGiven = <factory>, keyterm: Any | _NotGiven = <factory>, keywords: Any | _NotGiven = <factory>, numerals: bool | _NotGiven = <factory>, profanity_filter: bool | _NotGiven = <factory>, punctuate: bool | _NotGiven = <factory>, redact: Any | _NotGiven = <factory>, replace: Any | _NotGiven = <factory>, search: Any | _NotGiven = <factory>, smart_format: bool | _NotGiven = <factory>, utterance_end_ms: int | None | _NotGiven = <factory>)[source]
Bases: STTSettings
Settings for DeepgramSTTService.
model and language are inherited from STTSettings / ServiceSettings. Additional Deepgram connection params may be passed in through extra (also inherited).
- Parameters:
detect_entities – Enable named entity detection.
diarize – Enable speaker diarization.
dictation – Enable dictation mode (converts commands to punctuation).
endpointing – Endpointing sensitivity in ms, or False to disable.
interim_results – Whether to emit interim transcriptions.
keyterm – Keyterms to boost (str or list of str).
keywords – Keywords to boost (str or list of str).
numerals – Convert spoken numbers to numerals.
profanity_filter – Filter profanity from transcripts.
punctuate – Add punctuation to transcripts.
redact – Redact sensitive information (str or list of redaction types).
replace – Word replacement rules (str or list).
search – Search terms to highlight (str or list of str).
smart_format – Apply smart formatting to transcripts.
utterance_end_ms – Silence duration in ms before an utterance-end event.
- detect_entities: bool | _NotGiven
- diarize: bool | _NotGiven
- dictation: bool | _NotGiven
- endpointing: Any | _NotGiven
- interim_results: bool | _NotGiven
- keyterm: Any | _NotGiven
- keywords: Any | _NotGiven
- numerals: bool | _NotGiven
- profanity_filter: bool | _NotGiven
- punctuate: bool | _NotGiven
- redact: Any | _NotGiven
- replace: Any | _NotGiven
- search: Any | _NotGiven
- smart_format: bool | _NotGiven
- utterance_end_ms: int | None | _NotGiven
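The field types above (for example bool | _NotGiven and int | None | _NotGiven) suggest that a sentinel separates "the caller did not set this field" from "the caller set it to None". The sketch below illustrates that general pattern with the standard library only. The names NOT_GIVEN, SketchSettings, and apply_update are hypothetical, and this is an assumption about the intent of _NotGiven, not pipecat's actual implementation.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


class _NotGivenType:
    """Hypothetical stand-in for a 'not given' marker; not pipecat's class."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGivenType()


@dataclass
class SketchSettings:
    """Illustrative subset of the fields listed above; not the real class."""

    punctuate: Any = NOT_GIVEN
    smart_format: Any = NOT_GIVEN
    utterance_end_ms: Any = NOT_GIVEN


def apply_update(current: SketchSettings, update: SketchSettings) -> SketchSettings:
    """Return a copy of current with only the explicitly set fields of update applied."""
    merged = replace(current)  # shallow copy, so current is left untouched
    for field in fields(update):
        value = getattr(update, field.name)
        if value is not NOT_GIVEN:  # None is a real value and is applied
            setattr(merged, field.name, value)
    return merged


current = SketchSettings(punctuate=True, smart_format=False, utterance_end_ms=1000)
update = SketchSettings(utterance_end_ms=None)  # clear one field, leave the rest alone
merged = apply_update(current, update)
```

With a plain None default, the update above could not be told apart from "change nothing", which is the usual reason for a separate sentinel value.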
- class pipecat.services.deepgram.stt.DeepgramSTTService(*, api_key: str, base_url: str = '', encoding: str = 'linear16', channels: int = 1, multichannel: bool = False, sample_rate: int | None = None, callback: str | None = None, callback_method: str | None = None, tag: Any | None = None, mip_opt_out: bool | None = None, live_options: LiveOptions | None = None, addons: dict | None = None, settings: DeepgramSTTSettings | None = None, ttfs_p99_latency: float | None = 0.35, **kwargs)[source]
Bases: STTService
Deepgram speech-to-text service.
Provides real-time speech recognition using Deepgram’s WebSocket API. Supports configurable models, languages, and various audio processing options.
- Settings
alias of DeepgramSTTSettings
- __init__(*, api_key: str, base_url: str = '', encoding: str = 'linear16', channels: int = 1, multichannel: bool = False, sample_rate: int | None = None, callback: str | None = None, callback_method: str | None = None, tag: Any | None = None, mip_opt_out: bool | None = None, live_options: LiveOptions | None = None, addons: dict | None = None, settings: DeepgramSTTSettings | None = None, ttfs_p99_latency: float | None = 0.35, **kwargs)[source]
Initialize the Deepgram STT service.
- Parameters:
api_key – Deepgram API key for authentication.
base_url – Custom Deepgram API base URL.
encoding – Audio encoding format. Defaults to “linear16”.
channels – Number of audio channels. Defaults to 1.
multichannel – Transcribe each audio channel independently. Defaults to False.
sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate.
callback – Callback URL for async transcription delivery.
callback_method – HTTP method for the callback ("GET" or "POST").
tag – Custom billing tag.
mip_opt_out – Opt out of Deepgram model improvement program.
live_options –
Legacy configuration options.
Deprecated since version 0.0.105: Use settings=DeepgramSTTService.Settings(...) for runtime-updatable fields and direct init parameters for connection-level config.
addons – Additional Deepgram features to enable.
settings – Runtime-updatable settings. When provided alongside live_options, settings values take precedence (applied after the live_options merge).
ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment. See https://github.com/pipecat-ai/stt-benchmark
**kwargs – Additional arguments passed to the parent STTService.
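As a rough illustration of the split described above, the following sketch passes connection-level values as direct __init__ parameters and runtime-updatable values through settings, without using the deprecated live_options. It relies only on the signatures documented on this page. The environment variable name DEEPGRAM_API_KEY, the helper name build_deepgram_stt, and the specific values chosen are assumptions for illustration.

```python
import os


def build_deepgram_stt():
    """Build a DeepgramSTTService without the deprecated live_options argument."""
    # Imported inside the function so this file still loads where pipecat
    # is not installed; in application code a top-level import is typical.
    from pipecat.services.deepgram.stt import DeepgramSTTService

    return DeepgramSTTService(
        # Connection-level config: direct __init__ parameters.
        api_key=os.environ["DEEPGRAM_API_KEY"],  # assumed variable name
        encoding="linear16",
        channels=1,
        # Runtime-updatable fields: Settings is the alias of DeepgramSTTSettings.
        settings=DeepgramSTTService.Settings(
            model="nova-3-general",
            language="en-US",
            interim_results=True,
            smart_format=True,
            utterance_end_ms=1000,  # illustrative value, not a recommendation
        ),
    )
```

Calling build_deepgram_stt() requires pipecat with its Deepgram dependencies installed and a valid API key in the environment.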
- can_generate_metrics() bool[source]
Check if this service can generate processing metrics.
- Returns:
True, as the Deepgram service supports metrics generation.
- async start(frame: StartFrame)[source]
Start the Deepgram STT service.
- Parameters:
frame – The start frame containing initialization parameters.
- async stop(frame: EndFrame)[source]
Stop the Deepgram STT service.
- Parameters:
frame – The end frame.
- async cancel(frame: CancelFrame)[source]
Cancel the Deepgram STT service.
- Parameters:
frame – The cancel frame.
- async run_stt(audio: bytes) AsyncGenerator[Frame | None, None][source]
Send audio data to Deepgram for transcription.
- Parameters:
audio – Raw audio bytes to transcribe.
- Yields:
Frame – None (transcription results come via WebSocket callbacks).
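The "yields None" contract of run_stt can look odd in isolation, so here is a minimal standard-library sketch of the general shape: the generator only forwards audio, and transcripts reach the service through a callback. FakeSocket and SketchSTT are hypothetical stand-ins, and the fake socket replies immediately, which a real streaming connection would not do. None of this is pipecat or Deepgram code.

```python
import asyncio
from typing import AsyncGenerator, Callable, List, Optional, Tuple


class FakeSocket:
    """Hypothetical stand-in for a streaming connection; not a Deepgram client."""

    def __init__(self, on_transcript: Callable[[str], None]) -> None:
        self._on_transcript = on_transcript
        self.sent: List[bytes] = []

    async def send(self, audio: bytes) -> None:
        self.sent.append(audio)
        # A real server answers later on its own schedule; this fake answers at once.
        self._on_transcript(f"transcript for {len(audio)} bytes")


class SketchSTT:
    """Toy service: run_stt forwards audio, results arrive via the callback."""

    def __init__(self) -> None:
        self.transcripts: List[str] = []
        self._socket = FakeSocket(self.transcripts.append)

    async def run_stt(self, audio: bytes) -> AsyncGenerator[Optional[str], None]:
        await self._socket.send(audio)
        yield None  # nothing to emit here; the callback delivers the result


async def main() -> Tuple[List[Optional[str]], List[str]]:
    stt = SketchSTT()
    yielded = [item async for item in stt.run_stt(b"\x00" * 320)]
    return yielded, stt.transcripts


yielded, transcripts = asyncio.run(main())
```

Separating the send path from the receive path lets audio be pushed continuously without waiting for each transcript.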
- async process_frame(frame: Frame, direction: FrameDirection)[source]
Process frames with Deepgram-specific handling.
- Parameters:
frame – The frame to process.
direction – The direction of frame processing.