stt

Deepgram speech-to-text service implementation.

class pipecat.services.deepgram.stt.LiveOptions(*, callback: str | None = None, callback_method: str | None = None, channels: int | None = None, detect_entities: bool | None = None, diarize: bool | None = None, dictation: bool | None = None, encoding: str | None = None, endpointing: Any | None = None, extra: Any | None = None, interim_results: bool | None = None, keyterm: Any | None = None, keywords: Any | None = None, language: str | None = None, mip_opt_out: bool | None = None, model: str | None = None, multichannel: bool | None = None, numerals: bool | None = None, profanity_filter: bool | None = None, punctuate: bool | None = None, redact: Any | None = None, replace: Any | None = None, sample_rate: int | None = None, search: Any | None = None, smart_format: bool | None = None, tag: Any | None = None, utterance_end_ms: int | None = None, version: str | None = None, **kwargs)[source]

Bases: object

Deepgram live transcription options.

Compatibility wrapper that mirrors the LiveOptions class removed in deepgram-sdk v6.

Deprecated since version 0.0.105: Use settings=DeepgramSTTService.Settings(...) for runtime-updatable fields and direct __init__ parameters for connection-level config instead.
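As a rough illustration of the migration described above, the sketch below configures a service using only parameters listed in this reference: connection-level values are passed directly to the constructor, and runtime-updatable fields go into DeepgramSTTService.Settings. The helper name build_deepgram_stt and the chosen values are hypothetical. The import is deferred into the function so the snippet loads even where pipecat is not installed; calling it assumes pipecat 0.0.105 or later with its Deepgram dependencies and a valid API key.

```python
def build_deepgram_stt(api_key: str):
    """Hypothetical helper: configure DeepgramSTTService without LiveOptions."""
    # Deferred import: calling this function requires pipecat (0.0.105 or later).
    from pipecat.services.deepgram.stt import DeepgramSTTService

    return DeepgramSTTService(
        api_key=api_key,
        # Connection-level configuration: direct __init__ parameters.
        encoding="linear16",
        channels=1,
        # Runtime-updatable fields: grouped in the Settings object.
        settings=DeepgramSTTService.Settings(
            model="nova-3-general",
            language="en-US",
            interim_results=True,
            smart_format=True,
        ),
    )
```

The example values ("linear16", "nova-3-general", "en-US") are the ones this reference gives for the corresponding parameters; substitute whatever your deployment needs.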

__init__(*, callback: str | None = None, callback_method: str | None = None, channels: int | None = None, detect_entities: bool | None = None, diarize: bool | None = None, dictation: bool | None = None, encoding: str | None = None, endpointing: Any | None = None, extra: Any | None = None, interim_results: bool | None = None, keyterm: Any | None = None, keywords: Any | None = None, language: str | None = None, mip_opt_out: bool | None = None, model: str | None = None, multichannel: bool | None = None, numerals: bool | None = None, profanity_filter: bool | None = None, punctuate: bool | None = None, redact: Any | None = None, replace: Any | None = None, sample_rate: int | None = None, search: Any | None = None, smart_format: bool | None = None, tag: Any | None = None, utterance_end_ms: int | None = None, version: str | None = None, **kwargs)[source]

Initialize live transcription options.

Parameters:
  • callback – Callback URL for async transcription delivery.

  • callback_method – HTTP method to use for the callback ("GET" or "POST").

  • channels – Number of audio channels.

  • detect_entities – Enable named entity detection.

  • diarize – Enable speaker diarization.

  • dictation – Enable dictation mode (converts commands to punctuation).

  • encoding – Audio encoding (e.g. "linear16").

  • endpointing – Endpointing sensitivity in ms, or False to disable.

  • extra – Additional key-value metadata to attach to the transcription (str or list).

  • interim_results – Whether to emit interim transcriptions.

  • keyterm – Keyterms to boost (str or list of str).

  • keywords – Keywords to boost (str or list of str).

  • language – BCP-47 language tag (e.g. "en-US").

  • mip_opt_out – Opt out of model improvement program.

  • model – Deepgram model name (e.g. "nova-3-general").

  • multichannel – Enable per-channel transcription for multi-channel audio.

  • numerals – Convert spoken numbers to numerals.

  • profanity_filter – Filter profanity from transcripts.

  • punctuate – Add punctuation to transcripts.

  • redact – Redact sensitive information (str or list of redaction types).

  • replace – Word replacement rules (str or list).

  • sample_rate – Audio sample rate in Hz.

  • search – Search terms to highlight (str or list of str).

  • smart_format – Apply smart formatting to transcripts.

  • tag – Custom billing tag (str or list of str).

  • utterance_end_ms – Silence duration in ms before an utterance-end event.

  • version – Model version (e.g. "latest").

  • **kwargs – Any additional Deepgram query parameters.

to_dict() → dict[source]

Return a dict of all non-None options.
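The filtering that to_dict() describes can be pictured with a small stand-in. This is a sketch only, not pipecat's implementation: OptionsSketch is a hypothetical class holding three of the fields listed above, and it ignores **kwargs entirely.

```python
from typing import Any, Optional


class OptionsSketch:
    """Hypothetical stand-in for LiveOptions with three of its fields."""

    def __init__(
        self,
        *,
        model: Optional[str] = None,
        language: Optional[str] = None,
        endpointing: Any = None,
    ):
        self.model = model
        self.language = language
        self.endpointing = endpointing

    def to_dict(self) -> dict:
        # Keep only the options that were explicitly set (not None).
        return {k: v for k, v in vars(self).items() if v is not None}


opts = OptionsSketch(model="nova-3-general", endpointing=False)
result = opts.to_dict()
```

Here result contains model and endpointing but not language. A False value survives because only None is filtered out, which matters for options such as endpointing where False means "disable".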

class pipecat.services.deepgram.stt.DeepgramSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>, detect_entities: bool | _NotGiven = <factory>, diarize: bool | _NotGiven = <factory>, dictation: bool | _NotGiven = <factory>, endpointing: Any | _NotGiven = <factory>, interim_results: bool | _NotGiven = <factory>, keyterm: Any | _NotGiven = <factory>, keywords: Any | _NotGiven = <factory>, numerals: bool | _NotGiven = <factory>, profanity_filter: bool | _NotGiven = <factory>, punctuate: bool | _NotGiven = <factory>, redact: Any | _NotGiven = <factory>, replace: Any | _NotGiven = <factory>, search: Any | _NotGiven = <factory>, smart_format: bool | _NotGiven = <factory>, utterance_end_ms: int | None | _NotGiven = <factory>)[source]

Bases: STTSettings

Settings for DeepgramSTTService.

The model and language fields are inherited from STTSettings / ServiceSettings. Additional Deepgram connection parameters can be passed through extra, which is also inherited.

Parameters:
  • detect_entities – Enable named entity detection.

  • diarize – Enable speaker diarization.

  • dictation – Enable dictation mode (converts commands to punctuation).

  • endpointing – Endpointing sensitivity in ms, or False to disable.

  • interim_results – Whether to emit interim transcriptions.

  • keyterm – Keyterms to boost (str or list of str).

  • keywords – Keywords to boost (str or list of str).

  • numerals – Convert spoken numbers to numerals.

  • profanity_filter – Filter profanity from transcripts.

  • punctuate – Add punctuation to transcripts.

  • redact – Redact sensitive information (str or list of redaction types).

  • replace – Word replacement rules (str or list).

  • search – Search terms to highlight (str or list of str).

  • smart_format – Apply smart formatting to transcripts.

  • utterance_end_ms – Silence duration in ms before an utterance-end event.
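The _NotGiven defaults in the signature suggest a sentinel that separates "field not supplied" from an explicit None. The sketch below illustrates that general pattern with hypothetical names (NOT_GIVEN, SettingsSketch, apply_update); it is an assumption about the idea, not pipecat's actual code.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


class _NotGivenSketch:
    """Hypothetical sentinel type meaning 'this field was not supplied'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGivenSketch()


@dataclass
class SettingsSketch:
    """Hypothetical stand-in with two of the documented fields."""

    punctuate: Any = NOT_GIVEN
    utterance_end_ms: Any = NOT_GIVEN


def apply_update(current: SettingsSketch, update: SettingsSketch) -> SettingsSketch:
    # Copy over only the fields the update actually supplied; an explicit
    # None counts as supplied and overwrites the current value.
    given = {
        f.name: getattr(update, f.name)
        for f in fields(update)
        if getattr(update, f.name) is not NOT_GIVEN
    }
    return replace(current, **given)


current = SettingsSketch(punctuate=True, utterance_end_ms=1000)
updated = apply_update(current, SettingsSketch(utterance_end_ms=None))
```

In this sketch the update leaves punctuate untouched because it was never supplied, while utterance_end_ms is overwritten with None because None was passed explicitly.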

detect_entities: bool | _NotGiven
diarize: bool | _NotGiven
dictation: bool | _NotGiven
endpointing: Any | _NotGiven
interim_results: bool | _NotGiven
keyterm: Any | _NotGiven
keywords: Any | _NotGiven
numerals: bool | _NotGiven
profanity_filter: bool | _NotGiven
punctuate: bool | _NotGiven
redact: Any | _NotGiven
replace: Any | _NotGiven
search: Any | _NotGiven
smart_format: bool | _NotGiven
utterance_end_ms: int | None | _NotGiven

class pipecat.services.deepgram.stt.DeepgramSTTService(*, api_key: str, base_url: str = '', encoding: str = 'linear16', channels: int = 1, multichannel: bool = False, sample_rate: int | None = None, callback: str | None = None, callback_method: str | None = None, tag: Any | None = None, mip_opt_out: bool | None = None, live_options: LiveOptions | None = None, addons: dict | None = None, settings: DeepgramSTTSettings | None = None, ttfs_p99_latency: float | None = 0.35, **kwargs)[source]

Bases: STTService

Deepgram speech-to-text service.

Provides real-time speech recognition using Deepgram’s WebSocket API. Supports configurable models, languages, and various audio processing options.

Settings

alias of DeepgramSTTSettings

__init__(*, api_key: str, base_url: str = '', encoding: str = 'linear16', channels: int = 1, multichannel: bool = False, sample_rate: int | None = None, callback: str | None = None, callback_method: str | None = None, tag: Any | None = None, mip_opt_out: bool | None = None, live_options: LiveOptions | None = None, addons: dict | None = None, settings: DeepgramSTTSettings | None = None, ttfs_p99_latency: float | None = 0.35, **kwargs)[source]

Initialize the Deepgram STT service.

Parameters:
  • api_key – Deepgram API key for authentication.

  • base_url – Custom Deepgram API base URL.

  • encoding – Audio encoding format. Defaults to "linear16".

  • channels – Number of audio channels. Defaults to 1.

  • multichannel – Transcribe each audio channel independently. Defaults to False.

  • sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate.

  • callback – Callback URL for async transcription delivery.

  • callback_method – HTTP method for the callback ("GET" or "POST").

  • tag – Custom billing tag.

  • mip_opt_out – Opt out of Deepgram model improvement program.

  • live_options

    Legacy configuration options.

    Deprecated since version 0.0.105: Use settings=DeepgramSTTService.Settings(...) for runtime-updatable fields and direct __init__ parameters for connection-level config instead.

  • addons – Additional Deepgram features to enable.

  • settings – Runtime-updatable settings. When provided alongside live_options, settings values take precedence (applied after the live_options merge).

  • ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment. See https://github.com/pipecat-ai/stt-benchmark

  • **kwargs – Additional arguments passed to the parent STTService.
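The precedence rule for settings and live_options can be pictured with plain dictionaries. This is a sketch under assumptions: merge_config is a hypothetical helper and the values are illustrative; the real service works with typed objects rather than dicts.

```python
def merge_config(live_options: dict, settings: dict) -> dict:
    """Hypothetical helper: settings are applied after live_options."""
    merged = dict(live_options)  # start from the legacy options
    merged.update(settings)      # settings win on any overlapping key
    return merged


legacy = {"language": "en-US", "punctuate": True}
override = {"language": "en-GB"}
merged = merge_config(legacy, override)
```

Here merged keeps punctuate from the legacy options, while language takes the value from settings because settings are applied last.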

can_generate_metrics() → bool[source]

Check if this service can generate processing metrics.

Returns:

True, as the Deepgram service supports metrics generation.

async start(frame: StartFrame)[source]

Start the Deepgram STT service.

Parameters:

frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)[source]

Stop the Deepgram STT service.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame)[source]

Cancel the Deepgram STT service.

Parameters:

frame – The cancel frame.

async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None][source]

Send audio data to Deepgram for transcription.

Parameters:

audio – Raw audio bytes to transcribe.

Yields:

Frame – None. Transcription results are delivered through WebSocket callbacks rather than yielded here.
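The shape described here, an async generator that forwards audio and yields None while results arrive elsewhere, can be sketched with a toy class. FakeSocket and SketchSTT are hypothetical stand-ins; nothing below calls Deepgram or pipecat.

```python
import asyncio
from typing import AsyncGenerator, List, Optional


class FakeSocket:
    """Hypothetical stand-in for a WebSocket connection."""

    def __init__(self) -> None:
        self.sent: List[bytes] = []

    async def send(self, data: bytes) -> None:
        self.sent.append(data)


class SketchSTT:
    """Toy service: run_stt forwards audio; transcripts arrive via callback."""

    def __init__(self) -> None:
        self.socket = FakeSocket()
        self.transcripts: List[str] = []

    async def on_message(self, text: str) -> None:
        # In a real service this would be invoked by the WebSocket client.
        self.transcripts.append(text)

    async def run_stt(self, audio: bytes) -> AsyncGenerator[Optional[str], None]:
        await self.socket.send(audio)
        yield None  # nothing to emit here; results come through on_message


async def demo():
    stt = SketchSTT()
    yielded = [item async for item in stt.run_stt(b"\x00\x01")]
    await stt.on_message("hello")
    return yielded, stt.socket.sent, stt.transcripts


yielded, sent, transcripts = asyncio.run(demo())
```

In this toy version the generator yields a single None per audio chunk, and the transcript only appears once the callback fires.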

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process frames with Deepgram-specific handling.

Parameters:
  • frame – The frame to process.

  • direction – The direction of frame processing.