stt

Deepgram Flux speech-to-text service implementation (WebSocket transport).

class pipecat.services.deepgram.flux.stt.DeepgramFluxSTTService(*, api_key: str, url: str = 'wss://api.deepgram.com/v2/listen', sample_rate: int | None = None, mip_opt_out: bool | None = None, model: str | None = None, flux_encoding: str = 'linear16', tag: list | None = None, params: InputParams | None = None, should_interrupt: bool = True, settings: DeepgramFluxSTTSettings | None = None, **kwargs)[source]

Bases: DeepgramFluxSTTBase, WebsocketService

Deepgram Flux speech-to-text service.

Provides real-time speech recognition using Deepgram’s WebSocket API with Flux capabilities. Supports configurable models, VAD events, and various audio processing options including advanced turn detection and EagerEndOfTurn events for improved conversational AI performance.

For multilingual use, set model="flux-general-multi" and pass language_hints to bias detection toward specific languages. Hints can be updated mid-stream via STTUpdateSettingsFrame (e.g. to implement a detect-then-lock flow). TranscriptionFrame.language reflects whichever language Flux detected for each turn.
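
The detect-then-lock flow mentioned above could be sketched roughly as follows. LanguageLock and its turns_required parameter are hypothetical helpers, not part of pipecat, and languages are plain strings here rather than Language values. Once observe() returns a language, the application would send it as the new language_hints in an STTUpdateSettingsFrame; that call is left out because its exact arguments are not shown on this page.

```python
from typing import Optional


class LanguageLock:
    """Hypothetical helper: lock onto a language after N consecutive matching turns."""

    def __init__(self, turns_required: int = 2):
        self.turns_required = turns_required
        self.locked: Optional[str] = None
        self._candidate: Optional[str] = None
        self._streak = 0

    def observe(self, detected: Optional[str]) -> Optional[str]:
        """Feed the detected language of one finished turn.

        Returns the language to lock onto the first time the streak is
        reached, otherwise None.
        """
        if self.locked is not None or detected is None:
            return None
        if detected == self._candidate:
            self._streak += 1
        else:
            # A different language restarts the streak.
            self._candidate = detected
            self._streak = 1
        if self._streak >= self.turns_required:
            self.locked = detected
            return detected
        return None


lock = LanguageLock(turns_required=2)
# Each value stands in for TranscriptionFrame.language of one turn.
decisions = [lock.observe(lang) for lang in ["en", "es", "es", "en"]]
print(decisions)
```

Requiring more than one matching turn before locking is a design choice that guards against a single misdetected utterance; the right streak length depends on the application.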

Event handlers available (in addition to base events):

on_start_of_turn(service, transcript): Deepgram detected start of speech
on_end_of_turn(service, transcript): Deepgram detected end of turn (EOT)
on_eager_end_of_turn(service, transcript): Deepgram predicted end of turn (EagerEOT)
on_turn_resumed(service): User resumed speaking after EagerEOT

Example:

@stt.event_handler("on_end_of_turn")
async def on_end_of_turn(service, transcript):
    ...
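
One way the three turn events might be combined is to start a reply on EagerEndOfTurn, discard it on TurnResumed, and reuse it on EndOfTurn if the transcript did not change. The sketch below is self-contained and does not import pipecat; SpeculativeReply and fake_llm are hypothetical names, and in a real application the three methods would be registered with @stt.event_handler(...) as in the example above.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SpeculativeReply:
    """Hypothetical helper: start a reply early, drop it if the user resumes."""

    def __init__(self, generate: Callable[[str], Awaitable[str]]):
        self._generate = generate
        self._task: Optional[asyncio.Task] = None
        self._transcript: Optional[str] = None

    async def on_eager_end_of_turn(self, service, transcript: str) -> None:
        # Flux predicts the turn is over: begin generating speculatively.
        await self._cancel()
        self._transcript = transcript
        self._task = asyncio.create_task(self._generate(transcript))

    async def on_turn_resumed(self, service) -> None:
        # The user kept talking, so the speculative reply is stale.
        await self._cancel()

    async def on_end_of_turn(self, service, transcript: str) -> str:
        # Reuse the speculative reply only if it was built from the same text.
        if self._task is not None and self._transcript == transcript:
            task, self._task = self._task, None
            return await task
        await self._cancel()
        return await self._generate(transcript)

    async def _cancel(self) -> None:
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None


async def demo():
    calls = []

    async def fake_llm(text: str) -> str:  # stand-in for a real LLM call
        calls.append(text)
        await asyncio.sleep(0.01)
        return f"reply to: {text}"

    spec = SpeculativeReply(fake_llm)
    await spec.on_eager_end_of_turn(None, "book a table")
    await spec.on_turn_resumed(None)  # user continued speaking
    await spec.on_eager_end_of_turn(None, "book a table for two")
    reply = await spec.on_end_of_turn(None, "book a table for two")
    return reply, calls


reply, calls = asyncio.run(demo())
print(reply)
```

The trade-off matches the eager_eot_threshold description: speculative work that gets cancelled is wasted LLM usage, in exchange for a faster reply when the prediction holds.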

Settings: alias of DeepgramFluxSTTSettings

class InputParams(*, eager_eot_threshold: float | None = None, eot_threshold: float | None = None, eot_timeout_ms: int | None = None, keyterm: list = [], mip_opt_out: bool | None = None, tag: list = [], min_confidence: float | None = None)[source]

Bases: BaseModel

Configuration parameters for Deepgram Flux API.

Deprecated since version 0.0.105: Use settings=DeepgramFluxSTTService.Settings(...) instead.

Parameters:

eager_eot_threshold – Optional. EagerEndOfTurn/TurnResumed events are off by default and are enabled by setting eager_eot_threshold to a valid value. Lower values = more aggressive eager end-of-turn detection (faster response, more LLM calls). Higher values = more conservative eager end-of-turn detection (slower response, fewer LLM calls).
eot_threshold – Optional. End-of-turn confidence required to finish a turn (default 0.7). Lower values = turns end sooner (more interruptions, faster responses). Higher values = turns end later (fewer interruptions, more complete utterances).
eot_timeout_ms – Optional. Time in milliseconds after speech to finish a turn regardless of EOT confidence (default 5000).
keyterm – List of keyterms to boost recognition accuracy for specialized terminology.
mip_opt_out – Optional. Opts out requests from the Deepgram Model Improvement Program (default False).
tag – List of tags to label requests for identification during usage reporting.
min_confidence – Optional. Minimum confidence required to create a TranscriptionFrame.

eager_eot_threshold: float | None

eot_threshold: float | None

eot_timeout_ms: int | None

keyterm: list

mip_opt_out: bool | None

tag: list

min_confidence: float | None

__init__(*, api_key: str, url: str = 'wss://api.deepgram.com/v2/listen', sample_rate: int | None = None, mip_opt_out: bool | None = None, model: str | None = None, flux_encoding: str = 'linear16', tag: list | None = None, params: InputParams | None = None, should_interrupt: bool = True, settings: DeepgramFluxSTTSettings | None = None, **kwargs)[source]

Initialize the Deepgram Flux STT service.

Parameters:

api_key – Deepgram API key for authentication. Required for API access.
url – WebSocket URL for the Deepgram Flux API. Defaults to wss://api.deepgram.com/v2/listen.
sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate.
mip_opt_out – Opt out of the Deepgram Model Improvement Program.
model –
Deepgram Flux model to use for transcription.

Deprecated since version 0.0.105: Use settings=DeepgramFluxSTTService.Settings(model=...) instead.
flux_encoding – Audio encoding format required by the Flux API. Must be "linear16" (raw signed little-endian 16-bit PCM).
tag – Tags to label requests for identification during usage reporting.
params –
InputParams instance containing detailed API configuration options.

Deprecated since version 0.0.105: Use settings=DeepgramFluxSTTService.Settings(...) instead.
should_interrupt – Whether the bot should be interrupted when Flux detects that the user is speaking.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to the parent classes.

Examples

Basic usage with default parameters:

stt = DeepgramFluxSTTService(api_key="your-api-key")

Advanced usage with custom parameters:

stt = DeepgramFluxSTTService(
    api_key="your-api-key",
    settings=DeepgramFluxSTTService.Settings(
        model="flux-general-en",
        eager_eot_threshold=0.5,
        eot_threshold=0.8,
        keyterm=["AI", "machine learning", "neural network"],
        tag=["production", "voice-agent"],
    ),
)

Multilingual usage with language hints:

stt = DeepgramFluxSTTService(
    api_key="your-api-key",
    settings=DeepgramFluxSTTService.Settings(
        model="flux-general-multi",
        language_hints=[Language.EN, Language.ES],
    ),
)

async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None][source]

Send audio data to Deepgram Flux for transcription.

Transmits raw audio bytes to the Deepgram Flux API for real-time speech recognition. Transcription results are received asynchronously through WebSocket callbacks and processed in the background.

Parameters:

audio – Raw audio bytes in linear16 format (signed little-endian 16-bit PCM).

Yields:

Frame –

None (transcription results are delivered via WebSocket callbacks rather than as return values from this method).

Raises:

Exception – If the WebSocket connection is not established or if there are issues sending the audio data.
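
If an audio source produces float samples rather than linear16 bytes, a conversion along these lines would be needed before the data reaches run_stt. This is a minimal sketch using only the standard library; floats_to_linear16 is a hypothetical helper name, and the input is assumed to be mono samples nominally in the range [-1.0, 1.0].

```python
import struct
from typing import Iterable


def floats_to_linear16(samples: Iterable[float]) -> bytes:
    """Convert float samples to signed little-endian 16-bit PCM bytes."""
    out = bytearray()
    for s in samples:
        # Clip out-of-range input so it cannot overflow the 16-bit range.
        clipped = max(-1.0, min(1.0, s))
        # "<h" means little-endian signed 16-bit integer.
        out += struct.pack("<h", int(round(clipped * 32767)))
    return bytes(out)


pcm = floats_to_linear16([0.0, 1.0, -1.0, 2.0])
print(pcm.hex())  # 0000ff7f0180ff7f
```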

class DeepgramFluxSTTSettings

Bases: STTSettings

Settings for DeepgramFluxSTTService.

Parameters:

eager_eot_threshold – EagerEndOfTurn/TurnResumed threshold. Off by default. Lower values = more aggressive (faster response, more LLM calls). Higher values = more conservative (slower response, fewer LLM calls).
eot_threshold – End-of-turn confidence required to finish a turn (default 0.7).
eot_timeout_ms – Time in ms after speech to finish a turn regardless of EOT confidence (default 5000).
keyterm – Keyterms to boost recognition accuracy for specialized terminology.
min_confidence – Minimum confidence required to create a TranscriptionFrame.
language_hints – Languages to bias transcription toward. Only honored by the flux-general-multi model. An empty list clears any active hints; None/NOT_GIVEN means no hints (auto-detect). Can be updated mid-stream via STTUpdateSettingsFrame.
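
To make the interplay of the three turn parameters concrete, here is a toy reading of the descriptions above. It is not Deepgram's actual detector: turn_event is a hypothetical function, and the assumption that each threshold is a simple cut-off on a single confidence value is an illustration only.

```python
from typing import Optional


def turn_event(
    confidence: float,
    silence_ms: int,
    *,
    eot_threshold: float = 0.7,
    eot_timeout_ms: int = 5000,
    eager_eot_threshold: Optional[float] = None,
) -> Optional[str]:
    """Toy model: which turn event, if any, the given state would trigger."""
    # The turn ends on sufficient confidence, or on timeout regardless of it.
    if confidence >= eot_threshold or silence_ms >= eot_timeout_ms:
        return "EndOfTurn"
    # Eager events exist only when eager_eot_threshold is set.
    if eager_eot_threshold is not None and confidence >= eager_eot_threshold:
        return "EagerEndOfTurn"
    return None


print(turn_event(0.75, 200))
print(turn_event(0.5, 200))
print(turn_event(0.5, 200, eager_eot_threshold=0.4))
print(turn_event(0.1, 5000))
```

In this model, lowering eager_eot_threshold widens the band of confidence values that produce an early event, which is why lower values mean faster responses and more LLM calls.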

eager_eot_threshold: float | None | _NotGiven

eot_threshold: float | None | _NotGiven

eot_timeout_ms: int | None | _NotGiven

keyterm: list | _NotGiven

min_confidence: float | None | _NotGiven

language_hints: list[Language] | None | _NotGiven
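
Each field accepts _NotGiven in addition to its value type, which allows a partial update to distinguish "not supplied" from an explicit None. The sketch below illustrates that general pattern with stand-in types. It assumes that fields left as NOT_GIVEN are not changed when an update is applied, which this page does not state explicitly; FluxSettingsSketch and apply_delta are hypothetical names.

```python
from dataclasses import dataclass, fields, replace
from typing import Any


class _NotGiven:
    """Stand-in sentinel type; pipecat ships its own."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass(frozen=True)
class FluxSettingsSketch:
    """Hypothetical mirror of three fields from the settings class."""

    eot_threshold: Any = NOT_GIVEN
    eot_timeout_ms: Any = NOT_GIVEN
    min_confidence: Any = NOT_GIVEN


def apply_delta(current: FluxSettingsSketch, delta: FluxSettingsSketch) -> FluxSettingsSketch:
    """Copy only the fields the delta supplies, including explicit None."""
    changes = {
        f.name: getattr(delta, f.name)
        for f in fields(delta)
        if getattr(delta, f.name) is not NOT_GIVEN
    }
    return replace(current, **changes)


current = FluxSettingsSketch(eot_threshold=0.7, eot_timeout_ms=5000, min_confidence=0.3)
updated = apply_delta(current, FluxSettingsSketch(eot_threshold=0.8, min_confidence=None))
print(updated)
```

Here eot_timeout_ms keeps its value because the update never mentions it, while min_confidence is explicitly reset to None.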

class pipecat.services.deepgram.flux.stt.FluxEventType(*values)[source]

Bases: StrEnum

Deepgram Flux TurnInfo event types.

These events are contained within TurnInfo messages and indicate different stages of speech processing and turn detection.

START_OF_TURN = 'StartOfTurn'

TURN_RESUMED = 'TurnResumed'

END_OF_TURN = 'EndOfTurn'

EAGER_END_OF_TURN = 'EagerEndOfTurn'

UPDATE = 'Update'

class pipecat.services.deepgram.flux.stt.FluxMessageType(*values)[source]

Bases: StrEnum

Deepgram Flux WebSocket message types.

These are the top-level message types that can be received from the Deepgram Flux WebSocket connection.

RECEIVE_CONNECTED = 'Connected'

RECEIVE_FATAL_ERROR = 'Error'

TURN_INFO = 'TurnInfo'

CONFIGURE_SUCCESS = 'ConfigureSuccess'

CONFIGURE_FAILURE = 'ConfigureFailure'
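
The two enums suggest a two-level dispatch: first on the top-level message type, then, for TurnInfo messages, on the event type. The sketch below uses local copies of the enums so that it runs without pipecat installed, and it assumes the JSON keys are named "type" and "event", which this page does not confirm; classify is a hypothetical helper.

```python
import json
from enum import Enum
from typing import Optional, Tuple


class FluxMessageType(str, Enum):
    RECEIVE_CONNECTED = "Connected"
    RECEIVE_FATAL_ERROR = "Error"
    TURN_INFO = "TurnInfo"
    CONFIGURE_SUCCESS = "ConfigureSuccess"
    CONFIGURE_FAILURE = "ConfigureFailure"


class FluxEventType(str, Enum):
    START_OF_TURN = "StartOfTurn"
    TURN_RESUMED = "TurnResumed"
    END_OF_TURN = "EndOfTurn"
    EAGER_END_OF_TURN = "EagerEndOfTurn"
    UPDATE = "Update"


def classify(raw: str) -> Tuple[FluxMessageType, Optional[FluxEventType]]:
    """Return (message type, event type or None) for one JSON text message."""
    data = json.loads(raw)
    # Enum lookup by value raises ValueError for unknown types.
    msg_type = FluxMessageType(data["type"])  # assumed key name
    event: Optional[FluxEventType] = None
    if msg_type is FluxMessageType.TURN_INFO:
        event = FluxEventType(data["event"])  # assumed key name
    return msg_type, event


turn = classify('{"type": "TurnInfo", "event": "EndOfTurn", "transcript": "hi"}')
connected = classify('{"type": "Connected"}')
print(turn, connected)
```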