stt

Deepgram Flux speech-to-text service implementation (WebSocket transport).

class pipecat.services.deepgram.flux.stt.DeepgramFluxSTTService(*, api_key: str, url: str = 'wss://api.deepgram.com/v2/listen', sample_rate: int | None = None, mip_opt_out: bool | None = None, model: str | None = None, flux_encoding: str = 'linear16', tag: list | None = None, params: InputParams | None = None, should_interrupt: bool = True, settings: DeepgramFluxSTTSettings | None = None, **kwargs)[source]

Bases: DeepgramFluxSTTBase, WebsocketService

Deepgram Flux speech-to-text service.

Provides real-time speech recognition using Deepgram’s WebSocket API with Flux capabilities. Supports configurable models, VAD events, and various audio processing options including advanced turn detection and EagerEndOfTurn events for improved conversational AI performance.

For multilingual use, set model="flux-general-multi" and pass language_hints to bias detection toward specific languages. Hints can be updated mid-stream via STTUpdateSettingsFrame (e.g. to implement a detect-then-lock flow). TranscriptionFrame.language reflects whichever language Flux detected for each turn.
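The detect-then-lock flow mentioned above can be sketched as a small piece of state that watches the language reported for each turn and, once one language has been seen often enough, produces the hint list to lock onto. This is only an illustration: `DetectThenLock` and `required_turns` are hypothetical names that are not part of pipecat, and actually pushing the new hints through STTUpdateSettingsFrame is left out.

```python
from collections import Counter
from typing import List, Optional


class DetectThenLock:
    """Hypothetical helper: lock onto a language after it is seen often enough."""

    def __init__(self, required_turns: int = 3):
        self.required_turns = required_turns
        self.counts: Counter = Counter()
        self.locked: Optional[str] = None

    def observe(self, detected_language: Optional[str]) -> Optional[List[str]]:
        """Record the language detected for one turn.

        Returns the new hint list the first time the lock is reached,
        and None on every other call.
        """
        if self.locked is not None or detected_language is None:
            return None
        self.counts[detected_language] += 1
        if self.counts[detected_language] >= self.required_turns:
            self.locked = detected_language
            # The caller would send this list as language_hints.
            return [detected_language]
        return None
```

In a pipeline, `observe` would be fed the value of TranscriptionFrame.language for each turn, and a non-None result would be sent as the new language_hints.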

Event handlers available (in addition to base events):

  • on_start_of_turn(service, transcript): Deepgram detected start of speech

  • on_end_of_turn(service, transcript): Deepgram detected end of turn (EOT)

  • on_eager_end_of_turn(service, transcript): Deepgram predicted end of turn (EagerEOT)

  • on_turn_resumed(service): User resumed speaking after EagerEOT

Example:

@stt.event_handler("on_end_of_turn")
async def on_end_of_turn(service, transcript):
    ...
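
One possible way to use the eager events together is to start generating a reply on EagerEOT, drop it if the user resumes, and keep it if the final EOT confirms the same transcript. The sketch below assumes the handler signatures listed above; `SpeculativeResponder`, `generate`, and `replies` are hypothetical names, and the class is not registered with a real service here.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


class SpeculativeResponder:
    """Hypothetical helper: speculative work on EagerEOT, cancelled on TurnResumed."""

    def __init__(self, generate: Callable[[str], Awaitable[str]]):
        self._generate = generate  # async callable: transcript -> reply
        self._task: Optional[asyncio.Task] = None
        self._transcript: Optional[str] = None
        self.replies: List[str] = []

    async def on_eager_end_of_turn(self, service, transcript):
        # Start generating early; this may be thrown away later.
        await self._cancel()
        self._transcript = transcript
        self._task = asyncio.create_task(self._generate(transcript))

    async def on_turn_resumed(self, service):
        # The user kept talking, so the speculative reply is stale.
        await self._cancel()

    async def on_end_of_turn(self, service, transcript):
        # Reuse the speculative task only if it was built from the same text.
        if self._task is None or self._transcript != transcript:
            await self._cancel()
            self._transcript = transcript
            self._task = asyncio.create_task(self._generate(transcript))
        task, self._task = self._task, None
        self.replies.append(await task)

    async def _cancel(self):
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except (asyncio.CancelledError, Exception):
                pass  # discard the outcome of abandoned work
            self._task = None
```

Each method could then be attached with `stt.event_handler(...)` in the same way as the example above.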
Settings

alias of DeepgramFluxSTTSettings

class InputParams(*, eager_eot_threshold: float | None = None, eot_threshold: float | None = None, eot_timeout_ms: int | None = None, keyterm: list = [], mip_opt_out: bool | None = None, tag: list = [], min_confidence: float | None = None)[source]

Bases: BaseModel

Configuration parameters for Deepgram Flux API.

Deprecated since version 0.0.105: Use settings=DeepgramFluxSTTService.Settings(...) instead.

Parameters:
  • eager_eot_threshold – Optional. EagerEndOfTurn/TurnResumed are off by default. You can turn them on by setting eager_eot_threshold to a valid value. Lower values = more aggressive EagerEndOfTurning (faster response, more LLM calls). Higher values = more conservative EagerEndOfTurning (slower response, fewer LLM calls).

  • eot_threshold – Optional. End-of-turn confidence required to finish a turn (default 0.7). Lower values = turns end sooner (more interruptions, faster responses). Higher values = turns end later (fewer interruptions, more complete utterances).

  • eot_timeout_ms – Optional. Time in milliseconds after speech to finish a turn regardless of EOT confidence (default 5000).

  • keyterm – List of keyterms to boost recognition accuracy for specialized terminology.

  • mip_opt_out – Optional. Opts out requests from the Deepgram Model Improvement Program (default False).

  • tag – List of tags to label requests for identification during usage reporting.

  • min_confidence – Optional. Minimum confidence required to create a TranscriptionFrame.

eager_eot_threshold: float | None
eot_threshold: float | None
eot_timeout_ms: int | None
keyterm: list
mip_opt_out: bool | None
tag: list
min_confidence: float | None
__init__(*, api_key: str, url: str = 'wss://api.deepgram.com/v2/listen', sample_rate: int | None = None, mip_opt_out: bool | None = None, model: str | None = None, flux_encoding: str = 'linear16', tag: list | None = None, params: InputParams | None = None, should_interrupt: bool = True, settings: DeepgramFluxSTTSettings | None = None, **kwargs)[source]

Initialize the Deepgram Flux STT service.

Parameters:
  • api_key – Deepgram API key for authentication. Required for API access.

  • url – WebSocket URL for the Deepgram Flux API. Defaults to wss://api.deepgram.com/v2/listen.

  • sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate.

  • mip_opt_out – Opt out of the Deepgram Model Improvement Program.

  • model

    Deepgram Flux model to use for transcription.

    Deprecated since version 0.0.105: Use settings=DeepgramFluxSTTService.Settings(model=...) instead.

  • flux_encoding – Audio encoding format required by the Flux API. Must be “linear16” (raw signed little-endian 16-bit PCM).

  • tag – Tags to label requests for identification during usage reporting.

  • params

    InputParams instance containing detailed API configuration options.

    Deprecated since version 0.0.105: Use settings=DeepgramFluxSTTService.Settings(...) instead.

  • should_interrupt – Determine whether the bot should be interrupted when Flux detects that the user is speaking.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to the parent classes.

Examples

Basic usage with default parameters:

stt = DeepgramFluxSTTService(api_key="your-api-key")

Advanced usage with custom parameters:

stt = DeepgramFluxSTTService(
    api_key="your-api-key",
    settings=DeepgramFluxSTTService.Settings(
        model="flux-general-en",
        eager_eot_threshold=0.5,
        eot_threshold=0.8,
        keyterm=["AI", "machine learning", "neural network"],
        tag=["production", "voice-agent"],
    ),
)

Multilingual usage with language hints:

stt = DeepgramFluxSTTService(
    api_key="your-api-key",
    settings=DeepgramFluxSTTService.Settings(
        model="flux-general-multi",
        language_hints=[Language.EN, Language.ES],
    ),
)
async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None][source]

Send audio data to Deepgram Flux for transcription.

Transmits raw audio bytes to the Deepgram Flux API for real-time speech recognition. Transcription results are received asynchronously through WebSocket callbacks and processed in the background.

Parameters:

audio – Raw audio bytes in linear16 format (signed little-endian 16-bit PCM).

Yields:

Frame – None (transcription results are delivered via WebSocket callbacks rather than as return values from this method).

Raises:

Exception – If the WebSocket connection is not established or if there are issues sending the audio data.
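
As an illustration of the linear16 format expected by run_stt, the following sketch converts floating-point samples in the range [-1.0, 1.0] to signed little-endian 16-bit PCM bytes. `floats_to_linear16` is a hypothetical helper that is not part of pipecat, and scaling by 32767 is one common convention rather than a requirement stated here.

```python
import struct
from typing import Iterable


def floats_to_linear16(samples: Iterable[float]) -> bytes:
    """Pack float samples in [-1.0, 1.0] as signed little-endian 16-bit PCM."""
    out = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))  # guard against overshoot
        out += struct.pack("<h", int(round(clipped * 32767)))
    return bytes(out)
```

Each sample occupies two bytes, so a mono chunk of N samples yields 2 × N bytes.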

class pipecat.services.deepgram.flux.stt.DeepgramFluxSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>, eager_eot_threshold: float | None | _NotGiven = <factory>, eot_threshold: float | None | _NotGiven = <factory>, eot_timeout_ms: int | None | _NotGiven = <factory>, keyterm: list | _NotGiven = <factory>, min_confidence: float | None | _NotGiven = <factory>, language_hints: list[Language] | None | _NotGiven = <factory>)[source]

Bases: STTSettings

Settings for DeepgramFluxSTTService.

Parameters:
  • eager_eot_threshold – EagerEndOfTurn/TurnResumed threshold. Off by default. Lower values = more aggressive (faster response, more LLM calls). Higher values = more conservative (slower response, fewer LLM calls).

  • eot_threshold – End-of-turn confidence required to finish a turn (default 0.7).

  • eot_timeout_ms – Time in ms after speech to finish a turn regardless of EOT confidence (default 5000).

  • keyterm – Keyterms to boost recognition accuracy for specialized terminology.

  • min_confidence – Minimum confidence required to create a TranscriptionFrame.

  • language_hints – Languages to bias transcription toward. Only honored by the flux-general-multi model. An empty list clears any active hints; None/NOT_GIVEN means no hints (auto-detect). Can be updated mid-stream via STTUpdateSettingsFrame.
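
To make the interaction between the three turn-detection settings concrete, here is a toy model of how they could combine. It is one reading of the parameter descriptions above, not Deepgram's actual decision logic, which runs server-side. `classify_turn_state`, `eot_confidence`, and `silence_ms` are hypothetical names.

```python
from typing import Optional


def classify_turn_state(
    eot_confidence: float,
    silence_ms: int,
    eot_threshold: float = 0.7,
    eot_timeout_ms: int = 5000,
    eager_eot_threshold: Optional[float] = None,
) -> Optional[str]:
    """Toy model: which event, if any, the settings would allow right now."""
    # A turn ends on enough confidence, or on timeout regardless of confidence.
    if eot_confidence >= eot_threshold or silence_ms >= eot_timeout_ms:
        return "EndOfTurn"
    # Eager events are off unless a threshold is set; lower = more aggressive.
    if eager_eot_threshold is not None and eot_confidence >= eager_eot_threshold:
        return "EagerEndOfTurn"
    return None
```

Under this model, lowering eager_eot_threshold widens the confidence band in which an early, possibly retracted, end of turn is reported.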

eager_eot_threshold: float | None | _NotGiven
eot_threshold: float | None | _NotGiven
eot_timeout_ms: int | None | _NotGiven
keyterm: list | _NotGiven
min_confidence: float | None | _NotGiven
language_hints: list[Language] | None | _NotGiven
class pipecat.services.deepgram.flux.stt.FluxEventType(*values)[source]

Bases: StrEnum

Deepgram Flux TurnInfo event types.

These events are contained within TurnInfo messages and indicate different stages of speech processing and turn detection.

START_OF_TURN = 'StartOfTurn'
TURN_RESUMED = 'TurnResumed'
END_OF_TURN = 'EndOfTurn'
EAGER_END_OF_TURN = 'EagerEndOfTurn'
UPDATE = 'Update'
class pipecat.services.deepgram.flux.stt.FluxMessageType(*values)[source]

Bases: StrEnum

Deepgram Flux WebSocket message types.

These are the top-level message types that can be received from the Deepgram Flux WebSocket connection.

RECEIVE_CONNECTED = 'Connected'
RECEIVE_FATAL_ERROR = 'Error'
TURN_INFO = 'TurnInfo'
CONFIGURE_SUCCESS = 'ConfigureSuccess'
CONFIGURE_FAILURE = 'ConfigureFailure'
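
The two enums describe a two-level dispatch: first on the top-level message type, then, for TurnInfo messages, on the event. The sketch below mirrors a subset of the values locally so that it runs without pipecat. The JSON field names `type`, `event`, and `transcript` are assumptions about the message shape, and `MessageType`, `EventType`, and `route` are hypothetical stand-ins rather than the library's own code.

```python
import json
from enum import Enum
from typing import Optional, Tuple


class MessageType(str, Enum):  # local stand-in for FluxMessageType
    RECEIVE_CONNECTED = "Connected"
    RECEIVE_FATAL_ERROR = "Error"
    TURN_INFO = "TurnInfo"


class EventType(str, Enum):  # local stand-in for FluxEventType
    START_OF_TURN = "StartOfTurn"
    TURN_RESUMED = "TurnResumed"
    END_OF_TURN = "EndOfTurn"
    EAGER_END_OF_TURN = "EagerEndOfTurn"
    UPDATE = "Update"


def route(raw: str) -> Tuple[str, Optional[str]]:
    """Return (kind, transcript) for one raw WebSocket text message."""
    msg = json.loads(raw)
    try:
        mtype = MessageType(msg.get("type"))
    except ValueError:
        return ("unknown", None)
    if mtype is MessageType.TURN_INFO:
        try:
            event = EventType(msg.get("event"))
        except ValueError:
            return ("unknown", None)
        return (event.value, msg.get("transcript", ""))
    return (mtype.value, None)
```

Unrecognized types or events fall through to "unknown" instead of raising, so a newly added server message would not break the receive loop.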