stt
Deepgram Flux speech-to-text service implementation (WebSocket transport).
- class pipecat.services.deepgram.flux.stt.DeepgramFluxSTTService(*, api_key: str, url: str = 'wss://api.deepgram.com/v2/listen', sample_rate: int | None = None, mip_opt_out: bool | None = None, model: str | None = None, flux_encoding: str = 'linear16', tag: list | None = None, params: InputParams | None = None, should_interrupt: bool = True, settings: DeepgramFluxSTTSettings | None = None, **kwargs)
Bases: DeepgramFluxSTTBase, WebsocketService
Deepgram Flux speech-to-text service.
Provides real-time speech recognition using Deepgram’s WebSocket API with Flux capabilities. Supports configurable models, VAD events, and various audio processing options including advanced turn detection and EagerEndOfTurn events for improved conversational AI performance.
For multilingual use, set model="flux-general-multi" and pass language_hints to bias detection toward specific languages. Hints can be updated mid-stream via STTUpdateSettingsFrame (e.g. to implement a detect-then-lock flow). TranscriptionFrame.language reflects whichever language Flux detected for each turn.
Event handlers available (in addition to base events):
on_start_of_turn(service, transcript): Deepgram detected start of speech
on_end_of_turn(service, transcript): Deepgram detected end of turn (EOT)
on_eager_end_of_turn(service, transcript): Deepgram predicted end of turn (EagerEOT)
on_turn_resumed(service): User resumed speaking after EagerEOT
Example:
@stt.event_handler("on_end_of_turn")
async def on_end_of_turn(service, transcript):
    ...
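The four turn events listed above can be combined to manage a speculative reply: start preparing a response on EagerEndOfTurn, discard it on TurnResumed, and commit on EndOfTurn. The following is a minimal sketch of that bookkeeping, not part of pipecat. The TurnTracker class and its attributes are hypothetical, and the handlers are called directly here to simulate the event order instead of being registered on a real service.

```python
import asyncio


class TurnTracker:
    """Hypothetical helper that tracks whether an early (eager) reply is still valid."""

    def __init__(self):
        self.speculative_text = None  # transcript seen at EagerEndOfTurn, if any
        self.final_turns = []         # transcripts confirmed by EndOfTurn
        self.cancelled = 0            # speculative replies that were discarded

    async def on_start_of_turn(self, service, transcript):
        # A new turn begins, so nothing speculative is pending yet.
        self.speculative_text = None

    async def on_eager_end_of_turn(self, service, transcript):
        # Begin preparing a reply early; it may still be thrown away.
        self.speculative_text = transcript

    async def on_turn_resumed(self, service):
        # The user kept talking, so the early reply is stale.
        if self.speculative_text is not None:
            self.cancelled += 1
        self.speculative_text = None

    async def on_end_of_turn(self, service, transcript):
        # The turn is final; record the confirmed transcript.
        self.final_turns.append(transcript)
        self.speculative_text = None


async def simulate():
    tracker = TurnTracker()
    service = object()  # stand-in for the STT service instance
    await tracker.on_start_of_turn(service, "")
    await tracker.on_eager_end_of_turn(service, "book a table")
    await tracker.on_turn_resumed(service)
    await tracker.on_end_of_turn(service, "book a table for two")
    return tracker


tracker = asyncio.run(simulate())
print(tracker.final_turns, tracker.cancelled)
```

In a real pipeline, each method would presumably be registered with the matching @stt.event_handler("...") decorator shown in the example above.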
- Settings
alias of DeepgramFluxSTTSettings
- class InputParams(*, eager_eot_threshold: float | None = None, eot_threshold: float | None = None, eot_timeout_ms: int | None = None, keyterm: list = [], mip_opt_out: bool | None = None, tag: list = [], min_confidence: float | None = None)
Bases: BaseModel
Configuration parameters for Deepgram Flux API.
Deprecated since version 0.0.105: Use settings=DeepgramFluxSTTService.Settings(...) instead.
- Parameters:
eager_eot_threshold – Optional. EagerEndOfTurn and TurnResumed events are off by default; enable them by setting eager_eot_threshold to a valid value. Lower values fire EagerEndOfTurn more aggressively (faster response, more LLM calls). Higher values fire it more conservatively (slower response, fewer LLM calls).
eot_threshold – Optional. End-of-turn confidence required to finish a turn (default 0.7). Lower values = turns end sooner (more interruptions, faster responses). Higher values = turns end later (fewer interruptions, more complete utterances).
eot_timeout_ms – Optional. Time in milliseconds after speech to finish a turn regardless of EOT confidence (default 5000).
keyterm – List of keyterms to boost recognition accuracy for specialized terminology.
mip_opt_out – Optional. Opts out requests from the Deepgram Model Improvement Program (default False).
tag – List of tags to label requests for identification during usage reporting.
min_confidence – Optional. Minimum confidence required to create a TranscriptionFrame.
- eager_eot_threshold: float | None
- eot_threshold: float | None
- eot_timeout_ms: int | None
- keyterm: list
- mip_opt_out: bool | None
- tag: list
- min_confidence: float | None
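To make the interaction between eot_threshold, eager_eot_threshold, and eot_timeout_ms concrete, here is a toy decision rule based only on the parameter descriptions above. It is an illustration under stated assumptions, not Deepgram's actual turn-detection algorithm: the classify_turn function, its inputs (a confidence score and a silence duration), and the order of the checks are all hypothetical. Only the defaults of 0.7 and 5000 come from this page.

```python
def classify_turn(eot_confidence, silence_ms, *, eot_threshold=0.7,
                  eager_eot_threshold=None, eot_timeout_ms=5000):
    """Toy model of the documented thresholds; NOT Deepgram's real algorithm."""
    # The timeout finishes the turn regardless of EOT confidence.
    if silence_ms >= eot_timeout_ms:
        return "EndOfTurn"
    # Enough confidence finishes the turn outright.
    if eot_confidence >= eot_threshold:
        return "EndOfTurn"
    # Eager events exist only when eager_eot_threshold is set.
    if eager_eot_threshold is not None and eot_confidence >= eager_eot_threshold:
        return "EagerEndOfTurn"
    return None


# Same confidence, different outcome depending on whether eager mode is enabled.
print(classify_turn(0.5, 200))
print(classify_turn(0.5, 200, eager_eot_threshold=0.4))
```

Under this model, lowering eager_eot_threshold widens the band of confidence values that produce an early event, which matches the "more aggressive, more LLM calls" trade-off described above.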
- __init__(*, api_key: str, url: str = 'wss://api.deepgram.com/v2/listen', sample_rate: int | None = None, mip_opt_out: bool | None = None, model: str | None = None, flux_encoding: str = 'linear16', tag: list | None = None, params: InputParams | None = None, should_interrupt: bool = True, settings: DeepgramFluxSTTSettings | None = None, **kwargs)
Initialize the Deepgram Flux STT service.
- Parameters:
api_key – Deepgram API key for authentication. Required for API access.
url – WebSocket URL for the Deepgram Flux API. Defaults to wss://api.deepgram.com/v2/listen.
sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate.
mip_opt_out – Opt out of the Deepgram Model Improvement Program.
model – Deepgram Flux model to use for transcription.
Deprecated since version 0.0.105: Use settings=DeepgramFluxSTTService.Settings(model=...) instead.
flux_encoding – Audio encoding format required by Flux API. Must be “linear16”. Raw signed little-endian 16-bit PCM encoding.
tag – Tags to label requests for identification during usage reporting.
params – InputParams instance containing detailed API configuration options.
Deprecated since version 0.0.105: Use settings=DeepgramFluxSTTService.Settings(...) instead.
should_interrupt – Whether the bot should be interrupted when Flux detects that the user is speaking.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to the parent classes.
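The precedence rule for settings over deprecated parameters can be pictured as a merge in which only explicitly given settings fields override older values. The sketch below illustrates that idea with a sentinel; it is not pipecat's implementation. The names SketchSettings, resolve, and the local NOT_GIVEN object are hypothetical stand-ins.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


class _NotGiven:
    """Local sentinel meaning 'the caller did not set this field'."""

    def __repr__(self):
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class SketchSettings:
    # Hypothetical subset of fields, all unset by default.
    model: Any = NOT_GIVEN
    eot_threshold: Any = NOT_GIVEN
    eager_eot_threshold: Any = NOT_GIVEN


def resolve(deprecated: dict, settings: Optional[SketchSettings]) -> dict:
    """Start from deprecated values, then let given settings fields win."""
    resolved = dict(deprecated)
    if settings is not None:
        for field in fields(settings):
            value = getattr(settings, field.name)
            # Only NOT_GIVEN fields are skipped in this sketch.
            if value is not NOT_GIVEN:
                resolved[field.name] = value
    return resolved


merged = resolve(
    {"model": "flux-general-en", "eot_threshold": 0.7},
    SketchSettings(eot_threshold=0.8),
)
print(merged)
```

Using a sentinel instead of None keeps "not specified" distinct from "explicitly set to None", which is presumably why the settings fields on this page are typed with _NotGiven.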
Examples
Basic usage with default parameters:
from pipecat.services.deepgram.flux.stt import DeepgramFluxSTTService
stt = DeepgramFluxSTTService(api_key="your-api-key")
Advanced usage with custom parameters:
# tag is a constructor argument; DeepgramFluxSTTSettings does not list a tag field.
stt = DeepgramFluxSTTService(
    api_key="your-api-key",
    tag=["production", "voice-agent"],
    settings=DeepgramFluxSTTService.Settings(
        model="flux-general-en",
        eager_eot_threshold=0.5,
        eot_threshold=0.8,
        keyterm=["AI", "machine learning", "neural network"],
    ),
)
Multilingual usage with language hints:
from pipecat.transcriptions.language import Language
stt = DeepgramFluxSTTService(
    api_key="your-api-key",
    settings=DeepgramFluxSTTService.Settings(
        model="flux-general-multi",
        language_hints=[Language.EN, Language.ES],
    ),
)
- async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None]
Send audio data to Deepgram Flux for transcription.
Transmits raw audio bytes to the Deepgram Flux API for real-time speech recognition. Transcription results are received asynchronously through WebSocket callbacks and processed in the background.
- Parameters:
audio – Raw audio bytes in linear16 format (signed little-endian 16-bit PCM).
- Yields:
Frame – None (transcription results are delivered via WebSocket callbacks rather than as return values from this method).
- Raises:
Exception – If the WebSocket connection is not established or if there are issues sending the audio data.
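Since run_stt expects linear16 audio (signed little-endian 16-bit PCM), it can help to see how such bytes are produced and sized. The sketch below uses only the standard library; the helper names and the 80 ms chunk duration are illustrative assumptions, not values taken from this page.

```python
import struct


def float_to_linear16(samples):
    """Convert floats in [-1.0, 1.0] to signed little-endian 16-bit PCM bytes."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))


def chunk_size_bytes(sample_rate, chunk_ms, channels=1):
    """Bytes needed for chunk_ms of audio at 2 bytes per sample."""
    return sample_rate * chunk_ms // 1000 * channels * 2


def iter_chunks(pcm, size):
    """Yield consecutive slices of at most `size` bytes."""
    for start in range(0, len(pcm), size):
        yield pcm[start:start + size]


# 16 kHz mono audio split into 80 ms chunks (an illustrative choice).
size = chunk_size_bytes(16000, 80)
silence = float_to_linear16([0.0] * 3000)
chunk_lengths = [len(chunk) for chunk in iter_chunks(silence, size)]
print(size, chunk_lengths)
```

Each chunk produced this way is a bytes object of the kind the audio parameter describes.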
- class pipecat.services.deepgram.flux.stt.DeepgramFluxSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>, eager_eot_threshold: float | None | _NotGiven = <factory>, eot_threshold: float | None | _NotGiven = <factory>, eot_timeout_ms: int | None | _NotGiven = <factory>, keyterm: list | _NotGiven = <factory>, min_confidence: float | None | _NotGiven = <factory>, language_hints: list[Language] | None | _NotGiven = <factory>)
Bases: STTSettings
Settings for DeepgramFluxSTTService.
- Parameters:
eager_eot_threshold – EagerEndOfTurn/TurnResumed threshold. Off by default. Lower values = more aggressive (faster response, more LLM calls). Higher values = more conservative (slower response, fewer LLM calls).
eot_threshold – End-of-turn confidence required to finish a turn (default 0.7).
eot_timeout_ms – Time in ms after speech to finish a turn regardless of EOT confidence (default 5000).
keyterm – Keyterms to boost recognition accuracy for specialized terminology.
min_confidence – Minimum confidence required to create a TranscriptionFrame.
language_hints – Languages to bias transcription toward. Only honored by the flux-general-multi model. An empty list clears any active hints; None/NOT_GIVEN means no hints (auto-detect). Can be updated mid-stream via STTUpdateSettingsFrame.
- eager_eot_threshold: float | None | _NotGiven
- eot_threshold: float | None | _NotGiven
- eot_timeout_ms: int | None | _NotGiven
- keyterm: list | _NotGiven
- min_confidence: float | None | _NotGiven
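The detect-then-lock flow mentioned for language_hints can be sketched as a small state machine: start with no hints, watch the detected language of each turn, and lock onto a language once it has been seen in enough consecutive turns. The LanguageLock class and the required_streak parameter are hypothetical, and plain strings stand in for Language values so the sketch runs without pipecat.

```python
class LanguageLock:
    """Hypothetical helper: lock hints after N consecutive turns in one language."""

    def __init__(self, required_streak=2):
        self.required_streak = required_streak
        self.last = None     # language detected on the previous turn
        self.streak = 0      # consecutive turns with that language
        self.locked = None   # language we have locked onto, if any

    def observe(self, detected_language):
        """Feed one turn's detected language; return new hints or None."""
        if self.locked is not None or detected_language is None:
            return None
        if detected_language == self.last:
            self.streak += 1
        else:
            self.last, self.streak = detected_language, 1
        if self.streak >= self.required_streak:
            self.locked = detected_language
            return [detected_language]
        return None


lock = LanguageLock()
results = [lock.observe(lang) for lang in ["es", "en", "en", "es"]]
print(results, lock.locked)
```

A non-None return value is the list one would then send as the new language_hints via STTUpdateSettingsFrame; constructing that frame is outside the scope of this sketch.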
- class pipecat.services.deepgram.flux.stt.FluxEventType(*values)
Bases: StrEnum
Deepgram Flux TurnInfo event types.
These events are contained within TurnInfo messages and indicate different stages of speech processing and turn detection.
- START_OF_TURN = 'StartOfTurn'
- TURN_RESUMED = 'TurnResumed'
- END_OF_TURN = 'EndOfTurn'
- EAGER_END_OF_TURN = 'EagerEndOfTurn'
- UPDATE = 'Update'
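The event values above line up with the handler names documented on the service class. The sketch below mirrors the enum locally (so it runs without pipecat) and maps each event string to a handler name. The FluxEvent class, the HANDLER_FOR_EVENT table, and the choice to give Update no handler are assumptions drawn from the four handlers this page lists, not a description of pipecat's internal dispatch.

```python
from enum import Enum


class FluxEvent(str, Enum):
    """Local mirror of the FluxEventType values, for illustration only."""

    START_OF_TURN = "StartOfTurn"
    TURN_RESUMED = "TurnResumed"
    END_OF_TURN = "EndOfTurn"
    EAGER_END_OF_TURN = "EagerEndOfTurn"
    UPDATE = "Update"


# Assumed mapping; Update has no documented handler, so it is left out.
HANDLER_FOR_EVENT = {
    FluxEvent.START_OF_TURN: "on_start_of_turn",
    FluxEvent.TURN_RESUMED: "on_turn_resumed",
    FluxEvent.END_OF_TURN: "on_end_of_turn",
    FluxEvent.EAGER_END_OF_TURN: "on_eager_end_of_turn",
}


def handler_name(raw_event):
    """Return the handler name for an event string, or None if there is none."""
    try:
        event = FluxEvent(raw_event)
    except ValueError:
        return None  # unknown event string
    return HANDLER_FOR_EVENT.get(event)


print(handler_name("EndOfTurn"), handler_name("Update"))
```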
- class pipecat.services.deepgram.flux.stt.FluxMessageType(*values)
Bases: StrEnum
Deepgram Flux WebSocket message types.
These are the top-level message types that can be received from the Deepgram Flux WebSocket connection.
- RECEIVE_CONNECTED = 'Connected'
- RECEIVE_FATAL_ERROR = 'Error'
- TURN_INFO = 'TurnInfo'
- CONFIGURE_SUCCESS = 'ConfigureSuccess'
- CONFIGURE_FAILURE = 'ConfigureFailure'
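A receiver typically routes each incoming WebSocket message by its top-level type before looking at any turn event. The following sketch shows one way to do that with a local mirror of the values above. The JSON key names "type" and "event", the MessageType class, and the route function with its return labels are assumptions made for illustration; check the Deepgram Flux message schema before relying on them.

```python
import json
from enum import Enum


class MessageType(str, Enum):
    """Local mirror of the FluxMessageType values, for illustration only."""

    RECEIVE_CONNECTED = "Connected"
    RECEIVE_FATAL_ERROR = "Error"
    TURN_INFO = "TurnInfo"
    CONFIGURE_SUCCESS = "ConfigureSuccess"
    CONFIGURE_FAILURE = "ConfigureFailure"


def route(raw):
    """Classify a raw JSON message as ('turn' | 'error' | 'ok' | 'ignored', detail)."""
    message = json.loads(raw)
    try:
        kind = MessageType(message.get("type"))  # assumed key name
    except ValueError:
        return ("ignored", message.get("type"))
    if kind is MessageType.TURN_INFO:
        return ("turn", message.get("event"))    # assumed key name
    if kind in (MessageType.RECEIVE_FATAL_ERROR, MessageType.CONFIGURE_FAILURE):
        return ("error", kind.value)
    return ("ok", kind.value)


print(route('{"type": "TurnInfo", "event": "EndOfTurn"}'))
```

Treating ConfigureFailure as an error alongside Error is a design choice of this sketch; a real client might instead retry or fall back to the previous configuration.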