stt
Mistral Speech-to-Text service implementation.
This module provides a real-time STT service that integrates with Mistral’s Voxtral Realtime transcription API using the Mistral SDK’s RealtimeConnection.
- class pipecat.services.mistral.stt.MistralSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>)
Bases: STTSettings
Settings for MistralSTTService.
- Parameters:
model – STT model identifier.
language – Language hint for transcription.
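As a rough sketch of how these settings might be built, the helper below collects only the fields the caller sets, so that omitted fields keep whatever defaults MistralSTTSettings defines. The helper names (`mistral_stt_settings_kwargs`, `make_mistral_stt_settings`) are hypothetical, and the model identifier is left to the caller because this page does not list valid values. The pipecat import is deferred so the helper can be exercised without the package installed.

```python
from typing import Any, Dict, Optional


def mistral_stt_settings_kwargs(
    model: Optional[str] = None,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for MistralSTTSettings (hypothetical helper).

    Fields left as None are omitted, so the settings class keeps its defaults.
    """
    kwargs: Dict[str, Any] = {}
    if model is not None:
        kwargs["model"] = model
    if language is not None:
        # The signature accepts a Language value or a plain string.
        kwargs["language"] = language
    return kwargs


def make_mistral_stt_settings(**fields: Any):
    """Build the settings object; requires pipecat with Mistral support."""
    # Deferred import so this module loads even when pipecat is absent.
    from pipecat.services.mistral.stt import MistralSTTSettings

    return MistralSTTSettings(**mistral_stt_settings_kwargs(**fields))
```

With pipecat installed, `make_mistral_stt_settings(language="en")` would yield an object suitable for the `settings` argument of MistralSTTService.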
- class pipecat.services.mistral.stt.MistralSTTService(*, api_key: str | None = None, base_url: str | None = None, sample_rate: int | None = None, target_streaming_delay_ms: int | None = None, ttfs_p99_latency: float | None = 1.0, settings: MistralSTTSettings | None = None, **kwargs)
Bases: STTService
Mistral Speech-to-Text service using the Voxtral Realtime API.
This service uses the Mistral SDK’s RealtimeConnection to stream audio and receive transcription events over WebSocket. It extends STTService directly (rather than WebsocketSTTService) because the SDK manages the WebSocket connection internally.
Event handlers available:
on_connected: Called when a transcription session is created.
on_disconnected: Called when the connection is closed.
on_connection_error: Called when a transcription error occurs.
Example:
@stt.event_handler("on_connected")
async def on_connected(stt):
    logger.info("Mistral STT connected")
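The same decorator pattern can cover all three events. The sketch below registers one handler per documented event on any object that exposes `event_handler()`. The handlers accept `*args` because the extra arguments each event passes (for example an error message) are not stated here and are treated as an assumption. `FakeSTT` is a hypothetical stand-in, not part of pipecat, included only so the sketch runs without an API key.

```python
import asyncio
from typing import Callable, Dict, List


def attach_connection_logging(stt, events: List[str]) -> None:
    """Register a handler for each documented connection event."""

    @stt.event_handler("on_connected")
    async def on_connected(service, *args):
        events.append("connected")

    @stt.event_handler("on_disconnected")
    async def on_disconnected(service, *args):
        events.append("disconnected")

    @stt.event_handler("on_connection_error")
    async def on_connection_error(service, *args):
        # Any extra arguments are accepted and ignored (assumption, see above).
        events.append("error")


class FakeSTT:
    """Hypothetical stand-in that mimics the event_handler decorator shape."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable] = {}

    def event_handler(self, name: str):
        def decorator(fn):
            self.handlers[name] = fn
            return fn

        return decorator


def demo() -> List[str]:
    """Fire two events by hand on the stand-in and return what was recorded."""
    events: List[str] = []
    fake = FakeSTT()
    attach_connection_logging(fake, events)
    asyncio.run(fake.handlers["on_connected"](fake))
    asyncio.run(fake.handlers["on_connection_error"](fake, "boom"))
    return events
```

In a real pipeline, `attach_connection_logging` would be called once on the MistralSTTService instance, replacing the list append with logging or metrics calls.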
- Settings
alias of MistralSTTSettings
- __init__(*, api_key: str | None = None, base_url: str | None = None, sample_rate: int | None = None, target_streaming_delay_ms: int | None = None, ttfs_p99_latency: float | None = 1.0, settings: MistralSTTSettings | None = None, **kwargs)
Initialize Mistral STT service.
- Parameters:
api_key – Mistral API key for authentication.
base_url – Custom API endpoint URL.
sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate.
target_streaming_delay_ms – Streaming delay for accuracy/latency tradeoff. Higher values may improve accuracy at the cost of latency.
ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment.
settings – Runtime-updatable settings.
**kwargs – Additional keyword arguments passed to STTService.
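One possible way to assemble these constructor arguments is sketched below. The environment variable names `MISTRAL_API_KEY` and `MISTRAL_BASE_URL` and both helper names are assumptions made for illustration. Values that are not set are omitted, so the service falls back to its documented defaults (`None` for most parameters, 1.0 for `ttfs_p99_latency`). The pipecat import is deferred so the argument builder can be tested on its own.

```python
import os
from typing import Any, Dict, Mapping, Optional


def mistral_stt_kwargs(
    env: Mapping[str, str] = os.environ,
    *,
    sample_rate: Optional[int] = None,
    target_streaming_delay_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect MistralSTTService keyword arguments (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}
    api_key = env.get("MISTRAL_API_KEY")  # variable name is an assumption
    if api_key:
        kwargs["api_key"] = api_key
    base_url = env.get("MISTRAL_BASE_URL")  # variable name is an assumption
    if base_url:
        kwargs["base_url"] = base_url
    if sample_rate is not None:
        # Leaving this out lets the service use the pipeline sample rate.
        kwargs["sample_rate"] = sample_rate
    if target_streaming_delay_ms is not None:
        if target_streaming_delay_ms < 0:
            raise ValueError("target_streaming_delay_ms must be non-negative")
        kwargs["target_streaming_delay_ms"] = target_streaming_delay_ms
    return kwargs


def create_mistral_stt(env: Mapping[str, str] = os.environ, **overrides: Any):
    """Instantiate the service; requires pipecat with Mistral support."""
    # Deferred import so this module loads even when pipecat is absent.
    from pipecat.services.mistral.stt import MistralSTTService

    return MistralSTTService(**mistral_stt_kwargs(env, **overrides))
```

Keeping argument assembly separate from construction makes the latency and accuracy tradeoff in `target_streaming_delay_ms` easy to vary per deployment without touching pipeline code.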
- can_generate_metrics() → bool
Check if the service can generate processing metrics.
- Returns:
True, indicating metrics are supported.
- async start(frame: StartFrame)
Start the STT service and establish connection.
- Parameters:
frame – Frame indicating service should start.
- async stop(frame: EndFrame)
Stop the STT service and close connection.
- Parameters:
frame – Frame indicating service should stop.
- async cancel(frame: CancelFrame)
Cancel the STT service and close connection.
- Parameters:
frame – Frame indicating service should be cancelled.
- async process_frame(frame: Frame, direction: FrameDirection)
Process incoming frames and handle speech events.
- Parameters:
frame – The frame to process.
direction – Direction of frame flow in the pipeline.