tts
Mistral text-to-speech service implementation.
This module provides integration with Mistral’s Voxtral TTS API for generating speech from text input using HTTP streaming with Server-Sent Events.
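As a rough illustration of the Server-Sent Events framing mentioned above, the `data:` payloads of an event stream can be pulled out as shown below. This is a generic sketch that follows the SSE framing rules. It is not the service's actual parser and makes no assumption about the JSON schema of Mistral's events. `iter_sse_data` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event (hypothetical helper).

    "data:" lines are accumulated (joined with newlines) until a blank line
    dispatches the event. Comment lines (starting with ":") and other fields
    such as "event:" or "id:" are ignored in this sketch.
    """
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the accumulated event, if any.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # SSE comment, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            buffer.append(value)
    # An unterminated event at end of stream is discarded, as SSE specifies.
```

Each yielded string would then be decoded (for example with `json.loads`) according to whatever event format the API actually uses.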
- class pipecat.services.mistral.tts.MistralTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>)
Bases: TTSSettings
Settings for MistralTTSService.
- Parameters:
model – TTS model identifier.
voice – Voice identifier.
language – Language for speech synthesis.
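The `_NotGiven` type in the signature suggests that each field defaults to a "not given" sentinel rather than `None`, so a partial settings update can tell "omitted" apart from "explicitly set to None". The sketch below illustrates that pattern under this assumption. `SketchTTSSettings`, `NOT_GIVEN`, and `apply_update` are hypothetical names, `extra` is omitted for brevity, and pipecat's real merge logic may differ.

```python
from dataclasses import dataclass, fields
from typing import Any


class _NotGivenType:
    """Sentinel meaning 'this field was not supplied' (distinct from None)."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGivenType()


@dataclass
class SketchTTSSettings:
    # Hypothetical stand-in for MistralTTSSettings; field names mirror the docs.
    model: Any = NOT_GIVEN
    voice: Any = NOT_GIVEN
    language: Any = NOT_GIVEN


def apply_update(current: SketchTTSSettings, update: SketchTTSSettings) -> SketchTTSSettings:
    """Return a copy of `current` with every field that `update` explicitly sets."""
    merged = {}
    for f in fields(current):
        new_value = getattr(update, f.name)
        # Keep the current value only when the update left the field unset;
        # None counts as a real value and overrides the current one.
        merged[f.name] = getattr(current, f.name) if new_value is NOT_GIVEN else new_value
    return SketchTTSSettings(**merged)
```

For example, `apply_update(base, SketchTTSSettings(voice="v2"))` changes only the voice and leaves the model and language as they were.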
- class pipecat.services.mistral.tts.MistralTTSService(*, api_key: str | None = None, sample_rate: int | None = None, settings: MistralTTSSettings | None = None, **kwargs)
Bases: TTSService
Mistral Text-to-Speech service using the Voxtral TTS API.
This service uses Mistral’s streaming TTS API to generate PCM audio at 24 kHz. The API returns base64-encoded float32 PCM chunks via Server-Sent Events, which the service converts to int16 PCM for the Pipecat pipeline.
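The float32-to-int16 step described above can be sketched with the standard library alone. The exact wire format is an assumption here: the sketch treats samples as little-endian float32 in the nominal range [-1.0, 1.0] and scales by 32767 with clipping, which is a common convention. The service's real conversion may scale or clip differently. `float32_b64_to_int16` is a hypothetical name.

```python
import array
import base64
import sys


def float32_b64_to_int16(b64_chunk: str) -> bytes:
    """Decode a base64 float32 PCM chunk and return 16-bit PCM bytes.

    Assumes little-endian float32 samples nominally in [-1.0, 1.0].
    """
    samples = array.array("f")
    samples.frombytes(base64.b64decode(b64_chunk))
    if sys.byteorder != "little":
        samples.byteswap()  # interpret the payload as little-endian
    out = array.array("h")
    for s in samples:
        s = max(-1.0, min(1.0, s))   # clip so the scaled value fits in int16
        out.append(round(s * 32767))  # scale to the int16 range
    if sys.byteorder != "little":
        out.byteswap()  # emit little-endian int16
    return out.tobytes()
```

A production implementation would more likely vectorize this (for example with NumPy), but the arithmetic is the same: clip, scale, and round each sample.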
- Settings
alias of MistralTTSSettings
- MISTRAL_SAMPLE_RATE = 24000
- __init__(*, api_key: str | None = None, sample_rate: int | None = None, settings: MistralTTSSettings | None = None, **kwargs)
Initialize Mistral TTS service.
- Parameters:
api_key – Mistral API key for authentication.
sample_rate – Output audio sample rate in Hz. When a rate other than Mistral’s native 24 kHz is requested, the audio is resampled.
settings – Runtime-updatable settings.
**kwargs – Additional keyword arguments passed to TTSService.
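To make the resampling behavior of `sample_rate` concrete, the sketch below converts mono int16 samples from 24 kHz to another rate by linear interpolation. This is an illustration only: Pipecat ships its own resampler, and plain linear interpolation without a low-pass filter will alias when downsampling. `resample_linear` is a hypothetical helper.

```python
from array import array

MISTRAL_SAMPLE_RATE = 24000  # native output rate documented for this service


def resample_linear(pcm: array, src_rate: int, dst_rate: int) -> array:
    """Resample mono int16 samples by linear interpolation (illustrative only)."""
    if src_rate == dst_rate or len(pcm) == 0:
        return array("h", pcm)
    n_out = len(pcm) * dst_rate // src_rate
    step = src_rate / dst_rate  # source samples advanced per output sample
    out = array("h")
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        # Hold the last sample when interpolating past the end of the input.
        nxt = pcm[j + 1] if j + 1 < len(pcm) else pcm[j]
        out.append(round(pcm[j] + (nxt - pcm[j]) * frac))
    return out
```

For example, requesting `sample_rate=16000` would map every 3 source samples to 2 output samples, since 16000 / 24000 = 2/3.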
- can_generate_metrics() → bool
Check if this service can generate processing metrics.
- Returns:
True, as the Mistral TTS service supports metrics generation.
- async run_tts(text: str, context_id: str) → AsyncGenerator[Frame, None]
Generate speech from text using Mistral’s TTS API.
- Parameters:
text – The text to synthesize into speech.
context_id – The context ID for tracking audio frames.
- Yields:
Frame – Audio frames containing the synthesized speech data.
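The shape of `run_tts` as an async generator can be sketched as follows: it consumes decoded PCM chunks and yields one frame per chunk, each tagged with the `context_id`. Everything here is hypothetical (`AudioFrameSketch`, `run_tts_sketch`, the fake chunk source), and the mono channel count is an assumption. The real method may also yield other frame types, such as start, stop, or error frames, which this sketch leaves out.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class AudioFrameSketch:
    """Hypothetical stand-in for a Pipecat audio frame."""
    audio: bytes
    sample_rate: int
    num_channels: int
    context_id: str


async def run_tts_sketch(
    chunks: AsyncIterator[bytes], context_id: str, sample_rate: int = 24000
) -> AsyncIterator[AudioFrameSketch]:
    """Yield one mono audio frame per non-empty int16 PCM chunk."""
    async for pcm in chunks:
        if pcm:  # skip empty chunks rather than emitting empty frames
            yield AudioFrameSketch(pcm, sample_rate, 1, context_id)


async def _fake_chunks() -> AsyncIterator[bytes]:
    # Hypothetical source standing in for the decoded SSE audio stream.
    for chunk in (b"\x00\x00\x01\x00", b"", b"\xff\x7f"):
        yield chunk
        await asyncio.sleep(0)


async def collect(context_id: str) -> list[AudioFrameSketch]:
    return [frame async for frame in run_tts_sketch(_fake_chunks(), context_id)]
```

Running `asyncio.run(collect("ctx-1"))` gathers the frames, which makes it easy to check that every frame carries the same context ID.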