tts
MiniMax text-to-speech service implementation.
This module provides integration with MiniMax’s T2A (Text-to-Audio) API for streaming text-to-speech synthesis.
- pipecat.services.minimax.tts.language_to_minimax_language(language: Language) → str | None
Convert a Language enum to MiniMax language format.
- Parameters:
language – The Language enum value to convert.
- Returns:
The corresponding MiniMax language name, or None if not supported.
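The conversion is a lookup that falls back to None for unsupported languages. The sketch below illustrates that pattern only. The `Language` enum here is a small hypothetical stand-in for pipecat's enum, and the table entries are illustrative assumptions, not the module's actual mapping.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    """Hypothetical stand-in for pipecat's Language enum (tiny subset)."""

    EN = "en"
    ZH = "zh"
    JA = "ja"
    XX = "xx"  # placeholder for a language with no MiniMax equivalent


# Hypothetical lookup table; the real module defines its own entries.
_MINIMAX_LANGUAGES = {
    Language.EN: "English",
    Language.ZH: "Chinese",
    Language.JA: "Japanese",
}


def to_minimax_language(language: Language) -> Optional[str]:
    """Return the MiniMax language name, or None when the language is unsupported."""
    return _MINIMAX_LANGUAGES.get(language)
```

Returning None instead of raising lets the caller decide whether to fall back to a default language or log a warning.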
- class pipecat.services.minimax.tts.MiniMaxTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, speed: float | None | _NotGiven = <factory>, volume: float | None | _NotGiven = <factory>, pitch: int | None | _NotGiven = <factory>, emotion: str | None | _NotGiven = <factory>, text_normalization: bool | None | _NotGiven = <factory>, latex_read: bool | None | _NotGiven = <factory>, language_boost: str | None | _NotGiven = <factory>)
Bases: TTSSettings
Settings for MiniMaxHttpTTSService.
- Parameters:
speed – Speech speed (range: 0.5 to 2.0).
volume – Speech volume (range: 0 to 10).
pitch – Pitch adjustment (range: -12 to 12).
emotion – Emotional tone (options: “happy”, “sad”, “angry”, “fearful”, “disgusted”, “surprised”, “calm”, “fluent”).
text_normalization – Enable text normalization (Chinese/English).
latex_read – Enable LaTeX formula reading.
language_boost – Language boost string for multilingual support.
- speed: float | None | _NotGiven
- volume: float | None | _NotGiven
- pitch: int | None | _NotGiven
- emotion: str | None | _NotGiven
- text_normalization: bool | None | _NotGiven
- latex_read: bool | None | _NotGiven
- language_boost: str | None | _NotGiven
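The documented ranges and emotion options lend themselves to a client-side check before a request is sent. This is a minimal sketch, assuming a hypothetical `VoiceOptions` container and `validate` helper that are not part of pipecat. It treats every bound as inclusive, while the MiniMax API itself may be stricter at the edges (for example, at volume 0).

```python
from dataclasses import dataclass
from typing import List, Optional

# Documented ranges for MiniMaxTTSSettings; bounds are treated as inclusive here.
SPEED_RANGE = (0.5, 2.0)
VOLUME_RANGE = (0.0, 10.0)
PITCH_RANGE = (-12, 12)
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm", "fluent"}


@dataclass
class VoiceOptions:
    """Hypothetical container mirroring part of MiniMaxTTSSettings; None means unset."""

    speed: Optional[float] = None
    volume: Optional[float] = None
    pitch: Optional[int] = None
    emotion: Optional[str] = None


def _out_of_range(value, bounds) -> bool:
    """True when a value is set and falls outside the inclusive bounds."""
    return value is not None and not bounds[0] <= value <= bounds[1]


def validate(opts: VoiceOptions) -> List[str]:
    """Collect human-readable problems; an empty list means every set field is valid."""
    problems = []
    if _out_of_range(opts.speed, SPEED_RANGE):
        problems.append(f"speed {opts.speed} not in {SPEED_RANGE}")
    if _out_of_range(opts.volume, VOLUME_RANGE):
        problems.append(f"volume {opts.volume} not in {VOLUME_RANGE}")
    if _out_of_range(opts.pitch, PITCH_RANGE):
        problems.append(f"pitch {opts.pitch} not in {PITCH_RANGE}")
    if opts.emotion is not None and opts.emotion not in EMOTIONS:
        problems.append(f"unknown emotion {opts.emotion!r}")
    return problems
```

Collecting every problem, rather than raising on the first, reports all bad fields in one pass.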
- class pipecat.services.minimax.tts.MiniMaxHttpTTSService(*, api_key: str, base_url: str = 'https://api.minimax.io/v1/t2a_v2', group_id: str, model: str | None = None, voice_id: str | None = None, aiohttp_session: ClientSession, sample_rate: int | None = None, stream: bool = True, params: InputParams | None = None, settings: MiniMaxTTSSettings | None = None, **kwargs)
Bases: TTSService
Text-to-speech service using MiniMax’s T2A (Text-to-Audio) API.
Provides text-to-speech synthesis over MiniMax’s HTTP API, with real-time audio streaming and configurable voice parameters, emotions, and audio settings.
Platform documentation: https://platform.minimax.io/docs/api-reference/speech-t2a-http
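To show what an HTTP T2A call involves, here is a sketch that assembles the URL, headers, and JSON body for one streaming request using only the standard library. The payload shape (`voice_setting`, `audio_setting`, the `vol` key), the `GroupId` query parameter, and bearer-token auth are assumptions based on MiniMax's public API, not on this service's code. Check them against the platform documentation linked above. `build_t2a_request` is a hypothetical helper, and its `sample_rate` default is illustrative.

```python
import json
from typing import Dict, Optional, Tuple
from urllib.parse import urlencode

# Endpoints as listed under the base_url parameter of __init__.
ENDPOINTS = {
    "global": "https://api.minimax.io/v1/t2a_v2",
    "mainland_china": "https://api.minimaxi.chat/v1/t2a_v2",
    "us_west": "https://api-uw.minimax.io/v1/t2a_v2",
}


def build_t2a_request(
    api_key: str,
    group_id: str,
    text: str,
    region: str = "global",
    model: str = "speech-02-turbo",
    voice_id: str = "Calm_Woman",
    sample_rate: int = 24000,  # illustrative default, not taken from the docs
    emotion: Optional[str] = None,
) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, json_body) for one assumed streaming T2A request."""
    # Assumption: the Group ID is sent as a GroupId query parameter.
    url = f"{ENDPOINTS[region]}?{urlencode({'GroupId': group_id})}"
    headers = {
        "Authorization": f"Bearer {api_key}",  # assumed bearer-token auth
        "Content-Type": "application/json",
    }
    voice_setting: Dict[str, object] = {"voice_id": voice_id, "speed": 1.0, "vol": 1.0, "pitch": 0}
    if emotion is not None:
        voice_setting["emotion"] = emotion  # omitted entirely when unset
    body = {
        "model": model,
        "text": text,
        "stream": True,
        "voice_setting": voice_setting,
        "audio_setting": {"sample_rate": sample_rate, "format": "pcm", "channel": 1},
    }
    return url, headers, json.dumps(body)
```

The three values map directly onto an `aiohttp.ClientSession.post(url, headers=headers, data=body)` call, which is the kind of session the service takes through `aiohttp_session`.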
- Settings
Alias of MiniMaxTTSSettings.
- class InputParams(*, language: Language | None = Language.EN, speed: float | None = 1.0, volume: float | None = 1.0, pitch: int | None = 0, emotion: str | None = None, text_normalization: bool | None = None, latex_read: bool | None = None, exclude_aggregated_audio: bool | None = None)
Bases: BaseModel
Configuration parameters for MiniMax TTS.
Deprecated since version 0.0.105: Use MiniMaxHttpTTSService.Settings directly via the settings parameter instead.
- Parameters:
language – Language for TTS generation. Supports 40 languages. Note: Filipino, Tamil, and Persian require speech-2.6-* models.
speed – Speech speed (range: 0.5 to 2.0).
volume – Speech volume (range: 0 to 10).
pitch – Pitch adjustment (range: -12 to 12).
emotion – Emotional tone (options: “happy”, “sad”, “angry”, “fearful”, “disgusted”, “surprised”, “calm”, “fluent”).
text_normalization – Enable text normalization (Chinese/English).
latex_read – Enable LaTeX formula reading.
exclude_aggregated_audio – Whether to exclude aggregated audio in final chunk.
- speed: float | None
- volume: float | None
- pitch: int | None
- emotion: str | None
- text_normalization: bool | None
- latex_read: bool | None
- exclude_aggregated_audio: bool | None
- __init__(*, api_key: str, base_url: str = 'https://api.minimax.io/v1/t2a_v2', group_id: str, model: str | None = None, voice_id: str | None = None, aiohttp_session: ClientSession, sample_rate: int | None = None, stream: bool = True, params: InputParams | None = None, settings: MiniMaxTTSSettings | None = None, **kwargs)
Initialize the MiniMax TTS service.
- Parameters:
api_key – MiniMax API key for authentication.
base_url – API base URL. Defaults to MiniMax’s global T2A endpoint. Available endpoints:
  - Global: https://api.minimax.io/v1/t2a_v2
  - Mainland China: https://api.minimaxi.chat/v1/t2a_v2
  - Western United States: https://api-uw.minimax.io/v1/t2a_v2
group_id – MiniMax Group ID that identifies the project.
model – TTS model name. Defaults to “speech-02-turbo”. Options include “speech-2.6-hd” and “speech-2.6-turbo” (latest; these support Filipino, Tamil, and Persian), “speech-02-hd”, “speech-02-turbo”, “speech-01-hd”, and “speech-01-turbo”.
Deprecated since version 0.0.105: Use settings=MiniMaxHttpTTSService.Settings(model=...) instead.
voice_id – Voice identifier. Defaults to “Calm_Woman”.
Deprecated since version 0.0.105: Use settings=MiniMaxHttpTTSService.Settings(voice=...) instead.
aiohttp_session – aiohttp.ClientSession used for API communication.
sample_rate – Output audio sample rate in Hz. If None, uses the pipeline default.
stream – Whether to use streaming mode. Defaults to True.
params – Additional configuration parameters.
Deprecated since version 0.0.105: Use settings=MiniMaxHttpTTSService.Settings(...) instead.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to the parent TTSService.
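The precedence rule for `settings` versus the deprecated parameters amounts to a layered merge: deprecated values form the base, and any field explicitly set in `settings` overrides them. The sketch below illustrates that rule. `SketchSettings`, `NOT_GIVEN`, and `resolve` are hypothetical stand-ins, not pipecat's internals, and the caller is assumed to have already renamed the deprecated `voice_id` to `voice`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


class _NotGivenType:
    """Sentinel meaning 'not specified', distinct from an explicit None."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN: Any = _NotGivenType()


@dataclass
class SketchSettings:
    """Hypothetical stand-in for MiniMaxTTSSettings (small subset of fields)."""

    model: Any = NOT_GIVEN
    voice: Any = NOT_GIVEN
    speed: Any = NOT_GIVEN


def resolve(deprecated: Dict[str, Any], settings: Optional[SketchSettings]) -> Dict[str, Any]:
    """Start from deprecated arguments, then let every field set in `settings` win."""
    resolved = dict(deprecated)
    if settings is not None:
        for f in fields(settings):
            value = getattr(settings, f.name)
            if value is not NOT_GIVEN:  # only explicitly given fields override
                resolved[f.name] = value
    return resolved
```

A sentinel is used instead of None so that "left unspecified" and "explicitly set to None" stay distinguishable. In this sketch an explicit None does override a deprecated value, and the real service may handle that case differently.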
- can_generate_metrics() → bool
Check if this service can generate processing metrics.
- Returns:
True, since the MiniMax service supports metrics generation.
- language_to_service_language(language: Language) → str | None
Convert a Language enum to MiniMax service language format.
- Parameters:
language – The language to convert.
- Returns:
The MiniMax-specific language name, or None if not supported.
- async start(frame: StartFrame)
Start the MiniMax TTS service.
- Parameters:
frame – The start frame containing initialization parameters.
- async run_tts(text: str, context_id: str) → AsyncGenerator[Frame, None]
Generate TTS audio from text using MiniMax’s streaming API.
- Parameters:
text – The text to synthesize into speech.
context_id – The context ID for tracking audio frames.
- Yields:
Frame – Audio frames containing the synthesized speech.
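Because `run_tts` is an async generator, each chunk can be yielded as soon as it arrives rather than after the full response. This sketch shows that shape for a streamed response. It assumes server-sent-event lines of the form `data: {json}`, hex-encoded audio under `data.audio`, and a final chunk with `status == 2` that repeats the whole utterance as aggregated audio (which relates to the `exclude_aggregated_audio` option). Those wire-format details are assumptions about MiniMax's API, and `AudioChunk` and `parse_stream` are hypothetical names, not pipecat's frame types.

```python
import asyncio
import json
from dataclasses import dataclass
from typing import AsyncGenerator, AsyncIterator


@dataclass
class AudioChunk:
    """Hypothetical stand-in for an audio frame yielded by run_tts."""

    audio: bytes
    context_id: str


async def parse_stream(lines: AsyncIterator[str], context_id: str) -> AsyncGenerator[AudioChunk, None]:
    """Yield decoded audio from assumed SSE lines of the form 'data: {json}'."""
    async for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and other SSE fields
        payload = json.loads(line[len("data:"):])
        data = payload.get("data") or {}
        # Assumption: status 2 marks the final chunk, which carries the
        # aggregated audio already streamed, so stop instead of replaying it.
        if data.get("status") == 2:
            break
        hex_audio = data.get("audio")
        if hex_audio:
            yield AudioChunk(bytes.fromhex(hex_audio), context_id)
```

Tagging every chunk with `context_id` mirrors the documented purpose of that parameter: downstream consumers can tell which utterance a chunk belongs to.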