tts

MiniMax text-to-speech service implementation.

This module provides integration with MiniMax’s T2A (Text-to-Audio) API for streaming text-to-speech synthesis.

pipecat.services.minimax.tts.language_to_minimax_language(language: Language) → str | None

Convert a Language enum to MiniMax language format.

Parameters:

language – The Language enum value to convert.

Returns:

The corresponding MiniMax language name, or None if not supported.

class pipecat.services.minimax.tts.MiniMaxTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, speed: float | None | _NotGiven = <factory>, volume: float | None | _NotGiven = <factory>, pitch: int | None | _NotGiven = <factory>, emotion: str | None | _NotGiven = <factory>, text_normalization: bool | None | _NotGiven = <factory>, latex_read: bool | None | _NotGiven = <factory>, language_boost: str | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for MiniMaxHttpTTSService.

Parameters:
  • speed – Speech speed (range: 0.5 to 2.0).

  • volume – Speech volume (range: 0 to 10).

  • pitch – Pitch adjustment (range: -12 to 12).

  • emotion – Emotional tone (options: “happy”, “sad”, “angry”, “fearful”, “disgusted”, “surprised”, “calm”, “fluent”).

  • text_normalization – Enable text normalization (Chinese/English).

  • latex_read – Enable LaTeX formula reading.

  • language_boost – Language boost string for multilingual support.

speed: float | None | _NotGiven
volume: float | None | _NotGiven
pitch: int | None | _NotGiven
emotion: str | None | _NotGiven
text_normalization: bool | None | _NotGiven
latex_read: bool | None | _NotGiven
language_boost: str | None | _NotGiven
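The documented ranges can be checked before a request is sent. The following is a minimal sketch, not part of the library: VoiceTuning and validate_voice_tuning are hypothetical names, and the bounds are assumed to be inclusive because the parameter list above does not say otherwise.

```python
from dataclasses import dataclass
from typing import List, Optional

# Emotion options copied from the parameter list above.
ALLOWED_EMOTIONS = {
    "happy", "sad", "angry", "fearful",
    "disgusted", "surprised", "calm", "fluent",
}


@dataclass
class VoiceTuning:
    """Hypothetical stand-in for the tunable fields of MiniMaxTTSSettings."""

    speed: Optional[float] = None
    volume: Optional[float] = None
    pitch: Optional[int] = None
    emotion: Optional[str] = None


def validate_voice_tuning(tuning: VoiceTuning) -> List[str]:
    """Return a list of problems; an empty list means every set field is in range."""
    problems = []
    # Bounds are assumed inclusive (assumption, not confirmed by the source).
    if tuning.speed is not None and not 0.5 <= tuning.speed <= 2.0:
        problems.append("speed must be between 0.5 and 2.0")
    if tuning.volume is not None and not 0 <= tuning.volume <= 10:
        problems.append("volume must be between 0 and 10")
    if tuning.pitch is not None and not -12 <= tuning.pitch <= 12:
        problems.append("pitch must be between -12 and 12")
    if tuning.emotion is not None and tuning.emotion not in ALLOWED_EMOTIONS:
        problems.append("emotion is not one of the documented options")
    return problems
```

Fields left as None are skipped, mirroring the way unset settings are simply not applied.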
classmethod from_mapping(settings: Mapping[str, Any]) → Self

Construct settings from a plain dict, destructuring legacy nested dicts.

Handles voice_setting (with vol → volume rename) and audio_setting (with prefixed field mapping).
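To illustrate the kind of destructuring described here, the sketch below flattens a legacy nested dict into top-level keys. It is not the library's implementation: flatten_legacy_settings is a hypothetical helper, and the "audio_" prefix applied to audio_setting fields is an assumption, since the source does not state which prefix is used.

```python
from collections.abc import Mapping
from typing import Any, Dict


def flatten_legacy_settings(
    settings: Mapping, audio_prefix: str = "audio_"
) -> Dict[str, Any]:
    """Flatten legacy nested dicts into a single-level dict (illustrative only)."""
    flat: Dict[str, Any] = {}
    for key, value in settings.items():
        if key == "voice_setting" and isinstance(value, Mapping):
            # Lift voice fields to the top level, renaming vol to volume.
            for name, item in value.items():
                flat["volume" if name == "vol" else name] = item
        elif key == "audio_setting" and isinstance(value, Mapping):
            # Lift audio fields with a prefix; the prefix itself is an assumption.
            for name, item in value.items():
                flat[f"{audio_prefix}{name}"] = item
        else:
            flat[key] = value
    return flat


legacy = {
    "model": "speech-02-turbo",
    "voice_setting": {"vol": 2.0, "speed": 1.2},
    "audio_setting": {"format": "pcm"},
}
flat = flatten_legacy_settings(legacy)
```

With the input above, flat holds the keys model, volume, speed, and audio_format.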

class pipecat.services.minimax.tts.MiniMaxHttpTTSService(*, api_key: str, base_url: str = 'https://api.minimax.io/v1/t2a_v2', group_id: str, model: str | None = None, voice_id: str | None = None, aiohttp_session: ClientSession, sample_rate: int | None = None, stream: bool = True, params: InputParams | None = None, settings: MiniMaxTTSSettings | None = None, **kwargs)

Bases: TTSService

Text-to-speech service using MiniMax’s T2A (Text-to-Audio) API.

Provides real-time streaming text-to-speech synthesis over MiniMax’s HTTP API, with configurable voice settings, emotions, and audio configurations.

Platform documentation: https://platform.minimax.io/docs/api-reference/speech-t2a-http

Settings

alias of MiniMaxTTSSettings

class InputParams(*, language: Language | None = Language.EN, speed: float | None = 1.0, volume: float | None = 1.0, pitch: int | None = 0, emotion: str | None = None, text_normalization: bool | None = None, latex_read: bool | None = None, exclude_aggregated_audio: bool | None = None)

Bases: BaseModel

Configuration parameters for MiniMax TTS.

Deprecated since version 0.0.105: Use MiniMaxHttpTTSService.Settings directly via the settings parameter instead.

Parameters:
  • language – Language for TTS generation. Supports 40 languages. Note: Filipino, Tamil, and Persian require speech-2.6-* models.

  • speed – Speech speed (range: 0.5 to 2.0).

  • volume – Speech volume (range: 0 to 10).

  • pitch – Pitch adjustment (range: -12 to 12).

  • emotion – Emotional tone (options: “happy”, “sad”, “angry”, “fearful”, “disgusted”, “surprised”, “calm”, “fluent”).

  • text_normalization – Enable text normalization (Chinese/English).

  • latex_read – Enable LaTeX formula reading.

  • exclude_aggregated_audio – Whether to exclude the aggregated audio from the final chunk.

language: Language | None
speed: float | None
volume: float | None
pitch: int | None
emotion: str | None
text_normalization: bool | None
latex_read: bool | None
exclude_aggregated_audio: bool | None
__init__(*, api_key: str, base_url: str = 'https://api.minimax.io/v1/t2a_v2', group_id: str, model: str | None = None, voice_id: str | None = None, aiohttp_session: ClientSession, sample_rate: int | None = None, stream: bool = True, params: InputParams | None = None, settings: MiniMaxTTSSettings | None = None, **kwargs)

Initialize the MiniMax TTS service.

Parameters:
  • api_key – MiniMax API key for authentication.

  • base_url

    API base URL. Defaults to MiniMax’s global T2A endpoint. Available endpoints:

    Global: https://api.minimax.io/v1/t2a_v2

    Mainland China: https://api.minimaxi.chat/v1/t2a_v2

    Western United States: https://api-uw.minimax.io/v1/t2a_v2

  • group_id – MiniMax Group ID that identifies the project.

  • model

    TTS model name. Defaults to “speech-02-turbo”. Options include: “speech-2.6-hd”, “speech-2.6-turbo” (latest, supports Filipino/Tamil/Persian), “speech-02-hd”, “speech-02-turbo”, “speech-01-hd”, “speech-01-turbo”.

    Deprecated since version 0.0.105: Use settings=MiniMaxHttpTTSService.Settings(model=...) instead.

  • voice_id

    Voice identifier. Defaults to “Calm_Woman”.

    Deprecated since version 0.0.105: Use settings=MiniMaxHttpTTSService.Settings(voice=...) instead.

  • aiohttp_session – aiohttp.ClientSession for API communication.

  • sample_rate – Output audio sample rate in Hz. If None, uses pipeline default.

  • stream – Whether to use streaming mode. Defaults to True.

  • params

    Additional configuration parameters.

    Deprecated since version 0.0.105: Use settings=MiniMaxHttpTTSService.Settings(...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to parent TTSService.
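The parameters above can be combined as in the following sketch, which uses the non-deprecated settings argument. It makes several assumptions: the environment variable names MINIMAX_API_KEY and MINIMAX_GROUP_ID are hypothetical, the model and voice values are simply the documented defaults, and the pipecat and aiohttp packages must be installed. The imports sit inside the function so the snippet can be loaded without those packages.

```python
import os


async def build_and_use_tts() -> None:
    """Create a MiniMaxHttpTTSService bound to a shared aiohttp session."""
    # Deferred imports: both packages are optional dependencies of this sketch.
    import aiohttp
    from pipecat.services.minimax.tts import MiniMaxHttpTTSService

    async with aiohttp.ClientSession() as session:
        tts = MiniMaxHttpTTSService(
            api_key=os.environ["MINIMAX_API_KEY"],    # hypothetical variable name
            group_id=os.environ["MINIMAX_GROUP_ID"],  # hypothetical variable name
            aiohttp_session=session,
            # Pass model and voice through settings rather than the
            # deprecated model, voice_id, and params arguments.
            settings=MiniMaxHttpTTSService.Settings(
                model="speech-02-turbo",
                voice="Calm_Woman",
                speed=1.0,
                volume=1.0,
                pitch=0,
            ),
        )
        # Hand `tts` to a pipeline here, while the session is still open.
        _ = tts
```

The function can be run with asyncio.run(build_and_use_tts()). Keeping the service inside the async with block ensures the session outlives every request the service makes.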

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as the MiniMax service supports metrics generation.

language_to_service_language(language: Language) → str | None

Convert a Language enum to MiniMax service language format.

Parameters:

language – The language to convert.

Returns:

The MiniMax-specific language name, or None if not supported.

async start(frame: StartFrame)

Start the MiniMax TTS service.

Parameters:

frame – The start frame containing initialization parameters.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame, None]

Generate TTS audio from text using MiniMax’s streaming API.

Parameters:
  • text – The text to synthesize into speech.

  • context_id – The context ID for tracking audio frames.

Yields:

Frame – Audio frames containing the synthesized speech.
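Because run_tts is an async generator, callers consume it with async for. The sketch below shows that consumption pattern with a stand-in generator, so it runs without a MiniMax account; FakeAudioFrame and fake_run_tts are hypothetical and only mimic the shape of the real method. In a normal application the pipeline drives run_tts, not user code.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class FakeAudioFrame:
    """Hypothetical stand-in for an audio frame yielded by run_tts."""

    audio: bytes
    context_id: str


async def fake_run_tts(
    text: str, context_id: str
) -> AsyncGenerator[FakeAudioFrame, None]:
    """Stand-in generator: yields one fake frame per word instead of real audio."""
    for word in text.split():
        yield FakeAudioFrame(audio=word.encode("utf-8"), context_id=context_id)


async def collect_audio(text: str, context_id: str) -> bytes:
    """Consume the generator chunk by chunk and join the audio payloads."""
    chunks = []
    async for frame in fake_run_tts(text, context_id):
        chunks.append(frame.audio)
    return b"".join(chunks)


audio = asyncio.run(collect_audio("hello world", "ctx-1"))
```

Each frame arrives as soon as it is yielded, which is what lets playback begin before synthesis of the full text has finished.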