tts

Google Cloud Text-to-Speech service implementations.

This module integrates with the Google Cloud Text-to-Speech API, offering both HTTP-based synthesis with SSML support and streaming synthesis for real-time applications.

It also includes GeminiTTSService which uses Gemini’s TTS-specific models for natural voice control and multi-speaker conversations.

pipecat.services.google.tts.language_to_google_tts_language(language: Language) → str | None

Convert a Language enum to Google TTS language code.

Source: https://docs.cloud.google.com/text-to-speech/docs/chirp3-hd

Parameters:

language – The Language enum value to convert.

Returns:

The corresponding Google TTS language code, or None if not supported.
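Because the function returns None for unsupported languages, callers typically need a guard before passing the result on. The sketch below illustrates that pattern in isolation. It does not import pipecat: the Language enum, the lookup table, and both function names are hypothetical stand-ins, and the real mapping is much larger.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    """Stand-in for pipecat's Language enum (only a few members shown)."""
    EN = "en"
    EN_US = "en-US"
    FR = "fr"
    TLH = "tlh"  # hypothetical member with no entry in the table below


# Hypothetical lookup table; the library's real table is larger.
_GOOGLE_TTS_CODES = {
    Language.EN: "en-US",
    Language.EN_US: "en-US",
    Language.FR: "fr-FR",
}


def to_google_code(language: Language) -> Optional[str]:
    """Return the mapped code, or None when the language is not in the table."""
    return _GOOGLE_TTS_CODES.get(language)


def resolve_language(language: Language, default: str = "en-US") -> str:
    """Caller-side guard: fall back to a default when the lookup returns None."""
    code = to_google_code(language)
    return code if code is not None else default
```

Whether to fall back silently or raise on None is a design choice for the caller; the fallback here is only one option.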

pipecat.services.google.tts.language_to_gemini_tts_language(language: Language) → str | None

Convert a Language enum to Gemini TTS language code.

Source: https://docs.cloud.google.com/text-to-speech/docs/gemini-tts#available_languages

Parameters:

language – The Language enum value to convert.

Returns:

The corresponding Gemini TTS language code, or None if not supported.

class pipecat.services.google.tts.GoogleHttpTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, pitch: str | None | _NotGiven = <factory>, rate: str | None | _NotGiven = <factory>, speaking_rate: float | None | _NotGiven = <factory>, volume: str | None | _NotGiven = <factory>, emphasis: Literal['strong', 'moderate', 'reduced', 'none'] | None | _NotGiven = <factory>, gender: Literal['male', 'female', 'neutral'] | None | _NotGiven = <factory>, google_style: Literal['apologetic', 'calm', 'empathetic', 'firm', 'lively'] | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for GoogleHttpTTSService.

Parameters:
  • pitch – Voice pitch adjustment (e.g., “+2st”, “-50%”).

  • rate – Speaking rate adjustment (e.g., “slow”, “fast”, “125%”). Used for SSML prosody tags (non-Chirp voices).

  • speaking_rate – Speaking rate for AudioConfig (Chirp/Journey voices). Range [0.25, 2.0].

  • volume – Volume adjustment (e.g., “loud”, “soft”, “+6dB”).

  • emphasis – Emphasis level for the text.

  • gender – Voice gender preference.

  • google_style – Google-specific voice style.

pitch: str | None | _NotGiven
rate: str | None | _NotGiven
speaking_rate: float | None | _NotGiven
volume: str | None | _NotGiven
emphasis: Literal['strong', 'moderate', 'reduced', 'none'] | None | _NotGiven
gender: Literal['male', 'female', 'neutral'] | None | _NotGiven
google_style: Literal['apologetic', 'calm', 'empathetic', 'firm', 'lively'] | None | _NotGiven
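The pitch, rate, volume, and emphasis settings above correspond to standard SSML prosody and emphasis markup. The following is a minimal sketch of how such values could be turned into an SSML string. The build_ssml function is hypothetical; this reference does not show how the service assembles its SSML, so treat the tag nesting here as an assumption.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    pitch: Optional[str] = None,
    rate: Optional[str] = None,
    volume: Optional[str] = None,
    emphasis: Optional[str] = None,
) -> str:
    """Wrap text in SSML <prosody>/<emphasis> tags for the options that are set."""
    body = escape(text)  # escape &, <, > so the text cannot break the markup

    if emphasis is not None:
        body = f"<emphasis level={quoteattr(emphasis)}>{body}</emphasis>"

    # Collect only the prosody attributes that were actually provided.
    prosody = {"pitch": pitch, "rate": rate, "volume": volume}
    attrs = " ".join(f"{k}={quoteattr(v)}" for k, v in prosody.items() if v is not None)
    if attrs:
        body = f"<prosody {attrs}>{body}</prosody>"

    return f"<speak>{body}</speak>"
```

For example, build_ssml("a & b", pitch="+2st", rate="slow") wraps the escaped text in a single prosody element carrying both attributes.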

class pipecat.services.google.tts.GoogleTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, speaking_rate: float | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for GoogleTTSService.

Parameters:

speaking_rate – The speaking rate, in the range [0.25, 2.0].

speaking_rate: float | None | _NotGiven
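This reference does not say whether the library itself rejects values outside [0.25, 2.0], so an application may want its own check before building settings. A minimal sketch, with the hypothetical helper name check_speaking_rate:

```python
from typing import Optional

MIN_RATE, MAX_RATE = 0.25, 2.0  # documented range for speaking_rate


def check_speaking_rate(rate: Optional[float]) -> Optional[float]:
    """Return rate unchanged if it is None or within [0.25, 2.0]; raise otherwise."""
    if rate is None:
        return None  # unset: leave the choice to the service
    if not MIN_RATE <= rate <= MAX_RATE:
        raise ValueError(f"speaking_rate must be in [{MIN_RATE}, {MAX_RATE}], got {rate}")
    return rate
```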
pipecat.services.google.tts.GoogleStreamTTSSettings

Deprecated since version 0.0.105: Use GoogleTTSService.Settings instead.

class pipecat.services.google.tts.GeminiTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, prompt: str | None | _NotGiven = <factory>, multi_speaker: bool | _NotGiven = <factory>, speaker_configs: list[dict[str, Any]] | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for GeminiTTSService.

Parameters:
  • prompt – Optional style instructions for how to synthesize the content.

  • multi_speaker – Whether to enable multi-speaker support.

  • speaker_configs – List of speaker configurations for multi-speaker mode.

prompt: str | None | _NotGiven
multi_speaker: bool | _NotGiven
speaker_configs: list[dict[str, Any]] | None | _NotGiven

class pipecat.services.google.tts.GoogleHttpTTSService(*, credentials: str | None = None, credentials_path: str | None = None, location: str | None = None, voice_id: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: GoogleHttpTTSSettings | None = None, **kwargs)

Bases: TTSService

Google Cloud Text-to-Speech HTTP service with SSML support.

Provides text-to-speech synthesis using Google Cloud’s HTTP API with comprehensive SSML support for voice customization, prosody control, and styling options. Ideal for applications requiring fine-grained control over speech output.

Note

Requires Google Cloud credentials via a service account JSON string, a credentials file path, or default application credentials (GOOGLE_APPLICATION_CREDENTIALS environment variable). Chirp and Journey voices don’t support SSML, so their input is sent as plain text.

Settings

alias of GoogleHttpTTSSettings

class InputParams(*, pitch: str | None = None, rate: str | None = None, speaking_rate: float | None = None, volume: str | None = None, emphasis: Literal['strong', 'moderate', 'reduced', 'none'] | None = None, language: Language | None = Language.EN, gender: Literal['male', 'female', 'neutral'] | None = None, google_style: Literal['apologetic', 'calm', 'empathetic', 'firm', 'lively'] | None = None)

Bases: BaseModel

Input parameters for Google HTTP TTS voice customization.

Deprecated since version 0.0.105: Use GoogleHttpTTSService.Settings directly via the settings parameter instead.

Parameters:
  • pitch – Voice pitch adjustment (e.g., “+2st”, “-50%”).

  • rate – Speaking rate adjustment (e.g., “slow”, “fast”, “125%”). Used for SSML prosody tags (non-Chirp voices).

  • speaking_rate – Speaking rate for AudioConfig (Chirp/Journey voices). Range [0.25, 2.0].

  • volume – Volume adjustment (e.g., “loud”, “soft”, “+6dB”).

  • emphasis – Emphasis level for the text.

  • language – Language for synthesis. Defaults to English.

  • gender – Voice gender preference.

  • google_style – Google-specific voice style.

pitch: str | None
rate: str | None
speaking_rate: float | None
volume: str | None
emphasis: Literal['strong', 'moderate', 'reduced', 'none'] | None
language: Language | None
gender: Literal['male', 'female', 'neutral'] | None
google_style: Literal['apologetic', 'calm', 'empathetic', 'firm', 'lively'] | None

__init__(*, credentials: str | None = None, credentials_path: str | None = None, location: str | None = None, voice_id: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: GoogleHttpTTSSettings | None = None, **kwargs)

Initializes the Google HTTP TTS service.

Parameters:
  • credentials – JSON string containing Google Cloud service account credentials.

  • credentials_path – Path to Google Cloud service account JSON file.

  • location – Google Cloud location for regional endpoint (e.g., “us-central1”).

  • voice_id

    Google TTS voice identifier (e.g., “en-US-Standard-A”).

    Deprecated since version 0.0.105: Use settings=GoogleHttpTTSService.Settings(voice=...) instead.

  • sample_rate – Audio sample rate in Hz. If None, uses default.

  • params

    Voice customization parameters including pitch, rate, volume, etc.

    Deprecated since version 0.0.105: Use settings=GoogleHttpTTSService.Settings(...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to parent TTSService.

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as Google HTTP TTS service supports metrics generation.

language_to_service_language(language: Language) → str | None

Convert a Language enum to Google TTS language format.

Parameters:

language – The language to convert.

Returns:

The Google TTS-specific language code, or None if not supported.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame, None]

Generate speech from text using Google’s HTTP TTS API.

Parameters:
  • text – The text to synthesize into speech.

  • context_id – The context ID for tracking audio frames.

Yields:

Frame – Audio frames containing the synthesized speech.
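Since run_tts is an async generator, its frames are consumed with async for rather than awaited as a single result. The sketch below shows that consumption pattern with stand-ins only: AudioFrame, fake_run_tts, and collect are hypothetical, the bytes are placeholders rather than audio, and in a real pipeline the framework drives run_tts for you.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


@dataclass
class AudioFrame:
    """Stand-in for a pipecat audio frame; the real frame classes differ."""
    audio: bytes
    context_id: str


async def fake_run_tts(text: str, context_id: str) -> AsyncGenerator[AudioFrame, None]:
    """Yield placeholder bytes in fixed-size chunks, mimicking a frame generator."""
    data = text.encode("utf-8") * 4  # placeholder bytes, not real audio
    chunk_size = 8
    for start in range(0, len(data), chunk_size):
        await asyncio.sleep(0)  # hand control back to the loop, as a network read would
        yield AudioFrame(audio=data[start:start + chunk_size], context_id=context_id)


async def collect(text: str, context_id: str) -> List[AudioFrame]:
    """Drain the generator with `async for` and return the frames in order."""
    return [frame async for frame in fake_run_tts(text, context_id)]


frames = asyncio.run(collect("hello", "ctx-1"))
```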

class pipecat.services.google.tts.GoogleBaseTTSService(*, text_aggregation_mode: TextAggregationMode | None = None, aggregate_sentences: bool | None = None, push_text_frames: bool = True, push_stop_frames: bool = False, push_start_frame: bool = False, stop_frame_timeout_s: float = 3.0, push_silence_after_stop: bool = False, silence_time_s: float = 2.0, pause_frame_processing: bool = False, append_trailing_space: bool = False, sample_rate: int | None = None, skip_aggregator_types: list[str] | None = [], text_transforms: list[tuple[AggregationType | str, Callable[[str, str | AggregationType], Awaitable[str]]]] | None = None, text_filters: Sequence[BaseTextFilter] | None = None, transport_destination: str | None = None, settings: TTSSettings | None = None, reuse_context_id_within_turn: bool = True, **kwargs)

Bases: TTSService

Base class for Google Cloud Text-to-Speech streaming services.

Provides shared streaming synthesis logic for Google TTS services. This is an abstract base class. Use GoogleTTSService or GeminiTTSService instead.

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as Google streaming TTS services support metrics generation.

language_to_service_language(language: Language) → str | None

Convert a Language enum to Google TTS language format.

Parameters:

language – The language to convert.

Returns:

The Google TTS-specific language code, or None if not supported.

class pipecat.services.google.tts.GoogleTTSService(*, credentials: str | None = None, credentials_path: str | None = None, location: str | None = None, voice_id: str | None = None, voice_cloning_key: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: GoogleTTSSettings | None = None, **kwargs)

Bases: GoogleBaseTTSService

Google Cloud Text-to-Speech streaming service.

Provides real-time text-to-speech synthesis using Google Cloud’s streaming API for low-latency applications. Optimized for Chirp 3 HD and Journey voices with continuous audio streaming capabilities.

Note

Requires Google Cloud credentials via service account JSON, file path, or default application credentials (GOOGLE_APPLICATION_CREDENTIALS env var). Only Chirp 3 HD and Journey voices are supported. Use GoogleHttpTTSService for other voices.

Example:

tts = GoogleTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GoogleTTSService.Settings(
        voice="en-US-Chirp3-HD-Charon",
        language=Language.EN_US,
    )
)
Settings

alias of GoogleTTSSettings

class InputParams(*, language: Language | None = Language.EN, speaking_rate: float | None = None)

Bases: BaseModel

Input parameters for Google streaming TTS configuration.

Deprecated since version 0.0.105: Use GoogleTTSService.Settings directly via the settings parameter instead.

Parameters:
  • language – Language for synthesis. Defaults to English.

  • speaking_rate – The speaking rate, in the range [0.25, 2.0].

language: Language | None
speaking_rate: float | None

__init__(*, credentials: str | None = None, credentials_path: str | None = None, location: str | None = None, voice_id: str | None = None, voice_cloning_key: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: GoogleTTSSettings | None = None, **kwargs)

Initializes the Google streaming TTS service.

Parameters:
  • credentials – JSON string containing Google Cloud service account credentials.

  • credentials_path – Path to Google Cloud service account JSON file.

  • location – Google Cloud location for regional endpoint (e.g., “us-central1”).

  • voice_id

    Google TTS voice identifier (e.g., “en-US-Chirp3-HD-Charon”).

    Deprecated since version 0.0.105: Use settings=GoogleTTSService.Settings(voice=...) instead.

  • voice_cloning_key – The voice cloning key for Chirp 3 custom voices.

  • sample_rate – Audio sample rate in Hz. If None, uses default.

  • params

    Language configuration parameters.

    Deprecated since version 0.0.105: Use settings=GoogleTTSService.Settings(...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to parent TTSService.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame, None]

Generate streaming speech from text using Google’s streaming API.

Parameters:
  • text – The text to synthesize into speech.

  • context_id – The context ID for tracking audio frames.

Yields:

Frame – Audio frames containing the synthesized speech as it’s generated.

class pipecat.services.google.tts.GeminiTTSService(*, model: str | None = None, credentials: str | None = None, credentials_path: str | None = None, location: str | None = None, voice_id: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: GeminiTTSSettings | None = None, **kwargs)

Bases: GoogleBaseTTSService

Gemini Text-to-Speech streaming service using Gemini TTS models.

Provides real-time text-to-speech synthesis using Gemini’s TTS-specific models (gemini-2.5-flash-tts and gemini-2.5-pro-tts) with support for natural voice control, prompts for style instructions, expressive markup tags, and multi-speaker conversations.

Note

Requires Google Cloud credentials via service account JSON, credentials file, or default application credentials (GOOGLE_APPLICATION_CREDENTIALS).

Uses the Google Cloud Text-to-Speech streaming API for low-latency synthesis.

Example:

tts = GeminiTTSService(
    credentials_path="/path/to/service-account.json",
    settings=GeminiTTSService.Settings(
        model="gemini-2.5-flash-tts",
        voice="Kore",
        language=Language.EN_US,
        prompt="Say this in a friendly and helpful tone"
    )
)
Settings

alias of GeminiTTSSettings

GOOGLE_SAMPLE_RATE = 24000
AVAILABLE_VOICES = ['Achernar', 'Achird', 'Algenib', 'Algieba', 'Alnilam', 'Aoede', 'Autonoe', 'Callirhoe', 'Charon', 'Despina', 'Enceladus', 'Erinome', 'Fenrir', 'Gacrux', 'Iapetus', 'Kore', 'Laomedeia', 'Leda', 'Orus', 'Puck', 'Pulcherrima', 'Rasalgethi', 'Sadachbia', 'Sadaltager', 'Schedar', 'Sulafar', 'Umbriel', 'Vindemiatrix', 'Zephyr', 'Zubenelgenubi']
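A voice list like AVAILABLE_VOICES lends itself to an early check with a helpful error. This reference does not state whether the service validates voice names itself, so the following is only an application-side sketch; check_voice is a hypothetical helper, and VOICES copies a subset of the names above for brevity.

```python
import difflib

# Subset of the AVAILABLE_VOICES list above, copied for illustration.
VOICES = ["Achernar", "Aoede", "Charon", "Fenrir", "Kore", "Leda", "Puck", "Zephyr"]


def check_voice(name: str) -> str:
    """Return name if it is a known voice; otherwise raise with a close-match hint."""
    if name in VOICES:
        return name
    hint = difflib.get_close_matches(name, VOICES, n=1)
    suffix = f" Did you mean {hint[0]!r}?" if hint else ""
    raise ValueError(f"Unknown Gemini TTS voice {name!r}.{suffix}")
```

The comparison is case-sensitive, matching the capitalized names in the list.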

class InputParams(*, language: Language | None = Language.EN, prompt: str | None = None, multi_speaker: bool = False, speaker_configs: list[dict] | None = None)

Bases: BaseModel

Input parameters for Gemini TTS configuration.

Deprecated since version 0.0.105: Use GeminiTTSService.Settings directly via the settings parameter instead.

Parameters:
  • language – Language for synthesis. Defaults to English.

  • prompt – Optional style instructions for how to synthesize the content.

  • multi_speaker – Whether to enable multi-speaker support.

  • speaker_configs – List of speaker configurations for multi-speaker mode.

language: Language | None
prompt: str | None
multi_speaker: bool
speaker_configs: list[dict] | None

__init__(*, model: str | None = None, credentials: str | None = None, credentials_path: str | None = None, location: str | None = None, voice_id: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: GeminiTTSSettings | None = None, **kwargs)

Initializes the Gemini TTS service.

Parameters:
  • model

    Gemini TTS model to use. Must be a TTS model like “gemini-2.5-flash-tts” or “gemini-2.5-pro-tts”.

    Deprecated since version 0.0.105: Use settings=GeminiTTSService.Settings(model=...) instead.

  • credentials – JSON string containing Google Cloud service account credentials.

  • credentials_path – Path to Google Cloud service account JSON file.

  • location – Google Cloud location for regional endpoint (e.g., “us-central1”).

  • voice_id

    Voice name from the available Gemini voices.

    Deprecated since version 0.0.105: Use settings=GeminiTTSService.Settings(voice=...) instead.

  • sample_rate – Audio sample rate in Hz. If None, uses Google’s default 24kHz.

  • params

    TTS configuration parameters.

    Deprecated since version 0.0.105: Use settings=GeminiTTSService.Settings(...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to parent TTSService.

language_to_service_language(language: Language) → str | None

Convert a Language enum to Gemini TTS language format.

Parameters:

language – The language to convert.

Returns:

The Gemini TTS-specific language code, or None if not supported.

async start(frame: StartFrame)

Start the Gemini TTS service.

Parameters:

frame – The start frame containing initialization parameters.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame, None]

Generate streaming speech from text using Gemini TTS models.

Parameters:
  • text – The text to synthesize into speech. Can include markup tags like [sigh], [laughing], [whispering] for expressive control.

  • context_id – The context ID for tracking audio frames.

Yields:

Frame – Audio frames containing the synthesized speech as it’s generated.
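When text carries bracketed markup tags such as [sigh] or [whispering], an application that also displays the text (for captions, say) may want a tag-free copy. A minimal sketch follows. The split_markup helper is hypothetical, and the tag grammar in the regular expression (lowercase words in square brackets) is an assumption rather than something this reference specifies.

```python
import re
from typing import List, Tuple

# Matches bracketed tags such as [sigh] or [laughing]; the grammar is an assumption.
TAG_PATTERN = re.compile(r"\[([a-z][a-z ]*)\]")


def split_markup(text: str) -> Tuple[str, List[str]]:
    """Return (plain_text, tags): text with tags removed, plus the tag names found."""
    tags = TAG_PATTERN.findall(text)
    plain = TAG_PATTERN.sub("", text)
    plain = re.sub(r"\s{2,}", " ", plain).strip()  # tidy whitespace left by removed tags
    return plain, tags


plain, tags = split_markup("[sigh] I guess so. [whispering] Don't tell anyone.")
```

The original tagged string is what would be sent for synthesis; only the display copy is stripped.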