tts

NVIDIA Nemotron Speech text-to-speech service implementation.

This module integrates with NVIDIA Nemotron Speech’s TTS services through the gRPC API for high-quality speech synthesis.

Refer to the NVIDIA TTS NIM documentation for usage, customization, and local deployment steps: https://docs.nvidia.com/nim/speech/latest/tts/

class pipecat.services.nvidia.tts.NvidiaTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, ~typing.Any]=<factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, quality: int | _NotGiven = <factory>)[source]

Bases: TTSSettings

Settings for NvidiaTTSService.

Parameters:

quality – Audio quality setting (0-100).

quality: int | _NotGiven
class pipecat.services.nvidia.tts.NvidiaTTSService(*, api_key: str | None = None, server: str = 'grpc.nvcf.nvidia.com:443', voice_id: str | None = None, sample_rate: int | None = None, model_function_map: Mapping[str, str] = {'function_id': '877104f7-e885-42b9-8de8-f6e4c6303969', 'model_name': 'magpie-tts-multilingual'}, params: InputParams | None = None, settings: NvidiaTTSSettings | None = None, use_ssl: bool = True, custom_dictionary: dict | None = None, encoding: EnumTypeWrapper | None = 1, **kwargs)[source]

Bases: TTSService

NVIDIA Nemotron Speech text-to-speech service.

Provides high-quality text-to-speech synthesis using NVIDIA Nemotron Speech’s cloud-based TTS models. Supports multiple voices, languages, and configurable quality settings.

Settings

alias of NvidiaTTSSettings

class InputParams(*, language: Language | None = Language.EN_US, quality: int | None = 20)[source]

Bases: BaseModel

Input parameters for Nemotron Speech TTS configuration.

Deprecated since version 0.0.105: Use NvidiaTTSService.Settings directly via the settings parameter instead.

Parameters:
  • language – Language code for synthesis. Defaults to US English.

  • quality – Audio quality setting (0-100). Defaults to 20.

language: Language | None
quality: int | None
__init__(*, api_key: str | None = None, server: str = 'grpc.nvcf.nvidia.com:443', voice_id: str | None = None, sample_rate: int | None = None, model_function_map: Mapping[str, str] = {'function_id': '877104f7-e885-42b9-8de8-f6e4c6303969', 'model_name': 'magpie-tts-multilingual'}, params: InputParams | None = None, settings: NvidiaTTSSettings | None = None, use_ssl: bool = True, custom_dictionary: dict | None = None, encoding: EnumTypeWrapper | None = 1, **kwargs)[source]

Initialize the NVIDIA Nemotron Speech TTS service.

Parameters:
  • api_key – NVIDIA API key for authentication. Required when using the cloud endpoint. Not needed for local deployments.

  • server – gRPC server endpoint. Defaults to NVIDIA’s cloud endpoint. For local deployments, pass the local address (e.g. localhost:50051).

  • voice_id

    Voice model identifier. Defaults to multilingual Aria voice.

    Deprecated since version 0.0.105: Use settings=NvidiaTTSService.Settings(voice=...) instead.

  • sample_rate – Audio sample rate in Hz. If None, the service default is used.

  • model_function_map – Dictionary containing function_id and model_name for the TTS model.

  • params

    Additional configuration parameters for TTS synthesis.

    Deprecated since version 0.0.105: Use settings=NvidiaTTSService.Settings(...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • use_ssl – Whether to use SSL for the gRPC connection. Defaults to True for the NVIDIA cloud endpoint. Set to False for local deployments.

  • custom_dictionary – Custom pronunciation dictionary mapping words (graphemes) to IPA phonetic representations (phonemes), e.g. {"NVIDIA": "ɛn.vɪ.diː.ʌ"}. See https://docs.nvidia.com/nim/speech/latest/tts/phoneme-support.html for the list of supported IPA phonemes.

  • encoding – Output audio encoding format. Defaults to AudioEncoding.LINEAR_PCM.

  • **kwargs – Additional arguments passed to parent TTSService.
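The cloud and local deployment cases described above differ in three arguments: api_key, server, and use_ssl. A minimal sketch, assuming a hypothetical helper build_nvidia_tts_kwargs (not part of pipecat) that assembles the keyword arguments for either case:

```python
from __future__ import annotations

from typing import Any

# Default cloud endpoint, taken from the constructor signature.
CLOUD_SERVER = "grpc.nvcf.nvidia.com:443"


def build_nvidia_tts_kwargs(
    api_key: str | None = None,
    local_server: str | None = None,
    custom_dictionary: dict | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble constructor kwargs for NvidiaTTSService."""
    if local_server is not None:
        # Local deployment: no API key, SSL disabled, per the parameter docs.
        kwargs: dict[str, Any] = {"server": local_server, "use_ssl": False}
    else:
        if not api_key:
            raise ValueError("api_key is required for the cloud endpoint")
        kwargs = {"api_key": api_key, "server": CLOUD_SERVER, "use_ssl": True}
    if custom_dictionary:
        # Grapheme -> IPA phoneme mapping, copied to avoid aliasing the caller's dict.
        kwargs["custom_dictionary"] = dict(custom_dictionary)
    return kwargs
```

The resulting dict would then be passed to the constructor, for example NvidiaTTSService(**build_nvidia_tts_kwargs(local_server="localhost:50051")).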

can_generate_metrics() bool[source]

Check if this service can generate metrics.

Returns:

True as this service supports metric generation.

async set_model(model: str)[source]

Set the TTS model.

Deprecated since version 0.0.104: The model cannot be changed after initialization for NVIDIA Nemotron Speech TTS. Set the model name and function ID in the constructor instead.

Example:

NvidiaTTSService(
    api_key=...,
    model_function_map={"function_id": "<UUID>", "model_name": "<model_name>"},
)

Parameters:

model – The model name to set.

async start(frame: StartFrame)[source]

Start the NVIDIA Nemotron Speech TTS service.

Parameters:

frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)[source]

Stop the NVIDIA Nemotron Speech TTS service.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame)[source]

Cancel the NVIDIA Nemotron Speech TTS service.

Parameters:

frame – The cancel frame.

async flush_audio(context_id: str | None = None)[source]

Flush any pending audio and finalize the current context.

Parameters:

context_id – The specific context to flush. If None, falls back to the currently active context.

async on_audio_context_interrupted(context_id: str)[source]

Cancel the active gRPC synthesis stream when the bot is interrupted.

Parameters:

context_id – The ID of the audio context that was interrupted.

async run_tts(text: str, context_id: str) AsyncGenerator[Frame | None, None][source]

Generate speech from text using NVIDIA Nemotron Speech TTS.

On the first call for a turn, starts a persistent synthesize_online gRPC stream. Subsequent calls within the same turn feed text into the existing stream, enabling Magpie’s cross-sentence stitching.

Text is split into chunks respecting Magpie’s per-request limits. Each chunk becomes a separate request in the gRPC stream, stitched seamlessly by Magpie.
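A minimal sketch of such a chunking step, assuming a hypothetical limit of 400 characters per request and splitting at sentence boundaries; the actual limit and splitting rules used by the service are not stated in this reference:

```python
import re


def split_text(text: str, max_chars: int = 400) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            # The sentence still fits into the chunk being built.
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        # A single sentence longer than the limit is cut at word boundaries,
        # or hard-cut at max_chars when it contains no usable space.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would correspond to one request written to the gRPC stream.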

Parameters:
  • text – The text to synthesize into speech.

  • context_id – The context ID for tracking audio frames.

Yields:
None on success. Audio is delivered asynchronously via the response consumer. ErrorFrame on failure.
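The pattern described here (a generator that yields None while a background consumer delivers the audio) can be sketched with plain asyncio. SketchTTS and all of its members are hypothetical stand-ins, not the pipecat implementation: a queue plays the role of the gRPC request stream, encoded text plays the role of audio, and a string plays the role of an ErrorFrame.

```python
import asyncio
from typing import AsyncGenerator, Optional


class SketchTTS:
    """Hypothetical stand-in illustrating the yield-None-plus-consumer pattern."""

    def __init__(self) -> None:
        self.requests: asyncio.Queue = asyncio.Queue()
        self.audio: list[bytes] = []
        self._consumer: Optional[asyncio.Task] = None

    async def _consume(self) -> None:
        # Stands in for reading audio responses from the gRPC stream.
        while (text := await self.requests.get()) is not None:
            self.audio.append(text.encode("utf-8"))

    async def run_tts(self, text: str) -> AsyncGenerator[Optional[str], None]:
        if self._consumer is None:
            # First call of the turn: start the persistent consumer.
            self._consumer = asyncio.create_task(self._consume())
        try:
            await self.requests.put(text)
        except Exception as exc:
            yield f"error: {exc}"  # stands in for an ErrorFrame
            return
        yield None  # success: audio arrives via the consumer, not via this generator

    async def flush(self) -> None:
        # End of turn: close the stream and wait for the consumer to drain it.
        await self.requests.put(None)
        if self._consumer is not None:
            await self._consumer
            self._consumer = None


async def demo() -> tuple[list, list[bytes]]:
    tts = SketchTTS()
    yielded = []
    for sentence in ["Hello.", "World."]:
        async for item in tts.run_tts(sentence):
            yielded.append(item)
    await tts.flush()
    return yielded, tts.audio
```

In this sketch, both sentences of a turn go through the same consumer task, mirroring how subsequent run_tts calls feed the existing stream.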