tts
NVIDIA Nemotron Speech text-to-speech service implementation.
This module integrates with NVIDIA Nemotron Speech’s TTS services through their gRPC API for high-quality speech synthesis.
Refer to the NVIDIA TTS NIM documentation for usage, customization, and local deployment steps: https://docs.nvidia.com/nim/speech/latest/tts/
- class pipecat.services.nvidia.tts.NvidiaTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, ~typing.Any]=<factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, quality: int | _NotGiven = <factory>)[source]
Bases: TTSSettings
Settings for NvidiaTTSService.
- Parameters:
quality – Audio quality setting (0-100).
- quality: int | _NotGiven
- class pipecat.services.nvidia.tts.NvidiaTTSService(*, api_key: str | None = None, server: str = 'grpc.nvcf.nvidia.com:443', voice_id: str | None = None, sample_rate: int | None = None, model_function_map: Mapping[str, str] = {'function_id': '877104f7-e885-42b9-8de8-f6e4c6303969', 'model_name': 'magpie-tts-multilingual'}, params: InputParams | None = None, settings: NvidiaTTSSettings | None = None, use_ssl: bool = True, custom_dictionary: dict | None = None, encoding: EnumTypeWrapper | None = 1, **kwargs)[source]
Bases: TTSService
NVIDIA Nemotron Speech text-to-speech service.
Provides high-quality text-to-speech synthesis using NVIDIA Nemotron Speech’s cloud-based TTS models. Supports multiple voices, languages, and configurable quality settings.
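As a hedged illustration of the constructor arguments documented for this class, the sketch below builds keyword arguments for the cloud endpoint and for a local deployment. The helper name `nvidia_tts_kwargs` and the placeholder API key are assumptions made for the example, not part of pipecat. The import and constructor call are left commented out because they need pipecat installed and a reachable server.

```python
from typing import Any, Optional

# Default cloud endpoint, taken from the constructor signature.
CLOUD_SERVER = "grpc.nvcf.nvidia.com:443"


def nvidia_tts_kwargs(
    api_key: Optional[str] = None,
    local_server: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build constructor kwargs for cloud or local use."""
    if local_server is not None:
        # Local deployments: no API key needed, SSL turned off.
        return {"server": local_server, "use_ssl": False}
    if not api_key:
        raise ValueError("api_key is required for the cloud endpoint")
    return {"api_key": api_key, "server": CLOUD_SERVER, "use_ssl": True}


local_kwargs = nvidia_tts_kwargs(local_server="localhost:50051")
cloud_kwargs = nvidia_tts_kwargs(api_key="YOUR_API_KEY")  # placeholder value

# With pipecat installed, the kwargs would be passed straight through:
# from pipecat.services.nvidia.tts import NvidiaTTSService
# tts = NvidiaTTSService(**local_kwargs)
```

Keeping the cloud and local cases in one helper makes the two documented differences (API key and `use_ssl`) explicit in a single place.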
- Settings
alias of NvidiaTTSSettings
- class InputParams(*, language: Language | None = Language.EN_US, quality: int | None = 20)[source]
Bases: BaseModel
Input parameters for Nemotron Speech TTS configuration.
Deprecated since version 0.0.105: Use NvidiaTTSService.Settings directly via the settings parameter instead.
- Parameters:
language – Language code for synthesis. Defaults to US English.
quality – Audio quality setting (0-100). Defaults to 20.
- quality: int | None
- __init__(*, api_key: str | None = None, server: str = 'grpc.nvcf.nvidia.com:443', voice_id: str | None = None, sample_rate: int | None = None, model_function_map: Mapping[str, str] = {'function_id': '877104f7-e885-42b9-8de8-f6e4c6303969', 'model_name': 'magpie-tts-multilingual'}, params: InputParams | None = None, settings: NvidiaTTSSettings | None = None, use_ssl: bool = True, custom_dictionary: dict | None = None, encoding: EnumTypeWrapper | None = 1, **kwargs)[source]
Initialize the NVIDIA Nemotron Speech TTS service.
- Parameters:
api_key – NVIDIA API key for authentication. Required when using the cloud endpoint. Not needed for local deployments.
server – gRPC server endpoint. Defaults to NVIDIA’s cloud endpoint. For local deployments, pass the local address (e.g. localhost:50051).
voice_id – Voice model identifier. Defaults to multilingual Aria voice.
Deprecated since version 0.0.105: Use settings=NvidiaTTSService.Settings(voice=...) instead.
sample_rate – Audio sample rate. If None, uses service default.
model_function_map – Dictionary containing function_id and model_name for the TTS model.
params – Additional configuration parameters for TTS synthesis.
Deprecated since version 0.0.105: Use settings=NvidiaTTSService.Settings(...) instead.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
use_ssl – Whether to use SSL for the gRPC connection. Defaults to True for the NVIDIA cloud endpoint. Set to False for local deployments.
custom_dictionary – Custom pronunciation dictionary mapping words (graphemes) to IPA phonetic representations (phonemes), e.g. {"NVIDIA": "ɛn.vɪ.diː.ʌ"}. See https://docs.nvidia.com/nim/speech/latest/tts/phoneme-support.html for the list of supported IPA phonemes.
encoding – Output audio encoding format. Defaults to AudioEncoding.LINEAR_PCM.
**kwargs – Additional arguments passed to parent TTSService.
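The rule that settings values take precedence over deprecated parameters can be sketched in isolation. This is a minimal illustration, not pipecat's implementation: `SketchSettings`, `resolve`, and the sentinel below are stand-ins invented for the example, and only the field names voice, language, and quality come from the documented settings.

```python
from dataclasses import dataclass, fields
from typing import Any


class _Unset:
    """Stand-in sentinel meaning 'field was not supplied' (not pipecat's class)."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _Unset()


@dataclass
class SketchSettings:
    """Illustrative subset of the documented settings fields."""

    voice: Any = NOT_GIVEN
    language: Any = NOT_GIVEN
    quality: Any = NOT_GIVEN


def resolve(deprecated: SketchSettings, settings: SketchSettings) -> SketchSettings:
    """Start from deprecated values, then overlay every field given in settings."""
    merged = SketchSettings(
        **{f.name: getattr(deprecated, f.name) for f in fields(deprecated)}
    )
    for f in fields(settings):
        value = getattr(settings, f.name)
        if value is not NOT_GIVEN:
            setattr(merged, f.name, value)
    return merged


# Values that would arrive through voice_id / params (deprecated path).
from_deprecated = SketchSettings(voice="voice-a", quality=20)
# Values that would arrive through settings=... (preferred path).
from_settings = SketchSettings(voice="voice-b")

result = resolve(from_deprecated, from_settings)
```

Here `result.voice` comes from the settings object, while `result.quality` falls back to the deprecated value because settings left it unset.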
- can_generate_metrics() bool[source]
Check if this service can generate metrics.
- Returns:
True as this service supports metric generation.
- async set_model(model: str)[source]
Set the TTS model.
Deprecated since version 0.0.104: Model cannot be changed after initialization for NVIDIA Nemotron Speech TTS. Set model and function id in the constructor instead.
Example:
NvidiaTTSService(
    api_key=...,
    model_function_map={"function_id": "<UUID>", "model_name": "<model_name>"},
)
- Parameters:
model – The model name to set.
- async start(frame: StartFrame)[source]
Start the NVIDIA Nemotron Speech TTS service.
- Parameters:
frame – The start frame containing initialization parameters.
- async stop(frame: EndFrame)[source]
Stop the NVIDIA Nemotron Speech TTS service.
- Parameters:
frame – The end frame.
- async cancel(frame: CancelFrame)[source]
Cancel the NVIDIA Nemotron Speech TTS service.
- Parameters:
frame – The cancel frame.
- async flush_audio(context_id: str | None = None)[source]
Flush any pending audio and finalize the current context.
- Parameters:
context_id – The specific context to flush. If None, falls back to the currently active context.
- async on_audio_context_interrupted(context_id: str)[source]
Cancel the active gRPC synthesis stream when the bot is interrupted.
- Parameters:
context_id – The ID of the audio context that was interrupted.
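The interruption behaviour can be illustrated with a generic asyncio pattern. The sketch below is not the service's code: `StreamOwner` and its methods are hypothetical, and a long `asyncio.sleep` stands in for the gRPC synthesis stream. It only shows how an in-flight task tied to a context ID can be cancelled and awaited so that cleanup finishes before the handler returns.

```python
import asyncio


class StreamOwner:
    """Hypothetical holder of one in-flight synthesis task per audio context."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}
        self.cancelled: list[str] = []

    def start_stream(self, context_id: str) -> None:
        self._tasks[context_id] = asyncio.create_task(self._stream(context_id))

    async def _stream(self, context_id: str) -> None:
        try:
            await asyncio.sleep(3600)  # stands in for a long-lived stream
        except asyncio.CancelledError:
            self.cancelled.append(context_id)  # record cleanup, then propagate
            raise

    async def on_interrupted(self, context_id: str) -> None:
        task = self._tasks.pop(context_id, None)
        if task is None:
            return  # nothing in flight for this context
        task.cancel()
        try:
            await task  # wait until the task has finished unwinding
        except asyncio.CancelledError:
            pass


async def demo() -> list[str]:
    owner = StreamOwner()
    owner.start_stream("ctx-1")
    await asyncio.sleep(0)  # yield once so the stream task starts running
    await owner.on_interrupted("ctx-1")
    return owner.cancelled


cancelled_contexts = asyncio.run(demo())
```

Awaiting the cancelled task, instead of only calling `cancel()`, ensures the stream has fully stopped before new text for the next turn is processed.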
- async run_tts(text: str, context_id: str) AsyncGenerator[Frame | None, None][source]
Generate speech from text using NVIDIA Nemotron Speech TTS.
On the first call for a turn, starts a persistent synthesize_online gRPC stream. Subsequent calls within the same turn feed text into the existing stream, enabling Magpie’s cross-sentence stitching.
Text is split into chunks respecting Magpie’s per-request limits. Each chunk becomes a separate request in the gRPC stream, stitched seamlessly by Magpie.
- Parameters:
text – The text to synthesize into speech.
context_id – The context ID for tracking audio frames.
- Yields:
None on success. Audio is delivered asynchronously via the response consumer. ErrorFrame on failure.
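The chunking step described for run_tts can be sketched as a sentence-aware splitter. This is an assumption-laden illustration: the documentation does not state Magpie's actual per-request limits or how the service splits text, so `split_text` and its `max_chars` parameter are hypothetical.

```python
import re


def split_text(text: str, max_chars: int) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate  # keep packing sentences into this chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        # A single sentence longer than the limit is split on word boundaries,
        # falling back to a hard cut when it contains no usable space.
        while len(sentence) > max_chars:
            cut = sentence.rfind(" ", 0, max_chars + 1)
            if cut <= 0:
                cut = max_chars
            chunks.append(sentence[:cut].rstrip())
            sentence = sentence[cut:].lstrip()
        current = sentence
    if current:
        chunks.append(current)
    return chunks


# max_chars=20 is an arbitrary demo value, not a documented Magpie limit.
chunks = split_text("Hello there. How are you today? Fine.", max_chars=20)
```

In this sketch each returned chunk would become one request in the stream. Splitting at sentence ends first keeps prosody boundaries natural where the limit allows.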