tts
Fish Audio text-to-speech service implementation.
This module provides integration with Fish Audio’s real-time TTS WebSocket API for streaming text-to-speech synthesis with customizable voice parameters.
- class pipecat.services.fish.tts.FishAudioTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, latency: str | None | _NotGiven = <factory>, normalize: bool | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>, prosody_speed: float | None | _NotGiven = <factory>, prosody_volume: int | None | _NotGiven = <factory>)
Bases: TTSSettings
Settings for FishAudioTTSService.
- Parameters:
latency – Latency mode (“normal” or “balanced”). Defaults to “balanced”.
normalize – Whether to normalize audio output. Defaults to True.
temperature – Controls randomness in speech generation (0.0-1.0).
top_p – Controls diversity via nucleus sampling (0.0-1.0).
prosody_speed – Speech speed multiplier (0.5-2.0). Defaults to 1.0.
prosody_volume – Volume adjustment in dB (-20 to 20). Defaults to 0.
- latency: str | None | _NotGiven
- normalize: bool | None | _NotGiven
- temperature: float | None | _NotGiven
- top_p: float | None | _NotGiven
- prosody_speed: float | None | _NotGiven
- prosody_volume: int | None | _NotGiven
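The documented ranges can be checked before values are handed to the service. The sketch below is a stand-alone illustration and not part of pipecat: `check_fish_settings`, `DOCUMENTED_RANGES`, and `DOCUMENTED_LATENCY_MODES` are hypothetical names, and whether the library itself enforces these ranges is not stated in this reference.

```python
from typing import Any

# Ranges copied from the parameter descriptions above. The helper itself is
# hypothetical and is not provided by pipecat.
DOCUMENTED_RANGES = {
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "prosody_speed": (0.5, 2.0),
    "prosody_volume": (-20, 20),
}
DOCUMENTED_LATENCY_MODES = ("normal", "balanced")


def check_fish_settings(**values: Any) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, value in values.items():
        if value is None:
            continue  # the type annotations allow None for every field
        if name == "latency":
            if value not in DOCUMENTED_LATENCY_MODES:
                problems.append(
                    f"latency must be one of {DOCUMENTED_LATENCY_MODES}, got {value!r}"
                )
        elif name in DOCUMENTED_RANGES:
            low, high = DOCUMENTED_RANGES[name]
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside {low}..{high}")
    return problems
```

A caller could run this on the keyword arguments it intends to pass to `FishAudioTTSSettings` and refuse to continue if the returned list is non-empty.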
- class pipecat.services.fish.tts.FishAudioTTSService(*, api_key: str, reference_id: str | None = None, model_id: str | None = None, output_format: Literal['opus', 'mp3', 'pcm', 'wav'] = 'pcm', sample_rate: int | None = None, params: InputParams | None = None, settings: FishAudioTTSSettings | None = None, **kwargs)
Bases: InterruptibleTTSService
Fish Audio text-to-speech service with WebSocket streaming.
Provides real-time text-to-speech synthesis using Fish Audio’s WebSocket API. Supports various audio formats, customizable prosody controls, and streaming audio generation with interruption handling.
- Settings
alias of FishAudioTTSSettings
- class InputParams(*, language: Language | None = Language.EN, latency: str | None = 'normal', normalize: bool | None = True, prosody_speed: float | None = 1.0, prosody_volume: int | None = 0)
Bases: BaseModel
Input parameters for Fish Audio TTS configuration.
Deprecated since version 0.0.105: Use settings=FishAudioTTSService.Settings(...) instead.
- Parameters:
language – Language for synthesis. Defaults to English.
latency – Latency mode (“normal” or “balanced”). Defaults to “normal”.
normalize – Whether to normalize audio output. Defaults to True.
prosody_speed – Speech speed multiplier (0.5-2.0). Defaults to 1.0.
prosody_volume – Volume adjustment in dB. Defaults to 0.
- latency: str | None
- normalize: bool | None
- prosody_speed: float | None
- prosody_volume: int | None
- __init__(*, api_key: str, reference_id: str | None = None, model_id: str | None = None, output_format: Literal['opus', 'mp3', 'pcm', 'wav'] = 'pcm', sample_rate: int | None = None, params: InputParams | None = None, settings: FishAudioTTSSettings | None = None, **kwargs)
Initialize the Fish Audio TTS service.
- Parameters:
api_key – Fish Audio API key for authentication.
reference_id – Reference ID of the voice model to use for synthesis. Deprecated since version 0.0.105: Use settings=FishAudioTTSService.Settings(voice=...) instead.
model_id – Specify which Fish Audio TTS model to use (e.g. “s1”). Deprecated since version 0.0.105: Use settings=FishAudioTTSService.Settings(model=...) instead.
output_format – Audio output format. Defaults to “pcm”.
sample_rate – Audio sample rate. If None, uses default.
params – Additional input parameters for voice customization. Deprecated since version 0.0.105: Use settings=FishAudioTTSService.Settings(...) instead.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to the parent service.
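The precedence rule can be pictured as a two-step merge: deprecated arguments are applied first, then any field explicitly set in settings overwrites them. The following is a stand-alone sketch of that idea and not pipecat's actual code. `NOT_GIVEN`, `resolve_settings`, and the plain dictionaries are illustrative stand-ins; the mapping of `reference_id` to `voice` and of `model_id` to `model` follows the deprecation notes above.

```python
from typing import Any, Dict, Optional

# Stand-in sentinel meaning "the caller did not set this field".
# It only imitates the role of _NotGiven in the signatures above.
NOT_GIVEN = object()


def resolve_settings(
    *,
    reference_id: Optional[str] = None,
    model_id: Optional[str] = None,
    params: Optional[Dict[str, Any]] = None,
    settings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Merge deprecated arguments with settings; settings win on conflict."""
    resolved: Dict[str, Any] = {}

    # Step 1: start from the deprecated arguments.
    if reference_id is not None:
        resolved["voice"] = reference_id
    if model_id is not None:
        resolved["model"] = model_id
    resolved.update(params or {})

    # Step 2: every field that was explicitly given in settings overrides
    # whatever the deprecated arguments supplied.
    for key, value in (settings or {}).items():
        if value is not NOT_GIVEN:
            resolved[key] = value
    return resolved
```

With this sketch, passing `reference_id="old-voice"` together with `settings={"voice": "new-voice"}` resolves to the voice from settings, while fields left as `NOT_GIVEN` in settings fall back to the deprecated values.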
- can_generate_metrics() → bool
Check if this service can generate processing metrics.
- Returns:
True, as Fish Audio service supports metrics generation.
- async start(frame: StartFrame)
Start the Fish Audio TTS service.
- Parameters:
frame – The start frame containing initialization parameters.
- async stop(frame: EndFrame)
Stop the Fish Audio TTS service.
- Parameters:
frame – The end frame.
- async cancel(frame: CancelFrame)
Cancel the Fish Audio TTS service.
- Parameters:
frame – The cancel frame.
- async flush_audio(context_id: str | None = None)
Flush any buffered audio by sending a flush event to Fish Audio.
- async on_audio_context_interrupted(context_id: str)
Stop all metrics when audio context is interrupted.
- async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]
Generate speech from text using Fish Audio’s streaming API.
- Parameters:
text – The text to synthesize into speech.
context_id – The context ID for tracking audio frames.
- Yields:
Frame – Audio frames and control frames for the synthesized speech.
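Because run_tts is an async generator, a consumer receives frames as they are produced rather than after synthesis finishes. The sketch below shows only that consumption pattern with stand-in types: `FakeAudioFrame`, `fake_run_tts`, and `collect_audio` are hypothetical, and no network or pipecat code is involved.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, Optional


@dataclass
class FakeAudioFrame:
    """Stand-in for an audio frame; not a pipecat class."""
    audio: bytes
    context_id: str


async def fake_run_tts(
    text: str, context_id: str
) -> AsyncGenerator[Optional[FakeAudioFrame], None]:
    """Hypothetical generator with the same shape as run_tts."""
    for word in text.split():
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield FakeAudioFrame(audio=word.encode("utf-8"), context_id=context_id)


async def collect_audio(text: str, context_id: str) -> bytes:
    """Consume frames one at a time and join their audio payloads."""
    chunks = []
    async for frame in fake_run_tts(text, context_id):
        if frame is None:
            continue  # the documented signature allows None items
        chunks.append(frame.audio)
    return b"".join(chunks)
```

Running `asyncio.run(collect_audio("hello fish audio", "ctx-1"))` drives the generator to completion and returns the concatenated stand-in payloads.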