tts

Fish Audio text-to-speech service implementation.

This module provides integration with Fish Audio’s real-time TTS WebSocket API for streaming text-to-speech synthesis with customizable voice parameters.

class pipecat.services.fish.tts.FishAudioTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, latency: str | None | _NotGiven = <factory>, normalize: bool | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>, prosody_speed: float | None | _NotGiven = <factory>, prosody_volume: int | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for FishAudioTTSService.

Parameters:
  • latency – Latency mode (“normal” or “balanced”). Defaults to “balanced”.

  • normalize – Whether to normalize audio output. Defaults to True.

  • temperature – Controls randomness in speech generation (0.0-1.0).

  • top_p – Controls diversity via nucleus sampling (0.0-1.0).

  • prosody_speed – Speech speed multiplier (0.5-2.0). Defaults to 1.0.

  • prosody_volume – Volume adjustment in dB (-20 to 20). Defaults to 0.

latency: str | None | _NotGiven
normalize: bool | None | _NotGiven
temperature: float | None | _NotGiven
top_p: float | None | _NotGiven
prosody_speed: float | None | _NotGiven
prosody_volume: int | None | _NotGiven
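
The documented ranges can be checked before values are handed to the settings object. The following is a minimal sketch, not part of pipecat: the helper name check_fish_settings is hypothetical, and the bounds are copied from the parameter descriptions above. Whether the service itself enforces these ranges is not stated here.

```python
# Allowed values taken from the parameter descriptions above. The helper is a
# hypothetical illustration and is not part of pipecat.
_LATENCY_MODES = {"normal", "balanced"}
_RANGES = {
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "prosody_speed": (0.5, 2.0),
    "prosody_volume": (-20, 20),
}


def check_fish_settings(**values) -> list[str]:
    """Return a list of readable problems; an empty list means all values fit."""
    problems = []
    latency = values.get("latency")
    if latency is not None and latency not in _LATENCY_MODES:
        problems.append(
            f"latency must be one of {sorted(_LATENCY_MODES)}, got {latency!r}"
        )
    for name, (low, high) in _RANGES.items():
        value = values.get(name)
        # None means "not set", so only concrete values are range-checked.
        if value is not None and not (low <= value <= high):
            problems.append(f"{name} must be in [{low}, {high}], got {value!r}")
    return problems
```

Calling check_fish_settings(latency="balanced", temperature=0.7) returns an empty list, while an unknown latency mode or an out-of-range volume each add one entry.
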
classmethod from_mapping(settings: Mapping[str, Any]) → Self

Construct settings from a plain dict, flattening a legacy nested prosody entry into the flat prosody_speed and prosody_volume fields.
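
The destructuring step can be pictured with a small standalone function. This is a sketch under stated assumptions rather than the library's implementation: the function name flatten_legacy_prosody is hypothetical, the nested keys "speed" and "volume" are assumed, and letting an explicit flat key win over the nested value is a design choice of this sketch.

```python
from typing import Any, Mapping


def flatten_legacy_prosody(settings: Mapping[str, Any]) -> dict[str, Any]:
    """Move a nested "prosody" mapping into flat prosody_* keys."""
    flat = dict(settings)
    nested = flat.pop("prosody", None)
    if isinstance(nested, Mapping):
        # Assumed legacy key names: "speed" and "volume".
        for old_key, new_key in (
            ("speed", "prosody_speed"),
            ("volume", "prosody_volume"),
        ):
            # An explicit flat key is kept; the nested value only fills gaps.
            if old_key in nested and new_key not in flat:
                flat[new_key] = nested[old_key]
    return flat
```

With this sketch, {"latency": "normal", "prosody": {"speed": 1.2, "volume": -3}} becomes a dict with latency, prosody_speed, and prosody_volume at the top level.
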

class pipecat.services.fish.tts.FishAudioTTSService(*, api_key: str, reference_id: str | None = None, model_id: str | None = None, output_format: Literal['opus', 'mp3', 'pcm', 'wav'] = 'pcm', sample_rate: int | None = None, params: InputParams | None = None, settings: FishAudioTTSSettings | None = None, **kwargs)

Bases: InterruptibleTTSService

Fish Audio text-to-speech service with WebSocket streaming.

Provides real-time text-to-speech synthesis using Fish Audio’s WebSocket API. Supports various audio formats, customizable prosody controls, and streaming audio generation with interruption handling.

Settings

alias of FishAudioTTSSettings

class InputParams(*, language: Language | None = Language.EN, latency: str | None = 'normal', normalize: bool | None = True, prosody_speed: float | None = 1.0, prosody_volume: int | None = 0)

Bases: BaseModel

Input parameters for Fish Audio TTS configuration.

Deprecated since version 0.0.105: Use settings=FishAudioTTSService.Settings(...) instead.

Parameters:
  • language – Language for synthesis. Defaults to English.

  • latency – Latency mode (“normal” or “balanced”). Defaults to “normal”.

  • normalize – Whether to normalize audio output. Defaults to True.

  • prosody_speed – Speech speed multiplier (0.5-2.0). Defaults to 1.0.

  • prosody_volume – Volume adjustment in dB. Defaults to 0.

language: Language | None
latency: str | None
normalize: bool | None
prosody_speed: float | None
prosody_volume: int | None
__init__(*, api_key: str, reference_id: str | None = None, model_id: str | None = None, output_format: Literal['opus', 'mp3', 'pcm', 'wav'] = 'pcm', sample_rate: int | None = None, params: InputParams | None = None, settings: FishAudioTTSSettings | None = None, **kwargs)

Initialize the Fish Audio TTS service.

Parameters:
  • api_key – Fish Audio API key for authentication.

  • reference_id

    Reference ID of the voice model to use for synthesis.

    Deprecated since version 0.0.105: Use settings=FishAudioTTSService.Settings(voice=...) instead.

  • model_id

    The Fish Audio TTS model to use (e.g. “s1”).

    Deprecated since version 0.0.105: Use settings=FishAudioTTSService.Settings(model=...) instead.

  • output_format – Audio output format. Defaults to “pcm”.

  • sample_rate – Audio sample rate in Hz. If None, the default sample rate is used.

  • params

    Additional input parameters for voice customization.

    Deprecated since version 0.0.105: Use settings=FishAudioTTSService.Settings(...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to the parent service.

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as the Fish Audio service supports metrics generation.

async start(frame: StartFrame)

Start the Fish Audio TTS service.

Parameters:

frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)

Stop the Fish Audio TTS service.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame)

Cancel the Fish Audio TTS service.

Parameters:

frame – The cancel frame.

async flush_audio(context_id: str | None = None)

Flush any buffered audio by sending a flush event to Fish Audio.

async on_audio_context_interrupted(context_id: str)

Stop all metrics when the audio context is interrupted.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using Fish Audio’s streaming API.

Parameters:
  • text – The text to synthesize into speech.

  • context_id – The context ID for tracking audio frames.

Yields:

Frame – Audio frames and control frames for the synthesized speech.
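
The return type AsyncGenerator[Frame | None, None] means a consumer iterates with async for and should tolerate None items. The sketch below shows that consumption pattern with stand-ins only: FakeAudioFrame and fake_run_tts are hypothetical and do not contact Fish Audio. In a real pipeline the framework drives run_tts, so application code rarely calls it directly.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, Optional


@dataclass
class FakeAudioFrame:
    """Stand-in for an audio frame (hypothetical, for illustration)."""

    audio: bytes
    context_id: str


async def fake_run_tts(
    text: str, context_id: str
) -> AsyncGenerator[Optional[FakeAudioFrame], None]:
    """Mimic the shape of run_tts: yield frames, and possibly None."""
    for word in text.split():
        yield FakeAudioFrame(audio=word.encode(), context_id=context_id)
    # The declared type allows None, so this sketch yields one on purpose.
    yield None


async def collect_audio(text: str, context_id: str) -> bytes:
    """Gather the audio payloads of all non-None frames."""
    chunks = []
    async for frame in fake_run_tts(text, context_id):
        if frame is None:
            continue
        chunks.append(frame.audio)
    return b"".join(chunks)


result = asyncio.run(collect_audio("hello world", "ctx-1"))
```
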