tts

Async text-to-speech service implementations.

pipecat.services.asyncai.tts.language_to_async_language(language: Language) → str | None

Convert a Language enum to Async language code.

Parameters:

language – The Language enum value to convert.

Returns:

The corresponding Async language code, or None if not supported.
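The conversion described above is a lookup that returns `None` for anything Async does not support. The sketch below shows that shape under stated assumptions: `Language` is a hypothetical stand-in for pipecat's enum with only a few members, and the mapping table is illustrative. The real set of Async-supported codes may differ.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    """Hypothetical stand-in for pipecat's Language enum (only a few members)."""

    EN = "en"
    EN_US = "en-US"
    ES = "es"
    JA = "ja"


# Hypothetical mapping table; the codes Async actually accepts may differ.
_ASYNC_LANGUAGES = {
    Language.EN: "en",
    Language.EN_US: "en",  # regional variants collapse to the base code here
    Language.ES: "es",
}


def language_to_async_language_sketch(language: Language) -> Optional[str]:
    """Return the Async code for `language`, or None when it is unsupported."""
    return _ASYNC_LANGUAGES.get(language)
```

A dictionary lookup with `.get()` gives the documented "or None if not supported" behavior without a separate membership check.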

class pipecat.services.asyncai.tts.AsyncAITTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for AsyncAITTSService and AsyncAIHttpTTSService.
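The signature above types each field as `... | _NotGiven` with a `<factory>` default. One plausible reading, which is an assumption rather than something this reference states, is that a sentinel marks "not specified" so that `None` remains a legitimate value a caller can set. The sketch below illustrates that pattern; `NotGiven`, `NOT_GIVEN`, `SettingsSketch`, and `given()` are all hypothetical names, not pipecat's implementation.

```python
from dataclasses import dataclass, field, fields
from typing import Union


class NotGiven:
    """Hypothetical sentinel type meaning 'this field was not specified'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = NotGiven()


def _not_given() -> NotGiven:
    # Used as a default_factory, mirroring the <factory> defaults shown above.
    return NOT_GIVEN


@dataclass
class SettingsSketch:
    """Hypothetical stand-in for AsyncAITTSSettings (not the real class)."""

    model: Union[str, None, NotGiven] = field(default_factory=_not_given)
    voice: Union[str, None, NotGiven] = field(default_factory=_not_given)
    language: Union[str, None, NotGiven] = field(default_factory=_not_given)
    extra: dict = field(default_factory=dict)

    def given(self) -> dict:
        """Return only the explicitly provided fields (None counts as provided)."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if f.name != "extra" and not isinstance(getattr(self, f.name), NotGiven)
        }
```

With this design, `SettingsSketch(language=None)` records an explicit request to clear the language, while `SettingsSketch()` leaves every field untouched.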

class pipecat.services.asyncai.tts.AsyncAITTSService(*, api_key: str, voice_id: str | None = None, version: str = 'v1', url: str = 'wss://api.async.com/text_to_speech/websocket/ws', model: str | None = None, sample_rate: int | None = None, encoding: str = 'pcm_s16le', container: str = 'raw', params: InputParams | None = None, settings: AsyncAITTSSettings | None = None, aggregate_sentences: bool | None = None, text_aggregation_mode: TextAggregationMode | None = None, **kwargs)

Bases: WebsocketTTSService

Async TTS service with WebSocket streaming.

Provides text-to-speech using Async’s streaming WebSocket API.

Settings

alias of AsyncAITTSSettings

class InputParams(*, language: Language | None = None)

Bases: BaseModel

Input parameters for Async TTS configuration.

Deprecated since version 0.0.105: Use AsyncAITTSService.Settings directly via the settings parameter instead.

Parameters:

language – Language to use for synthesis.

language: Language | None

__init__(*, api_key: str, voice_id: str | None = None, version: str = 'v1', url: str = 'wss://api.async.com/text_to_speech/websocket/ws', model: str | None = None, sample_rate: int | None = None, encoding: str = 'pcm_s16le', container: str = 'raw', params: InputParams | None = None, settings: AsyncAITTSSettings | None = None, aggregate_sentences: bool | None = None, text_aggregation_mode: TextAggregationMode | None = None, **kwargs)

Initialize the Async TTS service.

Parameters:
  • api_key – Async API key.

  • voice_id

    UUID of the voice to use for synthesis. See the Async documentation for the full list of voices: https://docs.async.com/list-voices-16699698e0

    Deprecated since version 0.0.105: Use settings=AsyncAITTSService.Settings(voice=...) instead.

  • version – Async API version.

  • url – WebSocket URL for Async TTS API.

  • model

    TTS model to use (e.g., “async_flash_v1.0”).

    Deprecated since version 0.0.105: Use settings=AsyncAITTSService.Settings(model=...) instead.

  • sample_rate – Audio sample rate.

  • encoding – Audio encoding format.

  • container – Audio container format.

  • params

    Additional input parameters for voice customization.

    Deprecated since version 0.0.105: Use settings=AsyncAITTSService.Settings(...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • aggregate_sentences

    Whether to aggregate text into sentences before synthesis.

    Deprecated since version 0.0.104: Use text_aggregation_mode instead.

  • text_aggregation_mode – How to aggregate text before synthesis.

  • **kwargs – Additional arguments passed to the parent service.
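The parameter list says that when `settings` is passed alongside deprecated parameters (`voice_id`, `model`, `params`), the `settings` values take precedence. The sketch below shows that merge rule in isolation. `resolve_tts_config` is a hypothetical helper, and a plain `dict` stands in for `AsyncAITTSSettings`; this is not pipecat's internal code.

```python
from typing import Optional


def resolve_tts_config(
    voice_id: Optional[str] = None,
    model: Optional[str] = None,
    params_language: Optional[str] = None,
    settings: Optional[dict] = None,
) -> dict:
    """Merge deprecated constructor arguments with `settings` (hypothetical helper).

    Keys present in `settings` override the corresponding deprecated values.
    """
    # Start from the deprecated arguments...
    resolved = {"voice": voice_id, "model": model, "language": params_language}
    # ...then let every explicitly supplied settings value win.
    for key, value in (settings or {}).items():
        resolved[key] = value
    return resolved
```

With the real classes, the non-deprecated form named in this reference is `settings=AsyncAITTSService.Settings(voice=..., model=...)`, which avoids relying on the precedence rule at all.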

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as the Async service supports metrics generation.

language_to_service_language(language: Language) → str | None

Convert a Language enum to Async language format.

Parameters:

language – The language to convert.

Returns:

The Async-specific language code, or None if not supported.

async start(frame: StartFrame)

Start the Async TTS service.

Parameters:

frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)

Stop the Async TTS service.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame)

Cancel the Async TTS service.

Parameters:

frame – The cancel frame.

async flush_audio(context_id: str | None = None)

Flush any pending audio.

Parameters:

context_id – The specific context to flush. If None, falls back to the currently active context.
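The fallback rule for `context_id` is small but easy to get wrong when an explicit ID and an active context both exist. The sketch below isolates it; `ContextTracker` and `resolve_flush_target` are hypothetical names used only for illustration.

```python
from typing import Optional


class ContextTracker:
    """Hypothetical helper illustrating flush_audio's context fallback."""

    def __init__(self) -> None:
        self.active_context_id: Optional[str] = None

    def resolve_flush_target(self, context_id: Optional[str] = None) -> Optional[str]:
        # An explicit context ID wins; otherwise fall back to the active context.
        # Returns None when nothing is active, meaning there is nothing to flush.
        return context_id if context_id is not None else self.active_context_id
```

Testing `is not None` rather than truthiness keeps an empty-string ID from silently falling back to the active context.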

async push_frame(frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM)

Push a frame downstream with special handling for stop conditions.

Parameters:
  • frame – The frame to push.

  • direction – The direction to push the frame.

async on_audio_context_interrupted(context_id: str)

Close the Async AI context when the bot is interrupted.

async on_audio_context_completed(context_id: str)

Close the Async AI context after all audio has been played.

Async AI does not send a server-side signal when a context is exhausted, so Pipecat must explicitly close it with close_context: True to free server-side resources.
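Both hooks above end the same way: the client tells the server to release a context. The sketch below shows that flow with a fake transport that records messages instead of sending them. Only the `close_context: True` flag comes from this reference; the `context_id` key, the JSON message shape, and the class and helper names are assumptions for illustration.

```python
import asyncio
import json
from typing import List


def build_close_context_message(context_id: str) -> str:
    """Serialize a request to close one server-side context.

    Only the close_context flag is documented; the "context_id" key and
    the overall message shape are assumptions.
    """
    return json.dumps({"context_id": context_id, "close_context": True})


class CloseContextSketch:
    """Hypothetical stand-in showing both hooks sending the same close request."""

    def __init__(self) -> None:
        # Messages a real service would write to its WebSocket.
        self.sent: List[str] = []

    async def _send(self, message: str) -> None:
        self.sent.append(message)

    async def on_audio_context_interrupted(self, context_id: str) -> None:
        await self._send(build_close_context_message(context_id))

    async def on_audio_context_completed(self, context_id: str) -> None:
        await self._send(build_close_context_message(context_id))
```

Because the server never signals exhaustion on its own, both the interruption path and the normal completion path must send the close request, or the context lingers.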

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using the Async API WebSocket endpoint.

Parameters:
  • text – The text to synthesize into speech.

  • context_id – The context ID for tracking audio frames.

Yields:

Frame – Audio frames containing the synthesized speech.
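The return type `AsyncGenerator[Frame | None, None]` means a consumer must tolerate `None` items. The sketch below shows a generator with that shape and a consumer that skips the placeholders. `AudioFrame`, `run_tts_sketch`, and `collect_frames` are hypothetical; the per-word fake audio stands in for real synthesis. Reading a `None` yield as "nothing to push right now" (for example, when audio arrives later from a separate WebSocket receive loop) is an assumption, not something this reference states.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Optional


@dataclass
class AudioFrame:
    """Hypothetical stand-in for a pipecat audio frame."""

    audio: bytes
    context_id: str


async def run_tts_sketch(text: str, context_id: str) -> AsyncGenerator[Optional[AudioFrame], None]:
    # A None yield models "no frame to push at this point" (assumed semantics).
    yield None
    # Fake synthesis: one "audio" chunk per word, tagged with the context ID.
    for word in text.split():
        yield AudioFrame(audio=word.encode(), context_id=context_id)


async def collect_frames(text: str, context_id: str) -> List[AudioFrame]:
    frames: List[AudioFrame] = []
    async for frame in run_tts_sketch(text, context_id):
        if frame is not None:  # skip placeholder yields
            frames.append(frame)
    return frames
```

Tagging each frame with `context_id` is what lets later hooks, such as the context-completion handler, match audio back to the request that produced it.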

class pipecat.services.asyncai.tts.AsyncAIHttpTTSService(*, api_key: str, voice_id: str | None = None, aiohttp_session: ClientSession, model: str | None = None, url: str = 'https://api.async.com', version: str = 'v1', sample_rate: int | None = None, encoding: str = 'pcm_s16le', container: str = 'raw', params: InputParams | None = None, settings: AsyncAITTSSettings | None = None, **kwargs)

Bases: TTSService

HTTP-based Async TTS service.

Provides text-to-speech using Async’s HTTP streaming API for simpler, non-WebSocket integration. Suitable for use cases where a streaming WebSocket connection is not required or desired.

Settings

alias of AsyncAITTSSettings

class InputParams(*, language: Language | None = None)

Bases: BaseModel

Input parameters for the Async API.

Deprecated since version 0.0.105: Use AsyncAIHttpTTSService.Settings directly via the settings parameter instead.

Parameters:

language – Language to use for synthesis.

language: Language | None

__init__(*, api_key: str, voice_id: str | None = None, aiohttp_session: ClientSession, model: str | None = None, url: str = 'https://api.async.com', version: str = 'v1', sample_rate: int | None = None, encoding: str = 'pcm_s16le', container: str = 'raw', params: InputParams | None = None, settings: AsyncAITTSSettings | None = None, **kwargs)

Initialize the Async HTTP TTS service.

Parameters:
  • api_key – Async API key.

  • voice_id

    ID of the voice to use for synthesis.

    Deprecated since version 0.0.105: Use settings=AsyncAIHttpTTSService.Settings(voice=...) instead.

  • aiohttp_session – An aiohttp session for making HTTP requests.

  • model

    TTS model to use (e.g., “async_flash_v1.0”).

    Deprecated since version 0.0.105: Use settings=AsyncAIHttpTTSService.Settings(model=...) instead.

  • url – Base URL for Async API.

  • version – API version string for Async API.

  • sample_rate – Audio sample rate.

  • encoding – Audio encoding format.

  • container – Audio container format.

  • params

    Additional input parameters for voice customization.

    Deprecated since version 0.0.105: Use settings=AsyncAIHttpTTSService.Settings(...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to the parent TTSService.
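The default `encoding='pcm_s16le'` with `container='raw'` means headerless 16-bit signed little-endian PCM, so each sample occupies 2 bytes. That makes byte counts and durations easy to convert, as the sketch below shows. The function names are hypothetical, and mono audio (`channels=1`) is an assumption, since this reference does not state the channel count.

```python
def pcm_s16le_duration_seconds(num_bytes: int, sample_rate: int, channels: int = 1) -> float:
    """Duration of raw pcm_s16le audio; each sample is 2 bytes (16-bit)."""
    bytes_per_second = sample_rate * channels * 2
    return num_bytes / bytes_per_second


def pcm_s16le_bytes_for_ms(duration_ms: int, sample_rate: int, channels: int = 1) -> int:
    """Number of bytes needed to hold `duration_ms` of raw pcm_s16le audio."""
    return sample_rate * channels * 2 * duration_ms // 1000
```

For example, 20 ms of mono audio at 16 kHz is 16000 × 2 × 0.02 = 640 bytes, which is a convenient chunk size to reason about when sizing buffers.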

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as the Async HTTP service supports metrics generation.

language_to_service_language(language: Language) → str | None

Convert a Language enum to Async language format.

Parameters:

language – The language to convert.

Returns:

The Async-specific language code, or None if not supported.

async start(frame: StartFrame)

Start the Async HTTP TTS service.

Parameters:

frame – The start frame containing initialization parameters.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using Async’s HTTP streaming API.

Parameters:
  • text – The text to synthesize into speech.

  • context_id – The context ID for tracking audio frames.

Yields:

Frame – Audio frames containing the synthesized speech.
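With HTTP streaming, network chunk boundaries need not line up with the 2-byte samples of `pcm_s16le`, so an odd-length chunk can split a sample in half. One common way to handle this, shown below as a sketch and not as this service's actual implementation, is to carry leftover bytes into the next chunk. `align_pcm_chunks` and `collect` are hypothetical names, and the fake source stands in for an HTTP response body.

```python
import asyncio
from typing import AsyncIterator, List


async def align_pcm_chunks(chunks: AsyncIterator[bytes], sample_width: int = 2) -> AsyncIterator[bytes]:
    """Re-chunk a byte stream so every yielded chunk holds whole samples."""
    remainder = b""
    async for chunk in chunks:
        data = remainder + chunk
        usable = len(data) - (len(data) % sample_width)
        remainder = data[usable:]  # partial sample carried into the next chunk
        if usable:
            yield data[:usable]
    # A trailing partial sample left in `remainder` is dropped as incomplete.


async def collect(chunks: List[bytes]) -> List[bytes]:
    async def source() -> AsyncIterator[bytes]:
        # Stand-in for chunks read from a streaming HTTP response body.
        for c in chunks:
            yield c

    return [piece async for piece in align_pcm_chunks(source())]
```

Yielding only whole samples keeps downstream consumers from ever seeing a misaligned buffer, at the cost of holding back at most `sample_width - 1` bytes between chunks.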