tts

Rime text-to-speech service implementations.

This module provides both WebSocket and HTTP-based text-to-speech services using Rime’s API for streaming and batch audio synthesis.

pipecat.services.rime.tts.language_to_rime_language(language: Language) → str

Convert pipecat Language to Rime language code.

Parameters:

language – The pipecat Language enum value.

Returns:

Three-letter language code used by Rime (e.g., ‘eng’ for English).

class pipecat.services.rime.tts.RimeTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, segment: str | None | _NotGiven = <factory>, speedAlpha: float | None | _NotGiven = <factory>, reduceLatency: bool | None | _NotGiven = <factory>, pauseBetweenBrackets: bool | None | _NotGiven = <factory>, phonemizeBetweenBrackets: bool | None | _NotGiven = <factory>, noTextNormalization: bool | None | _NotGiven = <factory>, saveOovs: bool | None | _NotGiven = <factory>, inlineSpeedAlpha: str | None | _NotGiven = <factory>, repetition_penalty: float | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for RimeTTSService and RimeHttpTTSService.

Parameters:
  • segment – Text segmentation mode (“immediate”, “bySentence”, “never”).

  • speedAlpha – Speech speed multiplier (mistv2 only).

  • reduceLatency – Whether to reduce latency at potential quality cost (mistv2 only).

  • pauseBetweenBrackets – Whether to add pauses between bracketed content (mistv2 only).

  • phonemizeBetweenBrackets – Whether to phonemize bracketed content (mistv2 only).

  • noTextNormalization – Whether to disable text normalization (mistv2 only).

  • saveOovs – Whether to save out-of-vocabulary words (mistv2 only).

  • inlineSpeedAlpha – Inline speed control markup.

  • repetition_penalty – Token repetition penalty (arcana only, 1.0-2.0).

  • temperature – Sampling temperature (arcana only, 0.0-1.0).

  • top_p – Cumulative probability threshold (arcana only, 0.0-1.0).
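The model qualifiers above mean that any one request only uses part of these settings. As a minimal sketch, the hypothetical helper `params_for_model` groups the fields by their “mistv2 only” and “arcana only” notes and filters a request for one model. This page does not say whether pipecat drops inapplicable fields or forwards them to the API unchanged, so treat the filtering itself as an assumption.

```python
# Hypothetical helper, not part of pipecat. The groupings mirror the
# "(mistv2 only)" and "(arcana only)" notes in the settings list above.
MISTV2_ONLY = {
    "speedAlpha",
    "reduceLatency",
    "pauseBetweenBrackets",
    "phonemizeBetweenBrackets",
    "noTextNormalization",
    "saveOovs",
}
ARCANA_ONLY = {"repetition_penalty", "temperature", "top_p"}


def params_for_model(model: str, params: dict) -> dict:
    """Return only the parameters documented for the given model."""
    if model == "mistv2":
        excluded = ARCANA_ONLY
    elif model == "arcana":
        excluded = MISTV2_ONLY
    else:
        excluded = set()  # unknown model: keep everything
    # Also drop unset (None) values so they are not sent at all.
    return {k: v for k, v in params.items() if k not in excluded and v is not None}


requested = {"segment": "bySentence", "speedAlpha": 1.2, "temperature": 0.5, "top_p": None}
print(params_for_model("arcana", requested))  # {'segment': 'bySentence', 'temperature': 0.5}
```

Model-neutral fields such as segment pass through for every model.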

segment: str | None | _NotGiven
speedAlpha: float | None | _NotGiven
reduceLatency: bool | None | _NotGiven
pauseBetweenBrackets: bool | None | _NotGiven
phonemizeBetweenBrackets: bool | None | _NotGiven
noTextNormalization: bool | None | _NotGiven
saveOovs: bool | None | _NotGiven
inlineSpeedAlpha: str | None | _NotGiven
repetition_penalty: float | None | _NotGiven
temperature: float | None | _NotGiven
top_p: float | None | _NotGiven
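Each field above is typed `X | None | _NotGiven`, which lets an update tell “leave this setting unchanged” apart from “explicitly set it to None”. The sketch below shows that three-state pattern, assuming a stand-in sentinel `NOT_GIVEN` and a hypothetical `SettingsSketch` dataclass in place of pipecat’s internal `_NotGiven` type and settings classes.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


class _NotGivenType:
    """Stand-in sentinel meaning "this setting was not provided"."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGivenType()


@dataclass
class SettingsSketch:
    # Each field is a value, None (explicitly cleared), or NOT_GIVEN (untouched).
    voice: str | None | _NotGivenType = NOT_GIVEN
    segment: str | None | _NotGivenType = NOT_GIVEN
    speedAlpha: float | None | _NotGivenType = NOT_GIVEN

    def given(self) -> dict:
        """Return only the fields the caller actually set, including explicit None."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not NOT_GIVEN
        }


update = SettingsSketch(segment="never", speedAlpha=None)
print(update.given())  # {'segment': 'never', 'speedAlpha': None}
```

A runtime update built this way touches segment and clears speedAlpha while leaving voice as it was.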
class pipecat.services.rime.tts.RimeNonJsonTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, segment: str | None | _NotGiven = <factory>, repetition_penalty: float | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for RimeNonJsonTTSService.

Parameters:
  • segment – Text segmentation mode (“immediate”, “bySentence”, “never”).

  • repetition_penalty – Token repetition penalty (1.0-2.0).

  • temperature – Sampling temperature (0.0-1.0).

  • top_p – Cumulative probability threshold (0.0-1.0).

segment: str | None | _NotGiven
repetition_penalty: float | None | _NotGiven
temperature: float | None | _NotGiven
top_p: float | None | _NotGiven
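The sampling parameters above come with documented ranges. A minimal client-side check could look like the following. `validate_sampling` is hypothetical, the bounds are assumed to be inclusive, and this page does not say whether pipecat or the Rime API enforces these ranges itself.

```python
from __future__ import annotations

# Documented ranges from the parameter list above (inclusive bounds assumed).
SAMPLING_RANGES = {
    "repetition_penalty": (1.0, 2.0),
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
}


def validate_sampling(**params: float | None) -> None:
    """Raise if a provided sampling parameter falls outside its documented range."""
    for name, value in params.items():
        if name not in SAMPLING_RANGES:
            raise TypeError(f"unexpected sampling parameter: {name}")
        if value is None:
            continue  # unset: nothing to check
        low, high = SAMPLING_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")


validate_sampling(repetition_penalty=1.5, temperature=0.7, top_p=None)  # passes silently
```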
class pipecat.services.rime.tts.RimeTTSService(*, api_key: str, voice_id: str | None = None, url: str = 'wss://users-ws.rime.ai/ws3', model: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: RimeTTSSettings | None = None, text_aggregation_mode: TextAggregationMode | None = None, aggregate_sentences: bool | None = None, **kwargs)

Bases: WebsocketTTSService

Text-to-speech service using Rime’s WebSocket API.

Uses Rime’s JSON WebSocket API to convert text to speech with word-level timing information. Supports interruptions and maintains context across multiple messages within a turn.

Settings

alias of RimeTTSSettings

class InputParams(*, language: Language | None = Language.EN, segment: str | None = None, speed_alpha: float | None = None, repetition_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, reduce_latency: bool | None = None, pause_between_brackets: bool | None = None, phonemize_between_brackets: bool | None = None, no_text_normalization: bool | None = None, save_oovs: bool | None = None)

Bases: BaseModel

Configuration parameters for Rime TTS service.

Deprecated since version 0.0.105: Use settings=RimeTTSService.Settings(...) instead.

Parameters:
  • language – Language for synthesis. Defaults to English.

  • segment – Text segmentation mode (“immediate”, “bySentence”, “never”).

  • speed_alpha – Speech speed multiplier.

  • repetition_penalty – Token repetition penalty (arcana only).

  • temperature – Sampling temperature (arcana only).

  • top_p – Cumulative probability threshold (arcana only).

  • reduce_latency – Whether to reduce latency at potential quality cost (mistv2 only).

  • pause_between_brackets – Whether to add pauses between bracketed content (mistv2 only).

  • phonemize_between_brackets – Whether to phonemize bracketed content (mistv2 only).

  • no_text_normalization – Whether to disable text normalization (mistv2 only).

  • save_oovs – Whether to save out-of-vocabulary words (mistv2 only).

language: Language | None
segment: str | None
speed_alpha: float | None
repetition_penalty: float | None
temperature: float | None
top_p: float | None
reduce_latency: bool | None
pause_between_brackets: bool | None
phonemize_between_brackets: bool | None
no_text_normalization: bool | None
save_oovs: bool | None
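The deprecated InputParams fields use snake_case, while the corresponding RimeTTSSettings fields use camelCase for the mistv2 options. As a sketch of migrating to settings=RimeTTSService.Settings(...), the hypothetical helper below renames keys and drops unset values. The rename table is inferred by pairing the parameter lists on this page (it also covers inline_speed_alpha from the HTTP service’s InputParams), so check it against the Settings class before relying on it.

```python
# Hypothetical migration helper, not part of pipecat. Each deprecated
# InputParams name is paired with the RimeTTSSettings field listed earlier.
RENAMES = {
    "speed_alpha": "speedAlpha",
    "reduce_latency": "reduceLatency",
    "pause_between_brackets": "pauseBetweenBrackets",
    "phonemize_between_brackets": "phonemizeBetweenBrackets",
    "no_text_normalization": "noTextNormalization",
    "save_oovs": "saveOovs",
    "inline_speed_alpha": "inlineSpeedAlpha",
}


def migrate_input_params(old: dict) -> dict:
    """Translate deprecated InputParams keyword names to Settings field names."""
    new = {}
    for key, value in old.items():
        if value is None:
            continue  # treated here as "not set"
        new[RENAMES.get(key, key)] = value  # unchanged names (segment, top_p, ...) pass through
    return new


print(migrate_input_params({"speed_alpha": 1.1, "segment": "immediate", "save_oovs": None}))
# {'speedAlpha': 1.1, 'segment': 'immediate'}
```

The resulting dictionary could then be unpacked as `RimeTTSService.Settings(**new_kwargs)`, assuming every key is accepted as a keyword argument, as the RimeTTSSettings signature suggests.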
__init__(*, api_key: str, voice_id: str | None = None, url: str = 'wss://users-ws.rime.ai/ws3', model: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: RimeTTSSettings | None = None, text_aggregation_mode: TextAggregationMode | None = None, aggregate_sentences: bool | None = None, **kwargs)

Initialize Rime TTS service.

Parameters:
  • api_key – Rime API key for authentication.

  • voice_id

    ID of the voice to use.

    Deprecated since version 0.0.105: Use settings=RimeTTSService.Settings(voice=...) instead.

  • url – Rime WebSocket API endpoint.

  • model

    Model ID to use for synthesis.

    Deprecated since version 0.0.105: Use settings=RimeTTSService.Settings(model=...) instead.

  • sample_rate – Audio sample rate in Hz.

  • params

    Additional configuration parameters.

    Deprecated since version 0.0.105: Use settings=RimeTTSService.Settings(...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • text_aggregation_mode – How to aggregate incoming text before synthesis.

  • aggregate_sentences

    Whether to aggregate text into sentences before synthesis.

    Deprecated since version 0.0.104: Use text_aggregation_mode instead.

  • **kwargs – Additional arguments passed to parent class.
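The rule that settings values take precedence over deprecated parameters can be sketched as a simple ordered merge: deprecated arguments are applied first and settings last, so settings win on any conflict. `merge_init_args` is hypothetical, a plain dict stands in for RimeTTSService.Settings, and the voice names are placeholders.

```python
from __future__ import annotations


def merge_init_args(
    voice_id: str | None = None,
    model: str | None = None,
    settings: dict | None = None,
) -> dict:
    """Combine deprecated constructor arguments with settings; settings win on conflict."""
    merged: dict = {}
    if voice_id is not None:
        merged["voice"] = voice_id  # deprecated voice_id corresponds to Settings(voice=...)
    if model is not None:
        merged["model"] = model  # deprecated model corresponds to Settings(model=...)
    merged.update(settings or {})  # settings are applied last, so they take precedence
    return merged


print(merge_init_args(voice_id="old-voice", model="mistv2", settings={"voice": "new-voice"}))
# {'voice': 'new-voice', 'model': 'mistv2'}
```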

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as the Rime service supports metrics generation.

language_to_service_language(language: Language) → str | None

Convert pipecat language to Rime language code.

Parameters:

language – The language to convert.

Returns:

The Rime-specific language code, or None if not supported.

SPELL() → str

Wrap text in Rime’s spell function.

PAUSE_TAG() → str

Convenience method to create a pause tag.

PRONOUNCE(text: str, word: str, phoneme: str) → str

Convenience method to support Rime’s custom pronunciations feature.

https://docs.rime.ai/api-reference/custom-pronunciation

INLINE_SPEED(text: str, speed: float) → str

Convenience method to support inline speeds.
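The PRONOUNCE and INLINE_SPEED signatures suggest simple text transformations. The sketch below is a hypothetical `MarkupSketch` class, not pipecat’s implementation. It assumes that custom pronunciations are written as phonemes inside curly braces (used together with phonemizeBetweenBrackets), and that inline speeds wrap a span in square brackets while the per-span speeds are collected into the comma-separated inlineSpeedAlpha string. Confirm the exact syntax against Rime’s custom pronunciation documentation linked above.

```python
from __future__ import annotations


class MarkupSketch:
    """Hypothetical builder mirroring PRONOUNCE/INLINE_SPEED; markup syntax is assumed."""

    def __init__(self) -> None:
        self.speeds: list[float] = []

    def pronounce(self, text: str, word: str, phoneme: str) -> str:
        # Assumed syntax: replace each occurrence of `word` with {phoneme}.
        return text.replace(word, "{" + phoneme + "}")

    def inline_speed(self, text: str, speed: float) -> str:
        # Assumed syntax: bracket the span and remember its speed, in order.
        self.speeds.append(speed)
        return f"[{text}]"

    def inline_speed_alpha(self) -> str:
        # One comma-separated value per bracketed span, as a string setting.
        return ",".join(str(s) for s in self.speeds)


m = MarkupSketch()
text = m.pronounce("Say Rime clearly", "Rime", "phonemes-here")  # placeholder phonemes
slow = m.inline_speed("very slowly", 0.5)
print(text, slow, m.inline_speed_alpha())  # Say {phonemes-here} clearly [very slowly] 0.5
```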

async start(frame: StartFrame)

Start the service and establish the WebSocket connection.

Parameters:

frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)

Stop the service and close connection.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame)

Cancel current operation and clean up.

Parameters:

frame – The cancel frame.

async on_audio_context_interrupted(context_id: str)

Clear the Rime speech queue and stop metrics when the bot is interrupted.

async on_audio_context_completed(context_id: str)

Clear server-side state and stop metrics after the Rime context finishes playing.

Sends a clear message to clean up any residual server-side state once all audio has been delivered.

async flush_audio(context_id: str | None = None)

Flush any pending audio synthesis.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using Rime’s streaming API.

Parameters:
  • text – The text to convert to speech.

  • context_id – Unique identifier for this TTS context.

Yields:

Frame – Audio frames containing the synthesized speech.
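run_tts is an async generator typed AsyncGenerator[Frame | None, None], so a consumer drains it with `async for` and skips None values. The sketch below illustrates that contract only. `AudioFrameSketch` and `fake_run_tts` are hypothetical stand-ins for pipecat’s frame classes and the real service, and the fake emits silent 10 ms chunks (16 kHz, 16-bit mono) instead of contacting Rime. Inside a pipeline, pipecat normally drives run_tts itself.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class AudioFrameSketch:
    """Stand-in for a pipecat audio frame; the real frame classes differ."""

    context_id: str
    audio: bytes


async def fake_run_tts(text: str, context_id: str) -> AsyncGenerator[AudioFrameSketch | None, None]:
    yield None  # the signature allows None; consumers should skip it
    for _word in text.split():
        await asyncio.sleep(0)  # simulate waiting on the network
        # One silent 10 ms chunk per word: 160 samples x 2 bytes.
        yield AudioFrameSketch(context_id, b"\x00\x00" * 160)


async def collect(text: str) -> list[AudioFrameSketch]:
    frames = []
    async for frame in fake_run_tts(text, context_id="ctx-1"):
        if frame is not None:
            frames.append(frame)
    return frames


frames = asyncio.run(collect("hello from rime"))
print(len(frames))  # 3
```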

class pipecat.services.rime.tts.RimeHttpTTSService(*, api_key: str, voice_id: str | None = None, aiohttp_session: ClientSession, model: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: RimeTTSSettings | None = None, **kwargs)

Bases: TTSService

Rime HTTP-based text-to-speech service.

Provides text-to-speech synthesis using Rime’s HTTP API for batch processing. Suitable for use cases where streaming is not required.

Settings

alias of RimeTTSSettings

class InputParams(*, language: Language | None = Language.EN, pause_between_brackets: bool | None = False, phonemize_between_brackets: bool | None = False, inline_speed_alpha: str | None = None, speed_alpha: float | None = 1.0, reduce_latency: bool | None = False)

Bases: BaseModel

Configuration parameters for Rime HTTP TTS service.

Deprecated since version 0.0.105: Use settings=RimeHttpTTSService.Settings(...) instead.

Parameters:
  • language – Language for synthesis. Defaults to English.

  • pause_between_brackets – Whether to add pauses between bracketed content.

  • phonemize_between_brackets – Whether to phonemize bracketed content.

  • inline_speed_alpha – Inline speed control markup.

  • speed_alpha – Speech speed multiplier. Defaults to 1.0.

  • reduce_latency – Whether to reduce latency at potential quality cost.

language: Language | None
pause_between_brackets: bool | None
phonemize_between_brackets: bool | None
inline_speed_alpha: str | None
speed_alpha: float | None
reduce_latency: bool | None
__init__(*, api_key: str, voice_id: str | None = None, aiohttp_session: ClientSession, model: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: RimeTTSSettings | None = None, **kwargs)

Initialize Rime HTTP TTS service.

Parameters:
  • api_key – Rime API key for authentication.

  • voice_id

    ID of the voice to use.

    Deprecated since version 0.0.105: Use settings=RimeHttpTTSService.Settings(voice=...) instead.

  • aiohttp_session – Shared aiohttp session for HTTP requests.

  • model

    Model ID to use for synthesis.

    Deprecated since version 0.0.105: Use settings=RimeHttpTTSService.Settings(model=...) instead.

  • sample_rate – Audio sample rate in Hz.

  • params

    Additional configuration parameters.

    Deprecated since version 0.0.105: Use settings=RimeHttpTTSService.Settings(...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to the parent TTSService.

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as the Rime HTTP service supports metrics generation.

language_to_service_language(language: Language) → str | None

Convert pipecat language to Rime language code.

Parameters:

language – The language to convert.

Returns:

The Rime-specific language code, or None if not supported.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using Rime’s HTTP API.

Parameters:
  • text – The text to synthesize into speech.

  • context_id – The context ID for tracking audio frames.

Yields:

Frame – Audio frames containing the synthesized speech.
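When a synthesis response arrives as one block of raw audio, it is usually split into fixed-duration chunks before being emitted as frames. The sketch below shows that arithmetic under stated assumptions. The response is assumed to be 16-bit mono PCM, and the helper `chunk_pcm` and its 20 ms default are hypothetical rather than pipecat’s actual chunking logic.

```python
from __future__ import annotations


def chunk_pcm(
    audio: bytes,
    sample_rate: int,
    chunk_ms: int = 20,
    sample_width: int = 2,
    channels: int = 1,
) -> list[bytes]:
    """Split raw PCM into fixed-duration chunks; the last chunk may be shorter."""
    samples_per_chunk = sample_rate * chunk_ms // 1000
    if samples_per_chunk <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    # Work in whole samples so a chunk never splits a multi-byte sample.
    bytes_per_chunk = samples_per_chunk * sample_width * channels
    return [audio[i : i + bytes_per_chunk] for i in range(0, len(audio), bytes_per_chunk)]


one_second = b"\x00\x00" * 24000  # 1 s of 16-bit mono silence at 24 kHz
chunks = chunk_pcm(one_second, sample_rate=24000)
print(len(chunks), len(chunks[0]))  # 50 960
```

At 24 kHz, 20 ms is 480 samples, or 960 bytes at 2 bytes per sample, which gives 50 chunks per second.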

class pipecat.services.rime.tts.RimeNonJsonTTSService(*, api_key: str, voice_id: str | None = None, url: str = 'wss://users.rime.ai/ws', model: str | None = None, audio_format: str = 'pcm', sample_rate: int | None = None, params: InputParams | None = None, settings: RimeNonJsonTTSSettings | None = None, aggregate_sentences: bool | None = None, text_aggregation_mode: TextAggregationMode | None = None, **kwargs)

Bases: InterruptibleTTSService

Pipecat TTS service for Rime’s non-JSON WebSocket API.

Deprecated since version 0.0.102: Arcana now supports the JSON WebSocket API with word-level timestamps via the wss://users-ws.rime.ai/ws3 endpoint. Use RimeTTSService with model="arcana" instead.

This service enables text-to-speech synthesis over WebSocket endpoints that require plain-text (not JSON) messages and return raw audio bytes.

Limitations:
  • Does not support word-level timestamps or context IDs.

  • Intended specifically for integrations where the TTS provider only accepts and returns non-JSON messages.

Settings

alias of RimeNonJsonTTSSettings

class InputParams(*, language: Language | None = None, segment: str | None = None, repetition_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, extra: dict[str, Any] | None = None)

Bases: BaseModel

Configuration parameters for Rime Non-JSON WebSocket TTS service.

Deprecated since version 0.0.105: Use settings=RimeNonJsonTTSService.Settings(...) instead.

Parameters:
  • language – Language for synthesis. If unset (None), English is used.

  • segment – Text segmentation mode (“immediate”, “bySentence”, “never”).

  • repetition_penalty – Token repetition penalty (1.0-2.0).

  • temperature – Sampling temperature (0.0-1.0).

  • top_p – Cumulative probability threshold (0.0-1.0).

  • extra – Additional parameters to pass to the API (for future compatibility).

language: Language | None
segment: str | None
repetition_penalty: float | None
temperature: float | None
top_p: float | None
extra: dict[str, Any] | None
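This page does not say how the non-JSON service transmits its settings. A common pattern for WebSocket endpoints that only accept plain-text messages is to pass configuration in the connection URL’s query string. The sketch below assumes that pattern and shows how the documented fields plus `extra` could be encoded. `build_ws_url` and the `newOption` key are hypothetical, and merging `extra` last (so it can add or override keys) is also an assumption.

```python
from __future__ import annotations

from urllib.parse import urlencode


def build_ws_url(base_url: str, params: dict, extra: dict | None = None) -> str:
    """Hypothetical: encode settings (and `extra`) as a query string on the endpoint."""
    query = {k: v for k, v in params.items() if v is not None}  # skip unset values
    query.update(extra or {})  # assumption: extra keys are merged last
    return f"{base_url}?{urlencode(query)}" if query else base_url


url = build_ws_url(
    "wss://users.rime.ai/ws",
    {"segment": "bySentence", "temperature": 0.5, "top_p": None},
    extra={"newOption": "on"},
)
print(url)  # wss://users.rime.ai/ws?segment=bySentence&temperature=0.5&newOption=on
```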
__init__(*, api_key: str, voice_id: str | None = None, url: str = 'wss://users.rime.ai/ws', model: str | None = None, audio_format: str = 'pcm', sample_rate: int | None = None, params: InputParams | None = None, settings: RimeNonJsonTTSSettings | None = None, aggregate_sentences: bool | None = None, text_aggregation_mode: TextAggregationMode | None = None, **kwargs)

Initialize Rime Non-JSON WebSocket TTS service.

Parameters:
  • api_key – Rime API key for authentication.

  • voice_id

    ID of the voice to use.

    Deprecated since version 0.0.105: Use settings=RimeNonJsonTTSService.Settings(voice=...) instead.

  • url – Rime WebSocket API endpoint.

  • model

    Model ID to use for synthesis.

    Deprecated since version 0.0.105: Use settings=RimeNonJsonTTSService.Settings(model=...) instead.

  • audio_format – Audio format to request from Rime. Defaults to “pcm”.

  • sample_rate – Audio sample rate in Hz.

  • params

    Additional configuration parameters.

    Deprecated since version 0.0.105: Use settings=RimeNonJsonTTSService.Settings(...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • aggregate_sentences

    Whether to aggregate text into sentences before synthesis.

    Deprecated since version 0.0.104: Use text_aggregation_mode instead. Set to TextAggregationMode.SENTENCE to aggregate text into sentences before synthesis, or TextAggregationMode.TOKEN to stream tokens directly for lower latency.

  • text_aggregation_mode – How to aggregate text before synthesis.

  • **kwargs – Additional arguments passed to parent class.
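The two aggregation modes described above trade latency for sentence-level context: sentence mode buffers tokens until a sentence is complete, while token mode forwards each token immediately. A minimal sketch of that difference, using a hypothetical `Mode` enum as a stand-in for TextAggregationMode and a deliberately naive end-of-sentence check (abbreviations such as “Mr.” would split early):

```python
from __future__ import annotations

from enum import Enum
from typing import Iterable, Iterator


class Mode(Enum):
    """Stand-in for TextAggregationMode; only the two documented modes."""

    SENTENCE = "sentence"
    TOKEN = "token"


def aggregate(tokens: Iterable[str], mode: Mode) -> Iterator[str]:
    """Yield the text chunks that would be sent to the TTS service."""
    if mode is Mode.TOKEN:
        yield from tokens  # lowest latency: every token goes straight through
        return
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()  # a full sentence is ready
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush any trailing partial sentence


llm_tokens = ["Hello", " there", ".", " How", " are", " you", "?"]
print(list(aggregate(llm_tokens, Mode.SENTENCE)))  # ['Hello there.', 'How are you?']
print(len(list(aggregate(llm_tokens, Mode.TOKEN))))  # 7
```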

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as the Rime Non-JSON WebSocket service supports metrics generation.

language_to_service_language(language: Language) → str

Convert pipecat Language enum to Rime language code.

Parameters:

language – The Language enum value to convert.

Returns:

Three-letter Rime language code (e.g., ‘eng’ for English). Falls back to the language’s base code with a warning if not in the verified list.
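The lookup-with-fallback behavior described above can be sketched as follows. `VERIFIED` is a hypothetical table containing only the English entry confirmed on this page. The input is assumed to be a language string such as "en-US" rather than the Language enum, and the standard `logging` module stands in for whatever logger pipecat uses.

```python
import logging

logger = logging.getLogger("rime_language_sketch")

# Hypothetical verified table; only "eng" for English is confirmed by this page.
VERIFIED = {"en": "eng"}


def to_rime_language(code: str) -> str:
    """Map a code such as "en-US" to a Rime code, falling back to the base code."""
    base = code.split("-")[0].lower()
    if base in VERIFIED:
        return VERIFIED[base]
    logger.warning("Language %s is not in the verified list; falling back to %s", code, base)
    return base


print(to_rime_language("en-US"))  # eng
```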

async start(frame: StartFrame)

Start the Rime Non-JSON WebSocket TTS service.

Parameters:

frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)

Stop the service and close connection.

async cancel(frame: CancelFrame)

Cancel current operation and clean up.

async flush_audio(context_id: str | None = None)

Flush any pending audio synthesis.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using Rime’s streaming API.

Parameters:
  • text – The text to synthesize into speech.

  • context_id – The context ID for tracking audio frames.

Yields:

Frame – Audio frames containing the synthesized speech.