tts
Rime text-to-speech service implementations.
This module provides both WebSocket and HTTP-based text-to-speech services using Rime’s API for streaming and batch audio synthesis.
- pipecat.services.rime.tts.language_to_rime_language(language: Language) str[source]
Convert pipecat Language to Rime language code.
- Parameters:
language – The pipecat Language enum value.
- Returns:
Three-letter language code used by Rime (e.g., ‘eng’ for English).
- class pipecat.services.rime.tts.RimeTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, ~typing.Any]=<factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, segment: str | None | _NotGiven = <factory>, speedAlpha: float | None | _NotGiven = <factory>, reduceLatency: bool | None | _NotGiven = <factory>, pauseBetweenBrackets: bool | None | _NotGiven = <factory>, phonemizeBetweenBrackets: bool | None | _NotGiven = <factory>, noTextNormalization: bool | None | _NotGiven = <factory>, saveOovs: bool | None | _NotGiven = <factory>, inlineSpeedAlpha: str | None | _NotGiven = <factory>, repetition_penalty: float | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>)[source]
Bases: TTSSettings
Settings for RimeTTSService and RimeHttpTTSService.
- Parameters:
segment – Text segmentation mode (“immediate”, “bySentence”, “never”).
speedAlpha – Speech speed multiplier (mistv2 only).
reduceLatency – Whether to reduce latency at potential quality cost (mistv2 only).
pauseBetweenBrackets – Whether to add pauses between bracketed content (mistv2 only).
phonemizeBetweenBrackets – Whether to phonemize bracketed content (mistv2 only).
noTextNormalization – Whether to disable text normalization (mistv2 only).
saveOovs – Whether to save out-of-vocabulary words (mistv2 only).
inlineSpeedAlpha – Inline speed control markup.
repetition_penalty – Token repetition penalty (arcana only, 1.0-2.0).
temperature – Sampling temperature (arcana only, 0.0-1.0).
top_p – Cumulative probability threshold (arcana only, 0.0-1.0).
- segment: str | None | _NotGiven
- speedAlpha: float | None | _NotGiven
- reduceLatency: bool | None | _NotGiven
- pauseBetweenBrackets: bool | None | _NotGiven
- phonemizeBetweenBrackets: bool | None | _NotGiven
- noTextNormalization: bool | None | _NotGiven
- saveOovs: bool | None | _NotGiven
- inlineSpeedAlpha: str | None | _NotGiven
- repetition_penalty: float | None | _NotGiven
- temperature: float | None | _NotGiven
- top_p: float | None | _NotGiven
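The model annotations and value ranges in the parameter list above can be checked before a settings object is built. The following is a minimal sketch in plain Python; `check_rime_settings` is a hypothetical helper, not part of pipecat, and treating the documented ranges as inclusive is an assumption. Whether the service performs any validation of its own is not stated here.

```python
from typing import Any

# Parameter groups as annotated in the parameter list above.
MISTV2_ONLY = {
    "speedAlpha", "reduceLatency", "pauseBetweenBrackets",
    "phonemizeBetweenBrackets", "noTextNormalization", "saveOovs",
}
ARCANA_ONLY = {"repetition_penalty", "temperature", "top_p"}

# Documented ranges; treating both bounds as inclusive is an assumption.
RANGES = {
    "repetition_penalty": (1.0, 2.0),
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
}
SEGMENT_MODES = {"immediate", "bySentence", "never"}


def check_rime_settings(model: str, values: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    for name, value in values.items():
        if value is None:
            continue  # None is treated as "not set"
        if model == "arcana" and name in MISTV2_ONLY:
            problems.append(f"{name} is documented as mistv2 only")
        if model == "mistv2" and name in ARCANA_ONLY:
            problems.append(f"{name} is documented as arcana only")
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                problems.append(f"{name}={value} is outside {low}-{high}")
        if name == "segment" and value not in SEGMENT_MODES:
            problems.append(f"segment={value!r} is not a documented mode")
    return problems
```

Calling `check_rime_settings("arcana", {"speedAlpha": 1.2})` flags the mistv2-only field, while a dictionary holding only in-range arcana parameters returns an empty list.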
- class pipecat.services.rime.tts.RimeNonJsonTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, ~typing.Any]=<factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, segment: str | None | _NotGiven = <factory>, repetition_penalty: float | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>)[source]
Bases: TTSSettings
Settings for RimeNonJsonTTSService.
- Parameters:
segment – Text segmentation mode (“immediate”, “bySentence”, “never”).
repetition_penalty – Token repetition penalty (1.0-2.0).
temperature – Sampling temperature (0.0-1.0).
top_p – Cumulative probability threshold (0.0-1.0).
- segment: str | None | _NotGiven
- repetition_penalty: float | None | _NotGiven
- temperature: float | None | _NotGiven
- top_p: float | None | _NotGiven
- class pipecat.services.rime.tts.RimeTTSService(*, api_key: str, voice_id: str | None = None, url: str = 'wss://users-ws.rime.ai/ws3', model: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: RimeTTSSettings | None = None, text_aggregation_mode: TextAggregationMode | None = None, aggregate_sentences: bool | None = None, **kwargs)[source]
Bases: WebsocketTTSService
Text-to-Speech service using Rime’s websocket API.
Uses Rime’s websocket JSON API to convert text to speech with word-level timing information. Supports interruptions and maintains context across multiple messages within a turn.
- Settings
alias of RimeTTSSettings
- class InputParams(*, language: Language | None = Language.EN, segment: str | None = None, speed_alpha: float | None = None, repetition_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, reduce_latency: bool | None = None, pause_between_brackets: bool | None = None, phonemize_between_brackets: bool | None = None, no_text_normalization: bool | None = None, save_oovs: bool | None = None)[source]
Bases: BaseModel
Configuration parameters for Rime TTS service.
Deprecated since version 0.0.105: Use settings=RimeTTSService.Settings(...) instead.
- Parameters:
language – Language for synthesis. Defaults to English.
segment – Text segmentation mode (“immediate”, “bySentence”, “never”).
speed_alpha – Speech speed multiplier.
repetition_penalty – Token repetition penalty (arcana only).
temperature – Sampling temperature (arcana only).
top_p – Cumulative probability threshold (arcana only).
reduce_latency – Whether to reduce latency at potential quality cost (mistv2 only).
pause_between_brackets – Whether to add pauses between bracketed content (mistv2 only).
phonemize_between_brackets – Whether to phonemize bracketed content (mistv2 only).
no_text_normalization – Whether to disable text normalization (mistv2 only).
save_oovs – Whether to save out-of-vocabulary words (mistv2 only).
- segment: str | None
- speed_alpha: float | None
- repetition_penalty: float | None
- temperature: float | None
- top_p: float | None
- reduce_latency: bool | None
- pause_between_brackets: bool | None
- phonemize_between_brackets: bool | None
- no_text_normalization: bool | None
- save_oovs: bool | None
- __init__(*, api_key: str, voice_id: str | None = None, url: str = 'wss://users-ws.rime.ai/ws3', model: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: RimeTTSSettings | None = None, text_aggregation_mode: TextAggregationMode | None = None, aggregate_sentences: bool | None = None, **kwargs)[source]
Initialize Rime TTS service.
- Parameters:
api_key – Rime API key for authentication.
voice_id – ID of the voice to use.
Deprecated since version 0.0.105: Use settings=RimeTTSService.Settings(voice=...) instead.
url – Rime websocket API endpoint.
model – Model ID to use for synthesis.
Deprecated since version 0.0.105: Use settings=RimeTTSService.Settings(model=...) instead.
sample_rate – Audio sample rate in Hz.
params – Additional configuration parameters.
Deprecated since version 0.0.105: Use settings=RimeTTSService.Settings(...) instead.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
text_aggregation_mode – How to aggregate incoming text before synthesis.
aggregate_sentences – Deprecated since version 0.0.104: Use text_aggregation_mode instead.
**kwargs – Additional arguments passed to parent class.
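Since voice_id, model, and params are deprecated in favour of settings, a construction call might look like the sketch below. It assumes pipecat is installed; the RIME_API_KEY environment variable name and the "example-voice" voice ID are placeholders, and `build_rime_tts` is a hypothetical helper name.

```python
import os


def build_rime_tts():
    """Create a RimeTTSService configured through ``settings`` only."""
    # Imported lazily so this snippet can be loaded without pipecat installed.
    from pipecat.services.rime.tts import RimeTTSService

    return RimeTTSService(
        api_key=os.environ["RIME_API_KEY"],  # assumed environment variable name
        settings=RimeTTSService.Settings(
            voice="example-voice",  # placeholder voice ID
            model="arcana",
            segment="bySentence",
            temperature=0.7,  # documented as arcana only, 0.0-1.0
        ),
    )
```

Fields left out of Settings(...) keep their not-given defaults, so only the values being overridden need to be listed.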
- can_generate_metrics() bool[source]
Check if this service can generate processing metrics.
- Returns:
True, as Rime service supports metrics generation.
- language_to_service_language(language: Language) str | None[source]
Convert pipecat language to Rime language code.
- Parameters:
language – The language to convert.
- Returns:
The Rime-specific language code, or None if not supported.
- PRONOUNCE(text: str, word: str, phoneme: str) str[source]
Convenience method to support Rime’s custom pronunciations feature.
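The effect of a pronunciation helper of this kind can be illustrated in plain Python. The sketch below is a guess at the behaviour, not the library's implementation: `with_custom_pronunciation` is a hypothetical function, and it assumes that curly brackets are the delimiters Rime phonemizes when phonemizeBetweenBrackets is enabled. The phoneme string in the usage note is a placeholder.

```python
def with_custom_pronunciation(text: str, word: str, phoneme: str) -> str:
    """Replace each occurrence of ``word`` with a bracketed phoneme string."""
    # Assumption: curly brackets mark the span to be phonemized.
    return text.replace(word, "{" + phoneme + "}")
```

For example, `with_custom_pronunciation("Say tomato now", "tomato", "t0m1At0")` returns the same sentence with the word swapped for the bracketed placeholder.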
- async start(frame: StartFrame)[source]
Start the service and establish websocket connection.
- Parameters:
frame – The start frame containing initialization parameters.
- async stop(frame: EndFrame)[source]
Stop the service and close connection.
- Parameters:
frame – The end frame.
- async cancel(frame: CancelFrame)[source]
Cancel current operation and clean up.
- Parameters:
frame – The cancel frame.
- async on_audio_context_interrupted(context_id: str)[source]
Clear the Rime speech queue and stop metrics when the bot is interrupted.
- async on_audio_context_completed(context_id: str)[source]
Clear server-side state and stop metrics after the Rime context finishes playing.
Sends a clear message to clean up any residual server-side state once all audio has been delivered.
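The exact shape of the clear message is not given here. As a rough sketch, and assuming the websocket JSON API accepts control messages as a JSON object with an "operation" key, it could be serialized as follows; `clear_message` is a hypothetical helper.

```python
import json


def clear_message() -> str:
    """Serialize a control message asking the server to drop residual state."""
    # Assumption: control messages are JSON objects keyed by "operation".
    return json.dumps({"operation": "clear"})
```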
- async run_tts(text: str, context_id: str) AsyncGenerator[Frame | None, None][source]
Generate speech from text using Rime’s streaming API.
- Parameters:
text – The text to convert to speech.
context_id – Unique identifier for this TTS context.
- Yields:
Frame – Audio frames containing the synthesized speech.
- class pipecat.services.rime.tts.RimeHttpTTSService(*, api_key: str, voice_id: str | None = None, aiohttp_session: ClientSession, model: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: RimeTTSSettings | None = None, **kwargs)[source]
Bases: TTSService
Rime HTTP-based text-to-speech service.
Provides text-to-speech synthesis using Rime’s HTTP API for batch processing. Suitable for use cases where streaming is not required.
- Settings
alias of RimeTTSSettings
- class InputParams(*, language: Language | None = Language.EN, pause_between_brackets: bool | None = False, phonemize_between_brackets: bool | None = False, inline_speed_alpha: str | None = None, speed_alpha: float | None = 1.0, reduce_latency: bool | None = False)[source]
Bases: BaseModel
Configuration parameters for Rime HTTP TTS service.
Deprecated since version 0.0.105: Use settings=RimeHttpTTSService.Settings(...) instead.
- Parameters:
language – Language for synthesis. Defaults to English.
pause_between_brackets – Whether to add pauses between bracketed content.
phonemize_between_brackets – Whether to phonemize bracketed content.
inline_speed_alpha – Inline speed control markup.
speed_alpha – Speech speed multiplier. Defaults to 1.0.
reduce_latency – Whether to reduce latency at potential quality cost.
- pause_between_brackets: bool | None
- phonemize_between_brackets: bool | None
- inline_speed_alpha: str | None
- speed_alpha: float | None
- reduce_latency: bool | None
- __init__(*, api_key: str, voice_id: str | None = None, aiohttp_session: ClientSession, model: str | None = None, sample_rate: int | None = None, params: InputParams | None = None, settings: RimeTTSSettings | None = None, **kwargs)[source]
Initialize Rime HTTP TTS service.
- Parameters:
api_key – Rime API key for authentication.
voice_id – ID of the voice to use.
Deprecated since version 0.0.105: Use settings=RimeHttpTTSService.Settings(voice=...) instead.
aiohttp_session – Shared aiohttp session for HTTP requests.
model – Model ID to use for synthesis.
Deprecated since version 0.0.105: Use settings=RimeHttpTTSService.Settings(model=...) instead.
sample_rate – Audio sample rate in Hz.
params – Additional configuration parameters.
Deprecated since version 0.0.105: Use settings=RimeHttpTTSService.Settings(...) instead.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to parent TTSService.
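Because aiohttp_session is described as a shared session, the caller is expected to create it and pass it in. The sketch below shows one way to do that, assuming pipecat and aiohttp are installed; `build_rime_http_tts`, the RIME_API_KEY variable name, and the "example-voice" voice ID are placeholders for illustration.

```python
import os


def build_rime_http_tts(session):
    """Create a RimeHttpTTSService that reuses the caller's aiohttp session."""
    # Imported lazily so this snippet can be loaded without pipecat installed.
    from pipecat.services.rime.tts import RimeHttpTTSService

    return RimeHttpTTSService(
        api_key=os.environ["RIME_API_KEY"],  # assumed environment variable name
        aiohttp_session=session,  # owned and closed by the caller
        settings=RimeHttpTTSService.Settings(
            voice="example-voice",  # placeholder voice ID
            model="mistv2",
            speedAlpha=1.0,  # documented as mistv2 only
            reduceLatency=False,  # documented as mistv2 only
        ),
    )
```

A typical caller would open one `aiohttp.ClientSession()` for the lifetime of the application, pass it to this function, and close it after the pipeline has stopped.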
- can_generate_metrics() bool[source]
Check if this service can generate processing metrics.
- Returns:
True, as Rime HTTP service supports metrics generation.
- language_to_service_language(language: Language) str | None[source]
Convert pipecat language to Rime language code.
- Parameters:
language – The language to convert.
- Returns:
The Rime-specific language code, or None if not supported.
- async run_tts(text: str, context_id: str) AsyncGenerator[Frame | None, None][source]
Generate speech from text using Rime’s HTTP API.
- Parameters:
text – The text to synthesize into speech.
context_id – The context ID for tracking audio frames.
- Yields:
Frame – Audio frames containing the synthesized speech.
- class pipecat.services.rime.tts.RimeNonJsonTTSService(*, api_key: str, voice_id: str | None = None, url: str = 'wss://users.rime.ai/ws', model: str | None = None, audio_format: str = 'pcm', sample_rate: int | None = None, params: InputParams | None = None, settings: RimeNonJsonTTSSettings | None = None, aggregate_sentences: bool | None = None, text_aggregation_mode: TextAggregationMode | None = None, **kwargs)[source]
Bases: InterruptibleTTSService
Pipecat TTS service for Rime’s non-JSON WebSocket API.
Deprecated since version 0.0.102: Arcana now supports JSON WebSocket with word-level timestamps via the wss://users-ws.rime.ai/ws3 endpoint. Use RimeTTSService with model="arcana" instead.
This service enables Text-to-Speech synthesis over WebSocket endpoints that require plain text (not JSON) messages and return raw audio bytes.
- Limitations:
Does not support word-level timestamps or context IDs.
Intended specifically for integrations where the TTS provider only accepts and returns non-JSON messages.
- Settings
alias of RimeNonJsonTTSSettings
- class InputParams(*, language: Language | None = None, segment: str | None = None, repetition_penalty: float | None = None, temperature: float | None = None, top_p: float | None = None, extra: dict[str, Any] | None = None)[source]
Bases: BaseModel
Configuration parameters for Rime Non-JSON WebSocket TTS service.
Deprecated since version 0.0.105: Use settings=RimeNonJsonTTSService.Settings(...) instead.
- Parameters:
language – Language for synthesis. Defaults to English.
segment – Text segmentation mode (“immediate”, “bySentence”, “never”).
repetition_penalty – Token repetition penalty (1.0-2.0).
temperature – Sampling temperature (0.0-1.0).
top_p – Cumulative probability threshold (0.0-1.0).
extra – Additional parameters to pass to the API (for future compatibility).
- segment: str | None
- repetition_penalty: float | None
- temperature: float | None
- top_p: float | None
- extra: dict[str, Any] | None
- __init__(*, api_key: str, voice_id: str | None = None, url: str = 'wss://users.rime.ai/ws', model: str | None = None, audio_format: str = 'pcm', sample_rate: int | None = None, params: InputParams | None = None, settings: RimeNonJsonTTSSettings | None = None, aggregate_sentences: bool | None = None, text_aggregation_mode: TextAggregationMode | None = None, **kwargs)[source]
Initialize Rime Non-JSON WebSocket TTS service.
- Parameters:
api_key – Rime API key for authentication.
voice_id – ID of the voice to use.
Deprecated since version 0.0.105: Use settings=RimeNonJsonTTSService.Settings(voice=...) instead.
url – Rime websocket API endpoint.
model – Model ID to use for synthesis.
Deprecated since version 0.0.105: Use settings=RimeNonJsonTTSService.Settings(model=...) instead.
audio_format – Audio format to use.
sample_rate – Audio sample rate in Hz.
params – Additional configuration parameters.
Deprecated since version 0.0.105: Use settings=RimeNonJsonTTSService.Settings(...) instead.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
aggregate_sentences – Deprecated since version 0.0.104: Use text_aggregation_mode instead. Set to TextAggregationMode.SENTENCE to aggregate text into sentences before synthesis, or TextAggregationMode.TOKEN to stream tokens directly for lower latency.
text_aggregation_mode – How to aggregate text before synthesis.
**kwargs – Additional arguments passed to parent class.
- can_generate_metrics() bool[source]
Check if this service can generate processing metrics.
- Returns:
True, as Rime Non-JSON WebSocket service supports metrics generation.
- language_to_service_language(language: Language) str[source]
Convert pipecat Language enum to Rime language code.
- Parameters:
language – The Language enum value to convert.
- Returns:
Three-letter Rime language code (e.g., ‘eng’ for English). Falls back to the language’s base code with a warning if not in the verified list.
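The fallback described in the return value can be sketched in plain Python. This is an illustration of the idea rather than pipecat's code: `to_rime_language` and `VERIFIED_CODES` are hypothetical names, language tags are modelled as strings such as "en-US" instead of the Language enum, and the table holds only the one code mentioned in the text.

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative table; "eng" is the only code named in the text above.
VERIFIED_CODES = {"en": "eng"}


def to_rime_language(tag: str) -> str:
    """Map a tag such as "en-US" to a Rime code, falling back to the base tag."""
    base = tag.split("-")[0].lower()
    if base in VERIFIED_CODES:
        return VERIFIED_CODES[base]
    # Not in the verified list: warn and return the base code unchanged.
    logger.warning("Language %s is not verified for Rime; using %r", tag, base)
    return base
```

Returning the base code instead of None means an unverified language is still sent to the service, and the warning gives the operator a chance to notice it.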
- async start(frame: StartFrame)[source]
Start the Rime Non-JSON WebSocket TTS service.
- Parameters:
frame – The start frame containing initialization parameters.
- async cancel(frame: CancelFrame)[source]
Cancel current operation and clean up.
- async run_tts(text: str, context_id: str) AsyncGenerator[Frame | None, None][source]
Generate speech from text using Rime’s streaming API.
- Parameters:
text – The text to synthesize into speech.
context_id – The context ID for tracking audio frames.
- Yields:
Frame – Audio frames containing the synthesized speech.