tts
Soniox text-to-speech service implementation.
This module provides a WebSocket-based TTS service using the Soniox real-time
Text-to-Speech API. It streams text to the server incrementally and receives
audio back as base64-encoded chunks, multiplexed across multiple concurrent
streams by stream_id.
Soniox API reference: https://soniox.com/docs/tts/api-reference/websocket-api
- pipecat.services.soniox.tts.language_to_soniox_tts_language(language: Language) → str | None
Convert a Pipecat Language to a Soniox TTS language code.
For the full list of supported languages, see: https://soniox.com/docs/tts/concepts/languages
- class pipecat.services.soniox.tts.SonioxTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>)
Bases: TTSSettings
Settings for SonioxTTSService.
voice, model, and language travel in the per-stream config message, so changing any of them does not require reconnecting the WebSocket. The current context is flushed so the next stream opens with the new values.
- class pipecat.services.soniox.tts.SonioxTTSService(*, api_key: str, url: str = 'wss://tts-rt.soniox.com/tts-websocket', sample_rate: int | None = None, audio_format: str = 'pcm_s16le', settings: SonioxTTSSettings | None = None, text_aggregation_mode: TextAggregationMode | None = None, **kwargs)
Bases: WebsocketTTSService
Soniox WebSocket TTS service with streaming text-in, streaming audio-out.
Streams text incrementally to Soniox’s real-time TTS endpoint and routes the returned base64-encoded audio back as TTSAudioRawFrame frames. Multiple concurrent streams are multiplexed over a single WebSocket connection via Pipecat’s audio-context mechanism (mapped to Soniox’s stream_id). Supports up to 5 concurrent streams per connection.
For complete API documentation, see: https://soniox.com/docs/tts/api-reference/websocket-api
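The multiplexing described above can be illustrated with a small, self-contained sketch. It is not the service's implementation: the StreamDemuxer class and the message keys "stream_id" and "audio" are assumptions made for illustration. The only facts taken from the description are that audio arrives base64-encoded, is keyed by stream_id, and that at most 5 streams share one connection.

```python
import base64
import json

MAX_CONCURRENT_STREAMS = 5  # limit stated in the class description


class StreamDemuxer:
    """Hypothetical helper: routes base64 audio chunks to per-stream buffers."""

    def __init__(self, max_streams: int = MAX_CONCURRENT_STREAMS):
        self._max_streams = max_streams
        self._buffers = {}  # stream_id -> bytearray of decoded audio

    def open(self, stream_id: str) -> None:
        """Register a stream, enforcing the concurrency limit."""
        if stream_id in self._buffers:
            return
        if len(self._buffers) >= self._max_streams:
            raise RuntimeError("too many concurrent streams")
        self._buffers[stream_id] = bytearray()

    def feed(self, raw_message: str) -> None:
        """Decode one JSON message and append its audio to the right stream."""
        # Assumed message shape: {"stream_id": "...", "audio": "<base64>"}.
        msg = json.loads(raw_message)
        buf = self._buffers.get(msg["stream_id"])
        if buf is None:
            return  # audio for an unknown or already closed stream is dropped
        buf.extend(base64.b64decode(msg["audio"]))

    def close(self, stream_id: str) -> bytes:
        """Remove the stream and return everything it received."""
        return bytes(self._buffers.pop(stream_id, b""))
```

Keeping one buffer per stream_id is what lets interleaved chunks from different streams arrive in any order without mixing audio between contexts.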
- Settings
alias of SonioxTTSSettings
- __init__(*, api_key: str, url: str = 'wss://tts-rt.soniox.com/tts-websocket', sample_rate: int | None = None, audio_format: str = 'pcm_s16le', settings: SonioxTTSSettings | None = None, text_aggregation_mode: TextAggregationMode | None = None, **kwargs)
Initialize the Soniox TTS service.
- Parameters:
api_key – Soniox API key for authentication. Create API keys at https://console.soniox.com.
url – WebSocket URL for the Soniox TTS endpoint.
sample_rate – Output sample rate in Hz. Must be one of {8000, 16000, 24000, 44100, 48000} when using a raw PCM audio format. If None, inherits from the pipeline.
audio_format – Output audio format. Defaults to "pcm_s16le", which matches Pipecat’s downstream audio pipeline.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
text_aggregation_mode – How to aggregate incoming text before synthesis. Defaults to TextAggregationMode.SENTENCE.
**kwargs – Additional arguments passed to the parent service.
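The sample_rate constraint above can be expressed as a small check. This is a sketch only: validate_sample_rate is a hypothetical helper, not part of the service, and treating any format whose name starts with "pcm_" as raw PCM is an assumption rather than documented behavior.

```python
from typing import Optional

# Rates listed in the sample_rate parameter description.
RAW_PCM_SAMPLE_RATES = frozenset({8000, 16000, 24000, 44100, 48000})


def validate_sample_rate(
    sample_rate: Optional[int], audio_format: str = "pcm_s16le"
) -> Optional[int]:
    """Return sample_rate unchanged if acceptable, otherwise raise ValueError."""
    if sample_rate is None:
        return None  # None means the rate is inherited from the pipeline
    # Assumption: raw PCM formats are the ones named "pcm_*".
    is_raw_pcm = audio_format.startswith("pcm_")
    if is_raw_pcm and sample_rate not in RAW_PCM_SAMPLE_RATES:
        allowed = sorted(RAW_PCM_SAMPLE_RATES)
        raise ValueError(
            f"sample_rate {sample_rate} is not valid for {audio_format}; "
            f"expected one of {allowed}"
        )
    return sample_rate
```

Checking the rate before opening the connection surfaces a misconfiguration as a local error instead of a server-side rejection.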
- can_generate_metrics() → bool
Check if this service can generate processing metrics.
- Returns:
True, as Soniox TTS supports metrics generation.
- language_to_service_language(language: Language) → str | None
Convert a Language enum to a Soniox TTS language code.
- Parameters:
language – The language to convert.
- Returns:
The Soniox-specific language code, or None if not supported.
- async start(frame: StartFrame)
Start the Soniox TTS service.
- Parameters:
frame – The start frame containing initialization parameters.
- async stop(frame: EndFrame)
Stop the Soniox TTS service.
- Parameters:
frame – The end frame.
- async cancel(frame: CancelFrame)
Cancel the Soniox TTS service.
- Parameters:
frame – The cancel frame.
- async flush_audio(context_id: str | None = None)
Flush any pending audio and finalize the current stream.
- Parameters:
context_id – The specific context to flush. If None, falls back to the currently active context.
- async on_turn_context_created(context_id: str)
Eagerly open the Soniox stream when a new turn context is created.
Overlaps Soniox-side stream creation with sentence aggregation so the stream is ready by the time text reaches run_tts.
- async on_turn_context_completed()
Cancel any eagerly-opened Soniox stream that never received text.
The base class sends text_end:true (via flush_audio) for streams that received text, which already terminates the stream. For an empty turn (e.g., the LLM produced only tool calls), no text reaches run_tts and the eager-opened stream would otherwise sit until Soniox’s per-stream idle timer fires. Cancel it here.
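The eager-open and cancel-if-empty lifecycle described above can be sketched as a small state tracker. Everything here is hypothetical: EagerStreamTracker, its sent log, and the action names stand in for the real WebSocket messages, whose shape this reference does not show. As a simplification, the text_end step that the base class performs through flush_audio is folded into the completion handler.

```python
class EagerStreamTracker:
    """Hypothetical model of eager stream opening, one turn at a time."""

    def __init__(self):
        self.sent = []           # action log, stands in for WebSocket sends
        self._context_id = None  # stream opened for the current turn
        self._got_text = False   # whether any text reached the stream

    def on_turn_context_created(self, context_id):
        # Open the stream before any text is available.
        self._context_id = context_id
        self._got_text = False
        self.sent.append(("open", context_id))

    def run_tts(self, text, context_id):
        # Text arrived, so the stream will later be ended with text_end.
        self._got_text = True
        self.sent.append(("text", context_id, text))

    def on_turn_context_completed(self):
        if self._context_id is None:
            return
        if self._got_text:
            # Simplification: models the base class's flush_audio call.
            self.sent.append(("text_end", self._context_id))
        else:
            # Empty turn: cancel instead of waiting for the idle timer.
            self.sent.append(("cancel", self._context_id))
        self._context_id = None
```

A turn that produced only tool calls would log an open followed by a cancel, while a turn with text would log open, text, and text_end.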
- async on_audio_context_interrupted(context_id: str)
Cancel the active Soniox stream when the bot is interrupted.
- async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]
Stream text to Soniox and deliver synthesized audio asynchronously.
The first run_tts call for a given context_id sends the per-stream config message; subsequent calls within the same stream send only text chunks. Audio arrives via the receive loop and is appended to the matching audio context.
- Parameters:
text – The text to synthesize.
context_id – The audio context (maps to Soniox stream_id).
- Yields:
None. Audio frames are delivered out of band via the receive task and the audio-context queue.
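The config-once-per-stream behavior described for run_tts can be sketched as follows. The StreamMessageBuilder class and the exact message layout (including the "text" key) are assumptions for illustration; the actual schema is defined in the Soniox API reference. What the sketch takes from this page is that voice, model, and language travel in a per-stream config message sent on the first call, and that later calls for the same context send only text.

```python
import json


class StreamMessageBuilder:
    """Hypothetical helper: emits a config message once per context_id."""

    def __init__(self, voice, model, language):
        # Settings that travel in the per-stream config message.
        self._config = {"voice": voice, "model": model, "language": language}
        self._configured = set()  # context ids that already got a config

    def build(self, text, context_id):
        """Return the JSON messages to send for one run_tts call."""
        messages = []
        if context_id not in self._configured:
            self._configured.add(context_id)
            # Assumed layout: config fields sit next to the stream_id.
            messages.append(json.dumps({"stream_id": context_id, **self._config}))
        # Assumed layout for a text chunk.
        messages.append(json.dumps({"stream_id": context_id, "text": text}))
        return messages
```

With placeholder settings, the first call for a context yields two messages (config, then text) and every later call for that context yields one.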