tts

xAI text-to-speech service implementation.

Provides two TTS services against xAI’s voice API:

  • XAIHttpTTSService uses the batch HTTP endpoint at https://api.x.ai/v1/tts.

  • XAITTSService uses the streaming WebSocket endpoint at wss://api.x.ai/v1/tts.

See https://docs.x.ai/developers/rest-api-reference/inference/voice.

pipecat.services.xai.tts.language_to_xai_language(language: Language) → str | None

Convert a Language enum to xAI language code.

Parameters:

language – The Language enum value to convert.

Returns:

The corresponding xAI language code, or None if not supported.
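The "code or None" contract described above can be illustrated with a minimal, self-contained sketch. The Language class here is a stand-in for Pipecat's enum, and the lookup table is hypothetical: the actual set of languages xAI supports is not listed on this page.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    """Stand-in for Pipecat's Language enum (hypothetical subset)."""

    EN = "en"
    FR = "fr"
    TLH = "tlh"  # deliberately absent from the table below


# Hypothetical table: the real xAI language codes are not given in this page.
_SUPPORTED = {Language.EN: "en", Language.FR: "fr"}


def to_service_language(language: Language) -> Optional[str]:
    """Return the service language code, or None if the language is not supported."""
    return _SUPPORTED.get(language)
```

Callers should handle the None case explicitly, for example by falling back to a default language or by leaving the language unset.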

class pipecat.services.xai.tts.XAITTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for XAIHttpTTSService.

class pipecat.services.xai.tts.XAIHttpTTSService(*, api_key: str, base_url: str = 'https://api.x.ai/v1/tts', sample_rate: int | None = None, encoding: str | None = 'pcm', aiohttp_session: ClientSession | None = None, settings: XAITTSSettings | None = None, **kwargs)

Bases: TTSService

xAI HTTP text-to-speech service.

The service requests raw PCM audio so emitted TTSAudioRawFrame objects match Pipecat’s downstream expectations without extra decoding.

Settings

alias of XAITTSSettings

__init__(*, api_key: str, base_url: str = 'https://api.x.ai/v1/tts', sample_rate: int | None = None, encoding: str | None = 'pcm', aiohttp_session: ClientSession | None = None, settings: XAITTSSettings | None = None, **kwargs)

Initialize the xAI TTS service.

Parameters:
  • api_key – xAI API key for authentication.

  • base_url – xAI TTS endpoint. Defaults to https://api.x.ai/v1/tts.

  • sample_rate – Output audio sample rate in Hz. If None, uses the pipeline default.

  • encoding – Output encoding format. Defaults to “pcm”.

  • aiohttp_session – Optional shared aiohttp session.

  • settings – Runtime-updatable settings.

  • **kwargs – Additional keyword arguments passed to TTSService.
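As a usage sketch, the keyword arguments above might be combined as follows. The XAI_API_KEY environment variable name and the "en" language value are assumptions made for illustration; the import is deferred into the function so the snippet can be loaded even where pipecat is not installed.

```python
import os
from typing import Optional


def make_http_tts(sample_rate: Optional[int] = None):
    """Build an XAIHttpTTSService from the keyword arguments documented above."""
    # Imported lazily so this sketch loads where pipecat is not installed.
    from pipecat.services.xai.tts import XAIHttpTTSService, XAITTSSettings

    return XAIHttpTTSService(
        api_key=os.environ["XAI_API_KEY"],  # assumed environment variable name
        sample_rate=sample_rate,  # None leaves the sample rate at its default
        encoding="pcm",  # documented default, written out for clarity
        settings=XAITTSSettings(language="en"),  # "en" is an assumed language code
    )
```

Passing an existing aiohttp_session instead of letting the service manage its own is worth considering when several HTTP-based services run in the same process.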

can_generate_metrics() → bool

Check if this service can generate processing metrics.

language_to_service_language(language: Language) → str | None

Convert a Language enum to xAI language format.

Parameters:

language – The language to convert.

Returns:

The xAI-specific language code, or None if not supported.

async start(frame)

Start the xAI TTS service.

async stop(frame)

Stop the xAI TTS service.

async cancel(frame)

Cancel the xAI TTS service.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate speech from text using xAI’s TTS API.

class pipecat.services.xai.tts.XAIWebsocketTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for XAITTSService (WebSocket streaming).

class pipecat.services.xai.tts.XAITTSService(*, api_key: str, base_url: str = 'wss://api.x.ai/v1/tts', sample_rate: int | None = None, codec: str = 'pcm', settings: XAIWebsocketTTSSettings | None = None, **kwargs)

Bases: InterruptibleTTSService

xAI streaming text-to-speech service.

Connects to xAI’s WebSocket TTS endpoint and streams audio chunks back as they are synthesized. Text can be sent incrementally via text.delta messages, and each utterance is terminated with a text.done message. The server responds with audio.delta chunks followed by an audio.done message.
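The message flow described above can be sketched without a network connection. Only the four message type names (text.delta, text.done, audio.delta, audio.done) come from this page; the JSON layout, the "type" and "delta" field names, and the base64 encoding of audio are assumptions made for illustration, so check xAI's voice API reference for the actual wire format.

```python
import base64
import json
from typing import Iterable, Iterator, List


def client_messages(chunks: Iterable[str]) -> List[str]:
    """Encode text chunks as text.delta messages, closed by a single text.done."""
    # The "type" and "delta" field names are assumptions, not the confirmed schema.
    messages = [json.dumps({"type": "text.delta", "delta": c}) for c in chunks]
    messages.append(json.dumps({"type": "text.done"}))
    return messages


def audio_chunks(server_messages: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio from audio.delta messages until audio.done arrives."""
    for raw in server_messages:
        message = json.loads(raw)
        if message["type"] == "audio.delta":
            # Assumed: audio bytes arrive base64-encoded in a "delta" field.
            yield base64.b64decode(message["delta"])
        elif message["type"] == "audio.done":
            return


# Simulated exchange with hand-built server messages.
outgoing = client_messages(["Hello, ", "world."])
incoming = [
    json.dumps({"type": "audio.delta", "delta": base64.b64encode(b"\x00\x01").decode()}),
    json.dumps({"type": "audio.delta", "delta": base64.b64encode(b"\x02\x03").decode()}),
    json.dumps({"type": "audio.done"}),
]
pcm = b"".join(audio_chunks(incoming))
```

In the service itself, sending text.done corresponds to what flush_audio() signals: the end of an utterance, after which the server synthesizes whatever text it has buffered.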

Audio parameters (voice, language, codec, sample rate, bit rate) are passed as query string parameters on the WebSocket URL; changing any of them at runtime reconnects the WebSocket.
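A rough sketch of why a settings change forces a reconnect: because the audio parameters live in the connection URL, a new value produces a different URL. The query parameter names below (voice, language, codec, sample_rate) are assumed spellings, and bit rate is omitted for brevity; build_ws_url and needs_reconnect are hypothetical helpers, not part of the Pipecat API.

```python
from typing import Optional
from urllib.parse import urlencode


def build_ws_url(
    base_url: str,
    *,
    voice: Optional[str],
    language: Optional[str],
    codec: str,
    sample_rate: int,
) -> str:
    """Append the audio parameters to the WebSocket URL as a query string."""
    # Parameter names are assumptions; unset (None) values are left out.
    params = {
        "voice": voice,
        "language": language,
        "codec": codec,
        "sample_rate": sample_rate,
    }
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base_url}?{query}"


def needs_reconnect(current_url: str, new_url: str) -> bool:
    """A changed URL means the existing connection no longer matches the settings."""
    return current_url != new_url


url_en = build_ws_url(
    "wss://api.x.ai/v1/tts", voice=None, language="en", codec="pcm", sample_rate=24000
)
url_fr = build_ws_url(
    "wss://api.x.ai/v1/tts", voice=None, language="fr", codec="pcm", sample_rate=24000
)
```

Since each change costs a reconnect, it is reasonable to batch runtime setting updates rather than apply them one at a time.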

Settings

alias of XAIWebsocketTTSSettings

__init__(*, api_key: str, base_url: str = 'wss://api.x.ai/v1/tts', sample_rate: int | None = None, codec: str = 'pcm', settings: XAIWebsocketTTSSettings | None = None, **kwargs)

Initialize the xAI WebSocket TTS service.

Parameters:
  • api_key – xAI API key for authentication.

  • base_url – xAI TTS WebSocket endpoint. Defaults to wss://api.x.ai/v1/tts.

  • sample_rate – Output audio sample rate in Hz. If None, uses the pipeline default.

  • codec – Output audio codec. One of pcm, wav, mulaw, alaw. Defaults to pcm so emitted TTSAudioRawFrame objects need no decoding downstream.

  • settings – Runtime-updatable settings.

  • **kwargs – Additional keyword arguments passed to the parent InterruptibleTTSService.
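A construction sketch for the streaming service, using only the keyword arguments listed above. The up-front codec check is an illustrative convenience and not necessarily something the service performs itself; the XAI_API_KEY environment variable name is likewise an assumption. The import is deferred so the snippet loads where pipecat is not installed.

```python
import os

# Codec values listed for the `codec` parameter above.
SUPPORTED_CODECS = ("pcm", "wav", "mulaw", "alaw")


def make_streaming_tts(codec: str = "pcm"):
    """Validate the codec, then build an XAITTSService (WebSocket streaming)."""
    if codec not in SUPPORTED_CODECS:
        raise ValueError(f"codec must be one of {SUPPORTED_CODECS}, got {codec!r}")

    # Imported lazily so this sketch loads where pipecat is not installed.
    from pipecat.services.xai.tts import XAITTSService

    return XAITTSService(
        api_key=os.environ["XAI_API_KEY"],  # assumed environment variable name
        codec=codec,  # "pcm" avoids downstream decoding
    )
```

Keeping the default pcm codec is the simplest choice inside a Pipecat pipeline, since the emitted TTSAudioRawFrame objects then need no decoding.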

can_generate_metrics() → bool

Check if this service can generate processing metrics.

language_to_service_language(language: Language) → str | None

Convert a Language enum to xAI language format.

async start(frame: StartFrame)

Start the xAI WebSocket TTS service.

async stop(frame: EndFrame)

Stop the xAI WebSocket TTS service.

async cancel(frame: CancelFrame)

Cancel the xAI WebSocket TTS service.

async flush_audio(context_id: str | None = None)

Signal end-of-utterance so xAI begins synthesizing what it has buffered.

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame | None, None]

Generate TTS audio from text using xAI’s streaming WebSocket API.