llm

Grok Realtime Voice Agent LLM service implementation with WebSocket support.

Based on xAI’s Grok Voice Agent API documentation: https://docs.x.ai/docs/guides/voice/agent

class pipecat.services.xai.realtime.llm.CurrentAudioResponse(item_id: str, content_index: int, start_time_ms: int, total_size: int = 0)[source]

Bases: object

Tracks the current audio response from the assistant.

Parameters:
  • item_id – Unique identifier for the audio response item.

  • content_index – Index of the audio content within the item.

  • start_time_ms – Timestamp when the audio response started in milliseconds.

  • total_size – Total size of audio data received in bytes. Defaults to 0.

item_id: str
content_index: int
start_time_ms: int
total_size: int = 0
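
One plausible use of these fields is to estimate how much audio has arrived for the current response. The sketch below uses a local stand-in dataclass that mirrors the documented fields rather than the pipecat class itself. The helper received_audio_ms is hypothetical, and it assumes 16-bit mono PCM at 24 kHz.

```python
from dataclasses import dataclass


@dataclass
class CurrentAudioResponse:
    """Local stand-in mirroring the documented fields (not the pipecat class)."""

    item_id: str
    content_index: int
    start_time_ms: int
    total_size: int = 0


def received_audio_ms(response, sample_rate=24000, bytes_per_sample=2):
    """Hypothetical helper: duration of received audio, assuming mono PCM."""
    samples = response.total_size // bytes_per_sample
    return samples * 1000 // sample_rate


current = CurrentAudioResponse(item_id="item_123", content_index=0, start_time_ms=1_000)
current.total_size += 48_000  # one second of 16-bit mono audio at 24 kHz
print(received_audio_ms(current))
```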
class pipecat.services.xai.realtime.llm.GrokRealtimeLLMSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, system_instruction: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, max_tokens: int | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>, top_k: int | None | _NotGiven = <factory>, frequency_penalty: float | None | _NotGiven = <factory>, presence_penalty: float | None | _NotGiven = <factory>, seed: int | None | _NotGiven = <factory>, filter_incomplete_user_turns: bool | None | _NotGiven = <factory>, user_turn_completion_config: UserTurnCompletionConfig | None | _NotGiven = <factory>, session_properties: SessionProperties | _NotGiven = <factory>)[source]

Bases: LLMSettings

Settings for GrokRealtimeLLMService.

Parameters:

session_properties – Grok Realtime session properties (voice, audio config, tools, etc.). Its instructions field is kept in sync, in both directions, with the top-level system_instruction field.

session_properties: SessionProperties | _NotGiven
apply_update(delta: GrokRealtimeLLMSettings) → dict[str, Any][source]

Merge a delta, keeping system_instruction in sync with session_properties.

When the delta contains session_properties, it replaces the stored session_properties wholesale (matching legacy behaviour). Top-level field values always take precedence over conflicting session_properties values.
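
The merge rules above can be illustrated with a minimal sketch. It is not the pipecat implementation: SessionProps, SettingsSketch, and NOT_GIVEN are hypothetical stand-ins, and the shape of the returned dict (changed field name mapped to its previous value) is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional

NOT_GIVEN = object()  # stand-in for pipecat's _NotGiven sentinel


@dataclass
class SessionProps:
    """Hypothetical stand-in for SessionProperties (two fields only)."""

    voice: Optional[str] = None
    instructions: Optional[str] = None


@dataclass
class SettingsSketch:
    """Hypothetical stand-in for GrokRealtimeLLMSettings."""

    system_instruction: Any = NOT_GIVEN
    session_properties: Any = NOT_GIVEN

    def apply_update(self, delta: "SettingsSketch") -> dict:
        changed = {}  # assumed shape: changed field name -> previous value

        if delta.session_properties is not NOT_GIVEN:
            # Wholesale replacement: nothing from the old object is merged in.
            changed["session_properties"] = self.session_properties
            self.session_properties = replace(delta.session_properties)  # copy
            incoming = self.session_properties.instructions
            if incoming is not None and incoming != self.system_instruction:
                changed["system_instruction"] = self.system_instruction
                self.system_instruction = incoming

        if delta.system_instruction is not NOT_GIVEN:
            # The top-level field wins over a conflicting instructions value.
            if delta.system_instruction != self.system_instruction:
                changed.setdefault("system_instruction", self.system_instruction)
                self.system_instruction = delta.system_instruction
            if self.session_properties is not NOT_GIVEN:
                self.session_properties.instructions = delta.system_instruction

        return changed


stored = SettingsSketch(
    system_instruction="Be brief.",
    session_properties=SessionProps(voice="Ara", instructions="Be brief."),
)
delta = SettingsSketch(
    system_instruction="Be formal.",
    session_properties=SessionProps(voice="Rex", instructions="Be casual."),
)
changed = stored.apply_update(delta)
```

In this sketch the conflicting instructions value ("Be casual.") is overwritten by the top-level system_instruction ("Be formal."), while the voice comes from the replacement session properties.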

classmethod from_mapping(settings: Mapping[str, Any]) → GrokRealtimeLLMSettings[source]

Build a delta from a plain dict, routing SessionProperties keys into session_properties.

Keys that correspond to SessionProperties fields are collected into a nested session_properties value. model is always routed to the top-level field. Unknown keys go to extra.
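
The routing described above can be sketched as follows. SessionProps and SettingsDelta are hypothetical stand-ins with only a few fields, and the order in which keys are checked (model first, then session property names, then other top-level names) is an assumption rather than the library's confirmed behaviour.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Mapping, Optional


@dataclass
class SessionProps:
    """Hypothetical stand-in for SessionProperties (two fields only)."""

    voice: Optional[str] = None
    instructions: Optional[str] = None


@dataclass
class SettingsDelta:
    """Hypothetical stand-in for GrokRealtimeLLMSettings."""

    model: Optional[str] = None
    temperature: Optional[float] = None
    session_properties: Optional[SessionProps] = None
    extra: dict = field(default_factory=dict)

    @classmethod
    def from_mapping(cls, settings: Mapping[str, Any]) -> "SettingsDelta":
        sp_names = {f.name for f in fields(SessionProps)}
        top_names = {f.name for f in fields(cls)} - {"extra", "session_properties"}
        top, sp, extra = {}, {}, {}
        for key, value in settings.items():
            if key == "model":
                top[key] = value  # model always goes to the top-level field
            elif key in sp_names:
                sp[key] = value  # collected into nested session_properties
            elif key in top_names:
                top[key] = value
            else:
                extra[key] = value  # unknown keys are preserved in extra
        nested = SessionProps(**sp) if sp else None
        return cls(**top, session_properties=nested, extra=extra)


delta = SettingsDelta.from_mapping(
    {"model": "example-model", "voice": "Rex", "temperature": 0.3, "mystery": 1}
)
```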

class pipecat.services.xai.realtime.llm.GrokRealtimeLLMService(*, api_key: str, base_url: str = 'wss://api.x.ai/v1/realtime', session_properties: SessionProperties | None = None, settings: GrokRealtimeLLMSettings | None = None, start_audio_paused: bool = False, **kwargs)[source]

Bases: LLMService

Grok Realtime Voice Agent LLM service providing real-time audio and text communication.

Implements the Grok Voice Agent API with WebSocket communication for low-latency bidirectional audio and text interactions. Supports function calling, conversation management, and real-time transcription.

Features:
  • Real-time audio streaming (PCM, PCMU, PCMA formats)

  • Configurable sample rates (8 kHz to 48 kHz for PCM)

  • Multiple voice options (Ara, Rex, Sal, Eve, Leo)

  • Built-in tools (web_search, x_search, file_search)

  • Custom function calling

  • Server-side VAD (Voice Activity Detection)

Settings

alias of GrokRealtimeLLMSettings

adapter_class

alias of GrokRealtimeLLMAdapter

__init__(*, api_key: str, base_url: str = 'wss://api.x.ai/v1/realtime', session_properties: SessionProperties | None = None, settings: GrokRealtimeLLMSettings | None = None, start_audio_paused: bool = False, **kwargs)[source]

Initialize the Grok Realtime Voice Agent LLM service.

Parameters:
  • api_key – xAI API key for authentication.

  • base_url – WebSocket base URL for the realtime API. Defaults to “wss://api.x.ai/v1/realtime”.

  • session_properties

    Configuration properties for the realtime session. If None, uses default SessionProperties with voice “Ara”.

    Deprecated since version 0.0.105: Use settings=GrokRealtimeLLMService.Settings(session_properties=...) instead.

    To set a different voice, configure it in session_properties:

    session_properties = events.SessionProperties(voice="Rex")

    Available voices: Ara, Rex, Sal, Eve, Leo.

  • settings – Runtime-updatable settings for this service.

  • start_audio_paused – Whether to start with audio input paused. Defaults to False.

  • **kwargs – Additional arguments passed to parent LLMService.
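
Putting these parameters together, a construction sketch might look like the following. It uses the non-deprecated settings argument. The import path for events and the XAI_API_KEY environment variable name are assumptions, and build_grok_service is a hypothetical helper. The imports are deferred into the function, so the definition loads without pipecat installed, but calling it requires pipecat with xAI support.

```python
import os


def build_grok_service():
    """Construct the service; requires pipecat with xAI support installed."""
    # Imported lazily so this module loads even where pipecat is absent.
    from pipecat.services.xai.realtime import events  # assumed module path
    from pipecat.services.xai.realtime.llm import GrokRealtimeLLMService

    return GrokRealtimeLLMService(
        api_key=os.environ["XAI_API_KEY"],  # assumed environment variable name
        settings=GrokRealtimeLLMService.Settings(
            session_properties=events.SessionProperties(voice="Rex"),
        ),
        start_audio_paused=False,
    )
```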

can_generate_metrics() → bool[source]

Check if the service can generate usage metrics.

Returns:

True if metrics generation is supported.

set_audio_input_paused(paused: bool)[source]

Set whether audio input is paused.

Parameters:

paused – True to pause audio input, False to resume.
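
As a usage sketch, pausing and resuming can be wrapped in a context manager so that input is always resumed, even if the guarded code raises. The audio_input_paused helper and FakeService are hypothetical and only rely on the set_audio_input_paused(paused) signature documented above.

```python
from contextlib import contextmanager


@contextmanager
def audio_input_paused(service):
    """Hypothetical helper: pause audio input for the duration of a block."""
    service.set_audio_input_paused(True)
    try:
        yield service
    finally:
        # Always resume, even if the body raised.
        service.set_audio_input_paused(False)


class FakeService:
    """Test double exposing only set_audio_input_paused."""

    def __init__(self):
        self.calls = []

    def set_audio_input_paused(self, paused: bool):
        self.calls.append(paused)


fake = FakeService()
with audio_input_paused(fake):
    pass  # e.g. play a prompt the model should not hear
print(fake.calls)  # [True, False]
```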

async start(frame: StartFrame)[source]

Start the service and establish WebSocket connection.

Parameters:

frame – The start frame triggering service initialization.

async stop(frame: EndFrame)[source]

Stop the service and close WebSocket connection.

Parameters:

frame – The end frame triggering service shutdown.

async cancel(frame: CancelFrame)[source]

Cancel the service and close WebSocket connection.

Parameters:

frame – The cancel frame triggering service cancellation.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process incoming frames from the pipeline.

Parameters:
  • frame – The frame to process.

  • direction – The direction of frame flow in the pipeline.

async send_client_event(event: ClientEvent)[source]

Send a client event to the Grok Voice Agent API.

Parameters:

event – The client event to send.

async reset_conversation()[source]

Reset the conversation by disconnecting and reconnecting.