llm

Ultravox Realtime API service implementation.

This module provides real-time conversational AI capabilities using Ultravox’s Realtime API, supporting both text and audio modalities with voice transcription, streaming responses, and tool usage.

class pipecat.services.ultravox.llm.UltravoxRealtimeLLMSettings

Bases: LLMSettings

Settings for UltravoxRealtimeLLMService.

Parameters: output_medium – The output medium for the model (“voice” or “text”).

output_medium: str | None | _NotGiven = NOT_GIVEN
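The `NOT_GIVEN` default lets a settings object distinguish "never specified" from an explicit `None`. Below is a minimal stdlib sketch of that sentinel pattern. `_NotGiven`, `NOT_GIVEN`, `SettingsSketch`, and `resolve_output_medium` are hypothetical stand-ins, not pipecat's own classes. The rule that a given value overrides the default is assumed from the constructor's description of `settings`.

```python
from dataclasses import dataclass
from typing import Optional, Union


class _NotGiven:
    """Hypothetical sentinel type for a setting that was never specified."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


# A single shared instance, so callers can test with `is NOT_GIVEN`.
NOT_GIVEN = _NotGiven()


@dataclass
class SettingsSketch:
    # Hypothetical mirror of UltravoxRealtimeLLMSettings.output_medium.
    output_medium: Union[str, None, _NotGiven] = NOT_GIVEN


def resolve_output_medium(settings: SettingsSketch, default: Optional[str]) -> Optional[str]:
    """Return the explicitly given setting, or the default if none was given."""
    if settings.output_medium is NOT_GIVEN:
        return default
    # An explicit None is a deliberate choice, so it is returned as-is.
    return settings.output_medium
```

With this pattern, `SettingsSketch()` falls back to the default. `SettingsSketch(output_medium=None)` keeps `None`, which a plain `None` default could not express.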

class pipecat.services.ultravox.llm.AgentInputParams(*, api_key: str, agent_id: UUID, template_context: dict[str, Any] = <factory>, metadata: dict[str, str] = <factory>, output_medium: Literal['text', 'voice'] | None = None, max_duration: Annotated[timedelta | None, annotated_types.Ge(ge=datetime.timedelta(seconds=10)), annotated_types.Le(le=datetime.timedelta(seconds=3600))] = None, extra: dict[str, Any] = <factory>)

Bases: BaseModel

Input parameters for Ultravox Realtime generation using a pre-defined Agent.

Parameters:

api_key – Ultravox API key for authentication.
agent_id – The ID of the Ultravox Realtime agent you’d like to use. Agents are pre-configured to handle calls consistently. You can create and edit agents in the Ultravox console (https://app.ultravox.ai/agents) or using the Ultravox API (https://docs.ultravox.ai/api-reference/agents/agents-post).
template_context – Context variables to use when instantiating a call with the agent. Defaults to an empty dict.
metadata – Metadata to attach to the call. Defaults to an empty dict.
output_medium – The initial output medium for the agent. Use “text” for text responses or “voice” for audio responses. Defaults to None, which uses the agent’s default.
max_duration – The maximum duration of the call, between 10 seconds and one hour. Defaults to None, which uses the agent’s default maximum duration.
extra – Extra parameters to include in the agent call creation request. Defaults to an empty dict. See the Ultravox API documentation for valid arguments: https://docs.ultravox.ai/api-reference/agents/agents-calls-post

api_key: str

agent_id: UUID

template_context: dict[str, Any]

metadata: dict[str, str]

output_medium: Literal['text', 'voice'] | None

max_duration: timedelta | None

extra: dict[str, Any]
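The `Ge`/`Le` annotations in the signature limit `max_duration` to a range of 10 seconds to one hour. A value of `None` defers to the agent's own default. The sketch below reproduces that check in plain Python. `check_agent_max_duration` is a hypothetical helper. The real class presumably enforces the bounds through pydantic validation rather than through a function like this.

```python
from datetime import timedelta
from typing import Optional

# Bounds from the Ge/Le constraints in the AgentInputParams signature.
MIN_DURATION = timedelta(seconds=10)
MAX_DURATION = timedelta(seconds=3600)


def check_agent_max_duration(value: Optional[timedelta]) -> Optional[timedelta]:
    """Hypothetical validator mirroring AgentInputParams.max_duration.

    None passes through unchanged, meaning "use the agent's default".
    """
    if value is None:
        return None
    if not MIN_DURATION <= value <= MAX_DURATION:
        raise ValueError(
            f"max_duration must be between {MIN_DURATION} and {MAX_DURATION}, got {value}"
        )
    return value
```

Both bounds are inclusive, matching `Ge` (greater than or equal) and `Le` (less than or equal).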

class pipecat.services.ultravox.llm.OneShotInputParams(*, api_key: str, system_prompt: str | None = None, temperature: Annotated[float, annotated_types.Ge(ge=0.0), annotated_types.Le(le=1.0)] = 0.0, model: str | None = None, voice: UUID | None = None, metadata: dict[str, str] = <factory>, output_medium: Literal['text', 'voice'] | None = None, max_duration: Annotated[timedelta, annotated_types.Ge(ge=datetime.timedelta(seconds=10)), annotated_types.Le(le=datetime.timedelta(seconds=3600))] = datetime.timedelta(seconds=3600), extra: dict[str, Any] = <factory>)

Bases: BaseModel

Input parameters for Ultravox Realtime generation using a one-off call.

Parameters:

api_key – Ultravox API key for authentication.
system_prompt – System prompt to guide the model’s behavior. Defaults to None.
temperature – Sampling temperature for response generation, between 0.0 and 1.0. Defaults to 0.0.
model – Model identifier to use. Defaults to None, which uses “fixie-ai/ultravox”.
voice – Voice identifier for speech generation. Defaults to None.
metadata – Metadata to attach to the call. Defaults to an empty dict.
output_medium – The initial output medium for the call. Use “text” for text responses or “voice” for audio responses. Defaults to None (voice).
max_duration – The maximum duration of the call, between 10 seconds and one hour. Defaults to one hour.
extra – Extra parameters to include in the call creation request. Defaults to an empty dict. See the Ultravox API documentation for valid arguments: https://docs.ultravox.ai/api-reference/calls/calls-post

api_key: str

system_prompt: str | None

temperature: float

model: str | None

voice: UUID | None

metadata: dict[str, str]

output_medium: Literal['text', 'voice'] | None

max_duration: timedelta

extra: dict[str, Any]
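OneShotInputParams combines fixed defaults (temperature 0.0 and a one-hour `max_duration`) with per-instance `<factory>` dicts and range constraints. A stdlib dataclass can approximate that shape. `OneShotParamsSketch` is hypothetical and covers only a subset of the fields. It raises `ValueError` in cases where the real pydantic model would raise its own validation error.

```python
from dataclasses import dataclass, field
from datetime import timedelta
from typing import Any, Dict, Optional


@dataclass
class OneShotParamsSketch:
    """Hypothetical stdlib mirror of some OneShotInputParams defaults and bounds."""

    api_key: str
    system_prompt: Optional[str] = None
    temperature: float = 0.0
    max_duration: timedelta = timedelta(hours=1)
    # default_factory gives every instance its own dict (the "<factory>" defaults).
    metadata: Dict[str, str] = field(default_factory=dict)
    extra: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Same inclusive ranges as the Ge/Le annotations in the documented signature.
        if not 0.0 <= self.temperature <= 1.0:
            raise ValueError("temperature must be between 0.0 and 1.0")
        if not timedelta(seconds=10) <= self.max_duration <= timedelta(seconds=3600):
            raise ValueError("max_duration must be between 10 seconds and one hour")
```

The `default_factory` detail matters. A shared literal `{}` default would leak metadata between calls created in the same process.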

class pipecat.services.ultravox.llm.JoinUrlInputParams(*, join_url: str)

Bases: BaseModel

Input parameters for joining an existing Ultravox Realtime call via join URL.

Parameters: join_url – The join URL for the existing Ultravox Realtime call.

join_url: str

class pipecat.services.ultravox.llm.UltravoxRealtimeLLMService(*, params: AgentInputParams | OneShotInputParams | JoinUrlInputParams, settings: UltravoxRealtimeLLMSettings | None = None, one_shot_selected_tools: ToolsSchema | None = None, **kwargs)

Bases: LLMService

Provides access to the Ultravox Realtime API.

This service enables real-time conversations with Ultravox, supporting both text and audio output. It handles voice transcription, streaming audio responses, and tool usage.

Note: Ultravox is an audio-native model, so voice transcriptions are not used by the model and may not always align with its understanding of user input.

Settings: alias of UltravoxRealtimeLLMSettings

__init__(*, params: AgentInputParams | OneShotInputParams | JoinUrlInputParams, settings: UltravoxRealtimeLLMSettings | None = None, one_shot_selected_tools: ToolsSchema | None = None, **kwargs)

Initialize the Ultravox Realtime LLM service.

Parameters:

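params – How the call is set up: AgentInputParams to use a pre-defined agent, OneShotInputParams for a one-off call, or JoinUrlInputParams to join an existing call.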
settings – Ultravox Realtime LLM settings. If provided, the settings values take precedence over default values.
one_shot_selected_tools – ToolsSchema for tools to use with this call. May only be set with OneShotInputParams.
**kwargs – Additional arguments passed to parent LLMService.
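`one_shot_selected_tools` is valid only together with OneShotInputParams, so the constructor must check the type of `params` before it accepts tools. The following sketch shows one way to express that rule. The stand-in classes and `check_tools_allowed` are hypothetical. The exception type, and whether the real service raises at all, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical stand-ins for the three documented parameter classes.
@dataclass
class AgentParams:
    agent_id: str


@dataclass
class OneShotParams:
    system_prompt: Optional[str] = None


@dataclass
class JoinUrlParams:
    join_url: str


Params = Union[AgentParams, OneShotParams, JoinUrlParams]


def check_tools_allowed(params: Params, one_shot_selected_tools: Optional[object]) -> None:
    """Reject selected tools unless the call is created from one-shot parameters."""
    if one_shot_selected_tools is not None and not isinstance(params, OneShotParams):
        raise ValueError(
            "one_shot_selected_tools may only be set with OneShotInputParams, "
            f"not {type(params).__name__}"
        )
```

A plausible reason for the rule is that agent calls and joined calls already have their tools fixed server-side. This is an inference, not something the reference states.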

can_generate_metrics() → bool

Check if the service can generate usage metrics.

Returns: True if metrics generation is supported.

async start(frame: StartFrame)

Start the service and establish connection.

Parameters: frame – The start frame.

async stop(frame: EndFrame)

Stop the service and close connections.

Parameters: frame – The end frame.

async cancel(frame: CancelFrame)

Cancel the service and close connections.

Parameters: frame – The cancel frame.

async process_frame(frame: Frame, direction: FrameDirection)

Process incoming frames for the Ultravox Realtime service.

Parameters:

frame – The frame to process.
direction – The frame processing direction.
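`process_frame` receives every frame together with its direction. A common shape for such a method is to consume the frames the service handles and forward the rest unchanged. The sketch below illustrates that pattern with hypothetical stand-ins (`FrameDirection`, `Frame`, `AudioFrame`, `ProcessorSketch`). It makes no claim about which frames the Ultravox service actually consumes, or whether it also forwards them.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Tuple


class FrameDirection(Enum):
    # Hypothetical stand-in for the pipeline's direction enum.
    DOWNSTREAM = auto()
    UPSTREAM = auto()


@dataclass
class Frame:
    name: str


@dataclass
class AudioFrame(Frame):
    audio: bytes = b""


class ProcessorSketch:
    """Hypothetical processor: consumes downstream audio, forwards everything else."""

    def __init__(self) -> None:
        self.sent_audio: List[bytes] = []
        self.pushed: List[Tuple[Frame, FrameDirection]] = []

    async def push_frame(self, frame: Frame, direction: FrameDirection) -> None:
        # Records what would be handed to the neighbouring processor.
        self.pushed.append((frame, direction))

    async def process_frame(self, frame: Frame, direction: FrameDirection) -> None:
        if isinstance(frame, AudioFrame) and direction is FrameDirection.DOWNSTREAM:
            # Stand-in for streaming user audio to the remote model.
            self.sent_audio.append(frame.audio)
        else:
            # Frames the processor does not consume keep travelling the same way.
            await self.push_frame(frame, direction)
```

Forwarding unrecognised frames in their original direction keeps the processor transparent to control frames it was not designed to handle.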