llm

Ultravox Realtime API service implementation.

This module provides real-time conversational AI capabilities using Ultravox’s Realtime API, supporting both text and audio modalities with voice transcription, streaming responses, and tool usage.

class pipecat.services.ultravox.llm.UltravoxRealtimeLLMSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, system_instruction: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, max_tokens: int | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>, top_k: int | None | _NotGiven = <factory>, frequency_penalty: float | None | _NotGiven = <factory>, presence_penalty: float | None | _NotGiven = <factory>, seed: int | None | _NotGiven = <factory>, filter_incomplete_user_turns: bool | None | _NotGiven = <factory>, user_turn_completion_config: UserTurnCompletionConfig | None | _NotGiven = <factory>, output_medium: str | None | _NotGiven = NOT_GIVEN)[source]

Bases: LLMSettings

Settings for UltravoxRealtimeLLMService.

Parameters:

output_medium – The output medium for the model (“voice” or “text”).

output_medium: str | None | _NotGiven = NOT_GIVEN
class pipecat.services.ultravox.llm.AgentInputParams(*, api_key: str, agent_id: UUID, template_context: dict[str, ~typing.Any]=<factory>, metadata: dict[str, str]=<factory>, output_medium: Literal['text', 'voice'] | None=None, max_duration: Annotated[timedelta | None, ~annotated_types.Ge(ge=datetime.timedelta(seconds=10)), ~annotated_types.Le(le=datetime.timedelta(seconds=3600))] = None, extra: dict[str, ~typing.Any]=<factory>)[source]

Bases: BaseModel

Input parameters for Ultravox Realtime generation using a pre-defined Agent.

Parameters:
  • api_key – Ultravox API key for authentication.

  • agent_id – The ID of the Ultravox Realtime agent you’d like to use. Agents are pre-configured to handle calls consistently. You can create and edit agents in the Ultravox console (https://app.ultravox.ai/agents) or using the Ultravox API (https://docs.ultravox.ai/api-reference/agents/agents-post).

  • template_context – Context variables to use when instantiating a call with the agent. Defaults to an empty dict.

  • metadata – Metadata to attach to the call. Defaults to an empty dict.

  • output_medium – The initial output medium for the agent. Use “text” for text responses or “voice” for audio responses. Defaults to None, which uses the agent’s default.

  • max_duration – The maximum duration of the call, between 10 seconds and one hour. Defaults to None, which uses the agent’s default maximum duration.

  • extra – Extra parameters to include in the agent call creation request. Defaults to an empty dict. See the Ultravox API documentation for valid arguments: https://docs.ultravox.ai/api-reference/agents/agents-calls-post

api_key: str
agent_id: UUID
template_context: dict[str, Any]
metadata: dict[str, str]
output_medium: Literal['text', 'voice'] | None
max_duration: timedelta | None
extra: dict[str, Any]
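
Because metadata is typed as dict[str, str], non-string values have to be converted before they are passed in. The following is a minimal sketch of one way to do that. It assumes a hypothetical helper named stringify_metadata (not part of pipecat) and assumes that JSON-encoding non-string values suits your use case.

```python
import json
from typing import Any


def stringify_metadata(raw: dict[str, Any]) -> dict[str, str]:
    """Coerce arbitrary values into the dict[str, str] shape metadata expects.

    Hypothetical helper, not part of pipecat. Strings pass through unchanged;
    every other value is JSON-encoded.
    """
    result: dict[str, str] = {}
    for key, value in raw.items():
        result[str(key)] = value if isinstance(value, str) else json.dumps(value)
    return result


# Example input mixing strings, numbers, and booleans.
metadata = stringify_metadata({"customer": "acme", "attempt": 2, "vip": True})
```

The resulting dict could then be passed as the metadata argument of AgentInputParams or OneShotInputParams.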
class pipecat.services.ultravox.llm.OneShotInputParams(*, api_key: str, system_prompt: str | None = None, temperature: Annotated[float, ~annotated_types.Ge(ge=0.0), ~annotated_types.Le(le=1.0)] = 0.0, model: str | None = None, voice: UUID | None = None, metadata: dict[str, str]=<factory>, output_medium: Literal['text', 'voice'] | None=None, max_duration: Annotated[timedelta, ~annotated_types.Ge(ge=datetime.timedelta(seconds=10)), ~annotated_types.Le(le=datetime.timedelta(seconds=3600))] = datetime.timedelta(seconds=3600), extra: dict[str, ~typing.Any]=<factory>)[source]

Bases: BaseModel

Input parameters for Ultravox Realtime generation using a one-off call.

Parameters:
  • api_key – Ultravox API key for authentication.

  • system_prompt – System prompt to guide the model’s behavior. Defaults to None.

  • temperature – Sampling temperature for response generation, between 0.0 and 1.0. Defaults to 0.0.

  • model – Model identifier to use. Defaults to “fixie-ai/ultravox”.

  • voice – Voice identifier for speech generation. Defaults to None.

  • metadata – Metadata to attach to the call. Defaults to an empty dict.

  • output_medium – The initial output medium for the agent. Use “text” for text responses or “voice” for audio responses. Defaults to None (voice).

  • max_duration – The maximum duration of the call, between 10 seconds and one hour. Defaults to one hour.

  • extra – Extra parameters to include in the call creation request. Defaults to an empty dict. See the Ultravox API documentation for valid arguments: https://docs.ultravox.ai/api-reference/calls/calls-post

api_key: str
system_prompt: str | None
temperature: float
model: str | None
voice: UUID | None
metadata: dict[str, str]
output_medium: Literal['text', 'voice'] | None
max_duration: timedelta
extra: dict[str, Any]
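
The signature above constrains temperature to the range 0.0 to 1.0 and max_duration to between 10 seconds and 3600 seconds. As an illustration only, the sketch below checks those bounds up front so that a bad value from configuration can be reported with a clear message. The helper check_one_shot_bounds and the two constants are hypothetical and not part of pipecat.

```python
from datetime import timedelta

# Bounds copied from the OneShotInputParams signature.
MIN_DURATION = timedelta(seconds=10)
MAX_DURATION = timedelta(seconds=3600)


def check_one_shot_bounds(temperature: float, max_duration: timedelta) -> None:
    """Raise ValueError if a value falls outside the documented range.

    Hypothetical helper, not part of pipecat.
    """
    if not 0.0 <= temperature <= 1.0:
        raise ValueError(f"temperature must be in [0.0, 1.0], got {temperature}")
    if not MIN_DURATION <= max_duration <= MAX_DURATION:
        raise ValueError(
            f"max_duration must be between {MIN_DURATION} and {MAX_DURATION}, "
            f"got {max_duration}"
        )


# Values inside both ranges pass silently.
check_one_shot_bounds(temperature=0.3, max_duration=timedelta(minutes=15))
```
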
class pipecat.services.ultravox.llm.JoinUrlInputParams(*, join_url: str)[source]

Bases: BaseModel

Input parameters for joining an existing Ultravox Realtime call via join URL.

Parameters:

join_url – The join URL for the existing Ultravox Realtime call.

join_url: str
class pipecat.services.ultravox.llm.UltravoxRealtimeLLMService(*, params: AgentInputParams | OneShotInputParams | JoinUrlInputParams, settings: UltravoxRealtimeLLMSettings | None = None, one_shot_selected_tools: ToolsSchema | None = None, **kwargs)[source]

Bases: LLMService

Provides access to the Ultravox Realtime API.

This service enables real-time conversations with Ultravox, supporting both text and audio output. It handles voice transcription, streaming audio responses, and tool usage.

Note: Ultravox is an audio-native model, so voice transcriptions are not used by the model and may not always align with its understanding of user input.

Settings

alias of UltravoxRealtimeLLMSettings

__init__(*, params: AgentInputParams | OneShotInputParams | JoinUrlInputParams, settings: UltravoxRealtimeLLMSettings | None = None, one_shot_selected_tools: ToolsSchema | None = None, **kwargs)[source]

Initialize the Ultravox Realtime LLM service.

Parameters:
  • params – Configuration parameters for the model.

  • settings – Ultravox Realtime LLM settings. If provided, the settings values take precedence over default values.

  • one_shot_selected_tools – ToolsSchema for tools to use with this call. May only be set with OneShotInputParams.

  • **kwargs – Additional arguments passed to parent LLMService.
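
As a rough sketch of how the constructor might be called for a one-off call, the example below uses only the parameters documented above. The function name build_one_shot_service and the prompt text are illustrative assumptions. The import is deferred into the function body so the snippet can be loaded even where pipecat is not installed.

```python
from datetime import timedelta


def build_one_shot_service(api_key: str):
    """Build an UltravoxRealtimeLLMService for a one-off call.

    Hypothetical helper. The pipecat import is deferred so that defining
    this function does not require pipecat to be installed.
    """
    from pipecat.services.ultravox.llm import (
        OneShotInputParams,
        UltravoxRealtimeLLMService,
    )

    params = OneShotInputParams(
        api_key=api_key,
        system_prompt="You are a concise, friendly voice assistant.",
        temperature=0.3,  # must lie within 0.0 to 1.0
        output_medium="voice",
        max_duration=timedelta(minutes=10),  # must lie within 10 s to 1 h
    )
    # settings and one_shot_selected_tools are optional and omitted here.
    return UltravoxRealtimeLLMService(params=params)
```

To use an existing agent or an existing call instead, pass an AgentInputParams or JoinUrlInputParams instance as params. In those cases one_shot_selected_tools must be left unset.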

can_generate_metrics() → bool[source]

Check if the service can generate usage metrics.

Returns:

True if metrics generation is supported.

async start(frame: StartFrame)[source]

Start the service and establish connection.

Parameters:

frame – The start frame.

async stop(frame: EndFrame)[source]

Stop the service and close connections.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame)[source]

Cancel the service and close connections.

Parameters:

frame – The cancel frame.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process incoming frames for the Ultravox Realtime service.

Parameters:
  • frame – The frame to process.

  • direction – The frame processing direction.