llm
Ultravox Realtime API service implementation.
This module provides real-time conversational AI capabilities using Ultravox’s Realtime API, supporting both text and audio modalities with voice transcription, streaming responses, and tool usage.
- class pipecat.services.ultravox.llm.UltravoxRealtimeLLMSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, system_instruction: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, max_tokens: int | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>, top_k: int | None | _NotGiven = <factory>, frequency_penalty: float | None | _NotGiven = <factory>, presence_penalty: float | None | _NotGiven = <factory>, seed: int | None | _NotGiven = <factory>, filter_incomplete_user_turns: bool | None | _NotGiven = <factory>, user_turn_completion_config: UserTurnCompletionConfig | None | _NotGiven = <factory>, output_medium: str | None | _NotGiven = NOT_GIVEN)[source]
Bases: LLMSettings
Settings for UltravoxRealtimeLLMService.
- Parameters:
output_medium – The output medium for the model (“voice” or “text”).
- output_medium: str | None | _NotGiven = NOT_GIVEN
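The `_NotGiven` type in the signature above suggests a sentinel that separates "field left unset" from "field explicitly set to None". The sketch below illustrates that general pattern using only the standard library. `_NotGivenSketch`, `NOT_GIVEN_SKETCH`, and `merge_settings` are hypothetical names invented for illustration; this is not pipecat's implementation.

```python
from typing import Any


class _NotGivenSketch:
    """Hypothetical stand-in for a 'not given' sentinel type."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


# Single shared instance, compared by identity.
NOT_GIVEN_SKETCH = _NotGivenSketch()


def merge_settings(defaults: dict[str, Any], overrides: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of defaults updated with every override that was given.

    An explicit None counts as a real value and replaces the default;
    only the sentinel means "leave the default alone".
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if value is not NOT_GIVEN_SKETCH:
            merged[key] = value
    return merged


defaults = {"output_medium": "voice", "temperature": 0.0}
overrides = {"output_medium": "text", "temperature": NOT_GIVEN_SKETCH}
merged = merge_settings(defaults, overrides)
```

Here `merged` keeps the default temperature and takes the overridden output medium, which is the behavior a sentinel default is usually chosen for.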
- class pipecat.services.ultravox.llm.AgentInputParams(*, api_key: str, agent_id: UUID, template_context: dict[str, Any] = <factory>, metadata: dict[str, str] = <factory>, output_medium: Literal['text', 'voice'] | None = None, max_duration: Annotated[timedelta | None, annotated_types.Ge(ge=datetime.timedelta(seconds=10)), annotated_types.Le(le=datetime.timedelta(seconds=3600))] = None, extra: dict[str, Any] = <factory>)[source]
Bases: BaseModel
Input parameters for Ultravox Realtime generation using a pre-defined Agent.
- Parameters:
api_key – Ultravox API key for authentication.
agent_id – The ID of the Ultravox Realtime agent you’d like to use. Agents are pre-configured to handle calls consistently. You can create and edit agents in the Ultravox console (https://app.ultravox.ai/agents) or using the Ultravox API (https://docs.ultravox.ai/api-reference/agents/agents-post).
template_context – Context variables to use when instantiating a call with the agent. Defaults to an empty dict.
metadata – Metadata to attach to the call. Defaults to an empty dict.
output_medium – The initial output medium for the agent. Use “text” for text responses or “voice” for audio responses. Defaults to None, which uses the agent’s default.
max_duration – The maximum duration of the call. Defaults to None, which will use the agent’s default maximum duration.
extra – Extra parameters to include in the agent call creation request. Defaults to an empty dict. See the Ultravox API documentation for valid arguments: https://docs.ultravox.ai/api-reference/agents/agents-calls-post
- api_key: str
- agent_id: UUID
- template_context: dict[str, Any]
- metadata: dict[str, str]
- output_medium: Literal['text', 'voice'] | None
- max_duration: timedelta | None
- extra: dict[str, Any]
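As a minimal sketch of preparing arguments for `AgentInputParams`, the helper below converts a string agent ID to a `UUID` and checks `max_duration` against the 10-second and 3600-second bounds shown in the signature. `build_agent_kwargs` is a hypothetical helper and the key and ID values are placeholders. The real model presumably performs its own validation, so this only mirrors the documented constraints.

```python
from datetime import timedelta
from typing import Any, Optional
from uuid import UUID

# Bounds taken from the documented signature (Ge 10 seconds, Le 3600 seconds).
MIN_DURATION = timedelta(seconds=10)
MAX_DURATION = timedelta(seconds=3600)


def build_agent_kwargs(
    api_key: str,
    agent_id: str,
    max_duration: Optional[timedelta] = None,
    template_context: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for AgentInputParams."""
    # None is allowed and means "use the agent's default maximum duration".
    if max_duration is not None and not (MIN_DURATION <= max_duration <= MAX_DURATION):
        raise ValueError("max_duration must be between 10 seconds and 1 hour")
    return {
        "api_key": api_key,
        "agent_id": UUID(agent_id),  # raises ValueError on a malformed ID
        "template_context": dict(template_context or {}),
        "max_duration": max_duration,
    }


kwargs = build_agent_kwargs(
    "placeholder-key",
    "12345678-1234-5678-1234-567812345678",
    max_duration=timedelta(minutes=5),
    template_context={"customer_name": "Ada"},
)
# With pipecat installed, these could be passed as AgentInputParams(**kwargs).
```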
- class pipecat.services.ultravox.llm.OneShotInputParams(*, api_key: str, system_prompt: str | None = None, temperature: Annotated[float, annotated_types.Ge(ge=0.0), annotated_types.Le(le=1.0)] = 0.0, model: str | None = None, voice: UUID | None = None, metadata: dict[str, str] = <factory>, output_medium: Literal['text', 'voice'] | None = None, max_duration: Annotated[timedelta, annotated_types.Ge(ge=datetime.timedelta(seconds=10)), annotated_types.Le(le=datetime.timedelta(seconds=3600))] = datetime.timedelta(seconds=3600), extra: dict[str, Any] = <factory>)[source]
Bases: BaseModel
Input parameters for Ultravox Realtime generation using a one-off call.
- Parameters:
api_key – Ultravox API key for authentication.
system_prompt – System prompt to guide the model’s behavior. Defaults to None.
temperature – Sampling temperature for response generation. Defaults to 0.
model – Model identifier to use. Defaults to “fixie-ai/ultravox”.
voice – Voice identifier for speech generation. Defaults to None.
metadata – Metadata to attach to the call. Defaults to an empty dict.
output_medium – The initial output medium for the agent. Use “text” for text responses or “voice” for audio responses. Defaults to None (voice).
max_duration – The maximum duration of the call. Defaults to one hour.
extra – Extra parameters to include in the call creation request. Defaults to an empty dict. See the Ultravox API documentation for valid arguments: https://docs.ultravox.ai/api-reference/calls/calls-post
- api_key: str
- system_prompt: str | None
- temperature: float
- model: str | None
- voice: UUID | None
- metadata: dict[str, str]
- output_medium: Literal['text', 'voice'] | None
- max_duration: timedelta
- extra: dict[str, Any]
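The documented ranges for a one-off call (temperature between 0.0 and 1.0, output medium of "text" or "voice", duration between 10 seconds and one hour) can be checked before the parameters are built. The following is a sketch under those assumptions; `build_one_shot_kwargs` is a hypothetical helper, not part of pipecat, and the defaults simply copy the ones listed above.

```python
from datetime import timedelta
from typing import Any, Optional


def build_one_shot_kwargs(
    api_key: str,
    system_prompt: Optional[str] = None,
    temperature: float = 0.0,
    output_medium: Optional[str] = None,
    max_duration: timedelta = timedelta(hours=1),
) -> dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for OneShotInputParams."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")
    if output_medium not in (None, "text", "voice"):
        raise ValueError("output_medium must be 'text', 'voice', or None")
    if not timedelta(seconds=10) <= max_duration <= timedelta(seconds=3600):
        raise ValueError("max_duration must be between 10 seconds and 1 hour")
    return {
        "api_key": api_key,
        "system_prompt": system_prompt,
        "temperature": temperature,
        "output_medium": output_medium,
        "max_duration": max_duration,
    }


one_shot_kwargs = build_one_shot_kwargs(
    "placeholder-key",
    system_prompt="You are a concise assistant.",
    temperature=0.3,
    output_medium="text",
)
# With pipecat installed, these could be passed as OneShotInputParams(**one_shot_kwargs).
```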
- class pipecat.services.ultravox.llm.JoinUrlInputParams(*, join_url: str)[source]
Bases: BaseModel
Input parameters for joining an existing Ultravox Realtime call via join URL.
- Parameters:
join_url – The join URL for the existing Ultravox Realtime call.
- join_url: str
- class pipecat.services.ultravox.llm.UltravoxRealtimeLLMService(*, params: AgentInputParams | OneShotInputParams | JoinUrlInputParams, settings: UltravoxRealtimeLLMSettings | None = None, one_shot_selected_tools: ToolsSchema | None = None, **kwargs)[source]
Bases: LLMService
Provides access to the Ultravox Realtime API.
This service enables real-time conversations with Ultravox, supporting both text and audio output. It handles voice transcription, streaming audio responses, and tool usage.
Note: Ultravox is an audio-native model, so voice transcriptions are not used by the model and may not always align with its understanding of user input.
- Settings
alias of UltravoxRealtimeLLMSettings
- __init__(*, params: AgentInputParams | OneShotInputParams | JoinUrlInputParams, settings: UltravoxRealtimeLLMSettings | None = None, one_shot_selected_tools: ToolsSchema | None = None, **kwargs)[source]
Initialize the Ultravox Realtime LLM service.
- Parameters:
params – Configuration parameters for the model.
settings – Ultravox Realtime LLM settings. If provided, the settings values take precedence over default values.
one_shot_selected_tools – ToolsSchema for tools to use with this call. May only be set with OneShotInputParams.
**kwargs – Additional arguments passed to parent LLMService.
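The rule that `one_shot_selected_tools` may only accompany `OneShotInputParams` can be pictured as a simple type check. The sketch below uses empty stand-in classes so that it runs without pipecat; the class names and `check_tools_allowed` are hypothetical. How the real service reacts to a violation is not stated here, so raising `ValueError` is an assumption.

```python
from typing import Any, Optional


# Stand-ins for the three documented parameter models (illustration only).
class AgentParamsSketch:
    pass


class OneShotParamsSketch:
    pass


class JoinUrlParamsSketch:
    pass


def check_tools_allowed(params: Any, one_shot_selected_tools: Optional[Any]) -> None:
    """Reject tools unless the call is configured with one-shot parameters.

    Assumption: an error is raised; the real service may behave differently.
    """
    if one_shot_selected_tools is not None and not isinstance(params, OneShotParamsSketch):
        raise ValueError("one_shot_selected_tools requires one-shot input parameters")


# Tools with one-shot params, or no tools at all, pass the check.
check_tools_allowed(OneShotParamsSketch(), one_shot_selected_tools=["get_weather"])
check_tools_allowed(AgentParamsSketch(), one_shot_selected_tools=None)
```

With agent or join-URL parameters, tools are presumably configured on the agent or the existing call rather than passed to the constructor.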
- can_generate_metrics() → bool[source]
Check if the service can generate usage metrics.
- Returns:
True if metrics generation is supported.
- async start(frame: StartFrame)[source]
Start the service and establish connection.
- Parameters:
frame – The start frame.
- async stop(frame: EndFrame)[source]
Stop the service and close connections.
- Parameters:
frame – The end frame.
- async cancel(frame: CancelFrame)[source]
Cancel the service and close connections.
- Parameters:
frame – The cancel frame.
- async process_frame(frame: Frame, direction: FrameDirection)[source]
Process incoming frames for the Ultravox Realtime service.
- Parameters:
frame – The frame to process.
direction – The frame processing direction.