llm

Inworld Realtime LLM service implementation with WebSocket support.

Based on Inworld’s Realtime API documentation: https://docs.inworld.ai/api-reference/realtimeAPI/realtime/realtime-websocket

class pipecat.services.inworld.realtime.llm.CurrentAudioResponse(item_id: str, content_index: int, start_time_ms: int, total_size: int = 0)[source]

Bases: object

Tracks the current audio response from the assistant.

Parameters:

item_id – Unique identifier for the audio response item.
content_index – Index of the audio content within the item.
start_time_ms – Timestamp when the audio response started in milliseconds.
total_size – Total size of audio data received in bytes. Defaults to 0.

item_id: str

content_index: int

start_time_ms: int

total_size: int = 0
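As a rough illustration of how these fields might be used together, the sketch below mirrors the class locally (a stand-in, not an import from pipecat) and estimates how much audio has been received so far. The helper received_audio_ms and the assumption of 16-bit mono PCM at 24 kHz are hypothetical; the service itself may account for audio differently.

```python
from dataclasses import dataclass


@dataclass
class CurrentAudioResponse:
    """Local stand-in mirroring the documented fields (not the pipecat class)."""

    item_id: str
    content_index: int
    start_time_ms: int
    total_size: int = 0


def received_audio_ms(
    resp: CurrentAudioResponse,
    sample_rate: int = 24000,
    bytes_per_sample: int = 2,
) -> int:
    """Estimate the duration of audio received so far, assuming mono PCM."""
    return resp.total_size * 1000 // (sample_rate * bytes_per_sample)


resp = CurrentAudioResponse(item_id="item_1", content_index=0, start_time_ms=1_000)

# Accumulate the byte count as audio chunks arrive (two 100 ms chunks here).
for chunk in (b"\x00" * 4800, b"\x00" * 4800):
    resp.total_size += len(chunk)

print(received_audio_ms(resp))
```

An estimate like this, together with start_time_ms, is the kind of bookkeeping needed to work out how much of a response was played when the user interrupts.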

class pipecat.services.inworld.realtime.llm.InworldRealtimeLLMSettings

Bases: LLMSettings

Settings for InworldRealtimeLLMService.

Parameters:

session_properties – Inworld Realtime session properties (audio config, tools, etc.). Its model and instructions fields are synced bidirectionally with the top-level model and system_instruction fields.

session_properties: SessionProperties | _NotGiven

apply_update(delta: InworldRealtimeLLMSettings) → dict[str, Any][source]

Merge a delta, keeping model and system_instruction in sync with session_properties.

When the delta contains session_properties, it replaces the stored session_properties wholesale (matching legacy behaviour). Top-level field values always take precedence over conflicting session_properties values.
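The merge rules above can be sketched with plain dicts. This is a simplified model under stated assumptions, not the library's implementation: merge_delta and SYNCED_FIELDS are hypothetical names, the return value of the real method is not modeled, and copying a stored top-level value back into a freshly replaced session_properties is an assumption about how the two stay in sync.

```python
from typing import Any

# Top-level field name -> matching key inside session_properties.
SYNCED_FIELDS = {"model": "model", "system_instruction": "instructions"}


def merge_delta(stored: dict[str, Any], delta: dict[str, Any]) -> dict[str, Any]:
    """Merge `delta` into `stored` in place and return the merged state."""
    delta_sp = delta.get("session_properties")
    if delta_sp is not None:
        # Wholesale replacement: nothing from the old session_properties survives.
        stored["session_properties"] = dict(delta_sp)
    sp = stored.setdefault("session_properties", {})

    for top_key, sp_key in SYNCED_FIELDS.items():
        if top_key in delta:
            # A top-level value in the delta wins over any conflicting nested value.
            value = delta[top_key]
        elif delta_sp is not None and sp_key in delta_sp:
            value = delta_sp[sp_key]
        else:
            # Assumption: an existing top-level value is carried over.
            value = stored.get(top_key)
        if value is not None:
            stored[top_key] = value
            sp[sp_key] = value
    return stored


stored = {
    "model": "model-a",
    "system_instruction": "Be brief.",
    "session_properties": {"model": "model-a", "instructions": "Be brief.", "temperature": 0.7},
}
delta = {"model": "model-b", "session_properties": {"model": "model-c", "tools": []}}
merged = merge_delta(stored, delta)
print(merged["model"], merged["session_properties"])
```

In this run the old temperature is dropped because session_properties was replaced, and the top-level "model-b" overrides the nested "model-c".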

classmethod from_mapping(settings: Mapping[str, Any]) → InworldRealtimeLLMSettings[source]

Build a delta from a plain dict, routing SessionProperties keys into session_properties.

Keys that correspond to SessionProperties fields are collected into a nested session_properties value. model is always routed to the top-level field. Unknown keys go to extra.
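The routing described above might look roughly like the following. This is a sketch with plain dicts: route_keys is a hypothetical helper, and SP_FIELDS lists only an illustrative subset of SessionProperties field names rather than the real set.

```python
from typing import Any, Mapping

# Illustrative subset of SessionProperties field names (assumption).
SP_FIELDS = {"model", "instructions", "temperature", "audio", "tools"}
# Fields that always stay at the top level, even if SessionProperties has them too.
TOP_LEVEL_FIELDS = {"model", "system_instruction"}


def route_keys(settings: Mapping[str, Any]) -> dict[str, Any]:
    """Split a flat mapping into top-level, session_properties and extra keys."""
    routed: dict[str, Any] = {"session_properties": {}, "extra": {}}
    for key, value in settings.items():
        if key in TOP_LEVEL_FIELDS:
            # Checked first so that `model` never lands in session_properties.
            routed[key] = value
        elif key in SP_FIELDS:
            routed["session_properties"][key] = value
        else:
            routed["extra"][key] = value
    return routed


print(route_keys({"model": "openai/gpt-4.1-nano", "temperature": 0.7, "custom_flag": True}))
```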

class pipecat.services.inworld.realtime.llm.InworldRealtimeLLMService(*, api_key: str, llm_model: str | None = None, voice: str | None = None, tts_model: str | None = None, stt_model: str | None = None, base_url: str = 'wss://api.inworld.ai/api/v1/realtime/session', auth_type: Literal['basic', 'bearer'] = 'basic', settings: InworldRealtimeLLMSettings | None = None, start_audio_paused: bool = False, **kwargs)[source]

Bases: LLMService

Inworld Realtime LLM service for real-time audio and text communication.

Implements the Inworld Realtime API with WebSocket communication for low-latency bidirectional audio and text interactions. The API operates as a cascade STT/LLM/TTS pipeline under the hood, with built-in semantic voice activity detection (VAD) for turn management.

Supports function calling, conversation management, and real-time transcription.

Example:

import os

from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMService

llm = InworldRealtimeLLMService(
    api_key=os.getenv("INWORLD_API_KEY"),
    llm_model="openai/gpt-4.1-nano",
    voice="Sarah",
    tts_model="inworld-tts-1.5-max",
)

For full control, pass session_properties through settings. Because session_properties replaces all defaults, provide a complete configuration:

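import os

from pipecat.services.inworld.realtime.events import (
    AudioConfiguration,
    AudioInput,
    AudioOutput,
    PCMAudioFormat,
    SessionProperties,
    TurnDetection,
)
from pipecat.services.inworld.realtime.llm import InworldRealtimeLLMService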

llm = InworldRealtimeLLMService(
    api_key=os.getenv("INWORLD_API_KEY"),
    settings=InworldRealtimeLLMService.Settings(
        session_properties=SessionProperties(
            model="openai/gpt-4.1-nano",
            temperature=0.7,
            audio=AudioConfiguration(
                input=AudioInput(
                    format=PCMAudioFormat(rate=24000),
                    turn_detection=TurnDetection(
                        type="semantic_vad",
                        eagerness="low",
                    ),
                ),
                output=AudioOutput(
                    format=PCMAudioFormat(rate=24000),
                    voice="Sarah",
                    model="inworld-tts-1.5-max",
                ),
            ),
        ),
    ),
)

Settings: alias of InworldRealtimeLLMSettings

adapter_class: alias of InworldRealtimeLLMAdapter

__init__(*, api_key: str, llm_model: str | None = None, voice: str | None = None, tts_model: str | None = None, stt_model: str | None = None, base_url: str = 'wss://api.inworld.ai/api/v1/realtime/session', auth_type: Literal['basic', 'bearer'] = 'basic', settings: InworldRealtimeLLMSettings | None = None, start_audio_paused: bool = False, **kwargs)[source]

Initialize the Inworld Realtime LLM service.

Parameters:

api_key – Inworld API key for authentication.
llm_model – LLM model to use (e.g. "openai/gpt-4.1-nano"). Shorthand for session_properties.model.
voice – Voice ID for TTS output (e.g. "Sarah", "Clive"). Shorthand for session_properties.audio.output.voice.
tts_model – TTS model to use (e.g. "inworld-tts-1.5-max"). Shorthand for session_properties.audio.output.model.
stt_model – STT model for input transcription (e.g. "assemblyai/universal-streaming-multilingual"). Shorthand for session_properties.audio.input.transcription.model.
base_url – WebSocket base URL for the realtime API.
auth_type – Authentication type. "basic" for server-side API key auth, "bearer" for client-side JWT auth.
settings – Full settings for fine-grained control. When session_properties is provided in settings, it replaces all defaults wholesale, so provide a complete SessionProperties in that case.
start_audio_paused – Whether to start with audio input paused.
**kwargs – Additional arguments passed to parent LLMService.
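To make the shorthand mapping concrete, the sketch below expands the four shorthand arguments into the nested paths listed above. It uses plain dicts, and expand_shorthand is a hypothetical helper; the service builds a SessionProperties object, and how it combines shorthands with its defaults is not shown here.

```python
from typing import Any, Optional


def expand_shorthand(
    llm_model: Optional[str] = None,
    voice: Optional[str] = None,
    tts_model: Optional[str] = None,
    stt_model: Optional[str] = None,
) -> dict[str, Any]:
    """Place each shorthand value at its documented session_properties path."""
    paths = {
        ("model",): llm_model,
        ("audio", "output", "voice"): voice,
        ("audio", "output", "model"): tts_model,
        ("audio", "input", "transcription", "model"): stt_model,
    }
    props: dict[str, Any] = {}
    for path, value in paths.items():
        if value is None:
            continue  # Unset shorthands leave the corresponding path untouched.
        node = props
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = value
    return props


print(expand_shorthand(llm_model="openai/gpt-4.1-nano", voice="Sarah"))
```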

can_generate_metrics() → bool[source]

Check if the service can generate usage metrics.

set_audio_input_paused(paused: bool)[source]

Set whether audio input is paused.

Parameters:

paused – True to pause audio input, False to resume.
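One possible use is a push-to-talk control, combined with start_audio_paused=True so the microphone is muted until the user presses a button. The sketch below uses a hypothetical stand-in class (FakeRealtimeService) and a hypothetical handler (on_push_to_talk) so that it runs without a connection; only set_audio_input_paused and start_audio_paused come from the documented API.

```python
class FakeRealtimeService:
    """Hypothetical stand-in exposing only the pause control."""

    def __init__(self, start_audio_paused: bool = False):
        self.audio_input_paused = start_audio_paused

    def set_audio_input_paused(self, paused: bool) -> None:
        self.audio_input_paused = paused


def on_push_to_talk(service: FakeRealtimeService, pressed: bool) -> None:
    """Resume audio input while the button is held and pause it on release."""
    service.set_audio_input_paused(not pressed)


service = FakeRealtimeService(start_audio_paused=True)
on_push_to_talk(service, pressed=True)   # user holds the button: audio resumes
on_push_to_talk(service, pressed=False)  # user releases: audio is paused again
print(service.audio_input_paused)
```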

async start(frame: StartFrame)[source]

Start the service and establish WebSocket connection.

async stop(frame: EndFrame)[source]

Stop the service and close WebSocket connection.

async cancel(frame: CancelFrame)[source]

Cancel the service and close WebSocket connection.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process incoming frames from the pipeline.

async send_client_event(event: ClientEvent)[source]

Send a client event to the Inworld Realtime API.

Parameters:

event – The client event to send.

async reset_conversation()[source]

Reset the conversation by disconnecting and reconnecting.

This fully resets the server-side conversation state. Audio buffers, pending function calls, and conversation history are cleared.