llm

OpenAI Responses API LLM service implementations (WebSocket and HTTP).

class pipecat.services.openai.responses.llm.OpenAIResponsesLLMService(*, ws_url: str = 'wss://api.openai.com/v1/responses', **kwargs)[source]

Bases: _BaseOpenAIResponsesLLMService, WebsocketLLMService

OpenAI Responses API LLM service using WebSocket transport.

Maintains a persistent WebSocket connection to wss://api.openai.com/v1/responses for lower-latency inference, especially beneficial for tool-call-heavy workflows. Automatically uses previous_response_id to send only incremental context when possible, and falls back to full context on reconnection or cache miss.

The previous_response_id optimization works with store=False (the default) because WebSocket mode uses a connection-local in-memory cache, so no conversations are stored on OpenAI’s servers. The HTTP variant (OpenAIResponsesHttpLLMService) does not currently offer this optimization: over HTTP, previous_response_id requires store=True, which enables OpenAI-side 30-day conversation storage.

This is the recommended variant for real-time / conversational use.
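The incremental-versus-full fallback described above can be pictured with a small standalone sketch. This is not pipecat's implementation: ConnectionState, build_request, on_response, and on_reconnect are hypothetical names, and the payload shape is simplified for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ConnectionState:
    """Hypothetical per-connection bookkeeping (not pipecat internals)."""

    previous_response_id: Optional[str] = None
    messages_seen: int = 0  # context messages the server-side cache already covers


def build_request(messages: list[dict[str, Any]], state: ConnectionState) -> dict[str, Any]:
    """Build an incremental payload when possible, otherwise a full one."""
    can_increment = (
        state.previous_response_id is not None
        and 0 < state.messages_seen <= len(messages)
    )
    if can_increment:
        # Send only the messages added since the last response.
        return {
            "previous_response_id": state.previous_response_id,
            "input": messages[state.messages_seen:],
            "store": False,
        }
    # No usable cache entry (first request, reconnection, or cache miss).
    return {"input": list(messages), "store": False}


def on_response(state: ConnectionState, response_id: str, context_len: int) -> None:
    """Record the latest response id and how much context it covers."""
    state.previous_response_id = response_id
    state.messages_seen = context_len


def on_reconnect(state: ConnectionState) -> None:
    """A new connection starts with an empty cache, so forget the old id."""
    state.previous_response_id = None
    state.messages_seen = 0
```

In this sketch, a reconnection simply clears the bookkeeping, so the next request carries the full context again.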

Example:

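import os

from pipecat.services.openai.responses.llm import OpenAIResponsesLLMService

# Read the API key from the environment rather than hard-coding it.
llm = OpenAIResponsesLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIResponsesLLMService.Settings(
        model="gpt-4.1",
        system_instruction="You are a helpful assistant.",
    ),
)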
__init__(*, ws_url: str = 'wss://api.openai.com/v1/responses', **kwargs)[source]

Initialize the WebSocket-based OpenAI Responses API LLM service.

Parameters:
  • ws_url – WebSocket endpoint URL. Defaults to wss://api.openai.com/v1/responses.

  • **kwargs – Additional arguments passed to the base class (api_key, base_url, organization, project, default_headers, service_tier, settings, etc.).

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process frames for LLM completion requests.

Parameters:
  • frame – The frame to process.

  • direction – The direction of frame processing.

class pipecat.services.openai.responses.llm.OpenAIResponsesHttpLLMService(*, api_key=None, base_url=None, organization=None, project=None, default_headers: Mapping[str, str] | None = None, service_tier: str | None = None, settings: OpenAIResponsesLLMSettings | None = None, **kwargs)[source]

Bases: _BaseOpenAIResponsesLLMService

OpenAI Responses API LLM service using HTTP streaming transport.

Uses server-sent events (SSE) via the OpenAI Python SDK for streaming inference. Each _process_context call opens a new HTTP connection.

Unlike the WebSocket variant, this service does not currently use previous_response_id for incremental context delivery. Over HTTP, previous_response_id requires store=True, which enables OpenAI-side 30-day conversation storage, a privacy/compliance tradeoff that many users won’t want. The WebSocket variant avoids this because its previous_response_id relies on a connection-local in-memory cache that works with store=False (nothing is stored long-term).

Example:

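import os

from pipecat.services.openai.responses.llm import OpenAIResponsesHttpLLMService

# Read the API key from the environment rather than hard-coding it.
llm = OpenAIResponsesHttpLLMService(
    api_key=os.getenv("OPENAI_API_KEY"),
    settings=OpenAIResponsesHttpLLMService.Settings(
        model="gpt-4.1",
        system_instruction="You are a helpful assistant.",
    ),
)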
async process_frame(frame: Frame, direction: FrameDirection)[source]

Process frames for LLM completion requests.

Parameters:
  • frame – The frame to process.

  • direction – The direction of frame processing.

class pipecat.services.openai.responses.llm.OpenAIResponsesLLMSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, system_instruction: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven | NotGiven = <factory>, max_tokens: int | None | _NotGiven = <factory>, top_p: float | None | _NotGiven | NotGiven = <factory>, top_k: int | None | _NotGiven = <factory>, frequency_penalty: float | None | _NotGiven = <factory>, presence_penalty: float | None | _NotGiven = <factory>, seed: int | None | _NotGiven = <factory>, filter_incomplete_user_turns: bool | None | _NotGiven = <factory>, user_turn_completion_config: UserTurnCompletionConfig | None | _NotGiven = <factory>, max_completion_tokens: int | _NotGiven | NotGiven = <factory>)[source]

Bases: LLMSettings

Settings for OpenAI Responses API LLM services.

Parameters:
  • max_completion_tokens – Maximum number of completion tokens to generate.

temperature: float | None | _NotGiven | NotGiven
top_p: float | None | _NotGiven | NotGiven
max_completion_tokens: int | _NotGiven | NotGiven
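
The field types above combine None with a not-given sentinel, which suggests that "explicitly set to None" and "never set" are treated differently. The following is a minimal sketch of that general pattern, not the library's code: SketchSettings, NOT_GIVEN, and to_request_params are hypothetical names, and whether pipecat forwards None values unchanged is an assumption.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Union


class _NotGiven:
    """Hypothetical sentinel type meaning 'the caller did not set this field'."""

    def __bool__(self) -> bool:
        return False

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


def _not_given() -> _NotGiven:
    """Default factory that returns the shared sentinel."""
    return NOT_GIVEN


@dataclass
class SketchSettings:
    """Illustrative stand-in for a settings class with sentinel defaults."""

    model: Union[str, None, _NotGiven] = field(default_factory=_not_given)
    temperature: Union[float, None, _NotGiven] = field(default_factory=_not_given)
    max_completion_tokens: Union[int, _NotGiven] = field(default_factory=_not_given)


def to_request_params(settings: SketchSettings) -> dict[str, Any]:
    """Keep every field the caller set, including explicit None values."""
    params: dict[str, Any] = {}
    for f in fields(settings):
        value = getattr(settings, f.name)
        if not isinstance(value, _NotGiven):
            params[f.name] = value
    return params
```

With this pattern, a field left at its default is omitted from the request entirely, while a field set to None is passed through as None.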