llm
OpenAI Responses API LLM service implementations (WebSocket and HTTP).
- class pipecat.services.openai.responses.llm.OpenAIResponsesLLMService(*, ws_url: str = 'wss://api.openai.com/v1/responses', **kwargs)[source]
Bases:
_BaseOpenAIResponsesLLMService,WebsocketLLMServiceOpenAI Responses API LLM service using WebSocket transport.
Maintains a persistent WebSocket connection to
wss://api.openai.com/v1/responsesfor lower-latency inference, especially beneficial for tool-call-heavy workflows. Automatically usesprevious_response_idto send only incremental context when possible, and falls back to full context on reconnection or cache miss.The
previous_response_idoptimization works withstore=False(the default) because WebSocket mode uses a connection-local in-memory cache — no conversations are stored on OpenAI’s servers. This is why the HTTP variant (OpenAIResponsesHttpLLMService) does not offer this optimization by default (or at all, yet): over HTTP,previous_response_idrequiresstore=True, which enables OpenAI-side 30-day conversation storage.This is the recommended variant for real-time / conversational use.
Example:
llm = OpenAIResponsesLLMService( api_key=os.getenv("OPENAI_API_KEY"), settings=OpenAIResponsesLLMService.Settings( model="gpt-4.1", system_instruction="You are a helpful assistant.", ), )
- __init__(*, ws_url: str = 'wss://api.openai.com/v1/responses', **kwargs)[source]
Initialize the WebSocket-based OpenAI Responses API LLM service.
- Parameters:
ws_url – WebSocket endpoint URL. Defaults to
wss://api.openai.com/v1/responses.**kwargs – Additional arguments passed to the base class (api_key, base_url, organization, project, default_headers, service_tier, settings, etc.).
- async process_frame(frame: Frame, direction: FrameDirection)[source]
Process frames for LLM completion requests.
- Parameters:
frame – The frame to process.
direction – The direction of frame processing.
- class pipecat.services.openai.responses.llm.OpenAIResponsesHttpLLMService(*, api_key=None, base_url=None, organization=None, project=None, default_headers: Mapping[str, str] | None = None, service_tier: str | None = None, settings: OpenAIResponsesLLMSettings | None = None, **kwargs)[source]
Bases:
_BaseOpenAIResponsesLLMServiceOpenAI Responses API LLM service using HTTP streaming transport.
Uses server-sent events (SSE) via the OpenAI Python SDK for streaming inference. Each
_process_contextcall opens a new HTTP connection.Unlike the WebSocket variant, this service does not use
previous_response_idfor incremental context delivery by default (or at all, yet). Over HTTP,previous_response_idrequiresstore=True, which enables OpenAI-side 30-day conversation storage — a privacy/compliance tradeoff that many users won’t want. The WebSocket variant avoids this because itsprevious_response_iduses a connection-local in-memory cache that works withstore=False(nothing is stored long-term).Example:
llm = OpenAIResponsesHttpLLMService( api_key=os.getenv("OPENAI_API_KEY"), settings=OpenAIResponsesHttpLLMService.Settings( model="gpt-4.1", system_instruction="You are a helpful assistant.", ), )
- async process_frame(frame: Frame, direction: FrameDirection)[source]
Process frames for LLM completion requests.
- Parameters:
frame – The frame to process.
direction – The direction of frame processing.
- class pipecat.services.openai.responses.llm.OpenAIResponsesLLMSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, system_instruction: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven | NotGiven = <factory>, max_tokens: int | None | _NotGiven = <factory>, top_p: float | None | _NotGiven | NotGiven = <factory>, top_k: int | None | _NotGiven = <factory>, frequency_penalty: float | None | _NotGiven = <factory>, presence_penalty: float | None | _NotGiven = <factory>, seed: int | None | _NotGiven = <factory>, filter_incomplete_user_turns: bool | None | _NotGiven = <factory>, user_turn_completion_config: UserTurnCompletionConfig | None | _NotGiven = <factory>, max_completion_tokens: int | _NotGiven | NotGiven = <factory>)[source]
Bases:
LLMSettingsSettings for OpenAI Responses API LLM services.
- Parameters:
max_completion_tokens – Maximum completion tokens to generate.
- temperature: float | None | _NotGiven | NotGiven
- top_p: float | None | _NotGiven | NotGiven
- max_completion_tokens: int | _NotGiven | NotGiven