base_llm

Base LLM service implementation for services that use the AsyncOpenAI client.

Bases: LLMSettings

Settings for BaseOpenAILLMService.

Parameters:: max_completion_tokens – Maximum completion tokens to generate.

frequency_penalty: float | None | _NotGiven | NotGiven

presence_penalty: float | None | _NotGiven | NotGiven

seed: int | None | _NotGiven | NotGiven

temperature: float | None | _NotGiven | NotGiven

top_p: float | None | _NotGiven | NotGiven

max_tokens: int | None | _NotGiven | NotGiven

max_completion_tokens: int | _NotGiven | NotGiven

class pipecat.services.openai.base_llm.BaseOpenAILLMService(*, model: str | None = None, api_key=None, base_url=None, organization=None, project=None, default_headers: Mapping[str, str] | None = None, service_tier: str | None = None, params: InputParams | None = None, settings: OpenAILLMSettings | None = None, retry_timeout_secs: float | None = 5.0, retry_on_timeout: bool | None = False, **kwargs)[source]

Bases: LLMService

Base class for all services that use the AsyncOpenAI client.

This service consumes LLMContextFrame frames, which contain a reference to an LLMContext object. The context defines what is sent to the LLM for completion, including user, assistant, and system messages, as well as tool choices and function call configurations.

Settings: alias of OpenAILLMSettings

supports_developer_role: bool = True

Whether this service’s API supports the “developer” message role.

OpenAI’s native API supports it, but some OpenAI-compatible services (e.g. Cerebras) do not. Subclasses that don’t support it should set this to False, which causes the adapter to convert “developer” messages to “user” messages before sending them to the API.

class InputParams(*, frequency_penalty: Annotated[float | None, ~annotated_types.Ge(ge=-2.0), ~annotated_types.Le(le=2.0)] = <factory>, presence_penalty: Annotated[float | None, ~annotated_types.Ge(ge=-2.0), ~annotated_types.Le(le=2.0)] = <factory>, seed: Annotated[int | None, ~annotated_types.Ge(ge=0)] = <factory>, temperature: Annotated[float | None, ~annotated_types.Ge(ge=0.0), ~annotated_types.Le(le=2.0)] = <factory>, top_k: Annotated[int | None, ~annotated_types.Ge(ge=0)] = None, top_p: Annotated[float | None, ~annotated_types.Ge(ge=0.0), ~annotated_types.Le(le=1.0)] = <factory>, max_tokens: Annotated[int | None, ~annotated_types.Ge(ge=1)] = <factory>, max_completion_tokens: Annotated[int | None, ~annotated_types.Ge(ge=1)] = <factory>, service_tier: str | None = <factory>, extra: dict[str, ~typing.Any] | None=<factory>)[source]

Bases: BaseModel

Input parameters for OpenAI model configuration.

Deprecated since version 0.0.105: Use settings=BaseOpenAILLMService.Settings(...) instead of params=InputParams(...).

Parameters:

frequency_penalty – Penalty for frequent tokens (-2.0 to 2.0).
presence_penalty – Penalty for new tokens (-2.0 to 2.0).
seed – Random seed for deterministic outputs.
temperature – Sampling temperature (0.0 to 2.0).
top_k – Top-k sampling parameter (currently ignored by OpenAI).
top_p – Top-p (nucleus) sampling parameter (0.0 to 1.0).
max_tokens – Maximum tokens in response (deprecated, use max_completion_tokens).
max_completion_tokens – Maximum completion tokens to generate.
service_tier – Service tier to use (e.g., “auto”, “flex”, “priority”).
extra – Additional model-specific parameters.

frequency_penalty: float | None

presence_penalty: float | None

seed: int | None

temperature: float | None

top_k: int | None

top_p: float | None

max_tokens: int | None

max_completion_tokens: int | None

service_tier: str | None

extra: dict[str, Any] | None

__init__(*, model: str | None = None, api_key=None, base_url=None, organization=None, project=None, default_headers: Mapping[str, str] | None = None, service_tier: str | None = None, params: InputParams | None = None, settings: OpenAILLMSettings | None = None, retry_timeout_secs: float | None = 5.0, retry_on_timeout: bool | None = False, **kwargs)[source]

Initialize the BaseOpenAILLMService.

Parameters:

model –
The OpenAI model name to use (e.g., “gpt-4.1”, “gpt-4o”).

Deprecated since version 0.0.105: Use settings=BaseOpenAILLMService.Settings(model=...) instead.
api_key – OpenAI API key. If None, uses environment variable.
base_url – Custom base URL for OpenAI API. If None, uses default.
organization – OpenAI organization ID.
project – OpenAI project ID.
default_headers – Additional HTTP headers to include in requests.
service_tier – Service tier to use (e.g., “auto”, “flex”, “priority”).
params –
Input parameters for model configuration and behavior.

Deprecated since version 0.0.105: Use settings=BaseOpenAILLMService.Settings(...) instead.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
retry_timeout_secs – Request timeout in seconds. Defaults to 5.0 seconds.
retry_on_timeout – Whether to retry the request once if it times out.
**kwargs – Additional arguments passed to the parent LLMService.

create_client(api_key=None, base_url=None, organization=None, project=None, default_headers=None, **kwargs)[source]

Create an AsyncOpenAI client instance.

Parameters:

api_key – OpenAI API key.
base_url – Custom base URL for the API.
organization – OpenAI organization ID.
project – OpenAI project ID.
default_headers – Additional HTTP headers.
**kwargs – Additional client configuration arguments.

Returns:

Configured AsyncOpenAI client instance.

can_generate_metrics() → bool[source]

Check if this service can generate processing metrics.

Returns:: True, as OpenAI service supports metrics generation.

set_full_model_name(full_model_name: str)[source]

Set the full AI model name.

Parameters:: full_model_name – The full name of the AI model to use.

get_full_model_name()[source]

Get the current full model name.

Returns:: The full name of the AI model being used.

async get_chat_completions(context: LLMContext) → AsyncStream[ChatCompletionChunk][source]

Get streaming chat completions from OpenAI API with optional timeout and retry.

Parameters:: context – Context to use for the chat completion. Contains messages, tools, and tool choice.
Returns:: Async stream of chat completion chunks.

build_chat_completion_params(params_from_context: OpenAILLMInvocationParams) → dict[source]

Build parameters for chat completion request.

Subclasses can override this to customize parameters for different providers.

Parameters:: params_from_context – Parameters, derived from the LLM context, to use for the chat completion. Contains messages, tools, and tool choice.
Returns:: Dictionary of parameters for the chat completion request.

async run_inference(context: LLMContext, max_tokens: int | None = None, system_instruction: str | None = None) → str | None[source]

Run a one-shot, out-of-band (i.e. out-of-pipeline) inference with the given LLM context.

Parameters:

context – The LLM context containing conversation history.
max_tokens – Optional maximum number of tokens to generate. If provided, overrides the service’s default max_tokens/max_completion_tokens setting.
system_instruction – Optional system instruction to use for this inference. If provided, overrides any system instruction in the context.

Returns:

The LLM’s response as a string, or None if no response is generated.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process frames for LLM completion requests.

Handles LLMContextFrame to trigger LLM completions.

Parameters:

frame – The frame to process.
direction – The direction of frame processing.