llm
Google Gemini integration for Pipecat.
This module provides Google Gemini integration for the Pipecat framework, including LLM services, context management, and message aggregation.
- class pipecat.services.google.llm.GoogleThinkingConfig(*, thinking_budget: int | None = None, thinking_level: Literal['low', 'high', 'medium', 'minimal'] | str | None = None, include_thoughts: bool | None = None)[source]
Bases: BaseModel
Configuration for controlling the model’s internal “thinking” process used before generating a response.
Gemini 2.5 and 3 series models have this thinking process.
- Parameters:
thinking_level – Thinking level for Gemini 3 models. For Gemini 3 Pro, this can be “low” or “high”. For Gemini 3 Flash, this can be “minimal”, “low”, “medium”, or “high”. If not provided, Gemini 3 models default to “high”. Note: Gemini 2.5 series must use thinking_budget instead.
thinking_budget – Token budget for thinking, for Gemini 2.5 series. -1 for dynamic thinking (model decides), 0 to disable thinking, or a specific token count (e.g., 128-32768 for 2.5 Pro). If not provided, most models today default to dynamic thinking. See https://ai.google.dev/gemini-api/docs/thinking#set-budget for default values and allowed ranges. Note: Gemini 3 models must use thinking_level instead.
include_thoughts – Whether to include thought summaries in the response. Today’s models default to not including thoughts (False).
- thinking_budget: int | None
- thinking_level: Literal['low', 'high', 'medium', 'minimal'] | str | None
- include_thoughts: bool | None
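The split between thinking_budget (Gemini 2.5 series) and thinking_level (Gemini 3 series) can be sketched as a small helper that builds keyword arguments for GoogleThinkingConfig. The helper name `thinking_kwargs` and the model-name prefix checks are assumptions made for illustration; they are not part of Pipecat.

```python
from __future__ import annotations

from typing import Any


def thinking_kwargs(
    model: str,
    *,
    budget: int | None = None,
    level: str | None = None,
    include_thoughts: bool | None = None,
) -> dict[str, Any]:
    """Build keyword arguments for GoogleThinkingConfig (hypothetical helper).

    Assumes model names start with "gemini-2.5" or "gemini-3"; adjust the
    prefix checks to the model identifiers you actually use.
    """
    kwargs: dict[str, Any] = {}
    if model.startswith("gemini-2.5"):
        # Gemini 2.5 series: token budget (-1 dynamic, 0 disabled, or a count).
        if level is not None:
            raise ValueError("Gemini 2.5 models use thinking_budget, not thinking_level")
        if budget is not None:
            kwargs["thinking_budget"] = budget
    elif model.startswith("gemini-3"):
        # Gemini 3 series: a named thinking level instead of a token budget.
        if budget is not None:
            raise ValueError("Gemini 3 models use thinking_level, not thinking_budget")
        if level is not None:
            kwargs["thinking_level"] = level
    else:
        raise ValueError(f"unrecognized model family: {model!r}")
    if include_thoughts is not None:
        kwargs["include_thoughts"] = include_thoughts
    return kwargs


# Disable thinking on a 2.5-series model; request low thinking on a 3-series model.
flash_25 = thinking_kwargs("gemini-2.5-flash", budget=0)
flash_3 = thinking_kwargs("gemini-3-flash", level="low", include_thoughts=True)
```

The resulting dict can then be unpacked into the configuration class, for example GoogleThinkingConfig(**flash_25).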
- class pipecat.services.google.llm.GoogleLLMSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, system_instruction: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, max_tokens: int | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>, top_k: int | None | _NotGiven = <factory>, frequency_penalty: float | None | _NotGiven = <factory>, presence_penalty: float | None | _NotGiven = <factory>, seed: int | None | _NotGiven = <factory>, filter_incomplete_user_turns: bool | None | _NotGiven = <factory>, user_turn_completion_config: UserTurnCompletionConfig | None | _NotGiven = <factory>, thinking: GoogleLLMService.ThinkingConfig | None | _NotGiven = <factory>)[source]
Bases: LLMSettings
Settings for GoogleLLMService.
- Parameters:
thinking – Thinking configuration.
- thinking: GoogleLLMService.ThinkingConfig | None | _NotGiven
- classmethod from_mapping(settings)[source]
Convert a plain dict to settings, coercing thinking dicts.
For backward compatibility, a thinking value that is a plain dict is converted to a GoogleLLMService.ThinkingConfig.
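The dict coercion that from_mapping describes can be illustrated with stand-in dataclasses. This is a minimal sketch of the pattern, not Pipecat's actual implementation: `ThinkingConfigStandIn` and `SettingsStandIn` are hypothetical names, and the real settings class has many more fields.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Mapping


@dataclass
class ThinkingConfigStandIn:
    """Stand-in for GoogleThinkingConfig, with the same three fields."""

    thinking_budget: int | None = None
    thinking_level: str | None = None
    include_thoughts: bool | None = None


@dataclass
class SettingsStandIn:
    """Stand-in for GoogleLLMSettings, reduced to two fields."""

    model: str | None = None
    thinking: ThinkingConfigStandIn | None = None

    @classmethod
    def from_mapping(cls, settings: Mapping[str, Any]) -> "SettingsStandIn":
        values = dict(settings)  # copy so the caller's mapping is left untouched
        thinking = values.get("thinking")
        if isinstance(thinking, dict):
            # Coerce a plain dict into the typed config object.
            values["thinking"] = ThinkingConfigStandIn(**thinking)
        return cls(**values)


coerced = SettingsStandIn.from_mapping(
    {"model": "example-model", "thinking": {"thinking_budget": 0}}
)
```

After the call, `coerced.thinking` is a typed object rather than a dict, so downstream code can rely on attribute access.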
- class pipecat.services.google.llm.GoogleLLMService(*, api_key: str, model: str | None = None, params: InputParams | None = None, settings: GoogleLLMSettings | None = None, system_instruction: str | None = None, tools: list[dict[str, Any]] | None = None, tool_config: dict[str, Any] | None = None, http_options: HttpOptions | None = None, **kwargs)[source]
Bases: LLMService
Google AI (Gemini) LLM service implementation.
This class implements inference with Google’s AI models, translating internally from an LLMContext to the messages format expected by the Google AI model.
- Settings
alias of GoogleLLMSettings
- adapter_class
alias of GeminiLLMAdapter
- ThinkingConfig
alias of GoogleThinkingConfig
- class InputParams(**data: Any)[source]
Bases: BaseModel
Input parameters for Google AI models.
Deprecated since version 0.0.105: Use settings=GoogleLLMService.Settings(...) instead.
- Parameters:
max_tokens – Maximum number of tokens to generate.
temperature – Sampling temperature between 0.0 and 2.0.
top_k – Top-k sampling parameter.
top_p – Top-p sampling parameter between 0.0 and 1.0.
thinking – Thinking configuration with thinking_budget, thinking_level, and include_thoughts. Controls the model’s internal “thinking” process before it generates a response. Gemini 2.5 series models use thinking_budget; Gemini 3 models use thinking_level. If not provided, Pipecat disables thinking for all models where that’s possible (the 2.5 series, except 2.5 Pro) to reduce latency.
extra – Additional parameters as a dictionary.
- max_tokens: int | None
- temperature: float | None
- top_k: int | None
- top_p: float | None
- thinking: GoogleLLMService.ThinkingConfig | None
- extra: dict[str, Any] | None
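Since InputParams is deprecated in favor of settings, a migration might look like the sketch below. It uses only the constructor arguments and aliases listed on this page; the function name `build_google_llm`, the model name, and the parameter values are illustrative assumptions.

```python
def build_google_llm(api_key: str):
    """Create a GoogleLLMService configured through settings (sketch)."""
    # Imported lazily so this module loads even where Pipecat is not installed.
    from pipecat.services.google.llm import GoogleLLMService

    # Deprecated style (since 0.0.105), shown for comparison only:
    #   GoogleLLMService(
    #       api_key=api_key,
    #       model="gemini-2.5-flash",
    #       params=GoogleLLMService.InputParams(temperature=0.7, max_tokens=512),
    #   )
    return GoogleLLMService(
        api_key=api_key,
        settings=GoogleLLMService.Settings(
            model="gemini-2.5-flash",  # example model name, an assumption
            temperature=0.7,
            max_tokens=512,
            # Explicitly disable thinking via the 2.5-series token budget.
            thinking=GoogleLLMService.ThinkingConfig(thinking_budget=0),
        ),
    )
```

Passing the thinking configuration explicitly, as here, avoids relying on the default behavior described for the thinking parameter.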
- __init__(*, api_key: str, model: str | None = None, params: InputParams | None = None, settings: GoogleLLMSettings | None = None, system_instruction: str | None = None, tools: list[dict[str, Any]] | None = None, tool_config: dict[str, Any] | None = None, http_options: HttpOptions | None = None, **kwargs)[source]
Initialize the Google LLM service.
- Parameters:
api_key – Google AI API key for authentication.
model – Model name to use. Deprecated since version 0.0.105: Use settings=GoogleLLMService.Settings(model=...) instead.
params – Optional model parameters for inference. Deprecated since version 0.0.105: Use settings=GoogleLLMService.Settings(...) instead.
settings – Runtime-updatable settings for this service. When both deprecated parameters and settings are provided, settings values take precedence.
system_instruction – System instruction/prompt for the model. Deprecated since version 0.0.105: Use settings=GoogleLLMService.Settings(system_instruction=...) instead.
tools – List of available tools/functions.
tool_config – Configuration for tool usage.
http_options – HTTP options for the client.
**kwargs – Additional arguments passed to parent class.
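The precedence rule for settings versus the deprecated arguments can be pictured with a sentinel-based merge. This is only a sketch of the documented behavior, not Pipecat's internal code: `NOT_GIVEN`, `_NotGivenStandIn`, and `merge_settings` are hypothetical names, and letting an explicit None in settings override a deprecated argument is an assumption.

```python
from __future__ import annotations

from typing import Any


class _NotGivenStandIn:
    """Sentinel meaning "the caller did not set this field" (stand-in)."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGivenStandIn()


def merge_settings(deprecated: dict[str, Any], settings: dict[str, Any]) -> dict[str, Any]:
    """Merge deprecated constructor arguments with settings values.

    Deprecated arguments act as defaults. Any settings field that was actually
    given overrides them (assumption: this includes an explicit None).
    """
    # Start from the deprecated arguments that were supplied.
    merged = {key: value for key, value in deprecated.items() if value is not None}
    # Settings win wherever the caller provided a value.
    for key, value in settings.items():
        if value is not NOT_GIVEN:
            merged[key] = value
    return merged


resolved = merge_settings(
    deprecated={"model": "model-from-argument", "system_instruction": "Be brief."},
    settings={"model": "model-from-settings", "system_instruction": NOT_GIVEN},
)
```

Here `resolved` keeps the model from settings and falls back to the deprecated system_instruction, because that field was not given in settings.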
- can_generate_metrics() → bool[source]
Check if the service can generate usage metrics.
- Returns:
True, as Google AI provides token usage metrics.
- async run_inference(context: LLMContext, max_tokens: int | None = None, system_instruction: str | None = None) → str | None[source]
Run a one-shot, out-of-band (i.e. out-of-pipeline) inference with the given LLM context.
- Parameters:
context – The LLM context containing conversation history.
max_tokens – Optional maximum number of tokens to generate. If provided, overrides the service’s default max_tokens setting.
system_instruction – Optional system instruction to use for this inference. If provided, overrides any system instruction in the context.
- Returns:
The LLM’s response as a string, or None if no response is generated.
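A caller might wrap run_inference as in the sketch below, which handles the None return value. The wrapper `summarize` and the stub `FakeLLM` are hypothetical; the stub only mimics the documented signature so the example runs without a real service or API key.

```python
from __future__ import annotations

import asyncio
from typing import Any


async def summarize(llm: Any, context: Any) -> str:
    """Run a one-shot inference and fall back to an empty string."""
    result = await llm.run_inference(
        context,
        max_tokens=128,  # overrides the service's default for this call only
        system_instruction="Summarize the conversation in one sentence.",
    )
    # run_inference may return None when no response is generated.
    return result if result is not None else ""


class FakeLLM:
    """Hypothetical stub with the same run_inference signature, for local testing."""

    async def run_inference(self, context, max_tokens=None, system_instruction=None):
        return f"{len(context)} messages, max_tokens={max_tokens}"


summary = asyncio.run(summarize(FakeLLM(), ["hi", "hello"]))
```

With a real GoogleLLMService, `context` would be an LLMContext holding the conversation history rather than the plain list used by the stub.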
- async process_frame(frame: Frame, direction: FrameDirection)[source]
Process incoming frames and handle different frame types.
- Parameters:
frame – The frame to process.
direction – Direction of frame processing.