llm

Anthropic AI service integration for Pipecat.

This module provides LLM services and context management for Anthropic’s Claude models, including support for function calling, vision, and prompt caching features.

class pipecat.services.anthropic.llm.AnthropicThinkingConfig(*, type: Literal['enabled', 'disabled'] | str, budget_tokens: int | None = None)

Bases: BaseModel

Configuration for extended thinking.

Parameters:

type – Type of thinking mode (currently only “enabled” or “disabled”).
budget_tokens – Maximum number of tokens for thinking. With today’s models, the minimum is 1024. Currently required when type is “enabled” and not allowed when type is “disabled”.

type: Literal['enabled', 'disabled'] | str

budget_tokens: int | None
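
The constraints described for type and budget_tokens can be checked before a config is built. The following is a minimal standalone sketch using only the standard library. check_thinking_args is a hypothetical helper, not part of Pipecat, and it encodes only the rules stated in the parameter descriptions above, which may change with newer models.

```python
from typing import Optional

MIN_BUDGET_TOKENS = 1024  # minimum stated in the parameter description for today's models


def check_thinking_args(mode: str, budget_tokens: Optional[int] = None) -> list[str]:
    """Return a list of problems with the arguments; an empty list means none were found."""
    problems = []
    if mode == "enabled":
        if budget_tokens is None:
            problems.append("budget_tokens is required when type is 'enabled'")
        elif budget_tokens < MIN_BUDGET_TOKENS:
            problems.append(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    elif mode == "disabled":
        if budget_tokens is not None:
            problems.append("budget_tokens is not allowed when type is 'disabled'")
    # Any other string is left unchecked, because the annotation also admits str.
    return problems
```

Running such a check up front gives a readable message locally; whether the real class or the API rejects invalid combinations is not stated here.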

class pipecat.services.anthropic.llm.AnthropicLLMSettings

Bases: LLMSettings

Settings for AnthropicLLMService.

Parameters:

enable_prompt_caching – Whether to enable prompt caching.
thinking – Extended thinking configuration.

enable_prompt_caching: bool | _NotGiven

temperature: float | None | _NotGiven | NotGiven

top_k: int | None | _NotGiven | NotGiven

top_p: float | None | _NotGiven | NotGiven

thinking: AnthropicLLMService.ThinkingConfig | _NotGiven | NotGiven

classmethod from_mapping(settings)

Convert a plain dict to settings, coercing thinking dicts.

For backward compatibility, a thinking value that is a plain dict is converted to an AnthropicLLMService.ThinkingConfig.
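
The coercion described for from_mapping can be illustrated with a small standalone sketch. ThinkingConfigSketch and coerce_thinking are hypothetical stand-ins written for this example; they are not Pipecat's implementation and only mirror the documented behavior of turning a dict-valued thinking entry into a config object.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass
class ThinkingConfigSketch:
    """Hypothetical stand-in for AnthropicLLMService.ThinkingConfig."""

    type: str
    budget_tokens: Optional[int] = None


def coerce_thinking(settings: Mapping[str, Any]) -> dict[str, Any]:
    """Return a copy of settings with a dict-valued 'thinking' entry coerced."""
    result = dict(settings)
    thinking = result.get("thinking")
    if isinstance(thinking, dict):
        # Plain dicts are converted; existing config objects pass through unchanged.
        result["thinking"] = ThinkingConfigSketch(**thinking)
    return result
```

Under these assumptions, coerce_thinking({"thinking": {"type": "enabled", "budget_tokens": 2048}}) yields a mapping whose thinking entry is a ThinkingConfigSketch rather than a dict.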

class pipecat.services.anthropic.llm.AnthropicLLMService(*, api_key: str, model: str | None = None, params: InputParams | None = None, settings: AnthropicLLMSettings | None = None, client=None, retry_timeout_secs: float | None = 5.0, retry_on_timeout: bool | None = False, **kwargs)

Bases: LLMService

LLM service for Anthropic’s Claude models.

Provides inference capabilities with Claude models including support for function calling, vision processing, streaming responses, and prompt caching. Can use custom clients like AsyncAnthropicBedrock and AsyncAnthropicVertex.

Settings: alias of AnthropicLLMSettings

adapter_class: alias of AnthropicLLMAdapter

ThinkingConfig: alias of AnthropicThinkingConfig

class InputParams(**data: Any)

Bases: BaseModel

Input parameters for Anthropic model inference.

Deprecated since version 0.0.105: Use AnthropicLLMService.Settings instead. Pass settings directly via the settings parameter of AnthropicLLMService.

Parameters:

enable_prompt_caching – Whether to enable the prompt caching feature.
max_tokens – Maximum tokens to generate. Must be at least 1.
temperature – Sampling temperature between 0.0 and 1.0.
top_k – Top-k sampling parameter.
top_p – Top-p sampling parameter between 0.0 and 1.0.
thinking – Extended thinking configuration. Enabling extended thinking causes the model to spend more time “thinking” before responding. It also causes this service to emit LLMThinking*Frames during response generation. Extended thinking is disabled by default.
extra – Additional parameters to pass to the API.

enable_prompt_caching: bool | None

max_tokens: int | None

temperature: float | None

top_k: int | None

top_p: float | None

thinking: AnthropicLLMService.ThinkingConfig | None

extra: dict[str, Any] | None

Initialize the Anthropic LLM service.

Parameters:

api_key – Anthropic API key for authentication.
model – Model name to use. Deprecated since version 0.0.105: Use settings=AnthropicLLMService.Settings(model=...) instead.
params – Optional model parameters for inference. Deprecated since version 0.0.105: Use settings=AnthropicLLMService.Settings(...) instead.
settings – Runtime-updatable settings for this service. When both deprecated parameters and settings are provided, settings values take precedence.
client – Optional custom Anthropic client instance.
retry_timeout_secs – Request timeout in seconds for retry logic.
retry_on_timeout – Whether to retry the request once if it times out.
**kwargs – Additional arguments passed to parent LLMService.
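
The precedence rule for settings over the deprecated model and params arguments can be pictured as a simple merge. This is an illustrative sketch only: resolve_settings and the NOT_SET sentinel are hypothetical, and the fallback to a deprecated value when a setting is left unset is an assumption, not something the parameter descriptions state.

```python
from typing import Any

NOT_SET = object()  # hypothetical sentinel meaning "no value supplied"


def resolve_settings(deprecated: dict[str, Any], settings: dict[str, Any]) -> dict[str, Any]:
    """Merge two sources of values, letting explicitly supplied settings win."""
    resolved = dict(deprecated)
    for name, value in settings.items():
        if value is not NOT_SET:
            # A value supplied through settings overrides the deprecated one.
            resolved[name] = value
    return resolved
```

For example, resolve_settings({"model": "old", "top_k": 5}, {"model": "new", "top_k": NOT_SET}) keeps top_k from the deprecated source and takes model from settings.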

can_generate_metrics() → bool

Check if this service can generate usage metrics.

Returns:

True, as Anthropic provides detailed token usage metrics.

async run_inference(context: LLMContext, max_tokens: int | None = None, system_instruction: str | None = None) → str | None

Run a one-shot, out-of-band (i.e. out-of-pipeline) inference with the given LLM context.

Parameters:

context – The LLM context containing conversation history.
max_tokens – Optional maximum number of tokens to generate. If provided, overrides the service’s default max_tokens setting.
system_instruction – Optional system instruction to use for this inference. If provided, overrides any system instruction in the context.

Returns:

The LLM’s response as a string, or None if no response is generated.
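
Since run_inference may return None, callers typically need to handle the empty case. The sketch below wraps the call using only the documented signature. one_shot is a hypothetical helper, and FakeLLM is a stub written for this example so the code runs without Pipecat or network access; a real AnthropicLLMService and LLMContext would be passed instead.

```python
import asyncio
from typing import Any, Optional


async def one_shot(llm: Any, context: Any, instruction: str, limit: int = 256) -> str:
    """Call run_inference and normalize a None result to an empty string."""
    text: Optional[str] = await llm.run_inference(
        context, max_tokens=limit, system_instruction=instruction
    )
    return text if text is not None else ""


class FakeLLM:
    """Hypothetical stub exposing the documented run_inference signature."""

    async def run_inference(self, context, max_tokens=None, system_instruction=None):
        # Echo the overrides so the example shows that they were passed through.
        return None if not context else f"{system_instruction}:{max_tokens}"


if __name__ == "__main__":
    print(asyncio.run(one_shot(FakeLLM(), ["hello"], "Summarize", 64)))
```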

async process_frame(frame: Frame, direction: FrameDirection)

Process incoming frames and route them appropriately.

Handles various frame types including context frames, message frames, vision frames, and settings updates.

Parameters:

frame – The frame to process.
direction – The direction of frame processing.