llm

AWS Bedrock integration for Large Language Model services.

This module provides AWS Bedrock LLM service implementation with support for Amazon Nova and Anthropic Claude models, including vision capabilities and function calling.

Bases: LLMSettings

Settings for AWSBedrockLLMService.

Parameters:

stop_sequences – List of strings that stop generation.
latency – Performance mode - “standard” or “optimized”.
enable_prompt_caching – Whether to enable prompt caching by adding cachePoint markers to system prompts and tool definitions. Can reduce TTFT by up to 85% for multi-turn conversations. See: https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
additional_model_request_fields – Additional model-specific parameters.

stop_sequences: list[str] | None | _NotGiven

latency: str | None | _NotGiven

enable_prompt_caching: bool | _NotGiven

additional_model_request_fields: dict[str, Any] | _NotGiven

Bases: LLMService

AWS Bedrock Large Language Model service implementation.

Provides inference capabilities for AWS Bedrock models including Amazon Nova and Anthropic Claude. Supports streaming responses, function calling, and vision capabilities.

Settings: alias of AWSBedrockLLMSettings

adapter_class: alias of AWSBedrockLLMAdapter

class InputParams(*, max_tokens: Annotated[int | None, ~annotated_types.Ge(ge=1)] = None, temperature: Annotated[float | None, ~annotated_types.Ge(ge=0.0), ~annotated_types.Le(le=1.0)] = None, top_p: Annotated[float | None, ~annotated_types.Ge(ge=0.0), ~annotated_types.Le(le=1.0)] = None, stop_sequences: list[str] | None = <factory>, latency: str | None = None, additional_model_request_fields: dict[str, ~typing.Any] | None=<factory>)[source]

Bases: BaseModel

Input parameters for AWS Bedrock LLM service.

Deprecated since version 0.0.105: Use AWSBedrockLLMService.Settings instead. Pass settings directly via the settings parameter of AWSBedrockLLMService.

Parameters:

max_tokens – Maximum number of tokens to generate.
temperature – Sampling temperature between 0.0 and 1.0.
top_p – Nucleus sampling parameter between 0.0 and 1.0.
stop_sequences – List of strings that stop generation.
latency – Performance mode - “standard” or “optimized”.
additional_model_request_fields – Additional model-specific parameters.

max_tokens: int | None

temperature: float | None

top_p: float | None

stop_sequences: list[str] | None

latency: str | None

additional_model_request_fields: dict[str, Any] | None

Initialize the AWS Bedrock LLM service.

Parameters:

model –
The AWS Bedrock model identifier to use.

Deprecated since version 0.0.105: Use settings=AWSBedrockLLMService.Settings(model=...) instead.
aws_access_key – AWS access key ID. If None, uses default credentials.
aws_secret_key – AWS secret access key. If None, uses default credentials.
aws_session_token – AWS session token for temporary credentials.
aws_region – AWS region for the Bedrock service.
params –
Model parameters and configuration.

Deprecated since version 0.0.105: Use settings=AWSBedrockLLMService.Settings(...) instead.
settings – Runtime-updatable settings for this service. When both deprecated parameters and settings are provided, settings values take precedence.
stop_sequences –
List of strings that stop generation.

Deprecated since version 0.0.105: Use settings=AWSBedrockLLMService.Settings(stop_sequences=...) instead.
client_config – Custom boto3 client configuration.
retry_timeout_secs – Request timeout in seconds for retry logic.
retry_on_timeout – Whether to retry the request once if it times out.
**kwargs – Additional arguments passed to parent LLMService.

can_generate_metrics() → bool[source]

Check if the service can generate usage metrics.

Returns:: True if metrics generation is supported.

async run_inference(context: LLMContext, max_tokens: int | None = None, system_instruction: str | None = None) → str | None[source]

Run a one-shot, out-of-band (i.e. out-of-pipeline) inference with the given LLM context.

Parameters:

context – The LLM context containing conversation history.
max_tokens – Optional maximum number of tokens to generate. If provided, overrides the service’s default max_tokens setting.
system_instruction – Optional system instruction to use for this inference. If provided, overrides any system instruction in the context.

Returns:

The LLM’s response as a string, or None if no response is generated.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process incoming frames and handle LLM-specific frame types.

Parameters:

frame – The frame to process.
direction – The direction of frame processing.