llm

AWS Nova Sonic LLM service implementation for the Pipecat AI framework.

This module provides a speech-to-speech LLM service using AWS Nova Sonic, which supports bidirectional audio streaming, text generation, and function calling capabilities.

exception pipecat.services.aws.nova_sonic.llm.AWSNovaSonicUnhandledFunctionException[source]

Bases: Exception

Exception raised when the LLM attempts to call an unregistered function.
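The condition this exception describes can be sketched as follows. This is a minimal illustration, assuming a hypothetical handler registry and `dispatch` helper; the exception class is a local stand-in rather than an import from Pipecat, and the service's own detection logic is not shown in this reference.

```python
class UnhandledFunctionError(Exception):
    """Local stand-in for AWSNovaSonicUnhandledFunctionException."""


def dispatch(registry, name, arguments):
    """Call the handler registered under `name`, or raise if there is none."""
    handler = registry.get(name)
    if handler is None:
        raise UnhandledFunctionError(f"no handler registered for function {name!r}")
    return handler(**arguments)


# Hypothetical registry with a single function the model is allowed to call.
registry = {"get_weather": lambda city: f"sunny in {city}"}
result = dispatch(registry, "get_weather", {"city": "Lisbon"})
```

Raising early on an unknown name keeps a misconfigured tool list from failing silently mid-conversation.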

class pipecat.services.aws.nova_sonic.llm.ContentType(*values)[source]

Bases: Enum

Content types supported by AWS Nova Sonic.

Parameters:

AUDIO – Audio content type.
TEXT – Text content type.
TOOL – Tool content type.

AUDIO = 'AUDIO'

TEXT = 'TEXT'

TOOL = 'TOOL'

class pipecat.services.aws.nova_sonic.llm.TextStage(*values)[source]

Bases: Enum

Text generation stages in AWS Nova Sonic responses.

Parameters:

FINAL – Final text that has been fully generated.
SPECULATIVE – Speculative text that is still being generated.

FINAL = 'FINAL'

SPECULATIVE = 'SPECULATIVE'

class pipecat.services.aws.nova_sonic.llm.CurrentContent(type: ContentType, role: Role, text_stage: TextStage, text_content: str)[source]

Bases: object

Represents content currently being received from AWS Nova Sonic.

Parameters:

type – The type of content (audio, text, or tool).
role – The role generating the content (user, assistant, etc.).
text_stage – The stage of text generation (final or speculative).
text_content – The actual text content if applicable.

type: ContentType

role: Role

text_stage: TextStage

text_content: str
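To show how these three types fit together, here is a standalone sketch that mirrors the documented enum values and fields. The classes are local stand-ins, `role` is simplified to a plain string, the `"ASSISTANT"` role value is an assumption, and `is_committable_text` is a hypothetical policy rather than something the service is documented to do.

```python
from dataclasses import dataclass
from enum import Enum


class ContentType(Enum):
    AUDIO = "AUDIO"
    TEXT = "TEXT"
    TOOL = "TOOL"


class TextStage(Enum):
    FINAL = "FINAL"
    SPECULATIVE = "SPECULATIVE"


@dataclass
class CurrentContent:
    type: ContentType
    role: str  # the documented field is a Role; a string keeps this sketch standalone
    text_stage: TextStage
    text_content: str


def is_committable_text(content):
    """Hypothetical policy: keep only finalized assistant text."""
    return (
        content.type is ContentType.TEXT
        and content.role == "ASSISTANT"  # assumed role value
        and content.text_stage is TextStage.FINAL
    )


final = CurrentContent(ContentType.TEXT, "ASSISTANT", TextStage.FINAL, "Hello!")
draft = CurrentContent(ContentType.TEXT, "ASSISTANT", TextStage.SPECULATIVE, "Hel")
```

Filtering on the stage is one way to avoid recording speculative text that may still change.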

class pipecat.services.aws.nova_sonic.llm.Params

Bases: BaseModel

Configuration parameters for AWS Nova Sonic.

Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(...) for inference settings and audio_config=AudioConfig(...) for audio configuration.

Parameters:

input_sample_rate – Audio input sample rate in Hz.
input_sample_size – Audio input sample size in bits.
input_channel_count – Number of input audio channels.
output_sample_rate – Audio output sample rate in Hz.
output_sample_size – Audio output sample size in bits.
output_channel_count – Number of output audio channels.
max_tokens – Maximum number of tokens to generate.
top_p – Nucleus sampling parameter.
temperature – Sampling temperature for text generation.
endpointing_sensitivity – Controls how quickly Nova Sonic decides the user has stopped speaking. Can be “LOW”, “MEDIUM”, or “HIGH”, with “HIGH” being the most sensitive (i.e., causing the model to respond most quickly). If not set, uses the model’s default behavior. Only supported with Nova 2 Sonic (the default model).

input_sample_rate: int | None

input_sample_size: int | None

input_channel_count: int | None

output_sample_rate: int | None

output_sample_size: int | None

output_channel_count: int | None

max_tokens: int | None

top_p: float | None

temperature: float | None

endpointing_sensitivity: str | None

property audio_config: AudioConfig

Return an AudioConfig populated from this instance’s audio fields.
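The deprecation note above implies that the fields of this class split into two groups. The following sketch works on plain dictionaries and assumes a hypothetical `split_params` helper; it only illustrates the grouping and does not construct real Pipecat objects.

```python
# Field names taken from the parameter list above.
AUDIO_FIELDS = (
    "input_sample_rate",
    "input_sample_size",
    "input_channel_count",
    "output_sample_rate",
    "output_sample_size",
    "output_channel_count",
)


def split_params(params):
    """Split deprecated Params-style fields into (audio, inference) dicts."""
    audio = {k: v for k, v in params.items() if k in AUDIO_FIELDS}
    inference = {k: v for k, v in params.items() if k not in AUDIO_FIELDS}
    return audio, inference


legacy = {"input_sample_rate": 16000, "temperature": 0.7, "endpointing_sensitivity": "HIGH"}
audio_kwargs, inference_kwargs = split_params(legacy)
```

Under this assumption, `audio_kwargs` would feed `AudioConfig(...)` and `inference_kwargs` would feed `AWSNovaSonicLLMService.Settings(...)`.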

class pipecat.services.aws.nova_sonic.llm.AudioConfig(*, input_sample_rate: int | None = 16000, input_sample_size: int | None = 16, input_channel_count: int | None = 1, output_sample_rate: int | None = 24000, output_sample_size: int | None = 16, output_channel_count: int | None = 1)[source]

Bases: BaseModel

Audio configuration for AWS Nova Sonic.

Parameters:

input_sample_rate – Audio input sample rate in Hz.
input_sample_size – Audio input sample size in bits.
input_channel_count – Number of input audio channels.
output_sample_rate – Audio output sample rate in Hz.
output_sample_size – Audio output sample size in bits.
output_channel_count – Number of output audio channels.

input_sample_rate: int | None

input_sample_size: int | None

input_channel_count: int | None

output_sample_rate: int | None

output_sample_size: int | None

output_channel_count: int | None
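As a worked example of what the default values mean in practice, the sketch below computes raw audio throughput. It assumes uncompressed PCM, and `PcmFormat` is a hypothetical stand-in class, not part of Pipecat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PcmFormat:
    sample_rate: int    # Hz
    sample_size: int    # bits per sample
    channel_count: int

    def bytes_per_second(self):
        """Raw data rate, assuming uncompressed PCM."""
        return self.sample_rate * (self.sample_size // 8) * self.channel_count

    def bytes_for_ms(self, ms):
        """Size in bytes of a chunk lasting `ms` milliseconds."""
        return self.bytes_per_second() * ms // 1000


# Defaults from the AudioConfig signature above.
INPUT_DEFAULT = PcmFormat(16000, 16, 1)
OUTPUT_DEFAULT = PcmFormat(24000, 16, 1)
```

With these defaults, input runs at 32,000 bytes per second and output at 48,000 bytes per second, so a 20 ms chunk is 640 bytes on input and 960 bytes on output.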

class pipecat.services.aws.nova_sonic.llm.AWSNovaSonicLLMSettings

Bases: LLMSettings

Settings for AWSNovaSonicLLMService.

Parameters:

voice – Voice identifier for speech synthesis.
endpointing_sensitivity – Controls how quickly Nova Sonic decides the user has stopped speaking. Can be “LOW”, “MEDIUM”, or “HIGH”.

voice: str | _NotGiven

endpointing_sensitivity: str | None | _NotGiven
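Because endpointing_sensitivity is typed as a plain string, a caller may want to check it before passing it in. The helper below is a hypothetical sketch; whether the service itself validates or normalizes the value is not stated in this reference.

```python
ALLOWED_SENSITIVITIES = ("LOW", "MEDIUM", "HIGH")


def normalize_endpointing_sensitivity(value):
    """Return an upper-cased allowed value, or None to keep the model default."""
    if value is None:
        return None
    normalized = value.strip().upper()
    if normalized not in ALLOWED_SENSITIVITIES:
        raise ValueError(
            f"endpointing_sensitivity must be one of {ALLOWED_SENSITIVITIES}, got {value!r}"
        )
    return normalized


sensitivity = normalize_endpointing_sensitivity(" high ")
```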

class pipecat.services.aws.nova_sonic.llm.AWSNovaSonicLLMService(*, secret_access_key: str, access_key_id: str, session_token: str | None = None, region: str, model: str = 'amazon.nova-2-sonic-v1:0', voice_id: str = 'matthew', params: Params | None = None, audio_config: AudioConfig | None = None, settings: AWSNovaSonicLLMSettings | None = None, system_instruction: str | None = None, tools: ToolsSchema | None = None, session_continuation: SessionContinuationParams | None = None, **kwargs)[source]

Bases: LLMService

AWS Nova Sonic speech-to-speech LLM service.

Provides bidirectional audio streaming, real-time transcription, text generation, and function calling capabilities using the AWS Nova Sonic model.

Settings: alias of AWSNovaSonicLLMSettings

adapter_class: alias of AWSNovaSonicLLMAdapter

__init__(*, secret_access_key: str, access_key_id: str, session_token: str | None = None, region: str, model: str = 'amazon.nova-2-sonic-v1:0', voice_id: str = 'matthew', params: Params | None = None, audio_config: AudioConfig | None = None, settings: AWSNovaSonicLLMSettings | None = None, system_instruction: str | None = None, tools: ToolsSchema | None = None, session_continuation: SessionContinuationParams | None = None, **kwargs)[source]

Initializes the AWS Nova Sonic LLM service.

Parameters:

secret_access_key – AWS secret access key for authentication.
access_key_id – AWS access key ID for authentication.
session_token – AWS session token for authentication.
region – AWS region where the service is hosted. Supported regions:
- Nova 2 Sonic (the default model): “us-east-1”, “us-west-2”, “ap-northeast-1”
- Nova Sonic (the older model): “us-east-1”, “ap-northeast-1”
model – Model identifier. Defaults to “amazon.nova-2-sonic-v1:0”.

Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(model=...) instead.
voice_id – Voice ID for speech synthesis. Note that some voices are designed for use with a specific language. Options:
- Nova 2 Sonic (the default model): see https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-language-support.html
- Nova Sonic (the older model): see https://docs.aws.amazon.com/nova/latest/userguide/available-voices.html

Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(voice=...) instead.
params – Model parameters for audio configuration and inference.

Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(...) for inference settings and audio_config=AudioConfig(...) for audio configuration.
audio_config – Audio configuration (sample rates, sample sizes, channel counts). If not provided, defaults are used.
settings – AWS Nova Sonic LLM settings. If provided together with deprecated top-level parameters, the settings values take precedence.
system_instruction – System-level instruction for the model.

Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(system_instruction=...) instead.
tools – Available tools/functions for the model to use.
session_continuation – Configuration for automatic session continuation. When enabled (the default), sessions are seamlessly rotated before the AWS time limit (~8 minutes) with no user-perceptible interruption.
**kwargs – Additional arguments passed to the parent LLMService.
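The credential and region parameters above can be gathered and checked before the service is constructed. This is a sketch only: `build_service_kwargs` and the model-family labels are hypothetical, and the environment variable names follow the usual AWS SDK conventions rather than anything this module requires.

```python
# Supported regions as listed in the `region` parameter description above.
SUPPORTED_REGIONS = {
    "nova-2-sonic": {"us-east-1", "us-west-2", "ap-northeast-1"},
    "nova-sonic": {"us-east-1", "ap-northeast-1"},
}


def build_service_kwargs(env, model_family="nova-2-sonic"):
    """Collect constructor keyword arguments from an environment-like mapping."""
    region = env["AWS_REGION"]
    if region not in SUPPORTED_REGIONS[model_family]:
        raise ValueError(f"region {region!r} is not listed for {model_family}")
    return {
        "secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "access_key_id": env["AWS_ACCESS_KEY_ID"],
        "session_token": env.get("AWS_SESSION_TOKEN"),  # optional
        "region": region,
    }


# Placeholder values for illustration; real code would pass os.environ.
example_env = {
    "AWS_REGION": "us-west-2",
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "AWS_ACCESS_KEY_ID": "example-key-id",
}
service_kwargs = build_service_kwargs(example_env)
```

The resulting dictionary uses the constructor's own parameter names, so it could be unpacked as `AWSNovaSonicLLMService(**service_kwargs)` alongside `settings` and `audio_config`.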

async start(frame: StartFrame)[source]

Start the service and initiate connection to AWS Nova Sonic.

Parameters:

frame – The start frame triggering service initialization.

async stop(frame: EndFrame)[source]

Stop the service and close connections.

Parameters:

frame – The end frame triggering service shutdown.

async cancel(frame: CancelFrame)[source]

Cancel the service and close connections.

Parameters:

frame – The cancel frame triggering service cancellation.

async reset_conversation()[source]

Reset the conversation state while preserving context.

Cleans up any in-progress assistant response, disconnects from the service, and reconnects with the preserved context.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process incoming frames and handle service-specific logic.

Parameters:

frame – The frame to process.
direction – The direction the frame is traveling.

create_client() → BedrockRuntimeClient[source]

Create a new Bedrock runtime client (NovaSonicSessionSender protocol).

property audio_config: AudioConfig

Return the audio configuration (NovaSonicSessionSender protocol).

build_session_start_json() → str[source]

Build the sessionStart event JSON.

Shared between the current and next session setup.
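For orientation, a sessionStart event might look like the output of the sketch below. The field names (`inferenceConfiguration`, `maxTokens`, `topP`, `temperature`) are an assumption based on AWS's published Nova Sonic event format, the numeric values are placeholders, and the JSON the service actually emits may contain more fields.

```python
import json


def sketch_session_start_json(max_tokens, top_p, temperature):
    """Build an illustrative sessionStart event (assumed schema)."""
    event = {
        "event": {
            "sessionStart": {
                "inferenceConfiguration": {
                    "maxTokens": max_tokens,
                    "topP": top_p,
                    "temperature": temperature,
                }
            }
        }
    }
    return json.dumps(event)


session_start = sketch_session_start_json(max_tokens=1024, top_p=0.9, temperature=0.7)
```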

async open_stream(client)[source]

Open a bidirectional stream on the given client.

async send_event(event_json: str, stream)[source]

Send a raw event JSON to the given stream.

async send_text(text: str, role: str, prompt_name: str, stream, interactive: bool)[source]

Send a text content block (contentStart/textInput/contentEnd) to the given stream.
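The three-event sequence named above can be sketched as plain JSON strings. The helper and its `content_name` argument are hypothetical, and the field names inside each event are an assumption based on AWS's published Nova Sonic event format rather than on this module's source.

```python
import json


def text_block_events(text, role, prompt_name, content_name, interactive):
    """Return contentStart, textInput, and contentEnd events for one text block."""
    ids = {"promptName": prompt_name, "contentName": content_name}
    content_start = {
        **ids,
        "type": "TEXT",
        "interactive": interactive,
        "role": role,
        "textInputConfiguration": {"mediaType": "text/plain"},
    }
    text_input = {**ids, "content": text}
    content_end = dict(ids)
    # Order matters: the block is opened, filled, then closed.
    return [
        json.dumps({"event": {"contentStart": content_start}}),
        json.dumps({"event": {"textInput": text_input}}),
        json.dumps({"event": {"contentEnd": content_end}}),
    ]


events = text_block_events(
    "You are a helpful assistant.", "SYSTEM", "prompt-1", "content-1", False
)
```

Each string could then be passed, in order, to something like send_event.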

async send_audio_input_start(prompt_name: str, content_name: str, stream)[source]

Send an audio input contentStart to the given stream.

async send_audio(audio: bytes, prompt_name: str, content_name: str, stream)[source]

Send an audioInput event to the given stream.
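An audioInput event carries raw audio bytes inside JSON, which requires a text encoding. The sketch below assumes base64 and the field names from AWS's published Nova Sonic event format; `audio_input_event` is a hypothetical helper, not the method's actual implementation.

```python
import base64
import json


def audio_input_event(audio, prompt_name, content_name):
    """Wrap raw audio bytes in an illustrative audioInput event (assumed schema)."""
    return json.dumps(
        {
            "event": {
                "audioInput": {
                    "promptName": prompt_name,
                    "contentName": content_name,
                    # JSON cannot hold raw bytes, so the payload is base64 text.
                    "content": base64.b64encode(audio).decode("ascii"),
                }
            }
        }
    )


chunk = b"\x00\x01" * 4  # placeholder bytes standing in for a short audio chunk
event_json = audio_input_event(chunk, "prompt-1", "content-1")
```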

async send_prompt_start(tools: list, prompt_name: str, stream)[source]

Send a promptStart event to the given stream.

get_setup_params()[source]

Return (system_instruction, tools) for the next session setup.

AWAIT_TRIGGER_ASSISTANT_RESPONSE_INSTRUCTION = "Start speaking when you hear the user say 'ready', but don't consider that 'ready' to be a meaningful part of the conversation other than as a trigger for you to start speaking."

async trigger_assistant_response()[source]

Trigger an assistant response by sending an audio cue.

Sends a pre-recorded “ready” audio trigger to prompt the assistant to start speaking. This is useful for controlling conversation flow.