llm

AWS Nova Sonic LLM service implementation for the Pipecat AI framework.

This module provides a speech-to-speech LLM service using AWS Nova Sonic, which supports bidirectional audio streaming, text generation, and function calling capabilities.

exception pipecat.services.aws.nova_sonic.llm.AWSNovaSonicUnhandledFunctionException[source]

Bases: Exception

Exception raised when the LLM attempts to call an unregistered function.

class pipecat.services.aws.nova_sonic.llm.ContentType(*values)[source]

Bases: Enum

Content types supported by AWS Nova Sonic.

Parameters:
  • AUDIO – Audio content type.

  • TEXT – Text content type.

  • TOOL – Tool content type.

AUDIO = 'AUDIO'
TEXT = 'TEXT'
TOOL = 'TOOL'

class pipecat.services.aws.nova_sonic.llm.TextStage(*values)[source]

Bases: Enum

Text generation stages in AWS Nova Sonic responses.

Parameters:
  • FINAL – Final text that has been fully generated.

  • SPECULATIVE – Speculative text that is still being generated.

FINAL = 'FINAL'
SPECULATIVE = 'SPECULATIVE'

class pipecat.services.aws.nova_sonic.llm.CurrentContent(type: ContentType, role: Role, text_stage: TextStage, text_content: str)[source]

Bases: object

Represents content currently being received from AWS Nova Sonic.

Parameters:
  • type – The type of content (audio, text, or tool).

  • role – The role generating the content (user, assistant, etc.).

  • text_stage – The stage of text generation (final or speculative).

  • text_content – The actual text content if applicable.

type: ContentType
role: Role
text_stage: TextStage
text_content: str
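
As a rough illustration of how these fields fit together, the sketch below mirrors the documented shapes with local stand-ins rather than importing the module. The `Role` enum and the `is_final_assistant_text` helper are hypothetical; the module's real `Role` type is not described in this section.

```python
from dataclasses import dataclass
from enum import Enum


class ContentType(Enum):
    AUDIO = "AUDIO"
    TEXT = "TEXT"
    TOOL = "TOOL"


class TextStage(Enum):
    FINAL = "FINAL"
    SPECULATIVE = "SPECULATIVE"


class Role(Enum):
    # Hypothetical stand-in; the real Role type is not shown in this section.
    USER = "USER"
    ASSISTANT = "ASSISTANT"


@dataclass
class CurrentContent:
    # Local stand-in with the same four documented fields.
    type: ContentType
    role: Role
    text_stage: TextStage
    text_content: str


def is_final_assistant_text(content: CurrentContent) -> bool:
    """Hypothetical helper: True only for fully generated assistant text."""
    return (
        content.type is ContentType.TEXT
        and content.role is Role.ASSISTANT
        and content.text_stage is TextStage.FINAL
    )


draft = CurrentContent(ContentType.TEXT, Role.ASSISTANT, TextStage.SPECULATIVE, "Hel")
done = CurrentContent(ContentType.TEXT, Role.ASSISTANT, TextStage.FINAL, "Hello!")
```

A consumer that only wants committed text could filter on a check like this and ignore speculative content until its final counterpart arrives.
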
class pipecat.services.aws.nova_sonic.llm.Params(*, input_sample_rate: int | None = 16000, input_sample_size: int | None = 16, input_channel_count: int | None = 1, output_sample_rate: int | None = 24000, output_sample_size: int | None = 16, output_channel_count: int | None = 1, max_tokens: int | None = 1024, top_p: float | None = 0.9, temperature: float | None = 0.7, endpointing_sensitivity: str | None = None)[source]

Bases: BaseModel

Configuration parameters for AWS Nova Sonic.

Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(...) for inference settings and audio_config=AudioConfig(...) for audio configuration.

Parameters:
  • input_sample_rate – Audio input sample rate in Hz.

  • input_sample_size – Audio input sample size in bits.

  • input_channel_count – Number of input audio channels.

  • output_sample_rate – Audio output sample rate in Hz.

  • output_sample_size – Audio output sample size in bits.

  • output_channel_count – Number of output audio channels.

  • max_tokens – Maximum number of tokens to generate.

  • top_p – Nucleus sampling parameter.

  • temperature – Sampling temperature for text generation.

  • endpointing_sensitivity – Controls how quickly Nova Sonic decides the user has stopped speaking. Can be “LOW”, “MEDIUM”, or “HIGH”, with “HIGH” being the most sensitive (i.e., causing the model to respond most quickly). If not set, uses the model’s default behavior. Only supported with Nova 2 Sonic (the default model).

input_sample_rate: int | None
input_sample_size: int | None
input_channel_count: int | None
output_sample_rate: int | None
output_sample_size: int | None
output_channel_count: int | None
max_tokens: int | None
top_p: float | None
temperature: float | None
endpointing_sensitivity: str | None

property audio_config: AudioConfig

Return an AudioConfig populated from this instance’s audio fields.
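
Because Params is deprecated in favor of separate audio and inference settings, migrating usually means splitting its fields in two. The sketch below works on plain dictionaries so it runs without Pipecat installed; `split_params` is a hypothetical helper, and the field grouping is taken from the parameter lists documented here.

```python
# Field names copied from the documented Params signature.
AUDIO_FIELDS = (
    "input_sample_rate",
    "input_sample_size",
    "input_channel_count",
    "output_sample_rate",
    "output_sample_size",
    "output_channel_count",
)
INFERENCE_FIELDS = ("max_tokens", "top_p", "temperature", "endpointing_sensitivity")


def split_params(params: dict) -> tuple:
    """Hypothetical helper: split legacy Params fields into (audio, inference)."""
    audio = {k: params[k] for k in AUDIO_FIELDS if k in params}
    inference = {k: params[k] for k in INFERENCE_FIELDS if k in params}
    return audio, inference


legacy = {
    "input_sample_rate": 16000,
    "output_sample_rate": 24000,
    "max_tokens": 1024,
    "temperature": 0.7,
}
audio_kwargs, settings_kwargs = split_params(legacy)

# With Pipecat installed, these would presumably be passed as
# AudioConfig(**audio_kwargs) and AWSNovaSonicLLMService.Settings(**settings_kwargs).
```
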

class pipecat.services.aws.nova_sonic.llm.AudioConfig(*, input_sample_rate: int | None = 16000, input_sample_size: int | None = 16, input_channel_count: int | None = 1, output_sample_rate: int | None = 24000, output_sample_size: int | None = 16, output_channel_count: int | None = 1)[source]

Bases: BaseModel

Audio configuration for AWS Nova Sonic.

Parameters:
  • input_sample_rate – Audio input sample rate in Hz.

  • input_sample_size – Audio input sample size in bits.

  • input_channel_count – Number of input audio channels.

  • output_sample_rate – Audio output sample rate in Hz.

  • output_sample_size – Audio output sample size in bits.

  • output_channel_count – Number of output audio channels.

input_sample_rate: int | None
input_sample_size: int | None
input_channel_count: int | None
output_sample_rate: int | None
output_sample_size: int | None
output_channel_count: int | None
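
To make the defaults concrete, the arithmetic below converts sample rate, sample size, and channel count into a byte rate. It assumes uncompressed audio (one sample of `sample_size` bits per channel per tick), which this section does not state explicitly; the 20 ms chunk length is an arbitrary illustrative value, not something the service requires.

```python
def bytes_per_second(sample_rate: int, sample_size_bits: int, channels: int) -> int:
    """Byte rate of uncompressed audio with the given format."""
    return sample_rate * (sample_size_bits // 8) * channels


def chunk_bytes(sample_rate: int, sample_size_bits: int, channels: int, chunk_ms: int) -> int:
    """Size in bytes of a chunk lasting chunk_ms milliseconds."""
    return bytes_per_second(sample_rate, sample_size_bits, channels) * chunk_ms // 1000


# Documented defaults: 16 kHz / 16-bit / mono input, 24 kHz / 16-bit / mono output.
input_rate = bytes_per_second(16000, 16, 1)    # 32000 bytes per second
output_rate = bytes_per_second(24000, 16, 1)   # 48000 bytes per second
input_chunk = chunk_bytes(16000, 16, 1, 20)    # 640 bytes per 20 ms
```
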
class pipecat.services.aws.nova_sonic.llm.AWSNovaSonicLLMSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, system_instruction: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, max_tokens: int | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>, top_k: int | None | _NotGiven = <factory>, frequency_penalty: float | None | _NotGiven = <factory>, presence_penalty: float | None | _NotGiven = <factory>, seed: int | None | _NotGiven = <factory>, filter_incomplete_user_turns: bool | None | _NotGiven = <factory>, user_turn_completion_config: UserTurnCompletionConfig | None | _NotGiven = <factory>, voice: str | _NotGiven = <factory>, endpointing_sensitivity: str | None | _NotGiven = <factory>)[source]

Bases: LLMSettings

Settings for AWSNovaSonicLLMService.

Parameters:
  • voice – Voice identifier for speech synthesis.

  • endpointing_sensitivity – Controls how quickly Nova Sonic decides the user has stopped speaking. Can be “LOW”, “MEDIUM”, or “HIGH”.

voice: str | _NotGiven
endpointing_sensitivity: str | None | _NotGiven

class pipecat.services.aws.nova_sonic.llm.AWSNovaSonicLLMService(*, secret_access_key: str, access_key_id: str, session_token: str | None = None, region: str, model: str = 'amazon.nova-2-sonic-v1:0', voice_id: str = 'matthew', params: Params | None = None, audio_config: AudioConfig | None = None, settings: AWSNovaSonicLLMSettings | None = None, system_instruction: str | None = None, tools: ToolsSchema | None = None, session_continuation: SessionContinuationParams | None = None, **kwargs)[source]

Bases: LLMService

AWS Nova Sonic speech-to-speech LLM service.

Provides bidirectional audio streaming, real-time transcription, text generation, and function calling capabilities using the AWS Nova Sonic model.

Settings

alias of AWSNovaSonicLLMSettings

adapter_class

alias of AWSNovaSonicLLMAdapter

__init__(*, secret_access_key: str, access_key_id: str, session_token: str | None = None, region: str, model: str = 'amazon.nova-2-sonic-v1:0', voice_id: str = 'matthew', params: Params | None = None, audio_config: AudioConfig | None = None, settings: AWSNovaSonicLLMSettings | None = None, system_instruction: str | None = None, tools: ToolsSchema | None = None, session_continuation: SessionContinuationParams | None = None, **kwargs)[source]

Initializes the AWS Nova Sonic LLM service.

Parameters:
  • secret_access_key – AWS secret access key for authentication.

  • access_key_id – AWS access key ID for authentication.

  • session_token – AWS session token for authentication.

  • region – AWS region where the service is hosted. Supported regions:

      - Nova 2 Sonic (the default model): “us-east-1”, “us-west-2”, “ap-northeast-1”
      - Nova Sonic (the older model): “us-east-1”, “ap-northeast-1”

  • model

    Model identifier. Defaults to “amazon.nova-2-sonic-v1:0”.

    Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(model=...) instead.

  • voice_id

    Voice ID for speech synthesis. Note that some voices are designed for use with a specific language. Options:

      - Nova 2 Sonic (the default model): see https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-language-support.html
      - Nova Sonic (the older model): see https://docs.aws.amazon.com/nova/latest/userguide/available-voices.html

    Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(voice=...) instead.

  • params

    Model parameters for audio configuration and inference.

    Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(...) for inference settings and audio_config=AudioConfig(...) for audio configuration.

  • audio_config – Audio configuration (sample rates, sample sizes, channel counts). If not provided, defaults are used.

  • settings – AWS Nova Sonic LLM settings. If provided together with deprecated top-level parameters, the settings values take precedence.

  • system_instruction

    System-level instruction for the model.

    Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(system_instruction=...) instead.

  • tools – Available tools/functions for the model to use.

  • session_continuation – Configuration for automatic session continuation. When enabled (the default), sessions are seamlessly rotated before the AWS time limit (~8 minutes) with no user-perceptible interruption.

  • **kwargs – Additional arguments passed to the parent LLMService.
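
One possible way to assemble the required credential arguments is sketched below. It only builds a keyword dictionary, so it runs without Pipecat installed. `build_service_kwargs` is a hypothetical helper, and reading the conventional AWS environment variable names is an assumption, not something this module does for you.

```python
from typing import Mapping


def build_service_kwargs(env: Mapping[str, str], default_region: str = "us-east-1") -> dict:
    """Hypothetical helper: collect the credential and region keyword arguments."""
    kwargs = {
        "secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "access_key_id": env["AWS_ACCESS_KEY_ID"],
        "region": env.get("AWS_REGION", default_region),
    }
    # session_token is optional, so include it only when one is present.
    token = env.get("AWS_SESSION_TOKEN")
    if token:
        kwargs["session_token"] = token
    return kwargs


# Placeholder values for illustration only; real code would pass os.environ.
example_env = {
    "AWS_SECRET_ACCESS_KEY": "example-secret",
    "AWS_ACCESS_KEY_ID": "example-key-id",
    "AWS_REGION": "us-west-2",
}
service_kwargs = build_service_kwargs(example_env)

# With Pipecat installed, the service would then be created roughly as:
#   AWSNovaSonicLLMService(
#       **service_kwargs,
#       settings=AWSNovaSonicLLMService.Settings(voice="matthew"),
#   )
```

Passing voice and model through `settings` rather than the deprecated top-level arguments follows the deprecation notes above.
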

async start(frame: StartFrame)[source]

Start the service and initiate connection to AWS Nova Sonic.

Parameters:

frame – The start frame triggering service initialization.

async stop(frame: EndFrame)[source]

Stop the service and close connections.

Parameters:

frame – The end frame triggering service shutdown.

async cancel(frame: CancelFrame)[source]

Cancel the service and close connections.

Parameters:

frame – The cancel frame triggering service cancellation.

async reset_conversation()[source]

Reset the conversation state while preserving context.

Cleans up any in-progress assistant response, disconnects from the service, and reconnects with the preserved context.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process incoming frames and handle service-specific logic.

Parameters:
  • frame – The frame to process.

  • direction – The direction the frame is traveling.

create_client() BedrockRuntimeClient[source]

Create a new Bedrock runtime client (NovaSonicSessionSender protocol).

property audio_config: AudioConfig

Return the audio configuration (NovaSonicSessionSender protocol).

build_session_start_json() str[source]

Build the sessionStart event JSON.

Shared between the current and next session setup.
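
For orientation, a sessionStart payload might look like the sketch below. The key names (`inferenceConfiguration`, `maxTokens`, `topP`, `temperature`) are assumptions based on AWS's published Nova Sonic event schema, not taken from this module, and the real method may include additional fields. `sketch_session_start_json` is a hypothetical stand-in.

```python
import json


def sketch_session_start_json(max_tokens: int = 1024, top_p: float = 0.9,
                              temperature: float = 0.7) -> str:
    """Hypothetical stand-in: serialize an assumed sessionStart event."""
    event = {
        "event": {
            "sessionStart": {
                # Assumed key names; defaults match the documented Params defaults.
                "inferenceConfiguration": {
                    "maxTokens": max_tokens,
                    "topP": top_p,
                    "temperature": temperature,
                }
            }
        }
    }
    return json.dumps(event)


session_start = sketch_session_start_json()
```
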

async open_stream(client)[source]

Open a bidirectional stream on the given client.

async send_event(event_json: str, stream)[source]

Send a raw event JSON to the given stream.

async send_text(text: str, role: str, prompt_name: str, stream, interactive: bool)[source]

Send a text content block (contentStart/textInput/contentEnd) to the given stream.
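
The three-event sequence can be pictured roughly as follows. This is a sketch under assumptions: the JSON key names follow AWS's published Nova Sonic event schema as best understood and may not match what this method actually sends, and `text_block_events` is a hypothetical helper that returns the events instead of writing them to a stream.

```python
import json
import uuid


def text_block_events(text: str, role: str, prompt_name: str, interactive: bool) -> list:
    """Hypothetical helper: build contentStart, textInput, contentEnd as JSON strings."""
    # All three events share one content name so they can be correlated.
    content_name = str(uuid.uuid4())
    content_start = {
        "event": {
            "contentStart": {
                "promptName": prompt_name,
                "contentName": content_name,
                "type": "TEXT",
                "interactive": interactive,
                "role": role,
                "textInputConfiguration": {"mediaType": "text/plain"},
            }
        }
    }
    text_input = {
        "event": {
            "textInput": {
                "promptName": prompt_name,
                "contentName": content_name,
                "content": text,
            }
        }
    }
    content_end = {
        "event": {
            "contentEnd": {
                "promptName": prompt_name,
                "contentName": content_name,
            }
        }
    }
    return [json.dumps(e) for e in (content_start, text_input, content_end)]


events = text_block_events("You are a helpful assistant.", "SYSTEM", "prompt-1", False)
```

In the real service each of these strings would presumably go through send_event in order.
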

async send_audio_input_start(prompt_name: str, content_name: str, stream)[source]

Send an audio input contentStart to the given stream.

async send_audio(audio: bytes, prompt_name: str, content_name: str, stream)[source]

Send an audioInput event to the given stream.

async send_prompt_start(tools: list, prompt_name: str, stream)[source]

Send a promptStart event to the given stream.

get_setup_params()[source]

Return (system_instruction, tools) for the next session setup.

AWAIT_TRIGGER_ASSISTANT_RESPONSE_INSTRUCTION = "Start speaking when you hear the user say 'ready', but don't consider that 'ready' to be a meaningful part of the conversation other than as a trigger for you to start speaking."

async trigger_assistant_response()[source]

Trigger an assistant response by sending an audio cue.

Sends a pre-recorded “ready” audio trigger to prompt the assistant to start speaking. This is useful for controlling conversation flow.