llm
AWS Nova Sonic LLM service implementation for Pipecat AI framework.
This module provides a speech-to-speech LLM service using AWS Nova Sonic, which supports bidirectional audio streaming, text generation, and function calling capabilities.
- exception pipecat.services.aws.nova_sonic.llm.AWSNovaSonicUnhandledFunctionException[source]
Bases: Exception
Exception raised when the LLM attempts to call an unregistered function.
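As a rough illustration of the situation this exception describes, the sketch below keeps a registry of tool handlers and raises when asked to run a name that was never registered. `UnhandledFunctionError`, `FunctionRegistry`, and `get_weather` are hypothetical stand-ins written for this example; they are not Pipecat's implementation.

```python
from typing import Any, Callable, Dict


class UnhandledFunctionError(Exception):
    """Hypothetical stand-in for AWSNovaSonicUnhandledFunctionException."""


class FunctionRegistry:
    """Maps tool names to handlers (illustrative only, not part of Pipecat)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., Any]] = {}

    def register(self, name: str, handler: Callable[..., Any]) -> None:
        self._handlers[name] = handler

    def call(self, name: str, **arguments: Any) -> Any:
        # Fail loudly when the model requests a function nobody registered.
        if name not in self._handlers:
            raise UnhandledFunctionError(f"No handler registered for {name!r}")
        return self._handlers[name](**arguments)


registry = FunctionRegistry()
registry.register("get_weather", lambda city: f"Sunny in {city}")
```

Calling `registry.call("book_flight")` here raises `UnhandledFunctionError`, because only `get_weather` was registered.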
- class pipecat.services.aws.nova_sonic.llm.ContentType(*values)[source]
Bases: Enum
Content types supported by AWS Nova Sonic.
- Parameters:
AUDIO – Audio content type.
TEXT – Text content type.
TOOL – Tool content type.
- AUDIO = 'AUDIO'
- TEXT = 'TEXT'
- TOOL = 'TOOL'
- class pipecat.services.aws.nova_sonic.llm.TextStage(*values)[source]
Bases: Enum
Text generation stages in AWS Nova Sonic responses.
- Parameters:
FINAL – Final text that has been fully generated.
SPECULATIVE – Speculative text that is still being generated.
- FINAL = 'FINAL'
- SPECULATIVE = 'SPECULATIVE'
- class pipecat.services.aws.nova_sonic.llm.CurrentContent(type: ContentType, role: Role, text_stage: TextStage, text_content: str)[source]
Bases: object
Represents content currently being received from AWS Nova Sonic.
- Parameters:
type – The type of content (audio, text, or tool).
role – The role generating the content (user, assistant, etc.).
text_stage – The stage of text generation (final or speculative).
text_content – The actual text content if applicable.
- type: ContentType
- text_content: str
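The enums and the content record above fit together roughly as follows. This is a standalone mirror built only from the documented names and values. The `Role` type is referenced but not documented in this section, so its members here (`USER`, `ASSISTANT`) are assumptions, as is the `is_final_assistant_text` helper.

```python
from dataclasses import dataclass
from enum import Enum


class ContentType(Enum):
    AUDIO = "AUDIO"
    TEXT = "TEXT"
    TOOL = "TOOL"


class TextStage(Enum):
    FINAL = "FINAL"
    SPECULATIVE = "SPECULATIVE"


class Role(Enum):
    # Assumed members; the real Role type is not shown in this section.
    USER = "USER"
    ASSISTANT = "ASSISTANT"


@dataclass
class CurrentContent:
    type: ContentType
    role: Role
    text_stage: TextStage
    text_content: str


def is_final_assistant_text(content: CurrentContent) -> bool:
    """Hypothetical helper: True only for fully generated assistant text."""
    return (
        content.type is ContentType.TEXT
        and content.role is Role.ASSISTANT
        and content.text_stage is TextStage.FINAL
    )


draft = CurrentContent(ContentType.TEXT, Role.ASSISTANT, TextStage.SPECULATIVE, "Hel")
done = CurrentContent(ContentType.TEXT, Role.ASSISTANT, TextStage.FINAL, "Hello!")
```

A consumer that only wants settled transcripts could filter on such a check and ignore speculative text until its final counterpart arrives.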
- class pipecat.services.aws.nova_sonic.llm.Params(*, input_sample_rate: int | None = 16000, input_sample_size: int | None = 16, input_channel_count: int | None = 1, output_sample_rate: int | None = 24000, output_sample_size: int | None = 16, output_channel_count: int | None = 1, max_tokens: int | None = 1024, top_p: float | None = 0.9, temperature: float | None = 0.7, endpointing_sensitivity: str | None = None)[source]
Bases: BaseModel
Configuration parameters for AWS Nova Sonic.
Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(...) for inference settings and audio_config=AudioConfig(...) for audio configuration.
- Parameters:
input_sample_rate – Audio input sample rate in Hz.
input_sample_size – Audio input sample size in bits.
input_channel_count – Number of input audio channels.
output_sample_rate – Audio output sample rate in Hz.
output_sample_size – Audio output sample size in bits.
output_channel_count – Number of output audio channels.
max_tokens – Maximum number of tokens to generate.
top_p – Nucleus sampling parameter.
temperature – Sampling temperature for text generation.
endpointing_sensitivity – Controls how quickly Nova Sonic decides the user has stopped speaking. Can be “LOW”, “MEDIUM”, or “HIGH”, with “HIGH” being the most sensitive (i.e., causing the model to respond most quickly). If not set, uses the model’s default behavior. Only supported with Nova 2 Sonic (the default model).
- input_sample_rate: int | None
- input_sample_size: int | None
- input_channel_count: int | None
- output_sample_rate: int | None
- output_sample_size: int | None
- output_channel_count: int | None
- max_tokens: int | None
- top_p: float | None
- temperature: float | None
- endpointing_sensitivity: str | None
- property audio_config: AudioConfig
Return an AudioConfig populated from this instance’s audio fields.
- class pipecat.services.aws.nova_sonic.llm.AudioConfig(*, input_sample_rate: int | None = 16000, input_sample_size: int | None = 16, input_channel_count: int | None = 1, output_sample_rate: int | None = 24000, output_sample_size: int | None = 16, output_channel_count: int | None = 1)[source]
Bases: BaseModel
Audio configuration for AWS Nova Sonic.
- Parameters:
input_sample_rate – Audio input sample rate in Hz.
input_sample_size – Audio input sample size in bits.
input_channel_count – Number of input audio channels.
output_sample_rate – Audio output sample rate in Hz.
output_sample_size – Audio output sample size in bits.
output_channel_count – Number of output audio channels.
- input_sample_rate: int | None
- input_sample_size: int | None
- input_channel_count: int | None
- output_sample_rate: int | None
- output_sample_size: int | None
- output_channel_count: int | None
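To make the defaults concrete, the arithmetic below converts a sample rate, sample size, and channel count into a byte rate and a per-chunk size. It assumes uncompressed PCM audio, which this section does not state explicitly; `AudioFormat` and its methods are hypothetical names for this sketch, not part of the module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    sample_rate: int    # Hz
    sample_size: int    # bits per sample
    channel_count: int

    def bytes_per_second(self) -> int:
        # Assumes raw PCM: one sample per channel per tick, no compression.
        return self.sample_rate * (self.sample_size // 8) * self.channel_count

    def bytes_for_ms(self, duration_ms: int) -> int:
        return self.bytes_per_second() * duration_ms // 1000


# Documented AudioConfig defaults.
INPUT_DEFAULT = AudioFormat(sample_rate=16000, sample_size=16, channel_count=1)
OUTPUT_DEFAULT = AudioFormat(sample_rate=24000, sample_size=16, channel_count=1)

print(INPUT_DEFAULT.bytes_per_second())   # 32000
print(OUTPUT_DEFAULT.bytes_per_second())  # 48000
print(INPUT_DEFAULT.bytes_for_ms(20))     # 640
```

Under that assumption, a 20 ms input chunk at the defaults is 640 bytes, which can be handy when sizing buffers on the transport side.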
- class pipecat.services.aws.nova_sonic.llm.AWSNovaSonicLLMSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, system_instruction: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>, max_tokens: int | None | _NotGiven = <factory>, top_p: float | None | _NotGiven = <factory>, top_k: int | None | _NotGiven = <factory>, frequency_penalty: float | None | _NotGiven = <factory>, presence_penalty: float | None | _NotGiven = <factory>, seed: int | None | _NotGiven = <factory>, filter_incomplete_user_turns: bool | None | _NotGiven = <factory>, user_turn_completion_config: UserTurnCompletionConfig | None | _NotGiven = <factory>, voice: str | _NotGiven = <factory>, endpointing_sensitivity: str | None | _NotGiven = <factory>)[source]
Bases: LLMSettings
Settings for AWSNovaSonicLLMService.
- Parameters:
voice – Voice identifier for speech synthesis.
endpointing_sensitivity – Controls how quickly Nova Sonic decides the user has stopped speaking. Can be “LOW”, “MEDIUM”, or “HIGH”.
- voice: str | _NotGiven
- endpointing_sensitivity: str | None | _NotGiven
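Because `endpointing_sensitivity` is typed as a plain string, it may be worth checking a value before passing it in. The helper below is a hypothetical pre-validation sketch. Whether the service itself validates the value or accepts lowercase input is not stated in this section.

```python
from typing import Optional

# The three values named in the documentation.
ALLOWED_SENSITIVITIES = ("LOW", "MEDIUM", "HIGH")


def normalize_endpointing_sensitivity(value: Optional[str]) -> Optional[str]:
    """Hypothetical helper: upper-case the value and reject unknown ones.

    None is passed through, meaning "use the model's default behavior".
    """
    if value is None:
        return None
    normalized = value.strip().upper()
    if normalized not in ALLOWED_SENSITIVITIES:
        raise ValueError(
            f"endpointing_sensitivity must be one of {ALLOWED_SENSITIVITIES}, "
            f"got {value!r}"
        )
    return normalized
```

The result could then be passed as `endpointing_sensitivity=...` when building the settings object.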
- class pipecat.services.aws.nova_sonic.llm.AWSNovaSonicLLMService(*, secret_access_key: str, access_key_id: str, session_token: str | None = None, region: str, model: str = 'amazon.nova-2-sonic-v1:0', voice_id: str = 'matthew', params: Params | None = None, audio_config: AudioConfig | None = None, settings: AWSNovaSonicLLMSettings | None = None, system_instruction: str | None = None, tools: ToolsSchema | None = None, session_continuation: SessionContinuationParams | None = None, **kwargs)[source]
Bases: LLMService
AWS Nova Sonic speech-to-speech LLM service.
Provides bidirectional audio streaming, real-time transcription, text generation, and function calling capabilities using the AWS Nova Sonic model.
- Settings
alias of AWSNovaSonicLLMSettings
- adapter_class
alias of AWSNovaSonicLLMAdapter
- __init__(*, secret_access_key: str, access_key_id: str, session_token: str | None = None, region: str, model: str = 'amazon.nova-2-sonic-v1:0', voice_id: str = 'matthew', params: Params | None = None, audio_config: AudioConfig | None = None, settings: AWSNovaSonicLLMSettings | None = None, system_instruction: str | None = None, tools: ToolsSchema | None = None, session_continuation: SessionContinuationParams | None = None, **kwargs)[source]
Initializes the AWS Nova Sonic LLM service.
- Parameters:
secret_access_key – AWS secret access key for authentication.
access_key_id – AWS access key ID for authentication.
session_token – AWS session token for authentication.
region – AWS region where the service is hosted. Supported regions:
  - Nova 2 Sonic (the default model): “us-east-1”, “us-west-2”, “ap-northeast-1”
  - Nova Sonic (the older model): “us-east-1”, “ap-northeast-1”
model – Model identifier. Defaults to “amazon.nova-2-sonic-v1:0”.
Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(model=...) instead.
voice_id – Voice ID for speech synthesis. Note that some voices are designed for use with a specific language. Options:
  - Nova 2 Sonic (the default model): see https://docs.aws.amazon.com/nova/latest/nova2-userguide/sonic-language-support.html
  - Nova Sonic (the older model): see https://docs.aws.amazon.com/nova/latest/userguide/available-voices.html
Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(voice=...) instead.
params – Model parameters for audio configuration and inference.
Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(...) for inference settings and audio_config=AudioConfig(...) for audio configuration.
audio_config – Audio configuration (sample rates, sample sizes, channel counts). If not provided, defaults are used.
settings – AWS Nova Sonic LLM settings. If provided together with deprecated top-level parameters, the settings values take precedence.
system_instruction – System-level instruction for the model.
Deprecated since version 0.0.105: Use settings=AWSNovaSonicLLMService.Settings(system_instruction=...) instead.
tools – Available tools/functions for the model to use.
session_continuation – Configuration for automatic session continuation. When enabled (the default), sessions are seamlessly rotated before the AWS time limit (~8 minutes) with no user-perceptible interruption.
**kwargs – Additional arguments passed to the parent LLMService.
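A minimal construction sketch using only the non-deprecated arguments listed above (`settings` and `audio_config`) might look like the following. It assumes Pipecat is installed with its AWS Nova Sonic dependencies and that credentials live in the standard AWS environment variables. The helper names `aws_credentials_from_env` and `build_service`, the example system instruction, and the fallback region are choices made for this illustration.

```python
import os
from typing import Mapping


def aws_credentials_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Collect the constructor's credential arguments from AWS variables."""
    return {
        "access_key_id": env["AWS_ACCESS_KEY_ID"],
        "secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "session_token": env.get("AWS_SESSION_TOKEN"),  # optional
        # Fallback region is an assumption; pick one your model supports.
        "region": env.get("AWS_REGION", "us-east-1"),
    }


def build_service(env: Mapping[str, str] = os.environ):
    # Imported lazily so this file loads even where Pipecat is not installed.
    from pipecat.services.aws.nova_sonic.llm import (
        AudioConfig,
        AWSNovaSonicLLMService,
    )

    return AWSNovaSonicLLMService(
        **aws_credentials_from_env(env),
        # Inference settings replace the deprecated voice_id, params,
        # and system_instruction arguments.
        settings=AWSNovaSonicLLMService.Settings(
            voice="matthew",
            system_instruction="You are a concise, friendly voice assistant.",
            temperature=0.7,
            max_tokens=1024,
            top_p=0.9,
        ),
        # Audio settings replace the audio fields of the deprecated Params.
        audio_config=AudioConfig(input_sample_rate=16000, output_sample_rate=24000),
    )
```

Calling `build_service()` returns a service instance that can be placed in a pipeline. The values passed here simply repeat the documented defaults.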
- async start(frame: StartFrame)[source]
Start the service and initiate connection to AWS Nova Sonic.
- Parameters:
frame – The start frame triggering service initialization.
- async stop(frame: EndFrame)[source]
Stop the service and close connections.
- Parameters:
frame – The end frame triggering service shutdown.
- async cancel(frame: CancelFrame)[source]
Cancel the service and close connections.
- Parameters:
frame – The cancel frame triggering service cancellation.
- async reset_conversation()[source]
Reset the conversation state while preserving context.
Cleans up any in-progress assistant response, disconnects from the service, and reconnects with the preserved context.
- async process_frame(frame: Frame, direction: FrameDirection)[source]
Process incoming frames and handle service-specific logic.
- Parameters:
frame – The frame to process.
direction – The direction the frame is traveling.
- create_client() BedrockRuntimeClient[source]
Create a new Bedrock runtime client (NovaSonicSessionSender protocol).
- property audio_config: AudioConfig
Return the audio configuration (NovaSonicSessionSender protocol).
- build_session_start_json() str[source]
Build the sessionStart event JSON.
Shared between the current and next session setup.
- async send_text(text: str, role: str, prompt_name: str, stream, interactive: bool)[source]
Send a text content block (contentStart/textInput/contentEnd) to the given stream.
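The three-event shape of a text content block (contentStart, then textInput, then contentEnd) can be sketched as plain JSON. The field names below follow my understanding of the Nova Sonic bidirectional streaming event format and should be treated as assumptions, not as the exact payloads this method sends. `text_content_events` is a hypothetical helper, and it returns strings instead of writing to a stream.

```python
import json
import uuid
from typing import List


def text_content_events(
    text: str, role: str, prompt_name: str, interactive: bool
) -> List[str]:
    """Build the three JSON events that make up one text content block."""
    # One content name ties the three events of the block together.
    content_name = str(uuid.uuid4())
    content_start = {
        "event": {
            "contentStart": {
                "promptName": prompt_name,
                "contentName": content_name,
                "type": "TEXT",
                "interactive": interactive,
                "role": role,
                "textInputConfiguration": {"mediaType": "text/plain"},
            }
        }
    }
    text_input = {
        "event": {
            "textInput": {
                "promptName": prompt_name,
                "contentName": content_name,
                "content": text,
            }
        }
    }
    content_end = {
        "event": {
            "contentEnd": {
                "promptName": prompt_name,
                "contentName": content_name,
            }
        }
    }
    return [json.dumps(e) for e in (content_start, text_input, content_end)]


events = text_content_events("Hello", "USER", "prompt-1", interactive=True)
```

The point of the sketch is the ordering and the shared prompt and content names. The actual method also handles sending each event over the given stream.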
- async send_audio_input_start(prompt_name: str, content_name: str, stream)[source]
Send an audio input contentStart to the given stream.
- async send_audio(audio: bytes, prompt_name: str, content_name: str, stream)[source]
Send an audioInput event to the given stream.
- async send_prompt_start(tools: list, prompt_name: str, stream)[source]
Send a promptStart event to the given stream.
- AWAIT_TRIGGER_ASSISTANT_RESPONSE_INSTRUCTION = "Start speaking when you hear the user say 'ready', but don't consider that 'ready' to be a meaningful part of the conversation other than as a trigger for you to start speaking."