audio_buffer_processor

Audio buffer processor for managing and synchronizing audio streams.

This module provides an AudioBufferProcessor that buffers and synchronizes audio from user input and bot output. The sample rate and channel count are configurable, and buffered audio is delivered through event handlers.

class pipecat.processors.audio.audio_buffer_processor.AudioBufferProcessor(*, sample_rate: int | None = None, num_channels: int = 1, buffer_size: int = 0, enable_turn_audio: bool = False, **kwargs)

Bases: FrameProcessor

Processes and buffers audio frames from both input (user) and output (bot) sources.

This processor manages audio buffering and synchronization, providing both merged and track-specific audio access through event handlers. It supports various audio configurations including sample rate conversion and mono/stereo output.

Events:

on_audio_data: Triggered when buffer_size is reached, providing merged audio
on_track_audio_data: Triggered when buffer_size is reached, providing separate tracks
on_user_turn_audio_data: Triggered when a user turn ends, providing the audio for that turn (requires enable_turn_audio=True)
on_bot_turn_audio_data: Triggered when a bot turn ends, providing the audio for that turn (requires enable_turn_audio=True)
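
As an illustration of consuming these events, the sketch below builds a handler that writes the delivered audio to a WAV file using only the standard library. The handler arguments (buffer, audio, sample_rate, num_channels), the 16-bit PCM sample format, and the registration call in the trailing comment are assumptions not confirmed by this page; make_wav_handler is a hypothetical helper name.

```python
import asyncio
import wave


def make_wav_handler(path: str):
    """Return an async handler that writes each delivered buffer to `path`.

    Assumption: the event passes (buffer, audio, sample_rate, num_channels)
    and `audio` is raw 16-bit PCM.
    """

    async def on_audio_data(buffer, audio: bytes, sample_rate: int, num_channels: int):
        with wave.open(path, "wb") as wf:
            wf.setnchannels(num_channels)
            wf.setsampwidth(2)  # 2 bytes per sample (16-bit PCM, assumed)
            wf.setframerate(sample_rate)
            wf.writeframes(audio)

    return on_audio_data


# Assumed registration with a real processor (not verified by this page):
# audiobuffer = AudioBufferProcessor(num_channels=2)
# audiobuffer.add_event_handler("on_audio_data", make_wav_handler("call.wav"))
```

Because the file is opened in write mode, each call overwrites the previous recording. A handler that is triggered repeatedly would need to append or use a new file name per call.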

Audio handling:

Mono output (num_channels=1): User and bot audio are mixed
Stereo output (num_channels=2): User audio on left, bot audio on right
Automatic resampling of incoming audio to match desired sample_rate
Silence insertion for non-continuous audio streams
Buffer synchronization between user and bot audio
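
To make silence insertion and buffer synchronization concrete, here is a minimal sketch that pads the shorter of two buffers with zero bytes so both cover the same span. It assumes 16-bit PCM (where zero bytes are silence) and is not the library's actual implementation; pad_with_silence and synchronize are hypothetical names.

```python
def pad_with_silence(buf: bytearray, target_len: int) -> None:
    """Extend `buf` in place with zero bytes up to `target_len` bytes."""
    missing = target_len - len(buf)
    if missing > 0:
        buf.extend(b"\x00" * missing)


def synchronize(user: bytearray, bot: bytearray) -> None:
    """Pad whichever buffer is shorter so both end at the same position."""
    target = max(len(user), len(bot))
    pad_with_silence(user, target)
    pad_with_silence(bot, target)


user = bytearray(b"\x01\x00" * 4)  # four 16-bit samples
bot = bytearray(b"\x02\x00" * 2)   # two 16-bit samples
synchronize(user, bot)
# Both buffers are now 8 bytes long; the bot buffer ends in two silent samples.
```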

__init__(*, sample_rate: int | None = None, num_channels: int = 1, buffer_size: int = 0, enable_turn_audio: bool = False, **kwargs)

Initialize the audio buffer processor.

Parameters:

sample_rate – Desired output sample rate. If None, uses source rate.
num_channels – Number of channels (1 for mono, 2 for stereo). Defaults to 1.
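buffer_size – Buffer size at which on_audio_data and on_track_audio_data are triggered. 0 for no buffering. Defaults to 0.
enable_turn_audio – Whether the on_user_turn_audio_data and on_bot_turn_audio_data handlers should be triggered. Defaults to False.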
**kwargs – Additional arguments passed to parent class.
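
The unit of buffer_size is not stated here. Assuming it is a byte count of 16-bit PCM audio, a value for a given duration could be computed as in the sketch below. The helper name buffer_size_for and the sample width are assumptions, so check the library source before relying on the result.

```python
def buffer_size_for(seconds: float, sample_rate: int,
                    num_channels: int = 1, sample_width: int = 2) -> int:
    """Bytes needed to hold `seconds` of PCM audio.

    Assumes buffer_size is measured in bytes and that samples are
    `sample_width` bytes wide (2 for 16-bit PCM).
    """
    frames = int(seconds * sample_rate)
    return frames * num_channels * sample_width


one_second_mono = buffer_size_for(1.0, 16000)                    # 32000
half_second_stereo = buffer_size_for(0.5, 16000, num_channels=2)  # 32000
```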

property sample_rate: int

Current sample rate of the audio processor.

Returns: The sample rate in Hz.

property num_channels: int

Number of channels in the audio output.

Returns: Number of channels (1 for mono, 2 for stereo).

has_audio() → bool

Check if either user or bot audio buffers contain data.

Returns: True if either buffer contains audio data.

merge_audio_buffers() → bytes

Merge user and bot audio buffers into a single audio stream.

For mono output, audio is mixed. For stereo output, user audio is placed on the left channel and bot audio on the right channel.

Returns: Merged audio data as bytes (mixed for mono, separate left and right channels for stereo).
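
The two merge modes can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: samples are taken to be 16-bit little-endian PCM, mono mixing is done by summing with clamping, and stereo output is interleaved as left, right. The names mix_mono and interleave_stereo are hypothetical.

```python
import struct


def _unpack(pcm: bytes) -> tuple[int, ...]:
    """Decode 16-bit little-endian PCM into integer samples."""
    count = len(pcm) // 2
    return struct.unpack(f"<{count}h", pcm[: count * 2])


def _pack(samples) -> bytes:
    """Encode integer samples as 16-bit little-endian PCM."""
    samples = list(samples)
    return struct.pack(f"<{len(samples)}h", *samples)


def mix_mono(user: bytes, bot: bytes) -> bytes:
    """Sum both tracks sample by sample, clamping to the 16-bit range."""
    u, b = _unpack(user), _unpack(bot)
    n = min(len(u), len(b))
    return _pack(max(-32768, min(32767, u[i] + b[i])) for i in range(n))


def interleave_stereo(user: bytes, bot: bytes) -> bytes:
    """Place user samples on the left channel and bot samples on the right."""
    u, b = _unpack(user), _unpack(bot)
    n = min(len(u), len(b))
    out = []
    for i in range(n):
        out.append(u[i])  # left
        out.append(b[i])  # right
    return _pack(out)


user_pcm = struct.pack("<2h", 100, 30000)
bot_pcm = struct.pack("<2h", -50, 10000)
mono = mix_mono(user_pcm, bot_pcm)             # samples: 50, 32767 (clamped)
stereo = interleave_stereo(user_pcm, bot_pcm)  # samples: 100, -50, 30000, 10000
```

Summing with clamping keeps each track at full volume at the cost of possible clipping; averaging the two samples is a common alternative that avoids clipping but halves the level.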

async start_recording()

Start recording audio from both user and bot.

Initializes recording state and resets audio buffers.

async stop_recording()

Stop recording and trigger final audio data handlers.

Calls audio handlers with any remaining buffered audio before stopping.
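
The recording lifecycle described above can be modelled with a small toy class. It only illustrates the documented behaviour (reset on start, deliver when buffer_size is reached, flush the remainder on stop) and is not the library's implementation. ToyRecorder, add_audio, and the single-argument handler are hypothetical, and delivering everything on stop when buffer_size is 0 is an assumption of this toy.

```python
import asyncio


class ToyRecorder:
    """Toy model of size-triggered delivery with a final flush on stop."""

    def __init__(self, buffer_size: int = 0, on_audio=None):
        self._buffer_size = buffer_size
        self._on_audio = on_audio
        self._buffer = bytearray()
        self._recording = False

    async def start_recording(self):
        # Reset state so each recording starts with an empty buffer.
        self._recording = True
        self._buffer = bytearray()

    async def add_audio(self, chunk: bytes):
        if not self._recording:
            return
        self._buffer.extend(chunk)
        # Deliver as soon as the configured size is reached (0 disables this).
        if self._buffer_size > 0 and len(self._buffer) >= self._buffer_size:
            await self._flush()

    async def stop_recording(self):
        # Deliver whatever is still buffered, then stop.
        await self._flush()
        self._recording = False

    async def _flush(self):
        if self._buffer and self._on_audio:
            await self._on_audio(bytes(self._buffer))
        self._buffer = bytearray()


async def demo(buffer_size: int) -> list:
    received = []

    async def handler(audio: bytes):
        received.append(audio)

    rec = ToyRecorder(buffer_size=buffer_size, on_audio=handler)
    await rec.start_recording()
    await rec.add_audio(b"ab")  # below the threshold when buffer_size=4
    await rec.add_audio(b"cd")  # reaches 4 bytes, handler is called
    await rec.add_audio(b"e")   # left in the buffer
    await rec.stop_recording()  # remaining audio is flushed
    return received
```

Running `asyncio.run(demo(4))` in this toy yields two deliveries, one at the threshold and one on stop.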

async process_frame(frame: Frame, direction: FrameDirection)

Process incoming audio frames and manage audio buffers.

Parameters:

frame – The frame to process.
direction – The direction of frame flow in the pipeline.