audio_buffer_processor

Audio buffer processor for managing and synchronizing audio streams.

This module provides an AudioBufferProcessor that buffers and synchronizes audio from user input and bot output. It supports a configurable sample rate and channel count, and delivers the buffered audio through event handlers.

class pipecat.processors.audio.audio_buffer_processor.AudioBufferProcessor(*, sample_rate: int | None = None, num_channels: int = 1, buffer_size: int = 0, enable_turn_audio: bool = False, **kwargs)

Bases: FrameProcessor

Processes and buffers audio frames from both input (user) and output (bot) sources.

This processor manages audio buffering and synchronization, providing both merged and track-specific audio access through event handlers. It supports various audio configurations including sample rate conversion and mono/stereo output.

Events:

  • on_audio_data: Triggered when the buffer reaches buffer_size, and again when recording stops; provides the merged audio

  • on_track_audio_data: Triggered at the same points as on_audio_data; provides the user and bot tracks separately

  • on_user_turn_audio_data: Triggered when a user turn ends; provides the audio for that turn (requires enable_turn_audio=True)

  • on_bot_turn_audio_data: Triggered when a bot turn ends; provides the audio for that turn (requires enable_turn_audio=True)
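The turn events deliver one utterance at a time rather than a fixed-size chunk. The sketch below illustrates that idea with a hypothetical `TurnAudioCollector` class whose `start_turn`, `add_audio`, and `end_turn` hooks are assumptions for illustration, not pipecat methods; the real processor derives turn boundaries from the pipeline itself.

```python
import asyncio
from typing import Awaitable, Callable

# Hypothetical handler type: receives the turn's raw PCM bytes.
TurnHandler = Callable[[bytes], Awaitable[None]]


class TurnAudioCollector:
    """Hypothetical sketch: collect audio for one speaker turn at a time."""

    def __init__(self, on_turn_audio: TurnHandler, enabled: bool = True):
        self._on_turn_audio = on_turn_audio
        self._enabled = enabled          # plays the role of enable_turn_audio
        self._in_turn = False
        self._turn_buffer = bytearray()

    def start_turn(self) -> None:
        # A new turn begins: discard anything left from a previous turn.
        self._in_turn = True
        self._turn_buffer.clear()

    def add_audio(self, pcm: bytes) -> None:
        # Only audio that arrives during a turn is kept.
        if self._in_turn:
            self._turn_buffer.extend(pcm)

    async def end_turn(self) -> None:
        # The turn is over: hand a copy of its audio to the handler, if enabled.
        self._in_turn = False
        if self._enabled and self._turn_buffer:
            await self._on_turn_audio(bytes(self._turn_buffer))
        self._turn_buffer.clear()


async def main() -> list[bytes]:
    turns: list[bytes] = []

    async def on_turn(audio: bytes) -> None:
        turns.append(audio)

    collector = TurnAudioCollector(on_turn)
    collector.add_audio(b"\x00\x00")          # outside a turn: ignored
    collector.start_turn()
    collector.add_audio(b"\x01\x00\x02\x00")  # two 16-bit samples
    await collector.end_turn()
    return turns


turns = asyncio.run(main())
print(len(turns), len(turns[0]))  # 1 4
```

Audio that arrives outside a turn is dropped, so each handler call receives exactly one turn's samples.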

Audio handling:

  • Mono output (num_channels=1): User and bot audio are mixed

  • Stereo output (num_channels=2): User audio on left, bot audio on right

  • Automatic resampling of incoming audio to match desired sample_rate

  • Silence insertion for non-continuous audio streams

  • Buffer synchronization between user and bot audio
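Silence insertion and buffer synchronization both come down to keeping the two tracks the same length in time. The following is a rough sketch, assuming each track is signed 16-bit mono PCM; `silence`, `silence_for_gap`, and `align_tracks` are hypothetical helpers, not pipecat functions. Zero-valued bytes are digital silence for signed 16-bit PCM, so padding with them keeps the tracks aligned without adding sound.

```python
BYTES_PER_SAMPLE = 2  # assumption: signed 16-bit PCM, mono per track


def silence(num_samples: int) -> bytes:
    """Zero-valued samples are silence for signed 16-bit PCM."""
    return b"\x00" * (num_samples * BYTES_PER_SAMPLE)


def silence_for_gap(gap_seconds: float, sample_rate: int) -> bytes:
    """Silence covering a gap in time between two audio frames."""
    return silence(round(gap_seconds * sample_rate))


def align_tracks(user: bytearray, bot: bytearray) -> None:
    """Pad the shorter track with silence so both cover the same duration."""
    diff = len(user) - len(bot)
    if diff > 0:
        bot.extend(b"\x00" * diff)
    elif diff < 0:
        user.extend(b"\x00" * -diff)


user = bytearray(silence(160))              # 10 ms at 16 kHz
bot = bytearray()
bot.extend(silence_for_gap(0.005, 16000))   # a 5 ms gap filled with silence
align_tracks(user, bot)
print(len(user), len(bot))  # 320 320
```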

__init__(*, sample_rate: int | None = None, num_channels: int = 1, buffer_size: int = 0, enable_turn_audio: bool = False, **kwargs)

Initialize the audio buffer processor.

Parameters:
  • sample_rate – Desired output sample rate in Hz. If None, the sample rate of the source audio is used.

  • num_channels – Number of channels (1 for mono, 2 for stereo). Defaults to 1.

  • buffer_size – Buffer size, in bytes, at which the on_audio_data and on_track_audio_data events fire. 0 means no size-based triggering.

  • enable_turn_audio – Whether to trigger the on_user_turn_audio_data and on_bot_turn_audio_data events.

  • **kwargs – Additional arguments passed to parent class.
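Choosing buffer_size for a target duration means converting seconds to bytes. The sketch below assumes buffer_size counts bytes of 16-bit (2-byte) mono PCM per track at the processor's sample_rate; `buffer_size_for` is a hypothetical helper, not part of pipecat.

```python
def buffer_size_for(seconds: float, sample_rate: int, bytes_per_sample: int = 2) -> int:
    """Bytes of mono PCM needed to hold `seconds` of audio (hypothetical helper)."""
    # samples = seconds * sample_rate; each sample takes bytes_per_sample bytes
    return int(seconds * sample_rate) * bytes_per_sample


# 5 seconds of 16 kHz, 16-bit mono audio per track
size = buffer_size_for(5, 16000)
print(size)  # 160000
```

Under that assumption, the result would be passed as `AudioBufferProcessor(sample_rate=16000, buffer_size=size)`.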

property sample_rate: int

Current sample rate of the audio processor.

Returns:

The sample rate in Hz.

property num_channels: int

Number of channels in the audio output.

Returns:

Number of channels (1 for mono, 2 for stereo).

has_audio() bool

Check if either user or bot audio buffers contain data.

Returns:

True if either buffer contains audio data.

merge_audio_buffers() bytes

Merge user and bot audio buffers into a single audio stream.

For mono output, audio is mixed. For stereo output, user audio is placed on the left channel and bot audio on the right channel.

Returns:

Merged audio data as bytes: mixed samples for mono output, interleaved left/right samples for stereo output.
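The two merge modes can be illustrated on 16-bit PCM with Python's standard `array` module. This is a sketch of the general technique, not pipecat's code. It assumes both tracks are signed 16-bit mono of equal length in native byte order, and it mixes by summing and clipping; the library may mix differently. `mix_mono` and `interleave_stereo` are hypothetical names.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def mix_mono(user: bytes, bot: bytes) -> bytes:
    """Mono: add the two tracks sample by sample, clipping to the int16 range."""
    a, b = array("h", user), array("h", bot)
    mixed = array("h", (max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)))
    return mixed.tobytes()


def interleave_stereo(left: bytes, right: bytes) -> bytes:
    """Stereo: alternate samples L, R, L, R, ... (user on left, bot on right)."""
    l, r = array("h", left), array("h", right)
    out = array("h", [0]) * (2 * len(l))
    out[0::2] = l   # even positions: left channel
    out[1::2] = r   # odd positions: right channel
    return out.tobytes()


user = array("h", [1000, 30000, -5]).tobytes()
bot = array("h", [2000, 10000, 5]).tobytes()

print(list(array("h", mix_mono(user, bot))))           # [3000, 32767, 0]
print(list(array("h", interleave_stereo(user, bot))))  # [1000, 2000, 30000, 10000, -5, 5]
```

Clipping matters for mono: 30000 + 10000 overflows the int16 range, so the sum is pinned to 32767 instead of wrapping around to a large negative value.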

async start_recording()

Start recording audio from both user and bot.

Initializes recording state and resets audio buffers.

async stop_recording()

Stop recording and trigger final audio data handlers.

Calls audio handlers with any remaining buffered audio before stopping.
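Combined with the size threshold, this gives two points at which the audio handlers can fire: when a buffer fills and when recording stops. The sketch below models just that control flow with a hypothetical `ChunkedRecorder` class and a plain async callback. It is not the pipecat implementation and omits resampling, the second track, and event registration.

```python
import asyncio
from typing import Awaitable, Callable


class ChunkedRecorder:
    """Hypothetical sketch: emit buffered audio at a size threshold and on stop."""

    def __init__(self, buffer_size: int, on_audio: Callable[[bytes], Awaitable[None]]):
        self._buffer_size = buffer_size
        self._on_audio = on_audio
        self._buffer = bytearray()
        self._recording = False

    async def start(self) -> None:
        self._recording = True
        self._buffer.clear()             # start from an empty buffer

    async def add(self, pcm: bytes) -> None:
        if not self._recording:
            return
        self._buffer.extend(pcm)
        # buffer_size == 0: no size-triggered flushes, only the final one.
        if self._buffer_size > 0 and len(self._buffer) >= self._buffer_size:
            await self._flush()

    async def stop(self) -> None:
        await self._flush()              # deliver whatever is left over
        self._recording = False

    async def _flush(self) -> None:
        if self._buffer:
            await self._on_audio(bytes(self._buffer))
            self._buffer.clear()


async def main() -> list[int]:
    chunk_sizes: list[int] = []

    async def on_audio(audio: bytes) -> None:
        chunk_sizes.append(len(audio))

    rec = ChunkedRecorder(buffer_size=4, on_audio=on_audio)
    await rec.start()
    for _ in range(5):
        await rec.add(b"\x00\x00")      # one 16-bit sample per call
    await rec.stop()
    return chunk_sizes


print(asyncio.run(main()))  # [4, 4, 2]
```

The final 2-byte chunk is the remainder that never reached the threshold; without the flush in `stop`, it would be lost.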

async process_frame(frame: Frame, direction: FrameDirection)

Process incoming audio frames and manage audio buffers.

Parameters:
  • frame – The frame to process.

  • direction – The direction of frame flow in the pipeline.