vad_analyzer

Voice Activity Detection (VAD) analyzer base classes and utilities.

This module provides the abstract base class for VAD analyzers and the associated data structures for voice activity detection in audio streams. It includes state management, parameter configuration, and an audio analysis framework.

class pipecat.audio.vad.vad_analyzer.VADState(*values)[source]

Bases: Enum

Voice Activity Detection states.

Members:
  • QUIET – No voice activity detected.

  • STARTING – Voice activity beginning, transitioning from quiet.

  • SPEAKING – Active voice detected and confirmed.

  • STOPPING – Voice activity ending, transitioning to quiet.

QUIET = 1
STARTING = 2
SPEAKING = 3
STOPPING = 4
class pipecat.audio.vad.vad_analyzer.VADParams(*, confidence: float = 0.7, start_secs: float = 0.2, stop_secs: float = 0.2, min_volume: float = 0.6)[source]

Bases: BaseModel

Configuration parameters for Voice Activity Detection.

Parameters:
  • confidence – Minimum voice confidence score (0.0 to 1.0) required to treat audio as voice.

  • start_secs – Time in seconds that voice must persist before the start of speech is confirmed.

  • stop_secs – Time in seconds that voice must be absent before the end of speech is confirmed.

  • min_volume – Minimum audio volume threshold for voice detection.

confidence: float
start_secs: float
stop_secs: float
min_volume: float
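
The time-based parameters above have to be compared against audio that arrives in fixed-size analysis windows. The following is a minimal sketch of one way to convert start_secs and stop_secs into a count of consecutive windows. The helper name windows_for, the 512-frame window, and the 16 kHz sample rate are assumptions chosen for illustration; they are not taken from this module.

```python
def windows_for(duration_secs: float, frames_per_window: int, sample_rate: int) -> int:
    """Return how many consecutive analysis windows span duration_secs.

    Hypothetical helper: frames_per_window stands in for whatever an
    analyzer's num_frames_required() returns.
    """
    window_secs = frames_per_window / sample_rate  # length of one window in seconds
    return round(duration_secs / window_secs)


# Assumed setup: 16 kHz mono audio analysed in 512-frame windows (32 ms each).
start_windows = windows_for(0.2, 512, 16000)  # default start_secs
stop_windows = windows_for(0.2, 512, 16000)   # default stop_secs
print(start_windows, stop_windows)
```

Under these assumptions a 0.2 second threshold corresponds to about six windows, so a single noisy window cannot by itself confirm a start or stop.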
class pipecat.audio.vad.vad_analyzer.VADAnalyzer(*, sample_rate: int | None = None, params: VADParams | None = None)[source]

Bases: ABC

Abstract base class for Voice Activity Detection analyzers.

Provides the framework for implementing VAD analysis with configurable parameters, state management, and audio processing capabilities. Subclasses must implement the core voice confidence calculation.

__init__(*, sample_rate: int | None = None, params: VADParams | None = None)[source]

Initialize the VAD analyzer.

Parameters:
  • sample_rate – Audio sample rate in Hz. If None, set it later with set_sample_rate().

  • params – VAD parameters for detection configuration.

property sample_rate: int

Get the current sample rate.

Returns:

Current audio sample rate in Hz.

property num_channels: int

Get the number of audio channels.

Returns:

Number of audio channels (always 1 for mono).

property params: VADParams

Get the current VAD parameters.

Returns:

Current VAD configuration parameters.

abstractmethod num_frames_required() → int[source]

Get the number of audio frames required for analysis.

Returns:

Number of frames needed for VAD processing.

abstractmethod voice_confidence(buffer: bytes) → float[source]

Calculate voice activity confidence for the given audio buffer.

Parameters:

buffer – Audio buffer to analyze.

Returns:

Voice confidence score between 0.0 and 1.0.
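
To illustrate the contract of voice_confidence (raw bytes in, a score between 0.0 and 1.0 out), here is a minimal stand-alone sketch. It is not how any real analyzer in this package computes confidence; production analyzers typically rely on a trained model. The function name rms_confidence is hypothetical, and the buffer is assumed to hold 16-bit signed little-endian mono PCM, which the documentation above does not state.

```python
import array
import math
import sys


def rms_confidence(buffer: bytes) -> float:
    """Map the RMS level of a PCM buffer to a score in [0.0, 1.0].

    Assumption: buffer holds 16-bit signed little-endian mono samples.
    """
    usable = len(buffer) - (len(buffer) % 2)  # ignore a trailing odd byte
    samples = array.array("h")
    samples.frombytes(buffer[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # the data is little-endian; swap on big-endian hosts
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(1.0, rms / 32768.0)


silence = b"\x00\x00" * 160
loud = b"\xff\x7f" * 160  # every sample is 32767
print(rms_confidence(silence), rms_confidence(loud))
```

A subclass would place logic like this inside its own voice_confidence method; an energy measure alone cannot tell speech from other loud sounds, which is why it is shown only as a stand-in.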

set_sample_rate(sample_rate: int)[source]

Set the sample rate for audio processing.

Parameters:

sample_rate – Audio sample rate in Hz.

set_params(params: VADParams)[source]

Set VAD parameters and recalculate internal values.

Parameters:

params – VAD parameters for detection configuration.

async analyze_audio(buffer: bytes) → VADState[source]

Analyze audio buffer and return current VAD state.

Processes incoming audio data, maintains internal state, and determines voice activity status based on confidence and volume thresholds.

Parameters:

buffer – Audio buffer to analyze.

Returns:

Current VAD state after processing the buffer.
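
The four states and the start/stop thresholds suggest a debounced state machine. The sketch below shows one plausible way such transitions could work; it is an illustration under stated assumptions, not the library's implementation. The State enum is a local stand-in that mirrors VADState, and the Debouncer class, its update method, and the window counts are hypothetical.

```python
from enum import Enum


class State(Enum):
    """Local stand-in mirroring the VADState members."""
    QUIET = 1
    STARTING = 2
    SPEAKING = 3
    STOPPING = 4


class Debouncer:
    """Hypothetical helper: confirm start/stop only after N consecutive windows."""

    def __init__(self, start_windows: int, stop_windows: int):
        self.state = State.QUIET
        self._start_windows = start_windows
        self._stop_windows = stop_windows
        self._count = 0

    def update(self, speaking: bool) -> State:
        if speaking:
            if self.state == State.QUIET:
                self.state, self._count = State.STARTING, 1
            elif self.state == State.STARTING:
                self._count += 1
            elif self.state == State.STOPPING:
                # Voice resumed before the stop was confirmed.
                self.state, self._count = State.SPEAKING, 0
        else:
            if self.state == State.STARTING:
                # Voice vanished before the start was confirmed.
                self.state, self._count = State.QUIET, 0
            elif self.state == State.SPEAKING:
                self.state, self._count = State.STOPPING, 1
            elif self.state == State.STOPPING:
                self._count += 1

        # Promote transitional states once enough windows have accumulated.
        if self.state == State.STARTING and self._count >= self._start_windows:
            self.state, self._count = State.SPEAKING, 0
        elif self.state == State.STOPPING and self._count >= self._stop_windows:
            self.state, self._count = State.QUIET, 0
        return self.state


vad = Debouncer(start_windows=2, stop_windows=2)
trace = [vad.update(s) for s in (True, True, False, True, False, False)]
print([s.name for s in trace])
```

In this sketch the speaking flag would be computed per window, for example as confidence >= params.confidence and volume >= params.min_volume; how the volume value is measured is left out here.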

async cleanup()[source]

Clean up resources.

This method should be called when the object is no longer needed. It waits for all currently executing event handler tasks to finish before returning.