vad_analyzer
Voice Activity Detection (VAD) analyzer base classes and utilities.
This module provides the abstract base class for VAD analyzers and associated data structures for voice activity detection in audio streams. Includes state management, parameter configuration, and audio analysis framework.
- class pipecat.audio.vad.vad_analyzer.VADState(*values)
Bases: Enum
Voice Activity Detection states.
- Parameters:
QUIET – No voice activity detected.
STARTING – Voice activity beginning, transitioning from quiet.
SPEAKING – Active voice detected and confirmed.
STOPPING – Voice activity ending, transitioning to quiet.
- QUIET = 1
- STARTING = 2
- SPEAKING = 3
- STOPPING = 4
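As an illustration of how the four states might be consumed, the sketch below collapses a sequence of states into "started" and "stopped" events. The enum is a local stand-in that mirrors the members and values listed above, and `speech_events` is a hypothetical helper, not part of this module. It assumes only the confirmed states (SPEAKING and QUIET) should trigger events, with STARTING and STOPPING treated as provisional.

```python
from enum import Enum


class VADState(Enum):
    """Local stand-in mirroring the documented members and values."""

    QUIET = 1
    STARTING = 2
    SPEAKING = 3
    STOPPING = 4


def speech_events(states):
    """Return 'started'/'stopped' events for a sequence of VAD states.

    Hypothetical helper: only SPEAKING and QUIET change the result;
    the transitional STARTING and STOPPING states are ignored.
    """
    events = []
    speaking = False
    for state in states:
        if state is VADState.SPEAKING and not speaking:
            speaking = True
            events.append("started")
        elif state is VADState.QUIET and speaking:
            speaking = False
            events.append("stopped")
    return events
```

Ignoring the transitional states means a brief STOPPING that returns to SPEAKING does not produce a spurious "stopped" event.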
- class pipecat.audio.vad.vad_analyzer.VADParams(*, confidence: float = 0.7, start_secs: float = 0.2, stop_secs: float = 0.2, min_volume: float = 0.6)
Bases: BaseModel
Configuration parameters for Voice Activity Detection.
- Parameters:
confidence – Minimum confidence threshold for voice detection.
start_secs – Duration to wait before confirming voice start.
stop_secs – Duration to wait before confirming voice stop.
min_volume – Minimum audio volume threshold for voice detection.
- confidence: float
- start_secs: float
- stop_secs: float
- min_volume: float
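To make the two thresholds concrete, here is a minimal sketch of how a single analysis window might be classified as speech. `ParamsSketch` is a plain dataclass stand-in that copies the documented field names and defaults (the real class is a `BaseModel`), and `is_speech` is a hypothetical helper. Requiring both thresholds to be met, with inclusive comparisons, is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamsSketch:
    """Stand-in with the documented field names and default values."""

    confidence: float = 0.7
    start_secs: float = 0.2
    stop_secs: float = 0.2
    min_volume: float = 0.6


def is_speech(params: ParamsSketch, confidence: float, volume: float) -> bool:
    """Assumed rule: a window counts as speech only if both thresholds are met."""
    return confidence >= params.confidence and volume >= params.min_volume
```

Under this assumption, lowering `confidence` or `min_volume` makes detection more permissive, while `start_secs` and `stop_secs` only affect how long a change must persist before it is confirmed.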
- class pipecat.audio.vad.vad_analyzer.VADAnalyzer(*, sample_rate: int | None = None, params: VADParams | None = None)
Bases: ABC
Abstract base class for Voice Activity Detection analyzers.
Provides the framework for implementing VAD analysis with configurable parameters, state management, and audio processing capabilities. Subclasses must implement the core voice confidence calculation.
- __init__(*, sample_rate: int | None = None, params: VADParams | None = None)
Initialize the VAD analyzer.
- Parameters:
sample_rate – Audio sample rate in Hz. If None, it is expected to be set later (for example, via set_sample_rate()).
params – VAD parameters for detection configuration.
- property sample_rate: int
Get the current sample rate.
- Returns:
Current audio sample rate in Hz.
- property num_channels: int
Get the number of audio channels.
- Returns:
Number of audio channels (always 1 for mono).
- property params: VADParams
Get the current VAD parameters.
- Returns:
Current VAD configuration parameters.
- abstractmethod num_frames_required() → int
Get the number of audio frames required for analysis.
- Returns:
Number of frames needed for VAD processing.
- abstractmethod voice_confidence(buffer: bytes) → float
Calculate voice activity confidence for the given audio buffer.
- Parameters:
buffer – Audio buffer to analyze.
- Returns:
Voice confidence score between 0.0 and 1.0.
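The two abstract methods above are the contract a subclass has to fulfil. The sketch below shows the shape of such a subclass against a local stand-in base class, so that it runs without the library installed. `AnalyzerSketch` and `EnergyAnalyzer` are hypothetical names, and the RMS-energy score is a toy substitute for a real voice model. It also assumes the buffer holds 16-bit little-endian mono PCM, which the documentation above does not state.

```python
import math
import struct
from abc import ABC, abstractmethod


class AnalyzerSketch(ABC):
    """Stand-in exposing only the two documented abstract methods."""

    @abstractmethod
    def num_frames_required(self) -> int: ...

    @abstractmethod
    def voice_confidence(self, buffer: bytes) -> float: ...


class EnergyAnalyzer(AnalyzerSketch):
    """Toy analyzer that reports normalized RMS energy as confidence."""

    def num_frames_required(self) -> int:
        # 10 ms at 16 kHz; an arbitrary choice for this sketch.
        return 160

    def voice_confidence(self, buffer: bytes) -> float:
        count = len(buffer) // 2  # two bytes per 16-bit sample (assumed)
        if count == 0:
            return 0.0
        samples = struct.unpack(f"<{count}h", buffer[: count * 2])
        rms = math.sqrt(sum(s * s for s in samples) / count)
        # Clamp into the documented 0.0 to 1.0 range.
        return min(1.0, rms / 32768.0)
```

Energy alone cannot distinguish speech from loud noise, so this only illustrates the interface; a practical subclass would wrap an actual voice detection model.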
- set_sample_rate(sample_rate: int)
Set the sample rate for audio processing.
- Parameters:
sample_rate – Audio sample rate in Hz.
- set_params(params: VADParams)
Set VAD parameters and recalculate internal values.
- Parameters:
params – VAD parameters for detection configuration.
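The "internal values" recalculated here are not spelled out above. One plausible reading, offered only as an assumption, is that the `start_secs` and `stop_secs` durations are converted into a number of consecutive analysis windows, based on the sample rate and the value returned by `num_frames_required()`. The helper `windows_for` below is hypothetical and sketches that arithmetic.

```python
def windows_for(duration_secs: float, sample_rate: int, frames_per_window: int) -> int:
    """Convert a duration into a count of analysis windows (assumed approach).

    Each window covers frames_per_window / sample_rate seconds, so the
    duration is divided by that length and rounded to the nearest integer.
    """
    window_secs = frames_per_window / sample_rate
    return round(duration_secs / window_secs)


# Example: 512-frame windows at 16 kHz last 0.032 s each,
# so the default 0.2 s maps to round(6.25) windows.
start_windows = windows_for(0.2, sample_rate=16000, frames_per_window=512)
```

Because the result depends on the sample rate, a conversion like this would need to be redone whenever either the parameters or the sample rate change.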
- async analyze_audio(buffer: bytes) → VADState
Analyze audio buffer and return current VAD state.
Processes incoming audio data, maintains internal state, and determines voice activity status based on confidence and volume thresholds.
- Parameters:
buffer – Audio buffer to analyze.
- Returns:
Current VAD state after processing the buffer.
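The state management described above can be pictured as a small hysteresis state machine. The following is a sketch under stated assumptions, not the library's implementation: `State` is a local stand-in for the documented enum, `StateMachineSketch` is a hypothetical class, and the per-window `speech` flag stands for the result of the confidence and volume checks. It assumes a start or stop is confirmed after a fixed number of consecutive windows.

```python
from enum import Enum


class State(Enum):
    QUIET = 1
    STARTING = 2
    SPEAKING = 3
    STOPPING = 4


class StateMachineSketch:
    """Hypothetical hysteresis: confirm changes after N consecutive windows."""

    def __init__(self, start_windows: int, stop_windows: int):
        self.state = State.QUIET
        self.start_windows = start_windows
        self.stop_windows = stop_windows
        self._count = 0  # consecutive windows spent in a transitional state

    def update(self, speech: bool) -> State:
        if speech:
            if self.state is State.QUIET:
                self.state, self._count = State.STARTING, 1
            elif self.state is State.STARTING:
                self._count += 1
            elif self.state is State.STOPPING:
                # Speech resumed before the stop was confirmed.
                self.state, self._count = State.SPEAKING, 0
        else:
            if self.state is State.STARTING:
                # Too short to confirm; fall back to quiet.
                self.state, self._count = State.QUIET, 0
            elif self.state is State.SPEAKING:
                self.state, self._count = State.STOPPING, 1
            elif self.state is State.STOPPING:
                self._count += 1

        # Promote transitional states once enough windows have accumulated.
        if self.state is State.STARTING and self._count >= self.start_windows:
            self.state, self._count = State.SPEAKING, 0
        elif self.state is State.STOPPING and self._count >= self.stop_windows:
            self.state, self._count = State.QUIET, 0
        return self.state
```

With this design, a short burst of noise never reaches SPEAKING and a short pause never reaches QUIET, which is the practical purpose of the `start_secs` and `stop_secs` parameters.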