vad_processor

Voice Activity Detection processor for detecting speech in audio streams.

This module provides a VADProcessor that wraps a VADController to process audio frames and push VAD-related frames into the pipeline.

class pipecat.processors.audio.vad_processor.VADProcessor(*, vad_analyzer: VADAnalyzer, speech_activity_period: float = 0.2, audio_idle_timeout: float = 1.0, **kwargs)

Bases: FrameProcessor

Processes audio frames through voice activity detection.

This processor wraps a VADController to detect speech in audio streams and push VAD frames into the pipeline:

VADUserStartedSpeakingFrame: Pushed when speech begins.
VADUserStoppedSpeakingFrame: Pushed when speech ends.
UserSpeakingFrame: Pushed periodically while speech is detected.
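The push rules above can be illustrated with a small, library-independent model. It is a sketch under stated assumptions, not Pipecat's VADController. The function name `simulate_vad_frames` and its input are hypothetical: the input is a list of `(timestamp_seconds, is_speech)` decisions, one per audio chunk, and frames are represented by their class names as strings. The rate limit on `UserSpeakingFrame` mirrors the `speech_activity_period` parameter of `__init__`.

```python
from typing import List, Optional, Tuple


def simulate_vad_frames(
    decisions: List[Tuple[float, bool]],
    speech_activity_period: float = 0.2,
) -> List[str]:
    """Hypothetical model: map per-chunk speech decisions to pushed frame names."""
    frames: List[str] = []
    speaking = False
    last_speaking_push: Optional[float] = None  # time of last UserSpeakingFrame

    for timestamp, is_speech in decisions:
        if is_speech:
            if not speaking:
                # Transition from quiet to speaking.
                speaking = True
                frames.append("VADUserStartedSpeakingFrame")
            # While speaking, push UserSpeakingFrame at most once per period.
            if (last_speaking_push is None
                    or timestamp - last_speaking_push >= speech_activity_period):
                frames.append("UserSpeakingFrame")
                last_speaking_push = timestamp
        elif speaking:
            # Transition from speaking to quiet; reset the rate limiter.
            speaking = False
            last_speaking_push = None
            frames.append("VADUserStoppedSpeakingFrame")
    return frames


print(simulate_vad_frames(
    [(0.0, False), (0.1, True), (0.2, True), (0.35, True),
     (0.5, True), (0.6, False), (0.7, False)]
))
```

In this model a single non-speech chunk ends the utterance. A real analyzer may wait for several consecutive quiet chunks before reporting a stop.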

Example:

vad_processor = VADProcessor(vad_analyzer=SileroVADAnalyzer())

__init__(*, vad_analyzer: VADAnalyzer, speech_activity_period: float = 0.2, audio_idle_timeout: float = 1.0, **kwargs)

Initialize the VAD processor.

Parameters:

vad_analyzer – The VADAnalyzer instance for processing audio.
speech_activity_period – Minimum interval in seconds between UserSpeakingFrame pushes. Defaults to 0.2.
audio_idle_timeout – Time in seconds after which speech is forced to stop if no audio frames arrive while in the SPEAKING state. Set to 0 to disable. Defaults to 1.0.
**kwargs – Additional keyword arguments passed to the parent FrameProcessor class.
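The `audio_idle_timeout` behavior can be sketched as an asyncio watchdog. This is a hypothetical illustration, not Pipecat's implementation. The class `IdleSpeechWatchdog`, its `on_audio_frame` and `run` methods, and the rule that any audio frame resets the idle timer are all assumptions. If no audio frame arrives for `audio_idle_timeout` seconds while the watchdog is in the speaking state, it records a forced `VADUserStoppedSpeakingFrame`. A timeout of 0 disables it.

```python
import asyncio
from typing import List


class IdleSpeechWatchdog:
    """Hypothetical helper: forces a speech stop when audio stops arriving mid-utterance."""

    def __init__(self, audio_idle_timeout: float = 1.0) -> None:
        self.audio_idle_timeout = audio_idle_timeout
        self.speaking = False
        self.pushed: List[str] = []  # frame names, in push order
        self._audio_received = asyncio.Event()

    def on_audio_frame(self, is_speech: bool) -> None:
        """Record one audio frame and its (assumed precomputed) VAD decision."""
        self._audio_received.set()  # any audio frame resets the idle timer
        if is_speech and not self.speaking:
            self.speaking = True
            self.pushed.append("VADUserStartedSpeakingFrame")
        elif not is_speech and self.speaking:
            self.speaking = False
            self.pushed.append("VADUserStoppedSpeakingFrame")

    async def run(self) -> None:
        """Watch for gaps in audio; return immediately if the timeout is 0 (disabled)."""
        if self.audio_idle_timeout <= 0:
            return
        while True:
            try:
                await asyncio.wait_for(self._audio_received.wait(), self.audio_idle_timeout)
                self._audio_received.clear()
            except asyncio.TimeoutError:
                # No audio for a full timeout period: force a stop if mid-speech.
                if self.speaking:
                    self.speaking = False
                    self.pushed.append("VADUserStoppedSpeakingFrame")


async def demo() -> List[str]:
    # Create the watchdog inside the running loop so its Event uses that loop.
    watchdog = IdleSpeechWatchdog(audio_idle_timeout=0.05)
    task = asyncio.create_task(watchdog.run())
    watchdog.on_audio_frame(is_speech=True)  # speech starts...
    await asyncio.sleep(0.2)                 # ...then audio stops arriving
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return watchdog.pushed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Without the watchdog, a stream that drops mid-utterance would leave the state stuck in SPEAKING, because no further audio arrives to produce a stop decision.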

async cleanup()

Clean up VAD controller resources.

async process_frame(frame: Frame, direction: FrameDirection)

Process a frame through VAD and forward it.

Parameters:

frame – The frame to process.
direction – The direction of frame flow in the pipeline.
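The process-then-forward behavior of `process_frame` can be modeled without Pipecat installed. Everything here is a hypothetical stand-in: `ToyVADProcessor`, the `Direction` enum, the frame dataclasses, and the `is_speech` callable that replaces a real `VADAnalyzer`. The sketch also assumes that VAD events are pushed downstream before the audio frame that triggered them, and that every incoming frame, audio or not, is forwarded unchanged in the direction it arrived.

```python
import asyncio
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Tuple


class Direction(Enum):
    DOWNSTREAM = 1
    UPSTREAM = 2


@dataclass
class AudioFrame:
    audio: bytes


@dataclass
class TextFrame:
    text: str


@dataclass
class VADEvent:
    name: str  # e.g. "VADUserStartedSpeakingFrame"


class ToyVADProcessor:
    """Hypothetical stand-in: analyze audio, push VAD events, forward every frame."""

    def __init__(self, is_speech: Callable[[bytes], bool]) -> None:
        self.is_speech = is_speech
        self.speaking = False
        self.output: List[Tuple[object, Direction]] = []

    async def push_frame(self, frame: object, direction: Direction) -> None:
        # In a real pipeline this would hand the frame to the next processor.
        self.output.append((frame, direction))

    async def process_frame(self, frame: object, direction: Direction) -> None:
        if isinstance(frame, AudioFrame):
            speech = self.is_speech(frame.audio)
            # Assumption: VAD events go downstream, ahead of the triggering audio.
            if speech and not self.speaking:
                self.speaking = True
                await self.push_frame(VADEvent("VADUserStartedSpeakingFrame"), Direction.DOWNSTREAM)
            elif not speech and self.speaking:
                self.speaking = False
                await self.push_frame(VADEvent("VADUserStoppedSpeakingFrame"), Direction.DOWNSTREAM)
        # Every frame, audio or not, is forwarded in the direction it arrived.
        await self.push_frame(frame, direction)
```

In this model, forwarding every frame keeps the processor transparent to frames it does not handle, so it can sit in a pipeline without disturbing other traffic.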