vad_processor

Voice Activity Detection processor for detecting speech in audio streams.

This module provides a VADProcessor that wraps a VADController to process audio frames and push VAD-related frames into the pipeline.

class pipecat.processors.audio.vad_processor.VADProcessor(*, vad_analyzer: VADAnalyzer, speech_activity_period: float = 0.2, audio_idle_timeout: float = 1.0, **kwargs)

Bases: FrameProcessor

Processes audio frames through voice activity detection.

This processor wraps a VADController to detect speech in audio streams and push VAD frames into the pipeline:

  • VADUserStartedSpeakingFrame: Pushed when speech begins.

  • VADUserStoppedSpeakingFrame: Pushed when speech ends.

  • UserSpeakingFrame: Pushed periodically while speech is detected.

Example:

vad_processor = VADProcessor(vad_analyzer=SileroVADAnalyzer())

__init__(*, vad_analyzer: VADAnalyzer, speech_activity_period: float = 0.2, audio_idle_timeout: float = 1.0, **kwargs)

Initialize the VAD processor.

Parameters:
  • vad_analyzer – The VADAnalyzer instance for processing audio.

  • speech_activity_period – Minimum interval in seconds between UserSpeakingFrame pushes. Defaults to 0.2.

  • audio_idle_timeout – Time in seconds without incoming audio frames, while in the SPEAKING state, after which a speech stop is forced. Set to 0 to disable. Defaults to 1.0.

  • **kwargs – Additional arguments passed to parent class.
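
The two timing parameters above can be illustrated with a small standalone sketch. It does not use pipecat and is not the library's implementation: ActivityThrottle and idle_timeout_expired are hypothetical helpers, written only to show how a minimum emit interval and an idle timeout (with 0 meaning disabled) might behave.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActivityThrottle:
    """Hypothetical helper: rate-limits periodic 'user is speaking' notifications."""

    period: float = 0.2  # mirrors the documented speech_activity_period default
    _last_emit: Optional[float] = None

    def should_emit(self, now: float) -> bool:
        # Emit if nothing was emitted yet, or at least `period` seconds have elapsed.
        if self._last_emit is None or now - self._last_emit >= self.period:
            self._last_emit = now
            return True
        return False


def idle_timeout_expired(last_audio_at: float, now: float, timeout: float = 1.0) -> bool:
    """Hypothetical check: True when a speech stop would be forced for lack of audio."""
    if timeout <= 0:  # 0 disables the timeout, as documented above
        return False
    return now - last_audio_at >= timeout


# Timestamps (in seconds) at which speech is still being detected.
throttle = ActivityThrottle(period=0.2)
emitted = [t for t in (0.0, 0.05, 0.1, 0.2, 0.25, 0.45) if throttle.should_emit(t)]
```

In this sketch only the timestamps 0.0, 0.2 and 0.45 pass the throttle, since each is at least 0.2 seconds after the previous emission.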

async cleanup()

Clean up VAD controller resources.

async process_frame(frame: Frame, direction: FrameDirection)

Process a frame through VAD and forward it.

Parameters:
  • frame – The frame to process.

  • direction – The direction of frame flow in the pipeline.
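
As a rough mental model of "process and forward", the following standalone sketch traces what might be pushed for a run of audio chunks. It is an assumption-laden simulation rather than pipecat code: AudioChunk and process_chunks are hypothetical, the per-chunk is_speech flag stands in for the analyzer's verdict, any start/stop smoothing a real analyzer applies is omitted, and the ordering of VAD frames relative to the forwarded audio is assumed. Periodic UserSpeakingFrame pushes are left out for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AudioChunk:
    """Hypothetical stand-in for an audio frame; `is_speech` fakes the analyzer verdict."""

    is_speech: bool


def process_chunks(chunks: List[AudioChunk]) -> List[str]:
    """Return the names of everything pushed, in order, for a run of chunks."""
    pushed: List[str] = []
    speaking = False
    for chunk in chunks:
        if chunk.is_speech and not speaking:
            # Transition from quiet to speaking.
            speaking = True
            pushed.append("VADUserStartedSpeakingFrame")
        elif not chunk.is_speech and speaking:
            # Transition from speaking to quiet.
            speaking = False
            pushed.append("VADUserStoppedSpeakingFrame")
        # The original frame is always forwarded; its position relative
        # to the VAD frames is an assumption of this sketch.
        pushed.append("audio")
    return pushed


trace = process_chunks(
    [AudioChunk(False), AudioChunk(True), AudioChunk(True), AudioChunk(False)]
)
```

Here trace holds one "audio" entry per input chunk, with a single started/stopped pair wrapped around the two speech chunks.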