vad_controller

Voice Activity Detection controller for managing speech state transitions.

This module provides a controller that wraps a VADAnalyzer to track speech state and emit events when speech starts, stops, or is actively detected.

class pipecat.audio.vad.vad_controller.VADController(vad_analyzer: VADAnalyzer, *, speech_activity_period: float = 0.2, audio_idle_timeout: float = 1.0)

Bases: BaseObject

Manages voice activity detection state and emits speech events.

Wraps a VADAnalyzer to process audio and trigger events based on speech state transitions. Tracks whether the user is speaking, quiet, or transitioning between states.

Event handlers available:

  • on_speech_started: Called when speech begins.

  • on_speech_stopped: Called when speech ends, including a forced stop when the audio stream goes idle (no frames received while speaking).

  • on_speech_activity: Called periodically while speech is detected.

  • on_push_frame: Called when the controller wants to push a frame.

  • on_broadcast_frame: Called when the controller wants to broadcast a frame.
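How state transitions might map onto the first two events can be sketched as follows. This is a self-contained illustration, not the controller's actual code: the State enum, its four member names, and the TransitionTracker class are hypothetical stand-ins, and the choice to ignore the transitional states is an assumption.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    """Hypothetical stand-in for the analyzer's speech states."""

    QUIET = 1
    STARTING = 2
    SPEAKING = 3
    STOPPING = 4


class TransitionTracker:
    """Illustrative tracker that names the event a state change would trigger."""

    def __init__(self) -> None:
        self.state = State.QUIET

    def update(self, new_state: State) -> Optional[str]:
        # Assumption: transitional states and repeated states trigger nothing
        # and do not replace the stored state.
        if new_state in (State.STARTING, State.STOPPING) or new_state is self.state:
            return None
        self.state = new_state
        if new_state is State.SPEAKING:
            return "on_speech_started"
        return "on_speech_stopped"
```

Under these assumptions, each stable change between QUIET and SPEAKING produces exactly one event, however many transitional readings come in between.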

Example:

@vad_controller.event_handler("on_speech_started")
async def on_speech_started(controller):
    ...

@vad_controller.event_handler("on_speech_stopped")
async def on_speech_stopped(controller):
    ...

@vad_controller.event_handler("on_speech_activity")
async def on_speech_activity(controller):
    ...

@vad_controller.event_handler("on_push_frame")
async def on_push_frame(controller, frame: Frame, direction: FrameDirection):
    ...

@vad_controller.event_handler("on_broadcast_frame")
async def on_broadcast_frame(controller, frame_cls: type[Frame], **kwargs):
    ...

__init__(vad_analyzer: VADAnalyzer, *, speech_activity_period: float = 0.2, audio_idle_timeout: float = 1.0)

Initialize the VAD controller.

Parameters:
  • vad_analyzer – The VADAnalyzer instance for processing audio.

  • speech_activity_period – Minimum interval in seconds between on_speech_activity events. Defaults to 0.2.

  • audio_idle_timeout – Timeout in seconds after which a speech stop is forced when no audio frames are received while in the SPEAKING state. This covers cases such as the microphone being muted mid-speech. Set to 0 to disable. Defaults to 1.0.
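The two timing parameters can be read as a minimum-interval throttle and an idle check. The sketch below shows that reading with explicit timestamps. ActivityThrottle and idle_timed_out are hypothetical helpers written for this illustration, not part of the library, and the real controller may track time differently.

```python
from typing import Optional


class ActivityThrottle:
    """Allow an event at most once per `period` seconds (illustrative only)."""

    def __init__(self, period: float = 0.2) -> None:
        self.period = period
        self._last_emit: Optional[float] = None

    def should_emit(self, now: float) -> bool:
        # Emit the first time, then only after `period` seconds have elapsed.
        if self._last_emit is None or now - self._last_emit >= self.period:
            self._last_emit = now
            return True
        return False


def idle_timed_out(last_audio_time: float, now: float, timeout: float = 1.0) -> bool:
    """Return True when no audio has arrived for `timeout` seconds.

    A timeout of 0 disables the check, mirroring the documented behavior.
    """
    return timeout > 0 and now - last_audio_time >= timeout
```

In this reading, a shorter speech_activity_period gives more frequent on_speech_activity events, and a longer audio_idle_timeout tolerates longer gaps in audio before speech is considered stopped.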

async setup(task_manager: BaseTaskManager)

Initialize the controller with the given task manager.

Parameters:

task_manager – The task manager to be associated with this instance.

async process_frame(frame: Frame)

Process a frame and handle VAD-related events.

Handles StartFrame to initialize the sample rate and InputAudioRawFrame to analyze audio for voice activity.

Parameters:

frame – The frame to process.
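The dispatch described above, with one frame type configuring the sample rate and another feeding the analysis, can be sketched with stand-in types. StartFrameStub, AudioFrameStub, DispatchSketch, and the audio_in_sample_rate field name are all assumptions made for this illustration. The real frame classes carry more fields, and the real analysis is delegated to the VADAnalyzer.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartFrameStub:
    """Hypothetical stand-in for StartFrame."""

    audio_in_sample_rate: int


@dataclass
class AudioFrameStub:
    """Hypothetical stand-in for InputAudioRawFrame."""

    audio: bytes


class DispatchSketch:
    """Illustrative frame dispatch; it counts bytes instead of analyzing them."""

    def __init__(self) -> None:
        self.sample_rate: Optional[int] = None
        self.analyzed_bytes = 0

    async def process_frame(self, frame: object) -> None:
        if isinstance(frame, StartFrameStub):
            # The start frame supplies the sample rate used for later analysis.
            self.sample_rate = frame.audio_in_sample_rate
        elif isinstance(frame, AudioFrameStub):
            # Audio frames would be handed to the analyzer here.
            self.analyzed_bytes += len(frame.audio)
        # Any other frame type is ignored in this sketch.


async def demo_dispatch() -> DispatchSketch:
    sketch = DispatchSketch()
    await sketch.process_frame(StartFrameStub(audio_in_sample_rate=16000))
    await sketch.process_frame(AudioFrameStub(audio=b"\x00" * 320))
    await sketch.process_frame("unrelated frame")
    return sketch
```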

async cleanup()

Clean up resources.

This method should be called when the object is no longer needed. It waits for all currently executing event handler tasks to finish before returning.
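Waiting for in-flight handler tasks before returning is a common asyncio pattern. The sketch below shows one way to do it with asyncio.gather. HandlerTasks is a hypothetical class written for this illustration, and the library's own task bookkeeping (via its task manager) may differ.

```python
import asyncio
from typing import List, Set


class HandlerTasks:
    """Illustrative tracker for background handler tasks."""

    def __init__(self) -> None:
        self._tasks: Set[asyncio.Task] = set()
        self.finished: List[str] = []

    def spawn(self, name: str, delay: float) -> None:
        async def handler() -> None:
            await asyncio.sleep(delay)  # simulate handler work
            self.finished.append(name)

        task = asyncio.create_task(handler())
        self._tasks.add(task)
        # Drop the reference once the task completes.
        task.add_done_callback(self._tasks.discard)

    async def cleanup(self) -> None:
        # Wait for every handler that is still running.
        if self._tasks:
            await asyncio.gather(*self._tasks)


async def demo_cleanup() -> List[str]:
    tasks = HandlerTasks()
    tasks.spawn("a", 0.01)
    tasks.spawn("b", 0.02)
    await tasks.cleanup()
    return sorted(tasks.finished)
```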

async push_frame(frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM)

Request a frame to be pushed through the pipeline.

This emits an on_push_frame event that must be handled by a processor to actually push the frame into the pipeline.

Parameters:
  • frame – The frame to push.

  • direction – The direction to push the frame.
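Because push_frame only emits an event, nothing reaches the pipeline unless a handler is registered. The sketch below illustrates that request-and-handler relationship with a minimal stand-in. ControllerSketch and Direction are hypothetical and only mimic the handler signature shown in the example earlier on this page.

```python
import asyncio
from enum import Enum
from typing import Any, Callable, Dict, List


class Direction(Enum):
    """Hypothetical stand-in for FrameDirection."""

    DOWNSTREAM = 1
    UPSTREAM = 2


class ControllerSketch:
    """Illustrative controller that forwards push requests to handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable]] = {}

    def event_handler(self, name: str) -> Callable:
        def decorator(func: Callable) -> Callable:
            self._handlers.setdefault(name, []).append(func)
            return func

        return decorator

    async def push_frame(self, frame: Any, direction: Direction = Direction.DOWNSTREAM) -> None:
        # The controller does not push anything itself; it only notifies handlers.
        for handler in self._handlers.get("on_push_frame", []):
            await handler(self, frame, direction)


async def demo_push() -> list:
    controller = ControllerSketch()
    pushed = []

    # No handler registered yet, so this request has no effect.
    await controller.push_frame("dropped")

    @controller.event_handler("on_push_frame")
    async def on_push_frame(ctrl, frame, direction):
        pushed.append((frame, direction))  # a processor would push the frame here

    await controller.push_frame("kept")
    return pushed
```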

async broadcast_frame(frame_cls: type[Frame], **kwargs)

Request a frame to be broadcast upstream and downstream.

This emits an on_broadcast_frame event that must be handled by a processor to actually broadcast the frame in the pipeline.

Parameters:
  • frame_cls – The class of the frame to broadcast.

  • **kwargs – Arguments to pass to the frame constructor.
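Passing a frame class with constructor arguments, rather than a ready-made frame, lets the handler build a separate instance for each direction. The sketch below shows that idea. NoticeFrame and build_broadcast are hypothetical, and creating one instance per direction is an assumption about how a handling processor would use these arguments.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class NoticeFrame:
    """Hypothetical frame type used only for this illustration."""

    text: str


def build_broadcast(frame_cls: type, **kwargs: Any) -> Dict[str, Any]:
    """Construct one independent frame instance per direction."""
    return {
        "downstream": frame_cls(**kwargs),
        "upstream": frame_cls(**kwargs),
    }
```

For example, build_broadcast(NoticeFrame, text="hi") returns two equal but distinct NoticeFrame objects, so changes made to one direction's frame would not affect the other.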