vad_controller
Voice Activity Detection controller for managing speech state transitions.
This module provides a controller that wraps a VADAnalyzer to track speech state and emit events when speech starts, stops, or is actively detected.
- class pipecat.audio.vad.vad_controller.VADController(vad_analyzer: VADAnalyzer, *, speech_activity_period: float = 0.2, audio_idle_timeout: float = 1.0)
Bases: BaseObject
Manages voice activity detection state and emits speech events.
Wraps a VADAnalyzer to process audio and trigger events based on speech state transitions. Tracks whether the user is speaking, quiet, or transitioning between states.
Event handlers available:
on_speech_started: Called when speech begins.
on_speech_stopped: Called when speech ends. This includes a forced stop when the audio stream goes idle (no frames received while speaking).
on_speech_activity: Called periodically while speech is detected.
on_push_frame: Called when the controller wants to push a frame.
on_broadcast_frame: Called when the controller wants to broadcast a frame.
Example:
    @vad_controller.event_handler("on_speech_started")
    async def on_speech_started(controller):
        ...

    @vad_controller.event_handler("on_speech_stopped")
    async def on_speech_stopped(controller):
        ...

    @vad_controller.event_handler("on_speech_activity")
    async def on_speech_activity(controller):
        ...

    @vad_controller.event_handler("on_push_frame")
    async def on_push_frame(controller, frame: Frame, direction: FrameDirection):
        ...

    @vad_controller.event_handler("on_broadcast_frame")
    async def on_broadcast_frame(controller, frame_cls: Type[Frame], **kwargs):
        ...
- __init__(vad_analyzer: VADAnalyzer, *, speech_activity_period: float = 0.2, audio_idle_timeout: float = 1.0)
Initialize the VAD controller.
- Parameters:
vad_analyzer – The VADAnalyzer instance for processing audio.
speech_activity_period – Minimum interval in seconds between on_speech_activity events. Defaults to 0.2.
audio_idle_timeout – Time in seconds after which a speech stop is forced if no audio frames are received while in the SPEAKING state. This handles cases such as the microphone being muted mid-speech. Set to 0 to disable. Defaults to 1.0.
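The effect of the two timing parameters can be illustrated with a small stand-alone sketch. It does not use pipecat: `ActivityThrottle` and `is_idle` are hypothetical names introduced here, and the real controller may implement this logic differently.

```python
from dataclasses import dataclass


@dataclass
class ActivityThrottle:
    """Hypothetical helper: allows an event at most once per `period` seconds."""

    period: float = 0.2
    _last_emit: float = float("-inf")

    def should_emit(self, now: float) -> bool:
        # Emit only if at least `period` seconds have passed since the last emit.
        if now - self._last_emit >= self.period:
            self._last_emit = now
            return True
        return False


def is_idle(last_audio_time: float, now: float, timeout: float = 1.0) -> bool:
    """Hypothetical check: True once no audio has arrived for `timeout` seconds.

    A timeout of 0 disables the check, mirroring the documented parameter.
    """
    return timeout > 0 and (now - last_audio_time) >= timeout


throttle = ActivityThrottle(period=0.2)
# Timestamps (in seconds) at which speech is detected.
detections = [0.0, 0.05, 0.1, 0.25, 0.3, 0.5]
# Only detections spaced at least 0.2 s apart would trigger an activity event.
emitted = [t for t in detections if throttle.should_emit(t)]
```

In this sketch a smaller `period` yields more frequent activity events, while a larger `timeout` tolerates longer gaps in the audio stream before a stop is forced.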
- async setup(task_manager: BaseTaskManager)
Initialize the controller with the given task manager.
- Parameters:
task_manager – The task manager to be associated with this instance.
- async process_frame(frame: Frame)
Process a frame and handle VAD-related events.
Handles StartFrame to initialize the sample rate and InputAudioRawFrame to analyze audio for voice activity.
- Parameters:
frame – The frame to process.
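The dispatch described above can be sketched with stand-in types. The frame classes below are simplified placeholders defined only for this example, `SketchController` is hypothetical, and counting bytes stands in for the actual voice activity analysis.

```python
import asyncio
from dataclasses import dataclass


# Stand-in frame types for illustration only; the real classes live in pipecat.
@dataclass
class Frame:
    pass


@dataclass
class StartFrame(Frame):
    audio_in_sample_rate: int = 16000


@dataclass
class InputAudioRawFrame(Frame):
    audio: bytes = b""


class SketchController:
    """Hypothetical controller that only shows the frame-type dispatch."""

    def __init__(self):
        self.sample_rate = None
        self.analyzed_bytes = 0

    async def process_frame(self, frame: Frame):
        if isinstance(frame, StartFrame):
            # Remember the sample rate before any audio is analyzed.
            self.sample_rate = frame.audio_in_sample_rate
        elif isinstance(frame, InputAudioRawFrame):
            # Placeholder for the voice activity analysis step.
            self.analyzed_bytes += len(frame.audio)
        # Any other frame type is ignored in this sketch.


async def _demo() -> SketchController:
    controller = SketchController()
    frames = (
        StartFrame(audio_in_sample_rate=8000),
        InputAudioRawFrame(audio=b"\x00\x01" * 80),
        Frame(),
    )
    for frame in frames:
        await controller.process_frame(frame)
    return controller


controller = asyncio.run(_demo())
```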
- async cleanup()
Clean up resources.
This method should be called when the object is no longer needed. It waits for all currently executing event handler tasks to finish before returning.
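The wait-for-handlers behavior can be sketched with plain asyncio. `HandlerTasks` is a hypothetical tracker written for this example and is not the BaseObject implementation.

```python
import asyncio


class HandlerTasks:
    """Hypothetical tracker for running event handler tasks."""

    def __init__(self):
        self._tasks = set()

    def spawn(self, coro) -> None:
        # Run the handler in the background and forget it once it completes.
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)

    async def cleanup(self) -> None:
        # Wait for every handler that is still running before returning.
        if self._tasks:
            await asyncio.gather(*self._tasks)


async def _demo() -> list:
    log = []

    async def slow_handler():
        await asyncio.sleep(0.01)
        log.append("handler finished")

    tasks = HandlerTasks()
    tasks.spawn(slow_handler())
    await tasks.cleanup()
    log.append("cleanup returned")
    return log


log = asyncio.run(_demo())
```

The resulting order of log entries shows that the handler completes before `cleanup()` returns.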
- async push_frame(frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM)
Request a frame to be pushed through the pipeline.
This emits an on_push_frame event that must be handled by a processor to actually push the frame into the pipeline.
- Parameters:
frame – The frame to push.
direction – The direction to push the frame.
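The delegation described here, where the controller only emits an event and a handler does the actual push, can be sketched as follows. `SketchController` and the `FrameDirection` enum are stand-ins defined for this example, and a plain string takes the place of a frame.

```python
import asyncio
from enum import Enum


class FrameDirection(Enum):
    """Stand-in for pipecat's FrameDirection."""

    DOWNSTREAM = 1
    UPSTREAM = 2


class SketchController:
    """Hypothetical controller that delegates pushing to a registered handler."""

    def __init__(self):
        self._handlers = {}

    def event_handler(self, name):
        # Decorator that registers `func` as the handler for event `name`.
        def decorator(func):
            self._handlers[name] = func
            return func

        return decorator

    async def push_frame(self, frame, direction=FrameDirection.DOWNSTREAM):
        # The controller does not push anything itself; it only emits the event.
        handler = self._handlers.get("on_push_frame")
        if handler is not None:
            await handler(self, frame, direction)


async def _demo() -> list:
    pushed = []
    controller = SketchController()

    @controller.event_handler("on_push_frame")
    async def on_push_frame(controller, frame, direction):
        # A real processor would push the frame into the pipeline here.
        pushed.append((frame, direction))

    await controller.push_frame("speech-started")
    return pushed


pushed = asyncio.run(_demo())
```

If no handler is registered, the sketch drops the frame silently, which is why the event must be handled for the push to take effect.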
- async broadcast_frame(frame_cls: type[Frame], **kwargs)
Request a frame to be broadcast upstream and downstream.
This emits an on_broadcast_frame event that must be handled by a processor to actually broadcast the frame in the pipeline.
- Parameters:
frame_cls – The class of the frame to broadcast.
**kwargs – Arguments to pass to the frame constructor.
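A handler for this event receives a frame class plus constructor arguments rather than a ready-made frame. The sketch below builds one instance per direction, which is an assumption about why a class is passed. `DemoFrame` is a hypothetical frame class, and the handler returns the frames instead of pushing them.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class DemoFrame:
    """Hypothetical frame class used only for this example."""

    speaker: str = "user"


async def on_broadcast_frame(controller, frame_cls, **kwargs):
    # Assumption: each direction gets its own freshly constructed frame.
    upstream = frame_cls(**kwargs)
    downstream = frame_cls(**kwargs)
    # A real processor would push these upstream and downstream respectively.
    return upstream, downstream


up, down = asyncio.run(on_broadcast_frame(None, DemoFrame, speaker="caller"))
```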