aic_vad

AIC-integrated VAD analyzer that lazily binds to the AIC SDK backend.

This module provides a VAD analyzer that queries the AIC SDK’s is_speech_detected() and maps the boolean result to a float confidence (1.0 for speech, 0.0 otherwise).

Classes: AICVADAnalyzer – for aic-sdk (uses the ‘aic_sdk’ module)

class pipecat.audio.vad.aic_vad.AICVADAnalyzer(*, vad_context_factory: Callable[[], Any] | None = None, speech_hold_duration: float | None = None, minimum_speech_duration: float | None = None, sensitivity: float | None = None)

Bases: VADAnalyzer

VAD analyzer that lazily binds to the AIC VadContext via a factory.

The analyzer can be constructed before the AIC Processor exists. Once the filter has started and the Processor is available, the provided factory will succeed and the VadContext will be obtained. The context’s is_speech_detected() boolean state is then mapped to 1.0 (speech) or 0.0 (no speech) to satisfy the VADAnalyzer interface.
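The lazy-binding behavior described above can be sketched in isolation. This is a minimal illustration, not the pipecat implementation: `FakeVadContext` and `LazyVadSketch` are hypothetical stand-ins, and both returning 0.0 while unbound and resetting the context on rebind are assumptions made for the sketch.

```python
from typing import Any, Callable, Optional


class FakeVadContext:
    """Hypothetical stand-in for the AIC VadContext (not the real SDK class)."""

    def __init__(self, speech: bool) -> None:
        self._speech = speech

    def is_speech_detected(self) -> bool:
        return self._speech


class LazyVadSketch:
    """Illustrative lazy binder; not the actual AICVADAnalyzer code."""

    def __init__(self, vad_context_factory: Optional[Callable[[], Any]] = None) -> None:
        self._factory = vad_context_factory
        self._ctx: Optional[Any] = None

    def bind_vad_context_factory(self, vad_context_factory: Callable[[], Any]) -> None:
        # Attach or replace the factory; dropping the old context is an assumption.
        self._factory = vad_context_factory
        self._ctx = None

    def _ensure_context(self) -> Optional[Any]:
        # Call the factory only while unbound, and swallow failures so that a
        # later call can retry once the Processor exists.
        if self._ctx is None and self._factory is not None:
            try:
                self._ctx = self._factory()
            except Exception:
                self._ctx = None
        return self._ctx

    def voice_confidence(self, buffer: bytes) -> float:
        ctx = self._ensure_context()
        if ctx is None:
            return 0.0  # assumption: report "no speech" until a context is bound
        # Map the boolean state to the float the VADAnalyzer interface expects.
        return 1.0 if ctx.is_speech_detected() else 0.0


state = {"processor_ready": False}


def factory() -> FakeVadContext:
    # Raises until the (simulated) Processor has been created.
    if not state["processor_ready"]:
        raise RuntimeError("Processor not created yet")
    return FakeVadContext(speech=True)


analyzer = LazyVadSketch(factory)
before = analyzer.voice_confidence(b"")  # factory still raises
state["processor_ready"] = True
after = analyzer.voice_confidence(b"")  # factory now succeeds
```

Here `before` is computed while the factory still fails and `after` once it succeeds, which mirrors constructing the analyzer ahead of the filter start.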

AIC VAD runtime parameters:

speech_hold_duration:
Controls how long the VAD continues to report speech after the audio signal no longer contains speech (in seconds). Range: 0.0 to 100x the model window length. Default (SDK): 0.05s (50ms).
minimum_speech_duration:
Controls how long speech must be present in the audio signal before the VAD reports it as speech (in seconds). Range: 0.0 to 1.0. Default (SDK): 0.0s.
sensitivity:
Controls the sensitivity (energy threshold) of the VAD. The energy of the audio signal must exceed this threshold to be considered speech, so higher sensitivity values lower the threshold. Range: 1.0 to 15.0. Formula: Energy threshold = 10 ** (-sensitivity). Default (SDK): 6.0.
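As a worked example of the sensitivity formula, the snippet below evaluates the threshold at the two ends of the documented range and at the SDK default. The helper name `energy_threshold` is hypothetical and exists only for this illustration.

```python
def energy_threshold(sensitivity: float) -> float:
    """Energy threshold = 10 ** (-sensitivity), as given in the formula above."""
    return 10.0 ** (-sensitivity)


# Evaluate the lower bound, the SDK default, and the upper bound.
for s in (1.0, 6.0, 15.0):
    print(f"sensitivity={s}: energy threshold={energy_threshold(s):.1e}")
```

Because the exponent is negated, raising sensitivity from 1.0 to the default 6.0 lowers the threshold from 0.1 to one millionth, so quieter signals are classified as speech.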

__init__(*, vad_context_factory: Callable[[], Any] | None = None, speech_hold_duration: float | None = None, minimum_speech_duration: float | None = None, sensitivity: float | None = None)

Create an AIC VAD analyzer.

Parameters:

vad_context_factory – Zero-arg callable that returns the AIC VadContext. This may raise until the filter’s Processor has been created; the analyzer will retry on set_sample_rate/first use.
speech_hold_duration – Optional override for AIC VAD speech hold duration (in seconds). Range: 0.0 to 100x model window length. If None, the SDK default (0.05s) is used.
minimum_speech_duration – Optional override for minimum speech duration before VAD reports speech detected (in seconds). Range: 0.0 to 1.0. If None, the SDK default (0.0s) is used.
sensitivity – Optional override for AIC VAD sensitivity (energy threshold). Range: 1.0 to 15.0. Energy threshold = 10 ** (-sensitivity). If None, the SDK default (6.0) is used.
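A caller may want to check overrides against the documented ranges before constructing the analyzer. The following is a sketch under stated assumptions: `check_vad_overrides` and its `model_window_seconds` argument are hypothetical, and whether the SDK or the analyzer performs its own validation is not stated here.

```python
from typing import Dict, Optional


def check_vad_overrides(
    *,
    speech_hold_duration: Optional[float] = None,
    minimum_speech_duration: Optional[float] = None,
    sensitivity: Optional[float] = None,
    model_window_seconds: Optional[float] = None,
) -> Dict[str, float]:
    """Validate overrides against the documented ranges (hypothetical helper)."""
    overrides: Dict[str, float] = {}

    if speech_hold_duration is not None:
        if speech_hold_duration < 0.0:
            raise ValueError("speech_hold_duration must be >= 0.0")
        # The upper bound is 100x the model window length; this sketch checks
        # it only when the caller supplies the window length.
        if (
            model_window_seconds is not None
            and speech_hold_duration > 100 * model_window_seconds
        ):
            raise ValueError("speech_hold_duration exceeds 100x the model window")
        overrides["speech_hold_duration"] = speech_hold_duration

    if minimum_speech_duration is not None:
        if not 0.0 <= minimum_speech_duration <= 1.0:
            raise ValueError("minimum_speech_duration must be in [0.0, 1.0]")
        overrides["minimum_speech_duration"] = minimum_speech_duration

    if sensitivity is not None:
        if not 1.0 <= sensitivity <= 15.0:
            raise ValueError("sensitivity must be in [1.0, 15.0]")
        overrides["sensitivity"] = sensitivity

    # Parameters left as None are omitted so the SDK defaults apply.
    return overrides


overrides = check_vad_overrides(speech_hold_duration=0.1, sensitivity=8.0)
```

Since all constructor parameters are keyword-only, the returned dictionary could then be unpacked as `AICVADAnalyzer(**overrides)`.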

bind_vad_context_factory(vad_context_factory: Callable[[], Any])

Attach or replace the factory after construction.

set_sample_rate(sample_rate: int)

Set the sample rate for audio processing.

Parameters: sample_rate – Audio sample rate in Hz.

num_frames_required() → int

Get the number of audio frames required for analysis.

Returns: Number of frames needed for VAD processing.

voice_confidence(buffer: bytes) → float

Return voice activity detection result for the given audio buffer.

Note

The AIC SDK provides binary speech detection (not a probability score). This method returns 1.0 when speech is detected and 0.0 otherwise, rather than a true confidence value.

Parameters: buffer – Audio buffer (unused; the AIC VAD state is updated internally by the enhancement pipeline).
Returns: 1.0 if speech is detected, 0.0 otherwise.
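One practical consequence of the binary result: assuming a downstream consumer compares the returned value against a confidence threshold, every threshold in (0.0, 1.0] behaves identically. The `is_speaking` function below is a hypothetical consumer written for this illustration, not a pipecat API.

```python
def is_speaking(confidence: float, threshold: float) -> bool:
    """Hypothetical downstream check: speech when confidence reaches the threshold."""
    return confidence >= threshold


thresholds = (0.1, 0.5, 0.9, 1.0)

# With a binary analyzer the confidence is always exactly 1.0 or 0.0.
speech_results = [is_speaking(1.0, t) for t in thresholds]
silence_results = [is_speaking(0.0, t) for t in thresholds]
```

Every entry in `speech_results` is True and every entry in `silence_results` is False, so tuning such a threshold has no effect with this analyzer. Detection behavior is instead adjusted through sensitivity, speech_hold_duration, and minimum_speech_duration.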