silero

Silero Voice Activity Detection (VAD) implementation for Pipecat.

This module provides a VAD analyzer based on the Silero VAD ONNX model, which can detect voice activity in audio streams with high accuracy. It supports 8kHz and 16kHz sample rates.

class pipecat.audio.vad.silero.SileroOnnxModel(path, force_onnx_cpu=True)

Bases: object

ONNX runtime wrapper for the Silero VAD model.

Provides voice activity detection using the pre-trained Silero VAD model with ONNX runtime for efficient inference. Handles model state management and input validation for audio processing.

__init__(path, force_onnx_cpu=True)

Initialize the Silero ONNX model.

Parameters:
  • path – Path to the ONNX model file.

  • force_onnx_cpu – Whether to force the CPU execution provider. Defaults to True.

reset_states(batch_size=1)

Reset the internal model states.

Parameters:

batch_size – Batch size for state initialization. Defaults to 1.

class pipecat.audio.vad.silero.SileroVADAnalyzer(*, sample_rate: int | None = None, params: VADParams | None = None)

Bases: VADAnalyzer

Voice Activity Detection analyzer using the Silero VAD model.

Implements VAD analysis using the pre-trained Silero ONNX model for accurate voice activity detection. It supports 8kHz and 16kHz sample rates, manages the model state automatically, and resets that state periodically.
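The periodic reset mentioned above can be pictured with a small sketch. The documentation does not say how resets are triggered, so this assumes a time-based policy. The class name `PeriodicResetter`, the 5-second interval, and the injectable clock are all hypothetical and not part of Pipecat's API.

```python
import time
from typing import Callable


class PeriodicResetter:
    """Hypothetical helper: calls reset_fn once interval_secs have elapsed."""

    def __init__(
        self,
        reset_fn: Callable[[], None],
        interval_secs: float = 5.0,  # placeholder value, not taken from Pipecat
        clock: Callable[[], float] = time.monotonic,
    ):
        self._reset_fn = reset_fn
        self._interval_secs = interval_secs
        self._clock = clock
        self._last_reset = clock()

    def maybe_reset(self) -> bool:
        """Reset if the interval has elapsed; return True when a reset happened."""
        now = self._clock()
        if now - self._last_reset >= self._interval_secs:
            self._reset_fn()
            self._last_reset = now
            return True
        return False
```

In a sketch like this, `maybe_reset()` would be called once per analyzed window, with `reset_fn` bound to something like the model's `reset_states` method.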

__init__(*, sample_rate: int | None = None, params: VADParams | None = None)

Initialize the Silero VAD analyzer.

Parameters:
  • sample_rate – Audio sample rate (8000 or 16000 Hz). If None, it can be set later with set_sample_rate().

  • params – VAD parameters for detection thresholds and timing.

set_sample_rate(sample_rate: int)

Set the sample rate for audio processing.

Parameters:

sample_rate – Audio sample rate (must be 8000 or 16000 Hz).

Raises:

ValueError – If sample rate is not 8000 or 16000 Hz.
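The validation described above can be sketched as follows. This is a minimal illustration, not Pipecat's actual code: the helper `validate_sample_rate` and the constant `SUPPORTED_SAMPLE_RATES` are hypothetical names, and only the accepted rates come from the documentation.

```python
# Hypothetical names; only the accepted rates (8000, 16000) come from the docs.
SUPPORTED_SAMPLE_RATES = (8000, 16000)


def validate_sample_rate(sample_rate: int) -> int:
    """Return sample_rate unchanged, or raise ValueError if it is unsupported."""
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(
            f"Silero VAD sample rate must be 8000 or 16000 Hz (got {sample_rate})"
        )
    return sample_rate
```

Callers whose audio arrives at another rate (for example 44100 Hz) would need to resample before handing it to the analyzer.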

num_frames_required() → int

Get the number of audio frames required for VAD analysis.

Returns:

Number of frames required (512 for 16kHz, 256 for 8kHz).
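The two window sizes cover the same amount of audio at each rate. The sketch below makes that arithmetic explicit. The function names are hypothetical; only the values 512 and 256 come from the documentation above.

```python
def frames_required(sample_rate: int) -> int:
    """Frames per analysis window, using the documented values."""
    return 512 if sample_rate == 16000 else 256


def window_duration_ms(sample_rate: int) -> float:
    """Duration of one analysis window in milliseconds."""
    return frames_required(sample_rate) * 1000 / sample_rate


# Both supported rates yield a 32 ms window.
durations = {rate: window_duration_ms(rate) for rate in (8000, 16000)}
```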

voice_confidence(buffer) → float

Calculate voice activity confidence for the given audio buffer.

Parameters:

buffer – Audio buffer to analyze.

Returns:

Voice confidence score between 0.0 and 1.0.
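The sketch below shows one way a caller might prepare audio for this method and interpret its result. It assumes the buffer holds 16-bit signed little-endian mono PCM, which the documentation above does not specify. The helper names and the 0.7 threshold are hypothetical.

```python
import struct


def split_windows(buffer: bytes, frames: int) -> list[bytes]:
    """Split PCM bytes into complete windows of `frames` 16-bit samples.

    Any trailing partial window is dropped.
    """
    size = frames * 2  # two bytes per 16-bit sample
    return [buffer[i:i + size] for i in range(0, len(buffer) - size + 1, size)]


def pcm16_to_floats(buffer: bytes) -> list[float]:
    """Decode 16-bit little-endian PCM into floats in [-1.0, 1.0)."""
    count = len(buffer) // 2
    samples = struct.unpack(f"<{count}h", buffer[: count * 2])
    return [s / 32768.0 for s in samples]


def is_speech(confidence: float, threshold: float = 0.7) -> bool:
    """Treat a confidence at or above the (hypothetical) threshold as speech."""
    return confidence >= threshold
```

Each window returned by `split_windows` would be passed to `voice_confidence`, and the returned score compared against a threshold chosen for the application.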