base_smart_turn

Smart turn analyzer base class using ML models for end-of-turn detection.

This module provides the base implementation for smart turn analyzers that use machine learning models to determine when a user has finished speaking, going beyond simple silence-based detection.

class pipecat.audio.turn.smart_turn.base_smart_turn.SmartTurnParams(*, stop_secs: float = 3, pre_speech_ms: float = 500, max_duration_secs: float = 8)[source]

Bases: BaseTurnParams

Configuration parameters for smart turn analysis.

Parameters:

stop_secs – Maximum silence duration in seconds before ending the turn.
pre_speech_ms – Milliseconds of audio to include before speech starts.
max_duration_secs – Maximum duration in seconds for audio segments.

stop_secs: float

pre_speech_ms: float

max_duration_secs: float
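The three settings mix seconds and milliseconds, so it can help to see them in a common unit. The sketch below is a minimal illustration, not pipecat code: `ParamsSketch` is a stand-in dataclass that only mirrors the documented field names and defaults, and `budget_in_samples` and the 16 kHz sample rate are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamsSketch:
    """Stand-in mirroring the documented SmartTurnParams fields and defaults."""

    stop_secs: float = 3
    pre_speech_ms: float = 500
    max_duration_secs: float = 8


def budget_in_samples(params: ParamsSketch, sample_rate: int) -> dict:
    """Convert each time-based setting into a number of audio samples."""
    return {
        "stop": int(params.stop_secs * sample_rate),
        # pre_speech_ms is in milliseconds, the other two are in seconds.
        "pre_speech": int(params.pre_speech_ms / 1000 * sample_rate),
        "max_duration": int(params.max_duration_secs * sample_rate),
    }


# Assumed sample rate of 16 kHz, chosen only for illustration.
budget = budget_in_samples(ParamsSketch(), sample_rate=16000)
print(budget)
```

With pipecat installed, the same values would be passed by keyword, as the signature above requires, for example `SmartTurnParams(stop_secs=3, pre_speech_ms=500, max_duration_secs=8)`.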

exception pipecat.audio.turn.smart_turn.base_smart_turn.SmartTurnTimeoutException[source]

Bases: Exception

Exception raised when smart turn analysis times out.
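How and where the library raises this exception is not described here. As a rough sketch of the general pattern, the example below bounds a slow prediction with `asyncio.wait_for` and converts the timeout into a dedicated exception. `TurnTimeoutSketch`, `slow_prediction`, and `predict_with_timeout` are hypothetical names, not part of pipecat.

```python
import asyncio


class TurnTimeoutSketch(Exception):
    """Stand-in for SmartTurnTimeoutException (hypothetical)."""


async def slow_prediction(delay: float) -> bool:
    # Hypothetical model call that takes `delay` seconds to answer.
    await asyncio.sleep(delay)
    return True


async def predict_with_timeout(delay: float, timeout: float) -> bool:
    """Run the prediction, raising TurnTimeoutSketch if it takes too long."""
    try:
        return await asyncio.wait_for(slow_prediction(delay), timeout=timeout)
    except asyncio.TimeoutError as exc:
        raise TurnTimeoutSketch(f"prediction exceeded {timeout}s") from exc


async def main(delay: float, timeout: float) -> str:
    try:
        await predict_with_timeout(delay, timeout)
        return "complete"
    except TurnTimeoutSketch:
        # A caller might fall back to silence-based detection here.
        return "timed out"


slow_result = asyncio.run(main(delay=0.2, timeout=0.01))
fast_result = asyncio.run(main(delay=0.0, timeout=1.0))
```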

class pipecat.audio.turn.smart_turn.base_smart_turn.BaseSmartTurn(*, sample_rate: int | None = None, params: SmartTurnParams | None = None)[source]

Bases: BaseTurnAnalyzer

Base class for smart turn analyzers using ML models.

Provides common functionality for smart turn detection including audio buffering, speech tracking, and ML model integration. Subclasses must implement the specific model prediction logic.
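The split between shared functionality and subclass-specific prediction can be pictured with a small template-method sketch. This is an illustration of the design only: `TurnAnalyzerSketch`, its `predict` hook, and `TrailingSilenceAnalyzer` are hypothetical, and the real base class's hook name and signature are not given in this page.

```python
from abc import ABC, abstractmethod


class TurnAnalyzerSketch(ABC):
    """Toy base class: owns buffering, delegates prediction to subclasses."""

    def __init__(self) -> None:
        self._chunks: list[bytes] = []

    def append_audio(self, buffer: bytes) -> None:
        self._chunks.append(buffer)

    def analyze(self) -> bool:
        # Shared logic: join the buffered audio, then ask the subclass.
        audio = b"".join(self._chunks)
        return self.predict(audio)

    @abstractmethod
    def predict(self, audio: bytes) -> bool:
        """Hypothetical hook: return True if the turn looks finished."""


class TrailingSilenceAnalyzer(TurnAnalyzerSketch):
    """Toy 'model': the turn is finished if the last 4 bytes are all zero."""

    def predict(self, audio: bytes) -> bool:
        return len(audio) >= 4 and audio[-4:] == b"\x00\x00\x00\x00"


analyzer = TrailingSilenceAnalyzer()
analyzer.append_audio(b"\x10\x20\x30\x40")
still_talking = analyzer.analyze()
analyzer.append_audio(b"\x00\x00\x00\x00")
finished = analyzer.analyze()
```

Keeping the buffering in the base class means each concrete analyzer only has to supply its own prediction step.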

__init__(*, sample_rate: int | None = None, params: SmartTurnParams | None = None)[source]

Initialize the smart turn analyzer.

Parameters:

sample_rate – Optional sample rate for audio processing.
params – Configuration parameters for turn analysis behavior.

property speech_triggered: bool

Check if speech has been detected and triggered analysis.

Returns:

True if speech has been detected and turn analysis is active.

property params: SmartTurnParams

Get the current smart turn parameters.

Returns:

Current smart turn configuration parameters.

append_audio(buffer: bytes, is_speech: bool) → EndOfTurnState[source]

Append audio data for turn analysis.

Parameters:

buffer – Raw audio data bytes to append for analysis.
is_speech – Whether the audio buffer contains detected speech.

Returns:

Current end-of-turn state after processing the audio.
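One plausible way to picture how `is_speech` and `stop_secs` interact is a silence counter that resets on speech and ends the turn once enough silence has accumulated. The sketch below is an assumption about the general technique, not the library's implementation: `SilenceTracker` is hypothetical, `TurnState` is a stand-in for `EndOfTurnState` with assumed member names, and the audio is assumed to be 16-bit mono PCM.

```python
from enum import Enum


class TurnState(Enum):
    """Stand-in for EndOfTurnState; member names are assumed."""

    COMPLETE = 1
    INCOMPLETE = 2


class SilenceTracker:
    """Hypothetical tracker: ends a turn after stop_secs of trailing silence."""

    def __init__(self, sample_rate: int = 16000, stop_secs: float = 3.0) -> None:
        self._sample_rate = sample_rate
        self._stop_secs = stop_secs
        self._speech_triggered = False
        self._silence_secs = 0.0

    def append_audio(self, buffer: bytes, is_speech: bool) -> TurnState:
        # Assumes 16-bit mono PCM: two bytes per sample.
        chunk_secs = len(buffer) / 2 / self._sample_rate
        if is_speech:
            self._speech_triggered = True
            self._silence_secs = 0.0
        elif self._speech_triggered:
            # Silence only counts once speech has started.
            self._silence_secs += chunk_secs
            if self._silence_secs >= self._stop_secs:
                self._speech_triggered = False
                self._silence_secs = 0.0
                return TurnState.COMPLETE
        return TurnState.INCOMPLETE


tracker = SilenceTracker(sample_rate=16000, stop_secs=1.0)
half_second = bytes(16000)  # 8000 zeroed 16-bit samples = 0.5 s at 16 kHz
states = [
    tracker.append_audio(half_second, is_speech=True),
    tracker.append_audio(half_second, is_speech=False),
    tracker.append_audio(half_second, is_speech=False),
]
```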

async analyze_end_of_turn() → tuple[EndOfTurnState, MetricsData | None][source]

Analyze the current audio state to determine whether the turn has ended.

Returns:

Tuple containing the end-of-turn state and optional metrics data from the ML model analysis.
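Because the method is a coroutine returning a two-element tuple whose second element may be `None`, a caller has to await it, unpack it, and guard the metrics. The sketch below shows that calling pattern with stand-ins only: `StubAnalyzer`, `TurnState`, `handle_turn`, and the dict used in place of `MetricsData` are all hypothetical.

```python
import asyncio
from enum import Enum
from typing import Optional


class TurnState(Enum):
    """Stand-in for EndOfTurnState; member names are assumed."""

    COMPLETE = 1
    INCOMPLETE = 2


class StubAnalyzer:
    """Hypothetical analyzer that returns a fixed result."""

    def __init__(self, state: TurnState, metrics: Optional[dict]) -> None:
        self._state = state
        self._metrics = metrics

    async def analyze_end_of_turn(self) -> tuple[TurnState, Optional[dict]]:
        await asyncio.sleep(0)  # Yield once, as a real model call would.
        return self._state, self._metrics


recorded: list[dict] = []


async def handle_turn(analyzer: StubAnalyzer) -> str:
    state, metrics = await analyzer.analyze_end_of_turn()
    if metrics is not None:
        # Metrics are optional, so only record them when present.
        recorded.append(metrics)
    return "respond" if state is TurnState.COMPLETE else "keep listening"


done = asyncio.run(handle_turn(StubAnalyzer(TurnState.COMPLETE, {"inference_ms": 12.5})))
waiting = asyncio.run(handle_turn(StubAnalyzer(TurnState.INCOMPLETE, None)))
```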

update_vad_start_secs(vad_start_secs: float)[source]

Store the new vad_start_secs value.

clear()[source]: Reset the turn analyzer to its initial state.