base_turn_analyzer

Base turn analyzer for determining end-of-turn in audio conversations.

This module provides the abstract base class and enumeration for analyzing when a user has finished speaking in a conversation.

class pipecat.audio.turn.base_turn_analyzer.EndOfTurnState(*values)[source]

Bases: Enum

State enumeration for end-of-turn analysis results.

Members:

COMPLETE – The user has finished their turn and stopped speaking.
INCOMPLETE – The user is still speaking or may continue speaking.

COMPLETE = 1

INCOMPLETE = 2

class pipecat.audio.turn.base_turn_analyzer.BaseTurnParams[source]

Bases: BaseModel

Base class for turn analyzer parameters.

class pipecat.audio.turn.base_turn_analyzer.BaseTurnAnalyzer(*, sample_rate: int | None = None)[source]

Bases: ABC

Abstract base class for analyzing user end of turn.

This class inherits from ABC and defines its interface through abstract methods and properties that concrete turn analyzers must implement.

__init__(*, sample_rate: int | None = None)[source]

Initialize the turn analyzer.

Parameters: sample_rate – Optional initial sample rate for audio processing. If provided, it is used as a fixed sample rate and takes precedence over any rate later passed to set_sample_rate().

property sample_rate: int

Returns the current sample rate.

Returns: The effective sample rate for audio processing.
Return type: int

set_sample_rate(sample_rate: int)[source]

Sets the sample rate for audio processing.

If a sample rate was provided at initialization, that rate is kept and the argument is ignored; otherwise, the sample rate is set to the provided value.

Parameters: sample_rate (int) – The sample rate to set.
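The precedence rule described above can be sketched as follows. This is a minimal illustration, not the library's code: SampleRateHolder is a hypothetical stand-in class, and the default of 0 before set_sample_rate() is called is an assumption.

```python
from typing import Optional


class SampleRateHolder:
    """Hypothetical stand-in that mirrors the documented sample-rate rules."""

    def __init__(self, *, sample_rate: Optional[int] = None):
        # Rate passed at construction time; None means "not fixed".
        self._init_sample_rate = sample_rate
        # Effective rate; 0 until set_sample_rate() is called (an assumption).
        self._sample_rate = 0

    @property
    def sample_rate(self) -> int:
        return self._sample_rate

    def set_sample_rate(self, sample_rate: int) -> None:
        # A rate given to __init__ wins over the argument passed here.
        self._sample_rate = self._init_sample_rate or sample_rate


fixed = SampleRateHolder(sample_rate=16000)
fixed.set_sample_rate(8000)   # ignored: the initial rate is fixed

flexible = SampleRateHolder()
flexible.set_sample_rate(8000)  # applied: no initial rate was given
```

With this rule, a caller can pin the analyzer to one rate at construction while the surrounding audio pipeline still calls set_sample_rate() unconditionally.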

abstract property speech_triggered: bool

Determines if speech has been detected.

Returns: True if speech is triggered, otherwise False.
Return type: bool

abstract property params: BaseTurnParams

Get the current turn analyzer parameters.

Returns: Current turn analyzer configuration parameters.

abstractmethod append_audio(buffer: bytes, is_speech: bool) → EndOfTurnState[source]

Appends audio data for analysis.

Parameters:

buffer (bytes) – The audio data to append.
is_speech (bool) – Indicates whether the appended audio is speech or not.

Returns:

The resulting state after appending the audio.

Return type:

EndOfTurnState

abstractmethod async analyze_end_of_turn() → tuple[EndOfTurnState, MetricsData | None][source]

Analyzes if an end of turn has occurred based on the audio input.

Returns: A tuple containing the end-of-turn state and optional metrics data from the analysis.
Return type: tuple[EndOfTurnState, MetricsData | None]

update_vad_start_secs(vad_start_secs: float)[source]

Update the VAD start trigger time.

The turn analyzer may choose to change its buffer size depending on this value.

Parameters: vad_start_secs (float) – The number of seconds of voice activity before triggering the user speaking event.
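As one illustration of how a buffer size might depend on this value, the sketch below converts a VAD start time into a byte count. The helper pre_speech_buffer_bytes is hypothetical, and the 16-bit mono PCM format (2 bytes per sample) is an assumption; the documentation does not state how an analyzer sizes its buffer.

```python
def pre_speech_buffer_bytes(
    vad_start_secs: float, sample_rate: int, sample_width: int = 2
) -> int:
    """Bytes needed to hold `vad_start_secs` of mono PCM audio.

    Hypothetical helper: assumes one channel and `sample_width` bytes
    per sample (2 for 16-bit PCM).
    """
    samples = round(vad_start_secs * sample_rate)
    return samples * sample_width


# 0.2 s of 16 kHz, 16-bit mono audio.
size = pre_speech_buffer_bytes(0.2, 16000)
```

An analyzer could recompute such a size whenever update_vad_start_secs() is called, so that audio captured before the VAD trigger is not lost.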

abstractmethod clear()[source]

Reset the turn analyzer to its initial state.

async cleanup()[source]

Cleanup the turn analyzer.
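To show how the abstract methods fit together, here is a minimal sketch of a concrete analyzer. It is self-contained so that it runs without pipecat installed: EndOfTurnState is a local mirror of the documented enum, TurnAnalyzerInterface is a hypothetical stand-in for a subset of BaseTurnAnalyzer, and SilenceTurnAnalyzer with its silence_chunks parameter is an invented example policy, not one shipped by the library.

```python
import asyncio
from abc import ABC, abstractmethod
from enum import Enum
from typing import Optional, Tuple


class EndOfTurnState(Enum):
    """Local mirror of the documented enum."""
    COMPLETE = 1
    INCOMPLETE = 2


class TurnAnalyzerInterface(ABC):
    """Hypothetical stand-in for a subset of the documented abstract methods."""

    @abstractmethod
    def append_audio(self, buffer: bytes, is_speech: bool) -> EndOfTurnState: ...

    @abstractmethod
    async def analyze_end_of_turn(self) -> Tuple[EndOfTurnState, Optional[object]]: ...

    @abstractmethod
    def clear(self) -> None: ...


class SilenceTurnAnalyzer(TurnAnalyzerInterface):
    """Example policy: a turn ends after N silent chunks that follow speech."""

    def __init__(self, silence_chunks: int = 3):
        self._silence_chunks = silence_chunks
        self._speech_triggered = False
        self._silent_run = 0

    @property
    def speech_triggered(self) -> bool:
        return self._speech_triggered

    def _state(self) -> EndOfTurnState:
        done = self._speech_triggered and self._silent_run >= self._silence_chunks
        return EndOfTurnState.COMPLETE if done else EndOfTurnState.INCOMPLETE

    def append_audio(self, buffer: bytes, is_speech: bool) -> EndOfTurnState:
        if is_speech:
            # Speech restarts the silence counter.
            self._speech_triggered = True
            self._silent_run = 0
        elif self._speech_triggered:
            # Count silence only once the user has started speaking.
            self._silent_run += 1
        return self._state()

    async def analyze_end_of_turn(self) -> Tuple[EndOfTurnState, Optional[object]]:
        # This sketch produces no metrics, so the second element is None.
        return self._state(), None

    def clear(self) -> None:
        self._speech_triggered = False
        self._silent_run = 0


analyzer = SilenceTurnAnalyzer(silence_chunks=2)
states = [
    analyzer.append_audio(b"\x00\x00", False),  # silence before any speech
    analyzer.append_audio(b"\x01\x00", True),   # speech starts
    analyzer.append_audio(b"\x00\x00", False),  # first silent chunk
    analyzer.append_audio(b"\x00\x00", False),  # second silent chunk
]
final_state, metrics = asyncio.run(analyzer.analyze_end_of_turn())
analyzer.clear()
```

A real subclass would derive from BaseTurnAnalyzer, also implement the params property, and would typically inspect the audio in buffer rather than rely only on the is_speech flag.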