base_turn_analyzer

Base turn analyzer for determining end-of-turn in audio conversations.

This module provides the abstract base class and enumeration for analyzing when a user has finished speaking in a conversation.

class pipecat.audio.turn.base_turn_analyzer.EndOfTurnState(*values)[source]

Bases: Enum

State enumeration for end-of-turn analysis results.

Members:
  • COMPLETE – The user has finished their turn and stopped speaking.

  • INCOMPLETE – The user is still speaking or may continue speaking.

COMPLETE = 1
INCOMPLETE = 2
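A caller typically branches on the returned state. The sketch below uses a local stand-in enum that mirrors the two members and values listed above, so it runs without pipecat installed; in real code the enum would be imported from pipecat.audio.turn.base_turn_analyzer. The helper should_respond is hypothetical and only illustrates the comparison.

```python
from enum import Enum


class EndOfTurnState(Enum):
    """Local stand-in mirroring the documented members and values."""

    COMPLETE = 1
    INCOMPLETE = 2


def should_respond(state: EndOfTurnState) -> bool:
    """Hypothetical helper: respond only once the user's turn is complete."""
    # Enum members are singletons, so identity comparison is the idiomatic check.
    return state is EndOfTurnState.COMPLETE
```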

class pipecat.audio.turn.base_turn_analyzer.BaseTurnParams[source]

Bases: BaseModel

Base class for turn analyzer parameters.

class pipecat.audio.turn.base_turn_analyzer.BaseTurnAnalyzer(*, sample_rate: int | None = None)[source]

Bases: ABC

Abstract base class for analyzing user end of turn.

This class derives from ABC and defines the turn analyzer interface through abstract methods and properties. Concrete analyzers subclass it and implement those members.

__init__(*, sample_rate: int | None = None)[source]

Initialize the turn analyzer.

Parameters:

sample_rate – Optional initial sample rate for audio processing. If provided, this will be used as the fixed sample rate.

property sample_rate: int

Returns the current sample rate.

Returns:

The effective sample rate for audio processing.

Return type:

int

set_sample_rate(sample_rate: int)[source]

Sets the sample rate for audio processing.

If a sample rate was provided at initialization, that value is kept and the argument is ignored; otherwise, the sample rate is set to the provided value.

Parameters:

sample_rate (int) – The sample rate to set.
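The precedence rule described for set_sample_rate can be sketched as follows. SampleRateHolder is a hypothetical stand-in written for illustration, not the library's class, and the placeholder value of 0 before any rate is set is an assumption.

```python
from typing import Optional


class SampleRateHolder:
    """Hypothetical stand-in reproducing only the documented sample-rate rule."""

    def __init__(self, *, sample_rate: Optional[int] = None):
        self._init_sample_rate = sample_rate
        self._sample_rate = 0  # assumed placeholder until a rate is set

    @property
    def sample_rate(self) -> int:
        return self._sample_rate

    def set_sample_rate(self, sample_rate: int) -> None:
        # A rate given at construction wins; otherwise use the argument.
        if self._init_sample_rate is not None:
            self._sample_rate = self._init_sample_rate
        else:
            self._sample_rate = sample_rate


fixed = SampleRateHolder(sample_rate=16000)
fixed.set_sample_rate(8000)  # ignored: the constructor value is kept

flexible = SampleRateHolder()
flexible.set_sample_rate(8000)  # accepted: no constructor value was given
```

Here fixed.sample_rate stays at 16000 while flexible.sample_rate becomes 8000.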

abstract property speech_triggered: bool

Determines if speech has been detected.

Returns:

True if speech is triggered, otherwise False.

Return type:

bool

abstract property params: BaseTurnParams

Get the current turn analyzer parameters.

Returns:

Current turn analyzer configuration parameters.

abstractmethod append_audio(buffer: bytes, is_speech: bool) → EndOfTurnState[source]

Appends audio data for analysis.

Parameters:
  • buffer (bytes) – The audio data to append.

  • is_speech (bool) – Indicates whether the appended audio is speech or not.

Returns:

The resulting state after appending the audio.

Return type:

EndOfTurnState
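As an illustration of what an append_audio implementation might do, the sketch below ends the turn after a fixed amount of trailing silence. Everything in it is an assumption made for the example: the class name SilenceTurnAnalyzer, the stop_secs parameter, the 16-bit mono PCM format, and the local stand-in enum. A real analyzer would subclass BaseTurnAnalyzer and implement the remaining abstract members.

```python
from enum import Enum


class EndOfTurnState(Enum):
    """Local stand-in mirroring the documented enum."""

    COMPLETE = 1
    INCOMPLETE = 2


class SilenceTurnAnalyzer:
    """Hypothetical analyzer: the turn ends after enough trailing silence."""

    def __init__(self, *, sample_rate: int = 16000, stop_secs: float = 0.5):
        self._sample_rate = sample_rate
        self._stop_secs = stop_secs  # assumed parameter, not part of the base API
        self._speech_triggered = False
        self._silence_secs = 0.0

    def append_audio(self, buffer: bytes, is_speech: bool) -> EndOfTurnState:
        # Assumes 16-bit mono PCM: two bytes per sample.
        chunk_secs = len(buffer) / 2 / self._sample_rate
        if is_speech:
            self._speech_triggered = True
            self._silence_secs = 0.0
            return EndOfTurnState.INCOMPLETE
        if not self._speech_triggered:
            # Silence before any speech never completes a turn.
            return EndOfTurnState.INCOMPLETE
        self._silence_secs += chunk_secs
        if self._silence_secs >= self._stop_secs:
            self.clear()
            return EndOfTurnState.COMPLETE
        return EndOfTurnState.INCOMPLETE

    def clear(self) -> None:
        # Return to the initial state so the next turn starts fresh.
        self._speech_triggered = False
        self._silence_secs = 0.0
```

Feeding one speech chunk followed by silence chunks keeps the state INCOMPLETE until the accumulated silence reaches stop_secs.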

abstractmethod async analyze_end_of_turn() → tuple[EndOfTurnState, MetricsData | None][source]

Analyzes if an end of turn has occurred based on the audio input.

Returns:

A tuple containing the end-of-turn state and, if available, metrics data for the analysis.

Return type:

tuple[EndOfTurnState, MetricsData | None]
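Because the method is a coroutine whose signature returns a two-element tuple, callers await it and unpack the state and the optional metrics. The sketch below is a minimal, self-contained illustration: StubAnalyzer and main are hypothetical, the enum is a local stand-in, and a plain object stands in for MetricsData, whose fields are not described here.

```python
import asyncio
from enum import Enum
from typing import Optional, Tuple


class EndOfTurnState(Enum):
    """Local stand-in mirroring the documented enum."""

    COMPLETE = 1
    INCOMPLETE = 2


class StubAnalyzer:
    """Hypothetical analyzer that reports a fixed result with no metrics."""

    async def analyze_end_of_turn(self) -> Tuple[EndOfTurnState, Optional[object]]:
        # A real analyzer would inspect its buffered audio here.
        return EndOfTurnState.COMPLETE, None


async def main() -> EndOfTurnState:
    state, metrics = await StubAnalyzer().analyze_end_of_turn()
    if metrics is not None:
        # Metrics are optional, so guard before using them.
        print("metrics:", metrics)
    return state


result = asyncio.run(main())
```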

update_vad_start_secs(vad_start_secs: float)[source]

Update the VAD start trigger time.

The turn analyzer may choose to change its buffer size depending on this value.

Parameters:

vad_start_secs (float) – The number of seconds of voice activity before triggering the user speaking event.
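One plausible use of this value is sizing a pre-speech buffer, so that audio captured before the VAD triggered is still available for analysis. The helper below is hypothetical and assumes 16-bit mono PCM; it only shows the arithmetic, not how any particular analyzer resizes its buffer.

```python
def pre_speech_buffer_bytes(
    vad_start_secs: float, sample_rate: int, bytes_per_sample: int = 2
) -> int:
    """Hypothetical helper: bytes of audio to retain from before VAD triggered."""
    # Convert seconds to whole samples first, then samples to bytes.
    samples = round(vad_start_secs * sample_rate)
    return samples * bytes_per_sample
```

For example, 0.2 seconds at 16000 Hz corresponds to 3200 samples, or 6400 bytes of 16-bit audio.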

abstractmethod clear()[source]

Reset the turn analyzer to its initial state.

async cleanup()[source]

Clean up the turn analyzer.