krisp_viva_turn

Krisp turn analyzer for end-of-turn detection using Krisp VIVA SDK.

This module provides a turn analyzer implementation using Krisp’s turn detection v3 (Tt) API to determine when a user has finished speaking in a conversation. The Tt API accepts an external VAD flag alongside audio frames, allowing the model to leverage voice activity information for more accurate turn detection.

Note: This analyzer uses a different model than KrispVivaFilter. The model path can be specified via the KRISP_VIVA_TURN_MODEL_PATH environment variable or passed directly to the constructor.

class pipecat.audio.turn.krisp_viva_turn.KrispTurnParams(*, threshold: float = 0.5, frame_duration_ms: int = 20)[source]

Bases: BaseTurnParams

Configuration parameters for Krisp turn analysis.

Parameters:

threshold – Probability threshold for turn completion (0.0 to 1.0). Higher values require more confidence before marking turn as complete.
frame_duration_ms – Frame duration in milliseconds for turn detection. Supported values: 10, 15, 20, 30, 32.

threshold: float

frame_duration_ms: int
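
As a rough illustration of what frame_duration_ms implies for buffer sizes, the sketch below computes the byte length of one frame. The helper name frame_size_bytes and the 16-bit mono PCM format are assumptions made for this example; neither is part of the documented API.

```python
SUPPORTED_FRAME_DURATIONS_MS = (10, 15, 20, 30, 32)  # values listed for frame_duration_ms


def frame_size_bytes(sample_rate: int, frame_duration_ms: int, sample_width: int = 2) -> int:
    """Return the byte length of one mono PCM frame.

    Assumes mono audio with `sample_width` bytes per sample (2 = 16-bit).
    That audio format is an assumption of this sketch.
    """
    if frame_duration_ms not in SUPPORTED_FRAME_DURATIONS_MS:
        raise ValueError(f"unsupported frame_duration_ms: {frame_duration_ms}")
    samples, remainder = divmod(sample_rate * frame_duration_ms, 1000)
    if remainder:
        # e.g. 44100 Hz with 15 ms frames does not give a whole sample count
        raise ValueError("sample rate and frame duration do not give a whole number of samples")
    return samples * sample_width


print(frame_size_bytes(16000, 20))  # 640
```

With the default 20 ms frame at 16 kHz, one frame is 320 samples, or 640 bytes under the 16-bit assumption.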

class pipecat.audio.turn.krisp_viva_turn.KrispVivaTurn(*, model_path: str | None = None, sample_rate: int | None = None, params: KrispTurnParams | None = None, api_key: str = '')[source]

Bases: BaseTurnAnalyzer

Turn analyzer using Krisp VIVA SDK for end-of-turn detection.

Uses Krisp’s turn detection v3 (Tt) API to determine when a user has finished speaking. The Tt API expects an external VAD flag with each audio frame; the is_speech argument of append_audio supplies it. This analyzer requires a valid Krisp model file to operate.

__init__(*, model_path: str | None = None, sample_rate: int | None = None, params: KrispTurnParams | None = None, api_key: str = '') → None[source]

Initialize the Krisp turn analyzer.

Parameters:

model_path – Path to the Krisp turn detection model file (.kef extension). If None, uses KRISP_VIVA_TURN_MODEL_PATH environment variable.
sample_rate – Optional initial sample rate for audio processing. If provided, this will be used as the fixed sample rate.
params – Configuration parameters for turn analysis behavior.
api_key – Krisp SDK API key. If empty, falls back to the KRISP_VIVA_API_KEY environment variable.

Raises:

ValueError – If model_path is not provided and KRISP_VIVA_TURN_MODEL_PATH is not set.
Exception – If model file doesn’t have .kef extension.
FileNotFoundError – If model file doesn’t exist.
RuntimeError – If Krisp SDK initialization fails.
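
The model path checks listed above can be approximated as follows. This is a minimal sketch, not the library's implementation: the helper name resolve_model_path, the order of the checks, and the treatment of an empty string as "not provided" are all assumptions.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_model_path(model_path: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented Raises conditions."""
    # Fall back to the environment variable when no path is passed in.
    path = model_path or os.environ.get("KRISP_VIVA_TURN_MODEL_PATH")
    if not path:
        raise ValueError(
            "model_path not provided and KRISP_VIVA_TURN_MODEL_PATH is not set"
        )
    # The documentation lists a plain Exception for a wrong extension.
    if not path.endswith(".kef"):
        raise Exception("model file must have a .kef extension")
    if not Path(path).is_file():
        raise FileNotFoundError(path)
    return path
```

Running a check like this before constructing the analyzer lets a misconfigured deployment fail early with a clear message, before any SDK initialization is attempted.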

async cleanup()[source]: Release SDK reference when analyzer is destroyed.

set_sample_rate(sample_rate: int)[source]

Set the sample rate and create/update the turn detection session.

Parameters:

sample_rate – The sample rate to set.

property frame_probabilities: list

Get all probabilities from the last append_audio call.

Returns:

List of probability values for each frame processed in the last append_audio call.
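
When tuning threshold, it can help to see where a list of per-frame probabilities first reaches it. The sketch below works on a plain list of floats such as the one frame_probabilities returns. The helper name first_crossing and the use of a greater-than-or-equal comparison are assumptions; the exact comparison the analyzer applies is not stated here.

```python
from typing import List, Optional


def first_crossing(probabilities: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the first probability at or above threshold, else None."""
    for index, probability in enumerate(probabilities):
        if probability >= threshold:  # assumed comparison, see the note above
            return index
    return None


print(first_crossing([0.1, 0.35, 0.62, 0.9], 0.5))  # 2
print(first_crossing([0.1, 0.2], 0.5))  # None
```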

property last_probability: float | None

Get the last turn probability value computed.

Returns:

Last probability value, or None if no frames have been processed yet.

property speech_triggered: bool

Check if speech has been detected and triggered analysis.

Returns:

True if speech has been detected and turn analysis is active.

property params: KrispTurnParams

Get the current turn analyzer parameters.

Returns:

Current turn analyzer configuration parameters.

append_audio(buffer: bytes, is_speech: bool) → EndOfTurnState[source]

Append audio data for turn analysis.

Parameters:

buffer – Raw audio data bytes to append for analysis.
is_speech – Whether the audio buffer contains detected speech.

Returns:

Current end-of-turn state after processing the audio.
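
The following sketch shows one way a caller might pair audio with VAD flags and pass both to append_audio(buffer, is_speech). StubAnalyzer is a stand-in so the example runs without the Krisp SDK; it returns a placeholder string where the real method returns an EndOfTurnState. The helpers split_frames and feed_frames are hypothetical. Splitting the audio into fixed-size frames reflects an assumed VAD that emits one flag per frame; it is not a documented requirement of append_audio.

```python
from typing import Iterable, List


class StubAnalyzer:
    """Stand-in for KrispVivaTurn; records calls and returns a placeholder."""

    def __init__(self) -> None:
        self.calls = []

    def append_audio(self, buffer: bytes, is_speech: bool) -> str:
        self.calls.append((len(buffer), is_speech))
        return "placeholder-state"  # the real method returns an EndOfTurnState


def split_frames(pcm: bytes, frame_bytes: int) -> List[bytes]:
    """Split PCM into whole frames, dropping any trailing partial frame."""
    usable = len(pcm) - len(pcm) % frame_bytes
    return [pcm[i:i + frame_bytes] for i in range(0, usable, frame_bytes)]


def feed_frames(analyzer, pcm: bytes, frame_bytes: int, vad_flags: Iterable[bool]):
    """Pass each frame with its VAD flag; return the last state, or None if no frames."""
    state = None
    for frame, is_speech in zip(split_frames(pcm, frame_bytes), vad_flags):
        state = analyzer.append_audio(frame, is_speech)
    return state


# Three 640-byte frames plus a 100-byte partial frame that is dropped.
stub = StubAnalyzer()
feed_frames(stub, bytes(640 * 3 + 100), 640, [True, True, False])
print(stub.calls)  # [(640, True), (640, True), (640, False)]
```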

async analyze_end_of_turn() → tuple[EndOfTurnState, MetricsData | None][source]

Analyze the current audio state to determine if turn has ended.

Returns:

Tuple containing the end-of-turn state and optional metrics data. Returns the last state determined by append_audio().

clear()[source]: Reset the turn analyzer to its initial state.