krisp_viva_ip_user_turn_start_strategy
User turn start strategy using Krisp Interruption Prediction (IP).
This strategy uses Krisp’s IP model to distinguish genuine user interruptions from backchannels (e.g. “uh-huh”, “yeah”). Instead of triggering a user turn on every VAD speech event, it collects audio after VAD detects speech and runs the IP model to predict whether the speech is a real interruption.
Only when the IP model’s probability exceeds the configured threshold is trigger_user_turn_started() called. This prevents the bot from being interrupted by brief acknowledgements or filler words.
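The gating rule can be illustrated without the Krisp SDK. The following is a minimal sketch, not the library's implementation: first_interruption_index is a hypothetical helper, the probability values are made-up stand-ins for per-frame IP model output, and the strict comparison is an assumption based on the word "exceeds".

```python
from typing import Iterable, Optional


def first_interruption_index(
    probabilities: Iterable[float], threshold: float = 0.5
) -> Optional[int]:
    """Return the index of the first frame whose probability exceeds the threshold.

    Hypothetical helper for illustration; returns None when no frame qualifies.
    """
    for index, probability in enumerate(probabilities):
        # Strict comparison: a score equal to the threshold does not trigger.
        if probability > threshold:
            return index
    return None


# Made-up scores: a short "uh-huh" stays below 0.5, a real interruption crosses it.
backchannel = [0.05, 0.12, 0.20, 0.08]
interruption = [0.10, 0.35, 0.62, 0.90]

print(first_interruption_index(backchannel))
print(first_interruption_index(interruption))
```

With these sample scores the backchannel never triggers a turn, while the interruption triggers on its third frame.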
- class pipecat.turns.user_start.krisp_viva_ip_user_turn_start_strategy.KrispVivaIPUserTurnStartStrategy(*, model_path: str | None = None, threshold: float = 0.5, frame_duration_ms: int = 20, api_key: str = '', **kwargs)
Bases: BaseUserTurnStartStrategy
User turn start strategy using Krisp VIVA Interruption Prediction.
When VAD detects user speech, this strategy feeds audio frames into the Krisp VIVA IP model. The model outputs a probability indicating whether the speech is a genuine interruption (as opposed to a backchannel). A user turn is triggered only when this probability exceeds the configured threshold.
This strategy is designed to work alongside other start strategies (e.g. TranscriptionUserTurnStartStrategy as a fallback) via the strategy list in UserTurnStrategies.
Example:
    from pipecat.turns.user_start import KrispVivaIPUserTurnStartStrategy

    strategies = UserTurnStrategies(
        start=[
            KrispVivaIPUserTurnStartStrategy(
                model_path="/path/to/ip_model.kef",
                threshold=0.5,
            ),
            TranscriptionUserTurnStartStrategy(),
        ],
    )
- __init__(*, model_path: str | None = None, threshold: float = 0.5, frame_duration_ms: int = 20, api_key: str = '', **kwargs)
Initialize the Krisp VIVA IP user turn start strategy.
- Parameters:
model_path – Path to the Krisp VIVA IP model file (.kef). If None, uses the KRISP_VIVA_IP_MODEL_PATH environment variable.
threshold – IP probability threshold (0.0 to 1.0). When the model’s output exceeds this value, the speech is classified as a genuine interruption.
frame_duration_ms – Frame duration in milliseconds for IP processing. Supported values: 10, 15, 20, 30, 32.
api_key – Krisp SDK API key. If empty, falls back to the KRISP_VIVA_API_KEY environment variable.
**kwargs – Additional arguments passed to BaseUserTurnStartStrategy.
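The fallback and range rules described for these parameters could be sketched as follows. This is an illustration only: resolve_model_path, resolve_api_key, check_settings, and samples_per_frame are hypothetical helpers, not part of pipecat; whether the real class raises on unsupported values is not stated here; and the 16 kHz sample rate is an assumed input format.

```python
import os
from typing import Optional

# Frame durations listed as supported in the parameter description above.
SUPPORTED_FRAME_DURATIONS_MS = (10, 15, 20, 30, 32)


def resolve_model_path(model_path: Optional[str] = None) -> Optional[str]:
    """Prefer the explicit argument, else KRISP_VIVA_IP_MODEL_PATH (may be unset)."""
    if model_path is not None:
        return model_path
    return os.environ.get("KRISP_VIVA_IP_MODEL_PATH")


def resolve_api_key(api_key: str = "") -> str:
    """Prefer a non-empty argument, else KRISP_VIVA_API_KEY (empty string if unset)."""
    return api_key or os.environ.get("KRISP_VIVA_API_KEY", "")


def check_settings(threshold: float, frame_duration_ms: int) -> None:
    """Hypothetical validation; the real class may handle bad values differently."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be within 0.0 to 1.0, got {threshold}")
    if frame_duration_ms not in SUPPORTED_FRAME_DURATIONS_MS:
        raise ValueError(f"unsupported frame_duration_ms: {frame_duration_ms}")


def samples_per_frame(sample_rate_hz: int, frame_duration_ms: int) -> int:
    """Number of audio samples in one frame of the given duration."""
    return sample_rate_hz * frame_duration_ms // 1000


check_settings(threshold=0.5, frame_duration_ms=20)
print(samples_per_frame(16000, 20))  # 320 samples at an assumed 16 kHz input
```

Keeping the path and key in environment variables lets the same pipeline code run across machines without hard-coding either value.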
- async process_frame(frame: Frame) → ProcessFrameResult
Process a frame to detect genuine user interruptions.
On VADUserStartedSpeakingFrame, begins collecting audio. On InputAudioRawFrame, feeds audio through the IP model and triggers a user turn if the interruption probability exceeds the threshold. On VADUserStoppedSpeakingFrame or BotStoppedSpeakingFrame, resets the candidate state.
- Parameters:
frame – The incoming frame.
- Returns:
STOP if a genuine interruption was detected, CONTINUE otherwise.
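The frame handling described for process_frame amounts to a small state machine. The sketch below mimics it with local stand-ins so it runs without pipecat or the Krisp SDK: the frame classes and ProcessFrameResult are simplified placeholders that only share names with the real types, InterruptionGate and fake_score are hypothetical, the method is synchronous for brevity (the real one is async), and clearing the candidate state after a trigger is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class ProcessFrameResult(Enum):
    CONTINUE = "continue"
    STOP = "stop"


class Frame:
    """Stand-in base frame (not the pipecat class)."""


class VADUserStartedSpeakingFrame(Frame):
    pass


class VADUserStoppedSpeakingFrame(Frame):
    pass


class BotStoppedSpeakingFrame(Frame):
    pass


@dataclass
class InputAudioRawFrame(Frame):
    audio: bytes


class InterruptionGate:
    """Hypothetical stand-in for the strategy's frame handling."""

    def __init__(self, score: Callable[[bytes], float], threshold: float = 0.5):
        self._score = score          # stand-in for the IP model
        self._threshold = threshold
        self._collecting = False     # candidate state: speech seen, not yet confirmed
        self.turn_started = False

    def process_frame(self, frame: Frame) -> ProcessFrameResult:
        if isinstance(frame, VADUserStartedSpeakingFrame):
            self._collecting = True
        elif isinstance(frame, (VADUserStoppedSpeakingFrame, BotStoppedSpeakingFrame)):
            self._collecting = False
        elif isinstance(frame, InputAudioRawFrame) and self._collecting:
            if self._score(frame.audio) > self._threshold:
                self._collecting = False
                self.turn_started = True  # stands in for trigger_user_turn_started()
                return ProcessFrameResult.STOP
        return ProcessFrameResult.CONTINUE


def fake_score(audio: bytes) -> float:
    """Made-up scorer: maps the first byte to a value between 0.0 and 1.0."""
    return audio[0] / 255


gate = InterruptionGate(fake_score, threshold=0.5)
frames = [
    InputAudioRawFrame(b"\xff"),    # ignored: VAD has not reported speech yet
    VADUserStartedSpeakingFrame(),
    InputAudioRawFrame(b"\x10"),    # low score, keeps collecting
    InputAudioRawFrame(b"\xf0"),    # high score, triggers the turn
]
results = [gate.process_frame(f) for f in frames]
print([r.name for r in results])
```

Audio that arrives before the VAD start event is never scored, so only speech that VAD has already flagged can produce STOP.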