krisp_viva_ip_user_turn_start_strategy

User turn start strategy using Krisp Interruption Prediction (IP).

This strategy uses Krisp’s IP model to distinguish genuine user interruptions from backchannels (e.g. “uh-huh”, “yeah”). Instead of triggering a user turn on every VAD speech event, it collects audio after VAD detects speech and runs the IP model to predict whether the speech is a real interruption.

Only when the IP model’s probability exceeds the configured threshold is trigger_user_turn_started() called. This prevents the bot from being interrupted by brief acknowledgements or filler words.
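The gating rule above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: first_interruption_index is a hypothetical helper, and the probability lists are made-up stand-ins for per-frame IP model outputs.

```python
from typing import Iterable, Optional


def first_interruption_index(
    probabilities: Iterable[float], threshold: float = 0.5
) -> Optional[int]:
    """Return the index of the first probability strictly above the threshold.

    Returns None when no value exceeds the threshold, i.e. the speech would
    be treated as a backchannel and no user turn would be triggered.
    """
    for index, probability in enumerate(probabilities):
        if probability > threshold:
            return index
    return None


# Made-up values: a short "uh-huh" keeps the probability low.
backchannel = [0.05, 0.12, 0.20, 0.18]
# Made-up values: a genuine interruption climbs past the threshold.
interruption = [0.10, 0.35, 0.62, 0.90]

print(first_interruption_index(backchannel))
print(first_interruption_index(interruption))
```

The comparison is strict, matching the wording "exceeds the configured threshold": a probability exactly equal to the threshold does not trigger a turn in this sketch.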

class pipecat.turns.user_start.krisp_viva_ip_user_turn_start_strategy.KrispVivaIPUserTurnStartStrategy(*, model_path: str | None = None, threshold: float = 0.5, frame_duration_ms: int = 20, api_key: str = '', **kwargs)

Bases: BaseUserTurnStartStrategy

User turn start strategy using Krisp VIVA Interruption Prediction.

When VAD detects user speech, this strategy feeds audio frames into the Krisp VIVA IP model. The model outputs a probability indicating whether the speech is a genuine interruption (as opposed to a backchannel). A user turn is triggered only when this probability exceeds the configured threshold.

This strategy is designed to work alongside other start strategies (e.g. TranscriptionUserTurnStartStrategy as a fallback) via the strategy list in UserTurnStrategies.

Example:

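# Import paths for TranscriptionUserTurnStartStrategy and UserTurnStrategies
# are assumed from the package layout; adjust them to your Pipecat version.
from pipecat.turns.user_start import (
    KrispVivaIPUserTurnStartStrategy,
    TranscriptionUserTurnStartStrategy,
)
from pipecat.turns.user_turn_strategies import UserTurnStrategies

strategies = UserTurnStrategies(
    start=[
        # Primary: start a turn only when the IP model predicts an interruption.
        KrispVivaIPUserTurnStartStrategy(
            model_path="/path/to/ip_model.kef",
            threshold=0.5,
        ),
        # Fallback: start a turn based on transcription.
        TranscriptionUserTurnStartStrategy(),
    ],
)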

__init__(*, model_path: str | None = None, threshold: float = 0.5, frame_duration_ms: int = 20, api_key: str = '', **kwargs)

Initialize the Krisp VIVA IP user turn start strategy.

Parameters:
  • model_path – Path to the Krisp VIVA IP model file (.kef). If None, uses the KRISP_VIVA_IP_MODEL_PATH environment variable.

  • threshold – IP probability threshold (0.0 to 1.0). When the model’s output exceeds this value, the speech is classified as a genuine interruption.

  • frame_duration_ms – Frame duration in milliseconds for IP processing. Supported values: 10, 15, 20, 30, 32.

  • api_key – Krisp SDK API key. If empty, falls back to the KRISP_VIVA_API_KEY environment variable.

  • **kwargs – Additional arguments passed to BaseUserTurnStartStrategy.
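The fallback and validation rules in this parameter list can be sketched as follows. This is an assumption-laden illustration rather than the library's code: resolve_setting and samples_per_frame are hypothetical helpers, the 16 kHz sample rate and the model path are example values, and the behavior when neither the argument nor the environment variable is set is not stated in this reference (the sketch simply returns None).

```python
import os
from typing import Mapping, Optional

# Frame durations listed as supported in the parameter description.
SUPPORTED_FRAME_DURATIONS_MS = (10, 15, 20, 30, 32)


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Prefer the explicit argument; otherwise read the environment variable.

    Simplification: both None and "" are treated as "not provided".
    """
    if explicit:
        return explicit
    source = os.environ if env is None else env
    return source.get(env_var) or None


def samples_per_frame(sample_rate_hz: int, frame_duration_ms: int) -> int:
    """Number of audio samples in one frame of the given duration."""
    if frame_duration_ms not in SUPPORTED_FRAME_DURATIONS_MS:
        raise ValueError(
            f"frame_duration_ms must be one of {SUPPORTED_FRAME_DURATIONS_MS}, "
            f"got {frame_duration_ms}"
        )
    return sample_rate_hz * frame_duration_ms // 1000


# A fake environment keeps the example deterministic; the path is made up.
fake_env = {"KRISP_VIVA_IP_MODEL_PATH": "/models/ip_model.kef"}

print(resolve_setting(None, "KRISP_VIVA_IP_MODEL_PATH", fake_env))
print(resolve_setting("", "KRISP_VIVA_API_KEY", fake_env))
print(samples_per_frame(16000, 20))
```

Passing the environment as a mapping makes the fallback easy to test; omitting it reads from os.environ.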

async cleanup()

Release Krisp SDK resources.

async reset()

Reset the strategy to its initial state.

async process_frame(frame: Frame) → ProcessFrameResult

Process a frame to detect genuine user interruptions.

On VADUserStartedSpeakingFrame, begins collecting audio. On InputAudioRawFrame, feeds audio through the IP model and triggers a user turn if the interruption probability exceeds the threshold. On VADUserStoppedSpeakingFrame or BotStoppedSpeakingFrame, resets the candidate state.

Parameters:
  • frame – The incoming frame.

Returns:

STOP if a genuine interruption was detected, CONTINUE otherwise.
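The frame handling described for process_frame can be sketched as a small state machine. Everything here is a stand-in, not Pipecat's code: the frame classes and Result enum mimic VADUserStartedSpeakingFrame, VADUserStoppedSpeakingFrame, BotStoppedSpeakingFrame, InputAudioRawFrame and ProcessFrameResult; the predict callable replaces the Krisp IP model; and the method is synchronous for brevity, whereas the real one is async.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Result(Enum):  # stand-in for ProcessFrameResult
    CONTINUE = "continue"
    STOP = "stop"


class VADStarted:  # stand-in for VADUserStartedSpeakingFrame
    pass


class VADStopped:  # stand-in for VADUserStoppedSpeakingFrame
    pass


class BotStopped:  # stand-in for BotStoppedSpeakingFrame
    pass


@dataclass
class Audio:  # stand-in for InputAudioRawFrame
    audio: bytes


class SketchStrategy:
    def __init__(self, predict: Callable[[bytes], float], threshold: float = 0.5):
        self._predict = predict  # fake IP model: audio bytes -> probability
        self._threshold = threshold
        self._collecting = False  # True between VAD start and stop
        self.turn_started = False

    def process_frame(self, frame: object) -> Result:
        if isinstance(frame, VADStarted):
            self._collecting = True
        elif isinstance(frame, (VADStopped, BotStopped)):
            self._collecting = False  # reset the candidate state
        elif isinstance(frame, Audio) and self._collecting:
            if self._predict(frame.audio) > self._threshold:
                # Assumption: stop collecting once a turn has been triggered.
                self._collecting = False
                # The real strategy would call trigger_user_turn_started() here.
                self.turn_started = True
                return Result.STOP
        return Result.CONTINUE


# Fake model: longer chunks score higher (purely illustrative).
strategy = SketchStrategy(predict=lambda audio: min(len(audio) / 10, 1.0))
frames = [Audio(b"x" * 8), VADStarted(), Audio(b"x" * 2), Audio(b"x" * 8)]
results = [strategy.process_frame(frame) for frame in frames]
print([result.name for result in results])
```

In this sketch, audio that arrives before VAD reports speech is ignored, so the first frame yields CONTINUE even though its fake score is above the threshold.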