user_start

class pipecat.turns.user_start.BaseUserTurnStartStrategy(*, enable_interruptions: bool = True, enable_user_speaking_frames: bool = True, **kwargs)[source]

Bases: BaseObject

Base class for strategies that determine when a user starts speaking.

Subclasses should implement logic to detect the start of a user’s turn. This could be based on voice activity, number of words spoken, or other heuristics.

Events triggered by user turn start strategies:

on_push_frame: Indicates the strategy wants to push a frame.

on_broadcast_frame: Indicates the strategy wants to broadcast a frame.

on_user_turn_started: Signals that a user turn has started.

__init__(*, enable_interruptions: bool = True, enable_user_speaking_frames: bool = True, **kwargs)[source]

Initialize the base user turn start strategy.

Parameters:

enable_interruptions – If True, the user aggregator will emit an interruption frame when the user turn starts.
enable_user_speaking_frames – If True, the user aggregator will emit frames indicating when the user starts speaking, as well as interruption frames. This is enabled by default, but you may want to disable it if another component (e.g., an STT service) is already generating these frames.
**kwargs – Additional keyword arguments.

property task_manager: BaseTaskManager: Returns the configured task manager.

async setup(task_manager: BaseTaskManager)[source]

Initialize the strategy with the given task manager.

Parameters:: task_manager – The task manager to be associated with this instance.

async cleanup()[source]: Cleanup the strategy.

async reset()[source]: Reset the strategy to its initial state.

async process_frame(frame: Frame) → ProcessFrameResult | None[source]

Process an incoming frame.

Subclasses should override this to implement logic that decides whether the user turn has started.

Parameters:: frame – The frame to be processed.
Returns:: A ProcessFrameResult indicating the outcome, or None (treated as CONTINUE for backward compatibility).
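
The return-value contract can be illustrated with a small, self-contained sketch. The ProcessFrameResult enum, the three toy strategies, and the run_strategies helper below are hypothetical stand-ins written for this example, not Pipecat's implementation. The sketch assumes only what this page states: strategies are consulted in list order, STOP keeps later strategies from seeing the frame, and None is treated as CONTINUE.

```python
import asyncio
from enum import Enum


class ProcessFrameResult(Enum):
    """Stand-in for Pipecat's result type (hypothetical, for illustration)."""

    CONTINUE = "continue"
    STOP = "stop"


class AlwaysContinue:
    async def process_frame(self, frame):
        return None  # legacy-style return, treated as CONTINUE


class StopOnHello:
    async def process_frame(self, frame):
        if frame == "hello":
            return ProcessFrameResult.STOP
        return ProcessFrameResult.CONTINUE


class Recorder:
    """Records every frame that reaches it."""

    def __init__(self):
        self.seen = []

    async def process_frame(self, frame):
        self.seen.append(frame)
        return ProcessFrameResult.CONTINUE


async def run_strategies(strategies, frame):
    """Offer the frame to each strategy in order until one returns STOP."""
    for strategy in strategies:
        result = await strategy.process_frame(frame)
        if result is None:
            result = ProcessFrameResult.CONTINUE
        if result is ProcessFrameResult.STOP:
            return ProcessFrameResult.STOP
    return ProcessFrameResult.CONTINUE


recorder = Recorder()
chain = [AlwaysContinue(), StopOnHello(), recorder]
first = asyncio.run(run_strategies(chain, "hello"))
second = asyncio.run(run_strategies(chain, "bye"))
```

In this sketch the recorder never sees "hello", because StopOnHello returned STOP ahead of it in the list.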

async push_frame(frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM)[source]

Emit on_push_frame to push a frame using the user aggregator.

Parameters:

frame – The frame to be pushed.
direction – The direction in which the frame should be pushed.

async broadcast_frame(frame_cls: type[Frame], **kwargs)[source]

Emit on_broadcast_frame to broadcast a frame using the user aggregator.

Parameters:

frame_cls – The class of the frame to be broadcast.
**kwargs – Keyword arguments to be passed to the frame’s constructor.

async trigger_user_turn_started()[source]: Trigger the on_user_turn_started event.

async trigger_reset_aggregation()[source]: Trigger the on_reset_aggregation event.

class pipecat.turns.user_start.ExternalUserTurnStartStrategy(**kwargs)[source]

Bases: BaseUserTurnStartStrategy

User turn start strategy controlled by an external processor.

This strategy does not determine when a user turn starts on its own. Instead, it relies on a different processor in the pipeline that is responsible for emitting UserStartedSpeakingFrame frames.

__init__(**kwargs)[source]

Initialize the external user turn start strategy.

Parameters:: **kwargs – Additional keyword arguments.

async process_frame(frame: Frame) → ProcessFrameResult[source]

Process an incoming frame to detect user turn start.

Parameters:: frame – The frame to be analyzed.
Returns:: STOP if a user started speaking frame was received, CONTINUE otherwise.

class pipecat.turns.user_start.KrispVivaIPUserTurnStartStrategy(*, model_path: str | None = None, threshold: float = 0.5, frame_duration_ms: int = 20, api_key: str = '', **kwargs)[source]

Bases: BaseUserTurnStartStrategy

User turn start strategy using Krisp VIVA Interruption Prediction.

When VAD detects user speech, this strategy feeds audio frames into the Krisp VIVA IP model. The model outputs a probability indicating whether the speech is a genuine interruption (as opposed to a backchannel). A user turn is triggered only when this probability exceeds the configured threshold.

This strategy is designed to work alongside other start strategies (e.g. TranscriptionUserTurnStartStrategy as a fallback) via the strategy list in UserTurnStrategies.

Example:

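from pipecat.turns.user_start import (
    KrispVivaIPUserTurnStartStrategy,
    TranscriptionUserTurnStartStrategy,
)

# Assumed import path: UserTurnStrategies is defined outside this module.
from pipecat.turns.user_turn_strategies import UserTurnStrategies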

strategies = UserTurnStrategies(
    start=[
        KrispVivaIPUserTurnStartStrategy(
            model_path="/path/to/ip_model.kef",
            threshold=0.5,
        ),
        TranscriptionUserTurnStartStrategy(),
    ],
)

__init__(*, model_path: str | None = None, threshold: float = 0.5, frame_duration_ms: int = 20, api_key: str = '', **kwargs)[source]

Initialize the Krisp VIVA IP user turn start strategy.

Parameters:

model_path – Path to the Krisp VIVA IP model file (.kef). If None, uses the KRISP_VIVA_IP_MODEL_PATH environment variable.
threshold – IP probability threshold (0.0 to 1.0). When the model’s output exceeds this value, the speech is classified as a genuine interruption.
frame_duration_ms – Frame duration in milliseconds for IP processing. Supported values: 10, 15, 20, 30, 32.
api_key – Krisp SDK API key. If empty, falls back to the KRISP_VIVA_API_KEY environment variable.
**kwargs – Additional arguments passed to BaseUserTurnStartStrategy.
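
As a rough guide to what frame_duration_ms means in samples, the arithmetic can be sketched as follows. The samples_per_frame helper is hypothetical, and the 16 kHz sample rate is an assumption chosen for the example. This page does not state which sample rates the model accepts or how the strategy reacts to an unsupported duration.

```python
# Supported durations as listed in the parameter description above.
SUPPORTED_FRAME_DURATIONS_MS = (10, 15, 20, 30, 32)


def samples_per_frame(sample_rate_hz: int, frame_duration_ms: int) -> int:
    """Number of audio samples in one frame (hypothetical helper)."""
    if frame_duration_ms not in SUPPORTED_FRAME_DURATIONS_MS:
        raise ValueError(f"unsupported frame duration: {frame_duration_ms} ms")
    return sample_rate_hz * frame_duration_ms // 1000


# Assumed sample rate of 16 kHz, for illustration only.
sizes = {ms: samples_per_frame(16000, ms) for ms in SUPPORTED_FRAME_DURATIONS_MS}
```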

async cleanup()[source]: Release Krisp SDK resources.

async reset()[source]: Reset the strategy to its initial state.

async process_frame(frame: Frame) → ProcessFrameResult[source]

Process a frame to detect genuine user interruptions.

On VADUserStartedSpeakingFrame, begins collecting audio. On InputAudioRawFrame, feeds audio through the IP model and triggers a user turn if the interruption probability exceeds the threshold. On VADUserStoppedSpeakingFrame or BotStoppedSpeakingFrame, resets the candidate state.

Parameters:: frame – The incoming frame.
Returns:: STOP if a genuine interruption was detected, CONTINUE otherwise.
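
The gating behavior described above can be sketched without the Krisp SDK. The InterruptionGate class below is a hypothetical stand-in: the caller supplies probabilities directly, where the real strategy obtains them from the IP model. Triggering at most once per candidate is an assumption of this sketch, not something this page confirms.

```python
class InterruptionGate:
    """Illustrative threshold gate; probabilities are supplied by the caller."""

    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.candidate = False  # True between VAD start and VAD stop

    def on_vad_started(self) -> None:
        self.candidate = True

    def on_vad_stopped(self) -> None:
        self.candidate = False

    def on_probability(self, probability: float) -> bool:
        """Return True when a user turn should be triggered."""
        if not self.candidate:
            return False
        if probability > self.threshold:
            # Assumption: trigger at most once per candidate.
            self.candidate = False
            return True
        return False


gate = InterruptionGate(threshold=0.5)
before_vad = gate.on_probability(0.9)  # no VAD start yet, so ignored
gate.on_vad_started()
results = [gate.on_probability(p) for p in (0.2, 0.5, 0.8, 0.9)]
```

Note that a probability equal to the threshold does not trigger here, matching the "exceeds" wording in the parameter description.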

class pipecat.turns.user_start.MinWordsUserTurnStartStrategy(*, min_words: int, use_interim: bool = True, **kwargs)[source]

Bases: BaseUserTurnStartStrategy

User turn start strategy based on a minimum number of words spoken by the user.

This strategy signals the start of a user turn once the user has spoken at least a specified number of words, as determined from transcription frames. Optionally, interim transcriptions can be used for earlier detection.

__init__(*, min_words: int, use_interim: bool = True, **kwargs)[source]

Initialize the minimum words user turn start strategy.

Parameters:

min_words – Minimum number of spoken words required to trigger the start of a user turn.
use_interim – Whether to consider interim transcription frames for earlier detection.
**kwargs – Additional keyword arguments.

async reset()[source]: Reset the strategy to its initial state.

async process_frame(frame: Frame) → ProcessFrameResult[source]

Process an incoming frame to detect the start of a user turn.

This method updates internal state based on transcription frames and triggers the user turn once the minimum word count is reached.

Parameters:: frame – The frame to be analyzed.
Returns:: STOP if the minimum word count was reached, CONTINUE otherwise.
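
A minimal sketch of the word-count check, assuming words are whitespace-separated tokens within a single transcription text. The should_start_turn function is hypothetical; the actual strategy may count or accumulate words differently.

```python
def should_start_turn(
    text: str, *, min_words: int, is_interim: bool, use_interim: bool = True
) -> bool:
    """Return True if this transcription alone would start a user turn."""
    if is_interim and not use_interim:
        return False  # interim transcriptions are ignored in this mode
    return len(text.split()) >= min_words


too_short = should_start_turn("hold on", min_words=3, is_interim=True)
long_enough = should_start_turn("hold on please", min_words=3, is_interim=True)
interim_ignored = should_start_turn(
    "hold on please", min_words=3, is_interim=True, use_interim=False
)
final_counts = should_start_turn(
    "hold on please", min_words=3, is_interim=False, use_interim=False
)
```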

class pipecat.turns.user_start.TranscriptionUserTurnStartStrategy(*, use_interim: bool = True, **kwargs)[source]

Bases: BaseUserTurnStartStrategy

User turn start strategy based on transcriptions.

This strategy signals the start of a user turn when a transcription is received while the bot is speaking. It is useful as a fallback in scenarios where VAD-based detection fails (for example, when the user speaks very softly) but the STT service still produces transcriptions.

__init__(*, use_interim: bool = True, **kwargs)[source]: Initialize the transcription-based user turn start strategy.

async process_frame(frame: Frame) → ProcessFrameResult[source]

Process an incoming frame to detect the start of a user turn.

Parameters:: frame – The frame to be processed.
Returns:: STOP if a transcription was received, CONTINUE otherwise.

class pipecat.turns.user_start.UserTurnStartedParams(enable_interruptions: bool, enable_user_speaking_frames: bool)[source]

Bases: object

Parameters emitted when a user turn starts.

These parameters are passed to the on_user_turn_started event and provide contextual information about how the user turn should be handled by the user aggregator.

Parameters:

enable_interruptions – Whether the user aggregator should emit an interruption frame when the user turn starts.
enable_user_speaking_frames – Whether the user aggregator should emit frames indicating user speaking state (e.g., user started speaking) during the bot’s turn. This is typically enabled by default, but may be disabled when another component (such as an STT service) is already responsible for generating user speaking frames.

enable_interruptions: bool

enable_user_speaking_frames: bool

class pipecat.turns.user_start.VADUserTurnStartStrategy(*, enable_interruptions: bool = True, enable_user_speaking_frames: bool = True, **kwargs)[source]

Bases: BaseUserTurnStartStrategy

User turn start strategy based on VAD (Voice Activity Detection).

This strategy assumes the user turn starts as soon as a VAD frame indicates that the user has started speaking.

async process_frame(frame: Frame) → ProcessFrameResult[source]

Process an incoming frame to detect user turn start.

Parameters:: frame – The frame to be analyzed.
Returns:: STOP if the user started speaking, CONTINUE otherwise.

class pipecat.turns.user_start.WakePhraseUserTurnStartStrategy(*, phrases: list[str], timeout: float = 10.0, single_activation: bool = False, **kwargs)[source]

Bases: BaseUserTurnStartStrategy

User turn start strategy that requires a wake phrase before interaction.

Blocks subsequent strategies until a wake phrase is detected in a final transcription. After detection, allows interaction for a configurable timeout period before requiring the wake phrase again. Use single_activation=True to require the wake phrase before every turn.

This strategy should be placed first in the start strategies list.

Event handlers available:

on_wake_phrase_detected: Called when a wake phrase is matched.
on_wake_phrase_timeout: Called when the inactivity timeout expires (timeout mode only).

Example:

from pipecat.turns.user_start import WakePhraseUserTurnStartStrategy

# Timeout mode (default): wake phrase unlocks interaction for 10s
strategy = WakePhraseUserTurnStartStrategy(
    phrases=["hey pipecat", "ok pipecat"],
    timeout=10.0,
)

# Single activation: wake phrase required before every turn
strategy = WakePhraseUserTurnStartStrategy(
    phrases=["hey pipecat"],
    single_activation=True,
)

@strategy.event_handler("on_wake_phrase_detected")
async def on_wake_phrase_detected(strategy, phrase):
    ...

@strategy.event_handler("on_wake_phrase_timeout")
async def on_wake_phrase_timeout(strategy):
    ...

Parameters:

phrases – List of wake phrases to detect.
timeout – Inactivity timeout in seconds before returning to IDLE. In timeout mode, the timer resets on activity (user or bot speech). In single activation mode, it acts as a keepalive window: the strategy stays AWAKE for this duration after wake phrase detection, allowing the current turn to complete before returning to IDLE.
single_activation – If True, the wake phrase is required before every turn. The strategy returns to IDLE after each turn completes.
**kwargs – Additional keyword arguments passed to parent.

__init__(*, phrases: list[str], timeout: float = 10.0, single_activation: bool = False, **kwargs)[source]

Initialize the wake phrase user turn start strategy.

Parameters:

phrases – List of wake phrases to detect.
timeout – Inactivity timeout in seconds before returning to IDLE. In timeout mode, the timer resets on activity. In single activation mode, acts as a keepalive window after wake phrase detection.
single_activation – If True, the wake phrase is required before every turn. The strategy returns to IDLE after each turn completes.
**kwargs – Additional keyword arguments passed to parent.

property state: _WakeState: Returns the current wake state.

async setup(task_manager: BaseTaskManager)[source]

Initialize the strategy with the given task manager.

Parameters:: task_manager – The task manager to be associated with this instance.

async cleanup()[source]: Clean up the strategy.

async reset()[source]

Reset the strategy.

In timeout mode, preserves the state and refreshes the timeout, since a reset means a turn started (activity). In single activation mode, does nothing: the keepalive timeout (started when the wake phrase was detected) handles the transition back to IDLE.

async process_frame(frame: Frame) → ProcessFrameResult[source]

Process an incoming frame for wake phrase detection or passthrough.

Parameters:: frame – The frame to be processed.
Returns:: STOP when the wake phrase is detected or when in IDLE state (blocks subsequent strategies), CONTINUE when in AWAKE state (allows subsequent strategies to proceed).
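
The timeout-mode behavior can be sketched as a small state machine with an injected clock. Everything below is a hypothetical stand-in, not Pipecat's code: the WakeState and WakePhraseGate names, the case-insensitive substring matching, and the string return values are assumptions made for illustration. Single activation mode is not modeled.

```python
from enum import Enum


class WakeState(Enum):
    IDLE = "idle"
    AWAKE = "awake"


class WakePhraseGate:
    """Timeout-mode sketch; `now` is a caller-supplied time in seconds."""

    def __init__(self, phrases, timeout=10.0):
        self.phrases = [p.lower() for p in phrases]
        self.timeout = timeout
        self.state = WakeState.IDLE
        self._deadline = 0.0

    def _expire(self, now):
        # Fall back to IDLE once the inactivity deadline has passed.
        if self.state is WakeState.AWAKE and now >= self._deadline:
            self.state = WakeState.IDLE

    def on_activity(self, now):
        """Refresh the inactivity timer while AWAKE."""
        self._expire(now)
        if self.state is WakeState.AWAKE:
            self._deadline = now + self.timeout

    def on_final_transcription(self, text, now):
        """Return "STOP" or "CONTINUE" following the documented contract."""
        self._expire(now)
        if self.state is WakeState.AWAKE:
            return "CONTINUE"  # later strategies may proceed
        # Assumption: case-insensitive substring match.
        if any(phrase in text.lower() for phrase in self.phrases):
            self.state = WakeState.AWAKE
            self._deadline = now + self.timeout
        return "STOP"  # IDLE, or the wake phrase was just detected


gate = WakePhraseGate(["hey pipecat"], timeout=10.0)
r1 = gate.on_final_transcription("what time is it", now=0.0)
r2 = gate.on_final_transcription("Hey Pipecat, hello", now=1.0)
r3 = gate.on_final_transcription("what time is it", now=5.0)
gate.on_activity(now=9.0)  # pushes the deadline to 19.0
r4 = gate.on_final_transcription("and tomorrow?", now=15.0)
r5 = gate.on_final_transcription("still there?", now=30.0)
```

Injecting the clock keeps the sketch deterministic; the real strategy manages its timer through the task manager passed to setup().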
