speech_timeout_user_turn_stop_strategy

Speech timeout-based user turn stop strategy.

class pipecat.turns.user_stop.speech_timeout_user_turn_stop_strategy.SpeechTimeoutUserTurnStopStrategy(*, user_speech_timeout: float = 0.6, **kwargs)[source]

Bases: BaseUserTurnStopStrategy

User turn stop strategy using two independent timers after VAD stop.

After the user stops speaking (detected by VAD), this strategy runs two independent timers. The user turn stop is triggered only when both have finished and at least one transcript has been received:

  • user_speech_timeout: Policy floor — the window in which the user may resume speaking after a pause. Always runs to completion.

  • stt_timeout: Safety net for STT latency — the P99 time for the STT service to return a final transcript after VAD stop, adjusted by the VAD stop_secs. Short-circuited when the STT service emits a finalized transcript (TranscriptionFrame.finalized=True), since finalization means STT has nothing more to send.
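
The stop condition described above can be modeled as a small state machine. The sketch below is illustrative only and is not pipecat's implementation: the class TwoTimerGate, its method names, and the explicit `now` timestamps are hypothetical, and stt_timeout is passed in as a plain number that is assumed to already include the VAD stop_secs adjustment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TwoTimerGate:
    """Toy model of the two-timer stop condition (hypothetical, not pipecat code)."""

    user_speech_timeout: float = 0.6   # policy floor, in seconds
    stt_timeout: float = 0.5           # assumed STT safety-net value, in seconds
    vad_stop_at: Optional[float] = None
    has_transcript: bool = False
    stt_finalized: bool = False

    def on_vad_stop(self, now: float) -> None:
        # Both timers are measured from the moment VAD reports the stop.
        self.vad_stop_at = now

    def on_transcript(self, finalized: bool = False) -> None:
        # Any transcript satisfies the "at least one transcript" requirement;
        # a finalized one also short-circuits the STT wait.
        self.has_transcript = True
        self.stt_finalized = self.stt_finalized or finalized

    def should_stop(self, now: float) -> bool:
        if self.vad_stop_at is None:
            return False
        elapsed = now - self.vad_stop_at
        # The policy floor always runs to completion.
        speech_done = elapsed >= self.user_speech_timeout
        # The STT wait ends on timeout or on a finalized transcript.
        stt_done = self.stt_finalized or elapsed >= self.stt_timeout
        return speech_done and stt_done and self.has_transcript


# Usage: a finalized transcript ends the STT wait early, but the turn
# still cannot stop before user_speech_timeout has elapsed.
gate = TwoTimerGate(user_speech_timeout=0.6, stt_timeout=1.0)
gate.on_vad_stop(now=0.0)
gate.on_transcript(finalized=True)
early = gate.should_stop(now=0.3)   # policy floor still running
late = gate.should_stop(now=0.7)    # both waits done, transcript received
```

Modeling the two waits as independent booleans makes it explicit that finalization only affects the STT wait and never shortens user_speech_timeout.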

Fallback: when a transcript arrives without a VAD stop event, the user_speech_timeout timer measures inactivity since the last transcript and is rearmed on each new transcript. stt_timeout has no meaning here: it is defined relative to VAD stop, and STT has already emitted a transcript, so the STT wait is marked done immediately.
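
The fallback path amounts to an inactivity timer that restarts on every transcript. The following is a minimal sketch under that reading; TranscriptInactivityTimer and its methods are hypothetical names, and timestamps are passed in explicitly rather than read from a clock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptInactivityTimer:
    """Toy model of the no-VAD fallback (hypothetical, not pipecat code)."""

    user_speech_timeout: float = 0.6   # inactivity window, in seconds
    last_transcript_at: Optional[float] = None

    def on_transcript(self, now: float) -> None:
        # Each transcript rearms the timer by moving its start point.
        self.last_transcript_at = now

    def should_stop(self, now: float) -> bool:
        # Without any transcript there is nothing to time out on.
        if self.last_transcript_at is None:
            return False
        # The STT wait is treated as already done, so only inactivity counts.
        return now - self.last_transcript_at >= self.user_speech_timeout


# Usage: the second transcript pushes the deadline back.
timer = TranscriptInactivityTimer(user_speech_timeout=0.6)
timer.on_transcript(now=0.0)
timer.on_transcript(now=0.5)
before_deadline = timer.should_stop(now=0.75)   # 0.25 s since last transcript
after_deadline = timer.should_stop(now=1.25)    # 0.75 s since last transcript
```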

__init__(*, user_speech_timeout: float = 0.6, **kwargs)[source]

Initialize the speech timeout-based user turn stop strategy.

Parameters:
  • user_speech_timeout – Time, in seconds, to wait for the user to resume speaking after a pause. Defaults to 0.6 seconds.

  • **kwargs – Additional keyword arguments.

async reset()[source]

Reset the strategy to its initial state.

async setup(task_manager: BaseTaskManager)[source]

Initialize the strategy with the given task manager.

Parameters:
  • task_manager – The task manager to be associated with this instance.

async cleanup()[source]

Clean up the strategy.

async process_frame(frame: Frame) → ProcessFrameResult[source]

Process an incoming frame to update strategy state.

Updates the internal transcription text and VAD state. The user turn stop is triggered once the collected frames satisfy the stop conditions described for this class.

Parameters:
  • frame – The frame to be analyzed.

Returns:

Always returns CONTINUE so subsequent stop strategies are evaluated.