voicemail_detector

Voicemail detection module for Pipecat.

This module provides voicemail detection capabilities using parallel pipeline processing to classify incoming calls as either voicemail messages or live conversations. It’s specifically designed for outbound calling scenarios where a bot needs to determine if a human answered or if the call went to voicemail.

Note

The voicemail module is optimized for text LLMs only.

class pipecat.extensions.voicemail.voicemail_detector.NotifierGate(notifier: BaseNotifier, task_name: str = 'gate')[source]

Bases: FrameProcessor

Base gate processor that controls frame flow based on notifier signals.

This base class provides common gate functionality for processors that need to start open and close permanently when a notifier signals. Subclasses define which frames are allowed through when the gate is closed.

The gate starts open to allow initial processing and closes permanently once the notifier signals. This ensures controlled frame flow based on external decisions or events.
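The open-then-closed-permanently behavior can be sketched with plain asyncio. This is a minimal illustration, not Pipecat's implementation: SketchGate, its allowed_when_closed argument, and the use of asyncio.Event in place of a BaseNotifier are all assumptions made for the example.

```python
import asyncio
from typing import Optional, Set


class SketchGate:
    """Hypothetical stand-in for a notifier-driven gate (not Pipecat's class)."""

    def __init__(self, notifier: asyncio.Event, allowed_when_closed: Set[str]):
        self._notifier = notifier
        self._allowed_when_closed = allowed_when_closed
        self._open = True  # the gate starts open
        self.waiter: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Wait for the signal in a background task, similar in spirit to
        # the task named by task_name.
        self.waiter = asyncio.create_task(self._wait_for_signal())

    async def _wait_for_signal(self) -> None:
        await self._notifier.wait()
        self._open = False  # closes once and is never reopened

    def allows(self, frame_kind: str) -> bool:
        # While open, everything passes; once closed, only the allow-list does.
        return self._open or frame_kind in self._allowed_when_closed


async def demo() -> list:
    notifier = asyncio.Event()
    gate = SketchGate(notifier, allowed_when_closed={"system"})
    gate.start()
    before = gate.allows("transcription")
    notifier.set()  # the external decision arrives
    await asyncio.wait_for(gate.waiter, timeout=1.0)
    return [before, gate.allows("transcription"), gate.allows("system")]
```

Subclasses would differ only in what the allow-list contains, which matches the description of subclasses defining the frames permitted through a closed gate.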

__init__(notifier: BaseNotifier, task_name: str = 'gate')[source]

Initialize the notifier gate.

Parameters:

notifier – Notifier that signals when the gate should close.
task_name – Name for the notification waiting task (for debugging).

async setup(setup: FrameProcessorSetup)[source]

Set up the processor with required components.

Parameters:

setup – Configuration object containing setup parameters.

async cleanup()[source]

Clean up the processor resources.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process frames and control gate state based on notifier signals.

Parameters:

frame – The frame to process.
direction – The direction of frame flow in the pipeline.

class pipecat.extensions.voicemail.voicemail_detector.ClassifierGate(gate_notifier: BaseNotifier, conversation_notifier: BaseNotifier)[source]

Bases: NotifierGate

Gate processor that controls frame flow based on classification decisions.

Inherits from NotifierGate and starts open to allow initial classification processing. Closes permanently once a classification decision is made (CONVERSATION or VOICEMAIL). This ensures the classifier only runs until a definitive decision is reached, preventing unnecessary LLM calls and maintaining system efficiency.

When closed, only allows system frames and user speaking frames to continue. Speaking frames are needed for voicemail timing control, but not for conversation.

__init__(gate_notifier: BaseNotifier, conversation_notifier: BaseNotifier)[source]

Initialize the classifier gate.

Parameters:

gate_notifier – Notifier that signals when a classification decision has been made and the gate should close.
conversation_notifier – Notifier that signals when conversation is detected.

async setup(setup: FrameProcessorSetup)[source]

Set up the processor with required components.

Parameters:

setup – Configuration object containing setup parameters.

async cleanup()[source]

Clean up the processor resources.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process frames and control gate state based on notifier signals.

Parameters:

frame – The frame to process.
direction – The direction of frame flow in the pipeline.

class pipecat.extensions.voicemail.voicemail_detector.ConversationGate(voicemail_notifier: BaseNotifier)[source]

Bases: NotifierGate

Gate processor that blocks conversation flow when voicemail is detected.

Inherits from NotifierGate and starts open to allow normal conversation processing. Closes permanently when voicemail is detected to prevent the main conversation LLM from processing additional input after voicemail classification.

When closed, only allows system frames and user speaking frames to continue.
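The difference between the two gates is which decision closes them. The sketch below summarizes that relationship as a pure function; GateStates and gate_states_after are hypothetical names used only for illustration and are not part of the module.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateStates:
    """Hypothetical summary of which gates are still open."""

    classifier_gate_open: bool
    conversation_gate_open: bool


def gate_states_after(decision: Optional[str]) -> GateStates:
    """Return the gate states implied by a classification decision.

    None means no decision has been made yet.
    """
    if decision is None:
        # Both gates start open.
        return GateStates(classifier_gate_open=True, conversation_gate_open=True)
    if decision == "CONVERSATION":
        # The classifier stops; the conversation continues normally.
        return GateStates(classifier_gate_open=False, conversation_gate_open=True)
    if decision == "VOICEMAIL":
        # The classifier stops and the conversation branch is blocked.
        return GateStates(classifier_gate_open=False, conversation_gate_open=False)
    raise ValueError(f"unknown decision: {decision!r}")
```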

__init__(voicemail_notifier: BaseNotifier)[source]

Initialize the conversation gate.

Parameters:

voicemail_notifier – Notifier that signals when voicemail has been detected and the conversation should be blocked.

class pipecat.extensions.voicemail.voicemail_detector.ClassificationProcessor(*, gate_notifier: BaseNotifier, conversation_notifier: BaseNotifier, voicemail_notifier: BaseNotifier, voicemail_response_delay: float)[source]

Bases: FrameProcessor

Processor that handles LLM classification responses and triggers events.

This processor aggregates LLM text tokens into complete responses and analyzes them to determine if the call reached a voicemail system or a live person. It uses the LLM response frame delimiters (LLMFullResponseStartFrame and LLMFullResponseEndFrame) to ensure complete token aggregation regardless of how the LLM tokenizes the response words.

The processor expects responses containing either “CONVERSATION” (indicating a human answered) or “VOICEMAIL” (indicating an automated system). Once a decision is made, it triggers the appropriate notifications and event handlers.

For voicemail detection, the event handler timer starts immediately and is cancelled and restarted based on user speech patterns to ensure proper timing.
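The timing behavior resembles a debounce timer. The following is a minimal sketch under stated assumptions: VoicemailTimer and its methods are hypothetical, and the mapping of start and cancel to speech events is inferred from the description above rather than taken from the implementation.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class VoicemailTimer:
    """Hypothetical debounce timer; illustrates the timing idea only."""

    def __init__(self, delay: float, on_fire: Callable[[], Awaitable[None]]):
        self._delay = delay
        self._on_fire = on_fire
        self._task: Optional[asyncio.Task] = None

    def start(self) -> None:
        # Restarting always discards any countdown already in progress.
        self.cancel()
        self._task = asyncio.create_task(self._run())

    def cancel(self) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = None

    async def _run(self) -> None:
        await asyncio.sleep(self._delay)
        await self._on_fire()


async def demo() -> tuple:
    fired = []

    async def on_voicemail() -> None:
        fired.append("voicemail")

    timer = VoicemailTimer(delay=0.1, on_fire=on_voicemail)
    timer.start()              # voicemail classified: countdown begins
    await asyncio.sleep(0.01)
    timer.cancel()             # speech resumes (greeting still playing)
    await asyncio.sleep(0.15)  # longer than the delay, but nothing fires
    fired_while_speaking = len(fired)
    timer.start()              # speech stopped: countdown restarts
    await asyncio.sleep(0.3)
    return fired_while_speaking, len(fired)
```

In this sketch the handler runs only after a full, uninterrupted delay, which is the property voicemail_response_delay is described as providing.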

__init__(*, gate_notifier: BaseNotifier, conversation_notifier: BaseNotifier, voicemail_notifier: BaseNotifier, voicemail_response_delay: float)[source]

Initialize the classification processor.

Parameters:

gate_notifier – Notifier to signal the ClassifierGate about classification decisions so it can close and stop processing.
conversation_notifier – Notifier to signal the TTSGate to release all gated TTS frames for normal conversation flow.
voicemail_notifier – Notifier to signal the TTSGate to clear gated TTS frames since voicemail was detected.
voicemail_response_delay – Delay in seconds after user stops speaking before triggering the voicemail event handler. This ensures the voicemail greeting or user message is complete before responding.

async setup(setup: FrameProcessorSetup)[source]

Set up the processor with required components.

Parameters:

setup – Configuration object containing setup parameters.

async cleanup()[source]

Clean up the processor resources.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process frames and handle LLM classification responses.

This method implements a state machine for aggregating LLM responses:

1. LLMFullResponseStartFrame: Begin collecting tokens
2. LLMTextFrame: Accumulate text tokens into buffer
3. LLMFullResponseEndFrame: Process complete response and make decision
4. UserStartedSpeakingFrame/UserStoppedSpeakingFrame: Manage voicemail timing
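The aggregation steps can be sketched without Pipecat as follows. ResponseStart, TextToken, and ResponseEnd are hypothetical stand-ins for the LLM frame types, and the substring matching in classify is an assumption about how a decision could be extracted from the buffered text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseStart:
    """Stand-in for LLMFullResponseStartFrame."""


@dataclass
class TextToken:
    """Stand-in for LLMTextFrame."""

    text: str


@dataclass
class ResponseEnd:
    """Stand-in for LLMFullResponseEndFrame."""


def classify(response: str) -> Optional[str]:
    """Map a complete response to a decision, or None if unrecognized."""
    text = response.strip().upper()
    if "CONVERSATION" in text:
        return "CONVERSATION"
    if "VOICEMAIL" in text:
        return "VOICEMAIL"
    return None


class ResponseAggregator:
    """Hypothetical aggregator following the start/text/end sequence."""

    def __init__(self) -> None:
        self._collecting = False
        self._buffer = ""
        self.decision: Optional[str] = None

    def process(self, frame: object) -> None:
        if self.decision is not None:
            return  # a decision, once made, is final
        if isinstance(frame, ResponseStart):
            self._collecting = True
            self._buffer = ""
        elif isinstance(frame, TextToken) and self._collecting:
            self._buffer += frame.text
        elif isinstance(frame, ResponseEnd) and self._collecting:
            self._collecting = False
            self.decision = classify(self._buffer)
```

Buffering until the end frame means a word split across tokens, such as "VOICE" followed by "MAIL", is still recognized.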

Parameters:

frame – The frame to process.
direction – The direction of frame flow in the pipeline.

class pipecat.extensions.voicemail.voicemail_detector.TTSGate(conversation_notifier: BaseNotifier, voicemail_notifier: BaseNotifier)[source]

Bases: FrameProcessor

Gates TTS frames until voicemail classification decision is made.

This processor holds TTS output frames in a gate while the voicemail classification is in progress. This prevents audio from being played to the caller before determining if they’re human or a voicemail system.

The gate operates in two modes based on the classification result:

CONVERSATION: Opens the gate to release all held frames for normal dialogue
VOICEMAIL: Clears held frames since they’re not needed for voicemail

The gating only applies to TTS-related frames (TTSTextFrame, TTSAudioRawFrame). All other frames pass through immediately to maintain proper pipeline flow.
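A hold-then-release-or-clear gate can be sketched like this. HoldingGate is a hypothetical class that uses frame-kind strings instead of real frame objects, and it assumes that gating ends after either decision so that later TTS output passes through.

```python
from typing import List

# Frame kinds that are gated, as named in the description above.
TTS_KINDS = {"TTSTextFrame", "TTSAudioRawFrame"}


class HoldingGate:
    """Hypothetical TTS gate; returns what should be forwarded downstream."""

    def __init__(self) -> None:
        self._held: List[str] = []
        self._gating = True  # classification is in progress

    def process(self, frame_kind: str, payload: str) -> List[str]:
        if self._gating and frame_kind in TTS_KINDS:
            self._held.append(payload)  # hold TTS output for now
            return []
        return [payload]  # everything else passes immediately

    def on_conversation(self) -> List[str]:
        # Release held frames in their original order.
        self._gating = False
        released, self._held = self._held, []
        return released

    def on_voicemail(self) -> List[str]:
        # Drop held frames; they were meant for a live person.
        self._gating = False
        self._held.clear()
        return []
```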

__init__(conversation_notifier: BaseNotifier, voicemail_notifier: BaseNotifier)[source]

Initialize the TTS gate.

Parameters:

conversation_notifier – Notifier that signals when a conversation is detected and gated frames should be released for playback.
voicemail_notifier – Notifier that signals when voicemail is detected and gated frames should be cleared (not played).

async setup(setup: FrameProcessorSetup)[source]

Set up the processor with required components.

Parameters:

setup – Configuration object containing setup parameters.

async cleanup()[source]

Clean up the processor resources.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process frames and handle gating logic based on frame type.

TTS frames are gated while classification is active. All other frames pass through immediately. The gating state is controlled by the classification notifications.

Parameters:

frame – The frame to process.
direction – The direction of frame flow in the pipeline.

class pipecat.extensions.voicemail.voicemail_detector.VoicemailDetector(*, llm: LLMService, voicemail_response_delay: float = 2.0, custom_system_prompt: str | None = None)[source]

Bases: ParallelPipeline

Parallel pipeline for detecting voicemail vs. live conversation in outbound calls.

This detector uses a parallel pipeline architecture to perform real-time classification of outbound phone calls without interrupting the conversation flow. It determines whether a human answered the phone or if the call went to a voicemail system.

Architecture:

Conversation branch: Empty pipeline that allows normal frame flow
Classification branch: Contains the LLM classifier and decision logic

A classifier gate controls how long classification runs, and a separate TTS gate holds TTS output until classification is complete. Once a decision is made, the appropriate action is taken:

CONVERSATION: Continue normal bot dialogue
VOICEMAIL: Trigger developer event handler for custom voicemail handling

Example:

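import os

# Import paths may differ between Pipecat versions.
from pipecat.extensions.voicemail.voicemail_detector import VoicemailDetector
from pipecat.frames.frames import TTSSpeakFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.services.openai.llm import OpenAILLMService

# transport, stt, llm, tts and context_aggregator are assumed to be created elsewhere.
classification_llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))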
detector = VoicemailDetector(llm=classification_llm)

@detector.event_handler("on_voicemail_detected")
async def handle_voicemail(processor):
    await processor.push_frame(TTSSpeakFrame("Please leave a message."))

pipeline = Pipeline([
    transport.input(),
    stt,
    detector.detector(),          # Classification
    context_aggregator.user(),
    llm,
    tts,
    detector.gate(),              # TTS gating
    transport.output(),
    context_aggregator.assistant(),
])

# For custom prompts, append the required response instruction:
custom_prompt = "Your custom classification logic here. " + VoicemailDetector.CLASSIFIER_RESPONSE_INSTRUCTION

Events:

on_conversation_detected: Triggered when a human conversation is detected. The event handler receives one argument: the ClassificationProcessor instance, which can be used to push frames.
on_voicemail_detected: Triggered when voicemail is detected, after the configured delay. The event handler receives one argument: the ClassificationProcessor instance, which can be used to push frames.

Constants:

CLASSIFIER_RESPONSE_INSTRUCTION: The exact text that must be included in custom system prompts to ensure proper classification functionality.

CLASSIFIER_RESPONSE_INSTRUCTION = 'Respond with ONLY "CONVERSATION" if a person answered, or "VOICEMAIL" if it\'s voicemail/recording.'

DEFAULT_SYSTEM_PROMPT = 'You are a voicemail detection classifier for an OUTBOUND calling system. A bot has called a phone number and you need to determine if a human answered or if the call went to voicemail based on the provided text.\n\nHUMAN ANSWERED - LIVE CONVERSATION (respond "CONVERSATION"):\n- Personal greetings: "Hello?", "Hi", "Yeah?", "John speaking"\n- Interactive responses: "Who is this?", "What do you want?", "Can I help you?"\n- Conversational tone expecting back-and-forth dialogue\n- Questions directed at the caller: "Hello? Anyone there?"\n- Informal responses: "Yep", "What\'s up?", "Speaking"\n- Natural, spontaneous speech patterns\n- Immediate acknowledgment of the call\n\nVOICEMAIL SYSTEM (respond "VOICEMAIL"):\n- Automated voicemail greetings: "Hi, you\'ve reached [name], please leave a message"\n- Phone carrier messages: "The number you have dialed is not in service", "Please leave a message", "All circuits are busy"\n- Professional voicemail: "This is [name], I\'m not available right now"\n- Instructions about leaving messages: "leave a message", "leave your name and number"\n- References to callback or messaging: "call me back", "I\'ll get back to you"\n- Carrier system messages: "mailbox is full", "has not been set up"\n- Business hours messages: "our office is currently closed"\n\nRespond with ONLY "CONVERSATION" if a person answered, or "VOICEMAIL" if it\'s voicemail/recording.'

__init__(*, llm: LLMService, voicemail_response_delay: float = 2.0, custom_system_prompt: str | None = None)[source]

Initialize the voicemail detector with classification and buffering components.

Parameters:

llm – LLM service used for voicemail vs conversation classification. Should be fast and reliable for real-time classification.
voicemail_response_delay – Delay in seconds after the user stops speaking before the voicemail event handler is triggered. The delay helps ensure that the bot's response is played during the voicemail recording rather than before the greeting has finished. Default is 2.0 seconds.
custom_system_prompt – Optional custom system prompt for classification. If None, uses the default prompt optimized for outbound calling scenarios. Custom prompts should instruct the LLM to respond with exactly “CONVERSATION” or “VOICEMAIL” for proper detection functionality.
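One way to keep a custom prompt compatible with the classifier is to append the required instruction programmatically. In this sketch the constant is copied from the value listed above so the block runs without Pipecat installed; in real code it would be read from VoicemailDetector.CLASSIFIER_RESPONSE_INSTRUCTION. The helper build_custom_prompt is a hypothetical name.

```python
# Copied from the documented constant so this sketch is self-contained.
CLASSIFIER_RESPONSE_INSTRUCTION = (
    'Respond with ONLY "CONVERSATION" if a person answered, '
    'or "VOICEMAIL" if it\'s voicemail/recording.'
)


def build_custom_prompt(rules: str) -> str:
    """Append the required response instruction to custom rules."""
    return rules.rstrip() + "\n\n" + CLASSIFIER_RESPONSE_INSTRUCTION


custom_prompt = build_custom_prompt(
    "You classify the first words heard on an outbound call. "
    "Treat automated phone menus as voicemail."
)
```

The resulting string would then be passed as custom_system_prompt when constructing the detector.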

detector() → VoicemailDetector[source]

Get the detector pipeline for placement after STT in the main pipeline.

This should be placed after the STT service and before the context aggregator in your main pipeline to enable voicemail classification.

Returns:

The VoicemailDetector instance itself (which is a ParallelPipeline).

gate() → TTSGate[source]

Get the gate processor for placement after TTS in the main pipeline.

This should be placed after the TTS service and before the transport output to enable TTS frame gating during classification.

Returns:

The TTSGate processor instance.

add_event_handler(event_name: str, handler)[source]

Add an event handler for voicemail detection events.

Parameters:

event_name – The name of the event to handle.
handler – The function to call when the event occurs.