video

HeyGen implementation for Pipecat.

This module integrates Pipecat with the HeyGen platform for building conversational AI applications with avatars. It manages conversation sessions and streams audio and video in real time through the HeyGen API.

class pipecat.services.heygen.video.HeyGenVideoSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>)[source]

Bases: ServiceSettings

Settings for the HeyGen video service.
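The `<factory>` markers in the signature above indicate that each instance receives freshly built default values rather than a shared one. A minimal sketch of that pattern follows; `VideoSettingsSketch` is a hypothetical stand-in, not the Pipecat class, and it simplifies the `_NotGiven` sentinel to `None`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class VideoSettingsSketch:
    """Hypothetical stand-in that mirrors the two documented fields."""

    # The real class uses a _NotGiven sentinel; None is a simplification here.
    model: Optional[str] = None
    # default_factory builds a new dict per instance, so instances never
    # share (and accidentally mutate) the same default object.
    extra: Dict[str, Any] = field(default_factory=dict)


first = VideoSettingsSketch()
second = VideoSettingsSketch()
first.extra["quality"] = "high"  # "quality" is an illustrative key only
```

After this runs, only `first.extra` contains the new key; `second.extra` is still empty.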

class pipecat.services.heygen.video.HeyGenVideoService(*, api_key: str, session: ClientSession, session_request: LiveAvatarNewSessionRequest | NewSessionRequest | None = None, service_type: ServiceType | None = None, settings: HeyGenVideoSettings | None = None, **kwargs)[source]

Bases: AIService

A service that integrates HeyGen’s interactive avatar capabilities into the pipeline.

This service manages the lifecycle of a HeyGen avatar session by handling bidirectional audio/video streaming, avatar animations, and user interactions. It processes various frame types to coordinate the avatar’s behavior and maintains synchronization between audio and video streams.

The service supports:

  • Real-time avatar animation based on audio input

  • Voice activity detection for natural interactions

  • Interrupt handling for more natural conversations

  • Audio resampling for optimal quality

  • Automatic session management

Parameters:
  • api_key (str) – HeyGen API key for authentication

  • session (aiohttp.ClientSession) – HTTP client session for API requests

  • session_request (LiveAvatarNewSessionRequest | NewSessionRequest, optional) – Configuration for the HeyGen session. Defaults to using the “Shawn_Therapist_public” avatar with “v2” version.

Settings

alias of HeyGenVideoSettings

__init__(*, api_key: str, session: ClientSession, session_request: LiveAvatarNewSessionRequest | NewSessionRequest | None = None, service_type: ServiceType | None = None, settings: HeyGenVideoSettings | None = None, **kwargs) → None[source]

Initialize the HeyGen video service.

Parameters:
  • api_key – HeyGen API key for authentication

  • session – HTTP client session for API requests

  • session_request – Configuration for the HeyGen session

  • service_type – Service type for the avatar session

  • settings – Runtime-updatable settings. HeyGen has no model concept, so this is primarily used for the extra dict.

  • **kwargs – Additional arguments passed to parent AIService
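As a rough illustration of the constructor documented above, the helper below builds the service from an existing `aiohttp.ClientSession`. The helper name `make_heygen_service` and the `HEYGEN_API_KEY` environment variable are assumptions made for this sketch; only the `api_key` and `session` keyword arguments come from the signature.

```python
import os


def make_heygen_service(session, api_key=None):
    """Build a HeyGenVideoService from an open aiohttp.ClientSession.

    Hypothetical helper: the HEYGEN_API_KEY variable name is an assumption.
    """
    key = api_key or os.environ.get("HEYGEN_API_KEY")
    if not key:
        raise ValueError("A HeyGen API key is required")

    # Imported lazily so this module can be loaded without pipecat installed.
    from pipecat.services.heygen.video import HeyGenVideoService

    # session_request, service_type and settings are left at their defaults.
    return HeyGenVideoService(api_key=key, session=session)
```

The caller would typically create the session with `async with aiohttp.ClientSession() as session:` and keep it open for as long as the service is in use.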

async setup(setup: FrameProcessorSetup)[source]

Set up the HeyGen video service with necessary configuration.

Initializes the HeyGen client, establishes connections, and prepares the service for audio/video processing. This includes setting up audio/video streams, configuring callbacks, and initializing the resampler.

Parameters:

setup – Configuration parameters for the frame processor.
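The resampler prepared during setup converts audio between sample rates. The sketch below shows the general idea with plain linear interpolation over 16-bit mono PCM. It is not the resampler Pipecat uses, and the 16000 Hz and 24000 Hz rates in the usage lines are illustrative assumptions.

```python
from array import array


def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Resample 16-bit mono PCM by linear interpolation (illustrative only)."""
    samples = array("h")  # native-endian signed 16-bit samples
    samples.frombytes(pcm)
    if src_rate == dst_rate or len(samples) == 0:
        return pcm

    out_len = int(len(samples) * dst_rate / src_rate)
    out = array("h")
    for i in range(out_len):
        # Position of output sample i on the input time axis.
        pos = i * src_rate / dst_rate
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        value = samples[left] * (1 - frac) + samples[right] * frac
        out.append(int(round(value)))
    return out.tobytes()


# 10 ms of silence at 16 kHz (160 samples), converted to 24 kHz.
silence_16k = array("h", [0] * 160).tobytes()
silence_24k = resample_pcm16(silence_16k, 16000, 24000)
```

A production resampler would also apply low-pass filtering to avoid aliasing; this sketch only shows how sample counts and positions map between rates.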

async cleanup()[source]

Clean up the service and release resources.

Terminates the HeyGen client session and cleans up associated resources.

async start(frame: StartFrame)[source]

Start the HeyGen video service and initialize the avatar session.

Creates necessary tasks for audio/video processing and establishes the connection with the HeyGen service.

Parameters:

frame – The start frame containing initialization parameters.

async stop(frame: EndFrame)[source]

Stop the HeyGen video service gracefully.

Performs cleanup by ending the conversation and cancelling ongoing tasks in a controlled manner.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame)[source]

Cancel the HeyGen video service.

Performs an immediate termination of the service, cleaning up resources without waiting for ongoing operations to complete.

Parameters:

frame – The cancel frame.

async process_frame(frame: Frame, direction: FrameDirection)[source]

Process incoming frames and coordinate avatar behavior.

Handles different types of frames to manage avatar interactions:

  • UserStartedSpeakingFrame: Activates the avatar’s listening animation

  • UserStoppedSpeakingFrame: Deactivates the avatar’s listening state

  • TTSAudioRawFrame: Processes audio for avatar speech

  • Other frames: Forwards them through the pipeline

Parameters:
  • frame – The frame to be processed.

  • direction – The direction of frame processing (input/output).
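The frame handling described for process_frame amounts to a type-based dispatch. The sketch below uses locally defined stand-in classes that reuse the documented frame names; `route_frame` and the action labels it returns are hypothetical, and the real method performs the avatar calls instead of returning strings.

```python
class Frame:
    """Stand-in base class; not the Pipecat Frame."""


class UserStartedSpeakingFrame(Frame):
    pass


class UserStoppedSpeakingFrame(Frame):
    pass


class TTSAudioRawFrame(Frame):
    def __init__(self, audio: bytes = b""):
        self.audio = audio


class TextFrame(Frame):
    """Example of a frame type the service does not handle specially."""


def route_frame(frame: Frame) -> str:
    """Return a label for the action the documented dispatch would take."""
    if isinstance(frame, UserStartedSpeakingFrame):
        return "start_listening"
    if isinstance(frame, UserStoppedSpeakingFrame):
        return "stop_listening"
    if isinstance(frame, TTSAudioRawFrame):
        return "send_audio_to_avatar"
    # Anything else passes through the pipeline untouched.
    return "forward"


actions = [
    route_frame(f)
    for f in (
        UserStartedSpeakingFrame(),
        TTSAudioRawFrame(b"\x00\x00"),
        UserStoppedSpeakingFrame(),
        TextFrame(),
    )
]
```

Whether the real service also forwards the frames it handles is not stated in this reference, so the sketch reports a single action per frame.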

can_generate_metrics() → bool[source]

Check if the service can generate metrics.

Returns:

True if metrics generation is supported.