client

HeyGen implementation for Pipecat.

This module provides integration with the HeyGen platform for creating conversational AI applications with avatars. It manages conversation sessions and provides real-time audio/video streaming capabilities through the HeyGen API.

class pipecat.services.heygen.client.ServiceType(*values)[source]

Bases: Enum

Enum for HeyGen service types.

INTERACTIVE_AVATAR = 'INTERACTIVE_AVATAR'
LIVE_AVATAR = 'LIVE_AVATAR'
class pipecat.services.heygen.client.HeyGenCallbacks(*, on_connected: Callable[[], Awaitable[None]], on_participant_connected: Callable[[str], Awaitable[None]], on_participant_disconnected: Callable[[str], Awaitable[None]])[source]

Bases: BaseModel

Callback handlers for HeyGen events.

Parameters:
  • on_connected – Called when the bot connects to the LiveKit room.

  • on_participant_connected – Called when a participant connects.

  • on_participant_disconnected – Called when a participant disconnects.

on_connected: Callable[[], Awaitable[None]]
on_participant_connected: Callable[[str], Awaitable[None]]
on_participant_disconnected: Callable[[str], Awaitable[None]]
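As a minimal sketch of handlers that satisfy these signatures, the three coroutines below each return None and record what they saw. The handler names and the events list are illustrative and not part of Pipecat, and the str argument is assumed to be a participant identifier. The closing comment shows how the handlers would presumably be passed to HeyGenCallbacks as keyword arguments.

```python
import asyncio

# Illustrative event log; not part of Pipecat.
events: list[str] = []


async def handle_connected() -> None:
    # Matches Callable[[], Awaitable[None]].
    events.append("connected")


async def handle_participant_connected(participant_id: str) -> None:
    # Matches Callable[[str], Awaitable[None]]; the argument is assumed
    # to be a participant identifier.
    events.append(f"joined:{participant_id}")


async def handle_participant_disconnected(participant_id: str) -> None:
    events.append(f"left:{participant_id}")


# Keyword arguments named after the documented fields.
callback_kwargs = {
    "on_connected": handle_connected,
    "on_participant_connected": handle_participant_connected,
    "on_participant_disconnected": handle_participant_disconnected,
}

# With Pipecat installed, these would be passed as:
#   callbacks = HeyGenCallbacks(**callback_kwargs)
```

All three fields are required keyword-only arguments in the signature above, so a handler that does nothing still has to be supplied as an async function.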
class pipecat.services.heygen.client.HeyGenClient(*, api_key: str, session: ClientSession, params: TransportParams, session_request: LiveAvatarNewSessionRequest | NewSessionRequest | None = None, service_type: ServiceType | None = None, callbacks: HeyGenCallbacks, connect_as_user: bool = False)[source]

Bases: object

A client for interacting with HeyGen’s Interactive Avatar Realtime API.

This client manages both WebSocket and LiveKit connections for real-time avatar streaming, handling bi-directional audio/video communication and avatar control. It implements the API defined in https://docs.heygen.com/docs/interactive-avatar-realtime-api

The client manages the following connections:

  1. WebSocket connection for avatar control and audio streaming

  2. LiveKit connection for receiving avatar video and audio

Parameters:

HEY_GEN_SAMPLE_RATE (int) – The required sample rate for HeyGen’s audio processing (24000 Hz)
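To make the 24000 Hz figure concrete, the sketch below computes how many bytes a chunk of audio occupies at that rate. The 16-bit sample width and mono channel count are assumptions for illustration, since the reference above states only the sample rate. The helper pcm_chunk_size is hypothetical and not part of Pipecat.

```python
HEY_GEN_SAMPLE_RATE = 24000  # Hz, as documented above


def pcm_chunk_size(
    duration_ms: int,
    sample_rate: int = HEY_GEN_SAMPLE_RATE,
    sample_width: int = 2,  # assumption: 16-bit samples
    channels: int = 1,      # assumption: mono
) -> int:
    """Return the byte length of duration_ms of uncompressed PCM audio."""
    samples = sample_rate * duration_ms // 1000
    return samples * sample_width * channels


# 20 ms at 24 kHz is 480 samples, which is 960 bytes of 16-bit mono PCM.
chunk_bytes = pcm_chunk_size(20)
```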

__init__(*, api_key: str, session: ClientSession, params: TransportParams, session_request: LiveAvatarNewSessionRequest | NewSessionRequest | None = None, service_type: ServiceType | None = None, callbacks: HeyGenCallbacks, connect_as_user: bool = False) None[source]

Initialize the HeyGen client.

Parameters:
  • api_key – HeyGen API key for authentication

  • session – HTTP client session for API requests

  • params – Transport configuration parameters

  • session_request – Configuration for the HeyGen session (optional)

  • service_type – Type of HeyGen service to use, given as a ServiceType value (optional)

  • callbacks – Callback handlers for HeyGen events

  • connect_as_user – Whether to connect using the user token (default: False)

async setup(setup: FrameProcessorSetup) None[source]

Set up the client and initialize the conversation.

Establishes a new session with HeyGen’s API if one doesn’t exist.

Parameters:

setup – The frame processor setup configuration.

async cleanup() None[source]

Clean up client resources.

Closes the active HeyGen session and resets internal state.

async start(frame: StartFrame) None[source]

Start the client and establish all necessary connections.

Initializes WebSocket and LiveKit connections using the provided configuration. Sets up audio processing with the specified sample rates.

Parameters:

frame – Initial configuration frame containing audio parameters

async stop() None[source]

Stop the client and terminate all connections.

Disconnects from WebSocket and LiveKit endpoints, and performs cleanup.

async interrupt(event_id: str) None[source]

Interrupt the avatar’s current action.

Stops the current animation/speech and returns the avatar to idle state. Useful for handling user interruptions during avatar speech.

async start_agent_listening() None[source]

Start the avatar’s listening animation.

Triggers visual cues indicating the avatar is listening to user input.

async stop_agent_listening() None[source]

Stop the avatar’s listening animation.

Returns the avatar to idle state from listening state.
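One way interrupt and the two listening methods might be combined for turn-taking is sketched below. FakeAvatarClient is a stand-in that only records calls under the documented method names; it is not the real HeyGenClient. The two handler functions and the order in which they call the client are assumptions, not documented behavior.

```python
import asyncio


class FakeAvatarClient:
    """Stand-in exposing only the three documented method names."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    async def interrupt(self, event_id: str) -> None:
        self.calls.append(("interrupt", event_id))

    async def start_agent_listening(self) -> None:
        self.calls.append(("start_agent_listening",))

    async def stop_agent_listening(self) -> None:
        self.calls.append(("stop_agent_listening",))


async def on_user_started_speaking(client, event_id: str) -> None:
    # Hypothetical handler: cut off the avatar's current speech,
    # then show the listening animation.
    await client.interrupt(event_id)
    await client.start_agent_listening()


async def on_user_stopped_speaking(client) -> None:
    # Hypothetical handler: return the avatar to idle.
    await client.stop_agent_listening()
```

Because the handlers rely only on method names, the same functions could in principle be given a real client instance in place of the fake.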

transport_ready() None[source]

Indicates that the output transport is ready and able to receive frames.

property out_sample_rate: int

Get the output sample rate.

Returns:

The output sample rate in Hz.

property in_sample_rate: int

Get the input sample rate.

Returns:

The input sample rate in Hz.

async agent_speak(audio: bytes, event_id: str) None[source]

Send audio data for the agent to speak.

Parameters:
  • audio – Audio data as raw bytes (will be base64 encoded)

  • event_id – Unique identifier for the event
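The note that the audio "will be base64 encoded" can be illustrated with the standard library alone. The sketch below shows the encoding step and its inverse; the helper names are hypothetical, and the message format the client actually sends is not shown because it is not described here.

```python
import base64


def encode_audio(audio: bytes) -> str:
    """Base64-encode raw audio bytes into an ASCII string."""
    return base64.b64encode(audio).decode("ascii")


def decode_audio(encoded: str) -> bytes:
    """Inverse of encode_audio, useful for checking a round trip."""
    return base64.b64decode(encoded)


raw = bytes(range(8))
assert decode_audio(encode_audio(raw)) == raw
```

Base64 turns every 3 input bytes into 4 output characters, so the encoded payload is roughly a third larger than the raw audio.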

async agent_speak_end(event_id: str) None[source]

Signal that the agent has finished speaking.

Parameters:

event_id – Unique identifier for the event
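A plausible calling pattern for agent_speak and agent_speak_end is sketched below: send each audio chunk, then signal the end under one event_id. FakeSpeakClient is a stand-in that records calls and is not the real client. Reusing a single identifier for the whole utterance, and generating it with uuid4, are assumptions made for illustration.

```python
import asyncio
import uuid


class FakeSpeakClient:
    """Stand-in exposing the two documented method names."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    async def agent_speak(self, audio: bytes, event_id: str) -> None:
        self.calls.append(("agent_speak", len(audio), event_id))

    async def agent_speak_end(self, event_id: str) -> None:
        self.calls.append(("agent_speak_end", event_id))


async def speak_utterance(client, chunks, event_id=None) -> str:
    """Hypothetical helper: stream chunks, then signal the end."""
    # Assumption: one event_id identifies the whole utterance.
    event_id = event_id or str(uuid.uuid4())
    for chunk in chunks:
        await client.agent_speak(chunk, event_id)
    await client.agent_speak_end(event_id)
    return event_id
```

Returning the event_id lets the caller pass it on later, for example to interrupt.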

async capture_participant_audio(participant_id: str, callback) None[source]

Capture audio frames from the HeyGen avatar.

Parameters:
  • participant_id – Identifier of the participant to capture audio from

  • callback – Async function to handle received audio frames
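The callback parameter is untyped, so the sketch below assumes it is an async function that receives one frame per call. Under that assumption, it builds a callback that places frames on a bounded asyncio.Queue, so that a separate consumer can process them without stalling capture. The helper make_frame_collector is hypothetical and not part of Pipecat.

```python
import asyncio


def make_frame_collector(maxsize: int = 100):
    """Return (queue, callback); the callback enqueues each frame it gets."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    async def on_frame(frame) -> None:
        # Assumption: the callback is awaited with a single frame argument.
        # When the queue is full, drop the oldest frame instead of blocking.
        if queue.full():
            queue.get_nowait()
        await queue.put(frame)

    return queue, on_frame
```

The collector is best created inside a running event loop. The returned on_frame would then be passed as the callback argument, and the same pattern would apply to capture_participant_video.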

async capture_participant_video(participant_id: str, callback) None[source]

Capture video frames from the HeyGen avatar.

Parameters:
  • participant_id – Identifier of the participant to capture video from

  • callback – Async function to handle received video frames