events

Event models and data structures for OpenAI Realtime API communication.

class pipecat.services.openai.realtime.events.AudioFormat(*, type: str)[source]

Bases: BaseModel

Base class for audio format configuration.

type: str

class pipecat.services.openai.realtime.events.PCMAudioFormat(*, type: Literal['audio/pcm'] = 'audio/pcm', rate: Literal[24000] = 24000)[source]

Bases: AudioFormat

PCM audio format configuration.

Parameters:

type – Audio format type, always “audio/pcm”.
rate – Sample rate, always 24000 for PCM.

type: Literal['audio/pcm']

rate: Literal[24000]

class pipecat.services.openai.realtime.events.PCMUAudioFormat(*, type: Literal['audio/pcmu'] = 'audio/pcmu')[source]

Bases: AudioFormat

PCMU (G.711 μ-law) audio format configuration.

Parameters:: type – Audio format type, always “audio/pcmu”.

type: Literal['audio/pcmu']

class pipecat.services.openai.realtime.events.PCMAAudioFormat(*, type: Literal['audio/pcma'] = 'audio/pcma')[source]

Bases: AudioFormat

PCMA (G.711 A-law) audio format configuration.

Parameters:: type – Audio format type, always “audio/pcma”.

type: Literal['audio/pcma']

class pipecat.services.openai.realtime.events.InputAudioTranscription(model: str | None = 'gpt-4o-transcribe', language: str | None = None, prompt: str | None = None)[source]

Bases: BaseModel

Configuration for audio transcription settings.

model: str

language: str | None

prompt: str | None

__init__(model: str | None = 'gpt-4o-transcribe', language: str | None = None, prompt: str | None = None)[source]

Initialize InputAudioTranscription.

Parameters:

model – Transcription model to use (e.g., “gpt-4o-transcribe”, “whisper-1”).
language – Optional language code for transcription.
prompt – Optional transcription hint text.

class pipecat.services.openai.realtime.events.TurnDetection(*, type: Literal['server_vad'] | None = 'server_vad', threshold: float | None = 0.5, prefix_padding_ms: int | None = 300, silence_duration_ms: int | None = 500)[source]

Bases: BaseModel

Server-side voice activity detection configuration.

Parameters:

type – Detection type, must be “server_vad”.
threshold – Voice activity detection threshold (0.0-1.0). Defaults to 0.5.
prefix_padding_ms – Padding before speech starts in milliseconds. Defaults to 300.
silence_duration_ms – Silence duration to detect speech end in milliseconds. Defaults to 500.

type: Literal['server_vad'] | None

threshold: float | None

prefix_padding_ms: int | None

silence_duration_ms: int | None

class pipecat.services.openai.realtime.events.SemanticTurnDetection(*, type: Literal['semantic_vad'] | None = 'semantic_vad', eagerness: Literal['low', 'medium', 'high', 'auto'] | None = None, create_response: bool | None = None, interrupt_response: bool | None = None)[source]

Bases: BaseModel

Semantic-based turn detection configuration.

Parameters:

type – Detection type, must be “semantic_vad”.
eagerness – Turn detection eagerness level. Can be “low”, “medium”, “high”, or “auto”.
create_response – Whether to automatically create responses on turn detection.
interrupt_response – Whether to interrupt ongoing responses on turn detection.

type: Literal['semantic_vad'] | None

eagerness: Literal['low', 'medium', 'high', 'auto'] | None

create_response: bool | None

interrupt_response: bool | None

class pipecat.services.openai.realtime.events.InputAudioNoiseReduction(*, type: Literal['near_field', 'far_field'] | None)[source]

Bases: BaseModel

Input audio noise reduction configuration.

Parameters:: type – Noise reduction type for different microphone scenarios.

type: Literal['near_field', 'far_field'] | None

Bases: BaseModel

Audio input configuration.

Parameters:

format – The format of the input audio.
transcription – Configuration for input audio transcription.
noise_reduction – Configuration for input audio noise reduction.
turn_detection – Configuration for turn detection, or False to disable.

format: PCMAudioFormat | PCMUAudioFormat | PCMAAudioFormat | None

transcription: InputAudioTranscription | None

noise_reduction: InputAudioNoiseReduction | None

turn_detection: TurnDetection | SemanticTurnDetection | bool | None

Bases: BaseModel

Audio output configuration.

Parameters:

format – The format of the output audio.
voice – The voice the model uses to respond.
speed – The speed of the model’s spoken response.

format: PCMAudioFormat | PCMUAudioFormat | PCMAAudioFormat | None

voice: str | None

speed: float | None

class pipecat.services.openai.realtime.events.AudioConfiguration(*, input: AudioInput | None = None, output: AudioOutput | None = None)[source]

Bases: BaseModel

Audio configuration for input and output.

Parameters:

input – Configuration for input audio.
output – Configuration for output audio.

input: AudioInput | None

output: AudioOutput | None

Bases: BaseModel

Configuration properties for an OpenAI Realtime session.

Parameters:

type – The type of session, always “realtime”.
object – Object type identifier, always “realtime.session”.
id – Unique identifier for the session.
model – The Realtime model used for this session. Note: The model is set at connection time via model arg in __init__ and cannot be changed during the session.
output_modalities – The set of modalities the model can respond with.
instructions – System instructions for the assistant.
audio – Configuration for input and output audio.
tools – Available function tools for the assistant.
tool_choice – Tool usage strategy (“auto”, “none”, or “required”).
max_output_tokens – Maximum tokens in response or “inf” for unlimited.
tracing – Configuration options for tracing.
prompt – Reference to a prompt template and its variables.
expires_at – Session expiration timestamp.
include – Additional fields to include in server outputs.

type: Literal['realtime'] | None

object: Literal['realtime.session'] | None

id: str | None

model: str | None

output_modalities: list[Literal['text', 'audio']] | None

instructions: str | None

audio: AudioConfiguration | None

tools: ToolsSchema | list[dict] | None

tool_choice: Literal['auto', 'none', 'required'] | None

max_output_tokens: int | Literal['inf'] | None

tracing: Literal['auto'] | dict | None

prompt: dict | None

expires_at: int | None

include: list[str] | None

class pipecat.services.openai.realtime.events.ItemContent(*, type: Literal['text', 'audio', 'input_text', 'input_audio', 'input_image', 'output_text', 'output_audio'], text: str | None = None, audio: str | None = None, transcript: str | None = None, image_url: str | None = None, detail: Literal['auto', 'low', 'high'] | None = None)[source]

Bases: BaseModel

Content within a conversation item.

Parameters:

type – Content type (text, audio, input_text, input_audio, input_image, output_text, or output_audio).
text – Text content for text-based items.
audio – Base64-encoded audio data for audio items.
transcript – Transcribed text for audio items.
image_url – Base64-encoded image data as a data URI for input_image items.
detail – Detail level for image processing (“auto”, “low”, or “high”).

type: Literal['text', 'audio', 'input_text', 'input_audio', 'input_image', 'output_text', 'output_audio']

text: str | None

audio: str | None

transcript: str | None

image_url: str | None

detail: Literal['auto', 'low', 'high'] | None

class pipecat.services.openai.realtime.events.ConversationItem(*, id: str = <factory>, object: ~typing.Literal['realtime.item'] | None = None, type: ~typing.Literal['message', 'function_call', 'function_call_output'], status: ~typing.Literal['completed', 'in_progress', 'incomplete'] | None = None, role: ~typing.Literal['user', 'assistant', 'system'] | None = None, content: list[~pipecat.services.openai.realtime.events.ItemContent] | None = None, call_id: str | None = None, name: str | None = None, arguments: str | None = None, output: str | None = None)[source]

Bases: BaseModel

A conversation item in the realtime session.

Parameters:

id – Unique identifier for the item, auto-generated if not provided.
object – Object type identifier for the realtime API.
type – Item type (message, function_call, or function_call_output).
status – Current status of the item.
role – Speaker role for message items (user, assistant, or system).
content – Content list for message items.
call_id – Function call identifier for function_call items.
name – Function name for function_call items.
arguments – Function arguments as JSON string for function_call items.
output – Function output as JSON string for function_call_output items.

id: str

object: Literal['realtime.item'] | None

type: Literal['message', 'function_call', 'function_call_output']

status: Literal['completed', 'in_progress', 'incomplete'] | None

role: Literal['user', 'assistant', 'system'] | None

content: list[ItemContent] | None

call_id: str | None

name: str | None

arguments: str | None

output: str | None

class pipecat.services.openai.realtime.events.RealtimeConversation(*, id: str, object: Literal['realtime.conversation'])[source]

Bases: BaseModel

A realtime conversation session.

Parameters:

id – Unique identifier for the conversation.
object – Object type identifier, always “realtime.conversation”.

id: str

object: Literal['realtime.conversation']

class pipecat.services.openai.realtime.events.ResponseProperties(*, output_modalities: list[Literal['text', 'audio']] | None = ['audio'], instructions: str | None = None, audio: AudioConfiguration | None = None, tools: list[dict] | None = None, tool_choice: Literal['auto', 'none', 'required'] | None = None, temperature: float | None = None, max_output_tokens: int | Literal['inf'] | None = None)[source]

Bases: BaseModel

Properties for configuring assistant responses.

Parameters:

output_modalities – Output modalities for the response. Must be either [“text”] or [“audio”]. Defaults to [“audio”].
instructions – Specific instructions for this response.
audio – Audio configuration for this response.
tools – Available tools for this response.
tool_choice – Tool usage strategy for this response.
temperature – Sampling temperature for this response.
max_output_tokens – Maximum tokens for this response.

output_modalities: list[Literal['text', 'audio']] | None

instructions: str | None

audio: AudioConfiguration | None

tools: list[dict] | None

tool_choice: Literal['auto', 'none', 'required'] | None

temperature: float | None

max_output_tokens: int | Literal['inf'] | None

class pipecat.services.openai.realtime.events.RealtimeError(*, type: str, code: str | None = '', message: str, param: str | None = None, event_id: str | None = None)[source]

Bases: BaseModel

Error information from the realtime API.

Parameters:

type – Error type identifier.
code – Specific error code.
message – Human-readable error message.
param – Parameter name that caused the error, if applicable.
event_id – Event ID associated with the error, if applicable.

type: str

code: str | None

message: str

param: str | None

event_id: str | None

class pipecat.services.openai.realtime.events.ClientEvent(*, event_id: str = <factory>)[source]

Bases: BaseModel

Base class for client events sent to the realtime API.

Parameters:: event_id – Unique identifier for the event, auto-generated if not provided.

event_id: str

class pipecat.services.openai.realtime.events.SessionUpdateEvent(*, event_id: str = <factory>, type: Literal['session.update'] = 'session.update', session: SessionProperties)[source]

Bases: ClientEvent

Event to update session properties.

Parameters:

type – Event type, always “session.update”.
session – Updated session properties.

type: Literal['session.update']

session: SessionProperties

model_dump(*args, **kwargs) → dict[str, Any][source]

Serialize the event to a dictionary.

Handles special serialization for turn_detection where False becomes null.

Parameters:

*args – Positional arguments passed to parent model_dump.
**kwargs – Keyword arguments passed to parent model_dump.

Returns:

Dictionary representation of the event.

class pipecat.services.openai.realtime.events.InputAudioBufferAppendEvent(*, event_id: str = <factory>, type: Literal['input_audio_buffer.append'] = 'input_audio_buffer.append', audio: str)[source]

Bases: ClientEvent

Event to append audio data to the input buffer.

Parameters:

type – Event type, always “input_audio_buffer.append”.
audio – Base64-encoded audio data to append.

type: Literal['input_audio_buffer.append']

audio: str

class pipecat.services.openai.realtime.events.InputAudioBufferCommitEvent(*, event_id: str = <factory>, type: Literal['input_audio_buffer.commit'] = 'input_audio_buffer.commit')[source]

Bases: ClientEvent

Event to commit the current input audio buffer.

Parameters:: type – Event type, always “input_audio_buffer.commit”.

type: Literal['input_audio_buffer.commit']

class pipecat.services.openai.realtime.events.InputAudioBufferClearEvent(*, event_id: str = <factory>, type: Literal['input_audio_buffer.clear'] = 'input_audio_buffer.clear')[source]

Bases: ClientEvent

Event to clear the input audio buffer.

Parameters:: type – Event type, always “input_audio_buffer.clear”.

type: Literal['input_audio_buffer.clear']

class pipecat.services.openai.realtime.events.ConversationItemCreateEvent(*, event_id: str = <factory>, type: Literal['conversation.item.create'] = 'conversation.item.create', previous_item_id: str | None = None, item: ConversationItem)[source]

Bases: ClientEvent

Event to create a new conversation item.

Parameters:

type – Event type, always “conversation.item.create”.
previous_item_id – ID of the item to insert after, if any.
item – The conversation item to create.

type: Literal['conversation.item.create']

previous_item_id: str | None

item: ConversationItem

class pipecat.services.openai.realtime.events.ConversationItemTruncateEvent(*, event_id: str = <factory>, type: Literal['conversation.item.truncate'] = 'conversation.item.truncate', item_id: str, content_index: int, audio_end_ms: int)[source]

Bases: ClientEvent

Event to truncate a conversation item’s audio content.

Parameters:

type – Event type, always “conversation.item.truncate”.
item_id – ID of the item to truncate.
content_index – Index of the content to truncate within the item.
audio_end_ms – End time in milliseconds for the truncated audio.

type: Literal['conversation.item.truncate']

item_id: str

content_index: int

audio_end_ms: int

class pipecat.services.openai.realtime.events.ConversationItemDeleteEvent(*, event_id: str = <factory>, type: Literal['conversation.item.delete'] = 'conversation.item.delete', item_id: str)[source]

Bases: ClientEvent

Event to delete a conversation item.

Parameters:

type – Event type, always “conversation.item.delete”.
item_id – ID of the item to delete.

type: Literal['conversation.item.delete']

item_id: str

class pipecat.services.openai.realtime.events.ConversationItemRetrieveEvent(*, event_id: str = <factory>, type: Literal['conversation.item.retrieve'] = 'conversation.item.retrieve', item_id: str)[source]

Bases: ClientEvent

Event to retrieve a conversation item by ID.

Parameters:

type – Event type, always “conversation.item.retrieve”.
item_id – ID of the item to retrieve.

type: Literal['conversation.item.retrieve']

item_id: str

class pipecat.services.openai.realtime.events.ResponseCreateEvent(*, event_id: str = <factory>, type: Literal['response.create'] = 'response.create', response: ResponseProperties | None = None)[source]

Bases: ClientEvent

Event to create a new assistant response.

Parameters:

type – Event type, always “response.create”.
response – Optional response configuration properties.

type: Literal['response.create']

response: ResponseProperties | None

class pipecat.services.openai.realtime.events.ResponseCancelEvent(*, event_id: str = <factory>, type: Literal['response.cancel'] = 'response.cancel')[source]

Bases: ClientEvent

Event to cancel the current assistant response.

Parameters:: type – Event type, always “response.cancel”.

type: Literal['response.cancel']

class pipecat.services.openai.realtime.events.ServerEvent(*, event_id: str, type: str)[source]

Bases: BaseModel

Base class for server events received from the realtime API.

Parameters:

event_id – Unique identifier for the event.
type – Type of the server event.

event_id: str

type: str

class pipecat.services.openai.realtime.events.SessionCreatedEvent(*, event_id: str, type: Literal['session.created'], session: SessionProperties)[source]