tts
OpenAI text-to-speech service implementation.
This module provides integration with OpenAI’s text-to-speech API for generating high-quality synthetic speech from text input.
- class pipecat.services.openai.tts.OpenAITTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, instructions: str | None | _NotGiven = <factory>, speed: float | None | _NotGiven = <factory>)[source]
Bases: TTSSettings
Settings for OpenAITTSService.
- Parameters:
instructions – Instructions to guide voice synthesis behavior.
speed – Voice speed control (0.25 to 4.0, default 1.0).
- instructions: str | None | _NotGiven
- speed: float | None | _NotGiven
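The documented range for speed (0.25 to 4.0, default 1.0) can be checked before a value is placed in the settings. The reference does not say whether the service clamps or rejects out-of-range values, so the sketch below is only one possible approach. `normalize_speed` and the three constants are hypothetical names and are not part of pipecat.

```python
from typing import Optional

# Bounds and default copied from the documented `speed` parameter.
MIN_SPEED = 0.25
MAX_SPEED = 4.0
DEFAULT_SPEED = 1.0


def normalize_speed(speed: Optional[float]) -> float:
    """Hypothetical helper: return a speed inside the documented range.

    None falls back to the documented default; values outside the
    range are clamped to the nearest bound (an assumption, see above).
    """
    if speed is None:
        return DEFAULT_SPEED
    return max(MIN_SPEED, min(MAX_SPEED, speed))
```

The result could then be passed as the speed field when constructing the settings object.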
- class pipecat.services.openai.tts.OpenAITTSService(*, api_key: str | None = None, base_url: str | None = None, voice: str | None = None, model: str | None = None, sample_rate: int | None = None, instructions: str | None = None, speed: float | None = None, params: InputParams | None = None, settings: OpenAITTSSettings | None = None, **kwargs)[source]
Bases: TTSService
OpenAI Text-to-Speech service that generates audio from text.
This service uses the OpenAI TTS API to generate PCM-encoded audio at 24 kHz. It supports multiple voice models, configurable synthesis parameters, and streaming audio output.
- Settings
alias of OpenAITTSSettings
- OPENAI_SAMPLE_RATE = 24000
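As a rough illustration of what the 24 kHz rate means for buffer sizes, the sketch below converts a PCM byte count into a duration. The sample width (16-bit) and channel count (mono) are assumptions, since the reference only states the sample rate. `pcm_duration_seconds` is a hypothetical helper.

```python
OPENAI_SAMPLE_RATE = 24000  # Hz, as documented for this service

# Assumptions (not stated in the reference): 16-bit samples, one channel.
BYTES_PER_SAMPLE = 2
NUM_CHANNELS = 1


def pcm_duration_seconds(num_bytes: int, sample_rate: int = OPENAI_SAMPLE_RATE) -> float:
    """Hypothetical helper: duration of a raw PCM buffer in seconds."""
    bytes_per_second = sample_rate * BYTES_PER_SAMPLE * NUM_CHANNELS
    return num_bytes / bytes_per_second
```

Under these assumptions, one second of audio occupies 48,000 bytes.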
- class InputParams(*, instructions: str | None = None, speed: float | None = None)[source]
Bases: BaseModel
Input parameters for OpenAI TTS configuration.
Deprecated since version 0.0.105: Use settings=OpenAITTSService.Settings(...) instead.
- Parameters:
instructions – Instructions to guide voice synthesis behavior.
speed – Voice speed control (0.25 to 4.0, default 1.0).
- instructions: str | None
- speed: float | None
- __init__(*, api_key: str | None = None, base_url: str | None = None, voice: str | None = None, model: str | None = None, sample_rate: int | None = None, instructions: str | None = None, speed: float | None = None, params: InputParams | None = None, settings: OpenAITTSSettings | None = None, **kwargs)[source]
Initialize OpenAI TTS service.
- Parameters:
api_key – OpenAI API key for authentication. If None, uses environment variable.
base_url – Custom base URL for OpenAI API. If None, uses default.
voice – Voice ID to use for synthesis. Defaults to “alloy”.
Deprecated since version 0.0.105: Use settings=OpenAITTSService.Settings(voice=...) instead.
model – TTS model to use. Defaults to “gpt-4o-mini-tts”.
Deprecated since version 0.0.105: Use settings=OpenAITTSService.Settings(model=...) instead.
sample_rate – Output audio sample rate in Hz. If None, uses OpenAI’s default 24kHz.
instructions – Optional instructions to guide voice synthesis behavior.
Deprecated since version 0.0.105: Use settings=OpenAITTSService.Settings(instructions=...) instead.
speed – Voice speed control (0.25 to 4.0, default 1.0).
Deprecated since version 0.0.105: Use settings=OpenAITTSService.Settings(speed=...) instead.
params – Optional synthesis controls (acting instructions, speed, …).
Deprecated since version 0.0.105: Use settings=OpenAITTSService.Settings(...) instead.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional keyword arguments passed to TTSService.
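The precedence rule for settings over the deprecated keyword arguments can be modeled in plain Python. This is a stand-in sketch, not pipecat's implementation: `SketchSettings`, `NOT_GIVEN`, and `resolve_options` are hypothetical names, and it assumes that a field left unset in settings falls back to the deprecated argument and then to the documented default.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


class _NotGivenType:
    """Sentinel type that marks a settings field as 'not provided'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGivenType()


@dataclass
class SketchSettings:
    """Hypothetical stand-in for the real settings class."""

    voice: Any = NOT_GIVEN
    model: Any = NOT_GIVEN
    instructions: Any = NOT_GIVEN
    speed: Any = NOT_GIVEN


def resolve_options(settings: Optional[SketchSettings] = None, **deprecated: Any) -> Dict[str, Any]:
    """Merge documented defaults, deprecated kwargs, then settings (highest priority)."""
    # Defaults as documented for the constructor parameters.
    resolved: Dict[str, Any] = {
        "voice": "alloy",
        "model": "gpt-4o-mini-tts",
        "instructions": None,
        "speed": 1.0,
    }
    # Deprecated keyword arguments override defaults when they are not None.
    for name, value in deprecated.items():
        if name not in resolved:
            raise TypeError(f"unexpected option: {name}")
        if value is not None:
            resolved[name] = value
    # Any field explicitly given in settings wins over everything else.
    if settings is not None:
        for field in fields(settings):
            value = getattr(settings, field.name)
            if value is not NOT_GIVEN:
                resolved[field.name] = value
    return resolved
```

For example, `resolve_options(SketchSettings(voice="nova"), voice="alloy", speed=1.5)` keeps the voice from settings while the speed still comes from the deprecated argument, because settings left it unset.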
- can_generate_metrics() → bool[source]
Check if this service can generate processing metrics.
- Returns:
True, as OpenAI TTS service supports metrics generation.
- async start(frame: StartFrame)[source]
Start the OpenAI TTS service.
- Parameters:
frame – The start frame containing initialization parameters.
- async run_tts(text: str, context_id: str) → AsyncGenerator[Frame, None][source]
Generate speech from text using OpenAI’s TTS API.
- Parameters:
text – The text to synthesize into speech.
context_id – The context ID for tracking audio frames.
- Yields:
Frame – Audio frames containing the synthesized speech data.
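Because run_tts is an async generator, its frames are consumed with `async for`. The sketch below shows that consumption pattern against a stand-in generator so it runs without network access or pipecat installed. `FakeAudioFrame`, `fake_run_tts`, and `collect_audio` are hypothetical names, and the frame fields are assumptions rather than pipecat's actual frame layout.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class FakeAudioFrame:
    """Hypothetical stand-in for an audio frame."""

    audio: bytes
    sample_rate: int
    context_id: str


async def fake_run_tts(text: str, context_id: str) -> AsyncGenerator[FakeAudioFrame, None]:
    """Stand-in with the same call shape as run_tts: yields one silent chunk per word."""
    for _word in text.split():
        await asyncio.sleep(0)  # yield control, as a real network read would
        yield FakeAudioFrame(
            audio=b"\x00\x00" * 240,  # 240 silent 16-bit samples (480 bytes)
            sample_rate=24000,
            context_id=context_id,
        )


async def collect_audio(text: str, context_id: str) -> bytes:
    """Drain the generator and concatenate the audio payloads."""
    chunks = []
    async for frame in fake_run_tts(text, context_id):
        chunks.append(frame.audio)
    return b"".join(chunks)
```

Calling `asyncio.run(collect_audio("hello there world", "ctx-1"))` drains three chunks from the stand-in generator.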