tts
Hume Text-to-Speech service implementation.
- class pipecat.services.hume.tts.HumeTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, description: str | None | _NotGiven = <factory>, speed: float | None | _NotGiven = <factory>, trailing_silence: float | None | _NotGiven = <factory>)[source]
Bases: TTSSettings
Settings for HumeTTSService.
- Parameters:
description – Natural-language acting directions (up to 100 characters).
speed – Speaking-rate multiplier (0.5-2.0).
trailing_silence – Seconds of silence to append at the end (0-5).
- description: str | None | _NotGiven
- speed: float | None | _NotGiven
- trailing_silence: float | None | _NotGiven
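The documented ranges can be checked before values are handed to the service. The following is a minimal sketch, not part of Pipecat: `validate_hume_settings` is a hypothetical helper, and it assumes the bounds listed above are inclusive and that `None` means "not set".

```python
from typing import List, Optional


def validate_hume_settings(
    description: Optional[str] = None,
    speed: Optional[float] = None,
    trailing_silence: Optional[float] = None,
) -> List[str]:
    """Hypothetical helper: return a list of problems (empty if all values are in range)."""
    problems: List[str] = []
    # Acting directions are documented as "up to 100 characters".
    if description is not None and len(description) > 100:
        problems.append("description exceeds 100 characters")
    # Speaking-rate multiplier is documented as 0.5-2.0 (assumed inclusive).
    if speed is not None and not 0.5 <= speed <= 2.0:
        problems.append("speed outside 0.5-2.0")
    # Trailing silence is documented as 0-5 seconds (assumed inclusive).
    if trailing_silence is not None and not 0 <= trailing_silence <= 5:
        problems.append("trailing_silence outside 0-5 seconds")
    return problems
```

Whether the service itself rejects, clamps, or forwards out-of-range values is not stated here, so a check of this kind is best treated as a client-side precaution.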
- class pipecat.services.hume.tts.HumeTTSService(*, api_key: str | None = None, voice_id: str | None = None, params: InputParams | None = None, sample_rate: int | None = 48000, settings: HumeTTSSettings | None = None, **kwargs)[source]
Bases: TTSService
Hume Octave Text-to-Speech service.
Streams PCM audio via Hume’s HTTP output streaming (JSON chunks) endpoint using the Python SDK and emits TTSAudioRawFrame frames suitable for Pipecat transports.
Supported features:
Generates speech from text using Hume TTS.
Streams PCM audio.
Supports word-level timestamps for precise audio-text synchronization.
Supports dynamic updates of voice and synthesis parameters at runtime.
Provides metrics for Time To First Byte (TTFB) and TTS usage.
- Settings
alias of HumeTTSSettings
- class InputParams(*, description: str | None = None, speed: float | None = None, trailing_silence: float | None = None)[source]
Bases: BaseModel
Optional synthesis parameters for Hume TTS.
Deprecated since version 0.0.105: Use settings=HumeTTSService.Settings(...) instead.
- Parameters:
description – Natural-language acting directions (up to 100 characters).
speed – Speaking-rate multiplier (0.5-2.0).
trailing_silence – Seconds of silence to append at the end (0-5).
- description: str | None
- speed: float | None
- trailing_silence: float | None
- __init__(*, api_key: str | None = None, voice_id: str | None = None, params: InputParams | None = None, sample_rate: int | None = 48000, settings: HumeTTSSettings | None = None, **kwargs) None[source]
Initialize the HumeTTSService.
- Parameters:
api_key – Hume API key. If omitted, reads the HUME_API_KEY environment variable.
voice_id – ID of the voice to use. Only voice IDs are supported; voice names are not.
Deprecated since version 0.0.105: Use settings=HumeTTSService.Settings(voice=...) instead.
params – Optional synthesis controls (acting instructions, speed, trailing silence).
Deprecated since version 0.0.105: Use settings=HumeTTSService.Settings(...) instead.
sample_rate – Output sample rate for emitted PCM frames. Defaults to 48_000 (Hume).
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to the parent class.
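The precedence rule between the deprecated arguments and `settings` can be illustrated with stand-in types. This is a sketch under stated assumptions, not the service's actual merge logic: `SketchSettings` and `resolve_settings` are hypothetical, `params` is modeled as a plain dict, and `None` stands in for the real `_NotGiven` sentinel. It also assumes that fields left unset in `settings` fall back to the deprecated values.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class SketchSettings:
    """Hypothetical stand-in for HumeTTSSettings; None means 'not provided'."""

    voice: Optional[str] = None
    description: Optional[str] = None
    speed: Optional[float] = None
    trailing_silence: Optional[float] = None


def resolve_settings(
    voice_id: Optional[str] = None,
    params: Optional[Dict[str, Any]] = None,
    settings: Optional[SketchSettings] = None,
) -> SketchSettings:
    """Merge deprecated arguments with settings; settings values win."""
    # Start from the deprecated arguments (voice_id maps to the voice field).
    resolved = SketchSettings(voice=voice_id, **(params or {}))
    # Overlay every field that settings actually provides.
    if settings is not None:
        for field in fields(settings):
            value = getattr(settings, field.name)
            if value is not None:
                setattr(resolved, field.name, value)
    return resolved
```

In this sketch, passing `voice_id="old"` together with `settings=SketchSettings(voice="new")` resolves to the voice `"new"`, while a `speed` given only through `params` is kept.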
- can_generate_metrics() bool[source]
Report whether this service can generate metrics.
- Returns:
True if metrics can be generated, False otherwise.
- async start(frame: StartFrame) None[source]
Start the service.
- Parameters:
frame – The start frame.
- async stop(frame: EndFrame) None[source]
Stop the service and cleanup resources.
- Parameters:
frame – The end frame.
- async cancel(frame: CancelFrame) None[source]
Cancel the service and cleanup resources.
- Parameters:
frame – The cancel frame.
- async push_frame(frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM)[source]
Push a frame and handle state changes.
- Parameters:
frame – The frame to push.
direction – The direction to push the frame.
- async update_setting(key: str, value: Any) None[source]
Update a single setting at runtime from a key/value pair.
Deprecated since version 0.0.104: Use TTSUpdateSettingsFrame(delta=HumeTTSService.Settings(...)) instead.
- Parameters:
key – The name of the setting to update. Recognized keys are "voice_id", "description", "speed", and "trailing_silence".
value – The new value for the setting.
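When migrating callers away from the deprecated key/value form, the recognized keys can be translated into settings field names. The sketch below is illustrative only: `legacy_update_to_delta` is a hypothetical helper, and the mapping of "voice_id" to the `voice` field is inferred from the `Settings(voice=...)` replacement noted for the constructor. How the real method treats unrecognized keys is not documented here; raising `KeyError` is this sketch's own choice.

```python
from typing import Any, Dict

# Deprecated update_setting key -> settings field name (assumed mapping).
_KEY_TO_FIELD = {
    "voice_id": "voice",
    "description": "description",
    "speed": "speed",
    "trailing_silence": "trailing_silence",
}


def legacy_update_to_delta(key: str, value: Any) -> Dict[str, Any]:
    """Hypothetical helper: turn a legacy key/value pair into settings kwargs."""
    if key not in _KEY_TO_FIELD:
        raise KeyError(f"unrecognized setting: {key!r}")
    return {_KEY_TO_FIELD[key]: value}
```

The resulting dict is meant to be expanded into the settings object, along the lines of `HumeTTSService.Settings(**legacy_update_to_delta("speed", 1.2))`, which can then be carried as the `delta` of a `TTSUpdateSettingsFrame`.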
- async run_tts(text: str, context_id: str) AsyncGenerator[Frame, None][source]
Generate speech from text using Hume TTS with word timestamps.
- Parameters:
text – The text to be synthesized.
context_id – Unique identifier for this TTS context.
- Returns:
An async generator that yields Frame objects, including TTSStartedFrame, TTSAudioRawFrame, ErrorFrame, and TTSStoppedFrame.
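The shape of the returned generator can be illustrated without the service itself. The following sketch uses stand-in frame classes and a fake generator (`FakeStarted`, `FakeAudio`, `FakeStopped`, `fake_run_tts`, and `collect_audio` are all hypothetical) to show how a caller might iterate the frames with `async for` and gather the audio payloads. The real frames carry more fields, and the audio bytes here are placeholders.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, Union


@dataclass
class FakeStarted:
    """Stand-in for TTSStartedFrame."""


@dataclass
class FakeAudio:
    """Stand-in for TTSAudioRawFrame; holds a chunk of PCM bytes."""

    audio: bytes


@dataclass
class FakeStopped:
    """Stand-in for TTSStoppedFrame."""


FakeFrame = Union[FakeStarted, FakeAudio, FakeStopped]


async def fake_run_tts(text: str, context_id: str) -> AsyncGenerator[FakeFrame, None]:
    """Yield a started frame, two audio chunks, then a stopped frame."""
    yield FakeStarted()
    yield FakeAudio(audio=b"\x00\x01")
    yield FakeAudio(audio=b"\x02\x03")
    yield FakeStopped()


async def collect_audio(text: str, context_id: str) -> bytes:
    """Consume the generator and concatenate the audio chunks in order."""
    pcm = bytearray()
    async for frame in fake_run_tts(text, context_id):
        if isinstance(frame, FakeAudio):
            pcm.extend(frame.audio)
    return bytes(pcm)
```

In a running pipeline the service's frames are normally pushed downstream by Pipecat rather than collected by hand; direct iteration like this is mainly useful in tests.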