tts

Hume Text-to-Speech service implementation.

Bases: TTSSettings

Settings for HumeTTSService.

Parameters:

description – Natural-language acting directions (up to 100 characters).
speed – Speaking-rate multiplier (0.5-2.0).
trailing_silence – Seconds of silence to append at the end (0-5).

description: str | None | _NotGiven

speed: float | None | _NotGiven

trailing_silence: float | None | _NotGiven
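
The documented ranges can be checked before values are handed to the service. The following is a minimal sketch using only the standard library; `validate_hume_settings` is a hypothetical helper, not part of Pipecat or the Hume SDK, and it assumes the documented bounds are inclusive.

```python
from typing import List, Optional


def validate_hume_settings(
    description: Optional[str] = None,
    speed: Optional[float] = None,
    trailing_silence: Optional[float] = None,
) -> List[str]:
    """Return a list of problems; an empty list means every value is in range.

    Hypothetical helper. Bounds are taken from the parameter descriptions
    above and are assumed to be inclusive.
    """
    problems: List[str] = []
    if description is not None and len(description) > 100:
        problems.append("description exceeds 100 characters")
    if speed is not None and not 0.5 <= speed <= 2.0:
        problems.append("speed must be between 0.5 and 2.0")
    if trailing_silence is not None and not 0 <= trailing_silence <= 5:
        problems.append("trailing_silence must be between 0 and 5 seconds")
    return problems
```

A value of None is treated as "not set" and skipped, mirroring the optional types listed above.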

Bases: TTSService

Hume Octave Text-to-Speech service.

Streams PCM audio from Hume's HTTP streaming endpoint (which returns JSON chunks) using the Hume Python SDK, and emits TTSAudioRawFrame frames suitable for Pipecat transports.

Supported features:

Generates speech from text using Hume TTS.
Streams PCM audio.
Supports word-level timestamps for precise audio-text synchronization.
Supports dynamic updates of voice and synthesis parameters at runtime.
Provides metrics for Time To First Byte (TTFB) and TTS usage.
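
When working with streamed PCM, it can help to convert a chunk's byte length into playback time. The sketch below assumes 16-bit mono samples and the 48 kHz default documented for this service; both the sample format and the helper name `pcm_duration_seconds` are assumptions made for illustration, not something this reference states.

```python
def pcm_duration_seconds(
    num_bytes: int,
    sample_rate: int = 48_000,   # documented default output rate
    channels: int = 1,           # assumption: mono
    bytes_per_sample: int = 2,   # assumption: 16-bit PCM
) -> float:
    """Return the playback duration of a raw PCM chunk in seconds."""
    frame_size = channels * bytes_per_sample
    if num_bytes % frame_size != 0:
        raise ValueError("chunk does not contain a whole number of samples")
    return num_bytes / frame_size / sample_rate
```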

Settings: alias of HumeTTSSettings

class InputParams(*, description: str | None = None, speed: float | None = None, trailing_silence: float | None = None)

Bases: BaseModel

Optional synthesis parameters for Hume TTS.

Deprecated since version 0.0.105: Use settings=HumeTTSService.Settings(...) instead.

Parameters:

description – Natural-language acting directions (up to 100 characters).
speed – Speaking-rate multiplier (0.5-2.0).
trailing_silence – Seconds of silence to append at the end (0-5).

description: str | None

speed: float | None

trailing_silence: float | None

Initialize the HumeTTSService.

Parameters:

api_key – Hume API key. If omitted, reads the HUME_API_KEY environment variable.
voice_id – ID of the voice to use. Only voice IDs are supported; voice names are not. Deprecated since version 0.0.105: Use settings=HumeTTSService.Settings(voice=...) instead.
params – Optional synthesis controls (acting instructions, speed, trailing silence). Deprecated since version 0.0.105: Use settings=HumeTTSService.Settings(...) instead.
sample_rate – Sample rate, in Hz, of the emitted PCM frames. Defaults to 48_000 (Hume's output rate).
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
**kwargs – Additional arguments passed to the parent class.
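
The precedence rule for settings (values given in settings win over the deprecated parameters) can be illustrated with stand-in types. This is a sketch of one way such a merge could behave, not the service's actual implementation; `SketchSettings`, `NOT_GIVEN`, and `resolve_settings` are hypothetical names, and the sentinel only imitates the `_NotGiven` type that appears in the field annotations.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional, Union


class _NotGiven:
    """Stand-in sentinel meaning 'the caller did not set this field'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class SketchSettings:
    """Hypothetical stand-in for the service settings object."""

    voice: Union[str, None, _NotGiven] = NOT_GIVEN
    description: Union[str, None, _NotGiven] = NOT_GIVEN
    speed: Union[float, None, _NotGiven] = NOT_GIVEN
    trailing_silence: Union[float, None, _NotGiven] = NOT_GIVEN


def resolve_settings(
    from_deprecated: SketchSettings,
    settings: Optional[SketchSettings],
) -> SketchSettings:
    """Overlay every explicitly given field of `settings` onto the legacy values."""
    if settings is None:
        return from_deprecated
    overrides = {
        f.name: getattr(settings, f.name)
        for f in fields(settings)
        if not isinstance(getattr(settings, f.name), _NotGiven)
    }
    return replace(from_deprecated, **overrides)
```

In this sketch an explicit None counts as a given value and overrides the legacy one, while fields left at the sentinel fall back to whatever the deprecated parameters supplied.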

can_generate_metrics() → bool

Check whether this service can generate metrics.

Returns:

True if metrics can be generated, False otherwise.

async start(frame: StartFrame) → None

Start the service.

Parameters:

frame – The start frame.

async stop(frame: EndFrame) → None

Stop the service and clean up resources.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame) → None

Cancel the service and clean up resources.

Parameters:

frame – The cancel frame.

async push_frame(frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM)

Push a frame and handle state changes.

Parameters:

frame – The frame to push.
direction – The direction to push the frame.

async update_setting(key: str, value: Any) → None

Update a single setting at runtime from a key/value pair.

Deprecated since version 0.0.104: Use TTSUpdateSettingsFrame(delta=HumeTTSService.Settings(...)) instead.

Parameters:

key – The name of the setting to update. Recognized keys are "voice_id", "description", "speed", and "trailing_silence".
value – The new value for the setting.
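
When migrating from key/value updates to a settings delta, each legacy key has to be translated into a settings field name. The sketch below shows one possible translation table; `settings_kwargs_for` is a hypothetical helper, and the mapping of "voice_id" to a `voice` field is inferred from the deprecation note on the constructor's voice_id parameter rather than stated directly for this method.

```python
from typing import Any, Dict

# Legacy update_setting keys mapped to assumed settings field names.
_KEY_TO_FIELD: Dict[str, str] = {
    "voice_id": "voice",  # inferred from Settings(voice=...) in the deprecation note
    "description": "description",
    "speed": "speed",
    "trailing_silence": "trailing_silence",
}


def settings_kwargs_for(key: str, value: Any) -> Dict[str, Any]:
    """Translate one legacy key/value pair into keyword arguments for a settings delta."""
    try:
        field_name = _KEY_TO_FIELD[key]
    except KeyError:
        raise ValueError(f"unrecognized setting key: {key!r}") from None
    return {field_name: value}
```

The returned dictionary could then be unpacked into a settings object, for example `HumeTTSService.Settings(**settings_kwargs_for("speed", 1.2))`, assuming the settings class accepts these field names as keyword arguments.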

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame, None]

Generate speech from text using Hume TTS with word timestamps.

Parameters:

text – The text to be synthesized.
context_id – Unique identifier for this TTS context.

Returns:

An async generator that yields Frame objects, including TTSStartedFrame, TTSAudioRawFrame, ErrorFrame, and TTSStoppedFrame.
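
The frame sequence described above can be consumed with an ordinary `async for` loop. The following sketch uses stand-in frame classes and a fake generator so that it runs without Pipecat or network access; `Started`, `Audio`, `Error`, `Stopped`, `fake_run_tts`, and `collect_audio` are all hypothetical names, and the real frame classes may carry different fields.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Tuple


@dataclass
class Started:
    context_id: str


@dataclass
class Audio:
    audio: bytes


@dataclass
class Error:
    error: str


@dataclass
class Stopped:
    context_id: str


async def fake_run_tts(text: str, context_id: str) -> AsyncGenerator[object, None]:
    """Stand-in generator: start marker, one audio frame per word, stop marker."""
    yield Started(context_id)
    for word in text.split():
        # Placeholder bytes only; a real service would yield PCM samples.
        yield Audio(audio=word.encode("utf-8"))
    yield Stopped(context_id)


async def collect_audio(frames: AsyncGenerator[object, None]) -> Tuple[bytes, List[str]]:
    """Concatenate audio payloads and gather error messages from a frame stream."""
    pcm = bytearray()
    errors: List[str] = []
    async for frame in frames:
        if isinstance(frame, Audio):
            pcm.extend(frame.audio)
        elif isinstance(frame, Error):
            errors.append(frame.error)
    return bytes(pcm), errors
```

Running `asyncio.run(collect_audio(fake_run_tts("hello world", "ctx-1")))` drives the loop to completion and returns the concatenated payload along with any collected errors.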