tts

Hume Text-to-Speech service implementation.

class pipecat.services.hume.tts.HumeTTSSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, voice: str | None | _NotGiven = <factory>, language: Language | str | None | _NotGiven = <factory>, description: str | None | _NotGiven = <factory>, speed: float | None | _NotGiven = <factory>, trailing_silence: float | None | _NotGiven = <factory>)

Bases: TTSSettings

Settings for HumeTTSService.

Parameters:
  • description – Natural-language acting directions (up to 100 characters).

  • speed – Speaking-rate multiplier (0.5-2.0).

  • trailing_silence – Seconds of silence to append at the end (0-5).

description: str | None | _NotGiven
speed: float | None | _NotGiven
trailing_silence: float | None | _NotGiven
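The documented ranges can be checked on the client before a request is sent. The sketch below is a hypothetical stand-in, not pipecat's class. It uses plain None for "unset" instead of pipecat's _NotGiven sentinel, and it assumes the bounds are inclusive. This page does not say whether pipecat or the Hume API validates these values itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HumeSynthesisOptions:
    """Hypothetical stand-in for the Hume-specific settings fields (not pipecat's class)."""

    description: Optional[str] = None
    speed: Optional[float] = None
    trailing_silence: Optional[float] = None

    def validate(self) -> None:
        """Raise ValueError if a provided value is outside its documented range.

        Assumption: the documented bounds are inclusive.
        """
        if self.description is not None and len(self.description) > 100:
            raise ValueError("description must be at most 100 characters")
        if self.speed is not None and not 0.5 <= self.speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        if self.trailing_silence is not None and not 0 <= self.trailing_silence <= 5:
            raise ValueError("trailing_silence must be between 0 and 5 seconds")


opts = HumeSynthesisOptions(description="calm, warm narrator", speed=1.2, trailing_silence=0.5)
opts.validate()  # passes: every value is inside its documented range
```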
class pipecat.services.hume.tts.HumeTTSService(*, api_key: str | None = None, voice_id: str | None = None, params: InputParams | None = None, sample_rate: int | None = 48000, settings: HumeTTSSettings | None = None, **kwargs)

Bases: TTSService

Hume Octave Text-to-Speech service.

Streams PCM audio from Hume's HTTP streaming endpoint (JSON-chunk output) using the Hume Python SDK, and emits TTSAudioRawFrame frames suitable for Pipecat transports.
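The sketch below shows the general idea of turning JSON-chunked output into raw PCM. It does not use the Hume SDK and makes several assumptions. The chunk layout (one JSON object per line, with base64-encoded audio under an "audio" key) is hypothetical, and so is the 16-bit mono sample format. Only the 48_000 default sample rate comes from this page.

```python
import base64
import json
from typing import Iterable, Iterator

SAMPLE_RATE = 48_000  # the documented default output rate
BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM, mono


def pcm_from_json_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield raw PCM bytes from newline-delimited JSON chunks.

    Assumes (hypothetically) that each chunk is a JSON object whose
    base64-encoded audio sits under an "audio" key.
    """
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        yield base64.b64decode(chunk["audio"])


def duration_seconds(pcm: bytes) -> float:
    """Playback length of mono PCM at SAMPLE_RATE and BYTES_PER_SAMPLE."""
    return len(pcm) / (SAMPLE_RATE * BYTES_PER_SAMPLE)


fake = [json.dumps({"audio": base64.b64encode(bytes(960)).decode()}), ""]
pcm = b"".join(pcm_from_json_chunks(fake))
print(duration_seconds(pcm))  # 0.01
```

In the real service, each decoded buffer would be wrapped in a TTSAudioRawFrame rather than joined.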

Supported features:

  • Generates speech from text using Hume TTS.

  • Streams PCM audio.

  • Supports word-level timestamps for precise audio-text synchronization.

  • Supports dynamic updates of voice and synthesis parameters at runtime.

  • Provides metrics for Time To First Byte (TTFB) and TTS usage.
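Time To First Byte is the delay between issuing a request and receiving the first audio data. The following is a generic sketch of measuring it over any async byte stream. It is not pipecat's metrics code, and fake_stream is a hypothetical stand-in for a streaming TTS response.

```python
import asyncio
import time
from typing import AsyncIterator, Optional, Tuple


async def measure_ttfb(stream: AsyncIterator[bytes]) -> Tuple[Optional[float], int]:
    """Return (seconds until the first non-empty chunk, total bytes received)."""
    start = time.perf_counter()
    ttfb: Optional[float] = None
    total = 0
    async for chunk in stream:
        if chunk and ttfb is None:
            ttfb = time.perf_counter() - start  # first audio byte arrived
        total += len(chunk)
    return ttfb, total


async def fake_stream() -> AsyncIterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    await asyncio.sleep(0.05)  # simulated network and synthesis latency
    for _ in range(3):
        yield bytes(960)


ttfb, total = asyncio.run(measure_ttfb(fake_stream()))
print(f"TTFB {ttfb:.3f}s, {total} bytes")
```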

Settings

alias of HumeTTSSettings

class InputParams(*, description: str | None = None, speed: float | None = None, trailing_silence: float | None = None)

Bases: BaseModel

Optional synthesis parameters for Hume TTS.

Deprecated since version 0.0.105: Use settings=HumeTTSService.Settings(...) instead.

Parameters:
  • description – Natural-language acting directions (up to 100 characters).

  • speed – Speaking-rate multiplier (0.5-2.0).

  • trailing_silence – Seconds of silence to append at the end (0-5).

description: str | None
speed: float | None
trailing_silence: float | None
__init__(*, api_key: str | None = None, voice_id: str | None = None, params: InputParams | None = None, sample_rate: int | None = 48000, settings: HumeTTSSettings | None = None, **kwargs) → None

Initialize the HumeTTSService.

Parameters:
  • api_key – Hume API key. If omitted, reads the HUME_API_KEY environment variable.

  • voice_id

    ID of the voice to use. Only voice IDs are supported; voice names are not.

    Deprecated since version 0.0.105: Use settings=HumeTTSService.Settings(voice=...) instead.

  • params

    Optional synthesis controls (acting instructions, speed, trailing silence).

    Deprecated since version 0.0.105: Use settings=HumeTTSService.Settings(...) instead.

  • sample_rate – Output sample rate, in Hz, for emitted PCM frames. Defaults to 48_000 (Hume's output rate).

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • **kwargs – Additional arguments passed to the parent class.
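One way to picture the precedence rule is a merge. It starts from the deprecated voice_id and params values, then overlays every field that settings provides. This is a hypothetical sketch, not pipecat's code. Options and resolve_options are illustrative names, and params is modeled as a plain dict. None stands in for pipecat's _NotGiven sentinel, so in this sketch an explicit None in settings cannot override anything.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass
class Options:
    """Hypothetical flat view of the voice plus the Hume synthesis options."""

    voice: Optional[str] = None
    description: Optional[str] = None
    speed: Optional[float] = None
    trailing_silence: Optional[float] = None


def resolve_options(
    voice_id: Optional[str] = None,
    params: Optional[dict] = None,
    settings: Optional[Options] = None,
) -> Options:
    """Merge deprecated arguments with settings; settings values win when given."""
    # Start from the deprecated inputs (voice_id maps onto the newer `voice` field).
    merged = Options(voice=voice_id, **(params or {}))
    if settings is None:
        return merged
    # Overlay every field that settings explicitly provides (None means "not provided").
    overrides = {
        f.name: getattr(settings, f.name)
        for f in fields(settings)
        if getattr(settings, f.name) is not None
    }
    return replace(merged, **overrides)


opts = resolve_options(
    voice_id="old-voice-id",
    params={"speed": 0.9, "description": "whispering"},
    settings=Options(voice="new-voice-id", speed=1.3),
)
# voice and speed come from settings; description falls back to params
```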

can_generate_metrics() → bool

Check whether this service can generate metrics.

Returns:

True if metrics can be generated, False otherwise.

async start(frame: StartFrame) → None

Start the service.

Parameters:

frame – The start frame.

async stop(frame: EndFrame) → None

Stop the service and clean up resources.

Parameters:

frame – The end frame.

async cancel(frame: CancelFrame) → None

Cancel the service and clean up resources.

Parameters:

frame – The cancel frame.
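Both stop() and cancel() release resources. A common pattern for that is a single idempotent cleanup routine called from both paths. The sketch below illustrates the pattern only. This page does not document the service's internals, and LifecycleSketch and FakeHttpClient are hypothetical names.

```python
import asyncio
from typing import Optional


class FakeHttpClient:
    """Hypothetical stand-in for an HTTP/SDK client that must be closed."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True


class LifecycleSketch:
    """Not pipecat's code: start/stop/cancel sharing one cleanup path."""

    def __init__(self) -> None:
        self._client: Optional[FakeHttpClient] = None

    async def start(self) -> None:
        self._client = FakeHttpClient()  # acquire resources when the pipeline starts

    async def _cleanup(self) -> None:
        # Idempotent: safe to call from both stop() and cancel().
        if self._client is not None:
            await self._client.aclose()
            self._client = None

    async def stop(self) -> None:
        await self._cleanup()  # graceful shutdown (EndFrame)

    async def cancel(self) -> None:
        await self._cleanup()  # abrupt shutdown (CancelFrame)


async def demo() -> Optional[FakeHttpClient]:
    svc = LifecycleSketch()
    await svc.start()
    client = svc._client
    await svc.stop()
    await svc.cancel()  # second cleanup is a no-op
    return client


client = asyncio.run(demo())
```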

async push_frame(frame: Frame, direction: FrameDirection = FrameDirection.DOWNSTREAM)

Push a frame and handle state changes.

Parameters:
  • frame – The frame to push.

  • direction – The direction to push the frame.

async update_setting(key: str, value: Any) → None

Update a single setting at runtime via a key/value pair.

Deprecated since version 0.0.104: Use TTSUpdateSettingsFrame(delta=HumeTTSService.Settings(...)) instead.

Parameters:
  • key – The name of the setting to update. Recognized keys are “voice_id”, “description”, “speed”, and “trailing_silence”.

  • value – The new value for the setting.
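When migrating away from update_setting, each recognized key maps to a field of the newer settings object. Per the deprecation notes above, "voice_id" becomes voice, and the other keys keep their names. The helper below is a hypothetical, pipecat-free sketch of that translation. The comment at the end shows how the resulting delta would presumably be used.

```python
from typing import Any, Dict

# Deprecated update_setting keys -> field names on the newer Settings object.
# "voice_id" becomes "voice" per the deprecation notes; the rest keep their names.
KEY_TO_FIELD: Dict[str, str] = {
    "voice_id": "voice",
    "description": "description",
    "speed": "speed",
    "trailing_silence": "trailing_silence",
}


def key_value_to_delta(key: str, value: Any) -> Dict[str, Any]:
    """Translate one update_setting(key, value) call into a settings-delta dict."""
    try:
        field = KEY_TO_FIELD[key]
    except KeyError:
        raise ValueError(f"unrecognized setting: {key!r}") from None
    return {field: value}


delta = key_value_to_delta("voice_id", "my-voice-id")
# With pipecat installed, this delta would presumably be sent as
# TTSUpdateSettingsFrame(delta=HumeTTSService.Settings(**delta)).
```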

async run_tts(text: str, context_id: str) → AsyncGenerator[Frame, None]

Generate speech from text using Hume TTS with word timestamps.

Parameters:
  • text – The text to be synthesized.

  • context_id – Unique identifier for this TTS context.

Returns:

An async generator that yields Frame objects, including TTSStartedFrame, TTSAudioRawFrame, ErrorFrame, and TTSStoppedFrame.
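Inside a pipeline, these frames are typically consumed by downstream processors rather than by hand. The sketch below shows how the documented frame sequence could be consumed directly. It defines minimal local stand-ins for the frame classes so that it runs without pipecat. Those stand-ins are assumptions, and fake_run_tts is a hypothetical generator that mimics the sequence rather than calling Hume.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List


# Minimal stand-ins for pipecat frame types, so the sketch runs without pipecat.
@dataclass
class Frame:
    pass


@dataclass
class TTSStartedFrame(Frame):
    pass


@dataclass
class TTSAudioRawFrame(Frame):
    audio: bytes
    sample_rate: int
    num_channels: int


@dataclass
class ErrorFrame(Frame):
    error: str


@dataclass
class TTSStoppedFrame(Frame):
    pass


async def fake_run_tts(text: str, context_id: str) -> AsyncGenerator[Frame, None]:
    """Hypothetical generator mimicking the documented frame sequence."""
    yield TTSStartedFrame()
    for _ in range(2):
        yield TTSAudioRawFrame(audio=bytes(960), sample_rate=48_000, num_channels=1)
    yield TTSStoppedFrame()


async def collect_audio(gen: AsyncGenerator[Frame, None]) -> bytes:
    """Concatenate audio until TTSStoppedFrame; raise if an ErrorFrame appears."""
    chunks: List[bytes] = []
    async for frame in gen:
        if isinstance(frame, ErrorFrame):
            raise RuntimeError(frame.error)
        if isinstance(frame, TTSAudioRawFrame):
            chunks.append(frame.audio)
        elif isinstance(frame, TTSStoppedFrame):
            break
    return b"".join(chunks)


audio = asyncio.run(collect_audio(fake_run_tts("Hello there", "ctx-1")))
```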