base_stt

Base class for Whisper-based speech-to-text services.

This module provides common functionality for services implementing the Whisper API interface, including language mapping, metrics generation, and error handling.

class pipecat.services.whisper.base_stt.BaseWhisperSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>, prompt: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>)

Bases: STTSettings

Settings for BaseWhisperSTTService.

Parameters:
  • prompt – Optional text to guide the model’s style or continue a previous segment.

  • temperature – Sampling temperature between 0 and 1.

prompt: str | None | _NotGiven
temperature: float | None | _NotGiven

pipecat.services.whisper.base_stt.language_to_whisper_language(language: Language) → str | None

Maps pipecat Language enum to Whisper API language codes.

The languages supported by the Whisper API are listed at https://platform.openai.com/docs/guides/speech-to-text#supported-languages

Parameters:

language – A Language enum value representing the input language.

Returns:

The corresponding Whisper language code, or None if not supported.

Return type:

str or None
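
The lookup described above can be sketched as a plain dictionary mapping that returns None for anything it does not know. This is a minimal, self-contained illustration: the `Language` enum, the `_WHISPER_CODES` table, and `to_whisper_code` are hypothetical stand-ins, not pipecat's actual enum, table, or function.

```python
from enum import Enum
from typing import Optional


class Language(Enum):
    """Hypothetical stand-in for pipecat's Language enum (a few members only)."""

    EN = "en"
    FR = "fr"
    TLH = "tlh"  # made-up member that has no entry in the table below


# Illustrative subset only; this is NOT the real mapping table.
_WHISPER_CODES = {
    Language.EN: "en",
    Language.FR: "fr",
}


def to_whisper_code(language: Language) -> Optional[str]:
    """Return a Whisper-style language code, or None if the language is unmapped."""
    return _WHISPER_CODES.get(language)


supported = to_whisper_code(Language.EN)
unsupported = to_whisper_code(Language.TLH)
```

Returning None rather than raising lets the caller decide what to do with an unsupported language, for example omitting the language field from the request.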

class pipecat.services.whisper.base_stt.BaseWhisperSTTService(*, model: str | None = None, api_key: str | None = None, base_url: str | None = None, language: Language | None = None, prompt: str | None = None, temperature: float | None = None, include_prob_metrics: bool = False, push_empty_transcripts: bool = False, settings: BaseWhisperSTTSettings | None = None, ttfs_p99_latency: float | None = 1.0, **kwargs)

Bases: SegmentedSTTService

Base class for Whisper-based speech-to-text services.

Provides common functionality for services implementing the Whisper API interface, including metrics generation and error handling.

Settings

alias of BaseWhisperSTTSettings

__init__(*, model: str | None = None, api_key: str | None = None, base_url: str | None = None, language: Language | None = None, prompt: str | None = None, temperature: float | None = None, include_prob_metrics: bool = False, push_empty_transcripts: bool = False, settings: BaseWhisperSTTSettings | None = None, ttfs_p99_latency: float | None = 1.0, **kwargs)

Initialize the Whisper STT service.

Parameters:
  • model

    Name of the Whisper model to use.

    Deprecated since version 0.0.105: Use settings=BaseWhisperSTTService.Settings(model=...) instead.

  • api_key – Service API key. Defaults to None.

  • base_url – Service API base URL. Defaults to None.

  • language

    Language of the audio input.

    Deprecated since version 0.0.105: Use settings=BaseWhisperSTTService.Settings(language=...) instead.

  • prompt

    Optional text to guide the model’s style or continue a previous segment.

    Deprecated since version 0.0.105: Use settings=BaseWhisperSTTService.Settings(prompt=...) instead.

  • temperature

    Sampling temperature between 0 and 1.

    Deprecated since version 0.0.105: Use settings=BaseWhisperSTTService.Settings(temperature=...) instead.

  • include_prob_metrics – If True, enables probability metrics in API response. Each service implements this differently (see child classes). Defaults to False.

  • push_empty_transcripts – If True, empty TranscriptionFrame frames are pushed downstream instead of being discarded. This is intended for situations where VAD fires even though the user did not speak. In those cases it is useful to know that nothing was transcribed, so that the agent can resume speaking instead of waiting longer for a transcription. Defaults to False.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • ttfs_p99_latency – P99 latency, in seconds, from the end of speech to the final transcript. Override this for your deployment. Defaults to 1.0. See https://github.com/pipecat-ai/stt-benchmark

  • **kwargs – Additional arguments passed to SegmentedSTTService.

can_generate_metrics() → bool

Whether this service can generate processing metrics.

Returns:

True, as this service supports metric generation.

Return type:

bool

language_to_service_language(language: Language) → str | None

Convert from pipecat Language to service language code.

Parameters:

language – The Language enum value to convert.

Returns:

The corresponding service language code, or None if not supported.

Return type:

str or None

async run_stt(audio: bytes) → AsyncGenerator[Frame, None]

Transcribe audio data to text.

Parameters:

audio – Raw audio data to transcribe.

Yields:

Frame – Either a TranscriptionFrame containing the transcribed text or an ErrorFrame if transcription fails.
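
The yield contract of run_stt, combined with the push_empty_transcripts flag, can be sketched as an async generator. Everything here is a hypothetical stand-in: the two frame dataclasses are simplified versions of pipecat's frames, `transcribe` stands for whatever call reaches the Whisper-compatible API, and stripping whitespace before the emptiness check is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, Awaitable, Callable, List, Union


@dataclass
class TranscriptionFrame:
    """Hypothetical stand-in; pipecat's real frame carries more fields."""

    text: str


@dataclass
class ErrorFrame:
    """Hypothetical stand-in for pipecat's ErrorFrame."""

    error: str


Frame = Union[TranscriptionFrame, ErrorFrame]


async def run_stt_sketch(
    audio: bytes,
    transcribe: Callable[[bytes], Awaitable[str]],
    push_empty_transcripts: bool = False,
) -> AsyncGenerator[Frame, None]:
    try:
        # Assumption: surrounding whitespace is stripped before checking emptiness.
        text = (await transcribe(audio)).strip()
    except Exception as exc:
        # A failed transcription yields an error frame instead of raising.
        yield ErrorFrame(error=f"transcription failed: {exc}")
        return
    # Empty results are dropped unless the caller asked to receive them.
    if text or push_empty_transcripts:
        yield TranscriptionFrame(text=text)


async def collect(gen: AsyncGenerator[Frame, None]) -> List[Frame]:
    """Drain an async generator into a list."""
    return [frame async for frame in gen]


async def fake_ok(audio: bytes) -> str:
    return " hello "


async def fake_empty(audio: bytes) -> str:
    return ""


async def fake_fail(audio: bytes) -> str:
    raise RuntimeError("boom")


ok_frames = asyncio.run(collect(run_stt_sketch(b"\x00", fake_ok)))
dropped = asyncio.run(collect(run_stt_sketch(b"\x00", fake_empty)))
kept = asyncio.run(collect(run_stt_sketch(b"\x00", fake_empty, push_empty_transcripts=True)))
failed = asyncio.run(collect(run_stt_sketch(b"\x00", fake_fail)))
```

With the flag enabled, a downstream consumer receives an explicit empty transcription and can react immediately rather than waiting for a result that will never arrive.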