base_stt
Base class for Whisper-based speech-to-text services.
This module provides common functionality for services implementing the Whisper API interface, including language mapping, metrics generation, and error handling.
- class pipecat.services.whisper.base_stt.BaseWhisperSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, language: Language | str | None | _NotGiven = <factory>, prompt: str | None | _NotGiven = <factory>, temperature: float | None | _NotGiven = <factory>)
Bases: STTSettings
Settings for BaseWhisperSTTService.
- Parameters:
prompt – Optional text to guide the model’s style or continue a previous segment.
temperature – Sampling temperature between 0 and 1.
- prompt: str | None | _NotGiven
- temperature: float | None | _NotGiven
- pipecat.services.whisper.base_stt.language_to_whisper_language(language: Language) → str | None
Maps a pipecat Language enum value to the corresponding Whisper API language code.
For the languages supported by the Whisper API, see: https://platform.openai.com/docs/guides/speech-to-text#supported-languages
- Parameters:
language – A Language enum value representing the input language.
- Returns:
The corresponding Whisper language code, or None if not supported.
- Return type:
str or None
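A minimal sketch of this kind of mapping, for illustration only. `Lang`, `_WHISPER_CODES` and `to_whisper_language` are hypothetical stand-ins rather than pipecat's real `Language` enum or lookup table, and the sketch assumes regional variants (such as `en-US`) collapse to their base code. The key behavior is the one documented above: an unsupported language yields `None` instead of raising.

```python
from enum import Enum
from typing import Optional


class Lang(Enum):
    # Hypothetical subset standing in for pipecat's Language enum.
    EN = "en"
    EN_US = "en-US"
    FR = "fr"
    KLINGON = "tlh"


# Hypothetical table: base language code -> Whisper (ISO 639-1) code.
_WHISPER_CODES = {"en": "en", "fr": "fr"}


def to_whisper_language(language: Lang) -> Optional[str]:
    """Map a language to a Whisper code, or None if Whisper does not support it."""
    # Assumption: strip the region suffix, e.g. "en-US" -> "en".
    base = language.value.split("-")[0].lower()
    return _WHISPER_CODES.get(base)
```

Returning `None` leaves the decision to the caller. Because the Whisper API treats `language` as optional, a caller could, for example, omit the field and let the API detect the language.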
- class pipecat.services.whisper.base_stt.BaseWhisperSTTService(*, model: str | None = None, api_key: str | None = None, base_url: str | None = None, language: Language | None = None, prompt: str | None = None, temperature: float | None = None, include_prob_metrics: bool = False, push_empty_transcripts: bool = False, settings: BaseWhisperSTTSettings | None = None, ttfs_p99_latency: float | None = 1.0, **kwargs)
Bases: SegmentedSTTService
Base class for Whisper-based speech-to-text services.
Provides common functionality for services implementing the Whisper API interface, including metrics generation and error handling.
- Settings
alias of BaseWhisperSTTSettings
- __init__(*, model: str | None = None, api_key: str | None = None, base_url: str | None = None, language: Language | None = None, prompt: str | None = None, temperature: float | None = None, include_prob_metrics: bool = False, push_empty_transcripts: bool = False, settings: BaseWhisperSTTSettings | None = None, ttfs_p99_latency: float | None = 1.0, **kwargs)
Initialize the Whisper STT service.
- Parameters:
model –
Name of the Whisper model to use.
Deprecated since version 0.0.105: Use settings=BaseWhisperSTTService.Settings(model=...) instead.
api_key – Service API key. Defaults to None.
base_url – Service API base URL. Defaults to None.
language –
Language of the audio input.
Deprecated since version 0.0.105: Use settings=BaseWhisperSTTService.Settings(language=...) instead.
prompt –
Optional text to guide the model’s style or continue a previous segment.
Deprecated since version 0.0.105: Use settings=BaseWhisperSTTService.Settings(prompt=...) instead.
temperature –
Sampling temperature between 0 and 1.
Deprecated since version 0.0.105: Use settings=BaseWhisperSTTService.Settings(temperature=...) instead.
include_prob_metrics – If True, enables probability metrics in the API response. Each service implements this differently (see child classes). Defaults to False.
push_empty_transcripts – If True, empty TranscriptionFrame frames are pushed downstream instead of being discarded. This is intended for cases where VAD fires even though the user did not speak: knowing that nothing was transcribed lets the agent resume speaking instead of waiting longer for a transcription. Defaults to False.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
ttfs_p99_latency – P99 latency, in seconds, from the end of speech to the final transcript. Defaults to 1.0; override this value to match your deployment. See https://github.com/pipecat-ai/stt-benchmark
**kwargs – Additional arguments passed to SegmentedSTTService.
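The `push_empty_transcripts` behavior described above reduces to a simple decision: push or drop a result whose text is empty. The following is a minimal sketch of that decision, not the service's actual code. `TranscriptSketch` is a hypothetical stand-in for `TranscriptionFrame`, `frame_to_push` is a hypothetical helper, and the sketch assumes whitespace-only text counts as empty.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptSketch:
    # Hypothetical stand-in for a TranscriptionFrame.
    text: str


def frame_to_push(
    text: Optional[str], push_empty_transcripts: bool
) -> Optional[TranscriptSketch]:
    """Return a frame to push downstream, or None to discard the result."""
    # Assumption: whitespace-only output is treated the same as no output.
    cleaned = (text or "").strip()
    if cleaned or push_empty_transcripts:
        # Non-empty text is always pushed; empty text only when the flag is on,
        # so downstream logic learns that VAD fired but nothing was said.
        return TranscriptSketch(text=cleaned)
    # Empty result and the flag is off: drop it (the default behavior).
    return None
```

With the flag enabled, an empty frame acts as an explicit "nothing was said" signal, so the agent does not have to rely on a timeout before resuming.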
- can_generate_metrics() → bool
Whether this service can generate processing metrics.
- Returns:
True, as this service supports metric generation.
- Return type:
bool