stt
Fal speech-to-text service implementation.
This module provides integration with Fal’s Wizper API for speech-to-text transcription using segmented audio processing.
- pipecat.services.fal.stt.language_to_fal_language(language: Language) → str | None
Convert a Language enum to Fal’s Wizper language code.
- Parameters:
language – The Language enum value to convert.
- Returns:
The corresponding Fal Wizper language code, or None if not supported.
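The conversion described above can be pictured as a table lookup that returns None for anything unsupported. The following is a minimal sketch, not pipecat's implementation: `Language` is a small stand-in enum, and `_WIZPER_CODES`, `to_wizper_code`, and the regional fallback rule are hypothetical names and assumptions introduced only for illustration.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    """Stand-in for pipecat's Language enum (illustrative subset only)."""

    EN = "en"
    EN_US = "en-US"
    FR = "fr"
    XX = "xx"  # made-up value used to show the unsupported case


# Hypothetical table of codes the service accepts.
_WIZPER_CODES = {"en", "fr"}


def to_wizper_code(language: Language) -> Optional[str]:
    """Return a bare language code, or None when the language is not in the table."""
    # Assumption: regional variants such as "en-US" fall back to their base code.
    base = language.value.split("-")[0].lower()
    return base if base in _WIZPER_CODES else None
```

Callers should be prepared for the None case, for example by falling back to a default language or skipping the language argument entirely.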
- class pipecat.services.fal.stt.FalSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>)
Bases: STTSettings
Settings for FalSTTService.
- class pipecat.services.fal.stt.FalSTTService(*, api_key: str | None = None, aiohttp_session: ClientSession | None = None, task: str = 'transcribe', chunk_level: str = 'segment', version: str = '3', sample_rate: int | None = None, params: InputParams | None = None, settings: FalSTTSettings | None = None, ttfs_p99_latency: float | None = 2.07, **kwargs)
Bases: SegmentedSTTService
Speech-to-text service using Fal’s Wizper API.
This service uses Fal’s Wizper API to perform speech-to-text transcription on audio segments. It inherits from SegmentedSTTService to handle audio buffering and speech detection.
- Settings
alias of FalSTTSettings
- class InputParams(*, language: Language | None = Language.EN, task: str = 'transcribe', chunk_level: str = 'segment', version: str = '3')
Bases: BaseModel
Configuration parameters for Fal’s Wizper API.
Deprecated since version 0.0.105: Use settings=FalSTTService.Settings(...) instead.
- Parameters:
language – Language of the audio input. Defaults to English.
task – Task to perform (‘transcribe’ or ‘translate’). Defaults to ‘transcribe’.
chunk_level – Level of chunking (‘segment’). Defaults to ‘segment’.
version – Version of Wizper model to use. Defaults to ‘3’.
- task: str
- chunk_level: str
- version: str
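Because InputParams is deprecated, existing configurations need to be split between settings and direct init parameters: per the deprecation note on `params`, model and language go into `FalSTTService.Settings(...)`, while task, chunk_level, and version become constructor arguments. A minimal sketch of that split, using plain dictionaries; `split_input_params` and `_INIT_FIELDS` are hypothetical helpers and not part of pipecat.

```python
from typing import Any, Dict, Tuple

# Fields that the deprecation note routes to direct init parameters.
_INIT_FIELDS = ("task", "chunk_level", "version")


def split_input_params(params: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split InputParams-style fields into (settings_fields, init_kwargs)."""
    init_kwargs = {k: v for k, v in params.items() if k in _INIT_FIELDS}
    settings_fields = {k: v for k, v in params.items() if k not in _INIT_FIELDS}
    return settings_fields, init_kwargs


# Example: an old InputParams-style configuration expressed as a dict.
settings_fields, init_kwargs = split_input_params(
    {"language": "en", "task": "translate", "chunk_level": "segment", "version": "3"}
)
```

In this sketch, `settings_fields` would feed `FalSTTService.Settings(...)` and `init_kwargs` would be passed straight to the `FalSTTService` constructor.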
- __init__(*, api_key: str | None = None, aiohttp_session: ClientSession | None = None, task: str = 'transcribe', chunk_level: str = 'segment', version: str = '3', sample_rate: int | None = None, params: InputParams | None = None, settings: FalSTTSettings | None = None, ttfs_p99_latency: float | None = 2.07, **kwargs)
Initialize the FalSTTService with API key and parameters.
- Parameters:
api_key – Fal API key. If not provided, the FAL_KEY environment variable is used.
aiohttp_session – Optional aiohttp ClientSession for HTTP requests. If not provided, a session will be created and managed internally.
task – Task to perform ("transcribe" or "translate"). Defaults to "transcribe".
chunk_level – Level of chunking ("segment"). Defaults to "segment".
version – Version of Wizper model to use. Defaults to "3".
sample_rate – Audio sample rate in Hz. If not provided, uses the pipeline’s rate.
params – Configuration parameters for the Wizper API.
Deprecated since version 0.0.105: Use settings=FalSTTService.Settings(...) for model/language and direct init parameters for task/chunk_level/version instead.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment. See https://github.com/pipecat-ai/stt-benchmark
**kwargs – Additional arguments passed to SegmentedSTTService.
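The constructor parameters above might be assembled as in the following sketch. The import path, parameter names, and defaults come from the signature documented here; `build_fal_stt_kwargs` and `make_service` are hypothetical helpers, and the explicit `FAL_KEY` lookup only mirrors the documented fallback rather than replacing it. Constructing the service requires pipecat to be installed, so that import is deferred into `make_service`.

```python
import os
from typing import Any, Dict, Optional


def build_fal_stt_kwargs(api_key: Optional[str] = None, task: str = "transcribe") -> Dict[str, Any]:
    """Collect keyword arguments for FalSTTService using the documented defaults."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    return {
        # Assumption: mirror the documented fallback to FAL_KEY explicitly.
        "api_key": api_key or os.environ.get("FAL_KEY"),
        "task": task,
        "chunk_level": "segment",
        "version": "3",
    }


def make_service(**overrides: Any):
    """Construct the service; pipecat is imported lazily so the helper above runs without it."""
    from pipecat.services.fal.stt import FalSTTService  # requires pipecat to be installed

    return FalSTTService(**{**build_fal_stt_kwargs(), **overrides})
```

Keeping the argument assembly separate from construction makes the configuration easy to inspect or unit-test without network access or a Fal account.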
- can_generate_metrics() → bool
Check if the service can generate processing metrics.
- Returns:
True, as Fal STT service supports metrics generation.
- language_to_service_language(language: Language) → str | None
Convert a Language enum to Fal’s service-specific language code.
- Parameters:
language – The language to convert.
- Returns:
The Fal-specific language code, or None if not supported.
- async run_stt(audio: bytes) → AsyncGenerator[Frame, None]
Transcribes an audio segment using Fal’s Wizper API.
- Parameters:
audio – Raw audio bytes in WAV format (already converted by base class).
- Yields:
Frame – TranscriptionFrame containing the transcribed text, or ErrorFrame on failure.
Note
The audio is already in WAV format from the SegmentedSTTService. Only non-empty transcriptions are yielded.
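The "only non-empty transcriptions are yielded" behaviour in the note can be illustrated with a plain async generator. This is a sketch of the filtering pattern only, not the service's code: it works on strings rather than frames, `yield_non_empty` and `collect` are hypothetical names, and stripping whitespace before the emptiness check is an assumption.

```python
import asyncio
from typing import AsyncGenerator, Iterable, List


async def yield_non_empty(transcripts: Iterable[str]) -> AsyncGenerator[str, None]:
    """Yield stripped transcripts, skipping empty or whitespace-only ones."""
    for text in transcripts:
        cleaned = text.strip()  # assumption: whitespace-only results count as empty
        if cleaned:
            yield cleaned


async def collect(transcripts: Iterable[str]) -> List[str]:
    """Drain the async generator into a list."""
    return [t async for t in yield_non_empty(transcripts)]


result = asyncio.run(collect(["hello", "", "   ", "world "]))
```

A consumer of `run_stt` can rely on the same property: a segment that produces no text yields no TranscriptionFrame, so downstream processors do not need to filter empty strings themselves.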