stt

Fal speech-to-text service implementation.

This module provides integration with Fal’s Wizper API for speech-to-text transcription using segmented audio processing.

pipecat.services.fal.stt.language_to_fal_language(language: Language) → str | None

Convert a Language enum to Fal’s Wizper language code.

Parameters:

language – The Language enum value to convert.

Returns:

The corresponding Fal Wizper language code, or None if not supported.
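The lookup-or-None contract described above can be illustrated with a minimal, self-contained sketch. The `Language` enum, the `_SKETCH_LANGUAGE_MAP` table, and `language_to_code_sketch` below are hypothetical stand-ins written for this example. They are not pipecat's actual enum or mapping, whose contents are not shown on this page.

```python
from enum import Enum
from typing import Dict, Optional


class Language(str, Enum):
    """Stand-in for pipecat's Language enum (hypothetical subset)."""

    EN = "en"
    FR = "fr"
    XX = "xx"  # made-up member representing an unsupported language


# Hypothetical table; the real mapping lives inside pipecat.
_SKETCH_LANGUAGE_MAP: Dict[Language, str] = {
    Language.EN: "en",
    Language.FR: "fr",
}


def language_to_code_sketch(language: Language) -> Optional[str]:
    """Return the service language code, or None if the language is unmapped."""
    return _SKETCH_LANGUAGE_MAP.get(language)


if __name__ == "__main__":
    print(language_to_code_sketch(Language.EN))
    print(language_to_code_sketch(Language.XX))
```

Callers should handle the `None` case explicitly, for example by falling back to a default language, rather than passing it on to the API.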

class pipecat.services.fal.stt.FalSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>)

Bases: STTSettings

Settings for FalSTTService.

class pipecat.services.fal.stt.FalSTTService(*, api_key: str | None = None, aiohttp_session: ClientSession | None = None, task: str = 'transcribe', chunk_level: str = 'segment', version: str = '3', sample_rate: int | None = None, params: InputParams | None = None, settings: FalSTTSettings | None = None, ttfs_p99_latency: float | None = 2.07, **kwargs)

Bases: SegmentedSTTService

Speech-to-text service using Fal’s Wizper API.

This service uses Fal’s Wizper API to perform speech-to-text transcription on audio segments. It inherits from SegmentedSTTService to handle audio buffering and speech detection.

Settings

alias of FalSTTSettings

class InputParams(*, language: Language | None = Language.EN, task: str = 'transcribe', chunk_level: str = 'segment', version: str = '3')

Bases: BaseModel

Configuration parameters for Fal’s Wizper API.

Deprecated since version 0.0.105: Use settings=FalSTTService.Settings(...) instead.

Parameters:
  • language – Language of the audio input. Defaults to English.

  • task – Task to perform (‘transcribe’ or ‘translate’). Defaults to ‘transcribe’.

  • chunk_level – Level of chunking (‘segment’). Defaults to ‘segment’.

  • version – Version of Wizper model to use. Defaults to ‘3’.

language: Language | None
task: str
chunk_level: str
version: str
__init__(*, api_key: str | None = None, aiohttp_session: ClientSession | None = None, task: str = 'transcribe', chunk_level: str = 'segment', version: str = '3', sample_rate: int | None = None, params: InputParams | None = None, settings: FalSTTSettings | None = None, ttfs_p99_latency: float | None = 2.07, **kwargs)

Initialize the FalSTTService with API key and parameters.

Parameters:
  • api_key – Fal API key. If not provided, the FAL_KEY environment variable is used.

  • aiohttp_session – Optional aiohttp ClientSession for HTTP requests. If not provided, a session will be created and managed internally.

  • task – Task to perform ("transcribe" or "translate"). Defaults to "transcribe".

  • chunk_level – Level of chunking ("segment"). Defaults to "segment".

  • version – Version of Wizper model to use. Defaults to "3".

  • sample_rate – Audio sample rate in Hz. If not provided, uses the pipeline’s rate.

  • params – Configuration parameters for the Wizper API.

    Deprecated since version 0.0.105: Use settings=FalSTTService.Settings(...) for model/language and direct init parameters for task/chunk_level/version instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment. See https://github.com/pipecat-ai/stt-benchmark

  • **kwargs – Additional arguments passed to SegmentedSTTService.
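As a rough sketch of the non-deprecated construction path described above, the service can be built from direct init parameters plus a `settings` object. The helper names `fal_init_kwargs` and `make_fal_stt` are hypothetical, and the import path `pipecat.transcriptions.language` for `Language` is an assumption. The explicit `FAL_KEY` lookup only mirrors the fallback documented for `api_key`; the service is described as performing that lookup itself.

```python
import os
from typing import Any, Dict, Optional


def fal_init_kwargs(api_key: Optional[str] = None) -> Dict[str, Any]:
    """Collect the direct (non-deprecated) init parameters with their documented defaults."""
    return {
        # Mirrors the documented fallback: explicit key first, then FAL_KEY.
        "api_key": api_key or os.environ.get("FAL_KEY"),
        "task": "transcribe",
        "chunk_level": "segment",
        "version": "3",
    }


def make_fal_stt(api_key: Optional[str] = None):
    """Build a FalSTTService using settings= instead of the deprecated params=."""
    # Imported lazily so the helper above can be used without pipecat installed.
    from pipecat.services.fal.stt import FalSTTService
    from pipecat.transcriptions.language import Language  # assumed import path

    return FalSTTService(
        settings=FalSTTService.Settings(language=Language.EN),
        **fal_init_kwargs(api_key),
    )
```

Keeping language in `settings` and task, chunk_level, and version as init arguments follows the deprecation note for `params`.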

can_generate_metrics() → bool

Check if the service can generate processing metrics.

Returns:

True, as the Fal STT service supports metrics generation.

language_to_service_language(language: Language) → str | None

Convert a Language enum to Fal’s service-specific language code.

Parameters:

language – The language to convert.

Returns:

The Fal-specific language code, or None if not supported.

async run_stt(audio: bytes) → AsyncGenerator[Frame, None]

Transcribe an audio segment using Fal’s Wizper API.

Parameters:

audio – Raw audio bytes in WAV format (already converted by base class).

Yields:

Frame – TranscriptionFrame containing the transcribed text, or ErrorFrame on failure.

Note

The audio is already in WAV format from the SegmentedSTTService. Only non-empty transcriptions are yielded.
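The yield behavior described above (a transcription frame for non-empty text, an error frame on failure, nothing for empty results) can be sketched as a small async generator. This is an illustration only: `TranscriptionFrame` and `ErrorFrame` are simplified stand-ins for pipecat's frame classes, `transcribe` is a hypothetical callable standing in for the Wizper request, and treating whitespace-only text as empty is an assumption.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, Awaitable, Callable, List, Union


@dataclass
class TranscriptionFrame:
    """Stand-in for pipecat's TranscriptionFrame (text only)."""

    text: str


@dataclass
class ErrorFrame:
    """Stand-in for pipecat's ErrorFrame (message only)."""

    error: str


Frame = Union[TranscriptionFrame, ErrorFrame]


async def run_stt_sketch(
    audio: bytes,
    transcribe: Callable[[bytes], Awaitable[str]],
) -> AsyncGenerator[Frame, None]:
    """Yield a transcription for non-empty text, or an error frame on failure."""
    try:
        text = (await transcribe(audio)).strip()
    except Exception as exc:
        yield ErrorFrame(error=str(exc))
        return
    if text:  # empty results produce no frame at all
        yield TranscriptionFrame(text=text)


async def collect(
    audio: bytes,
    transcribe: Callable[[bytes], Awaitable[str]],
) -> List[Frame]:
    """Drain the generator into a list, which is convenient for inspection."""
    return [frame async for frame in run_stt_sketch(audio, transcribe)]
```

Because the generator may yield nothing, consumers should not assume that every audio segment produces a frame.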