stt
Gladia Speech-to-Text (STT) service implementation.
This module provides a Speech-to-Text service using Gladia’s real-time WebSocket API, supporting multiple languages, custom vocabulary, and various audio processing options.
- pipecat.services.gladia.stt.language_to_gladia_language(language: Language) → str | None[source]
Convert a Language enum to Gladia’s language code format.
- Parameters:
language – The Language enum value to convert.
- Returns:
The Gladia language code string or None if not supported.
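A minimal sketch of this kind of conversion, assuming the lookup is done on the base language code. The table `SUPPORTED_CODES` and the helper `to_base_language_code` are hypothetical stand-ins for illustration and are not pipecat's actual mapping or API.

```python
from typing import Optional

# Hypothetical subset of supported codes; the real table lives in pipecat.
SUPPORTED_CODES = {"en", "fr", "de", "es"}


def to_base_language_code(tag: str) -> Optional[str]:
    """Reduce a tag such as "en-US" to its base code, or None if unsupported."""
    # Normalize separators and case, then keep only the primary subtag.
    base = tag.replace("_", "-").split("-")[0].lower()
    return base if base in SUPPORTED_CODES else None
```

Returning None rather than raising mirrors the documented return type, so a caller can fall back to a default language when a code is not supported.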
- class pipecat.services.gladia.stt.GladiaSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, ~typing.Any]=<factory>, language: Language | str | None | _NotGiven = <factory>, language_config: LanguageConfig | None | _NotGiven = <factory>, custom_metadata: dict[str, ~typing.Any] | None | ~pipecat.services.settings._NotGiven=<factory>, endpointing: float | None | _NotGiven = <factory>, maximum_duration_without_endpointing: int | None | _NotGiven = <factory>, pre_processing: PreProcessingConfig | None | _NotGiven = <factory>, realtime_processing: RealtimeProcessingConfig | None | _NotGiven = <factory>, messages_config: MessagesConfig | None | _NotGiven = <factory>, enable_vad: bool | None | _NotGiven = <factory>)[source]
Bases: STTSettings
Settings for GladiaSTTService.
- Parameters:
language_config – Language detection and handling configuration.
custom_metadata – Additional metadata to include with requests.
endpointing – Silence duration in seconds to mark end of speech.
maximum_duration_without_endpointing – Maximum utterance duration without silence.
pre_processing – Audio pre-processing options.
realtime_processing – Real-time processing features.
messages_config – WebSocket message filtering options.
enable_vad – Enable VAD to trigger end of utterance detection.
- language_config: LanguageConfig | None | _NotGiven
- custom_metadata: dict[str, Any] | None | _NotGiven
- endpointing: float | None | _NotGiven
- maximum_duration_without_endpointing: int | None | _NotGiven
- pre_processing: PreProcessingConfig | None | _NotGiven
- realtime_processing: RealtimeProcessingConfig | None | _NotGiven
- messages_config: MessagesConfig | None | _NotGiven
- enable_vad: bool | None | _NotGiven
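Each field accepts a value, None, or a "not given" marker, which lets an update distinguish "leave this field alone" from "set this field to None". The sketch below illustrates that pattern with standard-library dataclasses only. `NOT_GIVEN`, `SketchSettings`, and `apply_update` are hypothetical names, and this is not pipecat's implementation.

```python
from dataclasses import dataclass, field, fields
from typing import Union


class _NotGivenType:
    """Hypothetical sentinel type meaning "this field was not set"."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGivenType()


@dataclass
class SketchSettings:
    # Every field defaults to the sentinel, so None stays a meaningful value.
    endpointing: Union[float, None, _NotGivenType] = field(
        default_factory=lambda: NOT_GIVEN
    )
    enable_vad: Union[bool, None, _NotGivenType] = field(
        default_factory=lambda: NOT_GIVEN
    )

    def given(self) -> dict:
        """Return only the fields the caller explicitly set."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not NOT_GIVEN
        }


def apply_update(current: dict, update: SketchSettings) -> dict:
    """Merge an update into current values, skipping fields left unset."""
    merged = dict(current)
    merged.update(update.given())
    return merged
```

With this shape, `apply_update(current, SketchSettings(enable_vad=None))` changes only `enable_vad` and leaves `endpointing` at its current value.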
- class pipecat.services.gladia.stt.GladiaSTTService(*, api_key: str, region: Literal['us-west', 'eu-west'] | None = None, url: str = 'https://api.gladia.io/v2/live', encoding: str = 'wav/pcm', bit_depth: int = 16, channels: int = 1, sample_rate: int | None = None, model: str | None = None, params: GladiaInputParams | None = None, max_buffer_size: int = 20971520, should_interrupt: bool = True, settings: GladiaSTTSettings | None = None, ttfs_p99_latency: float | None = 1.49, **kwargs)[source]
Bases: WebsocketSTTService
Speech-to-Text service using Gladia’s API.
This service connects to Gladia’s WebSocket API for real-time transcription with support for multiple languages, custom vocabulary, and various processing options. It provides automatic reconnection, audio buffering, and comprehensive error handling.
For complete API documentation, see: https://docs.gladia.io/api-reference/v2/live/init
- Settings
alias of GladiaSTTSettings
- __init__(*, api_key: str, region: Literal['us-west', 'eu-west'] | None = None, url: str = 'https://api.gladia.io/v2/live', encoding: str = 'wav/pcm', bit_depth: int = 16, channels: int = 1, sample_rate: int | None = None, model: str | None = None, params: GladiaInputParams | None = None, max_buffer_size: int = 20971520, should_interrupt: bool = True, settings: GladiaSTTSettings | None = None, ttfs_p99_latency: float | None = 1.49, **kwargs)[source]
Initialize the Gladia STT service.
- Parameters:
api_key – Gladia API key for authentication.
region – Region used to process audio, either eu-west or us-west. Defaults to eu-west.
url – Gladia API URL. Defaults to “https://api.gladia.io/v2/live”.
encoding – Audio encoding format. Defaults to "wav/pcm".
bit_depth – Audio bit depth. Defaults to 16.
channels – Number of audio channels. Defaults to 1.
sample_rate – Audio sample rate in Hz. If None, uses service default.
model – Model to use for transcription.
Deprecated since version 0.0.105: Use settings=GladiaSTTService.Settings(model=...) instead.
params – Additional configuration parameters for Gladia service.
Deprecated since version 0.0.105: Use settings=GladiaSTTService.Settings(...) for runtime-updatable fields and direct init parameters for encoding/bit_depth/channels.
max_buffer_size – Maximum size of audio buffer in bytes. Defaults to 20971520 bytes (20 MiB).
should_interrupt – Determine whether the bot should be interrupted when Gladia VAD detects user speech. Defaults to True.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment. See https://github.com/pipecat-ai/stt-benchmark
**kwargs – Additional arguments passed to the STTService parent class.
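The parameters above can be collected as in the following sketch, which uses only the documented parameter names and defaults. The helper names, the 16000 Hz sample rate, the `endpointing=0.5` value, and the `GLADIA_API_KEY` environment variable are assumptions made for illustration. `buffer_seconds` also assumes the buffer holds raw PCM at the configured bit depth and channel count.

```python
import os


def buffer_seconds(
    max_buffer_size: int, sample_rate: int, bit_depth: int = 16, channels: int = 1
) -> float:
    """Seconds of raw PCM audio that fit in max_buffer_size bytes."""
    bytes_per_second = sample_rate * (bit_depth // 8) * channels
    return max_buffer_size / bytes_per_second


def build_init_kwargs(api_key: str, sample_rate: int = 16000) -> dict:
    """Collect keyword arguments using the documented parameter names."""
    return {
        "api_key": api_key,
        "region": "eu-west",
        "encoding": "wav/pcm",
        "bit_depth": 16,
        "channels": 1,
        "sample_rate": sample_rate,  # assumed value; None uses the service default
        "max_buffer_size": 20 * 1024 * 1024,  # 20971520 bytes, the documented default
        "should_interrupt": True,
    }


def make_service():
    """Build the service; requires pipecat to be installed and an API key."""
    from pipecat.services.gladia.stt import GladiaSTTService

    # Runtime-updatable fields go through Settings rather than the deprecated params.
    settings = GladiaSTTService.Settings(endpointing=0.5)  # illustrative value
    kwargs = build_init_kwargs(os.environ["GLADIA_API_KEY"])  # assumed variable name
    return GladiaSTTService(settings=settings, **kwargs)
```

Under these assumptions, the default buffer holds roughly eleven minutes of 16 kHz, 16-bit mono audio, which gives a sense of how long a reconnection gap the buffer could absorb.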
- can_generate_metrics() → bool[source]
Check if the service can generate performance metrics.
- Returns:
True, indicating this service supports metrics generation.
- language_to_service_language(language: Language) → str | None[source]
Convert pipecat Language enum to Gladia’s language code.
- Parameters:
language – The Language enum value to convert.
- Returns:
The Gladia language code string or None if not supported.
- async start(frame: StartFrame)[source]
Start the Gladia STT websocket connection.
- Parameters:
frame – The start frame triggering service startup.
- async stop(frame: EndFrame)[source]
Stop the Gladia STT websocket connection.
- Parameters:
frame – The end frame triggering service shutdown.
- async cancel(frame: CancelFrame)[source]
Cancel the Gladia STT websocket connection.
- Parameters:
frame – The cancel frame triggering service cancellation.
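The three lifecycle methods can be pictured with the stand-in below. It is a sketch only: `SketchConnection` is a hypothetical class, and the assumption that stop delivers queued audio before closing while cancel discards it is an illustration of a graceful versus immediate shutdown, not a description of what GladiaSTTService does internally.

```python
import asyncio
from typing import List


class SketchConnection:
    """Hypothetical stand-in for a websocket-backed service."""

    def __init__(self) -> None:
        self.state = "idle"
        self.pending: List[bytes] = []  # audio queued but not yet delivered
        self.sent: List[bytes] = []  # audio delivered to the remote side

    async def start(self) -> None:
        self.state = "connected"

    def queue(self, chunk: bytes) -> None:
        self.pending.append(chunk)

    async def stop(self) -> None:
        # Graceful path (assumed): deliver queued audio, then close.
        self.sent.extend(self.pending)
        self.pending.clear()
        self.state = "closed"

    async def cancel(self) -> None:
        # Immediate path (assumed): drop queued audio and close.
        self.pending.clear()
        self.state = "closed"


async def run(graceful: bool) -> SketchConnection:
    """Start a connection, queue one chunk, then shut down one of two ways."""
    conn = SketchConnection()
    await conn.start()
    conn.queue(b"\x00\x01")
    if graceful:
        await conn.stop()
    else:
        await conn.cancel()
    return conn
```

Calling `asyncio.run(run(True))` exercises the graceful path and `asyncio.run(run(False))` the cancellation path.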