stt

Gladia Speech-to-Text (STT) service implementation.

This module provides a Speech-to-Text service using Gladia’s real-time WebSocket API, supporting multiple languages, custom vocabulary, and various audio processing options.

pipecat.services.gladia.stt.language_to_gladia_language(language: Language) → str | None

Convert a Language enum to Gladia’s language code format.

Parameters: language – The Language enum value to convert.
Returns: The Gladia language code string, or None if the language is not supported.
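The `str | None` return type means callers have to handle the unsupported case explicitly. The following is a minimal sketch of that lookup pattern, not pipecat's implementation: the `Language` enum here is a hypothetical stand-in, and the mapping table is an illustrative subset rather than Gladia's actual list of supported codes.

```python
from enum import Enum
from typing import Dict, Optional


class Language(Enum):
    """Hypothetical stand-in for pipecat's Language enum (illustration only)."""

    EN = "en"
    FR = "fr"
    TLH = "tlh"  # deliberately left out of the table below


# Illustrative subset; the real table of supported codes may differ.
_LANGUAGE_MAP: Dict[Language, str] = {
    Language.EN: "en",
    Language.FR: "fr",
}


def to_service_code(language: Language) -> Optional[str]:
    """Return the service language code, or None if there is no mapping."""
    return _LANGUAGE_MAP.get(language)


def code_or_default(language: Language, default: str = "en") -> str:
    """Fall back to a default code when the language is unsupported."""
    code = to_service_code(language)
    return code if code is not None else default
```

Whether to fall back to a default, as `code_or_default` does, or to surface an error is a design choice for the caller; the documented function only reports None.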

class pipecat.services.gladia.stt.GladiaSTTSettings

Bases: STTSettings

Settings for GladiaSTTService.

Parameters:

language_config – Language detection and handling configuration.
custom_metadata – Additional metadata to include with requests.
endpointing – Silence duration in seconds to mark end of speech.
maximum_duration_without_endpointing – Maximum utterance duration in seconds without silence.
pre_processing – Audio pre-processing options.
realtime_processing – Real-time processing features.
messages_config – WebSocket message filtering options.
enable_vad – Enable VAD to trigger end of utterance detection.
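To illustrate how `endpointing` and `maximum_duration_without_endpointing` interact, here is a rough sketch of a generic endpointing loop. It is an assumption about the general technique, not Gladia's server-side logic: the function name, the per-frame speech flags, and the use of integer milliseconds are all hypothetical choices made for the example.

```python
from typing import Iterable, List


def utterance_end_times(
    speech_flags: Iterable[bool],
    frame_ms: int,
    endpointing_ms: int,
    max_without_endpointing_ms: int,
) -> List[int]:
    """Return the timestamps (in ms) at which an utterance would be closed."""
    ends: List[int] = []
    now = 0
    utterance_ms = 0  # time elapsed since the current utterance started
    silence_ms = 0  # length of the current run of trailing silence
    in_utterance = False
    for is_speech in speech_flags:
        now += frame_ms
        if is_speech:
            in_utterance = True
            silence_ms = 0
        elif in_utterance:
            silence_ms += frame_ms
        if in_utterance:
            utterance_ms += frame_ms
            silence_hit = silence_ms >= endpointing_ms
            max_hit = utterance_ms >= max_without_endpointing_ms
            if silence_hit or max_hit:
                ends.append(now)
                in_utterance = False
                utterance_ms = 0
                silence_ms = 0
    return ends
```

In this sketch a short `endpointing` value closes utterances quickly after a pause, while the maximum duration acts as a safety net for continuous speech that never contains a long enough pause.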

language_config: LanguageConfig | None | _NotGiven

custom_metadata: dict[str, Any] | None | _NotGiven

endpointing: float | None | _NotGiven

maximum_duration_without_endpointing: int | None | _NotGiven

pre_processing: PreProcessingConfig | None | _NotGiven

realtime_processing: RealtimeProcessingConfig | None | _NotGiven

messages_config: MessagesConfig | None | _NotGiven

enable_vad: bool | None | _NotGiven
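Each field above is typed `... | None | _NotGiven`, which suggests a sentinel that separates "the caller did not set this field" from "the caller explicitly set it to None". The sketch below shows that general pattern with hypothetical names (`NOT_GIVEN`, `SketchSettings`, `apply_update`); it is an assumption about how such a sentinel is typically used, not pipecat's actual code.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Union


class _NotGiven:
    """Hypothetical sentinel type meaning 'this field was not provided'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class SketchSettings:
    """Illustrative two-field subset of a settings object."""

    endpointing: Union[float, None, _NotGiven] = NOT_GIVEN
    enable_vad: Union[bool, None, _NotGiven] = NOT_GIVEN


def apply_update(current: Dict[str, Any], update: SketchSettings) -> Dict[str, Any]:
    """Copy only fields that were explicitly given, including explicit None."""
    merged = dict(current)
    for field in fields(update):
        value = getattr(update, field.name)
        if not isinstance(value, _NotGiven):
            merged[field.name] = value
    return merged
```

With this pattern, `SketchSettings(endpointing=None)` clears `endpointing` but leaves `enable_vad` untouched, whereas a plain `None` default could not express that difference.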

class pipecat.services.gladia.stt.GladiaSTTService(*, api_key: str, region: Literal['us-west', 'eu-west'] | None = None, url: str = 'https://api.gladia.io/v2/live', encoding: str = 'wav/pcm', bit_depth: int = 16, channels: int = 1, sample_rate: int | None = None, model: str | None = None, params: GladiaInputParams | None = None, max_buffer_size: int = 20971520, should_interrupt: bool = True, settings: GladiaSTTSettings | None = None, ttfs_p99_latency: float | None = 1.49, **kwargs)

Bases: WebsocketSTTService

Speech-to-Text service using Gladia’s API.

This service connects to Gladia’s WebSocket API for real-time transcription with support for multiple languages, custom vocabulary, and various processing options. It also provides automatic reconnection, audio buffering, and error handling.
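The audio buffering mentioned above is capped by the `max_buffer_size` constructor argument. A size-capped byte buffer could look roughly like the sketch below. The class name and the drop-oldest overflow policy are assumptions made for illustration; the service's real buffering strategy is not described here.

```python
class BoundedAudioBuffer:
    """Hypothetical byte buffer that never grows beyond max_size bytes."""

    def __init__(self, max_size: int) -> None:
        self._max_size = max_size
        self._data = bytearray()

    def append(self, chunk: bytes) -> None:
        """Add audio bytes, discarding the oldest bytes on overflow."""
        self._data.extend(chunk)
        overflow = len(self._data) - self._max_size
        if overflow > 0:
            # Assumed policy: keep the most recent audio, drop the oldest.
            del self._data[:overflow]

    def drain(self) -> bytes:
        """Return everything buffered so far and empty the buffer."""
        data = bytes(self._data)
        self._data.clear()
        return data

    def __len__(self) -> int:
        return len(self._data)
```

A buffer like this lets audio captured during a reconnect be replayed once the socket is back. A production version would likely trim on sample boundaries so that 16-bit samples are not split.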

For complete API documentation, see: https://docs.gladia.io/api-reference/v2/live/init

Settings: alias of GladiaSTTSettings

__init__(*, api_key: str, region: Literal['us-west', 'eu-west'] | None = None, url: str = 'https://api.gladia.io/v2/live', encoding: str = 'wav/pcm', bit_depth: int = 16, channels: int = 1, sample_rate: int | None = None, model: str | None = None, params: GladiaInputParams | None = None, max_buffer_size: int = 20971520, should_interrupt: bool = True, settings: GladiaSTTSettings | None = None, ttfs_p99_latency: float | None = 1.49, **kwargs)

Initialize the Gladia STT service.

Parameters:

api_key – Gladia API key for authentication.
region – Region used to process audio, either "eu-west" or "us-west". Defaults to "eu-west".
url – Gladia API URL. Defaults to "https://api.gladia.io/v2/live".
encoding – Audio encoding format. Defaults to "wav/pcm".
bit_depth – Audio bit depth. Defaults to 16.
channels – Number of audio channels. Defaults to 1.
sample_rate – Audio sample rate in Hz. If None, uses service default.
model – Model to use for transcription.
Deprecated since version 0.0.105: Use settings=GladiaSTTService.Settings(model=...) instead.
params – Additional configuration parameters for the Gladia service.
Deprecated since version 0.0.105: Use settings=GladiaSTTService.Settings(...) for runtime-updatable fields and direct init parameters for encoding/bit_depth/channels.
max_buffer_size – Maximum size of the audio buffer in bytes. Defaults to 20971520 (20 MiB).
should_interrupt – Whether the bot should be interrupted when Gladia’s VAD detects user speech. Defaults to True.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment. See https://github.com/pipecat-ai/stt-benchmark
**kwargs – Additional arguments passed to the STTService parent class.
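The precedence rule for `settings` versus the deprecated `model` argument can be sketched as follows. This is a simplified illustration with a hypothetical helper name (`resolve_model`), not the constructor's actual code; it only shows the idea of warning on a deprecated argument while letting the newer value win.

```python
import warnings
from typing import Optional


def resolve_model(model: Optional[str], settings_model: Optional[str]) -> Optional[str]:
    """Pick the effective model, preferring the value carried by settings."""
    if model is not None:
        # Assumed behavior: a deprecated argument triggers a warning.
        warnings.warn(
            "`model` is deprecated; pass it through `settings` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
    # The settings value takes precedence when both are provided.
    return settings_model if settings_model is not None else model
```

Under this rule, code that still passes the deprecated argument keeps working, and migrating to `settings` overrides the old value without any other changes.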

can_generate_metrics() → bool

Check if the service can generate performance metrics.

Returns: True, indicating this service supports metrics generation.

language_to_service_language(language: Language) → str | None

Convert pipecat Language enum to Gladia’s language code.

Parameters: language – The Language enum value to convert.
Returns: The Gladia language code string, or None if the language is not supported.

async start(frame: StartFrame)

Start the Gladia STT websocket connection.

Parameters: frame – The start frame triggering service startup.

async stop(frame: EndFrame)

Stop the Gladia STT websocket connection.

Parameters: frame – The end frame triggering service shutdown.

async cancel(frame: CancelFrame)

Cancel the Gladia STT websocket connection.

Parameters: frame – The cancel frame triggering service cancellation.

async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None]

Run speech-to-text on audio data.

Parameters: audio – Raw audio bytes to transcribe.
Yields: None (processing is handled asynchronously via WebSocket).
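Because `run_stt` yields None, the audio is only forwarded here and the transcripts are delivered elsewhere. The sketch below illustrates that "send now, receive later" shape with a hypothetical stand-in class (`SketchSTT`), where a plain list replaces the WebSocket. It is an assumption about the pattern, not the service's code.

```python
import asyncio
from typing import AsyncGenerator, List, Optional


class SketchSTT:
    """Hypothetical stand-in showing the 'yield None' pattern."""

    def __init__(self) -> None:
        # Stands in for the WebSocket send side.
        self.sent: List[bytes] = []

    async def run_stt(self, audio: bytes) -> AsyncGenerator[Optional[str], None]:
        # A real service would send the audio over the socket here.
        self.sent.append(audio)
        # Nothing is produced inline; results would arrive via a receive task.
        yield None


async def main() -> List[Optional[str]]:
    stt = SketchSTT()
    results: List[Optional[str]] = []
    for chunk in (b"\x00\x01", b"\x02\x03"):
        async for frame in stt.run_stt(chunk):
            results.append(frame)
    return results


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Consumers of such a generator should therefore not expect transcription frames from the `async for` loop itself.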