stt
Google Cloud Speech-to-Text V2 service implementation for Pipecat.
This module provides a Google Cloud Speech-to-Text V2 service with streaming support, enabling real-time speech recognition with features like automatic punctuation, voice activity detection, and multi-language support.
- pipecat.services.google.stt.language_to_google_stt_language(language: Language) → str | None
Maps Language enum to Google Speech-to-Text V2 language codes.
- Parameters:
language – Language enum value.
- Returns:
Google STT language code or None if not supported.
- Return type:
Optional[str]
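The lookup-or-None behaviour described above can be sketched as follows. This is a minimal illustration, not the library's code: the `Language` enum, the `_LANGUAGE_MAP` table, and the `language_to_code` helper are hypothetical stand-ins, and the table is a tiny made-up subset.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    """Hypothetical stand-in for Pipecat's Language enum."""

    EN_US = "en-US"
    FR_FR = "fr-FR"
    UNSUPPORTED = "zz"  # placeholder for a language the service does not map


# Illustrative subset only; the real mapping is defined by the library.
_LANGUAGE_MAP = {
    Language.EN_US: "en-US",
    Language.FR_FR: "fr-FR",
}


def language_to_code(language: Language) -> Optional[str]:
    """Return the service language code, or None if the language is not mapped."""
    return _LANGUAGE_MAP.get(language)
```

Callers should check for `None` before passing the result on, since an unsupported language yields no code rather than an exception.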
- class pipecat.services.google.stt.GoogleSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>, languages: list[Language] | _NotGiven = <factory>, language_codes: list[str] | None | _NotGiven = <factory>, use_separate_recognition_per_channel: bool | _NotGiven = <factory>, enable_automatic_punctuation: bool | _NotGiven = <factory>, enable_spoken_punctuation: bool | _NotGiven = <factory>, enable_spoken_emojis: bool | _NotGiven = <factory>, profanity_filter: bool | _NotGiven = <factory>, enable_word_time_offsets: bool | _NotGiven = <factory>, enable_word_confidence: bool | _NotGiven = <factory>, enable_interim_results: bool | _NotGiven = <factory>, enable_voice_activity_events: bool | _NotGiven = <factory>)
Bases:
STTSettings
Settings for GoogleSTTService.
- Parameters:
languages – List of Language enums for recognition (e.g. [Language.EN_US]). Preferred over language_codes.
language_codes – List of Google STT language code strings (e.g. ["en-US"]).
Deprecated since version 0.0.104: Use languages instead. If both are provided, languages takes precedence. This field exists only for backward compatibility with dict-based settings updates.
use_separate_recognition_per_channel – Process each audio channel separately.
enable_automatic_punctuation – Add punctuation to transcripts.
enable_spoken_punctuation – Include spoken punctuation in transcript.
enable_spoken_emojis – Include spoken emojis in transcript.
profanity_filter – Filter profanity from transcript.
enable_word_time_offsets – Include timing information for each word.
enable_word_confidence – Include confidence scores for each word.
enable_interim_results – Stream partial recognition results.
enable_voice_activity_events – Detect voice activity in audio.
- language_codes: list[str] | None | _NotGiven
- use_separate_recognition_per_channel: bool | _NotGiven
- enable_automatic_punctuation: bool | _NotGiven
- enable_spoken_punctuation: bool | _NotGiven
- enable_spoken_emojis: bool | _NotGiven
- profanity_filter: bool | _NotGiven
- enable_word_time_offsets: bool | _NotGiven
- enable_word_confidence: bool | _NotGiven
- enable_interim_results: bool | _NotGiven
- enable_voice_activity_events: bool | _NotGiven
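Each field above accepts a "not given" marker so that an unset field can be told apart from one explicitly set to `None` or `False`. The sketch below illustrates that idea and the stated precedence of `languages` over `language_codes`. It is a self-contained approximation: `SketchSettings`, `NOT_GIVEN`, `given_fields`, and `effective_languages` are hypothetical names, languages are plain strings for brevity, and the library's own sentinel and merge logic may differ.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


class _NotGiven:
    """Hypothetical sentinel type meaning 'the caller did not set this field'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class SketchSettings:
    """Stand-in settings object; every field defaults to NOT_GIVEN."""

    languages: Any = field(default_factory=lambda: NOT_GIVEN)
    language_codes: Any = field(default_factory=lambda: NOT_GIVEN)
    enable_interim_results: Any = field(default_factory=lambda: NOT_GIVEN)


def given_fields(settings: SketchSettings) -> Dict[str, Any]:
    """Return only the fields the caller explicitly set (None or False count as set)."""
    return {
        f.name: getattr(settings, f.name)
        for f in fields(settings)
        if not isinstance(getattr(settings, f.name), _NotGiven)
    }


def effective_languages(settings: SketchSettings) -> Optional[List[str]]:
    """Prefer `languages`; fall back to the deprecated `language_codes`."""
    if not isinstance(settings.languages, _NotGiven):
        return list(settings.languages)
    if not isinstance(settings.language_codes, _NotGiven):
        return settings.language_codes
    return None
```

With this shape, a partial update such as `SketchSettings(enable_interim_results=False)` carries exactly one change, and the other fields are left alone.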
- class pipecat.services.google.stt.GoogleSTTService(*, credentials: str | None = None, credentials_path: str | None = None, location: str = 'global', sample_rate: int | None = None, params: InputParams | None = None, settings: GoogleSTTSettings | None = None, ttfs_p99_latency: float | None = 1.57, **kwargs)
Bases:
STTService
Google Cloud Speech-to-Text V2 service implementation.
Provides real-time speech recognition using Google Cloud’s Speech-to-Text V2 API with streaming support. Handles audio transcription and optional voice activity detection. Implements automatic stream reconnection to handle Google’s 4-minute streaming limit.
- Parameters:
InputParams – Configuration parameters for the STT service.
STREAMING_LIMIT – Google Cloud’s streaming limit in milliseconds (4 minutes).
- Raises:
ValueError – If neither credentials nor credentials_path is provided.
ValueError – If project ID is not found in credentials.
- Settings
alias of
GoogleSTTSettings
- STREAMING_LIMIT = 240000
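The value 240000 is 4 minutes expressed in milliseconds (4 × 60 × 1000). How the service decides when to reconnect is not documented here, so the following is only a sketch of one plausible check, with hypothetical helper names (`now_ms`, `should_reconnect`).

```python
import time

STREAMING_LIMIT_MS = 4 * 60 * 1000  # 240000, matching STREAMING_LIMIT above


def now_ms() -> int:
    """Current monotonic time in milliseconds."""
    return int(time.monotonic() * 1000)


def should_reconnect(
    stream_start_ms: int,
    current_ms: int,
    limit_ms: int = STREAMING_LIMIT_MS,
) -> bool:
    """Return True once the stream has been open for at least the limit."""
    return current_ms - stream_start_ms >= limit_ms
```

A streaming loop could record `now_ms()` when a stream opens and call `should_reconnect` before sending each audio chunk.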
- class InputParams(*, languages: Language | list[Language] = <factory>, model: str | None = 'latest_long', use_separate_recognition_per_channel: bool | None = False, enable_automatic_punctuation: bool | None = True, enable_spoken_punctuation: bool | None = False, enable_spoken_emojis: bool | None = False, profanity_filter: bool | None = False, enable_word_time_offsets: bool | None = False, enable_word_confidence: bool | None = False, enable_interim_results: bool | None = True, enable_voice_activity_events: bool | None = False)
Bases:
BaseModel
Configuration parameters for Google Speech-to-Text.
Deprecated since version 0.0.105: Use settings=GoogleSTTService.Settings(...) instead.
- Parameters:
languages – Single language or list of recognition languages. First language is primary.
model – Speech recognition model to use.
use_separate_recognition_per_channel – Process each audio channel separately.
enable_automatic_punctuation – Add punctuation to transcripts.
enable_spoken_punctuation – Include spoken punctuation in transcript.
enable_spoken_emojis – Include spoken emojis in transcript.
profanity_filter – Filter profanity from transcript.
enable_word_time_offsets – Include timing information for each word.
enable_word_confidence – Include confidence scores for each word.
enable_interim_results – Stream partial recognition results.
enable_voice_activity_events – Detect voice activity in audio.
- model: str | None
- use_separate_recognition_per_channel: bool | None
- enable_automatic_punctuation: bool | None
- enable_spoken_punctuation: bool | None
- enable_spoken_emojis: bool | None
- profanity_filter: bool | None
- enable_word_time_offsets: bool | None
- enable_word_confidence: bool | None
- enable_interim_results: bool | None
- enable_voice_activity_events: bool | None
- __init__(*, credentials: str | None = None, credentials_path: str | None = None, location: str = 'global', sample_rate: int | None = None, params: InputParams | None = None, settings: GoogleSTTSettings | None = None, ttfs_p99_latency: float | None = 1.57, **kwargs)
Initialize the Google STT service.
- Parameters:
credentials – JSON string containing Google Cloud service account credentials.
credentials_path – Path to service account credentials JSON file.
location – Google Cloud location (e.g., “global”, “us-central1”).
sample_rate – Audio sample rate in Hertz.
params – Configuration parameters for the service.
Deprecated since version 0.0.105: Use settings=GoogleSTTService.Settings(...) instead.
settings – Runtime-updatable settings. When provided alongside deprecated params, settings values take precedence.
ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment. See https://github.com/pipecat-ai/stt-benchmark
**kwargs – Additional arguments passed to STTService.
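The two credential parameters and the documented `ValueError` cases can be illustrated with a small validation sketch. The helper name `load_service_account_info` is hypothetical, and checking `credentials` before `credentials_path` is an assumption. `project_id` is the standard key in a Google service account JSON file.

```python
import json
from typing import Optional


def load_service_account_info(
    credentials: Optional[str] = None,
    credentials_path: Optional[str] = None,
) -> dict:
    """Load service account info and verify that it names a project."""
    if credentials:
        # Credentials supplied inline as a JSON string.
        info = json.loads(credentials)
    elif credentials_path:
        # Credentials supplied as a path to a JSON file.
        with open(credentials_path, encoding="utf-8") as f:
            info = json.load(f)
    else:
        raise ValueError("Either credentials or credentials_path must be provided")

    if not info.get("project_id"):
        raise ValueError("Project ID not found in credentials")
    return info
```

Validating up front like this surfaces a misconfigured deployment at construction time rather than on the first streaming request.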
- can_generate_metrics() → bool
Check if the service can generate metrics.
- Returns:
True, as this service supports metrics generation.
- Return type:
bool
- language_to_service_language(language: Language | list[Language]) → str | list[str]
Convert Language enum(s) to Google STT language code(s).
- Parameters:
language – Single Language enum or list of Language enums.
- Returns:
Google STT language code(s).
- Return type:
str | List[str]
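The single-or-list contract above (one enum in, one code out; a list in, a list out) might look like the sketch below. The `Language` enum is a hypothetical stand-in, and using the enum value directly as the service code is a simplifying assumption. The real method may consult a mapping table instead.

```python
from enum import Enum
from typing import List, Union


class Language(str, Enum):
    """Hypothetical stand-in for Pipecat's Language enum."""

    EN_US = "en-US"
    ES_ES = "es-ES"


def to_service_language(
    language: Union[Language, List[Language]],
) -> Union[str, List[str]]:
    """Convert one Language to a code, or a list of Languages to a list of codes."""
    if isinstance(language, list):
        # Preserve order: the first language is treated as primary.
        return [lang.value for lang in language]
    return language.value
```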
- async set_languages(languages: list[Language])
Update the service’s recognition languages.
Deprecated since version 0.0.104: Use STTUpdateSettingsFrame with GoogleSTTService.Settings(languages=...) instead.
- Parameters:
languages – List of languages for recognition. First language is primary.
- async start(frame: StartFrame)
Start the STT service and establish connection.
- Parameters:
frame – The start frame triggering the service start.
- async stop(frame: EndFrame)
Stop the STT service and clean up resources.
- Parameters:
frame – The end frame triggering the service stop.
- async cancel(frame: CancelFrame)
Cancel the STT service and clean up resources.
- Parameters:
frame – The cancel frame triggering the service cancellation.
- async update_options(*, languages: list[Language] | None = None, model: str | None = None, enable_automatic_punctuation: bool | None = None, enable_spoken_punctuation: bool | None = None, enable_spoken_emojis: bool | None = None, profanity_filter: bool | None = None, enable_word_time_offsets: bool | None = None, enable_word_confidence: bool | None = None, enable_interim_results: bool | None = None, enable_voice_activity_events: bool | None = None, location: str | None = None) → None
Update service options dynamically.
Deprecated since version 0.0.104: Use STTUpdateSettingsFrame with GoogleSTTService.Settings(...) instead.
- Parameters:
languages – New list of recognition languages.
model – New recognition model.
enable_automatic_punctuation – Enable/disable automatic punctuation.
enable_spoken_punctuation – Enable/disable spoken punctuation.
enable_spoken_emojis – Enable/disable spoken emojis.
profanity_filter – Enable/disable profanity filter.
enable_word_time_offsets – Enable/disable word timing info.
enable_word_confidence – Enable/disable word confidence scores.
enable_interim_results – Enable/disable interim results.
enable_voice_activity_events – Enable/disable voice activity detection.
location – New Google Cloud location.
Note
Changes that affect the streaming configuration will cause the stream to be reconnected.
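Because every option defaults to `None`, a `None` argument can be read as "leave unchanged". The sketch below shows one way to merge such an update and decide whether a reconnect is needed. The `apply_options` helper is hypothetical, and it treats any changed value as requiring a reconnect, whereas the service may reconnect only for a subset of options.

```python
from typing import Any, Dict, Tuple


def apply_options(
    current: Dict[str, Any], **options: Any
) -> Tuple[Dict[str, Any], bool]:
    """Merge non-None options into the config and report whether anything changed."""
    changes = {
        key: value
        for key, value in options.items()
        if value is not None and current.get(key) != value
    }
    updated = {**current, **changes}
    return updated, bool(changes)
```

For example, calling `apply_options(config, model="latest_long")` on a config that already uses `latest_long` returns the config unchanged with the reconnect flag set to `False`.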