stt

AWS Transcribe Speech-to-Text service implementation.

This module provides a WebSocket-based connection to AWS Transcribe for real-time speech-to-text transcription with support for multiple languages and audio formats.

class pipecat.services.aws.stt.AWSTranscribeSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any]=<factory>, language: Language | str | None | _NotGiven = <factory>)

Bases: STTSettings

Settings for AWSTranscribeSTTService.

class pipecat.services.aws.stt.AWSTranscribeSTTService(*, api_key: str | None = None, aws_access_key_id: str | None = None, aws_session_token: str | None = None, region: str | None = None, sample_rate: int | None = None, language: Language | None = None, settings: AWSTranscribeSTTSettings | None = None, ttfs_p99_latency: float | None = 1.9, **kwargs)

Bases: WebsocketSTTService

AWS Transcribe Speech-to-Text service using WebSocket streaming.

Provides real-time speech transcription using AWS Transcribe’s streaming API. Supports multiple languages, configurable sample rates, and both interim and final transcription results.

Settings

alias of AWSTranscribeSTTSettings

__init__(*, api_key: str | None = None, aws_access_key_id: str | None = None, aws_session_token: str | None = None, region: str | None = None, sample_rate: int | None = None, language: Language | None = None, settings: AWSTranscribeSTTSettings | None = None, ttfs_p99_latency: float | None = 1.9, **kwargs)

Initialize the AWS Transcribe STT service.

Parameters:
  • api_key – AWS secret access key. If None, uses AWS_SECRET_ACCESS_KEY environment variable.

  • aws_access_key_id – AWS access key ID. If None, uses AWS_ACCESS_KEY_ID environment variable.

  • aws_session_token – AWS session token for temporary credentials. If None, uses AWS_SESSION_TOKEN environment variable.

  • region – AWS region for the service.

  • sample_rate – Audio sample rate in Hz. If None, uses the pipeline sample rate. AWS Transcribe only supports 8000 or 16000 Hz; other values are clamped to 16000 Hz at connect time.

  • language

    Language for transcription.

    Deprecated since version 0.0.105: Use settings=AWSTranscribeSTTService.Settings(language=...) instead.

  • settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.

  • ttfs_p99_latency – P99 latency from speech end to final transcript in seconds. Override for your deployment. See https://github.com/pipecat-ai/stt-benchmark

  • **kwargs – Additional arguments passed to the parent class (WebsocketSTTService).
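
The fallback rules in the parameter list above can be sketched in plain Python. This is an illustration only, not the service's own code: `resolve_credentials` and `effective_sample_rate` are hypothetical helpers, and the dictionary keys they return are made up for the example. The sketch assumes an explicit argument wins over the environment variable, which is how the "If None" wording reads.

```python
import os
from typing import Mapping, Optional

# Sample rates the parameter description lists as supported by AWS Transcribe.
SUPPORTED_RATES = (8000, 16000)


def resolve_credentials(
    api_key: Optional[str] = None,
    aws_access_key_id: Optional[str] = None,
    aws_session_token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> dict:
    """Hypothetical helper: use each argument, else its environment variable."""
    env = os.environ if env is None else env

    def pick(value: Optional[str], var: str) -> Optional[str]:
        return value if value is not None else env.get(var)

    return {
        "secret_access_key": pick(api_key, "AWS_SECRET_ACCESS_KEY"),
        "access_key_id": pick(aws_access_key_id, "AWS_ACCESS_KEY_ID"),
        "session_token": pick(aws_session_token, "AWS_SESSION_TOKEN"),
    }


def effective_sample_rate(sample_rate: Optional[int], pipeline_rate: int) -> int:
    """Hypothetical helper: fall back to the pipeline rate, then force 16000 Hz
    for any value other than 8000 or 16000."""
    rate = pipeline_rate if sample_rate is None else sample_rate
    return rate if rate in SUPPORTED_RATES else 16000
```

Passing `env` explicitly keeps the sketch testable without touching the real process environment.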

can_generate_metrics() → bool

Check if this service can generate processing metrics.

Returns:

True, as AWS Transcribe STT supports metrics generation.

get_service_encoding(encoding: str) → str

Convert internal encoding format to AWS Transcribe format.

Parameters:

encoding – Internal encoding format string.

Returns:

AWS Transcribe compatible encoding format.
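
A conversion of this kind is typically a small lookup table. The sketch below is a guess at the shape, not the method's actual body: `to_transcribe_encoding` is a hypothetical name, and the internal name `"linear16"` is an assumption. The AWS side is less speculative: the Transcribe streaming API accepts `pcm`, `ogg-opus`, and `flac` as media encodings.

```python
# Assumed mapping from an internal encoding name to an AWS Transcribe
# media encoding. The "linear16" key is an assumption for illustration.
ENCODING_MAP = {
    "linear16": "pcm",
}


def to_transcribe_encoding(encoding: str) -> str:
    """Hypothetical helper: translate known names, pass others through unchanged."""
    return ENCODING_MAP.get(encoding, encoding)
```

Passing unknown values through unchanged lets callers supply an AWS encoding name directly; whether the real method behaves this way is not stated in this reference.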

async start(frame: StartFrame)

Initialize the connection when the service starts.

Parameters:

frame – Start frame signaling service initialization.

async stop(frame: EndFrame)

Stop the service and disconnect from AWS Transcribe.

Parameters:

frame – End frame signaling service shutdown.

async cancel(frame: CancelFrame)

Cancel the service and disconnect from AWS Transcribe.

Parameters:

frame – Cancel frame signaling service cancellation.

async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None]

Process audio data and send to AWS Transcribe.

Parameters:

audio – Raw audio bytes to transcribe.

Yields:

ErrorFrame – If processing fails or connection issues occur.
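
The documented contract (an async generator that forwards audio and yields an `ErrorFrame` only on failure) can be illustrated with stand-in types. Everything here is hypothetical: `ErrorFrame` is a local dataclass rather than the pipecat class, `FakeConnection` replaces the WebSocket, and yielding `None` on success is an assumption drawn from the `Frame | None` return annotation.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Optional


@dataclass
class ErrorFrame:
    """Stand-in for the pipeline's error frame (not the pipecat class)."""
    error: str


class FakeConnection:
    """Stand-in for the streaming connection."""

    def __init__(self, healthy: bool = True) -> None:
        self.healthy = healthy
        self.sent: List[bytes] = []

    async def send(self, chunk: bytes) -> None:
        if not self.healthy:
            raise ConnectionError("socket closed")
        self.sent.append(chunk)


async def run_stt_sketch(
    conn: FakeConnection, audio: bytes
) -> AsyncGenerator[Optional[ErrorFrame], None]:
    """Forward one audio chunk; yield an ErrorFrame if sending fails."""
    try:
        await conn.send(audio)
    except Exception as exc:
        yield ErrorFrame(error=f"send failed: {exc}")
        return
    # Nothing to emit here on success (assumed from the return annotation).
    yield None


async def collect(conn: FakeConnection, audio: bytes) -> list:
    """Drain the generator into a list for inspection."""
    return [frame async for frame in run_stt_sketch(conn, audio)]
```

Running `asyncio.run(collect(FakeConnection(healthy=False), b"\x00"))` returns a single `ErrorFrame`, while a healthy connection returns `[None]`.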

language_to_service_language(language: Language) → str | None

Convert internal language enum to AWS Transcribe language code.

Source: https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html

All language codes that support streaming are included.

Parameters:

language – Internal language enumeration value.

Returns:

AWS Transcribe compatible language code, or None if unsupported.
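
A lookup of this kind can be sketched with a stand-in enum. The `Language` class below is a local substitute, not the pipecat enum, `to_transcribe_language` is a hypothetical name, and the four mapped entries are illustrative picks rather than the method's real table. The AWS codes on the right-hand side (`en-US`, `en-GB`, `es-US`, `fr-FR`) are real streaming language codes.

```python
from enum import Enum
from typing import Optional


class Language(str, Enum):
    """Stand-in for the internal language enum (not the pipecat class)."""
    EN = "en"
    EN_GB = "en-GB"
    ES = "es"
    FR = "fr"
    XX = "xx"  # placeholder for a language with no streaming support


# Illustrative subset; the real table covers every streaming-capable code.
LANGUAGE_MAP = {
    Language.EN: "en-US",
    Language.EN_GB: "en-GB",
    Language.ES: "es-US",
    Language.FR: "fr-FR",
}


def to_transcribe_language(language: Language) -> Optional[str]:
    """Hypothetical helper: return the AWS code, or None if unsupported."""
    return LANGUAGE_MAP.get(language)
```

Returning `None` instead of raising leaves the decision about unsupported languages to the caller, which matches the documented return value.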