stt
Azure Speech-to-Text service implementation for Pipecat.
This module provides speech-to-text functionality using the Azure Cognitive Services Speech SDK for real-time audio transcription.
- class pipecat.services.azure.stt.AzureSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>)
Bases: STTSettings
Settings for AzureSTTService.
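The _NotGiven type and the <factory> defaults in the signature suggest that each setting can be left unset, which is different from explicitly setting it to None. Assuming that reading, the sketch below shows the sentinel pattern for a runtime update that changes only the fields actually supplied. NOT_GIVEN, SketchSTTSettings, and apply_update are hypothetical names, not Pipecat's API, and the extra field is omitted for brevity.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


class _NotGivenType:
    """Hypothetical sentinel type meaning "this field was not supplied"."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGivenType()


@dataclass
class SketchSTTSettings:
    """Hypothetical stand-in for AzureSTTSettings (extra omitted).

    None is a legitimate value; NOT_GIVEN means "leave the current value alone".
    """

    model: Any = NOT_GIVEN
    language: Any = NOT_GIVEN


def apply_update(current: Dict[str, Any], update: SketchSTTSettings) -> Dict[str, Any]:
    """Return a copy of current with only the explicitly given fields replaced."""
    merged = dict(current)
    for f in fields(update):
        value = getattr(update, f.name)
        if value is not NOT_GIVEN:
            merged[f.name] = value
    return merged


# Only language changes; model keeps its current value.
print(apply_update({"model": None, "language": "en-US"}, SketchSTTSettings(language="de-DE")))
# {'model': None, 'language': 'de-DE'}
```

The sentinel lets an update say "set model to None" and "do not touch model" as two distinct requests, which a plain None default cannot express.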
- class pipecat.services.azure.stt.AzureSTTService(*, api_key: str, region: str | None = None, language: Language | None = Language.EN_US, sample_rate: int | None = None, private_endpoint: str | None = None, endpoint_id: str | None = None, settings: AzureSTTSettings | None = None, ttfs_p99_latency: float | None = 1.8, **kwargs)
Bases: STTService
Azure Speech-to-Text service for real-time audio transcription.
This service uses the Azure Cognitive Services Speech SDK to convert speech audio into text transcriptions. It supports continuous recognition and provides real-time transcription results with timing information.
- Settings
Alias of AzureSTTSettings.
- __init__(*, api_key: str, region: str | None = None, language: Language | None = Language.EN_US, sample_rate: int | None = None, private_endpoint: str | None = None, endpoint_id: str | None = None, settings: AzureSTTSettings | None = None, ttfs_p99_latency: float | None = 1.8, **kwargs)
Initialize the Azure STT service.
- Parameters:
api_key – Azure Cognitive Services subscription key.
region – Azure region for the Speech service (e.g., ‘eastus’). Required unless private_endpoint is provided.
language – Language for speech recognition. Defaults to English (US).
Deprecated since version 0.0.105: Use settings=AzureSTTService.Settings(language=...) instead.
sample_rate – Audio sample rate in Hz. If None, the service default is used.
private_endpoint – Private endpoint for STT behind a firewall. See https://learn.microsoft.com/en-us/azure/ai-services/speech-service/speech-services-private-link?tabs=portal
endpoint_id – Custom model endpoint ID.
settings – Runtime-updatable settings. When provided alongside deprecated parameters, settings values take precedence.
ttfs_p99_latency – P99 latency, in seconds, from the end of speech to the final transcript. Override this for your deployment. See https://github.com/pipecat-ai/stt-benchmark
**kwargs – Additional arguments passed to the parent STTService.
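Two parameter rules above are easy to misread: region is required unless private_endpoint is given, and settings values win over the deprecated language argument. The first function below is a hypothetical restatement of those two rules, not Pipecat's code; it simplifies "not given" to None. The second function sketches one way to derive your own ttfs_p99_latency value from measured end-of-speech-to-final-transcript latencies, using the nearest-rank percentile. That percentile definition is an assumption; the linked benchmark may compute P99 differently.

```python
from typing import List, Optional


def resolve_azure_stt_args(
    region: Optional[str] = None,
    private_endpoint: Optional[str] = None,
    language: Optional[str] = "en-US",
    settings_language: Optional[str] = None,
) -> dict:
    """Hypothetical check of two documented rules (not Pipecat's code).

    1. region is required unless private_endpoint is provided.
    2. A language given via settings takes precedence over the deprecated
       language argument. For simplicity, None here means "not given".
    """
    if region is None and private_endpoint is None:
        raise ValueError("provide either region or private_endpoint")
    effective = settings_language if settings_language is not None else language
    return {"region": region, "private_endpoint": private_endpoint, "language": effective}


def p99_nearest_rank(latencies_s: List[float]) -> float:
    """P99 by the nearest-rank method: the ceil(0.99 * n)-th smallest sample."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(99n / 100), 1-based
    return ordered[rank - 1]


print(resolve_azure_stt_args(region="eastus", settings_language="de-DE"))
measured = [i / 100 for i in range(1, 101)]  # stand-in for your measured latencies
print(p99_nearest_rank(measured))  # 0.99
```

The ceiling is computed with integer arithmetic so that floating-point rounding cannot shift the rank by one.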
- can_generate_metrics() → bool
Check if this service can generate performance metrics.
- Returns:
True as this service supports metrics generation.
- language_to_service_language(language: Language) → str | None
Convert a Language enum to Azure service-specific language code.
- Parameters:
language – The language to convert.
- Returns:
The Azure-specific language identifier, or None if not supported.
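The conversion above can be pictured as a lookup table that returns None for anything it does not cover. The sketch below assumes that shape. Lang is a hypothetical stand-in for Pipecat's Language enum, and the table holds only a small illustrative subset of Azure Speech locale codes, not the service's real supported list.

```python
from enum import Enum
from typing import Dict, Optional


class Lang(Enum):
    """Hypothetical stand-in for Pipecat's Language enum."""

    EN_US = "en_us"
    ES_ES = "es_es"
    FR_FR = "fr_fr"
    DE_DE = "de_de"
    XX_UNSUPPORTED = "xx"  # deliberately left unmapped in this sketch


# Illustrative subset of Azure Speech locale codes; not the full supported list.
AZURE_LOCALES: Dict[Lang, str] = {
    Lang.EN_US: "en-US",
    Lang.ES_ES: "es-ES",
    Lang.FR_FR: "fr-FR",
    Lang.DE_DE: "de-DE",
}


def language_to_azure_locale(language: Lang) -> Optional[str]:
    """Return the Azure locale for language, or None if this sketch has no mapping."""
    return AZURE_LOCALES.get(language)


print(language_to_azure_locale(Lang.FR_FR))  # fr-FR
print(language_to_azure_locale(Lang.XX_UNSUPPORTED))  # None
```

Returning None instead of raising leaves the caller free to decide how to handle an unsupported language, for example by logging a warning or keeping the previous setting.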
- async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None]
Process audio data for speech-to-text conversion.
Feeds audio data to the Azure speech recognizer for processing. Recognition results are handled asynchronously through callbacks.
- Parameters:
audio – Raw audio bytes to process.
- Yields:
Frame | None – None when the audio was fed to the recognizer successfully, or an ErrorFrame on failure.
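Because transcripts arrive through callbacks rather than from run_stt itself, results have to cross from the SDK's thread into the asyncio event loop. The sketch below shows one common way to do that, loop.call_soon_threadsafe, under the assumption that recognition callbacks fire on a non-asyncio thread. FakeRecognizer, SketchSTT, and this ErrorFrame class are hypothetical stand-ins, not the Azure SDK or Pipecat classes.

```python
import asyncio
import threading
from typing import AsyncGenerator, Callable, Optional


class ErrorFrame:
    """Hypothetical stand-in for Pipecat's ErrorFrame."""

    def __init__(self, error: str) -> None:
        self.error = error


class FakeRecognizer:
    """Hypothetical stand-in for an SDK recognizer whose result callback
    fires on a background thread rather than on the asyncio event loop."""

    def __init__(self, on_recognized: Callable[[str], None]) -> None:
        self._on_recognized = on_recognized

    def write(self, audio: bytes) -> None:
        # Pretend the chunk was recognized; report it from a worker thread.
        text = f"{len(audio)} bytes"
        threading.Thread(target=self._on_recognized, args=(text,)).start()


class SketchSTT:
    def __init__(self) -> None:
        self.transcripts: "asyncio.Queue[str]" = asyncio.Queue()
        self._loop: Optional[asyncio.AbstractEventLoop] = None
        self._recognizer = FakeRecognizer(self._on_recognized)

    async def start(self) -> None:
        # Remember the loop so worker-thread callbacks can reach it.
        self._loop = asyncio.get_running_loop()

    def _on_recognized(self, text: str) -> None:
        # Called on a non-asyncio thread: schedule the put on the loop's thread.
        if self._loop is not None:
            self._loop.call_soon_threadsafe(self.transcripts.put_nowait, text)

    async def run_stt(self, audio: bytes) -> AsyncGenerator[Optional[ErrorFrame], None]:
        try:
            self._recognizer.write(audio)
        except Exception as exc:
            yield ErrorFrame(str(exc))
            return
        yield None  # transcripts arrive later through the callback


async def main() -> None:
    stt = SketchSTT()
    await stt.start()
    async for frame in stt.run_stt(b"\x00" * 320):
        print(frame)  # None
    print(await asyncio.wait_for(stt.transcripts.get(), timeout=2))  # 320 bytes


asyncio.run(main())
```

Calling the queue's put_nowait directly from the worker thread would not be thread-safe; call_soon_threadsafe hands the call to the loop's own thread instead.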
- async start(frame: StartFrame)
Start the speech recognition service.
- Parameters:
frame – Frame indicating the start of processing.
- async stop(frame: EndFrame)
Stop the speech recognition service.
- Parameters:
frame – Frame indicating the end of processing.
- async cancel(frame: CancelFrame)
Cancel the speech recognition service.
- Parameters:
frame – Frame indicating cancellation.
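This page does not say how start, stop, and cancel map onto the Speech SDK. The sketch below assumes that start begins continuous recognition and that both stop and cancel end it, with teardown made idempotent so a stop followed by a cancel is harmless. FakeContinuousRecognizer and LifecycleSketch are hypothetical; in the real SDK the corresponding calls would plausibly be SpeechRecognizer's start/stop continuous-recognition methods, but that is an assumption here. asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
from typing import List


class FakeContinuousRecognizer:
    """Hypothetical stand-in that records lifecycle calls instead of recognizing."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def start_continuous(self) -> None:
        self.calls.append("start")

    def stop_continuous(self) -> None:
        self.calls.append("stop")


class LifecycleSketch:
    def __init__(self) -> None:
        self.recognizer = FakeContinuousRecognizer()
        self._running = False

    async def start(self) -> None:
        if not self._running:
            # Potentially blocking SDK call moved off the event loop thread.
            await asyncio.to_thread(self.recognizer.start_continuous)
            self._running = True

    async def _shutdown(self) -> None:
        # Idempotent: only the first stop/cancel actually stops recognition.
        if self._running:
            await asyncio.to_thread(self.recognizer.stop_continuous)
            self._running = False

    async def stop(self) -> None:
        # EndFrame: graceful end of the stream.
        await self._shutdown()

    async def cancel(self) -> None:
        # CancelFrame: this sketch tears down exactly as stop does.
        await self._shutdown()


async def main() -> None:
    svc = LifecycleSketch()
    await svc.start()
    await svc.stop()
    await svc.cancel()  # no second "stop" is recorded
    print(svc.recognizer.calls)  # ['start', 'stop']


asyncio.run(main())
```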