stt
Smallest AI speech-to-text service implementation.
This module provides an STT service using Smallest AI’s Waves API:
- SmallestSTTService: WebSocket-based real-time STT that streams audio continuously and receives interim and final transcripts with low latency.
- pipecat.services.smallest.stt.language_to_smallest_stt_language(language: Language) → str
Convert a Language enum to Smallest STT language code.
- Parameters:
language – The Language enum value to convert.
- Returns:
The Smallest language code string.
- class pipecat.services.smallest.stt.SmallestSTTModel(*values)
Bases: StrEnum
Available Smallest AI STT models.
- PULSE = 'pulse'
- class pipecat.services.smallest.stt.SmallestSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>, word_timestamps: bool | _NotGiven = <factory>, full_transcript: bool | _NotGiven = <factory>, sentence_timestamps: bool | _NotGiven = <factory>, redact_pii: bool | _NotGiven = <factory>, redact_pci: bool | _NotGiven = <factory>, numerals: str | _NotGiven = <factory>, diarize: bool | _NotGiven = <factory>)
Bases: STTSettings
Settings for SmallestSTTService.
- Parameters:
word_timestamps – Include word-level timestamps.
full_transcript – Include cumulative transcript.
sentence_timestamps – Include sentence-level timestamps.
redact_pii – Redact personally identifiable information.
redact_pci – Redact payment card information.
numerals – Convert spoken numerals to digits.
diarize – Enable speaker diarization.
- word_timestamps: bool | _NotGiven
- full_transcript: bool | _NotGiven
- sentence_timestamps: bool | _NotGiven
- redact_pii: bool | _NotGiven
- redact_pci: bool | _NotGiven
- numerals: str | _NotGiven
- diarize: bool | _NotGiven
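Every field in SmallestSTTSettings is typed with a `_NotGiven` alternative, which lets a settings object distinguish "not specified" from an explicit value (including falsy values such as `False`). The sketch below illustrates that sentinel pattern with a hypothetical local `NOT_GIVEN` object and a hypothetical `given_fields()` helper; pipecat's own internals, and how the service merges or transmits these values, may differ.

```python
from dataclasses import dataclass, fields
from typing import Any


class _NotGiven:
    """Hypothetical sentinel type marking a field the caller did not set."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class SettingsSketch:
    """Illustrative subset of SmallestSTTSettings fields (not the real class)."""

    language: Any = NOT_GIVEN
    word_timestamps: Any = NOT_GIVEN
    diarize: Any = NOT_GIVEN
    redact_pii: Any = NOT_GIVEN

    def given_fields(self) -> dict[str, Any]:
        """Return only the fields the caller explicitly set, in declaration order."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if not isinstance(getattr(self, f.name), _NotGiven)
        }


# diarize=False is kept because it was set explicitly, even though it is falsy.
update = SettingsSketch(word_timestamps=True, diarize=False)
print(update.given_fields())  # {'word_timestamps': True, 'diarize': False}
```

Using a dedicated sentinel instead of `None` matters for fields like `language`, where `None` is itself a legal value in the documented signature.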
- class pipecat.services.smallest.stt.SmallestSTTService(*, api_key: str, base_url: str = 'wss://api.smallest.ai', encoding: str = 'linear16', sample_rate: int | None = None, settings: SmallestSTTSettings | None = None, ttfs_p99_latency: float | None = 1.0, **kwargs)
Bases: WebsocketSTTService
Smallest AI real-time speech-to-text service using the Pulse WebSocket API.
Streams audio continuously over a WebSocket connection and receives interim and final transcription results with low latency. Best suited for real-time voice applications where immediate feedback is needed.
Uses Pipecat’s VAD to detect when the user stops speaking and sends a finalize message to flush the final transcript.
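Example:
    from pipecat.services.smallest.stt import SmallestSTTService
    from pipecat.transcriptions.language import Language

    stt = SmallestSTTService(
        api_key="your-api-key",
        settings=SmallestSTTService.Settings(
            language=Language.EN,
            word_timestamps=True,
        ),
    )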
- Settings
Alias of SmallestSTTSettings.
- __init__(*, api_key: str, base_url: str = 'wss://api.smallest.ai', encoding: str = 'linear16', sample_rate: int | None = None, settings: SmallestSTTSettings | None = None, ttfs_p99_latency: float | None = 1.0, **kwargs)
Initialize the Smallest AI STT service.
- Parameters:
api_key – Smallest AI API key for authentication.
base_url – Base WebSocket URL for the Smallest API.
encoding – Audio encoding format. Defaults to “linear16”.
sample_rate – Audio sample rate in Hz. If None, uses the pipeline’s rate.
settings – Runtime-updatable settings for the STT service.
ttfs_p99_latency – P99 latency, in seconds, from the end of speech to the final transcript. Defaults to 1.0.
**kwargs – Additional arguments passed to WebsocketSTTService.
- language_to_service_language(language: Language) → str | None
Convert a Language enum to Smallest service language format.
- Parameters:
language – The language to convert.
- Returns:
The Smallest-specific language code, or None if not supported.
- async start(frame: StartFrame)
Start the service and connect to the WebSocket.
- async cancel(frame: CancelFrame)
Cancel the service and disconnect from the WebSocket.
- async process_frame(frame: Frame, direction: FrameDirection)
Process frames, handling VAD events for finalization.