stt

Smallest AI speech-to-text service implementation.

This module provides an STT service using Smallest AI’s Waves API:

SmallestSTTService: WebSocket-based real-time STT. Streams audio continuously and receives interim/final transcripts with low latency.

pipecat.services.smallest.stt.language_to_smallest_stt_language(language: Language) → str[source]

Convert a Language enum to Smallest STT language code.

Parameters: language – The Language enum value to convert.
Returns: The Smallest language code string.

class pipecat.services.smallest.stt.SmallestSTTModel(*values)[source]

Bases: StrEnum

Available Smallest AI STT models.

PULSE = 'pulse'

class pipecat.services.smallest.stt.SmallestSTTSettings

Bases: STTSettings

Settings for SmallestSTTService.

Parameters:

word_timestamps – Include word-level timestamps.
full_transcript – Include cumulative transcript.
sentence_timestamps – Include sentence-level timestamps.
redact_pii – Redact personally identifiable information.
redact_pci – Redact payment card information.
numerals – Convert spoken numerals to digits.
diarize – Enable speaker diarization.

word_timestamps: bool | _NotGiven

full_transcript: bool | _NotGiven

sentence_timestamps: bool | _NotGiven

redact_pii: bool | _NotGiven

redact_pci: bool | _NotGiven

numerals: str | _NotGiven

diarize: bool | _NotGiven
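Every field above is typed as a value or `_NotGiven`, which suggests a sentinel that separates "not set by the caller" from an explicit `False`. The following is a minimal, self-contained sketch of that pattern, not Pipecat's implementation: `_NotGiven`, `NOT_GIVEN`, `SketchSettings`, `given_fields`, and `apply_update` are all stand-ins defined here for illustration.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Union


class _NotGiven:
    """Stand-in sentinel type meaning 'the caller did not set this field'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class SketchSettings:
    # Hypothetical subset of the fields listed above.
    word_timestamps: Union[bool, _NotGiven] = NOT_GIVEN
    redact_pii: Union[bool, _NotGiven] = NOT_GIVEN
    diarize: Union[bool, _NotGiven] = NOT_GIVEN


def given_fields(settings: SketchSettings) -> Dict[str, Any]:
    """Return only the fields the caller explicitly set."""
    return {
        f.name: getattr(settings, f.name)
        for f in fields(settings)
        if not isinstance(getattr(settings, f.name), _NotGiven)
    }


def apply_update(current: Dict[str, Any], update: SketchSettings) -> Dict[str, Any]:
    """Merge an update into current values, leaving unset fields untouched."""
    merged = dict(current)
    merged.update(given_fields(update))
    return merged
```

With this shape, a partial update such as `SketchSettings(diarize=True)` changes only `diarize`, and an explicit `False` still counts as a given value.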

class pipecat.services.smallest.stt.SmallestSTTService(*, api_key: str, base_url: str = 'wss://api.smallest.ai', encoding: str = 'linear16', sample_rate: int | None = None, settings: SmallestSTTSettings | None = None, ttfs_p99_latency: float | None = 1.0, **kwargs)[source]

Bases: WebsocketSTTService

Smallest AI real-time speech-to-text service using the Pulse WebSocket API.

Streams audio continuously over a WebSocket connection and receives interim and final transcription results with low latency. Best suited for real-time voice applications where immediate feedback is needed.

Uses Pipecat’s VAD to detect when the user stops speaking and sends a finalize message to flush the final transcript.

Example:

stt = SmallestSTTService(
    api_key="your-api-key",
    settings=SmallestSTTService.Settings(
        language=Language.EN,
        word_timestamps=True,
    ),
)

Settings: alias of SmallestSTTSettings

__init__(*, api_key: str, base_url: str = 'wss://api.smallest.ai', encoding: str = 'linear16', sample_rate: int | None = None, settings: SmallestSTTSettings | None = None, ttfs_p99_latency: float | None = 1.0, **kwargs)[source]

Initialize the Smallest AI STT service.

Parameters:

api_key – Smallest AI API key for authentication.
base_url – Base WebSocket URL for the Smallest API.
encoding – Audio encoding format. Defaults to “linear16”.
sample_rate – Audio sample rate in Hz. If None, uses the pipeline’s rate.
settings – Runtime-updatable settings for the STT service.
ttfs_p99_latency – P99 latency from speech end to final transcript in seconds.
**kwargs – Additional arguments passed to WebsocketSTTService.

can_generate_metrics() → bool[source]: Check if this service can generate processing metrics.

language_to_service_language(language: Language) → str | None[source]

Convert a Language enum to Smallest service language format.

Parameters: language – The language to convert.
Returns: The Smallest-specific language code, or None if not supported.
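A lookup that returns `None` for unsupported languages might look like the sketch below. It is an illustration under assumptions, not the service's code: the `Language` enum is a small stand-in for Pipecat's, the `_SUPPORTED` table is hypothetical (the real set of supported codes is not listed here), and falling back from a regional variant to its base language is an assumed design choice.

```python
from enum import Enum
from typing import Dict, Optional


class Language(str, Enum):
    """Stand-in for Pipecat's Language enum (only a few members)."""

    EN = "en"
    EN_US = "en-US"
    HI = "hi"
    XX = "xx"  # placeholder for a language the table does not cover


# Hypothetical table; the real supported codes are not given in this document.
_SUPPORTED: Dict[str, str] = {"en": "en", "hi": "hi"}


def to_service_language(language: Language) -> Optional[str]:
    """Map a Language to a service code, or None if there is no match."""
    value = language.value
    if value in _SUPPORTED:
        return _SUPPORTED[value]
    # Assumed fallback: try the base language of a regional variant.
    base = value.split("-")[0].lower()
    return _SUPPORTED.get(base)
```

Callers should handle the `None` case explicitly, for example by keeping the current language or logging a warning.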

async start(frame: StartFrame)[source]: Start the service and connect to the WebSocket.

async stop(frame: EndFrame)[source]: Stop the service and disconnect from the WebSocket.

async cancel(frame: CancelFrame)[source]: Cancel the service and disconnect from the WebSocket.

async process_frame(frame: Frame, direction: FrameDirection)[source]: Process frames, handling VAD events for finalization.

async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None][source]

Send audio to the Smallest Pulse WebSocket for transcription.

Parameters: audio – Raw PCM audio bytes.
Yields: None – transcription results arrive via WebSocket messages.
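The send-only shape of `run_stt`, together with the VAD-triggered finalize described for the service, can be sketched as follows. This is a self-contained illustration, not the actual implementation: `FakeSocket` and `SketchSTT` are stand-ins, the method name `on_user_stopped_speaking` is hypothetical, and the `{"type": "finalize"}` payload is an assumed message format that may differ from what the Pulse API expects.

```python
import asyncio
import json
from typing import AsyncGenerator, List, Optional, Union


class FakeSocket:
    """In-memory stand-in for a WebSocket connection; records what is sent."""

    def __init__(self) -> None:
        self.sent: List[Union[bytes, str]] = []

    async def send(self, message: Union[bytes, str]) -> None:
        self.sent.append(message)


class SketchSTT:
    def __init__(self, socket: FakeSocket) -> None:
        self._socket = socket

    async def run_stt(self, audio: bytes) -> AsyncGenerator[Optional[str], None]:
        # Forward the raw PCM chunk; no transcript is produced here.
        await self._socket.send(audio)
        yield None

    async def on_user_stopped_speaking(self) -> None:
        # Hypothetical finalize payload; the real message format may differ.
        await self._socket.send(json.dumps({"type": "finalize"}))


async def demo() -> List[Union[bytes, str]]:
    socket = FakeSocket()
    stt = SketchSTT(socket)
    for chunk in (b"\x00\x01", b"\x02\x03"):
        async for _ in stt.run_stt(chunk):
            pass  # nothing to consume; transcripts would arrive elsewhere
    await stt.on_user_stopped_speaking()
    return socket.sent
```

In a real service, a separate receive task would read transcript messages from the socket and push them into the pipeline; the sketch leaves that part out.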