stt

Smallest AI speech-to-text service implementation.

This module provides an STT service using Smallest AI’s Waves API:

  • SmallestSTTService: WebSocket-based real-time STT. Streams audio continuously and receives interim/final transcripts with low latency.

pipecat.services.smallest.stt.language_to_smallest_stt_language(language: Language) → str

Convert a Language enum to Smallest STT language code.

Parameters:

language – The Language enum value to convert.

Returns:

The Smallest language code string.
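The conversion can be pictured with a small stand-alone sketch. The actual mapping table is not shown in this reference, so the `Language` stand-in enum and the "strip the region suffix" rule below are assumptions for illustration only, not the library's implementation.

```python
from enum import Enum


class Language(str, Enum):
    """Stand-in for pipecat's Language enum (only a few members shown)."""

    EN = "en"
    EN_US = "en-US"
    HI = "hi"
    DE_DE = "de-DE"


def to_service_language(language: Language) -> str:
    """Hypothetical conversion: drop any region suffix and keep the base code."""
    # "en-US" becomes "en"; codes without a region pass through unchanged.
    return language.value.split("-")[0].lower()


print(to_service_language(Language.EN_US))
print(to_service_language(Language.HI))
```

A real service may instead use an explicit lookup table, which allows some regional variants to be kept and unsupported languages to be rejected.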

class pipecat.services.smallest.stt.SmallestSTTModel(*values)

Bases: StrEnum

Available Smallest AI STT models.

PULSE = 'pulse'
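Because the model enum is string-based, a member compares equal to its plain string value, which is presumably why the settings accept `model` as a `str`. The sketch below uses a stand-in class (`SttModel`, a hypothetical name) built from `str` and `Enum` so that it also runs on Python versions older than 3.11, where `StrEnum` is unavailable.

```python
from enum import Enum


class SttModel(str, Enum):
    """Stand-in mirroring SmallestSTTModel; not the library class itself."""

    PULSE = "pulse"


# A member compares equal to its underlying string.
is_equal = SttModel.PULSE == "pulse"

# Looking a member up by value returns the same singleton object.
looked_up = SttModel("pulse")

print(is_equal, looked_up is SttModel.PULSE, SttModel.PULSE.value)
```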
class pipecat.services.smallest.stt.SmallestSTTSettings(model: str | None | _NotGiven = <factory>, extra: dict[str, Any] = <factory>, language: Language | str | None | _NotGiven = <factory>, word_timestamps: bool | _NotGiven = <factory>, full_transcript: bool | _NotGiven = <factory>, sentence_timestamps: bool | _NotGiven = <factory>, redact_pii: bool | _NotGiven = <factory>, redact_pci: bool | _NotGiven = <factory>, numerals: str | _NotGiven = <factory>, diarize: bool | _NotGiven = <factory>)

Bases: STTSettings

Settings for SmallestSTTService.

Parameters:
  • word_timestamps – Include word-level timestamps.

  • full_transcript – Include cumulative transcript.

  • sentence_timestamps – Include sentence-level timestamps.

  • redact_pii – Redact personally identifiable information.

  • redact_pci – Redact payment card information.

  • numerals – Convert spoken numerals to digits.

  • diarize – Enable speaker diarization.

word_timestamps: bool | _NotGiven
full_transcript: bool | _NotGiven
sentence_timestamps: bool | _NotGiven
redact_pii: bool | _NotGiven
redact_pci: bool | _NotGiven
numerals: str | _NotGiven
diarize: bool | _NotGiven
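Every field above admits a `_NotGiven` value, which suggests a sentinel pattern: a field left unset is distinct from a field explicitly set to `None` or `False`, so a partial update only touches what the caller provided. The following is a minimal sketch of that pattern under that assumption. `SketchSettings`, `NOT_GIVEN`, and `apply_update` are hypothetical names, and this is not Pipecat's actual implementation.

```python
from dataclasses import dataclass, field, fields
from typing import Any


class _NotGiven:
    """Stand-in sentinel meaning 'the caller did not set this field'."""

    def __repr__(self) -> str:
        return "NOT_GIVEN"


NOT_GIVEN = _NotGiven()


@dataclass
class SketchSettings:
    # Hypothetical subset of the fields listed above.
    language: Any = field(default_factory=lambda: NOT_GIVEN)
    word_timestamps: Any = field(default_factory=lambda: NOT_GIVEN)
    diarize: Any = field(default_factory=lambda: NOT_GIVEN)

    def given(self) -> dict:
        """Return only the fields the caller actually set."""
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if not isinstance(getattr(self, f.name), _NotGiven)
        }


def apply_update(current: dict, update: SketchSettings) -> dict:
    """Merge an update into the current values, leaving unset fields untouched."""
    return {**current, **update.given()}


current = {"language": "en", "word_timestamps": False, "diarize": False}
merged = apply_update(current, SketchSettings(word_timestamps=True))
print(merged)
```

Note that an explicit `None` still counts as given in this sketch, so `SketchSettings(language=None)` would overwrite the current language rather than leave it alone.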
class pipecat.services.smallest.stt.SmallestSTTService(*, api_key: str, base_url: str = 'wss://api.smallest.ai', encoding: str = 'linear16', sample_rate: int | None = None, settings: SmallestSTTSettings | None = None, ttfs_p99_latency: float | None = 1.0, **kwargs)

Bases: WebsocketSTTService

Smallest AI real-time speech-to-text service using the Pulse WebSocket API.

Streams audio continuously over a WebSocket connection and receives interim and final transcription results with low latency. Best suited for real-time voice applications where immediate feedback is needed.

Uses Pipecat’s VAD to detect when the user stops speaking and sends a finalize message to flush the final transcript.

Example:

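# Imports needed by the example below.
from pipecat.services.smallest.stt import SmallestSTTService
from pipecat.transcriptions.language import Language

stt = SmallestSTTService(
    api_key="your-api-key",
    settings=SmallestSTTService.Settings(
        language=Language.EN,
        word_timestamps=True,
    ),
)

Settings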

alias of SmallestSTTSettings

__init__(*, api_key: str, base_url: str = 'wss://api.smallest.ai', encoding: str = 'linear16', sample_rate: int | None = None, settings: SmallestSTTSettings | None = None, ttfs_p99_latency: float | None = 1.0, **kwargs)

Initialize the Smallest AI STT service.

Parameters:
  • api_key – Smallest AI API key for authentication.

  • base_url – Base WebSocket URL for the Smallest API.

  • encoding – Audio encoding format. Defaults to “linear16”.

  • sample_rate – Audio sample rate in Hz. If None, uses the pipeline’s rate.

  • settings – Runtime-updatable settings for the STT service.

  • ttfs_p99_latency – P99 latency from speech end to final transcript in seconds.

  • **kwargs – Additional arguments passed to WebsocketSTTService.

can_generate_metrics() → bool

Check if this service can generate processing metrics.

language_to_service_language(language: Language) → str | None

Convert a Language enum to Smallest service language format.

Parameters:

language – The language to convert.

Returns:

The Smallest-specific language code, or None if not supported.

async start(frame: StartFrame)

Start the service and connect to the WebSocket.

async stop(frame: EndFrame)

Stop the service and disconnect from the WebSocket.

async cancel(frame: CancelFrame)

Cancel the service and disconnect from the WebSocket.

async process_frame(frame: Frame, direction: FrameDirection)

Process frames, handling VAD events for finalization.

async run_stt(audio: bytes) → AsyncGenerator[Frame | None, None]

Send audio to the Smallest Pulse WebSocket for transcription.

Parameters:

audio – Raw PCM audio bytes.

Yields:

None – transcription results arrive via WebSocket messages.
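The split between sending audio in run_stt and receiving transcripts elsewhere can be sketched with plain asyncio. Everything here is a stand-in: `FakeSocket`, `SketchSTT`, the `{"type": "finalize"}` message, and the `transcript` / `is_final` fields are hypothetical and do not describe the real Pulse wire format.

```python
import asyncio
import json
from typing import AsyncGenerator, Optional


class FakeSocket:
    """Stand-in for a WebSocket connection; records what is sent."""

    def __init__(self) -> None:
        self.sent: list = []
        self.incoming: asyncio.Queue = asyncio.Queue()

    async def send(self, message) -> None:
        self.sent.append(message)
        # Pretend the server answers a finalize request with a final transcript.
        if isinstance(message, str) and json.loads(message).get("type") == "finalize":
            reply = {"transcript": "hello", "is_final": True}
            await self.incoming.put(json.dumps(reply))


class SketchSTT:
    def __init__(self, socket: FakeSocket) -> None:
        self._socket = socket
        self.transcripts: list = []

    async def run_stt(self, audio: bytes) -> AsyncGenerator[Optional[dict], None]:
        # Forward the audio chunk; no transcript is produced here.
        await self._socket.send(audio)
        yield None

    async def on_user_stopped_speaking(self) -> None:
        # Hypothetical finalize message, sent when VAD reports end of speech.
        await self._socket.send(json.dumps({"type": "finalize"}))

    async def receive_loop(self) -> None:
        # Transcripts arrive independently of run_stt calls.
        while True:
            message = json.loads(await self._socket.incoming.get())
            self.transcripts.append(message)
            if message["is_final"]:
                break


async def demo() -> list:
    socket = FakeSocket()
    stt = SketchSTT(socket)
    receiver = asyncio.create_task(stt.receive_loop())
    for chunk in (b"\x00\x01", b"\x02\x03"):
        async for _ in stt.run_stt(chunk):
            pass
    await stt.on_user_stopped_speaking()
    await receiver
    return stt.transcripts


result = asyncio.run(demo())
print(result)
```

In this arrangement the generator exists only to satisfy the streaming interface. The receive loop is what turns server messages into transcripts, which matches the "yields None" behaviour described above.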