simple_text_aggregator

Simple text aggregator for basic sentence-boundary text processing.

This module provides a straightforward text aggregator that accumulates text until it finds an end-of-sentence marker, making it suitable for basic TTS text processing scenarios.

class pipecat.utils.text.simple_text_aggregator.SimpleTextAggregator(**kwargs)

Bases: BaseTextAggregator

Simple text aggregator that accumulates text until sentence boundaries.

This aggregator provides basic functionality for accumulating text tokens and releasing them when an end-of-sentence marker is detected. It’s the most straightforward implementation of text aggregation for TTS processing.

__init__(**kwargs)

Initialize the simple text aggregator.

Creates an empty text buffer ready to begin accumulating text tokens.

Parameters: **kwargs – Additional arguments passed to BaseTextAggregator (e.g. aggregation_type).

property text: Aggregation

Get the currently aggregated text.

Returns: The text that has been accumulated in the buffer.

async aggregate(text: str) → AsyncIterator[Aggregation]

Aggregate text and yield completed aggregations.

In SENTENCE mode, the input text is processed character by character. When sentence-ending punctuation is detected, the aggregator waits for the next non-whitespace character (the lookahead) before calling NLTK to check for a sentence boundary.

In TOKEN mode, yields the text immediately without buffering.

Parameters: text – Text to aggregate.
Yields: Aggregation objects (sentences in SENTENCE mode, tokens in TOKEN mode).
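
The lookahead behavior described above can be illustrated with a minimal, self-contained sketch. It does not use pipecat or NLTK: ToySentenceAggregator, SENTENCE_ENDINGS, and collect are hypothetical names, and the NLTK check is replaced by a simplified rule (a boundary is confirmed only when whitespace separates the punctuation from the next non-whitespace character). The sketch yields plain strings rather than Aggregation objects.

```python
import asyncio
from typing import AsyncIterator, List, Tuple

SENTENCE_ENDINGS = ".!?"  # assumption: simplified set of end-of-sentence markers


class ToySentenceAggregator:
    """Hypothetical stand-in for illustration; not the pipecat class."""

    def __init__(self) -> None:
        self._buffer = ""
        self._pending_end = False  # True once sentence-ending punctuation was seen

    async def aggregate(self, text: str) -> AsyncIterator[str]:
        for char in text:
            if self._pending_end and not char.isspace():
                if char in SENTENCE_ENDINGS:
                    pass  # runs such as "?!" or "..." keep the boundary pending
                elif self._buffer[-1].isspace():
                    # Lookahead arrived after whitespace: treat as a boundary.
                    sentence = self._buffer.strip()
                    self._buffer = ""
                    self._pending_end = False
                    yield sentence
                else:
                    # Punctuation inside a token (e.g. "3.14"): not a boundary.
                    self._pending_end = False
            self._buffer += char
            if char in SENTENCE_ENDINGS:
                self._pending_end = True


async def collect(chunks: List[str]) -> Tuple[List[str], str]:
    """Feed streamed chunks and return (completed sentences, leftover buffer)."""
    agg = ToySentenceAggregator()
    out: List[str] = []
    for chunk in chunks:
        async for sentence in agg.aggregate(chunk):
            out.append(sentence)
    return out, agg._buffer


sentences, leftover = asyncio.run(
    collect(["Hello wor", "ld. How are", " you? Pi is 3.14"])
)
print(sentences)
print(repr(leftover))
```

Note that the last sentence stays in the buffer because no lookahead character has arrived yet; releasing it is the job of a flush step at the end of the stream.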

async flush() → Aggregation | None

Flush any remaining text in the buffer.

Returns any text remaining in the buffer. This is called at the end of a stream to ensure all text is processed. In TOKEN mode, returns None since tokens are yielded immediately.

Returns: Any remaining text as a sentence, or None if the buffer is empty or the aggregator is in TOKEN mode.
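
As a rough sketch of the flush contract described above, the following toy class returns leftover text once and None otherwise. ToyFlushBuffer and its token_mode flag are hypothetical names chosen for illustration; the real class returns an Aggregation object rather than a plain string, and its internals may differ.

```python
import asyncio
from typing import Optional, Tuple


class ToyFlushBuffer:
    """Hypothetical stand-in for illustration; not the pipecat class."""

    def __init__(self, token_mode: bool = False) -> None:
        self._text = ""
        self._token_mode = token_mode

    def append(self, text: str) -> None:
        # Assumption: in token mode nothing is buffered, since tokens are
        # handed on immediately.
        if not self._token_mode:
            self._text += text

    async def flush(self) -> Optional[str]:
        # None when in token mode or when there is nothing left to release.
        if self._token_mode or not self._text:
            return None
        remaining, self._text = self._text, ""
        return remaining


async def demo_flush() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    sentence_buf = ToyFlushBuffer()
    sentence_buf.append("Pi is 3.14")       # stream ended without a boundary
    first = await sentence_buf.flush()      # releases the leftover text
    second = await sentence_buf.flush()     # buffer is now empty

    token_buf = ToyFlushBuffer(token_mode=True)
    token_buf.append("Pi")
    third = await token_buf.flush()         # token mode never holds text
    return first, second, third


first, second, third = asyncio.run(demo_flush())
print(first, second, third)
```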

async handle_interruption()

Handle interruptions by clearing the text buffer.

Called when an interruption occurs in the processing pipeline, discarding any partially accumulated text.

async reset()

Clear the internally aggregated text.

Resets the aggregator to its initial empty state, discarding any accumulated text content.
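
The effect of handle_interruption() and reset() on partially accumulated text can be sketched as follows. ToyInterruptibleBuffer is a hypothetical stand-in, and routing the interruption through reset() is an assumption of this sketch rather than a statement about the library's internals.

```python
import asyncio


class ToyInterruptibleBuffer:
    """Hypothetical stand-in for illustration; not the pipecat class."""

    def __init__(self) -> None:
        self._text = ""

    def append(self, text: str) -> None:
        self._text += text

    async def reset(self) -> None:
        # Return to the initial empty state.
        self._text = ""

    async def handle_interruption(self) -> None:
        # Assumption: an interruption discards partial text the same way
        # reset() does.
        await self.reset()


async def demo_interrupt() -> str:
    buf = ToyInterruptibleBuffer()
    buf.append("The answer is")      # partial sentence, never completed
    await buf.handle_interruption()  # interruption arrives mid-sentence
    buf.append("Sure, ")             # accumulation restarts from empty
    return buf._text


after_interrupt = asyncio.run(demo_interrupt())
print(repr(after_interrupt))
```

In this sketch the discarded fragment never reaches downstream consumers, which matches the documented intent of dropping partially accumulated text.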