skip_tags_aggregator

Skip tags aggregator for preventing sentence boundaries within tagged content.

This module provides a text aggregator that prevents end-of-sentence matching between specified start/end tag pairs, ensuring that tagged content is processed as a unit regardless of internal punctuation.

class pipecat.utils.text.skip_tags_aggregator.SkipTagsAggregator(tags: Sequence[tuple[str, str]], **kwargs)

Bases: SimpleTextAggregator

Aggregator that prevents end-of-sentence matching between start/end tags.

This aggregator buffers text until it finds an end of sentence or a start tag. If a start tag is found, the aggregator keeps aggregating text unconditionally until the corresponding end tag is found. This is useful for content wrapped in custom delimiters whose internal punctuation should not be considered for end-of-sentence matching.
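A minimal, synchronous sketch of this rule, assuming a complete string rather than a stream and treating only `.`, `!` and `?` as sentence ends. That simplification is an assumption of the sketch and may differ from pipecat's own end-of-sentence matching; `split_sentences` is a hypothetical name, not part of the library:

```python
from __future__ import annotations

from typing import Sequence

# Simplified sentence-end markers (an assumption for this sketch).
SENTENCE_END = {".", "!", "?"}


def split_sentences(text: str, tags: Sequence[tuple[str, str]]) -> list[str]:
    """Split `text` into sentences without ever splitting inside a tag pair."""
    sentences: list[str] = []
    buffer = ""
    end_tag: str | None = None  # end tag being waited for while inside a tag
    i = 0
    while i < len(text):
        if end_tag is None:
            # Outside a tag: does a start tag begin at this position?
            for start, end in tags:
                if text.startswith(start, i):
                    buffer += start
                    i += len(start)
                    end_tag = end
                    break
            else:
                # No start tag here: normal end-of-sentence matching applies.
                char = text[i]
                buffer += char
                i += 1
                if char in SENTENCE_END:
                    sentences.append(buffer.strip())
                    buffer = ""
        elif text.startswith(end_tag, i):
            # Closing tag found: resume normal sentence matching after it.
            buffer += end_tag
            i += len(end_tag)
            end_tag = None
        else:
            # Inside a tag: aggregate unconditionally, ignoring punctuation.
            buffer += text[i]
            i += 1
    if buffer.strip():
        sentences.append(buffer.strip())  # trailing text without a sentence end
    return sentences


print(split_sentences(
    "My email is <spell>jo.doe@example.com</spell>. Thanks!",
    [("<spell>", "</spell>")],
))
```

The periods inside the email address do not end a sentence because they sit between `<spell>` and `</spell>`; the period right after the closing tag does.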

The aggregator ensures that tags spanning multiple text chunks are correctly identified and that content within tags is never split at sentence boundaries.
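One simple way to recognize a tag that is split across chunks is to re-scan the whole accumulated buffer after each chunk instead of looking at each chunk on its own. The sketch below assumes that approach; `pending_end_tag` is a hypothetical helper, not part of pipecat's API:

```python
from __future__ import annotations

from typing import Sequence


def pending_end_tag(buffer: str, tags: Sequence[tuple[str, str]]) -> str | None:
    """Return the end tag still awaited after scanning `buffer`, or None."""
    end_tag: str | None = None
    i = 0
    while i < len(buffer):
        if end_tag is None:
            for start, end in tags:
                if buffer.startswith(start, i):
                    end_tag = end  # entered a tag
                    i += len(start)
                    break
            else:
                i += 1
        elif buffer.startswith(end_tag, i):
            i += len(end_tag)
            end_tag = None  # left the tag
        else:
            i += 1
    return end_tag


tags = [("<spell>", "</spell>")]
buffer = ""
# Both the start tag and the end tag are split across chunk boundaries.
for chunk in ["Call <sp", "ell>5.5.5", "</spe", "ll>."]:
    buffer += chunk
    print(repr(chunk), "->", pending_end_tag(buffer, tags))
# prints None, </spell>, </spell>, None (one line per chunk)
```

Re-scanning keeps the logic simple but costs time proportional to the buffer length on every chunk; an incremental scanner would avoid that at the price of tracking partial tag matches between chunks.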

__init__(tags: Sequence[tuple[str, str]], **kwargs)

Initialize the skip tags aggregator.

Parameters:
  • tags – Sequence of StartEndTags, i.e. (start_tag, end_tag) string pairs, defining the tags whose content should be excluded from sentence boundary detection.

  • **kwargs – Additional arguments passed to SimpleTextAggregator (e.g. aggregation_type).

async aggregate(text: str) → AsyncIterator[Aggregation]

Aggregate text while respecting tag boundaries.

Processes the input text character by character, updates the tag state, and uses the parent’s lookahead logic for sentence detection when not inside a tag.
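The sketch below illustrates character-by-character processing with tag state and a lookahead step. It assumes a hypothetical lookahead rule: a `.`, `!` or `?` outside a tag ends a sentence only once the next character turns out to be whitespace, so `3.50` is not split. The parent's actual lookahead logic is not reproduced here. `Aggregation` is a hypothetical stand-in dataclass and `TagAwareSentenceAggregator` a hypothetical class name:

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Sequence


@dataclass
class Aggregation:
    """Hypothetical stand-in for the real Aggregation type."""

    text: str
    type: str


class TagAwareSentenceAggregator:
    """Sketch: character-by-character aggregation with tag state and lookahead."""

    def __init__(self, tags: Sequence[tuple[str, str]]):
        self._tags = tags
        self._buffer = ""
        self._end_tag: str | None = None  # awaited end tag while inside a tag
        self._pending_boundary = False  # saw ".", "!" or "?" outside a tag

    def _update_tag_state(self) -> None:
        # Enter a tag when the buffer ends with a start tag; leave it on the end tag.
        if self._end_tag is None:
            for start, end in self._tags:
                if self._buffer.endswith(start):
                    self._end_tag = end
                    return
        elif self._buffer.endswith(self._end_tag):
            self._end_tag = None

    async def aggregate(self, text: str) -> AsyncIterator[Aggregation]:
        for char in text:
            if self._pending_boundary:
                self._pending_boundary = False
                # Lookahead: whitespace after the punctuation confirms the boundary.
                if char.isspace():
                    yield Aggregation(self._buffer.strip(), "SENTENCE")
                    self._buffer = ""
            self._buffer += char
            self._update_tag_state()
            if self._end_tag is None and char in ".!?":
                self._pending_boundary = True


async def main() -> list[str]:
    aggregator = TagAwareSentenceAggregator([("<spell>", "</spell>")])
    chunks = ["It costs 3.50 dollars. Spell <spell>a.", " b.</spell>", ". Done! "]
    results: list[str] = []
    for chunk in chunks:
        async for aggregation in aggregator.aggregate(chunk):
            results.append(aggregation.text)
    return results


print(asyncio.run(main()))
```

Because a boundary is confirmed only by the following character, a sentence that ends exactly at the end of a chunk is held until the next chunk arrives, and text after the last confirmed boundary stays in the buffer.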

In TOKEN mode, text is passed through immediately unless the aggregator is inside a tag, in which case text is buffered until the closing tag is found.
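A synchronous sketch of that TOKEN-mode behavior, assuming chunks arrive as plain strings. It also holds back a trailing fragment that could be the beginning of a start tag (such as `<sp`), which is this sketch's own choice for handling tags split across chunks, not necessarily what pipecat does. Both function names are hypothetical:

```python
from __future__ import annotations

from typing import Iterator, Sequence


def held_prefix_len(text: str, tags: Sequence[tuple[str, str]]) -> int:
    """Length of the longest suffix of `text` that could begin a start tag."""
    best = 0
    for start, _ in tags:
        # Only proper prefixes count; a complete start tag is handled elsewhere.
        for n in range(min(len(start) - 1, len(text)), 0, -1):
            if text.endswith(start[:n]):
                best = max(best, n)
                break
    return best


def token_passthrough(
    chunks: Sequence[str], tags: Sequence[tuple[str, str]]
) -> Iterator[str]:
    """Yield text as soon as possible, but whole tagged spans only once closed."""
    pending = ""  # received but not yet emitted
    end_tag: str | None = None  # awaited end tag while inside a tag
    open_len = 0  # length of the start tag at the front of `pending`
    for chunk in chunks:
        pending += chunk
        out = ""
        while True:
            if end_tag is None:
                hits = [(pending.find(s), s, e) for s, e in tags if s in pending]
                if not hits:
                    # Outside any tag: pass text through, minus a possible partial tag.
                    keep = held_prefix_len(pending, tags)
                    out += pending[: len(pending) - keep]
                    pending = pending[len(pending) - keep :]
                    break
                pos, start, end = min(hits)  # earliest start tag
                out += pending[:pos]  # text before the tag passes straight through
                pending = pending[pos:]
                end_tag, open_len = end, len(start)
            close = pending.find(end_tag, open_len)
            if close == -1:
                break  # still inside the tag: keep buffering
            cut = close + len(end_tag)
            out += pending[:cut]  # release the whole tagged span at once
            pending = pending[cut:]
            end_tag = None
        if out:
            yield out


print(list(token_passthrough(
    ["Hello ", "my code is <sp", "ell>A.B", ".C</spell>", " bye."],
    [("<spell>", "</spell>")],
)))
```

Held-back text is never emitted if the stream ends mid-fragment; a flush step at end of stream would be needed to release it.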

Parameters:

text – Text to aggregate.

Yields:

Aggregation objects containing text up to a sentence boundary, marked as SENTENCE type (or TOKEN type in TOKEN mode).

async handle_interruption()

Handle interruptions by clearing the buffer and tag state.

Called when an interruption occurs in the processing pipeline. Resets the tag state and discards any partially aggregated text.
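A minimal sketch of why the tag state must be cleared along with the buffer, assuming a hypothetical `TagState` class that tracks only the buffer and the awaited end tag. If an interruption arrives after a start tag but before its end tag, clearing only the buffer would leave the aggregator "inside" a tag that is never closed, suppressing every later sentence boundary:

```python
from __future__ import annotations

import asyncio
from typing import Sequence


class TagState:
    """Hypothetical holder for the buffer and tag state described above."""

    def __init__(self, tags: Sequence[tuple[str, str]]):
        self._tags = tags
        self.buffer = ""
        self.end_tag: str | None = None  # awaited end tag while inside a tag

    def feed(self, text: str) -> None:
        for char in text:
            self.buffer += char
            if self.end_tag is None:
                for start, end in self._tags:
                    if self.buffer.endswith(start):
                        self.end_tag = end
                        break
            elif self.buffer.endswith(self.end_tag):
                self.end_tag = None

    async def handle_interruption(self) -> None:
        # Clear both the partial text and the tag state.
        self.buffer = ""
        self.end_tag = None


state = TagState([("<spell>", "</spell>")])
state.feed("Spell <spell>A.B")  # interrupted before </spell> arrives
print(state.end_tag)  # </spell>
asyncio.run(state.handle_interruption())
print(repr(state.buffer), state.end_tag)  # '' None
```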

async reset()

Clear the internally aggregated text and tag state.

Resets the aggregator to its initial state, discarding any buffered text.