utils

Audio utility functions for Pipecat.

This module provides common audio processing utilities including mixing, format conversion, volume calculation, and codec transformations for various audio formats used in Pipecat pipelines.

pipecat.audio.utils.create_file_resampler(**kwargs) → BaseAudioResampler

Create an audio resampler instance for batch processing of complete audio files.

Parameters:

**kwargs – Additional keyword arguments passed to the resampler constructor.

Returns:

A configured SOXRAudioResampler instance.

pipecat.audio.utils.create_stream_resampler(**kwargs) → BaseAudioResampler

Create an audio resampler instance for streaming audio.

Parameters:

**kwargs – Additional keyword arguments passed to the resampler constructor.

Returns:

A configured SOXRStreamAudioResampler instance.

pipecat.audio.utils.mix_audio(audio1: bytes, audio2: bytes) → bytes

Mix two audio streams together by adding their samples.

Both audio streams are assumed to be 16-bit signed integer PCM data. If the streams have different lengths, the shorter one is zero-padded to match the longer stream.

Parameters:

audio1 – First audio stream as raw bytes (16-bit signed integers).
audio2 – Second audio stream as raw bytes (16-bit signed integers).

Returns:

Mixed audio data as raw bytes with samples clipped to 16-bit range.
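The zero-padding and clipping behavior described above can be illustrated with a minimal standard-library sketch. The function name mix_pcm16 is hypothetical, little-endian sample order is an assumption, and this is not the library's implementation.

```python
import struct


def mix_pcm16(audio1: bytes, audio2: bytes) -> bytes:
    """Hypothetical stand-in for mix_audio: sum two 16-bit PCM buffers."""
    # Zero-pad the shorter buffer so both hold the same number of bytes.
    length = max(len(audio1), len(audio2))
    audio1 = audio1.ljust(length, b"\x00")
    audio2 = audio2.ljust(length, b"\x00")

    count = length // 2  # two bytes per 16-bit sample
    fmt = f"<{count}h"   # assumption: little-endian signed 16-bit samples
    a = struct.unpack(fmt, audio1[: count * 2])
    b = struct.unpack(fmt, audio2[: count * 2])

    # Add sample by sample, then clip to the signed 16-bit range.
    mixed = [max(-32768, min(32767, x + y)) for x, y in zip(a, b)]
    return struct.pack(fmt, *mixed)


loud = struct.pack("<2h", 30000, -30000)
quiet = struct.pack("<1h", 10000)
print(struct.unpack("<2h", mix_pcm16(loud, quiet)))
```

The first sum (30000 + 10000) exceeds the 16-bit maximum and is clipped to 32767; the second sample is mixed with the zero padding and passes through unchanged.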

pipecat.audio.utils.interleave_stereo_audio(left_audio: bytes, right_audio: bytes) → bytes

Interleave left and right mono audio channels into stereo audio.

Takes two mono audio streams and combines them into a single stereo stream by interleaving the samples (L, R, L, R, …). If the channels have different lengths, both are truncated to the shorter length.

Parameters:

left_audio – Left channel audio as raw bytes (16-bit signed integers).
right_audio – Right channel audio as raw bytes (16-bit signed integers).

Returns:

Interleaved stereo audio data as raw bytes.
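As an illustration of the L, R, L, R layout and the truncation rule, here is a small sketch using only the standard library. The name interleave_pcm16 is hypothetical and little-endian samples are assumed; the library function may be implemented differently.

```python
import struct


def interleave_pcm16(left: bytes, right: bytes) -> bytes:
    """Hypothetical stand-in for interleave_stereo_audio."""
    # Truncate both channels to the shorter length, in whole samples.
    count = min(len(left), len(right)) // 2
    fmt = f"<{count}h"  # assumption: little-endian signed 16-bit samples
    left_samples = struct.unpack(fmt, left[: count * 2])
    right_samples = struct.unpack(fmt, right[: count * 2])

    # Flatten (L0, R0), (L1, R1), ... into L0, R0, L1, R1, ...
    stereo = [s for pair in zip(left_samples, right_samples) for s in pair]
    return struct.pack(f"<{count * 2}h", *stereo)


left = struct.pack("<3h", 1, 2, 3)
right = struct.pack("<2h", -1, -2)
print(struct.unpack("<4h", interleave_pcm16(left, right)))
```

The third left sample is dropped because the right channel holds only two samples.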

pipecat.audio.utils.normalize_value(value, min_value, max_value)

Normalize a value to the range [0, 1] and clamp it to bounds.

Parameters:

value – The value to normalize.
min_value – The minimum value of the input range.
max_value – The maximum value of the input range.

Returns:

Normalized value clamped to the range [0, 1].
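A linear rescale followed by clamping is one straightforward way to obtain this behavior. The sketch below uses the hypothetical name normalize and assumes max_value is greater than min_value; it is not taken from the library source.

```python
def normalize(value: float, min_value: float, max_value: float) -> float:
    """Hypothetical sketch: map [min_value, max_value] onto [0, 1] and clamp."""
    # Assumption: max_value > min_value, so the division is well defined.
    scaled = (value - min_value) / (max_value - min_value)
    return max(0.0, min(1.0, scaled))


print(normalize(-20, -40, 0))  # midpoint of the input range
print(normalize(5, -40, 0))    # above the range, clamped to the upper bound
```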

pipecat.audio.utils.calculate_audio_volume(audio: bytes, sample_rate: int) → float

Calculate the loudness level of audio data using the EBU R128 standard.

Uses the pyloudnorm library to calculate integrated loudness according to the EBU R128 recommendation, then normalizes the result to [0, 1].

Parameters:

audio – Audio data as raw bytes (16-bit signed integers).
sample_rate – Sample rate of the audio in Hz.

Returns:

Normalized loudness value between 0 (quiet) and 1 (loud).

pipecat.audio.utils.exp_smoothing(value: float, prev_value: float, factor: float) → float

Apply exponential smoothing to a value.

Exponential smoothing is used to reduce noise in time-series data by giving more weight to recent values while still considering historical data.

Parameters:

value – The new value to incorporate.
prev_value – The previous smoothed value.
factor – Smoothing factor between 0 and 1. Higher values give more weight to the new value.

Returns:

The exponentially smoothed value.
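The usual form of exponential smoothing, consistent with the parameter description above, moves the previous estimate toward the new value by a fraction factor. The following sketch uses the hypothetical name smooth and shows how a step change is tracked over several updates.

```python
def smooth(value: float, prev_value: float, factor: float) -> float:
    """Hypothetical sketch of exponential smoothing.

    Equivalent to factor * value + (1 - factor) * prev_value.
    """
    return prev_value + factor * (value - prev_value)


# Feed a step from 0 to 1 through the filter with factor = 0.5.
level = 0.0
history = []
for reading in [0.0, 1.0, 1.0, 1.0]:
    level = smooth(reading, level, 0.5)
    history.append(level)
print(history)
```

With a factor of 0.5, each update closes half of the remaining gap to the new reading, so the smoothed level approaches 1 without jumping to it.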

async pipecat.audio.utils.ulaw_to_pcm(ulaw_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler)

Convert μ-law encoded audio to PCM and optionally resample.

Parameters:

ulaw_bytes – μ-law encoded audio data as raw bytes.
in_rate – Original sample rate of the μ-law audio in Hz.
out_rate – Desired output sample rate in Hz.
resampler – Audio resampler instance for rate conversion.

Returns:

PCM audio data as raw bytes at the specified output rate.
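For readers unfamiliar with the decoding step, the sketch below expands G.711 μ-law bytes to 16-bit PCM using the standard bit layout (inverted byte, sign bit, 3-bit exponent, 4-bit mantissa). The helper names are hypothetical, little-endian output is an assumption, and the resampling step is omitted; how the library performs the conversion internally is not shown here.

```python
import struct


def ulaw_byte_to_pcm16(u: int) -> int:
    """Expand one G.711 mu-law byte to a signed 16-bit sample."""
    u = ~u & 0xFF                # mu-law bytes are stored bit-inverted
    sign = u & 0x80
    exponent = (u >> 4) & 0x07   # segment number
    mantissa = u & 0x0F          # position within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def ulaw_decode(ulaw_bytes: bytes) -> bytes:
    """Hypothetical helper: decode without resampling."""
    samples = [ulaw_byte_to_pcm16(b) for b in ulaw_bytes]
    # Assumption: little-endian 16-bit output; each input byte becomes two.
    return struct.pack(f"<{len(samples)}h", *samples)


decoded = ulaw_decode(bytes([0xFF, 0x00, 0x80]))
print(struct.unpack("<3h", decoded))
```

Each μ-law byte expands to two bytes of PCM, so the decoded buffer is twice as long as the input before any resampling.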

async pipecat.audio.utils.pcm_to_ulaw(pcm_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler)

Convert PCM audio to μ-law encoding and optionally resample.

Parameters:

pcm_bytes – PCM audio data as raw bytes (16-bit signed integers).
in_rate – Original sample rate of the PCM audio in Hz.
out_rate – Desired output sample rate in Hz.
resampler – Audio resampler instance for rate conversion.

Returns:

μ-law encoded audio data as raw bytes at the specified output rate.
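The encoding direction compresses each 16-bit sample into one byte. The sketch below follows the widely used bias-and-segment formulation of G.711 μ-law; the helper names are hypothetical, little-endian input is assumed, resampling is omitted, and rounding may differ from table-driven encoders.

```python
import struct

BIAS = 0x84   # 132, added so every magnitude falls into one of 8 segments
CLIP = 32635  # largest magnitude that still fits in 15 bits after the bias


def pcm16_to_ulaw_byte(sample: int) -> int:
    """Compress one signed 16-bit sample to a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # The highest set bit (bit 7 to bit 14) selects the segment 0..7.
    exponent = magnitude.bit_length() - 8
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # The assembled byte is stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def ulaw_encode(pcm_bytes: bytes) -> bytes:
    """Hypothetical helper: encode without resampling."""
    count = len(pcm_bytes) // 2
    samples = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    return bytes(pcm16_to_ulaw_byte(s) for s in samples)


encoded = ulaw_encode(struct.pack("<3h", 0, 32124, -32124))
print(encoded.hex())
```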

async pipecat.audio.utils.alaw_to_pcm(alaw_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler) → bytes

Convert A-law encoded audio to PCM and optionally resample.

Parameters:

alaw_bytes – A-law encoded audio data as raw bytes.
in_rate – Original sample rate of the A-law audio in Hz.
out_rate – Desired output sample rate in Hz.
resampler – Audio resampler instance for rate conversion.

Returns:

PCM audio data as raw bytes at the specified output rate.
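A-law differs from μ-law in its bit conventions: alternate bits are inverted with 0x55, a set sign bit means a positive sample, and the lowest segment is linear. The sketch below shows the standard G.711 expansion under those rules; the helper names are hypothetical, little-endian output is assumed, and resampling is omitted.

```python
import struct


def alaw_byte_to_pcm16(a: int) -> int:
    """Expand one G.711 A-law byte to a signed 16-bit sample."""
    a ^= 0x55                    # undo the alternate-bit inversion
    exponent = (a >> 4) & 0x07
    mantissa = a & 0x0F
    if exponent == 0:
        # Lowest segment is linear with a step of 16.
        magnitude = (mantissa << 4) + 8
    else:
        magnitude = ((mantissa << 4) + 0x108) << (exponent - 1)
    # In A-law, a set sign bit marks a positive sample.
    return magnitude if a & 0x80 else -magnitude


def alaw_decode(alaw_bytes: bytes) -> bytes:
    """Hypothetical helper: decode without resampling."""
    samples = [alaw_byte_to_pcm16(b) for b in alaw_bytes]
    return struct.pack(f"<{len(samples)}h", *samples)


pcm = alaw_decode(bytes([0xD5, 0x55, 0xAA, 0x2A]))
print(struct.unpack("<4h", pcm))
```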

async pipecat.audio.utils.pcm_to_alaw(pcm_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler)

Convert PCM audio to A-law encoding and optionally resample.

Parameters:

pcm_bytes – PCM audio data as raw bytes (16-bit signed integers).
in_rate – Original sample rate of the PCM audio in Hz.
out_rate – Desired output sample rate in Hz.
resampler – Audio resampler instance for rate conversion.

Returns:

A-law encoded audio data as raw bytes at the specified output rate.
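The compression direction can be sketched in the same spirit. This is a simplified A-law encoder written for illustration: the helper names are hypothetical, little-endian input is assumed, resampling is omitted, and the handling of the lowest bits may differ slightly from reference encoders.

```python
import struct


def pcm16_to_alaw_byte(sample: int) -> int:
    """Compress one signed 16-bit sample to a G.711 A-law byte (simplified)."""
    if sample >= 0:
        sign, magnitude = 0x80, sample       # set sign bit marks positive
    else:
        sign, magnitude = 0x00, -sample - 1  # maps -1 to 0 and -32768 to 32767
    if magnitude < 256:
        # Lowest segment is linear with a step of 16.
        exponent = 0
        mantissa = magnitude >> 4
    else:
        # The highest set bit (bit 8 to bit 14) selects the segment 1..7.
        exponent = magnitude.bit_length() - 8
        mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # Alternate bits are inverted on the wire.
    return (sign | (exponent << 4) | mantissa) ^ 0x55


def alaw_encode(pcm_bytes: bytes) -> bytes:
    """Hypothetical helper: encode without resampling."""
    count = len(pcm_bytes) // 2
    samples = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    return bytes(pcm16_to_alaw_byte(s) for s in samples)


alaw = alaw_encode(struct.pack("<4h", 8, -8, 32256, -32256))
print(alaw.hex())
```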

pipecat.audio.utils.is_silence(pcm_bytes: bytes) → bool

Determine if an audio sample contains silence by checking amplitude levels.

This function analyzes raw PCM audio data to detect silence by comparing the maximum absolute amplitude against a predefined threshold. The audio is expected to be clean speech or complete silence without background noise.

Parameters:

pcm_bytes – Raw PCM audio data as bytes (16-bit signed integers).

Returns:

True if the audio sample is considered silence (below threshold), False otherwise.

Return type:

bool

Note

Normal speech typically produces amplitude values between ±500 and ±5000, depending on factors like loudness and microphone gain. The threshold (SPEAKING_THRESHOLD) is set well below typical speech levels to reliably distinguish silence from speech.
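A peak-amplitude check of this kind can be sketched as follows. The threshold value of 20, the inclusive comparison, the treatment of an empty buffer as silence, and the name looks_silent are all assumptions made for illustration; the actual value of SPEAKING_THRESHOLD is not stated in this reference.

```python
import struct

ASSUMED_THRESHOLD = 20  # assumption: placeholder for SPEAKING_THRESHOLD


def looks_silent(pcm_bytes: bytes, threshold: int = ASSUMED_THRESHOLD) -> bool:
    """Hypothetical sketch: compare the peak absolute amplitude to a threshold."""
    count = len(pcm_bytes) // 2
    if count == 0:
        return True  # assumption: an empty buffer is treated as silence
    # Assumption: little-endian signed 16-bit samples.
    samples = struct.unpack(f"<{count}h", pcm_bytes[: count * 2])
    return max(abs(s) for s in samples) <= threshold


near_silence = struct.pack("<3h", 3, -5, 0)
speech = struct.pack("<3h", 3, -1200, 800)
print(looks_silent(near_silence), looks_silent(speech))
```

Because the check uses the single largest sample, one loud click is enough to mark a buffer as non-silent, which is why the description above expects input without background noise.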