utils
Audio utility functions for Pipecat.
This module provides common audio processing utilities including mixing, format conversion, volume calculation, and codec transformations for various audio formats used in Pipecat pipelines.
- pipecat.audio.utils.create_file_resampler(**kwargs) → BaseAudioResampler
Create an audio resampler instance for batch processing of complete audio files.
- Parameters:
**kwargs – Additional keyword arguments passed to the resampler constructor.
- Returns:
A configured SOXRAudioResampler instance.
- pipecat.audio.utils.create_stream_resampler(**kwargs) → BaseAudioResampler
Create a stream audio resampler instance.
- Parameters:
**kwargs – Additional keyword arguments passed to the resampler constructor.
- Returns:
A configured SOXRStreamAudioResampler instance.
- pipecat.audio.utils.mix_audio(audio1: bytes, audio2: bytes) → bytes
Mix two audio streams together by adding their samples.
Both audio streams are assumed to be 16-bit signed integer PCM data. If the streams have different lengths, the shorter one is zero-padded to match the longer stream.
- Parameters:
audio1 – First audio stream as raw bytes (16-bit signed integers).
audio2 – Second audio stream as raw bytes (16-bit signed integers).
- Returns:
Mixed audio data as raw bytes with samples clipped to 16-bit range.
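The mixing behavior described above (zero-pad the shorter stream, add sample by sample, clip to the 16-bit range) can be sketched with the standard library alone. This is an illustrative sketch, not Pipecat's implementation: the name `mix_pcm16` is hypothetical, and it assumes the bytes are in the machine's native byte order.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(audio1: bytes, audio2: bytes) -> bytes:
    """Hypothetical helper: mix two 16-bit PCM buffers by summing samples."""
    # "h" is a signed 16-bit integer in native byte order (an assumption here).
    a = array("h")
    a.frombytes(audio1)
    b = array("h")
    b.frombytes(audio2)

    # Zero-pad the shorter stream so both have the same number of samples.
    length = max(len(a), len(b))
    a.extend([0] * (length - len(a)))
    b.extend([0] * (length - len(b)))

    # Add in Python ints (no overflow), then clip back into the 16-bit range.
    mixed = array("h", (max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)))
    return mixed.tobytes()
```

Summing in a wider type before clipping matters: adding two large 16-bit samples directly in 16-bit arithmetic would wrap around instead of saturating.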
- pipecat.audio.utils.interleave_stereo_audio(left_audio: bytes, right_audio: bytes) → bytes
Interleave left and right mono audio channels into stereo audio.
Takes two mono audio streams and combines them into a single stereo stream by interleaving the samples (L, R, L, R, …). If the channels have different lengths, both are truncated to the shorter length.
- Parameters:
left_audio – Left channel audio as raw bytes (16-bit signed integers).
right_audio – Right channel audio as raw bytes (16-bit signed integers).
- Returns:
Interleaved stereo audio data as raw bytes.
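As a rough illustration of the L, R, L, R layout and the truncation rule, the following standard-library sketch interleaves two mono buffers. The name `interleave_pcm16` is hypothetical and native byte order is assumed; it is not taken from the Pipecat source.

```python
from array import array


def interleave_pcm16(left_audio: bytes, right_audio: bytes) -> bytes:
    """Hypothetical helper: interleave two mono 16-bit PCM buffers into stereo."""
    left = array("h")
    left.frombytes(left_audio)
    right = array("h")
    right.frombytes(right_audio)

    stereo = array("h")
    # zip() stops at the shorter channel, which gives the truncation behavior.
    for left_sample, right_sample in zip(left, right):
        stereo.append(left_sample)
        stereo.append(right_sample)
    return stereo.tobytes()
```

The output holds twice as many samples as the shorter input channel.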
- pipecat.audio.utils.normalize_value(value, min_value, max_value)
Normalize a value to the range [0, 1] and clamp it to bounds.
- Parameters:
value – The value to normalize.
min_value – The minimum value of the input range.
max_value – The maximum value of the input range.
- Returns:
Normalized value clamped to the range [0, 1].
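A minimal sketch of this kind of normalization, assuming a linear mapping and `max_value > min_value` (the name `normalize` is hypothetical):

```python
def normalize(value: float, min_value: float, max_value: float) -> float:
    """Hypothetical helper: map value linearly onto [0, 1], then clamp."""
    # Assumes max_value > min_value; equal bounds would divide by zero.
    normalized = (value - min_value) / (max_value - min_value)
    return max(0.0, min(1.0, normalized))
```

Values inside the input range map proportionally, while anything outside it saturates at 0 or 1.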
- pipecat.audio.utils.calculate_audio_volume(audio: bytes, sample_rate: int) → float
Calculate the loudness level of audio data using the EBU R128 standard.
Uses the pyloudnorm library to calculate integrated loudness according to the EBU R128 recommendation, then normalizes the result to [0, 1].
- Parameters:
audio – Audio data as raw bytes (16-bit signed integers).
sample_rate – Sample rate of the audio in Hz.
- Returns:
Normalized loudness value between 0 (quiet) and 1 (loud).
- pipecat.audio.utils.exp_smoothing(value: float, prev_value: float, factor: float) → float
Apply exponential smoothing to a value.
Exponential smoothing is used to reduce noise in time-series data by giving more weight to recent values while still considering historical data.
- Parameters:
value – The new value to incorporate.
prev_value – The previous smoothed value.
factor – Smoothing factor between 0 and 1. Higher values give more weight to the new value.
- Returns:
The exponentially smoothed value.
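The standard exponential smoothing update moves the previous smoothed value toward the new value by a fraction `factor`. The sketch below uses that textbook formula under the hypothetical name `smooth`; it is consistent with the parameter description above but is not copied from the Pipecat source.

```python
def smooth(value: float, prev_value: float, factor: float) -> float:
    """Hypothetical helper: one step of exponential smoothing."""
    # factor = 0 keeps the previous value; factor = 1 jumps to the new value.
    return prev_value + factor * (value - prev_value)


def smooth_series(values: list[float], factor: float, start: float = 0.0) -> list[float]:
    """Apply the update repeatedly, feeding each result back in as prev_value."""
    smoothed = []
    prev = start
    for v in values:
        prev = smooth(v, prev, factor)
        smoothed.append(prev)
    return smoothed
```

For a volume meter, each new reading would be passed as `value` and the last returned result as `prev_value`, so sudden spikes are damped.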
- async pipecat.audio.utils.ulaw_to_pcm(ulaw_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler)
Convert μ-law encoded audio to PCM and optionally resample.
- Parameters:
ulaw_bytes – μ-law encoded audio data as raw bytes.
in_rate – Original sample rate of the μ-law audio in Hz.
out_rate – Desired output sample rate in Hz.
resampler – Audio resampler instance for rate conversion.
- Returns:
PCM audio data as raw bytes at the specified output rate.
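To show what the decoding half of this conversion involves, here is a pure-Python sketch of the standard G.711 μ-law expansion, where each encoded byte becomes one 16-bit sample. The helper names are hypothetical, the resampling step is left out, and Pipecat may well rely on a different routine internally.

```python
from array import array

BIAS = 0x84  # bias constant used by the G.711 mu-law reference algorithm


def ulaw_byte_to_pcm16(u: int) -> int:
    """Hypothetical helper: expand one mu-law byte to a signed 16-bit sample."""
    u = ~u & 0xFF  # mu-law bytes are stored bit-inverted
    exponent = (u & 0x70) >> 4  # 3-bit segment number
    mantissa = u & 0x0F  # 4-bit step within the segment
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude


def ulaw_to_pcm16(ulaw_bytes: bytes) -> bytes:
    """Decode a mu-law buffer to 16-bit PCM (native byte order assumed)."""
    return array("h", (ulaw_byte_to_pcm16(b) for b in ulaw_bytes)).tobytes()
```

Because each μ-law byte expands to a two-byte sample, the decoded buffer is twice as long as the input before any resampling is applied.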
- async pipecat.audio.utils.pcm_to_ulaw(pcm_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler)
Convert PCM audio to μ-law encoding and optionally resample.
- Parameters:
pcm_bytes – PCM audio data as raw bytes (16-bit signed integers).
in_rate – Original sample rate of the PCM audio in Hz.
out_rate – Desired output sample rate in Hz.
resampler – Audio resampler instance for rate conversion.
- Returns:
μ-law encoded audio data as raw bytes at the specified output rate.
- async pipecat.audio.utils.alaw_to_pcm(alaw_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler) → bytes
Convert A-law encoded audio to PCM and optionally resample.
- Parameters:
alaw_bytes – A-law encoded audio data as raw bytes.
in_rate – Original sample rate of the A-law audio in Hz.
out_rate – Desired output sample rate in Hz.
resampler – Audio resampler instance for rate conversion.
- Returns:
PCM audio data as raw bytes at the specified output rate.
- async pipecat.audio.utils.pcm_to_alaw(pcm_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler)
Convert PCM audio to A-law encoding and optionally resample.
- Parameters:
pcm_bytes – PCM audio data as raw bytes (16-bit signed integers).
in_rate – Original sample rate of the PCM audio in Hz.
out_rate – Desired output sample rate in Hz.
resampler – Audio resampler instance for rate conversion.
- Returns:
A-law encoded audio data as raw bytes at the specified output rate.
- pipecat.audio.utils.is_silence(pcm_bytes: bytes) → bool
Determine if an audio sample contains silence by checking amplitude levels.
This function analyzes raw PCM audio data to detect silence by comparing the maximum absolute amplitude against a predefined threshold. The audio is expected to be clean speech or complete silence without background noise.
- Parameters:
pcm_bytes – Raw PCM audio data as bytes (16-bit signed integers).
- Returns:
True if the audio sample is considered silence (below threshold), False otherwise.
- Return type:
bool
Note
Normal speech typically produces amplitude values between ±500 and ±5000, depending on factors such as loudness and microphone gain. The threshold (SPEAKING_THRESHOLD) is set well below typical speech levels so that silence can be reliably distinguished from speech.
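A peak-amplitude check of this kind can be sketched as follows. The threshold value of 20 is an assumption made for illustration, since the actual value of SPEAKING_THRESHOLD is not stated here. The name `looks_silent` is hypothetical, and so are the inclusive comparison and the handling of empty input.

```python
from array import array

# Assumed value for illustration only; the real SPEAKING_THRESHOLD may differ.
ASSUMED_SPEAKING_THRESHOLD = 20


def looks_silent(pcm_bytes: bytes, threshold: int = ASSUMED_SPEAKING_THRESHOLD) -> bool:
    """Hypothetical helper: True if the peak amplitude does not exceed threshold."""
    samples = array("h")  # signed 16-bit samples, native byte order assumed
    samples.frombytes(pcm_bytes)
    if not samples:
        return True  # treating an empty buffer as silence is an assumption
    peak = max(abs(s) for s in samples)
    return peak <= threshold
```

Since the check looks only at the single largest sample, one noisy spike is enough to mark a chunk as non-silent, which is why the input is expected to be free of background noise.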