utils

Audio utility functions for Pipecat.

This module provides common audio processing utilities including mixing, format conversion, volume calculation, and codec transformations for various audio formats used in Pipecat pipelines.

pipecat.audio.utils.create_file_resampler(**kwargs) -> BaseAudioResampler

Create an audio resampler instance for batch processing of complete audio files.

Parameters:

**kwargs – Additional keyword arguments passed to the resampler constructor.

Returns:

A configured SOXRAudioResampler instance.

pipecat.audio.utils.create_stream_resampler(**kwargs) -> BaseAudioResampler

Create an audio resampler instance for incremental processing of streaming audio.

Parameters:

**kwargs – Additional keyword arguments passed to the resampler constructor.

Returns:

A configured SOXRStreamAudioResampler instance.

pipecat.audio.utils.mix_audio(audio1: bytes, audio2: bytes) -> bytes

Mix two audio streams together by adding their samples.

Both audio streams are assumed to be 16-bit signed integer PCM data. If the streams have different lengths, the shorter one is zero-padded to match the longer stream.

Parameters:
  • audio1 – First audio stream as raw bytes (16-bit signed integers).

  • audio2 – Second audio stream as raw bytes (16-bit signed integers).

Returns:

Mixed audio data as raw bytes, with samples clipped to the 16-bit signed range [-32768, 32767].
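The mixing rule above (zero-pad, add, clip) can be sketched with the standard library alone. This is a minimal illustration, not Pipecat's implementation: `mix_audio_sketch` is a hypothetical name, and the sketch assumes native-endian 16-bit samples (little-endian on common platforms).

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def mix_audio_sketch(audio1: bytes, audio2: bytes) -> bytes:
    """Hypothetical re-implementation of the mixing behavior described above."""
    a = array("h", audio1)  # interpret the bytes as native-endian int16 samples
    b = array("h", audio2)
    length = max(len(a), len(b))
    # Zero-pad the shorter stream so both have the same number of samples.
    a.extend([0] * (length - len(a)))
    b.extend([0] * (length - len(b)))
    # Add as Python ints (no overflow), then clip back into the int16 range.
    mixed = array("h", (max(INT16_MIN, min(INT16_MAX, x + y)) for x, y in zip(a, b)))
    return mixed.tobytes()
```

Summing in Python integers before clipping avoids the wrap-around that would occur if two int16 values were added in a fixed-width 16-bit type.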

pipecat.audio.utils.interleave_stereo_audio(left_audio: bytes, right_audio: bytes) -> bytes

Interleave left and right mono audio channels into stereo audio.

Takes two mono audio streams and combines them into a single stereo stream by interleaving the samples (L, R, L, R, …). If the channels have different lengths, both are truncated to the shorter length.

Parameters:
  • left_audio – Left channel audio as raw bytes (16-bit signed integers).

  • right_audio – Right channel audio as raw bytes (16-bit signed integers).

Returns:

Interleaved stereo audio data as raw bytes.
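A minimal sketch of the interleaving described above, assuming native-endian 16-bit samples; `interleave_stereo_sketch` is a hypothetical name used for illustration only.

```python
from array import array


def interleave_stereo_sketch(left_audio: bytes, right_audio: bytes) -> bytes:
    """Hypothetical helper: combine two mono int16 streams into L, R, L, R, ... order."""
    left = array("h", left_audio)
    right = array("h", right_audio)
    stereo = array("h")
    # zip() stops at the shorter channel, which truncates both to the same length.
    for l_sample, r_sample in zip(left, right):
        stereo.append(l_sample)  # left sample of the frame first
        stereo.append(r_sample)  # then the matching right sample
    return stereo.tobytes()
```

Each output frame holds one left and one right sample, so the result is twice as many samples as the shorter input channel.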

pipecat.audio.utils.normalize_value(value, min_value, max_value)

Normalize a value to the range [0, 1] and clamp it to bounds.

Parameters:
  • value – The value to normalize.

  • min_value – The minimum value of the input range.

  • max_value – The maximum value of the input range.

Returns:

Normalized value clamped to the range [0, 1].
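The normalization is a linear rescale, (value - min_value) / (max_value - min_value), followed by a clamp to [0, 1]. A sketch under the assumption that `max_value` differs from `min_value` (`normalize_value_sketch` is a hypothetical name):

```python
def normalize_value_sketch(value: float, min_value: float, max_value: float) -> float:
    """Hypothetical helper: map [min_value, max_value] onto [0, 1] and clamp."""
    # Linear map: min_value -> 0.0, max_value -> 1.0 (assumes max_value != min_value).
    normalized = (value - min_value) / (max_value - min_value)
    # Values outside the input range are clamped to the nearest bound.
    return max(0.0, min(1.0, normalized))
```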

pipecat.audio.utils.calculate_audio_volume(audio: bytes, sample_rate: int) -> float

Calculate the loudness level of audio data using the EBU R128 standard.

Uses the pyloudnorm library to calculate integrated loudness according to the EBU R128 recommendation, then normalizes the result to [0, 1].

Parameters:
  • audio – Audio data as raw bytes (16-bit signed integers).

  • sample_rate – Sample rate of the audio in Hz.

Returns:

Normalized loudness value between 0 (quiet) and 1 (loud).

pipecat.audio.utils.exp_smoothing(value: float, prev_value: float, factor: float) -> float

Apply exponential smoothing to a value.

Exponential smoothing is used to reduce noise in time-series data by giving more weight to recent values while still considering historical data.

Parameters:
  • value – The new value to incorporate.

  • prev_value – The previous smoothed value.

  • factor – Smoothing factor between 0 and 1. Higher values give more weight to the new value.

Returns:

The exponentially smoothed value.
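Exponential smoothing is commonly written as factor * value + (1 - factor) * prev_value. The sketch below uses that formula (an assumption about the exact form, consistent with the parameter description above) and adds a hypothetical `smooth_series` helper to show how each output becomes the `prev_value` of the next call.

```python
def exp_smoothing_sketch(value: float, prev_value: float, factor: float) -> float:
    """Hypothetical helper: one exponential smoothing step."""
    # Equivalent to factor * value + (1 - factor) * prev_value.
    return prev_value + factor * (value - prev_value)


def smooth_series(values, factor: float, initial: float = 0.0) -> list:
    """Hypothetical helper: smooth a sequence of readings (e.g. per-chunk volumes)."""
    smoothed, out = initial, []
    for v in values:
        smoothed = exp_smoothing_sketch(v, smoothed, factor)  # feed back previous output
        out.append(smoothed)
    return out
```

With factor = 0.5 and a constant input of 1.0 starting from 0.0, the output approaches the input geometrically: 0.5, 0.75, 0.875, and so on.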

async pipecat.audio.utils.ulaw_to_pcm(ulaw_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler)

Convert μ-law encoded audio to PCM and optionally resample.

Parameters:
  • ulaw_bytes – μ-law encoded audio data as raw bytes.

  • in_rate – Original sample rate of the μ-law audio in Hz.

  • out_rate – Desired output sample rate in Hz.

  • resampler – Audio resampler instance for rate conversion.

Returns:

PCM audio data as raw bytes at the specified output rate.
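The codec step of this conversion can be illustrated with a pure-Python G.711 μ-law decoder. This is a sketch, not Pipecat's code: the function names are hypothetical, it omits the rate conversion that the real function performs with `resampler`, and it assumes native-endian 16-bit output. Its values should agree with the standard G.711 μ-law table, but verify against your reference codec before relying on it.

```python
from array import array

# Decoder offset for each exponent: 132 * (2**e - 1) -> 0, 132, 396, ..., 16764.
_ULAW_EXP_OFFSET = [132 * ((1 << e) - 1) for e in range(8)]


def ulaw_byte_to_linear(code: int) -> int:
    """Decode one G.711 mu-law byte into a 16-bit linear sample."""
    code = ~code & 0xFF  # mu-law codes are transmitted bit-inverted
    sign = code & 0x80
    exponent = (code >> 4) & 0x07
    mantissa = code & 0x0F
    # Reconstruct at the midpoint of the quantization interval.
    magnitude = _ULAW_EXP_OFFSET[exponent] + (mantissa << (exponent + 3))
    return -magnitude if sign else magnitude


def ulaw_to_pcm_sketch(ulaw_bytes: bytes) -> bytes:
    """Hypothetical helper: mu-law bytes -> native-endian int16 PCM, no resampling."""
    return array("h", (ulaw_byte_to_linear(b) for b in ulaw_bytes)).tobytes()
```

For example, the codes 0xFF, 0x80 and 0x00 decode to 0, 32124 and -32124 respectively.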

async pipecat.audio.utils.pcm_to_ulaw(pcm_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler)

Convert PCM audio to μ-law encoding and optionally resample.

Parameters:
  • pcm_bytes – PCM audio data as raw bytes (16-bit signed integers).

  • in_rate – Original sample rate of the PCM audio in Hz.

  • out_rate – Desired output sample rate in Hz.

  • resampler – Audio resampler instance for rate conversion.

Returns:

μ-law encoded audio data as raw bytes at the specified output rate.
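The encoding direction can be sketched with the common bias-and-clip formulation of G.711 μ-law (bias 0x84, clip 32635). As with any sketch here, the names are hypothetical, the resampling step that the real function performs with `resampler` is omitted, and native-endian 16-bit input is assumed; this is not a claim about how Pipecat encodes internally.

```python
from array import array

_ULAW_BIAS = 0x84   # 132, added before splitting into exponent and mantissa
_ULAW_CLIP = 32635  # largest magnitude that still fits in 15 bits after biasing


def linear_to_ulaw_byte(sample: int) -> int:
    """Encode one 16-bit linear sample as a G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0
    magnitude = min(abs(sample), _ULAW_CLIP) + _ULAW_BIAS
    # magnitude is in 132..32767, so magnitude >> 7 is in 1..255 and the
    # exponent (position of its highest set bit) is in 0..7.
    exponent = (magnitude >> 7).bit_length() - 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF  # stored bit-inverted


def pcm_to_ulaw_sketch(pcm_bytes: bytes) -> bytes:
    """Hypothetical helper: native-endian int16 PCM -> mu-law bytes, no resampling."""
    return bytes(linear_to_ulaw_byte(s) for s in array("h", pcm_bytes))
```

Silence (0) encodes to 0xFF, and the positive and negative full-scale samples encode to 0x80 and 0x00.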

async pipecat.audio.utils.alaw_to_pcm(alaw_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler) -> bytes

Convert A-law encoded audio to PCM and optionally resample.

Parameters:
  • alaw_bytes – A-law encoded audio data as raw bytes.

  • in_rate – Original sample rate of the A-law audio in Hz.

  • out_rate – Desired output sample rate in Hz.

  • resampler – Audio resampler instance for rate conversion.

Returns:

PCM audio data as raw bytes at the specified output rate.
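The A-law codec step can be illustrated with a pure-Python G.711 A-law decoder in the style of the widely used reference `g711.c`. This sketch uses hypothetical names, skips the rate conversion done with `resampler`, and assumes native-endian 16-bit output; it is not Pipecat's implementation.

```python
from array import array


def alaw_byte_to_linear(code: int) -> int:
    """Decode one G.711 A-law byte into a 16-bit linear sample."""
    code ^= 0x55  # undo the alternate-bit inversion applied on the wire
    quant = (code & 0x0F) << 4
    segment = (code & 0x70) >> 4
    if segment == 0:
        magnitude = quant + 8
    else:
        magnitude = (quant + 0x108) << (segment - 1)
    # After the XOR, a set top bit marks a positive sample.
    return magnitude if code & 0x80 else -magnitude


def alaw_to_pcm_sketch(alaw_bytes: bytes) -> bytes:
    """Hypothetical helper: A-law bytes -> native-endian int16 PCM, no resampling."""
    return array("h", (alaw_byte_to_linear(b) for b in alaw_bytes)).tobytes()
```

Unlike μ-law, A-law has no exact zero level: the silence code 0xD5 decodes to 8, and the largest magnitude it can produce is 32256.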

async pipecat.audio.utils.pcm_to_alaw(pcm_bytes: bytes, in_rate: int, out_rate: int, resampler: BaseAudioResampler)

Convert PCM audio to A-law encoding and optionally resample.

Parameters:
  • pcm_bytes – PCM audio data as raw bytes (16-bit signed integers).

  • in_rate – Original sample rate of the PCM audio in Hz.

  • out_rate – Desired output sample rate in Hz.

  • resampler – Audio resampler instance for rate conversion.

Returns:

A-law encoded audio data as raw bytes at the specified output rate.
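A matching sketch of G.711 A-law encoding, again modeled on the reference `g711.c` segment search rather than on Pipecat's code. Names are hypothetical, resampling is omitted, and native-endian 16-bit input is assumed.

```python
from array import array

# Upper bound of each segment for the 13-bit magnitude (sample >> 3).
_ALAW_SEG_END = (0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF)


def linear_to_alaw_byte(sample: int) -> int:
    """Encode one 16-bit linear sample as a G.711 A-law byte."""
    value = sample >> 3  # keep the 13 most significant bits
    if value >= 0:
        mask = 0xD5  # sign bit set (positive) combined with the 0x55 inversion
    else:
        mask = 0x55  # sign bit clear (negative)
        value = -value - 1
    # Segment = index of the first segment whose upper bound covers the value.
    segment = next((i for i, end in enumerate(_ALAW_SEG_END) if value <= end), 8)
    if segment >= 8:  # out of range: return the largest code
        return 0x7F ^ mask
    shift = 1 if segment < 2 else segment
    code = (segment << 4) | ((value >> shift) & 0x0F)
    return code ^ mask


def pcm_to_alaw_sketch(pcm_bytes: bytes) -> bytes:
    """Hypothetical helper: native-endian int16 PCM -> A-law bytes, no resampling."""
    return bytes(linear_to_alaw_byte(s) for s in array("h", pcm_bytes))
```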

pipecat.audio.utils.is_silence(pcm_bytes: bytes) -> bool

Determine if an audio sample contains silence by checking amplitude levels.

This function analyzes raw PCM audio data to detect silence by comparing the maximum absolute amplitude against a predefined threshold. The audio is expected to be clean speech or complete silence without background noise.

Parameters:

pcm_bytes – Raw PCM audio data as bytes (16-bit signed integers).

Returns:

True if the audio sample is considered silence (below threshold), False otherwise.

Return type:

bool

Note

Normal speech typically produces amplitude values between ±500 and ±5000, depending on factors like loudness and microphone gain. The threshold (SPEAKING_THRESHOLD) is set well below typical speech levels so that silence can be reliably distinguished from speech.
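The peak-amplitude test described above can be sketched as follows. This page does not state the actual value of SPEAKING_THRESHOLD, so the sketch takes the threshold as a parameter with a hypothetical default; `is_silence_sketch` is also a hypothetical name, native-endian 16-bit samples are assumed, and treating empty input as silence is an assumption of the sketch.

```python
from array import array

# Hypothetical default; the real SPEAKING_THRESHOLD value is not given on this page.
ASSUMED_SPEAKING_THRESHOLD = 20


def is_silence_sketch(pcm_bytes: bytes, threshold: int = ASSUMED_SPEAKING_THRESHOLD) -> bool:
    """Hypothetical helper: True if the peak absolute amplitude is below threshold."""
    samples = array("h", pcm_bytes)
    # Peak absolute amplitude; empty input yields 0 and is treated as silence.
    peak = max((abs(s) for s in samples), default=0)
    return peak < threshold
```

Because the test looks only at the single loudest sample, one noise spike above the threshold is enough to classify a whole chunk as non-silent, which is why the input is expected to be free of background noise.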