
vllm_omni.utils.audio

Audio utility functions shared across models and entrypoints.

audio_chunk_pcm_bytes

audio_chunk_pcm_bytes(omni_res: OmniRequestOutput) -> int

Best-effort PCM byte count of the last audio chunk in omni_res.

Used by the audio-streaming continuity tracker to size the player buffer. Returns 0 when the chunk shape can't be interpreted; the caller then drops the sample rather than recording a wrong byte count.
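A minimal sketch of the byte-count arithmetic, assuming the chunk is a NumPy-compatible numeric array that will be sent as 16-bit PCM (2 bytes per sample). The helper name `estimate_pcm_bytes` is hypothetical. It takes the raw chunk rather than an `OmniRequestOutput`, because the layout of that object isn't documented here.

```python
import numpy as np


def estimate_pcm_bytes(chunk, bytes_per_sample: int = 2) -> int:
    """Hypothetical helper: best-effort PCM byte count for one audio chunk.

    Assumes 16-bit PCM output (2 bytes per sample) by default. Returns 0 when
    the chunk can't be interpreted, so the caller can drop the sample instead
    of recording a wrong byte count.
    """
    if chunk is None:
        return 0
    try:
        arr = np.asarray(chunk)
    except (TypeError, ValueError):
        return 0
    # Only numeric sample data can be turned into PCM.
    if not np.issubdtype(arr.dtype, np.number):
        return 0
    # Accept mono (samples,) or 2-D (channels, samples) / (samples, channels);
    # scalars, 3-D batches and empty arrays are treated as uninterpretable.
    if arr.ndim not in (1, 2) or arr.size == 0:
        return 0
    # Total samples across all channels times the sample width.
    return int(arr.size) * bytes_per_sample
```

For example, a mono chunk of 480 samples yields 960 bytes; at a 24 kHz sample rate that is 20 ms of audio.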

audio_chunk_sample_rate

audio_chunk_sample_rate(omni_res: OmniRequestOutput) -> int

Resolve the audio sample rate, in Hz, for the request's audio stream.
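The sample rate is what turns a PCM byte count into playback time. As a sketch, assuming interleaved PCM with a fixed sample width, the conversion is seconds = bytes / (sample_rate * channels * bytes_per_sample). The helper `pcm_bytes_to_seconds` below is hypothetical, not a documented function of this module, and its defaults (mono, 16-bit) are assumptions.

```python
def pcm_bytes_to_seconds(
    num_bytes: int,
    sample_rate: int,
    channels: int = 1,
    bytes_per_sample: int = 2,
) -> float:
    """Hypothetical helper: playback duration represented by a PCM byte count.

    Defaults assume mono 16-bit PCM.
    """
    if sample_rate <= 0 or channels <= 0 or bytes_per_sample <= 0:
        raise ValueError("sample_rate, channels and bytes_per_sample must be positive")
    # Bytes consumed per second of audio = frames/s * channels * bytes/sample.
    return num_bytes / (sample_rate * channels * bytes_per_sample)
```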

mel_filter_bank

mel_filter_bank(
    sr: int,
    n_fft: int,
    n_mels: int,
    fmin: float = 0.0,
    fmax: float | None = None,
) -> Tensor

Compute a mel filterbank matrix.

Drop-in replacement for librosa.filters.mel using torchaudio.functional.melscale_fbanks.

Parameters:

Name     Type           Description                                   Default
sr       int            Sample rate of the audio.                     required
n_fft    int            FFT window size.                              required
n_mels   int            Number of mel bands.                          required
fmin     float          Minimum frequency (Hz).                       0.0
fmax     float | None   Maximum frequency (Hz). Defaults to sr / 2.   None

Returns:

Type     Description
Tensor   Tensor of shape (n_mels, n_fft // 2 + 1).

peak_normalize

peak_normalize(
    audio: ndarray, db_level: float = -6.0
) -> ndarray

Normalize audio so that its peak amplitude reaches a target dB level.

Drop-in replacement for sox.Transformer().norm(db_level=...).

Parameters:

Name       Type      Description                            Default
audio      ndarray   Input waveform as a 1-D numpy array.   required
db_level   float     Target peak amplitude in dBFS.         -6.0

Returns:

Type      Description
ndarray   Normalized waveform with the same dtype as audio.
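Peak normalization reduces to one gain factor: the target level in dBFS converts to a linear amplitude A = 10^(db_level / 20), so -6 dBFS is about 0.501, and the waveform is scaled by A / max|audio|. The sketch below assumes a floating-point waveform where 1.0 is full scale (0 dBFS). The name `peak_normalize_sketch` is hypothetical, and how the real function treats integer dtypes or silent input is not stated on this page.

```python
import numpy as np


def peak_normalize_sketch(audio: np.ndarray, db_level: float = -6.0) -> np.ndarray:
    """Hypothetical sketch: scale a float waveform so |peak| sits at db_level dBFS.

    Assumes floating-point samples where 1.0 is full scale (0 dBFS).
    """
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    if peak == 0.0:
        # Silent or empty input: there is no peak to scale, return a copy.
        return audio.copy()
    target = 10.0 ** (db_level / 20.0)  # dBFS -> linear amplitude
    # Apply a single gain and keep the caller's dtype.
    return (audio * (target / peak)).astype(audio.dtype, copy=False)
```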