vllm_omni.utils.audio

Audio utility functions shared across models and entrypoints.

audio_chunk_pcm_bytes

audio_chunk_pcm_bytes(omni_res: OmniRequestOutput) -> int

Best-effort PCM byte count of the last audio chunk in omni_res.

Used by the audio-streaming continuity tracker to size the player buffer. Returns 0 when the chunk shape can't be interpreted, so the caller can drop the sample rather than record a wrong byte count.
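The layout of `OmniRequestOutput` is not described here, so the following is only a rough sketch of the "return 0 on an uninterpretable shape" behaviour, applied to a bare array instead of a request output. The helper name `pcm16_byte_count` and the 16-bit PCM assumption (2 bytes per sample) are hypothetical and are not part of `vllm_omni`.

```python
import numpy as np


def pcm16_byte_count(chunk) -> int:
    """Hypothetical helper: byte size of `chunk` if encoded as 16-bit PCM.

    Returns 0 when the chunk cannot be interpreted as a 1-D (mono) or
    2-D (channels x samples) numeric array.
    """
    try:
        arr = np.asarray(chunk)
    except Exception:
        # Anything NumPy cannot convert is treated as uninterpretable.
        return 0
    if not np.issubdtype(arr.dtype, np.number):
        return 0
    if arr.ndim not in (1, 2) or arr.size == 0:
        return 0
    # Assumption: every sample is written as a 2-byte signed integer.
    return int(arr.size) * 2
```

With this sketch, a mono chunk of 480 samples yields 960 bytes, while `None` or a 3-D array yields 0, which the caller would treat as "skip this sample".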

audio_chunk_sample_rate

audio_chunk_sample_rate(omni_res: OmniRequestOutput) -> int

Resolve audio sample rate for the request's audio stream.

mel_filter_bank

mel_filter_bank(
    sr: int,
    n_fft: int,
    n_mels: int,
    fmin: float = 0.0,
    fmax: float | None = None,
) -> Tensor

Compute a mel filterbank matrix.

Drop-in replacement for librosa.filters.mel using torchaudio.functional.melscale_fbanks.

Parameters:

Name	Type	Description	Default
`sr`	`int`	Sample rate of the audio.	required
`n_fft`	`int`	FFT window size.	required
`n_mels`	`int`	Number of mel bands.	required
`fmin`	`float`	Minimum frequency (Hz).	`0.0`
`fmax`	`float | None`	Maximum frequency (Hz). Defaults to `sr / 2`.	`None`

Returns:

Type	Description
`Tensor`	Tensor of shape `(n_mels, n_fft // 2 + 1)`.
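To make the returned shape concrete, here is a minimal NumPy sketch of a triangular mel filterbank with the same parameters. It is not the module's implementation, which per the description above delegates to torchaudio. The function name `mel_filter_bank_sketch`, the HTK mel formula, and the absence of area normalization are assumptions made for illustration, so the values will differ from librosa's defaults (Slaney scale and normalization).

```python
import numpy as np


def hz_to_mel(f):
    # HTK mel formula (assumed here for simplicity).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    # Inverse of the HTK mel formula above.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filter_bank_sketch(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Illustrative filterbank of shape (n_mels, n_fft // 2 + 1)."""
    if fmax is None:
        fmax = sr / 2.0
    n_freqs = n_fft // 2 + 1
    # Center frequency of each FFT bin, from 0 Hz to Nyquist.
    fft_freqs = np.linspace(0.0, sr / 2.0, n_freqs)
    # n_mels triangles need n_mels + 2 edge points, evenly spaced in mel.
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    left, center, right = hz_pts[:-2, None], hz_pts[1:-1, None], hz_pts[2:, None]
    rising = (fft_freqs[None, :] - left) / (center - left)
    falling = (right - fft_freqs[None, :]) / (right - center)
    # Each row is one triangle: clip the two slopes at zero.
    return np.maximum(0.0, np.minimum(rising, falling))


fb = mel_filter_bank_sketch(sr=16000, n_fft=400, n_mels=80)
print(fb.shape)
```

Multiplying such a matrix with a magnitude or power spectrogram of shape `(n_fft // 2 + 1, frames)` gives a mel spectrogram of shape `(n_mels, frames)`.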

peak_normalize

peak_normalize(
    audio: ndarray, db_level: float = -6.0
) -> ndarray

Normalize audio so peak amplitude reaches a target dB level.

Drop-in replacement for sox.Transformer().norm(db_level=...).

Parameters:

Name	Type	Description	Default
`audio`	`ndarray`	Input waveform as a 1-D numpy array.	required
`db_level`	`float`	Target peak amplitude in dBFS.	`-6.0`

Returns:

Type	Description
`ndarray`	Normalized waveform with the same dtype as `audio`.
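The arithmetic behind peak normalization can be sketched as follows. This is an illustrative version and not the module's code: the name `peak_normalize_sketch` is hypothetical, full scale is assumed to be 1.0 (a floating-point waveform), and silent input is assumed to be returned unchanged.

```python
import numpy as np


def peak_normalize_sketch(audio: np.ndarray, db_level: float = -6.0) -> np.ndarray:
    """Scale `audio` so that its absolute peak sits at `db_level` dBFS."""
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    if peak == 0.0:
        # Silence (or empty input) has no peak to scale; return a copy.
        return audio.copy()
    # Convert dBFS to linear amplitude: -6 dBFS is about 0.501 of full scale.
    target = 10.0 ** (db_level / 20.0)
    return (audio * (target / peak)).astype(audio.dtype)


x = np.array([0.0, 0.25, -0.5], dtype=np.float32)
y = peak_normalize_sketch(x)
print(y.dtype, float(np.max(np.abs(y))))
```

The gain is a single scalar, so the waveform's shape and relative dynamics are preserved; only the overall level changes.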