vllm_omni.utils.audio
Audio utility functions shared across models and entrypoints.
audio_chunk_pcm_bytes
audio_chunk_pcm_bytes(omni_res: OmniRequestOutput) -> int
Best-effort PCM byte count of the last audio chunk in omni_res.
Used by the audio-streaming continuity tracker to size the player buffer. Returns 0 when the chunk shape can't be interpreted; the caller then drops the sample rather than recording a wrong byte count.
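To make the "return 0 rather than guess" contract concrete, here is a minimal sketch of a byte-count helper. It is not the library's implementation: the helper names `pcm_bytes_sketch` and `pcm_duration_seconds` are hypothetical, the chunk is assumed to be a plain sequence or an array-like with a `shape` attribute, and 16-bit PCM (2 bytes per sample) is assumed.

```python
from math import prod

BYTES_PER_SAMPLE = 2  # assumption: 16-bit PCM


def pcm_bytes_sketch(chunk) -> int:
    """Hypothetical helper: PCM byte count of one chunk, or 0 if unusable."""
    shape = getattr(chunk, "shape", None)
    if shape is None:
        # Fall back to plain sequences such as lists of samples.
        try:
            shape = (len(chunk),)
        except TypeError:
            return 0
    # Only 1-D (samples) or 2-D (channels x samples) layouts are accepted.
    if len(shape) == 0 or len(shape) > 2:
        return 0
    return int(prod(shape)) * BYTES_PER_SAMPLE


def pcm_duration_seconds(n_bytes: int, sample_rate: int) -> float:
    """Hypothetical helper: playback time of mono 16-bit PCM."""
    if n_bytes <= 0 or sample_rate <= 0:
        return 0.0  # nothing trustworthy to record
    return n_bytes / (sample_rate * BYTES_PER_SAMPLE)
```

A caller in this style would skip any sample whose byte count is 0 instead of feeding a guessed size into the buffer estimate.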
audio_chunk_sample_rate
audio_chunk_sample_rate(omni_res: OmniRequestOutput) -> int
Resolve the audio sample rate for the request's audio stream.
mel_filter_bank
mel_filter_bank(
sr: int,
n_fft: int,
n_mels: int,
fmin: float = 0.0,
fmax: float | None = None,
) -> Tensor
Compute a mel filterbank matrix.
Drop-in replacement for librosa.filters.mel using torchaudio.functional.melscale_fbanks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sr | int | Sample rate of the audio. | required |
| n_fft | int | FFT window size. | required |
| n_mels | int | Number of mel bands. | required |
| fmin | float | Minimum frequency (Hz). | 0.0 |
| fmax | float \| None | Maximum frequency (Hz). Defaults to | None |
Returns:
| Type | Description |
|---|---|
| Tensor | Tensor of shape |
peak_normalize
peak_normalize(audio: ndarray, db_level: float = -6.0) -> ndarray
Normalize audio so peak amplitude reaches a target dB level.
Drop-in replacement for sox.Transformer().norm(db_level=...).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| audio | ndarray | Input waveform as a 1-D numpy array. | required |
| db_level | float | Target peak amplitude in dBFS. | -6.0 |
Returns:
| Type | Description |
|---|---|
| ndarray | Normalized waveform with the same dtype as audio. |
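The arithmetic behind peak normalization is short enough to sketch. The version below is not the library's code: `peak_normalize_sketch` is a hypothetical name, a floating-point waveform in the range [-1, 1] is assumed, and returning silent input unchanged is an assumption made to avoid dividing by zero. A target of `db_level` dBFS corresponds to a linear peak of `10 ** (db_level / 20)`.

```python
import numpy as np


def peak_normalize_sketch(audio: np.ndarray, db_level: float = -6.0) -> np.ndarray:
    """Hypothetical stand-in: scale audio so its peak hits db_level dBFS."""
    peak = float(np.max(np.abs(audio))) if audio.size else 0.0
    if peak == 0.0:
        # Assumption: silent or empty input is returned unchanged.
        return audio.copy()
    target = 10.0 ** (db_level / 20.0)  # dBFS to linear amplitude
    scaled = audio.astype(np.float64) * (target / peak)
    return scaled.astype(audio.dtype)  # keep the caller's dtype


x = np.array([0.1, -0.25, 0.2], dtype=np.float32)
y = peak_normalize_sketch(x)
print(np.max(np.abs(y)))
```

With the default of -6 dBFS the peak lands at roughly half of full scale, which leaves headroom for later processing.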