vllm_omni.diffusion.utils.media_utils

Video/audio muxing utilities using PyAV (no ffmpeg binary dependency).

mux_video_audio_bytes

mux_video_audio_bytes(
    video_frames: ndarray,
    audio_waveform: ndarray | None = None,
    *,
    fps: float = 25.0,
    audio_sample_rate: int = 44100,
    video_codec: str = "h264",
    audio_codec: str = "aac",
    crf: str = "18",
    video_codec_options: dict[str, str] | None = None,
) -> bytes

Mux video frames and optional audio waveform into MP4 bytes.
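
As a rough illustration of the call shape, the sketch below builds synthetic inputs that match the documented shapes and dtypes and passes them to `mux_video_audio_bytes`. The clip size, tone frequency, and variable names are made up for the example, and the import is guarded so the input-building part runs even where `vllm_omni` is not installed.

```python
import numpy as np

# Illustrative values only: 2 seconds of 64x64 video at 25 fps with a 440 Hz tone.
fps = 25.0
sample_rate = 44100
num_frames = 50

# (T, H, W, 3) uint8 RGB frames: a uniform gray level that brightens over time.
levels = np.linspace(0, 255, num_frames).astype(np.uint8)
video_frames = np.broadcast_to(
    levels[:, None, None, None], (num_frames, 64, 64, 3)
).copy()

# (N,) float32 mono waveform covering the same duration as the video.
num_samples = int(num_frames / fps * sample_rate)
t = np.arange(num_samples) / sample_rate
audio_waveform = (0.2 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

try:
    from vllm_omni.diffusion.utils.media_utils import mux_video_audio_bytes
except ImportError:
    mux_video_audio_bytes = None  # vllm_omni is not available in this environment

if mux_video_audio_bytes is not None:
    mp4_bytes = mux_video_audio_bytes(
        video_frames,
        audio_waveform,
        fps=fps,
        audio_sample_rate=sample_rate,
    )
    print(f"muxed {len(mp4_bytes)} bytes")
```

Passing `audio_waveform=None` (the default) should produce a video-only MP4, since the audio argument is documented as optional.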

Parameters:

Name               Type            Description                                                                       Default
video_frames       ndarray         uint8 array of shape (T, H, W, 3) (RGB).                                          required
audio_waveform     ndarray | None  float32 array: mono with shape (N,), or multichannel with shape (N, C) or (C, N).  None
fps                float           Video frame rate.                                                                 25.0
audio_sample_rate  int             Audio sample rate in Hz.                                                          44100
video_codec        str             Video codec name.                                                                 'h264'
audio_codec        str             Audio codec name.                                                                 'aac'
crf                str             Constant rate factor for the video encoder.                                       '18'
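
Because `fps` and `audio_sample_rate` are passed independently, the video lasts T / fps seconds and the audio lasts N / audio_sample_rate seconds. This page does not say how a mismatch between the two is handled, so it may be safest to size the waveform to the video yourself. The helper below is a hypothetical convenience, not part of `media_utils`.

```python
def expected_audio_samples(num_frames: int, fps: float, sample_rate: int) -> int:
    """Return the number of audio samples that spans num_frames of video."""
    if fps <= 0 or sample_rate <= 0:
        raise ValueError("fps and sample_rate must be positive")
    duration_s = num_frames / fps
    return round(duration_s * sample_rate)


# 50 frames at the default 25 fps is 2 seconds, i.e. 88200 samples at 44100 Hz.
print(expected_audio_samples(50, 25.0, 44100))  # 88200
```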

Returns:

Type   Description
bytes  Raw MP4 bytes ready to be written to disk or streamed.
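
Since the result is plain `bytes`, it can be written straight to a file. The sketch below adds a cheap sanity check first: MP4 (ISO base media) files normally begin with a box whose type field, at byte offsets 4 to 8, is `ftyp`. Both `looks_like_mp4` and `save_mp4` are hypothetical helpers written for this example, and the check is a heuristic rather than full validation.

```python
from pathlib import Path


def looks_like_mp4(data: bytes) -> bool:
    """Heuristic check: the first box of an MP4 file is normally 'ftyp'."""
    return len(data) >= 8 and data[4:8] == b"ftyp"


def save_mp4(data: bytes, path: str) -> Path:
    """Write MP4 bytes to path, refusing data that fails the heuristic check."""
    if not looks_like_mp4(data):
        raise ValueError("data does not start with an 'ftyp' box")
    out = Path(path)
    out.write_bytes(data)
    return out
```

With the bytes returned by `mux_video_audio_bytes`, a call such as `save_mp4(mp4_bytes, "clip.mp4")` would write the clip to disk. For streaming, the same bytes can be wrapped in `io.BytesIO` or sent as a response body.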