vllm_omni.transformers_utils.processors.audiox ¶
Input transform utilities for the AudioX diffusion pipeline.
Loads and normalizes raw audio/video conditioning signals (file path, URL, data: URI, np.ndarray, or torch.Tensor) into the (channels, samples) audio tensors and [T, C, H, W] video tensors the pipeline needs, so the pipeline itself can stay focused on model forward and sampling logic.
VIDEO_CONDITIONED_TASKS module-attribute ¶
VIDEO_CONDITIONED_TASKS = (
VIDEO_ONLY_TASKS | TEXT_VIDEO_TASKS
)
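As a rough illustration of how a union like this is typically used, the sketch below builds the combined set and checks task membership. The task names are hypothetical placeholders and the use of frozenset is an assumption; the real contents of VIDEO_ONLY_TASKS and TEXT_VIDEO_TASKS are not shown on this page.

```python
# Hypothetical task names, chosen only for illustration.
VIDEO_ONLY_TASKS = frozenset({"v2a", "v2m"})
TEXT_VIDEO_TASKS = frozenset({"tv2a", "tv2m"})

# Same expression as the module attribute: the union of both groups.
VIDEO_CONDITIONED_TASKS = VIDEO_ONLY_TASKS | TEXT_VIDEO_TASKS


def needs_video(task: str) -> bool:
    """Return True when the task expects a video conditioning input."""
    return task in VIDEO_CONDITIONED_TASKS
```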
adjust_video_duration ¶
load_video_source ¶
load_video_source(
source: Any,
*,
target_fps: int,
duration: float,
seek_time: float = 0.0,
) -> Tensor
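The signature suggests that frames are sampled at target_fps over a window of duration seconds starting at seek_time. The sketch below shows one plausible way to compute the sampling timestamps. The helper name frame_timestamps and the rounding rule are assumptions, not the library's documented behavior.

```python
def frame_timestamps(
    *,
    target_fps: int,
    duration: float,
    seek_time: float = 0.0,
) -> list[float]:
    """Timestamps (in seconds) at which frames would be sampled.

    Hypothetical helper: assumes round(target_fps * duration) frames,
    evenly spaced, with the first frame at seek_time.
    """
    if target_fps <= 0 or duration <= 0:
        raise ValueError("target_fps and duration must be positive")
    num_frames = round(target_fps * duration)
    return [seek_time + i / target_fps for i in range(num_frames)]
```

Under these assumptions, frame_timestamps(target_fps=5, duration=2.0) yields 10 timestamps, which would correspond to T = 10 in the [T, C, H, W] layout.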
materialize_media_source ¶
Return a local filesystem path for source.
Accepts a local path, a data:<mime>;base64,... URI, or an http(s):// URL. A non-local source is decoded or downloaded into a NamedTemporaryFile and that file's path is returned; callers do not need to clean up the tempfile (the OS does on exit).
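The three accepted source kinds can be sketched with the standard library alone. This is a minimal illustration of the described behavior, not the library's code: the name materialize_sketch, the ".bin" suffix, and the error handling are all assumptions.

```python
import base64
import os
import tempfile
import urllib.request


def materialize_sketch(source: str) -> str:
    """Return a local path for source (illustrative stand-in only)."""
    if source.startswith("data:"):
        # data:<mime>;base64,<payload>
        header, _, payload = source.partition(",")
        if not header.endswith(";base64"):
            raise ValueError("this sketch only handles base64 data: URIs")
        data = base64.b64decode(payload)
    elif source.startswith(("http://", "https://")):
        with urllib.request.urlopen(source) as resp:
            data = resp.read()
    else:
        # Assumed to be a local filesystem path already.
        return source
    # delete=False keeps the file on disk after the handle is closed.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
        tmp.write(data)
        return tmp.name
```

In this sketch the temporary file stays in place after the handle closes; when it is eventually removed depends on the platform's temp-directory policy.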
normalize_prompts ¶
Coerce raw prompt entries into {"prompt": str, ...} dicts, preserving any extra keys.
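A minimal sketch of this kind of coercion, assuming entries are either plain strings or dicts that may carry extra keys. The name normalize_prompts_sketch, the accepted input types, and the empty-string default are assumptions for illustration; the real function may differ.

```python
from typing import Any


def normalize_prompts_sketch(prompts: Any) -> list[dict[str, Any]]:
    """Coerce str/dict entries into {"prompt": str, ...} dicts."""
    # Treat a single entry as a one-element batch.
    if isinstance(prompts, (str, dict)):
        prompts = [prompts]
    normalized: list[dict[str, Any]] = []
    for entry in prompts:
        if isinstance(entry, str):
            normalized.append({"prompt": entry})
        elif isinstance(entry, dict):
            item = dict(entry)  # shallow copy so extras are preserved
            item["prompt"] = str(item.get("prompt", ""))
            normalized.append(item)
        else:
            raise TypeError(f"unsupported prompt entry: {type(entry).__name__}")
    return normalized
```

For example, a mixed batch such as ["a dog barking", {"prompt": "rain", "seed": 1}] would come back as two dicts, with the seed key carried through on the second.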