Image-To-Video
Source: https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_video
This example demonstrates how to generate videos from images using Wan2.2 Image-to-Video models with vLLM-Omni's offline inference API.
Local CLI Usage
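Download the example image and save it as cherry_blossom.jpg in the working directory; the commands below pass that filename to --image.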
Wan2.2-I2V-A14B-Diffusers (MoE)
python image_to_video.py \
--model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
--image cherry_blossom.jpg \
--prompt "Cherry blossoms swaying gently in the breeze, petals falling, smooth motion" \
--negative-prompt "<optional quality filter>" \
--height 480 \
--width 832 \
--num-frames 48 \
--guidance-scale 5.0 \
--guidance-scale-high 6.0 \
--num-inference-steps 40 \
--boundary-ratio 0.875 \
--flow-shift 12.0 \
--fps 16 \
--output i2v_output.mp4
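The MoE command above sets two guidance scales and a boundary ratio. The following is a minimal sketch of how such a split could work, not the pipeline's actual code. It assumes a 1000-step training schedule and that a timestep at or above boundary_ratio * 1000 belongs to the high-noise stage. The names stage_for_timestep, guidance_for_timestep, and NUM_TRAIN_TIMESTEPS are hypothetical and exist only for illustration.

```python
# Sketch only: assumes a 1000-step training schedule and that a timestep at or
# above boundary_ratio * 1000 belongs to the high-noise stage. The real
# pipeline may place the boundary differently.
NUM_TRAIN_TIMESTEPS = 1000  # assumption, not read from the model config


def stage_for_timestep(t: float, boundary_ratio: float) -> str:
    """Return "high" for the high-noise stage, "low" for the low-noise stage."""
    boundary = boundary_ratio * NUM_TRAIN_TIMESTEPS
    return "high" if t >= boundary else "low"


def guidance_for_timestep(t, boundary_ratio, guidance_scale, guidance_scale_high):
    """Pick the CFG scale for one denoising step based on its stage."""
    if stage_for_timestep(t, boundary_ratio) == "high":
        return guidance_scale_high
    return guidance_scale


if __name__ == "__main__":
    # Values from the command above: boundary 0.875, scales 5.0 (low) and 6.0 (high).
    for t in (999, 900, 875, 600, 100):
        stage = stage_for_timestep(t, 0.875)
        scale = guidance_for_timestep(t, 0.875, 5.0, 6.0)
        print(f"t={t:4d} stage={stage:4s} cfg={scale}")
```

Under these assumptions, a boundary ratio of 0.875 assigns only the noisiest eighth of the timestep range to the high-noise stage, so most steps would use --guidance-scale rather than --guidance-scale-high.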
Wan2.2-TI2V-5B-Diffusers (Unified)
python image_to_video.py \
--model Wan-AI/Wan2.2-TI2V-5B-Diffusers \
--image cherry_blossom.jpg \
--prompt "Cherry blossoms swaying gently in the breeze, petals falling, smooth motion" \
--negative-prompt "<optional quality filter>" \
--height 480 \
--width 832 \
--num-frames 48 \
--guidance-scale 4.0 \
--num-inference-steps 40 \
--flow-shift 12.0 \
--fps 16 \
--output i2v_output.mp4
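When running several variations of the commands above, it can help to assemble the argument list in Python. The sketch below is one way to do that. The helper build_command is hypothetical and not part of vLLM-Omni; it only maps keyword names to the flags shown above and does not validate them.

```python
import shlex
import sys


def build_command(script="image_to_video.py", **options):
    """Turn keyword options into an argv list for the example script.

    Underscores become dashes (num_frames -> --num-frames). A value of True
    emits a bare flag; None or False skips the option entirely.
    """
    argv = [sys.executable, script]
    for name, value in options.items():
        if value is None or value is False:
            continue
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        else:
            argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    cmd = build_command(
        model="Wan-AI/Wan2.2-TI2V-5B-Diffusers",
        image="cherry_blossom.jpg",
        prompt="Cherry blossoms swaying gently in the breeze, petals falling, smooth motion",
        height=480,
        width=832,
        num_frames=48,
        guidance_scale=4.0,
        num_inference_steps=40,
        flow_shift=12.0,
        fps=16,
        output="i2v_output.mp4",
    )
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run(cmd, check=True) would launch the script with the same interpreter, assuming image_to_video.py is in the current directory.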
Key arguments:
- --model: Model ID (I2V-A14B for MoE, TI2V-5B for unified T2V+I2V).
- --image: Path to input image (required).
- --prompt: Text description of desired motion/animation.
- --height/--width: Output resolution (auto-calculated from image if not set). Dimensions should be multiples of 16.
- --num-frames: Number of frames (default 81).
- --guidance-scale and --guidance-scale-high: CFG scale (applied to low/high-noise stages for MoE).
- --negative-prompt: Optional list of artifacts to suppress.
- --boundary-ratio: Boundary split ratio for two-stage MoE models.
- --flow-shift: Scheduler flow shift (5.0 for 720p, 12.0 for 480p).
- --sample-solver: Wan2.2 sampling solver. Use unipc for the default multistep solver, or euler for Lightning/Distill checkpoints.
- --num-inference-steps: Number of denoising steps (default 50).
- --fps: Frames per second for the saved MP4 (requires diffusers export_to_video).
- --output: Path to save the generated video.
- --vae-use-slicing: Enable VAE slicing for memory optimization.
- --vae-use-tiling: Enable VAE tiling for memory optimization.
- --cfg-parallel-size: Set it to 2 to enable CFG Parallel. See more examples in user_guide.
- --tensor-parallel-size: Tensor parallel size (effective for models that support TP, e.g. LTX2).
- --enable-cpu-offload: Enable CPU offloading for diffusion models.
- --use-hsdp: Enable Hybrid Sharded Data Parallel to shard model weights across GPUs.
- --hsdp-shard-size: Number of GPUs to shard model weights across within each replica group. -1 (default) auto-calculates as world_size / replicate_size.
- --hsdp-replicate-size: Number of replica groups for HSDP. Each replica holds a full sharded copy. Default 1 means pure sharding (no replication).
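Several of the arguments above follow simple numeric rules. The sketch below restates them as small helpers for sanity-checking values before a run. All function names are hypothetical. Rounding down to a multiple of 16 is an assumption about how one might pick a valid size; the script's own auto-calculation from the image may differ. The divisibility check in hsdp_shard_size is likewise an assumption.

```python
# Flow-shift values quoted in the argument list above.
FLOW_SHIFT_BY_RESOLUTION = {"480p": 12.0, "720p": 5.0}


def snap_to_multiple(value: int, multiple: int = 16) -> int:
    """Round down to the nearest multiple, but never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


def clip_seconds(num_frames: int, fps: int) -> float:
    """Playback length of the saved video in seconds."""
    return num_frames / fps


def hsdp_shard_size(world_size: int, replicate_size: int = 1) -> int:
    """Mirror the documented default: world_size / replicate_size."""
    if world_size % replicate_size != 0:
        # Assumption: an uneven split is treated as a configuration error.
        raise ValueError("world_size must be divisible by replicate_size")
    return world_size // replicate_size


if __name__ == "__main__":
    print(snap_to_multiple(850), snap_to_multiple(480))  # width and height candidates
    print(clip_seconds(48, 16))                          # frames and fps from the commands above
    print(FLOW_SHIFT_BY_RESOLUTION["480p"])
    print(hsdp_shard_size(world_size=8, replicate_size=2))
```

For example, the commands above request 48 frames at 16 fps, which corresponds to a three-second clip.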
ℹ️ If you encounter out-of-memory (OOM) errors, try using --vae-use-slicing and --vae-use-tiling to reduce memory usage.
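If commands are assembled programmatically, the two memory flags can be appended only when they are missing. This is a minimal sketch; with_memory_flags is a hypothetical helper, not part of the example script.

```python
MEMORY_FLAGS = ("--vae-use-slicing", "--vae-use-tiling")


def with_memory_flags(argv):
    """Return a copy of argv with the VAE memory flags appended if absent."""
    result = list(argv)
    for flag in MEMORY_FLAGS:
        if flag not in result:
            result.append(flag)
    return result


if __name__ == "__main__":
    base = ["python", "image_to_video.py", "--image", "cherry_blossom.jpg"]
    print(with_memory_flags(base))
```

The helper is idempotent, so it is safe to apply to a command that already carries one or both flags.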
For Wan2.2 LightX2V-converted local Diffusers directories and related LoRA assets, see the LoRA guide.
Example materials
image_to_video.py
Large file omitted from the rendered docs. View it on GitHub: https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/image_to_video/image_to_video.py.