Wan2.2 Usage Guide¶
This guide provides instructions for running Wan2.2 video generation models using vLLM-Omni with Cache-DiT acceleration.
Supported Models¶
- Wan-AI/Wan2.2-T2V-A14B-Diffusers: Text-to-Video (MoE architecture, 14B active parameters)
- Wan-AI/Wan2.2-I2V-A14B-Diffusers: Image-to-Video (MoE architecture, 14B active parameters)
- Wan-AI/Wan2.2-TI2V-5B-Diffusers: Unified Text-to-Video + Image-to-Video (dense 5B)
Installing vLLM-Omni¶
uv venv
source .venv/bin/activate
uv pip install vllm==0.12.0
uv pip install git+https://github.com/vllm-project/vllm-omni.git@ef01223c42be10ee260b9f6e5ec31894cd09d86e
The CLI examples below are from the vLLM-Omni repo. If you want to run them directly, clone that repo and run the scripts from its examples/offline_inference directory.
Text-to-Video Generation¶
Basic Usage¶
from vllm_omni.entrypoints.omni import Omni
omni = Omni(model="Wan-AI/Wan2.2-T2V-A14B-Diffusers")
frames = omni.generate(
"Two anthropomorphic cats in comfy boxing gear fight on a spotlighted stage.",
height=720,
width=1280,
num_frames=81,
num_inference_steps=40,
guidance_scale=4.0,
)
CLI Usage¶
python examples/offline_inference/text_to_video/text_to_video.py \
--model Wan-AI/Wan2.2-T2V-A14B-Diffusers \
--prompt "A serene lakeside sunrise with mist over the water." \
--height 720 \
--width 1280 \
--num_frames 81 \
--num_inference_steps 40 \
--guidance_scale 4.0 \
--fps 24 \
--output t2v_output.mp4
Image-to-Video Generation¶
Basic Usage¶
import PIL.Image
from vllm_omni.entrypoints.omni import Omni
omni = Omni(model="Wan-AI/Wan2.2-I2V-A14B-Diffusers")
image = PIL.Image.open("input.jpg").convert("RGB")
frames = omni.generate(
"A cat playing with yarn",
pil_image=image,
height=480,
width=832,
num_frames=81,
num_inference_steps=50,
guidance_scale=5.0,
)
CLI Usage¶
python examples/offline_inference/image_to_video/image_to_video.py \
--model Wan-AI/Wan2.2-I2V-A14B-Diffusers \
--image input.jpg \
--prompt "A cat playing with yarn" \
--num_frames 81 \
--num_inference_steps 50 \
--guidance_scale 5.0 \
--fps 16 \
--output i2v_output.mp4
TI2V CLI Usage¶
python examples/offline_inference/image_to_video/image_to_video.py \
--model Wan-AI/Wan2.2-TI2V-5B-Diffusers \
--image input.jpg \
--prompt "A cat playing with yarn" \
--num_frames 81 \
--num_inference_steps 50 \
--guidance_scale 5.0 \
--fps 16 \
--output ti2v_output.mp4
Cache-DiT Acceleration¶
vLLM-Omni supports Cache-DiT acceleration for Wan2.2 models, which can significantly speed up video generation through caching mechanisms.
Enabling Cache-DiT¶
from vllm_omni.entrypoints.omni import Omni
omni = Omni(
model="Wan-AI/Wan2.2-T2V-A14B-Diffusers",
cache_backend="cache_dit",
)
frames = omni.generate(
"A beautiful sunset over the ocean",
height=720,
width=1280,
num_frames=81,
num_inference_steps=40,
)
Custom Cache-DiT Configuration¶
For fine-tuned control over the acceleration:
omni = Omni(
model="Wan-AI/Wan2.2-T2V-A14B-Diffusers",
cache_backend="cache_dit",
cache_config={
"Fn_compute_blocks": 8,
"Bn_compute_blocks": 0,
"max_warmup_steps": 4,
"residual_diff_threshold": 0.12,
},
)
Key Parameters¶
| Parameter | Default | Description |
|---|---|---|
height |
720 (T2V) / auto (I2V) | Video height (multiples of 16) |
width |
1280 (T2V) / auto (I2V) | Video width (multiples of 16) |
num_frames |
81 | Number of frames to generate |
num_inference_steps |
40-50 | Denoising steps |
guidance_scale |
4.0-5.0 | Classifier-free guidance scale |
boundary_ratio |
0.875 | Boundary split ratio for MoE models |
flow_shift |
5.0 (720p) / 12.0 (480p) | Scheduler flow shift |
Notes¶
- The CLI scripts use
diffusers.utils.export_to_video, sodiffusersmust be installed in the environment where you run them.