Qwen-Image Usage Guide¶
The Qwen-Image series includes the following models:
| Model | HuggingFace | Description |
|---|---|---|
| Qwen-Image | 🤗 Qwen/Qwen-Image | Text-to-image generation (20B parameters, Aug 2025) |
| Qwen-Image-2512 | 🤗 Qwen/Qwen-Image-2512 | Updated T2I with enhanced realism and text rendering (Dec 2025) |
| Qwen-Image-Edit | 🤗 Qwen/Qwen-Image-Edit | Single-image editing with semantic and appearance control (Aug 2025) |
| Qwen-Image-Edit-2509 | 🤗 Qwen/Qwen-Image-Edit-2509 | Multi-image editing with improved consistency (Sep 2025) |
| Qwen-Image-Edit-2511 | 🤗 Qwen/Qwen-Image-Edit-2511 | Further enhanced consistency, built-in LoRA support (Nov 2025) |
| Qwen-Image-Layered | 🤗 Qwen/Qwen-Image-Layered | Decomposes an input image into multiple RGBA layers (Dec 2025) |
All models share the same DiT transformer core; hence, the acceleration methods (e.g., cache methods, parallelism methods) are applicable across the entire series.
Installation¶
# Clone and install vllm-omni
git clone https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
uv venv
source .venv/bin/activate
uv pip install -e . vllm==0.18.0
Usage¶
Text-to-Image (Qwen-Image, Qwen-Image-2512)¶
Qwen-Image and Qwen-Image-2512 are text-to-image models. Use the text_to_image.py script:
# Qwen-Image (default)
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--output output_qwen_image.png \
--num-inference-steps 50 \
--cfg-scale 4.0
# Qwen-Image-2512
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image-2512 \
--prompt "a cup of coffee on the table" \
--output output_qwen_image_2512.png \
--num-inference-steps 50 \
--cfg-scale 4.0
Notes:
1. vLLM-Omni enables torch.compile by default. Pass --enforce-eager to disable it.
2. vLLM-Omni does not enable CPU offload automatically. If you encounter an out-of-memory (OOM) error, pass --enable-cpu-offload or --enable-layerwise-offload.
Image Editing (Qwen-Image-Edit)¶
Qwen-Image-Edit is the image editing version of Qwen-Image. It feeds the input image into both Qwen2.5-VL (for visual semantic control) and the VAE Encoder (for visual appearance control), which enables both semantic and appearance editing.
# Single image input (Qwen-Image-Edit)
python3 ./examples/offline_inference/image_to_image/image_edit.py \
--model Qwen/Qwen-Image-Edit \
--image qwen_bear.png \
--prompt "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'" \
--output output_image_edit.png \
--num-inference-steps 50 \
--cfg-scale 4.0
For multiple image inputs, use Qwen/Qwen-Image-Edit-2509 or Qwen/Qwen-Image-Edit-2511:
# Qwen-Image-Edit-2511 example (multiple images)
python3 ./examples/offline_inference/image_to_image/image_edit.py \
--model Qwen/Qwen-Image-Edit-2511 \
--image image1.png image2.png \
--prompt "Add a white art board written with colorful text 'vLLM-Omni' on grassland. Add a paintbrush in the bear's hands. position the bear standing in front of the art board as if painting" \
--output output_image_edit.png \
--num-inference-steps 50 \
--cfg-scale 4.0
Image Layering (Qwen-Image-Layered)¶
Qwen-Image-Layered decomposes an input image into multiple RGBA layers:
python3 ./examples/offline_inference/image_to_image/image_edit.py \
--model Qwen/Qwen-Image-Layered \
--image input.png \
--prompt "" \
--output layered \
--num-inference-steps 50 \
--cfg-scale 4.0 \
--layers 4 \
--color-format "RGBA"
Key Arguments¶
| Argument | Description |
|---|---|
| --model | Model name or local path. Use Qwen/Qwen-Image-Edit-2509 or later for multiple image support. |
| --image | Path(s) to the source image(s) (PNG/JPG, converted to RGB). Can specify multiple images. |
| --prompt / --negative-prompt | Text description (string). |
| --cfg-scale | True classifier-free guidance scale (default: 4.0). Classifier-free guidance is enabled by setting cfg_scale > 1 and providing a negative_prompt. Higher guidance scale encourages images closely linked to the text prompt, usually at the expense of lower image quality. |
| --guidance-scale | Guidance scale for guidance-distilled models (default: 1.0, disabled). Unlike classifier-free guidance (--cfg-scale), guidance-distilled models take the guidance scale directly as an input parameter. Enabled when guidance_scale > 1. Ignored when not using guidance-distilled models. |
| --num-inference-steps | Diffusion sampling steps (more steps = higher quality, slower). |
| --output | Path to save the generated PNG. For Qwen-Image-Layered, this is used as the filename prefix. |
| --vae-use-slicing | Enable VAE slicing for memory optimization. |
| --vae-use-tiling | Enable VAE tiling for memory optimization. |
| --cfg-parallel-size | Set to 2 to enable CFG Parallel. |
| --enable-cpu-offload | Enable CPU offloading for diffusion models. |
| --layers | Number of layers to decompose the input image into (Qwen-Image-Layered only). |
| --color-format | Output color channel format (RGB or RGBA). Qwen-Image-Layered uses RGBA. |
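To make the --cfg-scale entry above concrete, the sketch below shows the textbook classifier-free guidance combination in plain Python. It is an illustration only: the function name `apply_cfg` and the list-based toy "predictions" are hypothetical, and the actual Qwen-Image pipeline in vLLM-Omni may apply additional steps (for example, rescaling) that are not shown here.

```python
def apply_cfg(cond, uncond, cfg_scale):
    """Combine conditional and unconditional predictions element-wise.

    Textbook formula: uncond + cfg_scale * (cond - uncond).
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


# Toy predictions from the positive-prompt and negative-prompt branches.
cond = [1.0, 2.0]
uncond = [0.0, 1.0]

print(apply_cfg(cond, uncond, 1.0))  # scale 1: identical to the conditional branch
print(apply_cfg(cond, uncond, 4.0))  # scale 4: pushed further away from uncond
```

With a scale of 1 the result equals the conditional prediction, which is why guidance only takes effect for values greater than 1. Each step needs one forward pass per branch, and that doubled cost is what CFG parallelism spreads over two GPUs.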
Acceleration Methods¶
Cache Acceleration¶
vLLM-Omni supports Cache-DiT and TeaCache for Qwen-Image models.
Cache-DiT¶
# Text-to-Image with Cache-DiT
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--cache-backend cache_dit
Advanced Cache-DiT options:
python3 ./examples/offline_inference/image_to_image/image_edit.py \
--model Qwen/Qwen-Image-Edit \
--image qwen_bear.png \
--prompt "Edit description" \
--cache-backend cache_dit \
--cache-dit-max-continuous-cached-steps 3 \
--cache-dit-residual-diff-threshold 0.24 \
--cache-dit-enable-taylorseer
TeaCache¶
# Text-to-Image with TeaCache
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--cache-backend tea_cache
Ulysses Sequence Parallelism¶
Distributes computation across GPUs without quality loss. Recommended for high-resolution images (>1536px) with 2–8 GPUs.
# Text-to-Image with Ulysses SP
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--ulysses-degree 4
# Image Editing with Ulysses SP
python3 ./examples/offline_inference/image_to_image/image_edit.py \
--model Qwen/Qwen-Image-Edit \
--image qwen_bear.png \
--prompt "Add a white art board written with colorful text 'vLLM-Omni' on grassland." \
--output output_image_edit.png \
--num-inference-steps 50 \
--cfg-scale 4.0 \
--ulysses-degree 4
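As a rough illustration of why sequence parallelism pays off at high resolution, the sketch below estimates how many image tokens the DiT processes and how many each GPU holds when the sequence is split evenly. The 16× downsampling factor (8× VAE compression followed by 2×2 patching) is an assumption based on the public Qwen-Image pipeline, not something stated in this guide. The helper names are hypothetical, and text tokens are ignored.

```python
def image_tokens(height, width, downsample=16):
    """Approximate number of latent image tokens for a given resolution.

    downsample=16 is an assumption (8x VAE compression, then 2x2 patches).
    """
    return (height // downsample) * (width // downsample)


def tokens_per_gpu(height, width, ulysses_degree):
    """Tokens per GPU when the sequence is split evenly (ceiling division)."""
    total = image_tokens(height, width)
    return -(-total // ulysses_degree)


for side in (1024, 1536, 2048):
    total = image_tokens(side, side)
    print(side, total, tokens_per_gpu(side, side, ulysses_degree=4))
```

Under these assumptions the token count grows with the square of the image side length, so doubling the resolution quadruples the sequence that has to be attended over. Splitting that sequence across GPUs is what keeps per-device work manageable.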
Ring-Attention Sequence Parallelism¶
Ring-based sequence parallelism, suitable for memory-constrained environments with very long sequences.
# Text-to-Image with Ring-Attention
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--ring-degree 4
# Image Editing with Ring-Attention
python3 ./examples/offline_inference/image_to_image/image_edit.py \
--model Qwen/Qwen-Image-Edit \
--image qwen_bear.png \
--prompt "Edit description" \
--ring-degree 4
CFG Parallelism¶
Splits classifier-free guidance positive/negative branches across 2 GPUs. Particularly effective for image editing with cfg-scale > 1.
# Image Editing with CFG Parallel (2 GPUs)
python3 ./examples/offline_inference/image_to_image/image_edit.py \
--model Qwen/Qwen-Image-Edit \
--image qwen_bear.png \
--prompt "Edit description" \
--cfg-parallel-size 2 \
--num-inference-steps 50 \
--cfg-scale 4.0
Tensor Parallelism¶
Shards model weights across multiple GPUs. Useful for running the 20B model across 2+ GPUs.
# Text-to-Image with Tensor Parallelism (2 GPUs)
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--tensor-parallel-size 2
# Image Editing with Tensor Parallelism
python3 ./examples/offline_inference/image_to_image/image_edit.py \
--model Qwen/Qwen-Image-Edit \
--image qwen_bear.png \
--prompt "Edit description" \
--tensor-parallel-size 2
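A back-of-the-envelope estimate can help decide whether tensor parallelism is needed. The sketch below computes approximate per-GPU memory for the DiT weights alone, assuming 20B parameters (from the model table), 2 bytes per parameter for 16-bit weights, and an even split across tensor-parallel ranks. These are simplifying assumptions: the text encoder, VAE, activations, and any layers that are replicated rather than sharded are not counted, and `weight_gb` is a hypothetical helper.

```python
def weight_gb(num_params, bytes_per_param, tp_size=1):
    """Approximate weight memory per GPU in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / tp_size / 1e9


PARAMS = 20e9  # 20B parameters, per the model table

print(weight_gb(PARAMS, 2))             # 16-bit weights on a single GPU
print(weight_gb(PARAMS, 2, tp_size=2))  # 16-bit weights, 2-way tensor parallel
print(weight_gb(PARAMS, 1))             # 8-bit weights on a single GPU
```

Treat the numbers as a lower bound on real usage; measured memory will be higher once the other pipeline components and activations are included.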
CPU Offload¶
Offloads DiT layers to CPU memory between forward passes. Enables inference on limited VRAM.
# Text-to-Image with CPU offload (module-wise)
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--enable-cpu-offload
# Image Editing with CPU offload (layerwise, saves more VRAM, but slower)
python3 ./examples/offline_inference/image_to_image/image_edit.py \
--model Qwen/Qwen-Image-Edit \
--image qwen_bear.png \
--prompt "Edit description" \
--enable-layerwise-offload
VAE Patch Parallelism¶
Distributes VAE decode tiling across GPUs, reducing peak VAE memory usage at high resolutions.
# Text-to-Image with VAE Patch Parallelism
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--height 1536 --width 1536 \
--ulysses-degree 2 \
--vae-patch-parallel-size 2
VAE patch parallelism cannot be used alone. It must be used together with other parallelism methods.
Quantization¶
Qwen-Image and Qwen-Image-2512 support FP8 and INT8 quantization. Qwen-Image-Edit variants do not support quantization.
FP8¶
# Text-to-Image with FP8 quantization
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--quantization fp8
# Skip sensitive layers (recommended for better quality)
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--quantization fp8 \
--ignored-layers "img_mlp"
INT8¶
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--quantization int8
Feature Support Summary¶
For detailed feature support for the Qwen-Image series in vLLM-Omni, see the Feature Support Table. For compatibility between features (e.g., combining Cache + SP + CFG-Parallel), see the Feature Compatibility Guide.
Combining Acceleration Methods¶
A few guidelines help pick the right combination:
- Cache (TeaCache or Cache-DiT) reduces redundant DiT computation per inference. Parallelism (SP, TP, CFG-parallel, VAE patch) splits work across GPUs. Use one cache backend together with any supported parallel strategy. See the Feature Compatibility Guide for supported combinations.
- TeaCache and Cache-DiT cannot be used together.
- Sequence parallelism (Ulysses / Ring) is the best parallelism choice for high-resolution or long-sequence workloads. It generally outperforms tensor parallelism (TP) in these settings by distributing token-dimension computation across GPUs.
- Tensor parallelism is most useful when model weights alone do not fit on a single GPU.
- CFG parallelism targets non-distilled diffusion with full classifier-free guidance (--cfg-scale > 1). It assigns the positive and negative CFG branches to separate GPUs, achieving up to ~1.5× speedup when guidance is the dominant cost. It is not well-suited for guidance-distilled models (CFG is not applied).
- To reduce peak VRAM, use --enable-cpu-offload, --enable-layerwise-offload, or pair --vae-patch-parallel-size with another parallel method to lower VAE decode memory at high resolutions.
- To trade quality for speed, FP8 / INT8 quantization is available for Qwen-Image and Qwen-Image-2512.
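The guidelines above can be condensed into a small pre-flight check. The sketch below is a hypothetical helper, not part of vLLM-Omni: `check_combo` and its parameter names simply mirror the command-line flags used in this guide, and it encodes only the constraints stated here. It assumes models are given by their Hugging Face names rather than local paths, and it treats Ulysses, Ring, tensor, and CFG parallelism as the "other" methods that VAE patch parallelism can be paired with.

```python
QUANTIZABLE_MODELS = {"Qwen/Qwen-Image", "Qwen/Qwen-Image-2512"}
CACHE_BACKENDS = {"cache_dit", "tea_cache"}


def check_combo(model, cache_backend=None, ulysses_degree=1, ring_degree=1,
                tensor_parallel_size=1, cfg_parallel_size=1,
                vae_patch_parallel_size=1, quantization=None, cfg_scale=4.0):
    """Return a list of problems with a flag combination (empty if none found)."""
    problems = []
    # Only one cache backend can be selected, and it must be a known one.
    if cache_backend is not None and cache_backend not in CACHE_BACKENDS:
        problems.append(f"unknown cache backend: {cache_backend}")
    # VAE patch parallelism cannot be used alone.
    other_parallel = max(ulysses_degree, ring_degree,
                         tensor_parallel_size, cfg_parallel_size) > 1
    if vae_patch_parallel_size > 1 and not other_parallel:
        problems.append("VAE patch parallelism needs another parallel method")
    # CFG parallelism targets runs with classifier-free guidance enabled.
    if cfg_parallel_size > 1 and cfg_scale <= 1:
        problems.append("CFG parallelism targets cfg_scale > 1")
    # Quantization is listed only for the text-to-image models.
    if quantization is not None and model not in QUANTIZABLE_MODELS:
        problems.append(f"{model} does not support quantization")
    return problems


print(check_combo("Qwen/Qwen-Image", ulysses_degree=2, vae_patch_parallel_size=2))
print(check_combo("Qwen/Qwen-Image-Edit", quantization="fp8"))
```

The first call returns an empty list because VAE patch parallelism is paired with Ulysses SP; the second reports that the Edit variant does not support quantization. The Feature Compatibility Guide remains the authoritative source for which combinations are supported.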
Examples¶
1) Sequence parallelism only:
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--ulysses-degree 4
2) Cache only (single GPU):
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--cache-backend cache_dit
3) Cache + SP (recommended for long sequence generation):
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--cache-backend cache_dit \
--ulysses-degree 4
4) SP + VAE patch parallel (high-resolution, VRAM-constrained):
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--height 1536 --width 1536 \
--ulysses-degree 2 \
--vae-patch-parallel-size 2
5) Image editing: cache + CFG parallel:
python3 ./examples/offline_inference/image_to_image/image_edit.py \
--model Qwen/Qwen-Image-Edit \
--image qwen_bear.png \
--prompt "Edit description" \
--cache-backend cache_dit \
--cfg-parallel-size 2 \
--num-inference-steps 50 \
--cfg-scale 4.0
6) CPU offload (add when OOM):
python3 ./examples/offline_inference/text_to_image/text_to_image.py \
--model Qwen/Qwen-Image \
--prompt "a cup of coffee on the table" \
--enable-cpu-offload
7) Quantization + SP: