Image-To-Image
Source: https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/image_to_image
This example edits an input image with Qwen/Qwen-Image-Edit using the image_edit.py CLI.
Local CLI Usage
Single Image Editing
Download the example image:
Then run:
python image_edit.py \
--image qwen-bear.png \
--prompt "Let this mascot dance under the moon, surrounded by floating stars and poetic bubbles such as 'Be Kind'" \
--output output_image_edit.png \
--num-inference-steps 50 \
--cfg-scale 4.0
Multiple Image Editing (Qwen-Image-Edit-2509)
For multiple image inputs, use Qwen/Qwen-Image-Edit-2509 or Qwen/Qwen-Image-Edit-2511:
python image_edit.py \
--model Qwen/Qwen-Image-Edit-2509 \
--image img1.png img2.png \
--prompt "Combine these images into a single scene" \
--output output_image_edit.png \
--num-inference-steps 50 \
--cfg-scale 4.0 \
--guidance-scale 1.0
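When the CLI is driven from another script, the argument list for the commands above can be assembled programmatically. The following is a minimal sketch, assuming image_edit.py sits in the current working directory. build_image_edit_command is a hypothetical helper, not part of vllm-omni, and it uses only flags that appear in the commands above.

```python
import shlex
from typing import List, Optional, Sequence


def build_image_edit_command(
    images: Sequence[str],
    prompt: str,
    output: str = "output_image_edit.png",
    model: Optional[str] = None,
    num_inference_steps: int = 50,
    cfg_scale: float = 4.0,
) -> List[str]:
    """Assemble an argv list for image_edit.py (hypothetical helper).

    Only flags shown in the commands above are emitted. When `model` is
    None, the --model flag is omitted so the script's own default applies.
    """
    if not images:
        raise ValueError("at least one input image is required")
    cmd = ["python", "image_edit.py"]
    if model is not None:
        cmd += ["--model", model]
    # --image accepts one or more paths after a single flag.
    cmd += ["--image", *images]
    cmd += [
        "--prompt", prompt,
        "--output", output,
        "--num-inference-steps", str(num_inference_steps),
        "--cfg-scale", str(cfg_scale),
    ]
    return cmd


if __name__ == "__main__":
    cmd = build_image_edit_command(
        ["img1.png", "img2.png"],
        "Combine these images into a single scene",
        model="Qwen/Qwen-Image-Edit-2509",
    )
    # Print a shell-safe command line; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems when the prompt contains spaces or quotation marks.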
Key arguments:
- --model: model name or path. Use Qwen/Qwen-Image-Edit-2509 or later for multiple image support.
- --image: path(s) to the source image(s) (PNG/JPG, converted to RGB). Can specify multiple images.
- --prompt / --negative-prompt: text description (string).
- --cfg-scale: true classifier-free guidance scale (default: 4.0). Classifier-free guidance is enabled by setting cfg_scale > 1 and providing a negative_prompt. Higher guidance scale encourages images closely linked to the text prompt, usually at the expense of lower image quality.
- --guidance-scale: guidance scale for guidance-distilled models (default: 1.0, disabled). Unlike classifier-free guidance (--cfg-scale), guidance-distilled models take the guidance scale directly as an input parameter. Enabled when guidance_scale > 1. Ignored when not using guidance-distilled models.
- --num-inference-steps: diffusion sampling steps (more steps = higher quality, slower).
- --output: path to save the generated PNG.
- --vae-use-slicing: enable VAE slicing for memory optimization.
- --vae-use-tiling: enable VAE tiling for memory optimization.
- --cfg-parallel-size: set it to 2 to enable CFG Parallel. See more examples in user_guide.
- --enable-cpu-offload: enable CPU offloading for diffusion models.
- --strength: Z-Image only - controls the denoising start timestep for I2I (default: 0.6). Range: [0.0, 1.0]. Lower values preserve more of the original image; higher values allow more creative changes.
ℹ️ If you encounter OOM errors, try using
--vae-use-slicing and --vae-use-tiling to reduce memory usage.
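Two of the arguments above, --cfg-scale and --strength, are easier to reason about with numbers. The sketch below uses the standard classifier-free guidance combination and a common image-to-image convention for the start step. Both are assumptions for illustration and may differ from what image_edit.py or the underlying pipeline actually does (for example, some pipelines rescale the combined prediction). cfg_combine and steps_actually_run are hypothetical names.

```python
from typing import List


def cfg_combine(cond: List[float], uncond: List[float], cfg_scale: float) -> List[float]:
    """Standard classifier-free guidance: uncond + scale * (cond - uncond).

    `cond` is the prediction for the prompt and `uncond` the prediction for
    the negative prompt. Assumed textbook form, not taken from vllm-omni.
    """
    return [u + cfg_scale * (c - u) for c, u in zip(cond, uncond)]


def steps_actually_run(num_inference_steps: int, strength: float) -> int:
    """Assumed img2img convention: denoise only the last `strength` fraction
    of the schedule, so lower strength keeps more of the input image."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    return min(int(num_inference_steps * strength), num_inference_steps)


if __name__ == "__main__":
    cond, uncond = [1.0, 2.0], [0.0, 1.0]
    # With scale 1 the result equals the conditional prediction, which is
    # why guidance only has an effect for cfg_scale > 1.
    print(cfg_combine(cond, uncond, 1.0))
    # A larger scale pushes the result further away from the negative prompt.
    print(cfg_combine(cond, uncond, 4.0))
    # Half strength runs half of the scheduled steps under this convention.
    print(steps_actually_run(50, 0.5))
```

Under these assumptions, a scale of 1 adds no guidance, and a strength of 1.0 runs the full schedule, which leaves the least of the original image intact.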
Example materials
image_edit.py
Large file omitted from the rendered docs. View it on GitHub: https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/image_to_image/image_edit.py.
run_qwen_image_edit_2511.sh
python image_edit.py \
--model Qwen/Qwen-Image-Edit-2511 \
--image qwen_bear.png \
--prompt "Add a white art board written with colorful text 'vLLM-Omni' on grassland. Add a paintbrush in the bear's hands. position the bear standing in front of the art board as if painting" \
--output output_image_edit.png \
--num-inference-steps 50 \
--cfg-scale 4.0 \
--cache-backend cache_dit \