Image-To-Image

Source: https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/image_to_image

This example demonstrates how to deploy the Qwen-Image-Edit model as an online image editing service using vLLM-Omni.

For multi-image input editing, use Qwen-Image-Edit-2509 (QwenImageEditPlusPipeline) and send multiple images in the user message content.

Start Server

Basic Start

vllm serve Qwen/Qwen-Image-Edit --omni --port 8092

Note

If you encounter Out-of-Memory (OOM) errors or have limited GPU memory, enable VAE slicing and tiling to reduce memory usage by adding --vae-use-slicing --vae-use-tiling to the vllm serve command.

Multi-Image Edit (Qwen-Image-Edit-2509)

vllm serve Qwen/Qwen-Image-Edit-2509 --omni --port 8092

Start with Parameters

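Alternatively, use the startup script, which reads MODEL and PORT from the environment (defaults: Qwen/Qwen-Image-Edit and 8092):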

bash run_server.sh

To serve Qwen-Image-Edit-2509 with the script:

MODEL=Qwen/Qwen-Image-Edit-2509 bash run_server.sh

API Calls

Method 1: Using curl (Image Editing)

# Image editing
bash run_curl_image_edit.sh input.png "Convert this image to watercolor style"

# Or execute directly
IMG_B64=$(base64 -w0 input.png)

cat <<EOF > request.json
{
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Convert this image to watercolor style"},
      {"type": "image_url", "image_url": {"url": "data:image/png;base64,$IMG_B64"}}
    ]
  }],
  "extra_body": {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 1,
    "seed": 42
  }
}
EOF

curl -s http://localhost:8092/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json \
  | jq -r '.choices[0].message.content[0].image_url.url' \
  | cut -d',' -f2 \
  | base64 -d > output.png

Method 2: Using OpenAI Python SDK

import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8092/v1", api_key="none")

with open("input.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Qwen/Qwen-Image-Edit",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Convert to watercolor style"},
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{img_b64}"
            }},
        ],
    }],
    extra_body={
        "num_inference_steps": 50,
        "guidance_scale": 1,
        "seed": 42,
    },
)

img_url = response.choices[0].message.content[0].image_url.url
_, b64_data = img_url.split(",", 1)
with open("output.png", "wb") as f:
    f.write(base64.b64decode(b64_data))

Note

The OpenAI SDK's extra_body keyword argument merges parameters into the top-level request body automatically. When using curl or Python requests, wrap generation parameters inside a literal "extra_body" key in the JSON instead (as shown in the curl example above).
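
The two request shapes described in this note can be compared side by side. The following is a minimal sketch, not the SDK's actual implementation; the helper names sdk_style_body and raw_style_body are hypothetical and exist only for illustration.

```python
import json


def sdk_style_body(messages, model, extra_body):
    """Approximate the JSON the OpenAI SDK sends: extra_body keys land at the top level."""
    body = {"model": model, "messages": messages}
    body.update(extra_body)  # merged, so no "extra_body" key survives
    return body


def raw_style_body(messages, extra_body):
    """Shape used by the curl example: parameters nested under a literal "extra_body" key."""
    return {"messages": messages, "extra_body": dict(extra_body)}


messages = [{
    "role": "user",
    "content": [{"type": "text", "text": "Convert to watercolor style"}],
}]
params = {"num_inference_steps": 50, "guidance_scale": 1, "seed": 42}

sdk_body = sdk_style_body(messages, "Qwen/Qwen-Image-Edit", params)
raw_body = raw_style_body(messages, params)

# Print only the top-level keys to make the structural difference visible.
print(json.dumps(sorted(sdk_body)))
print(json.dumps(sorted(raw_body)))
```

Printing the top-level keys shows that the generation parameters sit next to messages in the first shape and under extra_body in the second.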

Method 3: Using Python Client Script

python openai_chat_client.py --input input.png --prompt "Convert to oil painting style" --output output.png

# Multi-image editing (Qwen-Image-Edit-2509 server required)
python openai_chat_client.py --input input1.png input2.png --prompt "Combine these images into a single scene" --output output.png

Method 4: Using Gradio Demo

python gradio_demo.py
# Visit http://localhost:7861

Request Format

Image Editing (Using image_url Format)

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Convert this image to watercolor style"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }
  ]
}

Image Editing (Using Simplified image Format)

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"text": "Convert this image to watercolor style"},
        {"image": "BASE64_IMAGE_DATA"}
      ]
    }
  ]
}

Image Editing with Parameters

Use extra_body to pass generation parameters:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Convert to ink wash painting style"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }
  ],
  "extra_body": {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "seed": 42
  }
}

Layered Image Generation (Qwen-Image-Layered)

Qwen-Image-Layered generates multiple decomposed layers from a reference image and a text prompt. Start the server with:

vllm serve Qwen/Qwen-Image-Layered --omni --port 8093

Using curl

IMG_B64=$(base64 -w0 input.png)

curl -sS http://localhost:8093/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg img "$IMG_B64" '{
    messages: [{
      role: "user",
      content: [
        {type: "image_url", image_url: {url: ("data:image/png;base64," + $img)}},
        {type: "text", text: "a rabbit"}
      ]
    }],
    extra_body: {
      num_inference_steps: 50,
      cfg_scale: 4.0,
      seed: 0,
      layers: 4,
      resolution: 640
    }
  }')" \
  | jq -r '.choices[0].message.content[] | .image_url.url | split(",")[1]' \
  | while IFS= read -r b64; do
      ((i++)); echo "$b64" | base64 -d > "layer_${i}.png"
    done

Using Python

import base64
import requests

with open("input.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{img_b64}"
            }},
            {"type": "text", "text": "a rabbit"},
        ],
    }],
    "extra_body": {
        "num_inference_steps": 50,
        "cfg_scale": 4.0,
        "seed": 0,
        "layers": 4,
        "resolution": 640,
    },
}

resp = requests.post(
    "http://localhost:8093/v1/chat/completions",
    json=payload,
    timeout=600,
)
resp.raise_for_status()  # fail early on HTTP errors instead of parsing an error body
data = resp.json()

for i, item in enumerate(data["choices"][0]["message"]["content"]):
    _, b64_data = item["image_url"]["url"].split(",", 1)
    with open(f"layer_{i}.png", "wb") as f:
        f.write(base64.b64decode(b64_data))

The response contains multiple images in choices[0].message.content, one per generated layer.
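
When the decoding step is needed in more than one place, it can be factored into a small helper. This is a sketch under the assumption that every layer arrives as a base64 data URL in the response shape shown in this document; extract_images is a hypothetical name, and the fake response below carries placeholder bytes rather than real PNG data.

```python
import base64


def extract_images(response: dict) -> list[bytes]:
    """Decode every inline image found in choices[0].message.content."""
    content = response["choices"][0]["message"]["content"]
    if not isinstance(content, list):
        return []  # e.g. a plain-text reply with no images
    images = []
    for item in content:
        url = item.get("image_url", {}).get("url", "")
        if not url.startswith("data:image"):
            continue  # skip items that are not inline images
        _, b64_data = url.split(",", 1)
        images.append(base64.b64decode(b64_data))
    return images


def _fake_item(raw: bytes) -> dict:
    """Build a content item like the server's, using placeholder bytes."""
    b64 = base64.b64encode(raw).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}


fake_response = {
    "choices": [{"message": {"content": [_fake_item(b"layer-a"), _fake_item(b"layer-b")]}}]
}
layers = extract_images(fake_response)
print(len(layers))
```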

Qwen-Image-Layered Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| layers | int | 4 | Number of layers to decompose |
| resolution | int | 640 | Resolution for dimension calculation (640 or 1024) |
| cfg_scale | float | 4.0 | Classifier-free guidance scale (alias for true_cfg_scale) |
| num_inference_steps | int | 50 | Number of denoising steps |
| seed | int | None | Random seed for reproducibility |
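
The defaults in this table can be captured in a small builder that also rejects values the table does not list. This is only a client-side sketch: the function name layered_extra_body is hypothetical, and the checks (resolution limited to 640 or 1024, at least one layer) are assumptions drawn from the table, not a description of how the server validates requests.

```python
def layered_extra_body(
    layers: int = 4,
    resolution: int = 640,
    cfg_scale: float = 4.0,
    num_inference_steps: int = 50,
    seed=None,
) -> dict:
    """Build the extra_body dict for a Qwen-Image-Layered request."""
    if resolution not in (640, 1024):
        # The table lists only these two values.
        raise ValueError(f"resolution must be 640 or 1024, got {resolution}")
    if layers < 1:
        # Assumption: a request for zero or negative layers is never meaningful.
        raise ValueError(f"layers must be at least 1, got {layers}")
    body = {
        "layers": layers,
        "resolution": resolution,
        "cfg_scale": cfg_scale,
        "num_inference_steps": num_inference_steps,
    }
    if seed is not None:
        body["seed"] = seed  # omit the key entirely to leave the seed unset
    return body


print(layered_extra_body(seed=0))
```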

Multi-Image Editing (Qwen-Image-Edit-2509)

Provide multiple images in content (order matters):

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Combine these images into a single scene"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."} },
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."} }
      ]
    }
  ]
}
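
Because order matters, it can help to build the content list from an ordered list of images rather than by hand. The sketch below assumes all inputs share one MIME type (PNG by default); build_multi_image_content is a hypothetical helper name, and the byte strings stand in for real image data.

```python
import base64


def build_multi_image_content(prompt: str, images: list[bytes], mime_type: str = "image/png") -> list[dict]:
    """Return a content list: the text prompt first, then one image_url item per image, in order."""
    content = [{"type": "text", "text": prompt}]
    for raw in images:
        b64 = base64.b64encode(raw).decode("utf-8")
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:{mime_type};base64,{b64}"},
        })
    return content


# Placeholder bytes; in practice read each file with open(path, "rb").read().
content = build_multi_image_content("Combine these images into a single scene", [b"a", b"b"])
payload = {"messages": [{"role": "user", "content": content}]}
print(len(payload["messages"][0]["content"]))
```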

Generation Parameters

When using /v1/chat/completions, pass these parameters inside extra_body in the curl JSON, or via the extra_body keyword argument in the OpenAI Python SDK. When using the dedicated /v1/images/edits endpoint, pass the supported generation controls directly as top-level form fields; on that endpoint, set image dimensions and count with size and n rather than height, width, or num_outputs_per_prompt.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| height | int | None | Output image height in pixels |
| width | int | None | Output image width in pixels |
| size | str | None | Output image size (e.g., "1024x1024") |
| num_inference_steps | int | 50 | Number of denoising steps |
| guidance_scale | float | 1.0 | CFG guidance scale |
| seed | int | None | Random seed (reproducible) |
| negative_prompt | str | None | Negative prompt |
| num_outputs_per_prompt | int | 1 | Number of images to generate |
| strength | float | 0.6 | Z-Image only. Denoising start timestep for I2I. Range: [0.0, 1.0]. Lower preserves more of the original image. |
| layers | int | 4 | Number of layers (Qwen-Image-Layered) |
| resolution | int | 640 | Resolution, 640 or 1024 (Qwen-Image-Layered) |
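
The renaming between the two endpoints can be expressed as a small mapping step. The sketch below is illustrative only: to_images_edits_fields is a hypothetical helper, and it assumes size is written as WIDTHxHEIGHT (the convention of the OpenAI images API); this document only shows the square example "1024x1024", so verify the order against the server before relying on it.

```python
def to_images_edits_fields(params: dict) -> dict:
    """Translate chat-completions style parameters into /v1/images/edits form fields."""
    renamed = ("height", "width", "num_outputs_per_prompt")
    # Copy every parameter that keeps its name, dropping unset values.
    fields = {k: v for k, v in params.items() if k not in renamed and v is not None}

    height, width = params.get("height"), params.get("width")
    if (height is None) != (width is None):
        raise ValueError("height and width must be given together to build size")
    if height is not None:
        fields["size"] = f"{width}x{height}"  # assumption: WIDTHxHEIGHT order

    count = params.get("num_outputs_per_prompt")
    if count is not None:
        fields["n"] = count
    return fields


print(to_images_edits_fields({"height": 768, "width": 1024, "num_outputs_per_prompt": 2, "seed": 42}))
```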

Response Format

{
  "id": "chatcmpl-xxx",
  "created": 1234567890,
  "model": "Qwen/Qwen-Image-Edit",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": [{
        "type": "image_url",
        "image_url": {
          "url": "data:image/png;base64,..."
        }
      }]
    },
    "finish_reason": "stop"
  }],
  "usage": {...}
}
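
A client that wants to choose a file extension from the returned image can read the MIME type from the data URL instead of assuming PNG. This is a minimal sketch assuming the response shape above; parse_data_url and first_image are hypothetical helper names, and the sample payload is just the first four bytes of a PNG signature, not a full image.

```python
import base64


def parse_data_url(url: str) -> tuple[str, bytes]:
    """Split a base64 data URL into (mime_type, decoded_bytes)."""
    header, sep, payload = url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


def first_image(response: dict) -> tuple[str, bytes]:
    """Return the MIME type and bytes of the first image in the response."""
    item = response["choices"][0]["message"]["content"][0]
    return parse_data_url(item["image_url"]["url"])


sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": [{
                "type": "image_url",
                "image_url": {"url": "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("utf-8")},
            }],
        },
    }],
}
mime_type, image_bytes = first_image(sample)
print(mime_type, len(image_bytes))
```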

Common Editing Instruction Examples

| Instruction | Description |
| --- | --- |
| Convert this image to watercolor style | Style transfer |
| Convert the image to black and white | Desaturation |
| Enhance the color saturation | Color adjustment |
| Convert to cartoon style | Cartoonization |
| Add vintage filter effect | Filter effect |
| Convert daytime scene to nighttime | Scene conversion |

File Description

| File | Description |
| --- | --- |
| run_server.sh | Server startup script |
| run_curl_image_edit.sh | curl image editing example |
| openai_chat_client.py | Python client |
| gradio_demo.py | Gradio interactive interface |

Example materials

gradio_demo.py
#!/usr/bin/env python3
"""
Qwen-Image-Edit Gradio Demo for online serving.

Usage:
    python gradio_demo.py [--server http://localhost:8092] [--port 7861]
"""

import argparse
import base64
from io import BytesIO

try:
    import gradio as gr
except ImportError:
    raise ImportError("gradio is required to run this demo. Install it with: pip install 'vllm-omni[demo]'") from None
import requests
from PIL import Image


def _pil_to_b64_png(img: Image.Image) -> str:
    buffer = BytesIO()
    img.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def edit_image(
    input_image: Image.Image,
    extra_images: list[str] | None,
    prompt: str,
    steps: int,
    guidance_scale: float,
    seed: int | None,
    negative_prompt: str,
    server_url: str,
) -> Image.Image | None:
    """Edit an image using the chat completions API."""
    if input_image is None:
        raise gr.Error("Please upload an image first")

    images: list[Image.Image] = [input_image]
    if extra_images:
        for p in extra_images:
            try:
                images.append(Image.open(p).convert("RGB"))
            except Exception as e:
                raise gr.Error(f"Failed to open image: {p}. Error: {e}") from e

    # Build user message with text and image
    content: list[dict[str, object]] = [{"type": "text", "text": prompt}]
    for img in images:
        content.append({"type": "image_url", "image_url": {"url": f"data:image/png;base64,{_pil_to_b64_png(img)}"}})

    messages = [
        {
            "role": "user",
            "content": content,
        }
    ]

    # Build extra_body with generation parameters
    extra_body = {
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }
    if seed is not None and seed >= 0:
        extra_body["seed"] = seed
    if negative_prompt:
        extra_body["negative_prompt"] = negative_prompt

    # Build request payload
    payload = {"messages": messages, "extra_body": extra_body}

    try:
        response = requests.post(
            f"{server_url}/v1/chat/completions",
            headers={"Content-Type": "application/json"},
            json=payload,
            timeout=300,
        )
        response.raise_for_status()
        data = response.json()

        content = data["choices"][0]["message"]["content"]
        if isinstance(content, list) and len(content) > 0:
            image_url = content[0].get("image_url", {}).get("url", "")
            if image_url.startswith("data:image"):
                _, b64_data = image_url.split(",", 1)
                image_bytes = base64.b64decode(b64_data)
                return Image.open(BytesIO(image_bytes))

        return None

    except Exception as e:
        print(f"Error: {e}")
        raise gr.Error(f"Edit failed: {e}") from e


def create_demo(server_url: str):
    """Create Gradio demo interface."""

    with gr.Blocks(title="Qwen-Image-Edit Demo") as demo:
        gr.Markdown("# Qwen-Image-Edit Online Editing")
        gr.Markdown(
            "Upload an image and describe the editing effect you want. "
            "For multi-image editing, upload extra images (requires Qwen-Image-Edit-2509 server)."
        )

        with gr.Row():
            with gr.Column(scale=1):
                input_image = gr.Image(
                    label="Input Image",
                    type="pil",
                )
                extra_images = gr.File(
                    label="Additional Images (Optional)",
                    file_count="multiple",
                    type="filepath",
                )
                prompt = gr.Textbox(
                    label="Edit Instruction",
                    placeholder="Describe the editing effect you want...",
                    lines=2,
                )
                negative_prompt = gr.Textbox(
                    label="Negative Prompt",
                    placeholder="Describe what you don't want...",
                    lines=2,
                )

                with gr.Row():
                    steps = gr.Slider(
                        label="Inference Steps",
                        minimum=10,
                        maximum=100,
                        value=50,
                        step=5,
                    )
                    guidance_scale = gr.Slider(
                        label="Guidance Scale (CFG)",
                        minimum=1.0,
                        maximum=20.0,
                        value=7.5,
                        step=0.5,
                    )

                with gr.Row():
                    seed = gr.Number(
                        label="Random Seed (-1 for random)",
                        value=-1,
                        precision=0,
                    )

                edit_btn = gr.Button("Edit Image", variant="primary")

            with gr.Column(scale=1):
                output_image = gr.Image(
                    label="Edited Image",
                    type="pil",
                )

        # Examples
        gr.Examples(
            examples=[
                ["Convert this image to watercolor style"],
                ["Convert the image to black and white"],
                ["Enhance the color saturation"],
                ["Convert to cartoon style"],
                ["Add vintage filter effect"],
                ["Convert daytime to nighttime"],
                ["Convert to oil painting style"],
                ["Add dreamy blur effect"],
            ],
            inputs=[prompt],
        )

        def process_edit(img, imgs, p, st, g, se, n):
            # Treat a cleared field (None) the same as -1: no fixed seed.
            actual_seed = se if se is not None and se >= 0 else None
            return edit_image(img, imgs, p, st, g, actual_seed, n, server_url)

        edit_btn.click(
            fn=process_edit,
            inputs=[input_image, extra_images, prompt, steps, guidance_scale, seed, negative_prompt],
            outputs=[output_image],
        )

    return demo


def main():
    parser = argparse.ArgumentParser(description="Qwen-Image-Edit Gradio Demo")
    parser.add_argument("--server", default="http://localhost:8092", help="Server URL")
    parser.add_argument("--port", type=int, default=7861, help="Gradio port")
    parser.add_argument("--share", action="store_true", help="Create public link")

    args = parser.parse_args()

    print(f"Connecting to server: {args.server}")
    demo = create_demo(args.server)
    demo.launch(server_port=args.port, share=args.share)


if __name__ == "__main__":
    main()
openai_chat_client.py
#!/usr/bin/env python3
"""
Qwen-Image-Edit OpenAI-compatible chat client for image editing.

Usage:
    python openai_chat_client.py --input qwen_image_output.png --prompt "Convert to watercolor style" --output output.png
    python openai_chat_client.py --input input.png --prompt "Convert to oil painting" --seed 42
    python openai_chat_client.py --input input1.png input2.png --prompt "Combine these images into a single scene"
"""

import argparse
import base64
from io import BytesIO
from pathlib import Path

import requests
from PIL import Image


def _encode_image_as_data_url(input_path: Path) -> str:
    image_bytes = input_path.read_bytes()
    try:
        img = Image.open(BytesIO(image_bytes))
        mime_type = f"image/{img.format.lower()}" if img.format else "image/png"
    except Exception:
        mime_type = "image/png"
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{image_b64}"


def edit_image(
    input_image: str | Path | list[str | Path],
    prompt: str,
    server_url: str = "http://localhost:8092",
    height: int | None = None,
    width: int | None = None,
    steps: int | None = None,
    guidance_scale: float | None = None,
    seed: int | None = None,
    negative_prompt: str | None = None,
) -> bytes | None:
    """Edit an image using the chat completions API.

    Args:
        input_image: Path(s) to input image(s). For multi-image editing, pass multiple paths.
        prompt: Text description of the edit
        server_url: Server URL
        height: Output image height in pixels
        width: Output image width in pixels
        steps: Number of inference steps
        guidance_scale: CFG guidance scale
        seed: Random seed
        negative_prompt: Negative prompt

    Returns:
        Edited image bytes or None if failed
    """
    input_images = input_image if isinstance(input_image, list) else [input_image]
    input_paths = [Path(p) for p in input_images]
    for p in input_paths:
        if not p.exists():
            print(f"Error: Input image not found: {p}")
            return None

    # Build user message with text and image
    content: list[dict[str, object]] = [{"type": "text", "text": prompt}]
    for p in input_paths:
        content.append({"type": "image_url", "image_url": {"url": _encode_image_as_data_url(p)}})

    messages = [
        {
            "role": "user",
            "content": content,
        }
    ]

    # Build extra_body with generation parameters
    extra_body = {}
    if height is not None:
        extra_body["height"] = height
    if width is not None:
        extra_body["width"] = width
    if steps is not None:
        extra_body["num_inference_steps"] = steps
    if guidance_scale is not None:
        extra_body["guidance_scale"] = guidance_scale
    if seed is not None:
        extra_body["seed"] = seed
    if negative_prompt:
        extra_body["negative_prompt"] = negative_prompt

    # Build request payload
    payload = {"messages": messages}
    if extra_body:
        payload["extra_body"] = extra_body

    # Send request
    try:
        response = requests.post(
            f"{server_url}/v1/chat/completions",
            headers={"Content-Type": "application/json"},
            json=payload,
            timeout=300,
        )
        response.raise_for_status()
        data = response.json()

        # Extract image from response
        content = data["choices"][0]["message"]["content"]
        if isinstance(content, list) and len(content) > 0:
            image_url = content[0].get("image_url", {}).get("url", "")
            if image_url.startswith("data:image"):
                _, b64_data = image_url.split(",", 1)
                return base64.b64decode(b64_data)

        print(f"Unexpected response format: {content}")
        return None

    except Exception as e:
        print(f"Error: {e}")
        return None


def main():
    parser = argparse.ArgumentParser(description="Qwen-Image-Edit chat client")
    parser.add_argument("--input", "-i", required=True, nargs="+", help="Input image path(s)")
    parser.add_argument("--prompt", "-p", required=True, help="Edit prompt")
    parser.add_argument("--output", "-o", default="output.png", help="Output file")
    parser.add_argument("--server", "-s", default="http://localhost:8092", help="Server URL")
    parser.add_argument("--height", type=int, default=1024, help="Output image height")
    parser.add_argument("--width", type=int, default=1024, help="Output image width")
    parser.add_argument("--steps", type=int, default=50, help="Inference steps")
    parser.add_argument("--guidance", type=float, default=7.5, help="Guidance scale")
    parser.add_argument("--seed", type=int, default=0, help="Random seed")
    parser.add_argument("--negative", help="Negative prompt")

    args = parser.parse_args()

    if len(args.input) == 1:
        print(f"Input: {args.input[0]}")
    else:
        print(f"Inputs ({len(args.input)}): {', '.join(args.input)}")
    print(f"Prompt: {args.prompt}")

    image_bytes = edit_image(
        input_image=args.input,
        prompt=args.prompt,
        server_url=args.server,
        height=args.height,
        width=args.width,
        steps=args.steps,
        guidance_scale=args.guidance,
        seed=args.seed,
        negative_prompt=args.negative,
    )

    if image_bytes:
        output_path = Path(args.output)
        output_path.write_bytes(image_bytes)
        print(f"Image saved to: {output_path}")
        print(f"Size: {len(image_bytes) / 1024:.1f} KB")
    else:
        print("Failed to edit image")
        raise SystemExit(1)


if __name__ == "__main__":
    main()
run_curl_image_edit.sh
#!/bin/bash
# Qwen-Image image-edit (image-to-image) curl example

set -euo pipefail

if [[ $# -lt 2 ]]; then
  echo "Usage: $0 <input_image> \"<edit_prompt>\" [output_file]" >&2
  exit 1
fi

INPUT_IMG=$1
PROMPT=$2
SERVER="${SERVER:-http://localhost:8092}"
CURRENT_TIME=$(date +%Y%m%d%H%M%S)
OUTPUT="${3:-image_edit_${CURRENT_TIME}.png}"

if [[ ! -f "$INPUT_IMG" ]]; then
  echo "Input image not found: $INPUT_IMG" >&2
  exit 1
fi

REQUEST_JSON_FILE=$(mktemp)
trap 'rm -f "$REQUEST_JSON_FILE"' EXIT

# Pipe base64 into jq via stdin to avoid ARG_MAX limit on large images
base64 -w0 "$INPUT_IMG" \
  | jq -Rs --arg prompt "$PROMPT" '{
    messages: [{
      role: "user",
      content: [
        {"type": "text", "text": $prompt},
        {"type": "image_url", "image_url": {"url": ("data:image/png;base64," + .)}}
      ]
    }],
    extra_body: {
      num_inference_steps: 50,
      guidance_scale: 1,
      seed: 42
    }
  }' > "$REQUEST_JSON_FILE"

echo "Generating edited image..."
echo "Server: $SERVER"
echo "Prompt: $PROMPT"
echo "Input : $INPUT_IMG"
echo "Output: $OUTPUT"

curl -s "$SERVER/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d @"$REQUEST_JSON_FILE" \
  | jq -r '.choices[0].message.content[0].image_url.url' \
  | cut -d',' -f2 \
  | base64 -d > "$OUTPUT"

if [[ -f "$OUTPUT" ]]; then
  echo "Image saved to: $OUTPUT"
  echo "Size: $(du -h "$OUTPUT" | cut -f1)"
else
  echo "Failed to generate image"
  exit 1
fi
run_server.sh
#!/bin/bash
# Qwen-Image-Edit online serving startup script

MODEL="${MODEL:-Qwen/Qwen-Image-Edit}"
PORT="${PORT:-8092}"

echo "Starting Qwen-Image-Edit server..."
echo "Model: $MODEL"
echo "Port: $PORT"

vllm serve "$MODEL" --omni \
    --port "$PORT"