Image-To-Image¶

Source: https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/image_to_image

This example demonstrates how to deploy image-to-image models as an online image editing service using vLLM-Omni.

Supported models include Qwen-Image-Edit, BAGEL, and other image-to-image pipelines.

For multi-image input editing, use Qwen-Image-Edit-2509 (QwenImageEditPlusPipeline) and send multiple images in the user message content.

Start Server¶

Basic Start¶

vllm serve Qwen/Qwen-Image-Edit --omni --port 8092

Note

If you run into Out-of-Memory (OOM) errors or have limited GPU memory, enable VAE slicing and tiling to reduce memory usage by adding --vae-use-slicing --vae-use-tiling to the vllm serve command.

Multi-Image Edit (Qwen-Image-Edit-2509)¶

vllm serve Qwen/Qwen-Image-Edit-2509 --omni --port 8092

BAGEL¶

vllm serve ByteDance-Seed/BAGEL-7B-MoT --omni --port 8091

Start with Parameters¶

Or use the startup script:

bash run_server.sh

To serve Qwen-Image-Edit-2509 with the script:

MODEL=Qwen/Qwen-Image-Edit-2509 bash run_server.sh

API Calls¶

Method 1: Using curl (Image Editing)¶

# Image editing
bash run_curl_image_edit.sh input.png "Convert this image to watercolor style"

# Or execute directly
IMG_B64=$(base64 -w0 input.png)

cat <<EOF > request.json
{
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Convert this image to watercolor style"},
      {"type": "image_url", "image_url": {"url": "data:image/png;base64,$IMG_B64"}}
    ]
  }],
  "extra_body": {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 1,
    "seed": 42
  }
}
EOF

curl -s http://localhost:8092/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json \
  | jq -r '.choices[0].message.content[0].image_url.url' \
  | cut -d',' -f2 \
  | base64 -d > output.png

Method 2: Using OpenAI Python SDK¶

import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8092/v1", api_key="none")

with open("input.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Qwen/Qwen-Image-Edit",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Convert to watercolor style"},
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{img_b64}"
            }},
        ],
    }],
    extra_body={
        "num_inference_steps": 50,
        "guidance_scale": 1,
        "seed": 42,
    },
)

img_url = response.choices[0].message.content[0].image_url.url
_, b64_data = img_url.split(",", 1)
with open("output.png", "wb") as f:
    f.write(base64.b64decode(b64_data))

Note

The OpenAI SDK's extra_body keyword argument merges parameters into the top-level request body automatically. When using curl or Python requests, wrap generation parameters inside a literal "extra_body" key in the JSON instead (as shown in the curl example above).
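The difference described in the note can be made concrete with a small helper for non-SDK clients. This is a minimal sketch: `build_raw_payload` is a hypothetical name, not part of vLLM-Omni, and the data URL is a placeholder.

```python
import json


def build_raw_payload(prompt: str, image_data_url: str, **gen_params) -> dict:
    """Build a chat-completions body for curl/requests-style clients.

    Generation parameters are nested under a literal "extra_body" key,
    as the note above describes for clients that do not use the OpenAI SDK.
    """
    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_data_url}},
            ],
        }],
    }
    # Only add the key when there is something to send.
    if gen_params:
        payload["extra_body"] = dict(gen_params)
    return payload


payload = build_raw_payload(
    "Convert to watercolor style",
    "data:image/png;base64,...",  # placeholder; substitute a real data URL
    num_inference_steps=50,
    seed=42,
)
print(json.dumps(payload["extra_body"]))
```

The resulting dict can be passed as `json=payload` to `requests.post`, in the same way the client script further down this page does.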

Method 3: Using Python Client Script¶

python openai_chat_client.py --input input.png --prompt "Convert to oil painting style" --output output.png

# Multi-image editing (Qwen-Image-Edit-2509 server required)
python openai_chat_client.py --input input1.png input2.png --prompt "Combine these images into a single scene" --output output.png

Pass model-specific parameters through --extra-body (e.g. for BAGEL):

python openai_chat_client.py \
  --input input.png \
  --prompt "Make the scene look like a watercolor painting" \
  --server http://localhost:8091 \
  --extra-body '{"cfg_text_scale": 4.0, "cfg_img_scale": 1.5}'
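Note that openai_chat_client.py (listed at the end of this page) always sends its argparse defaults (height 1024, width 1024, 50 steps, guidance 7.5, seed 0) and then applies the --extra-body entries on top, so a key given in --extra-body overrides the corresponding flag. A simplified sketch of that merge order, with `merge_extra_body` as a hypothetical stand-in for the logic inside the script:

```python
from typing import Optional


def merge_extra_body(cli_params: dict, extra_body: Optional[dict]) -> dict:
    """Simplified version of the script's merge: named flags first, then --extra-body."""
    merged = {key: value for key, value in cli_params.items() if value is not None}
    if extra_body:
        merged.update(extra_body)  # later keys win, so --extra-body overrides flags
    return merged


# Defaults copied from the script's argparse setup.
cli_defaults = {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "seed": 0,
}
merged = merge_extra_body(cli_defaults, {"cfg_text_scale": 4.0, "guidance_scale": 1.0})
print(merged["guidance_scale"], merged["cfg_text_scale"])  # prints: 1.0 4.0
```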

Method 4: Using Gradio Demo¶

python gradio_demo.py
# Visit http://localhost:7861

Request Format¶

Image Editing (Using image_url Format)¶

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Convert this image to watercolor style"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }
  ]
}
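The `url` field carries the image inline as a base64 data URL. A minimal sketch of producing one from raw bytes; `to_data_url` is a hypothetical helper, and guessing the MIME type from the file name is an assumption of this sketch (the client script on this page inspects the image with PIL instead):

```python
import base64
import mimetypes


def to_data_url(image_bytes: bytes, filename: str) -> str:
    """Encode raw image bytes as a data URL, guessing the MIME type from the file name."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None or not mime_type.startswith("image/"):
        mime_type = "image/png"  # fallback, the type used by the examples on this page
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage sketch: open("input.png", "rb").read() would supply the bytes in practice.
print(to_data_url(b"abc", "photo.jpg"))  # prints: data:image/jpeg;base64,YWJj
```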

Image Editing (Using Simplified image Format)¶

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"text": "Convert this image to watercolor style"},
        {"image": "BASE64_IMAGE_DATA"}
      ]
    }
  ]
}
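To show how the two request formats relate, the sketch below rewrites simplified content items into the image_url form. It assumes that the simplified `image` field holds bare base64 data without a `data:` prefix (as the placeholder above suggests) and that the image is a PNG; neither is confirmed here, and `to_image_url_content` is a hypothetical helper.

```python
def to_image_url_content(simplified: list, mime_type: str = "image/png") -> list:
    """Convert simplified {"text": ...} / {"image": ...} items to the image_url format."""
    converted = []
    for item in simplified:
        if "text" in item:
            converted.append({"type": "text", "text": item["text"]})
        elif "image" in item:
            # Assumption: item["image"] is bare base64, so a data URL prefix is added.
            url = f"data:{mime_type};base64,{item['image']}"
            converted.append({"type": "image_url", "image_url": {"url": url}})
        else:
            raise ValueError(f"unsupported content item: {item!r}")
    return converted


content = to_image_url_content([
    {"text": "Convert this image to watercolor style"},
    {"image": "QUJD"},  # base64 of b"ABC", standing in for real image data
])
print(content[1]["image_url"]["url"])  # prints: data:image/png;base64,QUJD
```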

Image Editing with Parameters¶

Use extra_body to pass generation parameters:

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Convert to ink wash painting style"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }
  ],
  "extra_body": {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "seed": 42
  }
}

Layered Image Generation (Qwen-Image-Layered)¶

Qwen-Image-Layered generates multiple decomposed layers from a reference image and a text prompt. Start the server with:

vllm serve Qwen/Qwen-Image-Layered --omni --port 8093

Using curl

IMG_B64=$(base64 -w0 input.png)

curl -sS http://localhost:8093/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg img "$IMG_B64" '{
    messages: [{
      role: "user",
      content: [
        {type: "image_url", image_url: {url: ("data:image/png;base64," + $img)}},
        {type: "text", text: "a rabbit"}
      ]
    }],
    extra_body: {
      num_inference_steps: 50,
      cfg_scale: 4.0,
      seed: 0,
      layers: 4,
      resolution: 640
    }
  }')" \
  | jq -r '.choices[0].message.content[] | .image_url.url | split(",")[1]' \
  | while IFS= read -r b64; do
      ((i++)); echo "$b64" | base64 -d > "layer_${i}.png"
    done

Using Python

import base64
import requests

with open("input.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{img_b64}"
            }},
            {"type": "text", "text": "a rabbit"},
        ],
    }],
    "extra_body": {
        "num_inference_steps": 50,
        "cfg_scale": 4.0,
        "seed": 0,
        "layers": 4,
        "resolution": 640,
    },
}

resp = requests.post(
    "http://localhost:8093/v1/chat/completions",
    json=payload,
    timeout=600,
)
resp.raise_for_status()  # fail early on HTTP errors instead of parsing an error body
data = resp.json()

for i, item in enumerate(data["choices"][0]["message"]["content"]):
    _, b64_data = item["image_url"]["url"].split(",", 1)
    with open(f"layer_{i}.png", "wb") as f:
        f.write(base64.b64decode(b64_data))

The response contains multiple images in choices[0].message.content — one per generated layer.
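A minimal sketch of collecting every layer from such a response without touching the file system, which is convenient for further processing or testing. `decode_images` is a hypothetical helper, and it assumes `content` is a list of items shaped like the ones in the example above.

```python
import base64


def decode_images(response: dict) -> list:
    """Return the decoded bytes of every inline image in the first choice."""
    images = []
    for item in response["choices"][0]["message"]["content"]:
        url = item.get("image_url", {}).get("url", "")
        if not url.startswith("data:image"):
            continue  # skip anything that is not an inline image
        _, b64_data = url.split(",", 1)
        images.append(base64.b64decode(b64_data))
    return images


# Fake two-layer response for illustration; real payloads hold PNG bytes.
fake_response = {
    "choices": [{
        "message": {
            "content": [
                {"type": "image_url", "image_url": {"url": "data:image/png;base64,bGF5ZXIw"}},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64,bGF5ZXIx"}},
            ]
        }
    }]
}
print(len(decode_images(fake_response)))  # prints: 2
```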

Qwen-Image-Layered Parameters¶

Parameter	Type	Default	Description
`layers`	int	4	Number of layers to decompose
`resolution`	int	640	Resolution for dimension calculation (640 or 1024)
`cfg_scale`	float	4.0	Classifier-free guidance scale (alias for `true_cfg_scale`)
`num_inference_steps`	int	50	Number of denoising steps
`seed`	int	None	Random seed for reproducibility
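The table can be turned into a small builder that fills in the documented defaults and rejects unsupported resolutions before a request is sent. This is only a sketch: `layered_extra_body` is a hypothetical helper, and the checks are client-side assumptions rather than a statement of what the server enforces.

```python
from typing import Optional


def layered_extra_body(
    layers: int = 4,
    resolution: int = 640,
    cfg_scale: float = 4.0,
    num_inference_steps: int = 50,
    seed: Optional[int] = None,
) -> dict:
    """Build the extra_body dict for a Qwen-Image-Layered request."""
    if resolution not in (640, 1024):
        raise ValueError("resolution must be 640 or 1024")
    if layers < 1:
        raise ValueError("layers must be a positive integer")
    body = {
        "layers": layers,
        "resolution": resolution,
        "cfg_scale": cfg_scale,
        "num_inference_steps": num_inference_steps,
    }
    if seed is not None:  # leave the key out when no seed was requested
        body["seed"] = seed
    return body


print(layered_extra_body(layers=6, resolution=1024, seed=0))
```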

Multi-Image Editing (Qwen-Image-Edit-2509)¶

Provide multiple images in content (order matters):

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Combine these images into a single scene"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."} },
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."} }
      ]
    }
  ]
}

Generation Parameters¶

When using /v1/chat/completions, pass these inside extra_body in the curl JSON, or via the extra_body keyword argument in the OpenAI Python SDK. When using the dedicated /v1/images/edits endpoint, pass the supported generation controls as top-level form fields directly. For image dimensions and count, use size and n rather than height, width, or num_outputs_per_prompt.

Parameter	Type	Default	Description
`height`	int	None	Output image height in pixels
`width`	int	None	Output image width in pixels
`size`	str	None	Output image size (e.g., "1024x1024")
`num_inference_steps`	int	50	Number of denoising steps
`guidance_scale`	float	1.0	CFG guidance scale
`seed`	int	None	Random seed (reproducible)
`negative_prompt`	str	None	Negative prompt
`num_outputs_per_prompt`	int	1	Number of images to generate
`strength`	float	0.6	Z-Image only. Denoising start timestep for image-to-image. Range: [0.0, 1.0]. Lower values preserve more of the original image.
`layers`	int	4	Number of layers (Qwen-Image-Layered)
`resolution`	int	640	Resolution, 640 or 1024 (Qwen-Image-Layered)

Models like BAGEL accept additional parameters via extra_body (e.g. cfg_text_scale, cfg_img_scale). See the BAGEL recipe for the full list.
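The parameter naming difference between the two endpoints can be sketched as a small conversion. `to_edits_form_fields` is a hypothetical helper; the "WIDTHxHEIGHT" ordering of `size` is an assumption based on the OpenAI Images API convention, and passing the remaining keys through unchanged is also an assumption, since not every parameter is necessarily supported by /v1/images/edits.

```python
def to_edits_form_fields(params: dict) -> dict:
    """Rename chat-completions generation parameters to /v1/images/edits form fields."""
    fields = dict(params)  # work on a copy so the caller's dict is untouched
    height = fields.pop("height", None)
    width = fields.pop("width", None)
    if (height is None) != (width is None):
        raise ValueError("height and width must be given together")
    if height is not None:
        fields["size"] = f"{width}x{height}"  # assumed order: width first
    if "num_outputs_per_prompt" in fields:
        fields["n"] = fields.pop("num_outputs_per_prompt")
    return fields


form = to_edits_form_fields(
    {"height": 768, "width": 1024, "num_outputs_per_prompt": 2, "seed": 42}
)
print(form["size"], form["n"])  # prints: 1024x768 2
```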

Response Format¶

{
  "id": "chatcmpl-xxx",
  "created": 1234567890,
  "model": "Qwen/Qwen-Image-Edit",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": [{
        "type": "image_url",
        "image_url": {
          "url": "data:image/png;base64,..."
        }
      }]
    },
    "finish_reason": "stop"
  }],
  "usage": {...}
}
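The image arrives as a base64 data URL, so a client has to split off the header before decoding. A minimal sketch that also returns the MIME type, which can be used to pick a file extension; `parse_image_data_url` is a hypothetical helper, not part of vLLM-Omni.

```python
import base64


def parse_image_data_url(url: str):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, sep, b64_data = url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(b64_data)


# "aGVsbG8=" is base64 for b"hello", standing in for real PNG data.
mime, raw = parse_image_data_url("data:image/png;base64,aGVsbG8=")
print(mime, len(raw))  # prints: image/png 5
```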

Common Editing Instruction Examples¶

Instruction	Description
`Convert this image to watercolor style`	Style transfer
`Convert the image to black and white`	Desaturation
`Enhance the color saturation`	Color adjustment
`Convert to cartoon style`	Cartoonization
`Add vintage filter effect`	Filter effect
`Convert daytime scene to nighttime`	Scene conversion

File Description¶

File	Description
`run_server.sh`	Server startup script
`run_curl_image_edit.sh`	curl image editing example
`openai_chat_client.py`	Python client
`gradio_demo.py`	Gradio interactive interface

Example materials¶

gradio_demo.py

#!/usr/bin/env python3
"""
Qwen-Image-Edit Gradio Demo for online serving.

Usage:
    python gradio_demo.py [--server http://localhost:8092] [--port 7861]
"""

import argparse
import base64
from io import BytesIO

try:
    import gradio as gr
except ImportError:
    raise ImportError("gradio is required to run this demo. Install it with: pip install 'vllm-omni[demo]'") from None
import requests
from PIL import Image


def _pil_to_b64_png(img: Image.Image) -> str:
    buffer = BytesIO()
    img.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def edit_image(
    input_image: Image.Image,
    extra_images: list[str] | None,
    prompt: str,
    steps: int,
    guidance_scale: float,
    seed: int | None,
    negative_prompt: str,
    server_url: str,
) -> Image.Image | None:
    """Edit an image using the chat completions API."""
    if input_image is None:
        raise gr.Error("Please upload an image first")

    images: list[Image.Image] = [input_image]
    if extra_images:
        for p in extra_images:
            try:
                images.append(Image.open(p).convert("RGB"))
            except Exception as e:
                raise gr.Error(f"Failed to open image: {p}. Error: {e}") from e

    # Build user message with text and image
    content: list[dict[str, object]] = [{"type": "text", "text": prompt}]
    for img in images:
        content.append({"type": "image_url", "image_url": {"url": f"data:image/png;base64,{_pil_to_b64_png(img)}"}})

    messages = [
        {
            "role": "user",
            "content": content,
        }
    ]

    # Build extra_body with generation parameters
    extra_body = {
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }
    if seed is not None and seed >= 0:
        extra_body["seed"] = seed
    if negative_prompt:
        extra_body["negative_prompt"] = negative_prompt

    # Build request payload
    payload = {"messages": messages, "extra_body": extra_body}

    try:
        response = requests.post(
            f"{server_url}/v1/chat/completions",
            headers={"Content-Type": "application/json"},
            json=payload,
            timeout=300,
        )
        response.raise_for_status()
        data = response.json()

        content = data["choices"][0]["message"]["content"]
        if isinstance(content, list) and len(content) > 0:
            image_url = content[0].get("image_url", {}).get("url", "")
            if image_url.startswith("data:image"):
                _, b64_data = image_url.split(",", 1)
                image_bytes = base64.b64decode(b64_data)
                return Image.open(BytesIO(image_bytes))

        return None

    except Exception as e:
        print(f"Error: {e}")
        raise gr.Error(f"Edit failed: {e}") from e


def create_demo(server_url: str):
    """Create Gradio demo interface."""

    with gr.Blocks(title="Qwen-Image-Edit Demo") as demo:
        gr.Markdown("# Qwen-Image-Edit Online Editing")
        gr.Markdown(
            "Upload an image and describe the editing effect you want. "
            "For multi-image editing, upload extra images (requires Qwen-Image-Edit-2509 server)."
        )

        with gr.Row():
            with gr.Column(scale=1):
                input_image = gr.Image(
                    label="Input Image",
                    type="pil",
                )
                extra_images = gr.File(
                    label="Additional Images (Optional)",
                    file_count="multiple",
                    type="filepath",
                )
                prompt = gr.Textbox(
                    label="Edit Instruction",
                    placeholder="Describe the editing effect you want...",
                    lines=2,
                )
                negative_prompt = gr.Textbox(
                    label="Negative Prompt",
                    placeholder="Describe what you don't want...",
                    lines=2,
                )

                with gr.Row():
                    steps = gr.Slider(
                        label="Inference Steps",
                        minimum=10,
                        maximum=100,
                        value=50,
                        step=5,
                    )
                    guidance_scale = gr.Slider(
                        label="Guidance Scale (CFG)",
                        minimum=1.0,
                        maximum=20.0,
                        value=7.5,
                        step=0.5,
                    )

                with gr.Row():
                    seed = gr.Number(
                        label="Random Seed (-1 for random)",
                        value=-1,
                        precision=0,
                    )

                edit_btn = gr.Button("Edit Image", variant="primary")

            with gr.Column(scale=1):
                output_image = gr.Image(
                    label="Edited Image",
                    type="pil",
                )

        # Examples
        gr.Examples(
            examples=[
                ["Convert this image to watercolor style"],
                ["Convert the image to black and white"],
                ["Enhance the color saturation"],
                ["Convert to cartoon style"],
                ["Add vintage filter effect"],
                ["Convert daytime to nighttime"],
                ["Convert to oil painting style"],
                ["Add dreamy blur effect"],
            ],
            inputs=[prompt],
        )

        def process_edit(img, imgs, p, st, g, se, n):
            # A cleared seed field yields None; treat it like -1 (random seed)
            actual_seed = se if se is not None and se >= 0 else None
            return edit_image(img, imgs, p, st, g, actual_seed, n, server_url)

        edit_btn.click(
            fn=process_edit,
            inputs=[input_image, extra_images, prompt, steps, guidance_scale, seed, negative_prompt],
            outputs=[output_image],
        )

    return demo


def main():
    parser = argparse.ArgumentParser(description="Qwen-Image-Edit Gradio Demo")
    parser.add_argument("--server", default="http://localhost:8092", help="Server URL")
    parser.add_argument("--port", type=int, default=7861, help="Gradio port")
    parser.add_argument("--share", action="store_true", help="Create public link")

    args = parser.parse_args()

    print(f"Connecting to server: {args.server}")
    demo = create_demo(args.server)
    demo.launch(server_port=args.port, share=args.share)


if __name__ == "__main__":
    main()

openai_chat_client.py

#!/usr/bin/env python3
"""
Qwen-Image-Edit OpenAI-compatible chat client for image editing.

Usage:
    python openai_chat_client.py --input qwen_image_output.png --prompt "Convert to watercolor style" --output output.png
    python openai_chat_client.py --input input.png --prompt "Convert to oil painting" --seed 42
    python openai_chat_client.py --input input1.png input2.png --prompt "Combine these images into a single scene"
"""

import argparse
import base64
from io import BytesIO
from pathlib import Path

import requests
from PIL import Image


def _encode_image_as_data_url(input_path: Path) -> str:
    image_bytes = input_path.read_bytes()
    try:
        img = Image.open(BytesIO(image_bytes))
        mime_type = f"image/{img.format.lower()}" if img.format else "image/png"
    except Exception:
        mime_type = "image/png"
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{image_b64}"


def edit_image(
    input_image: str | Path | list[str | Path],
    prompt: str,
    server_url: str = "http://localhost:8092",
    height: int | None = None,
    width: int | None = None,
    steps: int | None = None,
    guidance_scale: float | None = None,
    seed: int | None = None,
    negative_prompt: str | None = None,
    extra_body: dict | None = None,
) -> bytes | None:
    """Edit an image using the chat completions API.

    Args:
        input_image: Path(s) to input image(s). For multi-image editing, pass multiple paths.
        prompt: Text description of the edit
        server_url: Server URL
        height: Output image height in pixels
        width: Output image width in pixels
        steps: Number of inference steps
        guidance_scale: CFG guidance scale
        seed: Random seed
        negative_prompt: Negative prompt
        extra_body: Additional model-specific params (e.g. cfg_text_scale for BAGEL)

    Returns:
        Edited image bytes or None if failed
    """
    input_images = input_image if isinstance(input_image, list) else [input_image]
    input_paths = [Path(p) for p in input_images]
    for p in input_paths:
        if not p.exists():
            print(f"Error: Input image not found: {p}")
            return None

    # Build user message with text and image
    content: list[dict[str, object]] = [{"type": "text", "text": prompt}]
    for p in input_paths:
        content.append({"type": "image_url", "image_url": {"url": _encode_image_as_data_url(p)}})

    messages = [
        {
            "role": "user",
            "content": content,
        }
    ]

    # Build extra_body with generation parameters
    merged_extra_body: dict[str, object] = {}
    if height is not None:
        merged_extra_body["height"] = height
    if width is not None:
        merged_extra_body["width"] = width
    if steps is not None:
        merged_extra_body["num_inference_steps"] = steps
    if guidance_scale is not None:
        merged_extra_body["guidance_scale"] = guidance_scale
    if seed is not None:
        merged_extra_body["seed"] = seed
    if negative_prompt:
        merged_extra_body["negative_prompt"] = negative_prompt
    if extra_body:
        merged_extra_body.update(extra_body)

    # Build request payload
    payload: dict[str, object] = {"messages": messages}
    if merged_extra_body:
        payload["extra_body"] = merged_extra_body

    # Send request
    try:
        response = requests.post(
            f"{server_url}/v1/chat/completions",
            headers={"Content-Type": "application/json"},
            json=payload,
            timeout=300,
        )
        response.raise_for_status()
        data = response.json()

        # Extract image from response
        content = data["choices"][0]["message"]["content"]
        if isinstance(content, list) and len(content) > 0:
            image_url = content[0].get("image_url", {}).get("url", "")
            if image_url.startswith("data:image"):
                _, b64_data = image_url.split(",", 1)
                return base64.b64decode(b64_data)

        print(f"Unexpected response format: {content}")
        return None

    except Exception as e:
        print(f"Error: {e}")
        return None


def parse_extra_body(value: str) -> dict:
    """Parse a JSON string into a dict for --extra-body."""
    import json

    try:
        obj = json.loads(value)
    except json.JSONDecodeError as e:
        raise argparse.ArgumentTypeError(f"--extra-body must be valid JSON: {e}") from e
    if not isinstance(obj, dict):
        raise argparse.ArgumentTypeError("--extra-body must be a JSON object")
    return obj


def main():
    parser = argparse.ArgumentParser(description="Qwen-Image-Edit chat client")
    parser.add_argument("--input", "-i", required=True, nargs="+", help="Input image path(s)")
    parser.add_argument("--prompt", "-p", required=True, help="Edit prompt")
    parser.add_argument("--output", "-o", default="output.png", help="Output file")
    parser.add_argument("--server", "-s", default="http://localhost:8092", help="Server URL")
    parser.add_argument("--height", type=int, default=1024, help="Output image height")
    parser.add_argument("--width", type=int, default=1024, help="Output image width")
    parser.add_argument("--steps", type=int, default=50, help="Inference steps")
    parser.add_argument("--guidance", type=float, default=7.5, help="Guidance scale")
    parser.add_argument("--seed", type=int, default=0, help="Random seed")
    parser.add_argument("--negative", help="Negative prompt")
    parser.add_argument(
        "--extra-body",
        type=parse_extra_body,
        default=None,
        help="JSON object merged into request extra_body, e.g. '{\"cfg_text_scale\": 4.0}'.",
    )

    args = parser.parse_args()

    if len(args.input) == 1:
        print(f"Input: {args.input[0]}")
    else:
        print(f"Inputs ({len(args.input)}): {', '.join(args.input)}")
    print(f"Prompt: {args.prompt}")

    image_bytes = edit_image(
        input_image=args.input,
        prompt=args.prompt,
        server_url=args.server,
        height=args.height,
        width=args.width,
        steps=args.steps,
        guidance_scale=args.guidance,
        seed=args.seed,
        negative_prompt=args.negative,
        extra_body=args.extra_body,
    )

    if image_bytes:
        output_path = Path(args.output)
        output_path.write_bytes(image_bytes)
        print(f"Image saved to: {output_path}")
        print(f"Size: {len(image_bytes) / 1024:.1f} KB")
    else:
        print("Failed to edit image")
        raise SystemExit(1)


if __name__ == "__main__":
    main()

run_curl_image_edit.sh

#!/bin/bash
# Qwen-Image image-edit (image-to-image) curl example

set -euo pipefail

if [[ $# -lt 2 ]]; then
  echo "Usage: $0 <input_image> \"<edit_prompt>\" [output_file]" >&2
  exit 1
fi

INPUT_IMG=$1
PROMPT=$2
SERVER="${SERVER:-http://localhost:8092}"
CURRENT_TIME=$(date +%Y%m%d%H%M%S)
OUTPUT="${3:-image_edit_${CURRENT_TIME}.png}"

if [[ ! -f "$INPUT_IMG" ]]; then
  echo "Input image not found: $INPUT_IMG" >&2
  exit 1
fi

REQUEST_JSON_FILE=$(mktemp)
trap 'rm -f "$REQUEST_JSON_FILE"' EXIT

# Pipe base64 into jq via stdin to avoid ARG_MAX limit on large images
base64 -w0 "$INPUT_IMG" \
  | jq -Rs --arg prompt "$PROMPT" '{
    messages: [{
      role: "user",
      content: [
        {"type": "text", "text": $prompt},
        {"type": "image_url", "image_url": {"url": ("data:image/png;base64," + .)}}
      ]
    }],
    extra_body: {
      num_inference_steps: 50,
      guidance_scale: 1,
      seed: 42
    }
  }' > "$REQUEST_JSON_FILE"

echo "Generating edited image..."
echo "Server: $SERVER"
echo "Prompt: $PROMPT"
echo "Input : $INPUT_IMG"
echo "Output: $OUTPUT"

curl -s "$SERVER/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d @"$REQUEST_JSON_FILE" \
  | jq -r '.choices[0].message.content[0].image_url.url' \
  | cut -d',' -f2 \
  | base64 -d > "$OUTPUT"

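# Use -s rather than -f: the redirection above always creates the file,
# so only a non-empty file indicates that something was written.
if [[ -s "$OUTPUT" ]]; then
  echo "Image saved to: $OUTPUT"
  echo "Size: $(du -h "$OUTPUT" | cut -f1)"
else
  echo "Failed to generate image" >&2
  exit 1
fi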

run_server.sh

#!/bin/bash
# Qwen-Image-Edit online serving startup script

MODEL="${MODEL:-Qwen/Qwen-Image-Edit}"
PORT="${PORT:-8092}"

echo "Starting Qwen-Image-Edit server..."
echo "Model: $MODEL"
echo "Port: $PORT"

vllm serve "$MODEL" --omni \
    --port "$PORT"