Image-To-Image¶
Source https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/image_to_image.
This example demonstrates how to deploy the Qwen-Image-Edit model as an online image editing service using vLLM-Omni.
For multi-image input editing, use Qwen-Image-Edit-2509 (QwenImageEditPlusPipeline) and send multiple images in the user message content.
Start Server¶
Basic Start¶
Note
If you encounter Out-of-Memory (OOM) errors or have limited GPU memory, enable VAE slicing and tiling to reduce memory usage by adding the --vae-use-slicing and --vae-use-tiling flags to the server command.
Multi-Image Edit (Qwen-Image-Edit-2509)¶
Start with Parameters¶
Or use the startup script:
To serve Qwen-Image-Edit-2509 with the script:
API Calls¶
Method 1: Using curl (Image Editing)¶
# Image editing
bash run_curl_image_edit.sh input.png "Convert this image to watercolor style"

# Or execute directly
IMG_B64=$(base64 -w0 input.png)
cat <<EOF > request.json
{
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "Convert this image to watercolor style"},
      {"type": "image_url", "image_url": {"url": "data:image/png;base64,$IMG_B64"}}
    ]
  }],
  "extra_body": {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 1,
    "seed": 42
  }
}
EOF
curl -s http://localhost:8092/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @request.json \
  | jq -r '.choices[0].message.content[0].image_url.url' \
  | cut -d',' -f2 \
  | base64 -d > output.png
Method 2: Using OpenAI Python SDK¶
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8092/v1", api_key="none")

# Encode the input image as base64 for the data URL
with open("input.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="Qwen/Qwen-Image-Edit",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Convert to watercolor style"},
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{img_b64}"
            }},
        ],
    }],
    extra_body={
        "num_inference_steps": 50,
        "guidance_scale": 1,
        "seed": 42,
    },
)

# The edited image comes back as a base64 data URL
img_url = response.choices[0].message.content[0].image_url.url
_, b64_data = img_url.split(",", 1)
with open("output.png", "wb") as f:
    f.write(base64.b64decode(b64_data))
Note
The OpenAI SDK's extra_body keyword argument merges parameters into the top-level request body automatically. When using curl or Python requests, wrap generation parameters inside a literal "extra_body" key in the JSON instead (as shown in the curl example above).
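The difference described in the note can be sketched for raw HTTP clients as follows. This is a minimal illustration, not part of the example repository: `build_chat_payload` is a hypothetical helper name, and the image is assumed to be PNG data that is already base64-encoded.

```python
import json


def build_chat_payload(prompt: str, image_b64: str, **gen_params) -> dict:
    """Build a raw /v1/chat/completions body for curl or requests (hypothetical helper)."""
    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    }
    # Drop unset values, then nest the rest under a literal "extra_body" key,
    # because a raw JSON request gets no automatic merging from the SDK.
    params = {k: v for k, v in gen_params.items() if v is not None}
    if params:
        payload["extra_body"] = params
    return payload


payload = build_chat_payload(
    "Convert to watercolor style",
    "AAAA",  # placeholder base64 string
    num_inference_steps=50,
    seed=42,
    negative_prompt=None,
)
print(json.dumps(payload["extra_body"]))
```

The resulting dictionary can be passed as `json=payload` to `requests.post`, which matches the shape of the curl request above.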
Method 3: Using Python Client Script¶
python openai_chat_client.py --input input.png --prompt "Convert to oil painting style" --output output.png
# Multi-image editing (Qwen-Image-Edit-2509 server required)
python openai_chat_client.py --input input1.png input2.png --prompt "Combine these images into a single scene" --output output.png
Method 4: Using Gradio Demo¶
Request Format¶
Image Editing (Using image_url Format)¶
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Convert this image to watercolor style"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }
  ]
}
Image Editing (Using Simplified image Format)¶
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"text": "Convert this image to watercolor style"},
        {"image": "BASE64_IMAGE_DATA"}
      ]
    }
  ]
}
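To make the relationship between the two request formats concrete, the sketch below rewrites simplified items into the `image_url` form. It assumes that a simplified `{"image": ...}` item carries bare base64 data and that the image is a PNG; `to_image_url_format` is a hypothetical helper, and the server may handle the simplified form differently internally.

```python
def to_image_url_format(content: list[dict], mime: str = "image/png") -> list[dict]:
    """Rewrite simplified content items into the typed image_url format (hypothetical helper)."""
    out = []
    for item in content:
        if "type" in item:
            # Already in the typed format, keep as-is
            out.append(item)
        elif "text" in item:
            out.append({"type": "text", "text": item["text"]})
        elif "image" in item:
            # Assumption: the value is bare base64 without a data URL prefix
            url = f"data:{mime};base64,{item['image']}"
            out.append({"type": "image_url", "image_url": {"url": url}})
        else:
            raise ValueError(f"Unrecognized content item: {item!r}")
    return out


simplified = [
    {"text": "Convert this image to watercolor style"},
    {"image": "BASE64_IMAGE_DATA"},
]
converted = to_image_url_format(simplified)
print(converted)
```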
Image Editing with Parameters¶
Use extra_body to pass generation parameters:
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Convert to ink wash painting style"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }
  ],
  "extra_body": {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "guidance_scale": 7.5,
    "seed": 42
  }
}
Layered Image Generation (Qwen-Image-Layered)¶
Qwen-Image-Layered generates multiple decomposed layers from a reference image and a text prompt. Start the server with:
Using curl
IMG_B64=$(base64 -w0 input.png)
curl -sS http://localhost:8093/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg img "$IMG_B64" '{
    messages: [{
      role: "user",
      content: [
        {type: "image_url", image_url: {url: ("data:image/png;base64," + $img)}},
        {type: "text", text: "a rabbit"}
      ]
    }],
    extra_body: {
      num_inference_steps: 50,
      cfg_scale: 4.0,
      seed: 0,
      layers: 4,
      resolution: 640
    }
  }')" \
  | jq -r '.choices[0].message.content[] | .image_url.url | split(",")[1]' \
  | while IFS= read -r b64; do
      ((i++)); echo "$b64" | base64 -d > "layer_${i}.png"
    done
Using Python
import base64

import requests

# Encode the reference image as base64 for the data URL
with open("input.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

payload = {
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {
                "url": f"data:image/png;base64,{img_b64}"
            }},
            {"type": "text", "text": "a rabbit"},
        ],
    }],
    "extra_body": {
        "num_inference_steps": 50,
        "cfg_scale": 4.0,
        "seed": 0,
        "layers": 4,
        "resolution": 640,
    },
}

resp = requests.post(
    "http://localhost:8093/v1/chat/completions",
    json=payload,
    timeout=600,
)
# Fail early on HTTP errors instead of parsing an error body
resp.raise_for_status()
data = resp.json()

# Each content item holds one generated layer as a base64 data URL
for i, item in enumerate(data["choices"][0]["message"]["content"]):
    _, b64_data = item["image_url"]["url"].split(",", 1)
    with open(f"layer_{i}.png", "wb") as f:
        f.write(base64.b64decode(b64_data))
The response contains multiple images in choices[0].message.content — one per generated layer.
Qwen-Image-Layered Parameters¶
| Parameter | Type | Default | Description |
|---|---|---|---|
| layers | int | 4 | Number of layers to decompose |
| resolution | int | 640 | Resolution for dimension calculation (640 or 1024) |
| cfg_scale | float | 4.0 | Classifier-free guidance scale (alias for true_cfg_scale) |
| num_inference_steps | int | 50 | Number of denoising steps |
| seed | int | None | Random seed for reproducibility |
Multi-Image Editing (Qwen-Image-Edit-2509)¶
Provide multiple images in content (order matters):
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Combine these images into a single scene"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
      ]
    }
  ]
}
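Because image order matters, it can help to build the content list from an ordered sequence rather than by hand. The following is a minimal sketch under stated assumptions: `build_multi_image_content` is a hypothetical helper, the inputs are raw image bytes, and every image is labeled as PNG.

```python
import base64


def build_multi_image_content(
    prompt: str, images: list[bytes], mime: str = "image/png"
) -> list[dict]:
    """Build a content list with the text first, then images in the given order (hypothetical helper)."""
    content = [{"type": "text", "text": prompt}]
    for raw in images:
        b64 = base64.b64encode(raw).decode("ascii")
        content.append(
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}}
        )
    return content


# Placeholder bytes stand in for real image files here
content = build_multi_image_content(
    "Combine these images into a single scene", [b"first", b"second"]
)
print(len(content))
```

With real files, the byte strings would typically come from `pathlib.Path(...).read_bytes()`, listed in the order the prompt refers to them.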
Generation Parameters¶
When using /v1/chat/completions, pass these parameters inside extra_body in the curl JSON, or via the extra_body keyword argument in the OpenAI Python SDK. When using the dedicated /v1/images/edits endpoint, pass the supported generation controls directly as top-level form fields. On that endpoint, set image dimensions and count with size and n rather than height, width, or num_outputs_per_prompt.
| Parameter | Type | Default | Description |
|---|---|---|---|
| height | int | None | Output image height in pixels |
| width | int | None | Output image width in pixels |
| size | str | None | Output image size (e.g., "1024x1024") |
| num_inference_steps | int | 50 | Number of denoising steps |
| guidance_scale | float | 1.0 | CFG guidance scale |
| seed | int | None | Random seed (reproducible) |
| negative_prompt | str | None | Negative prompt |
| num_outputs_per_prompt | int | 1 | Number of images to generate |
| strength | float | 0.6 | Z-Image only - Denoising start timestep for I2I. Range: [0.0, 1.0]. Lower preserves more of original image. |
| layers | int | 4 | Number of layers (Qwen-Image-Layered) |
| resolution | int | 640 | Resolution, 640 or 1024 (Qwen-Image-Layered) |
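As a rough illustration of the naming difference between the two endpoints, the sketch below maps chat-style parameters to form fields for /v1/images/edits. It assumes that size is written as WIDTHxHEIGHT (the usual OpenAI convention, not confirmed by this page) and that all other parameters keep their names; `to_images_edits_fields` is a hypothetical helper.

```python
def to_images_edits_fields(params: dict) -> dict:
    """Map chat extra_body parameters to /v1/images/edits form fields (hypothetical helper)."""
    fields = dict(params)  # copy so the caller's dict is not modified
    height = fields.pop("height", None)
    width = fields.pop("width", None)
    count = fields.pop("num_outputs_per_prompt", None)
    # Assumption: size is "WIDTHxHEIGHT"; it is only set when both values are known
    if height is not None and width is not None:
        fields["size"] = f"{width}x{height}"
    if count is not None:
        fields["n"] = count
    return fields


fields = to_images_edits_fields(
    {"height": 768, "width": 1024, "num_outputs_per_prompt": 2, "seed": 42}
)
print(fields)
```

Note that this sketch silently drops a lone height or width, since a size string needs both values.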
Response Format¶
{
  "id": "chatcmpl-xxx",
  "created": 1234567890,
  "model": "Qwen/Qwen-Image-Edit",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": [{
        "type": "image_url",
        "image_url": {
          "url": "data:image/png;base64,..."
        }
      }]
    },
    "finish_reason": "stop"
  }],
  "usage": {...}
}
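A response in this shape can be decoded with a few lines of Python. The sketch below is one possible approach rather than the project's own client code: `extract_images` is a hypothetical helper that returns the raw bytes of every image item and skips anything that is not a base64 data URL.

```python
import base64


def extract_images(response: dict) -> list[bytes]:
    """Return decoded bytes for each image item in a chat completion response (hypothetical helper)."""
    content = response["choices"][0]["message"]["content"]
    images: list[bytes] = []
    if not isinstance(content, list):
        # Plain-text content carries no images
        return images
    for item in content:
        url = item.get("image_url", {}).get("url", "")
        # Only data URLs of the form "data:image/...;base64,<payload>" are decoded
        if url.startswith("data:image") and "," in url:
            images.append(base64.b64decode(url.split(",", 1)[1]))
    return images


# Synthetic response for illustration; the payload is not a real PNG
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": [{
                "type": "image_url",
                "image_url": {
                    "url": "data:image/png;base64," + base64.b64encode(b"fake").decode()
                },
            }],
        },
    }],
}
images = extract_images(sample)
print(len(images))
```

The same helper covers layered output, where the content list holds one item per layer.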
Common Editing Instructions Examples¶
| Instruction | Description |
|---|---|
| Convert this image to watercolor style | Style transfer |
| Convert the image to black and white | Desaturation |
| Enhance the color saturation | Color adjustment |
| Convert to cartoon style | Cartoonization |
| Add vintage filter effect | Filter effect |
| Convert daytime scene to nighttime | Scene conversion |
File Description¶
| File | Description |
|---|---|
| run_server.sh | Server startup script |
| run_curl_image_edit.sh | curl image editing example |
| openai_chat_client.py | Python client |
| gradio_demo.py | Gradio interactive interface |
Example materials¶
gradio_demo.py
#!/usr/bin/env python3
"""
Qwen-Image-Edit Gradio Demo for online serving.

Usage:
    python gradio_demo.py [--server http://localhost:8092] [--port 7861]
"""

import argparse
import base64
from io import BytesIO

try:
    import gradio as gr
except ImportError:
    raise ImportError("gradio is required to run this demo. Install it with: pip install 'vllm-omni[demo]'") from None
import requests
from PIL import Image


def _pil_to_b64_png(img: Image.Image) -> str:
    """Serialize a PIL image to a base64-encoded PNG string."""
    buffer = BytesIO()
    img.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


def edit_image(
    input_image: Image.Image,
    extra_images: list[str] | None,
    prompt: str,
    steps: int,
    guidance_scale: float,
    seed: int | None,
    negative_prompt: str,
    server_url: str,
) -> Image.Image | None:
    """Edit an image using the chat completions API."""
    if input_image is None:
        raise gr.Error("Please upload an image first")

    images: list[Image.Image] = [input_image]
    if extra_images:
        for p in extra_images:
            try:
                images.append(Image.open(p).convert("RGB"))
            except Exception as e:
                raise gr.Error(f"Failed to open image: {p}. Error: {e}") from e

    # Build user message with text and image(s)
    content: list[dict[str, object]] = [{"type": "text", "text": prompt}]
    for img in images:
        content.append({"type": "image_url", "image_url": {"url": f"data:image/png;base64,{_pil_to_b64_png(img)}"}})

    messages = [
        {
            "role": "user",
            "content": content,
        }
    ]

    # Build extra_body with generation parameters
    extra_body = {
        "num_inference_steps": steps,
        "guidance_scale": guidance_scale,
    }
    if seed is not None and seed >= 0:
        extra_body["seed"] = seed
    if negative_prompt:
        extra_body["negative_prompt"] = negative_prompt

    # Build request payload
    payload = {"messages": messages, "extra_body": extra_body}

    try:
        response = requests.post(
            f"{server_url}/v1/chat/completions",
            headers={"Content-Type": "application/json"},
            json=payload,
            timeout=300,
        )
        response.raise_for_status()
        data = response.json()

        # The first content item holds the edited image as a base64 data URL
        content = data["choices"][0]["message"]["content"]
        if isinstance(content, list) and len(content) > 0:
            image_url = content[0].get("image_url", {}).get("url", "")
            if image_url.startswith("data:image"):
                _, b64_data = image_url.split(",", 1)
                image_bytes = base64.b64decode(b64_data)
                return Image.open(BytesIO(image_bytes))

        return None

    except Exception as e:
        print(f"Error: {e}")
        raise gr.Error(f"Edit failed: {e}") from e


def create_demo(server_url: str):
    """Create Gradio demo interface."""
    with gr.Blocks(title="Qwen-Image-Edit Demo") as demo:
        gr.Markdown("# Qwen-Image-Edit Online Editing")
        gr.Markdown(
            "Upload an image and describe the editing effect you want. "
            "For multi-image editing, upload extra images (requires Qwen-Image-Edit-2509 server)."
        )

        with gr.Row():
            with gr.Column(scale=1):
                input_image = gr.Image(
                    label="Input Image",
                    type="pil",
                )
                extra_images = gr.File(
                    label="Additional Images (Optional)",
                    file_count="multiple",
                    type="filepath",
                )
                prompt = gr.Textbox(
                    label="Edit Instruction",
                    placeholder="Describe the editing effect you want...",
                    lines=2,
                )
                negative_prompt = gr.Textbox(
                    label="Negative Prompt",
                    placeholder="Describe what you don't want...",
                    lines=2,
                )

                with gr.Row():
                    steps = gr.Slider(
                        label="Inference Steps",
                        minimum=10,
                        maximum=100,
                        value=50,
                        step=5,
                    )
                    guidance_scale = gr.Slider(
                        label="Guidance Scale (CFG)",
                        minimum=1.0,
                        maximum=20.0,
                        value=7.5,
                        step=0.5,
                    )

                with gr.Row():
                    seed = gr.Number(
                        label="Random Seed (-1 for random)",
                        value=-1,
                        precision=0,
                    )

                edit_btn = gr.Button("Edit Image", variant="primary")

            with gr.Column(scale=1):
                output_image = gr.Image(
                    label="Edited Image",
                    type="pil",
                )

        # Examples
        gr.Examples(
            examples=[
                ["Convert this image to watercolor style"],
                ["Convert the image to black and white"],
                ["Enhance the color saturation"],
                ["Convert to cartoon style"],
                ["Add vintage filter effect"],
                ["Convert daytime to nighttime"],
                ["Convert to oil painting style"],
                ["Add dreamy blur effect"],
            ],
            inputs=[prompt],
        )

        def process_edit(img, imgs, p, st, g, se, n):
            # Treat an empty or negative seed field as "no seed" (random)
            actual_seed = int(se) if se is not None and se >= 0 else None
            return edit_image(img, imgs, p, st, g, actual_seed, n, server_url)

        edit_btn.click(
            fn=process_edit,
            inputs=[input_image, extra_images, prompt, steps, guidance_scale, seed, negative_prompt],
            outputs=[output_image],
        )

    return demo


def main():
    parser = argparse.ArgumentParser(description="Qwen-Image-Edit Gradio Demo")
    parser.add_argument("--server", default="http://localhost:8092", help="Server URL")
    parser.add_argument("--port", type=int, default=7861, help="Gradio port")
    parser.add_argument("--share", action="store_true", help="Create public link")

    args = parser.parse_args()

    print(f"Connecting to server: {args.server}")
    demo = create_demo(args.server)
    demo.launch(server_port=args.port, share=args.share)


if __name__ == "__main__":
    main()
openai_chat_client.py
#!/usr/bin/env python3
"""
Qwen-Image-Edit OpenAI-compatible chat client for image editing.

Usage:
    python openai_chat_client.py --input qwen_image_output.png --prompt "Convert to watercolor style" --output output.png
    python openai_chat_client.py --input input.png --prompt "Convert to oil painting" --seed 42
    python openai_chat_client.py --input input1.png input2.png --prompt "Combine these images into a single scene"
"""

import argparse
import base64
import sys
from io import BytesIO
from pathlib import Path

import requests
from PIL import Image


def _encode_image_as_data_url(input_path: Path) -> str:
    """Read an image file and return it as a base64 data URL."""
    image_bytes = input_path.read_bytes()
    try:
        # Detect the MIME type from the image header; fall back to PNG
        img = Image.open(BytesIO(image_bytes))
        mime_type = f"image/{img.format.lower()}" if img.format else "image/png"
    except Exception:
        mime_type = "image/png"
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{image_b64}"


def edit_image(
    input_image: str | Path | list[str | Path],
    prompt: str,
    server_url: str = "http://localhost:8092",
    height: int | None = None,
    width: int | None = None,
    steps: int | None = None,
    guidance_scale: float | None = None,
    seed: int | None = None,
    negative_prompt: str | None = None,
) -> bytes | None:
    """Edit an image using the chat completions API.

    Args:
        input_image: Path(s) to input image(s). For multi-image editing, pass multiple paths.
        prompt: Text description of the edit
        server_url: Server URL
        height: Output image height in pixels
        width: Output image width in pixels
        steps: Number of inference steps
        guidance_scale: CFG guidance scale
        seed: Random seed
        negative_prompt: Negative prompt

    Returns:
        Edited image bytes or None if failed
    """
    input_images = input_image if isinstance(input_image, list) else [input_image]
    input_paths = [Path(p) for p in input_images]
    for p in input_paths:
        if not p.exists():
            print(f"Error: Input image not found: {p}")
            return None

    # Build user message with text and image(s)
    content: list[dict[str, object]] = [{"type": "text", "text": prompt}]
    for p in input_paths:
        content.append({"type": "image_url", "image_url": {"url": _encode_image_as_data_url(p)}})

    messages = [
        {
            "role": "user",
            "content": content,
        }
    ]

    # Build extra_body with generation parameters
    extra_body = {}
    if height is not None:
        extra_body["height"] = height
    if width is not None:
        extra_body["width"] = width
    if steps is not None:
        extra_body["num_inference_steps"] = steps
    if guidance_scale is not None:
        extra_body["guidance_scale"] = guidance_scale
    if seed is not None:
        extra_body["seed"] = seed
    if negative_prompt:
        extra_body["negative_prompt"] = negative_prompt

    # Build request payload
    payload = {"messages": messages}
    if extra_body:
        payload["extra_body"] = extra_body

    # Send request
    try:
        response = requests.post(
            f"{server_url}/v1/chat/completions",
            headers={"Content-Type": "application/json"},
            json=payload,
            timeout=300,
        )
        response.raise_for_status()
        data = response.json()

        # Extract image from response
        content = data["choices"][0]["message"]["content"]
        if isinstance(content, list) and len(content) > 0:
            image_url = content[0].get("image_url", {}).get("url", "")
            if image_url.startswith("data:image"):
                _, b64_data = image_url.split(",", 1)
                return base64.b64decode(b64_data)

        print(f"Unexpected response format: {content}")
        return None

    except Exception as e:
        print(f"Error: {e}")
        return None


def main():
    parser = argparse.ArgumentParser(description="Qwen-Image-Edit chat client")
    parser.add_argument("--input", "-i", required=True, nargs="+", help="Input image path(s)")
    parser.add_argument("--prompt", "-p", required=True, help="Edit prompt")
    parser.add_argument("--output", "-o", default="output.png", help="Output file")
    parser.add_argument("--server", "-s", default="http://localhost:8092", help="Server URL")
    parser.add_argument("--height", type=int, default=1024, help="Output image height")
    parser.add_argument("--width", type=int, default=1024, help="Output image width")
    parser.add_argument("--steps", type=int, default=50, help="Inference steps")
    parser.add_argument("--guidance", type=float, default=7.5, help="Guidance scale")
    parser.add_argument("--seed", type=int, default=0, help="Random seed")
    parser.add_argument("--negative", help="Negative prompt")

    args = parser.parse_args()

    if len(args.input) == 1:
        print(f"Input: {args.input[0]}")
    else:
        print(f"Inputs ({len(args.input)}): {', '.join(args.input)}")
    print(f"Prompt: {args.prompt}")

    image_bytes = edit_image(
        input_image=args.input,
        prompt=args.prompt,
        server_url=args.server,
        height=args.height,
        width=args.width,
        steps=args.steps,
        guidance_scale=args.guidance,
        seed=args.seed,
        negative_prompt=args.negative,
    )

    if image_bytes:
        output_path = Path(args.output)
        output_path.write_bytes(image_bytes)
        print(f"Image saved to: {output_path}")
        print(f"Size: {len(image_bytes) / 1024:.1f} KB")
    else:
        print("Failed to edit image")
        # sys.exit is reliable in scripts; the bare exit() helper comes from the site module
        sys.exit(1)


if __name__ == "__main__":
    main()
run_curl_image_edit.sh
#!/bin/bash
# Qwen-Image image-edit (image-to-image) curl example

set -euo pipefail

if [[ $# -lt 2 ]]; then
  echo "Usage: $0 <input_image> \"<edit_prompt>\" [output_file]" >&2
  exit 1
fi

INPUT_IMG=$1
PROMPT=$2
SERVER="${SERVER:-http://localhost:8092}"
CURRENT_TIME=$(date +%Y%m%d%H%M%S)
OUTPUT="${3:-image_edit_${CURRENT_TIME}.png}"

if [[ ! -f "$INPUT_IMG" ]]; then
  echo "Input image not found: $INPUT_IMG" >&2
  exit 1
fi

REQUEST_JSON_FILE=$(mktemp)
trap 'rm -f "$REQUEST_JSON_FILE"' EXIT

# Pipe base64 into jq via stdin to avoid ARG_MAX limit on large images
base64 -w0 "$INPUT_IMG" \
  | jq -Rs --arg prompt "$PROMPT" '{
    messages: [{
      role: "user",
      content: [
        {"type": "text", "text": $prompt},
        {"type": "image_url", "image_url": {"url": ("data:image/png;base64," + .)}}
      ]
    }],
    extra_body: {
      num_inference_steps: 50,
      guidance_scale: 1,
      seed: 42
    }
  }' > "$REQUEST_JSON_FILE"

echo "Generating edited image..."
echo "Server: $SERVER"
echo "Prompt: $PROMPT"
echo "Input : $INPUT_IMG"
echo "Output: $OUTPUT"

curl -s "$SERVER/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d @"$REQUEST_JSON_FILE" \
  | jq -r '.choices[0].message.content[0].image_url.url' \
  | cut -d',' -f2 \
  | base64 -d > "$OUTPUT"

# The redirect above always creates the file, so check that it is non-empty
if [[ -s "$OUTPUT" ]]; then
  echo "Image saved to: $OUTPUT"
  echo "Size: $(du -h "$OUTPUT" | cut -f1)"
else
  echo "Failed to generate image" >&2
  exit 1
fi