Text-To-Image¶
Source https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/text_to_image.
This example demonstrates how to deploy the Qwen-Image model as an online image generation service using vLLM-Omni.
Start Server¶
Basic Start¶
vllm serve Qwen/Qwen-Image --omni --port 8091
Note
If you encounter Out-of-Memory (OOM) issues or have limited GPU memory, enable VAE slicing and tiling to reduce memory usage by adding --vae-use-slicing --vae-use-tiling to the serve command.
Start with Parameters¶
Or use the startup script:
bash run_server.sh
Start with Parallelism Acceleration¶
Enable Tensor Parallelism, VAE Patch Parallelism, Ulysses sequence parallelism, or Ring-Attention for faster inference:
# With Tensor Parallelism (requires >= 2 GPUs)
vllm serve Qwen/Qwen-Image --omni --port 8091 --tensor-parallel-size 2
# With Tensor Parallelism and VAE Patch Parallelism (requires >= 2 GPUs)
vllm serve Qwen/Qwen-Image --omni --port 8091 --tensor-parallel-size 2 --vae-patch-parallel-size 2 --vae-use-tiling
# With Sequence Parallelism (Ulysses-SP, requires >= 2 GPUs)
vllm serve Qwen/Qwen-Image --omni --port 8091 --usp 2
# With Ring-Attention (requires >= 2 GPUs)
vllm serve Qwen/Qwen-Image --omni --port 8091 --ring 2
# Combined: Ulysses + Ring (requires >= 4 GPUs)
vllm serve Qwen/Qwen-Image --omni --port 8091 --usp 2 --ring 2
For more details on parallelism acceleration, see the Parallelism Acceleration Guide.
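When the server is launched from a script, it can help to assemble the command from options rather than by editing strings. The following is a minimal sketch that uses only the flags shown above; the helper name `build_serve_command` and its keyword names are hypothetical, and it does not check that enough GPUs are available.

```python
import shlex
from typing import Optional


def build_serve_command(
    model: str = "Qwen/Qwen-Image",
    port: int = 8091,
    tensor_parallel_size: Optional[int] = None,
    vae_patch_parallel_size: Optional[int] = None,
    usp: Optional[int] = None,
    ring: Optional[int] = None,
    vae_use_slicing: bool = False,
    vae_use_tiling: bool = False,
) -> list:
    """Return the argv list for a `vllm serve <model> --omni` invocation."""
    cmd = ["vllm", "serve", model, "--omni", "--port", str(port)]
    # Valued flags are appended only when a value was requested.
    valued_flags = {
        "--tensor-parallel-size": tensor_parallel_size,
        "--vae-patch-parallel-size": vae_patch_parallel_size,
        "--usp": usp,
        "--ring": ring,
    }
    for flag, value in valued_flags.items():
        if value is not None:
            cmd += [flag, str(value)]
    # Boolean switches take no value.
    if vae_use_slicing:
        cmd.append("--vae-use-slicing")
    if vae_use_tiling:
        cmd.append("--vae-use-tiling")
    return cmd


if __name__ == "__main__":
    # Print a copy-pasteable command for the combined Ulysses + Ring setup.
    print(shlex.join(build_serve_command(usp=2, ring=2)))
```

The returned list can be passed directly to `subprocess.Popen`, which avoids shell quoting issues.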
API Calls¶
Method 1: Using curl¶
# Basic text-to-image generation
bash run_curl_text_to_image.sh

# Or execute directly
curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "A beautiful landscape painting"}
    ],
    "extra_body": {
      "height": 1024,
      "width": 1024,
      "num_inference_steps": 50,
      "true_cfg_scale": 4.0,
      "seed": 42
    }
  }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
Method 2: Using OpenAI Python SDK¶
from openai import OpenAI
import base64

client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")

response = client.chat.completions.create(
    model="Qwen/Qwen-Image",
    messages=[{"role": "user", "content": "A beautiful landscape painting"}],
    extra_body={
        "height": 1024,
        "width": 1024,
        "num_inference_steps": 50,
        "true_cfg_scale": 4.0,
        "seed": 42,
    },
)

# The image is returned as a base64 data URL; strip the prefix and decode it.
img_url = response.choices[0].message.content[0].image_url.url
_, b64_data = img_url.split(",", 1)
with open("output.png", "wb") as f:
    f.write(base64.b64decode(b64_data))
Note
The OpenAI SDK's extra_body keyword argument merges parameters into the top-level request body automatically. When using curl or Python requests, wrap generation parameters inside a literal "extra_body" key in the JSON instead (as shown in the curl example above).
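To make the difference concrete, here is a minimal sketch of building the raw request body for a non-SDK client. The helper name `build_chat_payload` is hypothetical; the field names come from the curl example above.

```python
import json


def build_chat_payload(prompt: str, **generation_params) -> dict:
    """Build a raw /v1/chat/completions body with a literal "extra_body" key."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    # Drop unset values so the server falls back to its own defaults.
    extra_body = {k: v for k, v in generation_params.items() if v is not None}
    if extra_body:
        payload["extra_body"] = extra_body
    return payload


if __name__ == "__main__":
    body = build_chat_payload(
        "A beautiful landscape painting",
        height=1024,
        width=1024,
        num_inference_steps=50,
        true_cfg_scale=4.0,
        seed=42,
    )
    print(json.dumps(body, indent=2))
    # To send it (assumes the `requests` package and a server on localhost:8091):
    #   requests.post("http://localhost:8091/v1/chat/completions", json=body, timeout=300)
```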
Method 3: Using Python Client Script¶
python openai_chat_client.py --prompt "A beautiful landscape" --output output.png
Method 4: Using Gradio Demo¶
python gradio_demo.py --server http://localhost:8091 --port 7860
LoRA¶
This example supports PEFT-compatible LoRA (Low-Rank Adaptation) adapters for diffusion models. The LoRA adapter path must be readable from the server machine (usually a local path or a mounted directory).
Using Python Client with LoRA¶
python openai_chat_client.py \
  --prompt "A piece of cheesecake" \
  --lora-path /path/to/lora_adapter \
  --lora-name my_lora \
  --lora-scale 1.0 \
  --output output.png
Using curl with LoRA (Images API)¶
The /v1/images/generations endpoint supports a lora field in the request body:
curl -X POST http://localhost:8091/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A piece of cheesecake",
    "size": "1024x1024",
    "seed": 42,
    "lora": {
      "name": "my_lora",
      "local_path": "/path/to/lora_adapter",
      "scale": 1.0
    }
  }' | jq -r '.data[0].b64_json' | base64 -d > output.png
LoRA Parameters¶
| Parameter | Type | Description |
|---|---|---|
| name | str | LoRA adapter name (optional, defaults to path stem) |
| local_path | str | Server-local path to LoRA adapter folder (PEFT format, required) |
| scale | float | LoRA scale factor (default: 1.0) |
| int_id | int | LoRA integer ID for caching (optional, derived from path if not provided) |
LoRA Adapter Format¶
LoRA adapters must be in PEFT (Parameter-Efficient Fine-Tuning) format. A typical LoRA adapter directory structure:
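PEFT's `save_pretrained()` normally writes an `adapter_config.json` alongside an `adapter_model.safetensors` file (older versions wrote `adapter_model.bin`). Below is a minimal sketch for sanity-checking a directory before pointing the server at it. The helper name `missing_adapter_files` is hypothetical, and the file list is an assumption about PEFT's defaults rather than a requirement stated by this example.

```python
from pathlib import Path
from typing import Union

# File names conventionally written by PEFT's save_pretrained();
# treat this list as an assumption, not a vLLM-Omni requirement.
CONFIG_FILE = "adapter_config.json"
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def missing_adapter_files(adapter_dir: Union[str, Path]) -> list:
    """Return the expected-but-missing entries; an empty list means it looks OK."""
    root = Path(adapter_dir)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    missing = []
    if not (root / CONFIG_FILE).is_file():
        missing.append(CONFIG_FILE)
    # Either weight format is accepted by this check.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

Run the check on the machine that hosts the server, since the adapter path is resolved there.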
Request Format¶
Simple Text Generation¶
{
  "messages": [
    {"role": "user", "content": "A beautiful landscape painting"}
  ]
}
Generation with Parameters¶
Use extra_body to pass generation parameters:
{
  "messages": [
    {"role": "user", "content": "A beautiful landscape painting"}
  ],
  "extra_body": {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "true_cfg_scale": 4.0,
    "seed": 42
  }
}
Multimodal Input (Text + Structured Content)¶
{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "A beautiful landscape painting"}
      ]
    }
  ]
}
Generation Parameters¶
When using /v1/chat/completions, pass these inside extra_body in the curl JSON, or via the extra_body keyword argument in the OpenAI Python SDK. When using the dedicated /v1/images/generations endpoint, pass the supported generation controls as top-level JSON fields directly. For image dimensions and count, use size and n rather than height, width, or num_outputs_per_prompt.
| Parameter | Type | Default | Description |
|---|---|---|---|
| height | int | None | Image height in pixels |
| width | int | None | Image width in pixels |
| size | str | None | Image size (e.g., "1024x1024") |
| num_inference_steps | int | 50 | Number of denoising steps |
| true_cfg_scale | float | 4.0 | Qwen-Image CFG scale |
| seed | int | None | Random seed (reproducible) |
| negative_prompt | str | None | Negative prompt |
| num_outputs_per_prompt | int | 1 | Number of images to generate |
| use_system_prompt | str | None | System prompt preset: en_unified, en_vanilla, en_recaption, en_think_recaption, dynamic, None, or custom text string. Only for HunyuanImage-3.0. |
| system_prompt | str | None | Custom system prompt text. Only used when use_system_prompt is set to custom. Only for HunyuanImage-3.0. |
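The mapping between the two endpoints' parameter styles can be sketched as a small translation function. This is an illustration only: `to_images_payload` is a hypothetical helper, and the "WIDTHxHEIGHT" ordering follows what openai_chat_client.py does when it builds `size`.

```python
from typing import Optional


def to_images_payload(
    prompt: str,
    height: Optional[int] = None,
    width: Optional[int] = None,
    num_outputs_per_prompt: int = 1,
    **controls,
) -> dict:
    """Translate chat-style parameters into a /v1/images/generations body."""
    payload = {"prompt": prompt, "n": num_outputs_per_prompt}
    # The images endpoint takes one "WIDTHxHEIGHT" string instead of two ints.
    if width is not None and height is not None:
        payload["size"] = f"{width}x{height}"
    # Remaining controls (seed, num_inference_steps, ...) become top-level fields.
    payload.update({k: v for k, v in controls.items() if v is not None})
    return payload
```

For example, `to_images_payload("cat", height=768, width=1024, seed=42)` produces a body with `"size": "1024x768"`, `"n": 1`, and `"seed": 42` at the top level.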
Response Format¶
{
  "id": "chatcmpl-xxx",
  "created": 1234567890,
  "model": "Qwen/Qwen-Image",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": [{
        "type": "image_url",
        "image_url": {
          "url": "data:image/png;base64,..."
        }
      }]
    },
    "finish_reason": "stop"
  }],
  "usage": {...}
}
Extract Image¶
# Extract base64 from response and decode to image
cat response.json | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png
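The same extraction can be done in Python on the parsed JSON. A minimal sketch, assuming the response has the shape shown under Response Format; the function name `extract_image_bytes` is hypothetical.

```python
import base64
from typing import Optional


def extract_image_bytes(response: dict) -> Optional[bytes]:
    """Return decoded image bytes from a chat-completions response, else None."""
    try:
        content = response["choices"][0]["message"]["content"]
    except (KeyError, IndexError, TypeError):
        return None
    if not isinstance(content, list):
        return None
    for part in content:
        if not isinstance(part, dict) or part.get("type") != "image_url":
            continue
        url = (part.get("image_url") or {}).get("url", "")
        # Data URLs look like "data:image/png;base64,<payload>".
        if url.startswith("data:image") and "," in url:
            return base64.b64decode(url.split(",", 1)[1])
    return None
```

Write the returned bytes to disk with `Path("output.png").write_bytes(...)`, or open them with Pillow via `Image.open(BytesIO(...))` as gradio_demo.py does.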
File Description¶
| File | Description |
|---|---|
| run_server.sh | Server startup script |
| run_curl_text_to_image.sh | curl example |
| openai_chat_client.py | Python client |
| gradio_demo.py | Gradio interactive interface |
Example materials¶
gradio_demo.py
#!/usr/bin/env python3
"""
Qwen-Image Gradio Demo for online serving.

Usage:
    python gradio_demo.py [--server http://localhost:8091] [--port 7860]
"""

import argparse
import base64
from io import BytesIO

try:
    import gradio as gr
except ImportError:
    raise ImportError("gradio is required to run this demo. Install it with: pip install 'vllm-omni[demo]'") from None

import requests
from PIL import Image


def generate_image(
    prompt: str,
    height: int,
    width: int,
    steps: int,
    cfg_scale: float,
    seed: int | None,
    negative_prompt: str,
    server_url: str,
    num_outputs_per_prompt: int = 1,
) -> Image.Image | None:
    """Generate an image using the chat completions API."""
    messages = [{"role": "user", "content": prompt}]

    # Build extra_body with generation parameters
    extra_body = {
        "height": height,
        "width": width,
        "num_inference_steps": steps,
        "true_cfg_scale": cfg_scale,
    }
    if seed is not None and seed >= 0:
        extra_body["seed"] = seed
    if negative_prompt:
        extra_body["negative_prompt"] = negative_prompt
    # Keep consistent with run_curl_text_to_image.sh, always send num_outputs_per_prompt
    extra_body["num_outputs_per_prompt"] = num_outputs_per_prompt

    # Build request payload
    payload = {"messages": messages, "extra_body": extra_body}

    try:
        response = requests.post(
            f"{server_url}/v1/chat/completions",
            headers={"Content-Type": "application/json"},
            json=payload,
            timeout=300,
        )
        response.raise_for_status()
        data = response.json()

        content = data["choices"][0]["message"]["content"]
        if isinstance(content, list) and len(content) > 0:
            image_url = content[0].get("image_url", {}).get("url", "")
            if image_url.startswith("data:image"):
                _, b64_data = image_url.split(",", 1)
                image_bytes = base64.b64decode(b64_data)
                return Image.open(BytesIO(image_bytes))
        return None
    except Exception as e:
        print(f"Error: {e}")
        raise gr.Error(f"Generation failed: {e}")


def create_demo(server_url: str):
    """Create Gradio demo interface."""
    with gr.Blocks(title="Qwen-Image Demo") as demo:
        gr.Markdown("# Qwen-Image Online Generation")
        gr.Markdown("Generate images using Qwen-Image model")

        with gr.Row():
            with gr.Column(scale=1):
                prompt = gr.Textbox(
                    label="Prompt",
                    placeholder="Describe the image you want to generate...",
                    lines=3,
                )
                negative_prompt = gr.Textbox(
                    label="Negative Prompt",
                    placeholder="Describe what you don't want...",
                    lines=2,
                )
                with gr.Row():
                    height = gr.Slider(
                        label="Height",
                        minimum=256,
                        maximum=2048,
                        value=1024,
                        step=64,
                    )
                    width = gr.Slider(
                        label="Width",
                        minimum=256,
                        maximum=2048,
                        value=1024,
                        step=64,
                    )
                with gr.Row():
                    steps = gr.Slider(
                        label="Inference Steps",
                        minimum=10,
                        maximum=100,
                        # Default steps aligned with run_curl_text_to_image.sh to 100
                        value=100,
                        step=5,
                    )
                    cfg_scale = gr.Slider(
                        label="True CFG Scale",
                        minimum=1.0,
                        maximum=20.0,
                        value=4.0,
                        step=0.5,
                    )
                with gr.Row():
                    seed = gr.Number(
                        label="Random Seed (-1 for random)",
                        value=-1,
                        precision=0,
                    )
                generate_btn = gr.Button("Generate Image", variant="primary")

            with gr.Column(scale=1):
                output_image = gr.Image(
                    label="Generated Image",
                    type="pil",
                )

        # Examples
        gr.Examples(
            examples=[
                ["A beautiful landscape painting with misty mountains", "", 1024, 1024, 100, 4.0, 42],
                ["A cute cat sitting on a windowsill with sunlight", "", 1024, 1024, 100, 4.0, 123],
                ["Cyberpunk style futuristic city with neon lights", "blurry, low quality", 1024, 768, 100, 4.0, 456],
                ["Chinese ink painting of bamboo forest with a house", "", 768, 1024, 100, 4.0, 789],
            ],
            inputs=[prompt, negative_prompt, height, width, steps, cfg_scale, seed],
        )

        generate_btn.click(
            fn=lambda p, h, w, st, c, se, n: generate_image(
                p,
                h,
                w,
                st,
                c,
                # An empty seed field arrives as None; treat it like -1 (random).
                se if se is not None and se >= 0 else None,
                n,
                server_url,
                1,
            ),
            inputs=[prompt, height, width, steps, cfg_scale, seed, negative_prompt],
            outputs=[output_image],
        )

    return demo


def main():
    parser = argparse.ArgumentParser(description="Qwen-Image Gradio Demo")
    parser.add_argument("--server", default="http://localhost:8091", help="Server URL")
    parser.add_argument("--port", type=int, default=7860, help="Gradio port")
    parser.add_argument("--share", action="store_true", help="Create public link")
    args = parser.parse_args()

    print(f"Connecting to server: {args.server}")
    demo = create_demo(args.server)
    demo.launch(server_port=args.port, share=args.share)


if __name__ == "__main__":
    main()
openai_chat_client.py
#!/usr/bin/env python3
"""
Qwen-Image OpenAI-compatible image generation client.

Usage:
    python openai_chat_client.py --prompt "A beautiful landscape" --output output.png
    python openai_chat_client.py --prompt "A sunset" --height 1024 --width 1024 --steps 50 --seed 42
"""

import argparse
import base64
from pathlib import Path

import requests


def generate_image(
    prompt: str,
    server_url: str = "http://localhost:8091",
    height: int | None = None,
    width: int | None = None,
    steps: int | None = None,
    true_cfg_scale: float | None = None,
    seed: int | None = None,
    negative_prompt: str | None = None,
    num_outputs_per_prompt: int = 1,
    lora_path: str | None = None,
    lora_name: str | None = None,
    lora_scale: float | None = None,
    lora_int_id: int | None = None,
    use_system_prompt: str | None = None,
    system_prompt: str | None = None,
) -> bytes | None:
    """Generate an image using the images generation API.

    Args:
        prompt: Text description of the image
        server_url: Server URL
        height: Image height in pixels
        width: Image width in pixels
        steps: Number of diffusion steps
        true_cfg_scale: Qwen-Image CFG scale
        seed: Random seed
        negative_prompt: Negative prompt
        num_outputs_per_prompt: Number of images to generate
        lora_path: Server-local LoRA adapter folder path (PEFT format)
        lora_name: LoRA name (optional, defaults to path stem)
        lora_scale: LoRA scale factor (default: 1.0)
        lora_int_id: LoRA integer ID (optional, derived from path if not provided)
        use_system_prompt: System prompt for generation.
        system_prompt: Custom system prompt.

    Returns:
        Image bytes or None if failed
    """
    payload: dict[str, object] = {
        "prompt": prompt,
        "response_format": "b64_json",
        "n": num_outputs_per_prompt,
    }

    # The images endpoint takes a single "WIDTHxHEIGHT" size string;
    # if only one dimension is given, request a square image.
    if width is not None and height is not None:
        payload["size"] = f"{width}x{height}"
    elif width is not None:
        payload["size"] = f"{width}x{width}"
    elif height is not None:
        payload["size"] = f"{height}x{height}"

    if steps is not None:
        payload["num_inference_steps"] = steps
    if true_cfg_scale is not None:
        payload["true_cfg_scale"] = true_cfg_scale
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    if use_system_prompt is not None:
        payload["use_system_prompt"] = use_system_prompt
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt

    # Add LoRA if provided
    if lora_path:
        lora_body: dict = {
            "local_path": lora_path,
            "name": lora_name or Path(lora_path).stem,
        }
        if lora_scale is not None:
            lora_body["scale"] = float(lora_scale)
        if lora_int_id is not None:
            lora_body["int_id"] = int(lora_int_id)
        payload["lora"] = lora_body

    try:
        response = requests.post(
            f"{server_url}/v1/images/generations",
            headers={"Content-Type": "application/json"},
            json=payload,
            timeout=300,
        )
        response.raise_for_status()
        data = response.json()

        items = data.get("data")
        if isinstance(items, list) and items:
            first = items[0].get("b64_json") if isinstance(items[0], dict) else None
            if isinstance(first, str):
                return base64.b64decode(first)

        print(f"Unexpected response format: {data}")
        return None
    except Exception as e:
        print(f"Error: {e}")
        return None


def main():
    parser = argparse.ArgumentParser(description="Qwen-Image chat client")
    parser.add_argument("--prompt", "-p", default="a cup of coffee on the table", help="Text prompt")
    parser.add_argument("--output", "-o", default="qwen_image_output.png", help="Output file")
    parser.add_argument("--server", "-s", default="http://localhost:8091", help="Server URL")
    parser.add_argument("--height", type=int, default=1024, help="Image height")
    parser.add_argument("--width", type=int, default=1024, help="Image width")
    parser.add_argument("--steps", type=int, default=50, help="Inference steps")
    parser.add_argument("--cfg-scale", type=float, default=4.0, help="True CFG scale")
    parser.add_argument("--seed", type=int, default=0, help="Random seed")
    parser.add_argument("--negative", help="Negative prompt")
    parser.add_argument("--lora-path", default=None, help="Server-local LoRA adapter folder (PEFT format)")
    parser.add_argument("--lora-name", default=None, help="LoRA name (optional)")
    parser.add_argument("--lora-scale", type=float, default=1.0, help="LoRA scale")
    parser.add_argument(
        "--lora-int-id",
        type=int,
        default=None,
        help="LoRA integer id (cache key). If omitted, the server derives a stable id from lora_path.",
    )
    parser.add_argument(
        "--use-system-prompt",
        type=str,
        default=None,
        help=(
            "System prompt for generation. Use predefined types: 'en_unified', 'en_vanilla', 'en_recaption', 'en_think_recaption', 'dynamic', or 'None'; Or provide custom text string directly. Recommended en_unified. "
        ),
    )
    parser.add_argument(
        "--system-prompt",
        type=str,
        default=None,
        help=("Custom system prompt. Used when --use-system-prompt is custom. "),
    )
    args = parser.parse_args()

    print(f"Generating image for: {args.prompt}")
    image_bytes = generate_image(
        prompt=args.prompt,
        server_url=args.server,
        height=args.height,
        width=args.width,
        steps=args.steps,
        true_cfg_scale=args.cfg_scale,
        seed=args.seed,
        negative_prompt=args.negative,
        lora_path=args.lora_path,
        lora_name=args.lora_name,
        # LoRA options are only meaningful when an adapter path is given.
        lora_scale=args.lora_scale if args.lora_path else None,
        lora_int_id=args.lora_int_id if args.lora_path else None,
        use_system_prompt=args.use_system_prompt,
        system_prompt=args.system_prompt,
    )

    if image_bytes:
        output_path = Path(args.output)
        output_path.write_bytes(image_bytes)
        print(f"Image saved to: {output_path}")
        print(f"Size: {len(image_bytes) / 1024:.1f} KB")
    else:
        print("Failed to generate image")
        # Exit non-zero without relying on the interactive-only exit() helper.
        raise SystemExit(1)


if __name__ == "__main__":
    main()
run_curl_text_to_image.sh
#!/bin/bash
# Qwen-Image text-to-image curl example
curl -X POST http://localhost:8091/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a dragon laying over the spine of the Green Mountains of Vermont",
    "size": "1024x1024",
    "seed": 42
  }' | jq -r '.data[0].b64_json' | base64 -d > dragon.png