Text-To-Image¶

Source https://github.com/vllm-project/vllm-omni/tree/main/examples/online_serving/text_to_image.

This example demonstrates how to deploy the Qwen-Image model as an online image generation service using vLLM-Omni.

Start Server¶

Basic Start¶

vllm serve Qwen/Qwen-Image --omni --port 8091

Note

If you encounter Out-of-Memory (OOM) errors or have limited GPU memory, enable VAE slicing and tiling to reduce memory usage by adding --vae-use-slicing --vae-use-tiling to the serve command.

Start with Parameters¶

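Alternatively, use the startup script, which reads the MODEL and PORT environment variables (defaults: Qwen/Qwen-Image and 8091):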

bash run_server.sh

Start with Parallelism Acceleration¶

Enable Tensor Parallelism, VAE Patch Parallelism, Ulysses sequence parallelism, or Ring-Attention for faster inference:

# With Tensor Parallelism (requires >= 2 GPUs)
vllm serve Qwen/Qwen-Image --omni --port 8091 --tensor-parallel-size 2

# With Tensor Parallelism and VAE Patch Parallelism (requires >= 2 GPUs)
vllm serve Qwen/Qwen-Image --omni --port 8091 --tensor-parallel-size 2 --vae-patch-parallel-size 2 --vae-use-tiling

# With Sequence Parallelism (Ulysses-SP, requires >= 2 GPUs)
vllm serve Qwen/Qwen-Image --omni --port 8091 --usp 2

# With Ring-Attention (requires >= 2 GPUs)
vllm serve Qwen/Qwen-Image --omni --port 8091 --ring 2

# Combined: Ulysses + Ring (requires >= 4 GPUs)
vllm serve Qwen/Qwen-Image --omni --port 8091 --usp 2 --ring 2

For more details on parallelism acceleration, see the Parallelism Acceleration Guide.
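
When the server is launched from a Python script, the flags shown above can be assembled programmatically. The following is a minimal sketch: `build_serve_command` is a hypothetical helper, not part of vLLM-Omni, and it only emits flags that appear in this section.

```python
import shlex


def build_serve_command(
    model="Qwen/Qwen-Image",
    port=8091,
    tensor_parallel_size=1,
    vae_patch_parallel_size=1,
    usp=1,
    ring=1,
    vae_use_slicing=False,
    vae_use_tiling=False,
):
    """Assemble a `vllm serve` argv list from the flags shown in this section.

    Hypothetical helper for illustration only.
    """
    cmd = ["vllm", "serve", model, "--omni", "--port", str(port)]
    # Only emit a parallelism flag when its degree is above 1.
    for flag, degree in (
        ("--tensor-parallel-size", tensor_parallel_size),
        ("--vae-patch-parallel-size", vae_patch_parallel_size),
        ("--usp", usp),
        ("--ring", ring),
    ):
        if degree > 1:
            cmd += [flag, str(degree)]
    # Memory-saving VAE options are plain on/off switches.
    if vae_use_slicing:
        cmd.append("--vae-use-slicing")
    if vae_use_tiling:
        cmd.append("--vae-use-tiling")
    return cmd


print(shlex.join(build_serve_command(usp=2, ring=2)))
# prints: vllm serve Qwen/Qwen-Image --omni --port 8091 --usp 2 --ring 2
```

The returned list can be passed to `subprocess.Popen` as-is, which avoids shell quoting issues.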

API Calls¶

Method 1: Using curl¶

# Basic text-to-image generation (the script calls /v1/images/generations)
bash run_curl_text_to_image.sh

# Or call the chat completions endpoint directly
curl -s http://localhost:8091/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "A beautiful landscape painting"}
    ],
    "extra_body": {
      "height": 1024,
      "width": 1024,
      "num_inference_steps": 50,
      "true_cfg_scale": 4.0,
      "seed": 42
    }
  }' | jq -r '.choices[0].message.content[0].image_url.url' | cut -d',' -f2- | base64 -d > output.png

Method 2: Using OpenAI Python SDK¶

from openai import OpenAI
import base64

client = OpenAI(base_url="http://localhost:8091/v1", api_key="none")

response = client.chat.completions.create(
    model="Qwen/Qwen-Image",
    messages=[{"role": "user", "content": "A beautiful landscape painting"}],
    extra_body={
        "height": 1024,
        "width": 1024,
        "num_inference_steps": 50,
        "true_cfg_scale": 4.0,
        "seed": 42,
    },
)

img_url = response.choices[0].message.content[0].image_url.url
_, b64_data = img_url.split(",", 1)
with open("output.png", "wb") as f:
    f.write(base64.b64decode(b64_data))

Note

The OpenAI SDK's extra_body keyword argument merges parameters into the top-level request body automatically. When using curl or Python requests, wrap generation parameters inside a literal "extra_body" key in the JSON instead (as shown in the curl example above).
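
For clients that do not use the OpenAI SDK, the nesting described in the note can be made explicit with a small helper. This is a sketch under this page's assumptions (server at http://localhost:8091, generation parameters nested under a literal "extra_body" key); `build_chat_payload` and `post_chat` are hypothetical names, not part of any library.

```python
import json
import urllib.request


def build_chat_payload(prompt, **gen_params):
    """Build a /v1/chat/completions body for raw HTTP clients (hypothetical helper)."""
    payload = {"messages": [{"role": "user", "content": prompt}]}
    # Drop unset parameters so the server can fall back to its defaults.
    extra_body = {k: v for k, v in gen_params.items() if v is not None}
    if extra_body:
        # Raw clients must nest the parameters under a literal "extra_body" key.
        payload["extra_body"] = extra_body
    return payload


def post_chat(payload, server_url="http://localhost:8091"):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{server_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)
```

For example, `build_chat_payload("A beautiful landscape painting", height=1024, width=1024, seed=42)` produces the same body as the curl example above, minus the parameters that were not passed.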

Method 3: Using Python Client Script¶

python openai_chat_client.py --prompt "A beautiful landscape painting" --output output.png

Method 4: Using Gradio Demo¶

python gradio_demo.py
# Visit http://localhost:7860

LoRA¶

This example supports PEFT-compatible LoRA (Low-Rank Adaptation) adapters for diffusion models. The LoRA adapter path must be readable on the server machine (usually a local path or a mounted directory).

Using Python Client with LoRA¶

python openai_chat_client.py \
  --prompt "A piece of cheesecake" \
  --lora-path /path/to/lora_adapter \
  --lora-name my_lora \
  --lora-scale 1.0 \
  --output output.png

Using curl with LoRA (Images API)¶

The /v1/images/generations endpoint supports a lora field in the request body:

curl -X POST http://localhost:8091/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A piece of cheesecake",
    "size": "1024x1024",
    "seed": 42,
    "lora": {
      "name": "my_lora",
      "local_path": "/path/to/lora_adapter",
      "scale": 1.0
    }
  }' | jq -r '.data[0].b64_json' | base64 -d > output.png

LoRA Parameters¶

Parameter	Type	Description
`name`	str	LoRA adapter name (optional, defaults to path stem)
`local_path`	str	Server-local path to LoRA adapter folder (PEFT format, required)
`scale`	float	LoRA scale factor (default: 1.0)
`int_id`	int	LoRA integer ID for caching (optional, derived from path if not provided)
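
The table above can be turned into a request fragment as follows. This is a minimal sketch that mirrors how openai_chat_client.py fills the field; `build_lora_field` is a hypothetical helper, and the optional keys are simply omitted when not given so the server applies its own defaults.

```python
from pathlib import PurePosixPath


def build_lora_field(local_path, name=None, scale=None, int_id=None):
    """Build the "lora" object for a /v1/images/generations request (hypothetical helper)."""
    if not local_path:
        raise ValueError("local_path is required")
    lora = {
        "local_path": local_path,
        # Fall back to the path stem when no name is given.
        "name": name or PurePosixPath(local_path).stem,
    }
    if scale is not None:
        lora["scale"] = float(scale)
    if int_id is not None:
        lora["int_id"] = int(int_id)
    return lora


payload = {
    "prompt": "A piece of cheesecake",
    "size": "1024x1024",
    "lora": build_lora_field("/path/to/lora_adapter", scale=1.0),
}
```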

LoRA Adapter Format¶

LoRA adapters must be in PEFT (Parameter-Efficient Fine-Tuning) format. A typical LoRA adapter directory looks like this:

lora_adapter/
├── adapter_config.json
└── adapter_model.safetensors
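
Before sending a request, it can help to check the adapter directory against the typical layout above. The sketch below is only a convenience check: `missing_adapter_files` is a hypothetical helper, it assumes exactly the two file names shown, and it must run on the server machine because the path is server-local.

```python
from pathlib import Path

# File names taken from the typical layout shown above.
EXPECTED_FILES = ("adapter_config.json", "adapter_model.safetensors")


def missing_adapter_files(adapter_dir):
    """Return the expected adapter files that are absent from adapter_dir."""
    root = Path(adapter_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means both files are present; a non-empty list names what to look for before debugging the request itself.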

Request Format¶

Simple Text Generation¶

{
  "messages": [
    {"role": "user", "content": "A beautiful landscape painting"}
  ]
}

Generation with Parameters¶

Use extra_body to pass generation parameters:

{
  "messages": [
    {"role": "user", "content": "A beautiful landscape painting"}
  ],
  "extra_body": {
    "height": 1024,
    "width": 1024,
    "num_inference_steps": 50,
    "true_cfg_scale": 4.0,
    "seed": 42
  }
}

Multimodal Input (Text + Structured Content)¶

{
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "A beautiful landscape painting"}
      ]
    }
  ]
}
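
The plain-string and structured forms above carry the same prompt. A client that wants to emit the structured form consistently could normalize its input as sketched below; `to_structured_content` and `prompt_text` are hypothetical helpers, and treating the two forms as interchangeable is an assumption drawn from the examples on this page.

```python
def to_structured_content(content):
    """Return message content as a list of typed parts."""
    if isinstance(content, str):
        # Wrap a plain string as a single text part.
        return [{"type": "text", "text": content}]
    return list(content)


def prompt_text(content):
    """Join the text parts of a message into one prompt string."""
    parts = to_structured_content(content)
    return " ".join(part["text"] for part in parts if part.get("type") == "text")


message = {
    "role": "user",
    "content": to_structured_content("A beautiful landscape painting"),
}
```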

Generation Parameters¶

With /v1/chat/completions, pass these parameters inside a literal "extra_body" key in raw JSON requests (curl, Python requests), or through the extra_body keyword argument of the OpenAI Python SDK. With the dedicated /v1/images/generations endpoint, pass the supported generation controls as top-level JSON fields; on that endpoint, set image dimensions and count with size and n rather than height, width, or num_outputs_per_prompt.

Parameter	Type	Default	Description
`height`	int	None	Image height in pixels
`width`	int	None	Image width in pixels
`size`	str	None	Image size (e.g., "1024x1024")
`num_inference_steps`	int	50	Number of denoising steps
`true_cfg_scale`	float	4.0	Qwen-Image CFG scale
`seed`	int	None	Random seed for reproducible output
`negative_prompt`	str	None	Negative prompt
`num_outputs_per_prompt`	int	1	Number of images to generate
`use_system_prompt`	str	None	System prompt preset: `en_unified`, `en_vanilla`, `en_recaption`, `en_think_recaption`, `dynamic`, `None`, or custom text string. Only for HunyuanImage-3.0.
`system_prompt`	str	None	Custom system prompt text. Only used when `use_system_prompt` is set to `custom`. Only for HunyuanImage-3.0.
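
The mapping between the chat-style parameters and the Images API fields can be sketched as a small translation function. It follows the convention used by openai_chat_client.py (size is "WIDTHxHEIGHT", and a single dimension yields a square image); `to_images_request` is a hypothetical name, not part of vLLM-Omni.

```python
def to_images_request(prompt, height=None, width=None, num_outputs_per_prompt=1, **controls):
    """Translate chat-style parameters into a /v1/images/generations body."""
    payload = {"prompt": prompt, "n": num_outputs_per_prompt}
    if width is not None and height is not None:
        payload["size"] = f"{width}x{height}"
    elif width is not None or height is not None:
        # Only one dimension given: request a square image.
        side = width if width is not None else height
        payload["size"] = f"{side}x{side}"
    # Remaining controls (seed, num_inference_steps, ...) stay top-level.
    payload.update({k: v for k, v in controls.items() if v is not None})
    return payload
```

For example, `to_images_request("a cat", height=768, width=1024, seed=42)` yields a body with `"size": "1024x768"`, `"n": 1`, and `"seed": 42`.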

Response Format¶

{
  "id": "chatcmpl-xxx",
  "created": 1234567890,
  "model": "Qwen/Qwen-Image",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": [{
        "type": "image_url",
        "image_url": {
          "url": "data:image/png;base64,..."
        }
      }]
    },
    "finish_reason": "stop"
  }],
  "usage": {...}
}

Extract Image¶

# Extract base64 from response and decode to image
jq -r '.choices[0].message.content[0].image_url.url' response.json | cut -d',' -f2- | base64 -d > output.png
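
The same extraction can be done in Python without jq. This is a minimal sketch assuming the response shape shown under Response Format; `extract_images` and `save_first_image` are hypothetical helpers. Returning one entry per `image_url` part is a defensive choice in case a response carries more than one image.

```python
import base64
import json


def extract_images(response):
    """Return the decoded bytes of every data-URL image in a chat completion response."""
    content = response["choices"][0]["message"]["content"]
    images = []
    if not isinstance(content, list):
        return images
    for part in content:
        if not isinstance(part, dict) or part.get("type") != "image_url":
            continue
        url = part.get("image_url", {}).get("url", "")
        if url.startswith("data:image") and "," in url:
            # Everything after the first comma is the base64 payload.
            _, b64_data = url.split(",", 1)
            images.append(base64.b64decode(b64_data))
    return images


def save_first_image(response_path="response.json", output_path="output.png"):
    """Read a saved response file and write its first image to disk."""
    with open(response_path, encoding="utf-8") as f:
        images = extract_images(json.load(f))
    if not images:
        raise ValueError("no image found in response")
    with open(output_path, "wb") as f:
        f.write(images[0])
```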

File Description¶

File	Description
`run_server.sh`	Server startup script
`run_curl_text_to_image.sh`	curl example
`openai_chat_client.py`	Python client
`gradio_demo.py`	Gradio interactive interface

Example materials¶

gradio_demo.py

#!/usr/bin/env python3
"""
Qwen-Image Gradio Demo for online serving.

Usage:
    python gradio_demo.py [--server http://localhost:8091] [--port 7860]
"""

import argparse
import base64
from io import BytesIO

try:
    import gradio as gr
except ImportError:
    raise ImportError("gradio is required to run this demo. Install it with: pip install 'vllm-omni[demo]'") from None
import requests
from PIL import Image


def generate_image(
    prompt: str,
    height: int,
    width: int,
    steps: int,
    cfg_scale: float,
    seed: int | None,
    negative_prompt: str,
    server_url: str,
    num_outputs_per_prompt: int = 1,
) -> Image.Image | None:
    """Generate an image using the chat completions API."""
    messages = [{"role": "user", "content": prompt}]

    # Build extra_body with generation parameters
    extra_body = {
        "height": height,
        "width": width,
        "num_inference_steps": steps,
        "true_cfg_scale": cfg_scale,
    }
    if seed is not None and seed >= 0:
        extra_body["seed"] = seed
    if negative_prompt:
        extra_body["negative_prompt"] = negative_prompt
    # Always send num_outputs_per_prompt so the number of images is explicit
    extra_body["num_outputs_per_prompt"] = num_outputs_per_prompt

    # Build request payload
    payload = {"messages": messages, "extra_body": extra_body}

    try:
        response = requests.post(
            f"{server_url}/v1/chat/completions",
            headers={"Content-Type": "application/json"},
            json=payload,
            timeout=300,
        )
        response.raise_for_status()
        data = response.json()

        content = data["choices"][0]["message"]["content"]
        if isinstance(content, list) and len(content) > 0:
            image_url = content[0].get("image_url", {}).get("url", "")
            if image_url.startswith("data:image"):
                _, b64_data = image_url.split(",", 1)
                image_bytes = base64.b64decode(b64_data)
                return Image.open(BytesIO(image_bytes))

        return None

    except Exception as e:
        print(f"Error: {e}")
        raise gr.Error(f"Generation failed: {e}") from e


def create_demo(server_url: str):
    """Create Gradio demo interface."""

    with gr.Blocks(title="Qwen-Image Demo") as demo:
        gr.Markdown("# Qwen-Image Online Generation")
        gr.Markdown("Generate images using Qwen-Image model")

        with gr.Row():
            with gr.Column(scale=1):
                prompt = gr.Textbox(
                    label="Prompt",
                    placeholder="Describe the image you want to generate...",
                    lines=3,
                )
                negative_prompt = gr.Textbox(
                    label="Negative Prompt",
                    placeholder="Describe what you don't want...",
                    lines=2,
                )

                with gr.Row():
                    height = gr.Slider(
                        label="Height",
                        minimum=256,
                        maximum=2048,
                        value=1024,
                        step=64,
                    )
                    width = gr.Slider(
                        label="Width",
                        minimum=256,
                        maximum=2048,
                        value=1024,
                        step=64,
                    )

                with gr.Row():
                    steps = gr.Slider(
                        label="Inference Steps",
                        minimum=10,
                        maximum=100,
                        # Demo default of 100 steps (the documented parameter default is 50)
                        value=100,
                        step=5,
                    )
                    cfg_scale = gr.Slider(
                        label="True CFG Scale",
                        minimum=1.0,
                        maximum=20.0,
                        value=4.0,
                        step=0.5,
                    )

                with gr.Row():
                    seed = gr.Number(
                        label="Random Seed (-1 for random)",
                        value=-1,
                        precision=0,
                    )

                generate_btn = gr.Button("Generate Image", variant="primary")

            with gr.Column(scale=1):
                output_image = gr.Image(
                    label="Generated Image",
                    type="pil",
                )

        # Examples
        gr.Examples(
            examples=[
                ["A beautiful landscape painting with misty mountains", "", 1024, 1024, 100, 4.0, 42],
                ["A cute cat sitting on a windowsill with sunlight", "", 1024, 1024, 100, 4.0, 123],
                ["Cyberpunk style futuristic city with neon lights", "blurry, low quality", 1024, 768, 100, 4.0, 456],
                ["Chinese ink painting of bamboo forest with a house", "", 768, 1024, 100, 4.0, 789],
            ],
            inputs=[prompt, negative_prompt, height, width, steps, cfg_scale, seed],
        )

        generate_btn.click(
            fn=lambda p, h, w, st, c, se, n: generate_image(
                p,
                h,
                w,
                st,
                c,
                # Guard against a cleared seed field (None) as well as negative values
                se if se is not None and se >= 0 else None,
                n,
                server_url,
                1,
            ),
            inputs=[prompt, height, width, steps, cfg_scale, seed, negative_prompt],
            outputs=[output_image],
        )

    return demo


def main():
    parser = argparse.ArgumentParser(description="Qwen-Image Gradio Demo")
    parser.add_argument("--server", default="http://localhost:8091", help="Server URL")
    parser.add_argument("--port", type=int, default=7860, help="Gradio port")
    parser.add_argument("--share", action="store_true", help="Create public link")

    args = parser.parse_args()

    print(f"Connecting to server: {args.server}")
    demo = create_demo(args.server)
    demo.launch(server_port=args.port, share=args.share)


if __name__ == "__main__":
    main()

openai_chat_client.py

#!/usr/bin/env python3
"""
Qwen-Image OpenAI-compatible image generation client.

Usage:
    python openai_chat_client.py --prompt "A beautiful landscape" --output output.png
    python openai_chat_client.py --prompt "A sunset" --height 1024 --width 1024 --steps 50 --seed 42
"""

import argparse
import base64
from pathlib import Path

import requests


def generate_image(
    prompt: str,
    server_url: str = "http://localhost:8091",
    height: int | None = None,
    width: int | None = None,
    steps: int | None = None,
    true_cfg_scale: float | None = None,
    seed: int | None = None,
    negative_prompt: str | None = None,
    num_outputs_per_prompt: int = 1,
    lora_path: str | None = None,
    lora_name: str | None = None,
    lora_scale: float | None = None,
    lora_int_id: int | None = None,
    use_system_prompt: str | None = None,
    system_prompt: str | None = None,
) -> bytes | None:
    """Generate an image using the images generation API.

    Args:
        prompt: Text description of the image
        server_url: Server URL
        height: Image height in pixels
        width: Image width in pixels
        steps: Number of diffusion steps
        true_cfg_scale: Qwen-Image CFG scale
        seed: Random seed
        negative_prompt: Negative prompt
        num_outputs_per_prompt: Number of images to generate
        lora_path: Server-local LoRA adapter folder path (PEFT format)
        lora_name: LoRA name (optional, defaults to path stem)
        lora_scale: LoRA scale factor (default: 1.0)
        lora_int_id: LoRA integer ID (optional, derived from path if not provided)
        use_system_prompt: System prompt for generation.
        system_prompt: Custom system prompt.

    Returns:
        Image bytes or None if failed
    """
    payload: dict[str, object] = {
        "prompt": prompt,
        "response_format": "b64_json",
        "n": num_outputs_per_prompt,
    }

    if width is not None and height is not None:
        payload["size"] = f"{width}x{height}"
    elif width is not None:
        payload["size"] = f"{width}x{width}"
    elif height is not None:
        payload["size"] = f"{height}x{height}"

    if steps is not None:
        payload["num_inference_steps"] = steps
    if true_cfg_scale is not None:
        payload["true_cfg_scale"] = true_cfg_scale
    if negative_prompt:
        payload["negative_prompt"] = negative_prompt
    if seed is not None:
        payload["seed"] = seed
    if use_system_prompt is not None:
        payload["use_system_prompt"] = use_system_prompt
    if system_prompt is not None:
        payload["system_prompt"] = system_prompt
    # Add LoRA if provided
    if lora_path:
        lora_body: dict = {
            "local_path": lora_path,
            "name": lora_name or Path(lora_path).stem,
        }
        if lora_scale is not None:
            lora_body["scale"] = float(lora_scale)
        if lora_int_id is not None:
            lora_body["int_id"] = int(lora_int_id)
        payload["lora"] = lora_body

    try:
        response = requests.post(
            f"{server_url}/v1/images/generations",
            headers={"Content-Type": "application/json"},
            json=payload,
            timeout=300,
        )
        response.raise_for_status()
        data = response.json()

        items = data.get("data")
        if isinstance(items, list) and items:
            first = items[0].get("b64_json") if isinstance(items[0], dict) else None
            if isinstance(first, str):
                return base64.b64decode(first)

        print(f"Unexpected response format: {data}")
        return None

    except Exception as e:
        print(f"Error: {e}")
        return None


def main():
    parser = argparse.ArgumentParser(description="Qwen-Image chat client")
    parser.add_argument("--prompt", "-p", default="a cup of coffee on the table", help="Text prompt")
    parser.add_argument("--output", "-o", default="qwen_image_output.png", help="Output file")
    parser.add_argument("--server", "-s", default="http://localhost:8091", help="Server URL")
    parser.add_argument("--height", type=int, default=1024, help="Image height")
    parser.add_argument("--width", type=int, default=1024, help="Image width")
    parser.add_argument("--steps", type=int, default=50, help="Inference steps")
    parser.add_argument("--cfg-scale", type=float, default=4.0, help="True CFG scale")
    parser.add_argument("--seed", type=int, default=0, help="Random seed")
    parser.add_argument("--negative", help="Negative prompt")

    parser.add_argument("--lora-path", default=None, help="Server-local LoRA adapter folder (PEFT format)")
    parser.add_argument("--lora-name", default=None, help="LoRA name (optional)")
    parser.add_argument("--lora-scale", type=float, default=1.0, help="LoRA scale")
    parser.add_argument(
        "--lora-int-id",
        type=int,
        default=None,
        help="LoRA integer id (cache key). If omitted, the server derives a stable id from lora_path.",
    )
    parser.add_argument(
        "--use-system-prompt",
        type=str,
        default=None,
        help=(
            "System prompt for generation. Use a predefined type ('en_unified', 'en_vanilla', 'en_recaption', 'en_think_recaption', 'dynamic', or 'None') or provide a custom text string directly. 'en_unified' is recommended."
        ),
    )
    parser.add_argument(
        "--system-prompt",
        type=str,
        default=None,
        help=("Custom system prompt. Used when --use-system-prompt is custom. "),
    )
    args = parser.parse_args()
    print(f"Generating image for: {args.prompt}")

    image_bytes = generate_image(
        prompt=args.prompt,
        server_url=args.server,
        height=args.height,
        width=args.width,
        steps=args.steps,
        true_cfg_scale=args.cfg_scale,
        seed=args.seed,
        negative_prompt=args.negative,
        lora_path=args.lora_path,
        lora_name=args.lora_name,
        lora_scale=args.lora_scale if args.lora_path else None,
        lora_int_id=args.lora_int_id if args.lora_path else None,
        use_system_prompt=args.use_system_prompt,
        system_prompt=args.system_prompt,
    )

    if image_bytes:
        output_path = Path(args.output)
        output_path.write_bytes(image_bytes)
        print(f"Image saved to: {output_path}")
        print(f"Size: {len(image_bytes) / 1024:.1f} KB")
    else:
        print("Failed to generate image")
        # The bare exit() helper comes from the site module; SystemExit is always available
        raise SystemExit(1)


if __name__ == "__main__":
    main()

run_curl_text_to_image.sh

#!/bin/bash
# Qwen-Image text-to-image curl example

curl -X POST http://localhost:8091/v1/images/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a dragon laying over the spine of the Green Mountains of Vermont",
    "size": "1024x1024",
    "seed": 42
  }' | jq -r '.data[0].b64_json' | base64 -d > dragon.png

run_server.sh

#!/bin/bash
# Qwen-Image online serving startup script

MODEL="${MODEL:-Qwen/Qwen-Image}"
PORT="${PORT:-8091}"

echo "Starting Qwen-Image server..."
echo "Model: $MODEL"
echo "Port: $PORT"

vllm serve "$MODEL" --omni \
    --port "$PORT"