Image Edit API¶

vLLM-Omni provides an OpenAI DALL-E compatible API for image edit using diffusion models.

Each server instance runs a single model (specified at startup via vllm serve <model> --omni).

Quick Start¶

Start the Server¶

For example...

# Qwen-Image
vllm serve Qwen/Qwen-Image-Edit-2511 --omni --port 8000

Generate Images¶

Using curl:

curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > gift-basket.png) \
  -X POST "http://localhost:8000/v1/images/edits" \
  -F "model=xxx" \
  -F "image=@./xx.png" \
  -F "prompt='this bear is wearing sportwear. holding a basketball, and bending one leg.'" \
  -F "size=1024x1024" \
  -F "output_format=png"

Using OpenAI SDK:

import base64
from openai import OpenAI
from pathlib import Path
client = OpenAI(
    api_key="None",
    base_url="http://localhost:8000/v1"
)

input_image_url = "https://vllm-public-assets.s3.us-west-2.amazonaws.com/omni-assets/qwen-bear.png"

result = client.images.edit(
    image=[],
    model="Qwen-Image-Edit-2511",
    prompt="Change the bears in the two input images into walking together.",
    size='512x512',
    stream=False,
    output_format='jpeg',
    # url格式
    extra_body={
        "url": [input_image_url,input_image_url],
        "num_inference_steps": 50,
        "guidance_scale": 1,
        "seed": 777,
    }
)

image_base64 = result.data[0].b64_json
image_bytes = base64.b64decode(image_base64)

# Save the image to a file
with open("edit_out_http.jpeg", "wb") as f:
    f.write(image_bytes)

API Reference¶

Endpoint¶

POST /v1/images/edits
Content-Type: multipart/form-data

Request Parameters¶

OpenAI Standard Parameters¶

Parameter	Type	Default	Description
`prompt`	string	required	A text description of the desired image
`model`	string	server's model	Model to use (optional, should match server if specified)
`image`	string or array	required	The image(s) to edit.
`n`	integer	1	Number of images to generate (1-10)
`size`	string	"auto"	Image dimensions in WxH format (e.g., "1024x1024", "512x512"), when set to auto, it decide size from first input image.
`response_format`	string	"b64_json"	Response format (only "b64_json" supported)
`user`	string	null	User identifier for tracking
`output_format`	string	"png"	The format in which the generated images are returned. Must be one of "png", "jpg", "jpeg", "webp".
`output_compression`	integer	100	The compression level (0-100%) for the generated images.
`background`	string or null	"auto"	Allows to set transparency for the background of the generated image(s).
`stream`	boolean	false	Return Server-Sent Events instead of a single JSON response. Currently supported only by multi-stage image edit pipelines that produce AR text before the final image, such as HunyuanImage3 IT2I AR+DiT.

vllm-omni Extension Parameters¶

Parameter	Type	Default	Description
`url`	string or array	None	The image(s) to edit.
`negative_prompt`	string	null	Text describing what to avoid in the image
`num_inference_steps`	integer	model defaults	Number of diffusion steps
`guidance_scale`	float	model defaults	Classifier-free guidance scale (typically 0.0-20.0)
`true_cfg_scale`	float	model defaults	True CFG scale (model-specific parameter, may be ignored if not supported)
`seed`	integer	null	Random seed for reproducibility
`reference_image`	string or array	null	Reference image for inpainting
`mask_image`	string or array	null	Mask for inpainting (white areas will be inpainted)

Response Format¶

When stream=false or omitted, the endpoint returns the standard image edit response:

{
  "created": 1701234567,
  "data": [
    {
      "b64_json": "<base64-encoded PNG>",
      "url": null,
      "revised_prompt": null
    }
  ],
  "output_format": null,
  "size": null
}

Streaming Response Format¶

Set stream=true when you want to receive the HunyuanImage3 IT2I AR recaption text before the final edited image is ready. Streaming is only available for multi-stage image edit pipelines where stage 0 produces AR text and a later diffusion stage produces the image. Single-stage diffusion image edit pipelines reject stream=true with HTTP 400.

The response uses Server-Sent Events with Content-Type: text/event-stream. Each event is sent as one data: line. The stream order is:

One or more AR text delta chunks.
One final image chunk.
data: [DONE].

AR text chunks expose only the generated text delta and the AR completion index. They do not expose token ids, ratio tokens, KV cache data, or internal prompt token ids.

Example AR delta event:

data: {"object":"image.edit.chunk","type":"ar_delta","delta":"A close-up product photo...","index":0,"created":1701234567,"model":"tencent/HunyuanImage-3.0-Instruct"}

Example final image event:

data: {"object":"image.edit.chunk","type":"image","data":[{"b64_json":"<base64-encoded PNG>","url":null,"revised_prompt":null}],"output_format":"png","size":"1024x1024","created":1701234567,"model":"tencent/HunyuanImage-3.0-Instruct"}

Terminal event:

data: [DONE]

If the engine fails after the SSE response has started, the stream emits an error chunk followed by [DONE]:

data: {"object":"error","created":1701234567,"model":"tencent/HunyuanImage-3.0-Instruct","error":{"message":"<error message>","type":"server_error","code":500}}

data: [DONE]

Examples¶

Multiple Images input¶

curl -s -D >(grep -i x-request-id >&2) \
  -o >(jq -r '.data[0].b64_json' | base64 --decode > gift-basket.png) \
  -X POST "http://localhost:8000/v1/images/edits" \
  -F "model=xxx" \
  -F "[email protected]" \
  -F "[email protected]" \
  -F "prompt='this bear is wearing sportwear. holding a basketball, and bending one leg.'" \
  -F "size=1024x1024" \
  -F "output_format=png"

Streaming HunyuanImage3 IT2I AR Text¶

curl -N -X POST "http://localhost:8000/v1/images/edits" \
  -F "model=tencent/HunyuanImage-3.0-Instruct" \
  -F "image=@./input.png" \
  -F "prompt=Turn the product into a clean studio advertisement." \
  -F "size=1024x1024" \
  -F "output_format=png" \
  -F "stream=true"

Use the ar_delta events for progressive display of the AR-generated recaption. Decode the b64_json field from the final image event to get the edited image.

Parameter Handling¶

The API passes parameters directly to the diffusion pipeline without model-specific transformation:

Default values: When parameters are not specified, the underlying model uses its own defaults
Pass-through design: User-provided values are forwarded directly to the diffusion engine
Minimal validation: Only basic type checking and range validation at the API level

Parameter Compatibility¶

The API passes parameters directly to the diffusion pipeline without model-specific validation.

Unsupported parameters may be silently ignored by the model
Incompatible values will result in errors from the underlying pipeline
Recommended values vary by model - consult model documentation

Best Practice: Start with the model's recommended parameters, then adjust based on your needs.

Error Responses¶

400 Bad Request¶

Invalid parameters (e.g., model mismatch):

{
  "detail": "Invalid size format: '1024x'. Expected format: 'WIDTHxHEIGHT' (e.g., '1024x1024')."
}

422 Unprocessable Entity¶

Validation errors (missing required fields):

{
  "detail": "Field 'image' or 'url' is required"
}

Troubleshooting¶

Server Not Running¶

# Check if server is responding
curl http://localhost:8000/v1/images/edits \
  -F "prompt='test'"

Out of Memory¶

If you encounter OOM errors: 1. Reduce image size: "size": "512x512" 2. Reduce inference steps: "num_inference_steps": 25

Development¶

Enable debug logging to see prompts and generation details:

vllm serve Qwen/Qwen-Image-Edit-2511 --omni \
  --uvicorn-log-level debug