HunyuanImage-3.0-Instruct¶

Source https://github.com/vllm-project/vllm-omni/tree/main/examples/offline_inference/hunyuan_image3.

This example runs HunyuanImage-3.0-Instruct offline with the unified deploy YAMLs under vllm_omni/deploy/.

Deploy Configs¶

File	Topology	Default use
`vllm_omni/deploy/hunyuan_image3.yaml`	AR + DiT	Default for `text2img` and `img2img`.
`vllm_omni/deploy/hunyuan_image3_ar.yaml`	AR only	Default for `img2text` and `text2text`.
`vllm_omni/deploy/hunyuan_image3_dit.yaml`	DiT only	Standalone diffusion stage. Pass it explicitly with `--deploy-config`.

The example chooses a deploy config automatically when --deploy-config and --stage-configs-path are both omitted:

`--modality`	`mode` passed to Omni	Default deploy
`text2img`	`text-to-image`	`hunyuan_image3.yaml`
`img2img`	`image-editing`	`hunyuan_image3.yaml`
`img2text`	`image-to-text`	`hunyuan_image3_ar.yaml`
`text2text`	`text-to-text`	`hunyuan_image3_ar.yaml`

--modality is an offline example convenience flag. It maps to the internal mode argument passed to Omni(...) by this script. HunyuanImage3 uses separate deploy YAMLs for AR + DiT, AR-only, and DiT-only topologies, so the stage topology is selected by the deploy file rather than by YAML mode overrides.

Online serving does not expose a --modality flag or accept mode as an API request field. Choose the deploy topology when starting the server with --deploy-config, then use the OpenAI-compatible endpoint and request shape for the scenario. The modalities request field is used by the chat completions path; the image endpoints infer the image task from the endpoint and payload.

Online scenario	Server deploy	Request
Text to image	`--deploy-config vllm_omni/deploy/hunyuan_image3.yaml`	`POST /v1/images/generations`, or `POST /v1/chat/completions` with `"modalities": ["image"]`.
Image editing	`--deploy-config vllm_omni/deploy/hunyuan_image3.yaml`	`POST /v1/images/edits`.
Image/text to text	`--deploy-config vllm_omni/deploy/hunyuan_image3_ar.yaml`	`POST /v1/chat/completions` for text output, for example with `"modalities": ["text"]`.
DiT-only image generation	`--deploy-config vllm_omni/deploy/hunyuan_image3_dit.yaml`	`POST /v1/images/generations`.

Run Examples¶

Text to image, using the default AR + DiT deploy:

python examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality text2img \
  --prompts "A cute cat sitting on a windowsill watching the sunset"

Image editing, using the default AR + DiT deploy:

python examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality img2img \
  --image-path /path/to/image.png \
  --prompts "Make the petals neon pink"

Image to text, using the AR-only deploy:

python examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality img2text \
  --image-path /path/to/image.jpg \
  --prompts "Describe the content of the picture."

Text to text, using the AR-only deploy:

python examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality text2text \
  --prompts "What is the capital of France?"

Standalone DiT, using the DiT-only deploy explicitly:

python examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality text2img \
  --deploy-config vllm_omni/deploy/hunyuan_image3_dit.yaml \
  --prompts "A cinematic portrait of an astronaut in a greenhouse"

Override the default full AR + DiT deploy explicitly:

python examples/offline_inference/hunyuan_image3/end2end.py \
  --model tencent/HunyuanImage-3.0-Instruct \
  --modality text2img \
  --deploy-config vllm_omni/deploy/hunyuan_image3.yaml \
  --prompts "A cute cat"

Additional Config¶

You can pass diffusion worker additional_config from the offline example as a JSON object. This maps to the upstream vLLM VllmConfig.additional_config platform extension field: https://docs.vllm.ai/en/stable/api/vllm/config/#vllm.config.VllmConfig.additional_config

python end2end.py --modality text2img \
                  --prompts "A cute cat" \
                  --additional-config '{"torchair_graph_config":{"enabled":true}}'

Key Arguments¶

Argument	Description
`--deploy-config`	Preferred config path for unified deploy YAMLs.
`--stage-configs-path`	Legacy stage config path, kept only for compatibility. Prefer `--deploy-config`.
`--additional-config`	JSON object forwarded to diffusion worker `additional_config`.
`--modality`	Offline-only convenience flag. One of `text2img`, `img2img`, `img2text`, `text2text`. It selects prompt formatting, internal `mode`, and default deploy config for this script. Online serving uses `--deploy-config` plus the endpoint and, for chat completions, request `modalities` instead.
`--steps`	Number of diffusion inference steps for image generation.
`--guidance-scale`	Classifier-free guidance scale for image generation.
`--height`, `--width`	Output image size for `text2img`.
`--bot-task`	Override prompt mode. `none`, `think`, `recaption`, `think_recaption`, or `vanilla`.
`--sys-type`	Override the system prompt type, for example `en_unified` or `en_vanilla`.
`--vae-use-tiling`	Enable VAE tiling for memory reduction.

Notes¶

hunyuan_image3_ar.yaml is a 4-card AR-only text/comprehension deploy.
hunyuan_image3_dit.yaml is a single-stage DiT deploy with stage_id: 0.
The old HunyuanImage3 YAMLs under model_executor/stage_configs/ and platforms/*/stage_configs/ have been folded into the deploy YAMLs.

Prompt Format¶

HunyuanImage-3.0-Instruct uses an instruct chat template:

<|startoftext|>{system_prompt}

User: {<img>?}{user_prompt}

Assistant: {trigger_tag?}

<img>: Placeholder for each input image (single token; expanded by the multimodal pipeline).
Trigger tags: <think> for CoT and <recaption> for recaptioning, placed after Assistant:.
System prompt: Auto-selected from task and bot_task.
bot_task='vanilla' with task='t2i' uses the bare pretrain template.

The shared vllm_omni.diffusion.models.hunyuan_image3.prompt_utils.build_prompt_tokens() helper handles segment-by-segment tokenization and matches HF apply_chat_template.

Example materials¶

end2end.py

Large file omitted from the rendered docs. View it on GitHub: https://github.com/vllm-project/vllm-omni/blob/main/examples/offline_inference/hunyuan_image3/end2end.py.