vllm_omni.diffusion.models.hunyuan_image3.prompt_utils ¶
Shared prompt-template construction for HunyuanImage-3.0-Instruct.
Single source of truth for the AR-prefill prompt format used by the example scripts and any downstream caller that needs to build HunyuanImage3 chat-template token sequences without invoking the full diffusion pipeline tokenizer wrapper.
The DiT pipeline (pipeline_hunyuan_image3.py) builds prompts through TokenizerWrapper.apply_chat_template, which eagerly consumes JointImageInfo objects produced by image preprocessing. The example flow uses an <img> placeholder + multi_modal_data instead, so it needs a lighter-weight builder that only requires a HF tokenizer. This module provides that builder; the (task, bot_task) -> template mapping below is the canonical mapping for both flows.
Two orthogonal axes:
- task selects the I/O modality combination, which only controls whether <img> placeholders are emitted between User: and the user prompt: i2t / it2i produce them, t2t / t2i do not.
- bot_task selects the prompting mode and drives both the system prompt and the trigger tag appended after Assistant:. None (default) gives a plain Assistant turn under the unified prompt; think / recaption switch the trigger tag to <think> / <recaption>; think_recaption swaps the system prompt for the dedicated combined-mode template; vanilla drops the chat structure entirely (pretrain template, t2i only).
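The two axes can be pictured as two independent lookups. The following is a minimal sketch that encodes only what the description above states; the helper names emits_img_placeholder and trigger_tag are hypothetical and are not part of this module. The trigger tags for think_recaption and vanilla are not spelled out above, so the sketch deliberately refuses to guess them.

```python
# Illustrative only: these helpers are hypothetical, not the module's API.
IMAGE_INPUT_TASKS = {"i2t", "it2i"}      # <img> placeholders are emitted
TEXT_ONLY_INPUT_TASKS = {"t2t", "t2i"}   # no <img> placeholders

# Only the modes whose trigger tag is stated in the description above.
TRIGGER_TAGS = {None: None, "think": "<think>", "recaption": "<recaption>"}


def emits_img_placeholder(task: str) -> bool:
    """Axis 1: does this task put <img> placeholders before the user prompt?"""
    if task in IMAGE_INPUT_TASKS:
        return True
    if task in TEXT_ONLY_INPUT_TASKS:
        return False
    raise ValueError(f"unknown task: {task!r}")


def trigger_tag(bot_task):
    """Axis 2: tag appended after 'Assistant:' (None means a plain turn)."""
    if bot_task not in TRIGGER_TAGS:
        # think_recaption and vanilla change more than the tag; not modeled here.
        raise ValueError(f"bot_task not covered by this sketch: {bot_task!r}")
    return TRIGGER_TAGS[bot_task]
```

Because the axes are orthogonal, emits_img_placeholder("it2i") and trigger_tag("think") can be evaluated independently of each other.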
HUNYUAN_IMAGE3_SPECIAL_TOKEN_IDS module-attribute ¶
HUNYUAN_IMAGE3_SPECIAL_TOKEN_IDS: dict[str, int] = {
    "<|endoftext|>": 127957,
    "<|startoftext|>": 127958,
    "<boi>": 128000,
    "<eoi>": 128001,
    "<img>": 128006,
    "<cfg>": 128010,
    "<recaption>": 128018,
    "</recaption>": 128019,
    "<think>": 128023,
    "</think>": 128024,
    "<answer>": 128025,
    "</answer>": 128026,
    "<img_size_1024>": 128037,
    "<img_ratio_0>": 128044,
    "<img_ratio_32>": 128076,
    "<img_ratio_33>": 130103,
    "<img_ratio_36>": 130106,
}
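One practical use of a table like this is decoding the tail of a sampled id sequence while debugging. The sketch below copies a subset of the ids listed above into a local dict so that it runs without the package installed; the helper describe_ids is hypothetical and not part of this module.

```python
# Subset of the ids listed above, copied locally so the sketch is standalone.
SPECIAL_TOKEN_IDS = {
    "<boi>": 128000,
    "</recaption>": 128019,
    "<answer>": 128025,
    "<img_size_1024>": 128037,
    "<img_ratio_32>": 128076,
}

# Reverse map: token id -> token string.
ID_TO_TOKEN = {token_id: token for token, token_id in SPECIAL_TOKEN_IDS.items()}


def describe_ids(ids):
    """Replace known special-token ids with their names; keep others as text."""
    return [ID_TO_TOKEN.get(i, str(i)) for i in ids]


tail = describe_ids([128019, 128025, 128000, 128037, 128076])
# tail == ['</recaption>', '<answer>', '<boi>', '<img_size_1024>', '<img_ratio_32>']
```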
PromptTokensResult dataclass ¶
available_bot_tasks ¶
Sorted list of bot_task values (with None first).
available_tasks ¶
Sorted list of task values accepted by the prompt builders.
build_prompt ¶
build_prompt(
    user_prompt: str,
    task: str = "it2i",
    bot_task: str | None | _DefaultBotTask = _DEFAULT_BOT_TASK,
    sys_type: str | None = None,
    custom_system_prompt: str | None = None,
    num_images: int = 1,
) -> str
Build a HunyuanImage-3.0 prompt as a string (legacy/compat path).
build_prompt_tokens ¶
build_prompt_tokens(
    user_prompt: str,
    tokenizer,
    task: str = "it2i",
    bot_task: str | None | _DefaultBotTask = _DEFAULT_BOT_TASK,
    sys_type: str | None = None,
    custom_system_prompt: str | None = None,
    num_images: int = 1,
) -> PromptTokensResult
Segment-by-segment tokenization that matches HF apply_chat_template.
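As a rough illustration of what segment-by-segment tokenization generally means, the sketch below encodes each text segment on its own and splices special tokens in by id, rather than encoding one concatenated string. It is not this function's implementation: ToyTokenizer is a stand-in for a HF tokenizer, tokenize_segments is a hypothetical helper, and the segment strings and whitespace are assumptions chosen only for the demonstration.

```python
# Two ids taken from HUNYUAN_IMAGE3_SPECIAL_TOKEN_IDS above.
SPECIAL = {"<img>": 128006, "<think>": 128023}


class ToyTokenizer:
    """Stand-in for a HF tokenizer: emits one id per UTF-8 byte."""

    def encode(self, text, add_special_tokens=False):
        return list(text.encode("utf-8"))


def tokenize_segments(segments, tokenizer):
    """Encode text segments separately; insert special tokens by id."""
    ids = []
    for seg in segments:
        if seg in SPECIAL:
            # Special tokens never pass through the text encoder.
            ids.append(SPECIAL[seg])
        else:
            ids.extend(tokenizer.encode(seg, add_special_tokens=False))
    return ids


# Hypothetical layout; the real template text is defined by the module.
ids = tokenize_segments(
    ["User: ", "<img>", "describe this", "\n\nAssistant: ", "<think>"],
    ToyTokenizer(),
)
```

Keeping special tokens out of the text encoder avoids any dependence on how the tokenizer would split a literal string such as "<img>".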
resolve_stop_token_ids ¶
resolve_stop_token_ids(
    task: str = "it2i",
    bot_task: str | None | _DefaultBotTask = _DEFAULT_BOT_TASK,
    tokenizer: Any | None = None,
    image_size: str | None = None,
) -> list[int]
AR stop-token ids for a given (task, bot_task) generation request.
Image-output tasks (it2i / t2i) stop on any <img_ratio_*> token. Upstream modeling_hunyuan_image_3.py::generate_image (lines 3289-3303) sets final_stop_tokens to the full ratio-token range when need_ratio is true, then strips the trailing ratio token before passing the CoT to the image stage. Under _stage_transitions, the AR's natural trajectory is </recaption><answer><boi><img_size_base><img_ratio_X>. Stopping at the ratio token means the KV cache ends exactly at the prefix the DiT reuses, and ar2diffusion can read the ratio off the last sampled token without the AR spending decode steps on <|endoftext|>.
Text-output tasks (i2t / t2t) stop on <answer>: the AR is the final stage, and the comprehension response sits inside the <answer> body, so the answer-open tag is the natural CoT/recaption terminator.
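The task-dependent choice described above can be sketched as follows. This is a simplified illustration, not the function's actual body: sketch_stop_token_ids is a hypothetical name, it ignores the bot_task, tokenizer, and image_size parameters, and it assumes the ratio ids are contiguous between the endpoints listed in HUNYUAN_IMAGE3_SPECIAL_TOKEN_IDS (<img_ratio_0> to <img_ratio_32>, and <img_ratio_33> to <img_ratio_36>).

```python
ANSWER_OPEN_ID = 128025  # "<answer>"

# Inclusive (first, last) id ranges for <img_ratio_*>. The endpoints come from
# the special-token table; contiguity of the interior ids is an assumption.
RATIO_ID_RANGES = [(128044, 128076), (130103, 130106)]


def sketch_stop_token_ids(task: str) -> list[int]:
    """Return stop ids for a task, following the rule described above."""
    if task in ("it2i", "t2i"):
        # Image output: stop on any ratio token.
        return [i for first, last in RATIO_ID_RANGES for i in range(first, last + 1)]
    if task in ("i2t", "t2t"):
        # Text output: stop on the answer-open tag.
        return [ANSWER_OPEN_ID]
    raise ValueError(f"unknown task: {task!r}")
```

Under the contiguity assumption, the image-output branch yields 37 ids (33 in the first range, 4 in the second), while the text-output branch yields a single id.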