vllm_omni.diffusion.models.hunyuan_image3.prompt_utils ¶

Shared prompt-template construction for HunyuanImage-3.0-Instruct.

Single source of truth for the AR-prefill prompt format used by the example scripts and any downstream caller that needs to build HunyuanImage3 chat-template token sequences without invoking the full diffusion pipeline tokenizer wrapper.

The DiT pipeline (pipeline_hunyuan_image3.py) builds prompts through TokenizerWrapper.apply_chat_template, which eagerly consumes JointImageInfo objects produced by image preprocessing. The example flow uses an <img> placeholder + multi_modal_data instead, so it needs a lighter-weight builder that only requires a HF tokenizer. This module provides that builder; the (task, bot_task) -> template mapping below is the canonical mapping for both flows.

Two orthogonal axes:

task selects the I/O modality combination, which only controls whether <img> placeholders are emitted between User: and the user prompt: i2t / it2i produce them, t2t / t2i do not.
bot_task selects the prompting mode and drives both the system prompt and the trigger tag appended after Assistant:. None (default) gives a plain Assistant turn under the unified prompt; think / recaption switch the trigger tag to <think> / <recaption>; think_recaption swaps the system prompt for the dedicated combined-mode template; vanilla drops the chat structure entirely (pretrain template, t2i only).
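
As a rough illustration of how the two axes stay independent, the sketch below assembles a toy prompt string. The name `sketch_prompt`, the whitespace, and the omission of the system prompt are assumptions made for illustration; this is not the module's actual template. The `think_recaption` and `vanilla` modes are left out because they change the system prompt or drop the chat structure.

```python
from __future__ import annotations

# Illustrative only: a toy layout, not the real HunyuanImage-3.0 template.
IMAGE_INPUT_TASKS = {"i2t", "it2i"}  # tasks that emit <img> placeholders
TRIGGER_TAGS = {None: "", "think": "<think>", "recaption": "<recaption>"}


def sketch_prompt(
    user_prompt: str,
    task: str,
    bot_task: str | None = None,
    num_images: int = 1,
) -> str:
    """Assemble a toy prompt showing which axis controls which part."""
    if bot_task not in TRIGGER_TAGS:
        raise ValueError(f"bot_task {bot_task!r} is not covered by this sketch")
    # Axis 1: task only decides whether <img> placeholders follow "User:".
    images = "<img>" * num_images if task in IMAGE_INPUT_TASKS else ""
    # Axis 2: bot_task decides the trigger tag appended after "Assistant:".
    return f"User: {images}{user_prompt}\n\nAssistant: {TRIGGER_TAGS[bot_task]}"
```

Changing `task` from `t2i` to `it2i` only adds the placeholders, and changing `bot_task` from `None` to `think` only changes the tail of the string.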

HUNYUAN_IMAGE3_SPECIAL_TOKEN_IDS `module-attribute` ¶

HUNYUAN_IMAGE3_SPECIAL_TOKEN_IDS: dict[str, int] = {
    "<|endoftext|>": 127957,
    "<|startoftext|>": 127958,
    "<boi>": 128000,
    "<eoi>": 128001,
    "<img>": 128006,
    "<cfg>": 128010,
    "<recaption>": 128018,
    "</recaption>": 128019,
    "<think>": 128023,
    "</think>": 128024,
    "<answer>": 128025,
    "</answer>": 128026,
    "<img_size_1024>": 128037,
    "<img_ratio_0>": 128044,
    "<img_ratio_32>": 128076,
    "<img_ratio_33>": 130103,
    "<img_ratio_36>": 130106,
}

MAX_IMAGES_PER_REQUEST `module-attribute` ¶

MAX_IMAGES_PER_REQUEST = 3

PromptTokensResult `dataclass` ¶

system_prompt_type `instance-attribute` ¶

system_prompt_type: str

token_ids `instance-attribute` ¶

token_ids: list[int]

available_bot_tasks ¶

available_bot_tasks() -> list[str | None]

Sorted list of bot_task values (with None first).
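
Plain `sorted()` raises `TypeError` on a mix of `None` and `str`, so a "None first" ordering needs a key function. The sketch below shows one way to get it; the helper name `sorted_none_first` is hypothetical, and the value set is taken from the bot_task description above, so the real function may return a different set.

```python
from __future__ import annotations

# bot_task values named in the module description (assumed to be the full set).
BOT_TASKS = {"think", None, "vanilla", "recaption", "think_recaption"}


def sorted_none_first(values) -> list[str | None]:
    # The key is a tuple: (False, "") for None sorts before (True, name).
    return sorted(values, key=lambda v: (v is not None, v or ""))


print(sorted_none_first(BOT_TASKS))
# [None, 'recaption', 'think', 'think_recaption', 'vanilla']
```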

available_tasks ¶

available_tasks() -> list[str]

Sorted list of task values accepted by the prompt builders.

build_prompt ¶

build_prompt(
    user_prompt: str,
    task: str = "it2i",
    bot_task: str | None | _DefaultBotTask = _DEFAULT_BOT_TASK,
    sys_type: str | None = None,
    custom_system_prompt: str | None = None,
    num_images: int = 1,
) -> str

Build a HunyuanImage-3.0 prompt as a string (legacy/compat path).

build_prompt_tokens ¶

build_prompt_tokens(
    user_prompt: str,
    tokenizer,
    task: str = "it2i",
    bot_task: str | None | _DefaultBotTask = _DEFAULT_BOT_TASK,
    sys_type: str | None = None,
    custom_system_prompt: str | None = None,
    num_images: int = 1,
) -> PromptTokensResult

Segment-by-segment tokenization that matches HF apply_chat_template.
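
The general idea of segment-by-segment tokenization can be sketched as follows: each text segment is encoded on its own without automatic special tokens, and special tokens are spliced in by ID rather than parsed out of the text. Everything here is an assumption for illustration: `ToyTokenizer` stands in for a HF tokenizer, `SketchResult` mirrors the two fields of `PromptTokensResult`, and the segment layout and the `"illustrative"` system-prompt type are placeholders, not the real template.

```python
from __future__ import annotations

from dataclasses import dataclass

IMG_TOKEN_ID = 128006  # "<img>" in HUNYUAN_IMAGE3_SPECIAL_TOKEN_IDS


class ToyTokenizer:
    """Hypothetical stand-in for a HF tokenizer: one id per character."""

    def encode(self, text: str, add_special_tokens: bool = False) -> list[int]:
        return [ord(ch) for ch in text]


@dataclass
class SketchResult:
    token_ids: list[int]
    system_prompt_type: str


def sketch_build_tokens(user_prompt: str, tokenizer, num_images: int = 1) -> SketchResult:
    # A segment is either raw text (str) or an already-resolved token id (int).
    segments: list[str | int] = ["User: "]
    segments += [IMG_TOKEN_ID] * num_images
    segments += [user_prompt, "\n\nAssistant: "]

    token_ids: list[int] = []
    for seg in segments:
        if isinstance(seg, int):
            token_ids.append(seg)  # special token: insert the id directly
        else:
            token_ids.extend(tokenizer.encode(seg, add_special_tokens=False))
    return SketchResult(token_ids=token_ids, system_prompt_type="illustrative")
```

Inserting special tokens by ID keeps the segment boundaries fixed, so a tokenizer cannot merge text across a boundary or split a tag such as `<img>` into ordinary sub-word pieces.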

resolve_stop_token_ids ¶

resolve_stop_token_ids(
    task: str = "it2i",
    bot_task: str | None | _DefaultBotTask = _DEFAULT_BOT_TASK,
    tokenizer: Any | None = None,
    image_size: str | None = None,
) -> list[int]

AR stop-token ids for a given (task, bot_task) generation request.

Image-output tasks (it2i / t2i) stop on any <img_ratio_*> token. Upstream modeling_hunyuan_image_3.py::generate_image (lines 3289-3303) sets final_stop_tokens to the full ratio token range when need_ratio is true, then strips the trailing ratio token before passing the CoT to the image stage. The AR stage's natural trajectory under _stage_transitions is </recaption><answer><boi><img_size_base><img_ratio_X>. Stopping at the ratio token means the KV cache ends exactly at the prefix the DiT reuses, and ar2diffusion can read the ratio off the last sampled token without the AR stage wasting decode steps on <|endoftext|>.

Text-output tasks (i2t / t2t) stop on <answer> -- the AR stage is the final stage, and the comprehension response sits inside the <answer> body, so the answer-open tag is the natural CoT/recaption terminator.
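
The selection rule described above can be sketched as below. This is a simplified illustration, not the module's implementation: it ignores `bot_task`, `tokenizer`, and `image_size`, and it assumes the ratio token IDs are contiguous between the endpoints listed in `HUNYUAN_IMAGE3_SPECIAL_TOKEN_IDS` (`<img_ratio_0>` to `<img_ratio_32>`, and `<img_ratio_33>` to `<img_ratio_36>`), which the table suggests but does not state. The helper names are hypothetical.

```python
from __future__ import annotations

# Inclusive (first_id, last_id) pairs; contiguity inside each pair is an assumption.
RATIO_RANGES = [(128044, 128076), (130103, 130106)]
ANSWER_OPEN_ID = 128025  # "<answer>"


def sketch_stop_token_ids(task: str) -> list[int]:
    """Stop on any ratio token for image output, on <answer> for text output."""
    if task in {"it2i", "t2i"}:
        return [tid for lo, hi in RATIO_RANGES for tid in range(lo, hi + 1)]
    if task in {"i2t", "t2t"}:
        return [ANSWER_OPEN_ID]
    raise ValueError(f"unknown task: {task!r}")


def sketch_ratio_index(token_id: int) -> int:
    """Recover X from the <img_ratio_X> token the AR stage stopped on."""
    offset = 0
    for lo, hi in RATIO_RANGES:
        if lo <= token_id <= hi:
            return offset + (token_id - lo)
        offset += hi - lo + 1
    raise ValueError(f"{token_id} is not a ratio token id")
```

Under these assumptions an image-output request gets 37 stop IDs, and the last sampled token maps straight back to its ratio index, which is the value the diffusion stage needs.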

resolve_sys_type ¶

resolve_sys_type(bot_task: str | None) -> str

Default system-prompt type for a given bot_task.