vllm_omni.diffusion.worker.input_batch

Diffusion input-batch structures following the MRV2-style vLLM layout.

Request states remain the only persistent source of truth. Static tensors are normalized and padded onto the request state once, while InputBatch assembles an ephemeral step-local view. Dynamic tensors are re-gathered every step, and scatter_latents() writes step outputs back into the request states using idx_mapping.

DiffusionInputBatch module-attribute

DiffusionInputBatch = InputBatch

InputBatch dataclass

Ephemeral step-level batch view.

Static request-local tensors are normalized and padded onto DiffusionRequestState itself, making the request state the persistent source of truth. InputBatch only assembles a contiguous view for the current step and refreshes dynamic fields in-place when composition is unchanged.
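The in-place refresh described above can be illustrated with a small NumPy sketch. It is not the library's implementation: ToyState, ToyBatch, and build_or_refresh are hypothetical stand-ins, and comparing request-ID lists is only an assumed way of detecting unchanged composition.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ToyState:
    """Hypothetical stand-in for DiffusionRequestState."""
    request_id: str
    latents: np.ndarray  # dynamic: changes every denoising step
    timestep: float      # dynamic


@dataclass
class ToyBatch:
    """Hypothetical stand-in for InputBatch (only a few fields)."""
    request_ids: list
    latents: np.ndarray
    timesteps: np.ndarray


def build_or_refresh(states, cached=None):
    ids = [s.request_id for s in states]
    if cached is not None and cached.request_ids == ids:
        # Same composition (assumed test): overwrite the dynamic buffers
        # in place instead of allocating a new batch.
        for row, s in enumerate(states):
            cached.latents[row] = s.latents
            cached.timesteps[row] = s.timestep
        return cached
    # Composition changed: assemble a fresh contiguous view.
    return ToyBatch(
        request_ids=ids,
        latents=np.stack([s.latents for s in states]),
        timesteps=np.array([s.timestep for s in states], dtype=np.float32),
    )
```

In this sketch the batch object is reused only while the ordered list of request IDs is identical; any change in membership or order falls back to a full rebuild.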

cfg_normalize class-attribute instance-attribute

cfg_normalize: bool = False

do_true_cfg class-attribute instance-attribute

do_true_cfg: bool = False

guidance class-attribute instance-attribute

guidance: Tensor | None = None

idx_mapping instance-attribute

idx_mapping: Tensor

idx_mapping_np instance-attribute

idx_mapping_np: ndarray

image_latents class-attribute instance-attribute

image_latents: Tensor | None = None

img_shapes class-attribute instance-attribute

img_shapes: list | None = None

latents instance-attribute

latents: Tensor

negative_prompt_embeds instance-attribute

negative_prompt_embeds: Tensor | None

negative_prompt_embeds_mask instance-attribute

negative_prompt_embeds_mask: Tensor | None

negative_txt_seq_lens class-attribute instance-attribute

negative_txt_seq_lens: list[int] | None = None

num_reqs instance-attribute

num_reqs: int

num_reqs_after_padding instance-attribute

num_reqs_after_padding: int

prompt_embeds instance-attribute

prompt_embeds: Tensor

prompt_embeds_mask instance-attribute

prompt_embeds_mask: Tensor | None

request_ids instance-attribute

request_ids: list[str]

timesteps instance-attribute

timesteps: Tensor

true_cfg_scale class-attribute instance-attribute

true_cfg_scale: float = 4.0

txt_seq_lens class-attribute instance-attribute

txt_seq_lens: list[int] | None = None

make_batch classmethod

make_batch(
    states: Sequence[DiffusionRequestState],
    idx_mapping: Tensor | None = None,
    cached_batch: InputBatch | None = None,
) -> InputBatch

Build a temporary step-local batch view from request states.
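As a rough illustration of what "building a step-local view" could look like, the sketch below gathers per-request latents into one contiguous array. It assumes that row i of the batch is taken from the state at index idx_mapping[i] and that a missing idx_mapping means identity order; both are assumptions, and gather_latents is a hypothetical helper, not part of vllm_omni.

```python
import numpy as np


def gather_latents(state_latents, idx_mapping=None):
    """Stack per-request latents into one contiguous batch array.

    Assumed convention: batch row i comes from state_latents[idx_mapping[i]].
    """
    if idx_mapping is None:
        # Assumed default: batch rows follow the order of the states.
        idx_mapping = np.arange(len(state_latents), dtype=np.int32)
    batch = np.stack([state_latents[int(j)] for j in idx_mapping])
    return batch, idx_mapping
```

np.stack copies the data, so the batch in this sketch is a throwaway view and the per-request arrays stay untouched until results are written back explicitly.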

scatter_latents

scatter_latents(
    states: Sequence[DiffusionRequestState],
    input_batch: InputBatch,
) -> None

Scatter the step-updated latents back into persistent request states.

This is the CPU fallback of the vLLM-style post-update path. The mapping is driven entirely by input_batch.idx_mapping_np so the runner remains free to keep request states in its own persistent storage layout.
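A minimal CPU-side sketch of such a scatter is shown below. It assumes that idx_mapping_np[row] holds the storage index of the request occupying batch row `row`, and that rows at or beyond num_reqs are padding to be skipped; scatter_rows is a hypothetical helper, not the library function.

```python
import numpy as np


def scatter_rows(state_latents, batch_latents, idx_mapping_np, num_reqs):
    """Write each real batch row back to the request slot it came from."""
    for row in range(num_reqs):
        # Assumed convention: idx_mapping_np[row] is the storage index
        # of the request that occupies this batch row.
        slot = int(idx_mapping_np[row])
        # Copy so the persistent state does not alias the step-local buffer.
        state_latents[slot] = batch_latents[row].copy()
```

Because only the mapping decides where each row lands, the persistent storage in this sketch can be ordered independently of the batch.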