vllm_omni.diffusion.forward_context

ForwardContext dataclass

Forward context for diffusion models.

attn_metadata class-attribute instance-attribute

attn_metadata: (
    dict[str, AttentionMetadata]
    | list[dict[str, AttentionMetadata]]
    | None
) = None

denoise_step_idx class-attribute instance-attribute

denoise_step_idx: int | None = None

omni_diffusion_config class-attribute instance-attribute

omni_diffusion_config: OmniDiffusionConfig | None = None

ref_latent class-attribute instance-attribute

ref_latent: Tensor | None = None

sp_active property

sp_active: bool

Returns True when sequence-parallel (SP) attention should be enabled.

  • If _sp_plan hooks are applied: the result is determined by _sp_shard_depth (0 means outside a sharded region).
  • If _sp_plan hooks are not applied: defaults to True when sequence_parallel_size > 1, since _sp_shard_depth is only meaningful within the _sp_plan hook mechanism.
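The two rules above can be sketched as a small property. This is a minimal stand-in, not the library's implementation: SketchContext is a hypothetical class, and it assumes that _sp_shard_depth is an integer counter and that sequence_parallel_size is available directly on the context (in the real module it presumably comes from the diffusion config).

```python
from dataclasses import dataclass


@dataclass
class SketchContext:
    """Hypothetical stand-in for ForwardContext; only fields relevant to sp_active."""

    sp_plan_hooks_applied: bool = False
    sequence_parallel_size: int = 1  # assumption: read from the diffusion config
    _sp_shard_depth: int = 0  # assumption: > 0 while inside a sharded region

    @property
    def sp_active(self) -> bool:
        if self.sp_plan_hooks_applied:
            # Hooks maintain the depth; 0 means we are outside any sharded region.
            return self._sp_shard_depth > 0
        # Without hooks the depth carries no information, so use the SP size.
        return self.sequence_parallel_size > 1
```

With hooks applied, the property follows the depth regardless of the SP size; without hooks, any SP size above 1 enables it.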

sp_original_seq_len class-attribute instance-attribute

sp_original_seq_len: int | None = None

sp_padding_size class-attribute instance-attribute

sp_padding_size: int = 0

sp_plan_hooks_applied class-attribute instance-attribute

sp_plan_hooks_applied: bool = False

split_text_embed_in_sp class-attribute instance-attribute

split_text_embed_in_sp: bool = False

vllm_config class-attribute instance-attribute

vllm_config: VllmConfig | None = None

create_forward_context

create_forward_context(
    vllm_config: VllmConfig | None = None,
    omni_diffusion_config: OmniDiffusionConfig | None = None,
    attn_metadata: (
        dict[str, AttentionMetadata]
        | list[dict[str, AttentionMetadata]]
        | None
    ) = None,
    split_text_embed_in_sp: bool = False,
    denoise_step_idx: int | None = None,
)

get_forward_context

get_forward_context() -> ForwardContext

Get the current forward context.

get_ulysses_mode

get_ulysses_mode(*, default: str = 'strict') -> str

Resolve the Ulysses-SP mode from the current ForwardContext.

Returns default when ForwardContext is unavailable or the diffusion config is not set.
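The fallback behaviour described above might look roughly like the following. Everything here is illustrative: the classes are stand-ins, the context is passed explicitly rather than read from module state, and the ulysses_mode attribute name and the "example-mode" value are hypothetical placeholders, since the source does not say where the mode is stored or which values exist besides the 'strict' default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchDiffusionConfig:
    ulysses_mode: str = "strict"  # hypothetical attribute name


@dataclass
class SketchContext:
    omni_diffusion_config: Optional[SketchDiffusionConfig] = None


def resolve_ulysses_mode(
    ctx: Optional[SketchContext], *, default: str = "strict"
) -> str:
    # No context at all, or a context without a diffusion config: use the default.
    if ctx is None or ctx.omni_diffusion_config is None:
        return default
    return ctx.omni_diffusion_config.ulysses_mode


# "example-mode" is a placeholder value, not a documented mode.
configured = SketchContext(SketchDiffusionConfig(ulysses_mode="example-mode"))
```

Keeping default keyword-only, as in the documented signature, makes call sites explicit about which fallback they accept.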

is_forward_context_available

is_forward_context_available() -> bool

override_forward_context

override_forward_context(
    forward_context: ForwardContext | None,
)

A context manager that overrides the current forward context for the duration of a specific forward pass.

set_forward_context

set_forward_context(
    vllm_config: VllmConfig | None = None,
    omni_diffusion_config: OmniDiffusionConfig | None = None,
    attn_metadata: (
        dict[str, AttentionMetadata]
        | list[dict[str, AttentionMetadata]]
        | None
    ) = None,
    split_text_embed_in_sp: bool = False,
    denoise_step_idx: int | None = None,
)

A context manager that stores the current forward context, which can carry attention metadata, split_text_embed_in_sp, and similar per-pass state. It is also the place to inject logic common to every model forward pass.

set_forward_context_denoise_step_idx

set_forward_context_denoise_step_idx(
    step_idx: int | None,
) -> None

Set the current diffusion denoise step on the active ForwardContext.

set_forward_context_ref_latent

set_forward_context_ref_latent(
    ref_latent: Tensor | None,
) -> None

Set the per-request reference latent on the active ForwardContext.

Used by img2img-capable DiT models (e.g. Ming-flash-omni-2.0) so the transformer can read the reference latent from request scope instead of module instance state.
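The benefit of request scope over module instance state can be shown with a toy example. The classes below are hypothetical, plain lists stand in for tensors, and concatenating the reference latent onto the input is an invented placeholder for whatever conditioning a real model performs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SketchContext:
    ref_latent: Optional[List[float]] = None  # a Tensor in the real dataclass


class SketchTransformer:
    """Holds shared state only; nothing per-request is stored on the instance."""

    def forward(self, ctx: SketchContext, latent: List[float]) -> List[float]:
        if ctx.ref_latent is None:
            return latent  # no reference latent for this request
        # Placeholder conditioning: join this request's reference to the input.
        return latent + ctx.ref_latent


model = SketchTransformer()
plain = model.forward(SketchContext(), [1.0])
with_ref = model.forward(SketchContext(ref_latent=[9.0]), [1.0])
```

Because the reference latent travels with the context, one model instance can serve requests with different references, or none, without one request's latent leaking into another.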