
vllm_omni.diffusion.cache.cache_dit_backend

cache-dit integration backend for vllm-omni.

This module provides a CacheDiTBackend class to enable cache-dit acceleration on diffusion pipelines in vllm-omni, supporting both single and dual-transformer architectures.

CUSTOM_DIT_ENABLERS module-attribute

CUSTOM_DIT_ENABLERS: dict[str, Callable] = {}
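The listing does not say how this registry is keyed or consulted. A minimal sketch, assuming it maps a pipeline class name to an enabler function and that unregistered pipelines fall back to a generic enabler. `ENABLERS`, `pick_enabler`, the two stub enablers, and the toy pipeline classes are hypothetical stand-ins, not the module's code:

```python
from typing import Any, Callable


# Hypothetical stand-ins for the module's enable_cache_for_* functions.
def enable_generic(pipeline: Any, cache_config: Any) -> Callable[[int], None]:
    return lambda num_inference_steps: None


def enable_special(pipeline: Any, cache_config: Any) -> Callable[[int], None]:
    return lambda num_inference_steps: None


# Assumed layout: pipeline class name -> enabler function.
ENABLERS: dict[str, Callable] = {"SpecialPipeline": enable_special}


def pick_enabler(pipeline: Any) -> Callable:
    """Return the registered enabler for this pipeline class, else the generic one."""
    return ENABLERS.get(type(pipeline).__name__, enable_generic)


class SpecialPipeline: ...


class OtherPipeline: ...
```

Keying on a class name keeps the lookup free of imports for every supported pipeline; whether the real registry is keyed this way is an assumption.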

logger module-attribute

logger = init_logger(__name__)

BagelCachedAdapter

Bases: CachedAdapter

Custom CachedAdapter for Bagel that uses BagelCachedContextManager and BagelCachedBlocks.

collect_unified_blocks classmethod

collect_unified_blocks(
    block_adapter: BlockAdapter, contexts_kwargs: list[dict]
) -> list[dict[str, ModuleList]]

create_context classmethod

create_context(
    block_adapter: BlockAdapter, **context_kwargs
) -> tuple[list[str], list[dict[str, Any]]]

BagelCachedBlocks

Bases: CachedBlocks_Pattern_0_1_2

Custom CachedBlocks for Bagel that safely handles NaiveCache objects by adding isinstance checks in call_Mn_blocks and compute_or_prune.

call_Mn_blocks

call_Mn_blocks(
    hidden_states: Tensor,
    encoder_hidden_states: Tensor,
    *args,
    **kwargs,
)

compute_or_prune

compute_or_prune(
    block_id: int,
    block,
    hidden_states: Tensor,
    encoder_hidden_states: Tensor,
    *args,
    **kwargs,
)

BagelCachedContextManager

Bases: CachedContextManager

Custom CachedContextManager for Bagel that safely handles NaiveCache objects (mapped to encoder_hidden_states) by skipping tensor operations on them.

apply_cache

apply_cache(
    hidden_states: Tensor,
    encoder_hidden_states: Tensor = None,
    prefix: str = "Bn",
    encoder_prefix: str = "Bn_encoder",
) -> tuple[Tensor, Tensor | None]
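The guard described for the Bagel classes can be pictured as follows. This is a sketch only: `Tensor` and `NaiveCache` are small stand-ins so the example runs without PyTorch, `add_residual` is a hypothetical helper, and the assumption that the skipped tensor operation is a residual addition is not confirmed by this listing.

```python
from typing import Any, Optional


class Tensor:
    """Stand-in for torch.Tensor so the sketch runs without PyTorch."""

    def __init__(self, value: float) -> None:
        self.value = value

    def __add__(self, other: "Tensor") -> "Tensor":
        return Tensor(self.value + other.value)


class NaiveCache:
    """Stand-in for Bagel's NaiveCache; the real class is not shown here."""


def add_residual(state: Any, residual: Optional[Tensor]) -> Any:
    """Apply a cached residual only when `state` is a real tensor."""
    if residual is None or not isinstance(state, Tensor):
        # Non-tensor objects (for example a NaiveCache) pass through unchanged.
        return state
    return state + residual
```

Returning the non-tensor object untouched lets it keep travelling through the encoder_hidden_states slot without any arithmetic being attempted on it.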

CacheDiTBackend

Bases: CacheBackend

Backend class for cache-dit acceleration on diffusion pipelines.

This class implements cache-dit acceleration (DBCache, SCM, TaylorSeer) using the cache-dit library. It inherits from CacheBackend and provides a unified interface for managing cache-dit acceleration on diffusion models.

Attributes:

Name Type Description
config

Cache configuration (DiffusionCacheConfig instance), inherited from CacheBackend.

enabled

Whether cache-dit is enabled on this pipeline, inherited from CacheBackend.

_refresh_func Callable[[Any, int, bool], None] | None

Internal refresh function for updating cache context.

_last_num_inference_steps int | None

Last num_inference_steps used for refresh optimization.
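One plausible reading of the "refresh optimization" is that a refresh is skipped when the step count has not changed. A minimal sketch under that assumption; `RefreshGuard` is a hypothetical helper, and its refresh function takes only the step count, whereas the documented `_refresh_func` also receives the pipeline and a verbose flag:

```python
from typing import Callable, Optional


class RefreshGuard:
    """Hypothetical helper: calls refresh_func only when the step count changes."""

    def __init__(self, refresh_func: Callable[[int], None]) -> None:
        self._refresh_func = refresh_func
        self._last_num_inference_steps: Optional[int] = None

    def refresh(self, num_inference_steps: int) -> bool:
        """Return True if the underlying refresh ran, False if it was skipped."""
        if num_inference_steps == self._last_num_inference_steps:
            return False
        self._refresh_func(num_inference_steps)
        self._last_num_inference_steps = num_inference_steps
        return True


calls: list[int] = []
guard = RefreshGuard(calls.append)
guard.refresh(50)
guard.refresh(50)  # same value as before, so this call is skipped
guard.refresh(30)
```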

enable

enable(pipeline: Any) -> None

Enable cache-dit on the pipeline if configured.

This method applies cache-dit acceleration to the appropriate transformer(s) in the pipeline. It handles both single-transformer and dual-transformer architectures (e.g., Wan2.2).

Parameters:

Name Type Description Default
pipeline Any

The diffusion pipeline instance.

required

is_enabled

is_enabled() -> bool

Check if cache-dit is enabled on this pipeline.

Returns:

Type Description
bool

True if cache-dit is enabled, False otherwise.

refresh

refresh(
    pipeline: Any,
    num_inference_steps: int,
    verbose: bool = True,
) -> None

Refresh cache context with new num_inference_steps.

This method updates the cache context when num_inference_steps changes during inference. For dual-transformer models (e.g., Wan2.2), it automatically splits the steps based on boundary_ratio.

Parameters:

Name Type Description Default
pipeline Any

The diffusion pipeline instance.

required
num_inference_steps int

New number of inference steps.

required
verbose bool

Whether to log refresh operations.

True
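The exact splitting rule is not given here. As an illustration only, the sketch below assumes a linear timestep schedule and assigns every step whose timestep is at or above `boundary_ratio * num_train_timesteps` to the first transformer. `split_steps` and its `num_train_timesteps` default are hypothetical, and the real backend may count or round differently:

```python
def split_steps(
    num_inference_steps: int,
    boundary_ratio: float,
    num_train_timesteps: int = 1000,
) -> tuple[int, int]:
    """Return (steps for the first transformer, steps for the second)."""
    if num_inference_steps <= 0:
        raise ValueError("num_inference_steps must be positive")
    boundary = boundary_ratio * num_train_timesteps
    # Assumed linear schedule running from num_train_timesteps down toward 0.
    timesteps = [
        num_train_timesteps * (1 - i / num_inference_steps)
        for i in range(num_inference_steps)
    ]
    first = sum(1 for t in timesteps if t >= boundary)
    return first, num_inference_steps - first


first_steps, second_steps = split_steps(40, 0.875)
```

Each transformer's cache context would then be refreshed with its own share of the steps, so that per-transformer step counters stay consistent with the work each one actually performs.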

SensenovaCachedAdapter

Bases: CachedAdapter

Custom CachedAdapter for SenseNova-U1 that uses SensenovaCachedBlocks.

collect_unified_blocks classmethod

collect_unified_blocks(
    block_adapter: BlockAdapter, contexts_kwargs: list[dict]
) -> list[dict[str, ModuleList]]

SensenovaCachedBlocks

Bases: CachedBlocks_Pattern_3_4_5

Custom CachedBlocks for SenseNova-U1 that only caches image-token hidden states during denoising.

forward

forward(hidden_states: Tensor, *args, **kwargs)

Wan22S2VCachedAdapter

Bases: CachedAdapter

CacheDiT adapter that uses Wan22S2VCachedBlocks for S2V audio injection.

Only overrides collect_unified_blocks to use Wan22S2VCachedBlocks (which calls after_transformer_block per-layer internally). The base class mock_transformer handles the forward wrapping. To prevent double injection, after_transformer_block is permanently replaced with a no-op in enable_cache_for_wan22_s2v().

collect_unified_blocks classmethod

collect_unified_blocks(
    block_adapter: BlockAdapter, contexts_kwargs: list[dict]
) -> list[dict[str, ModuleList]]

Wan22S2VCachedBlocks

Bases: CachedBlocks_Pattern_3_4_5

CacheDiT blocks wrapper that preserves S2V per-layer audio injection.

call_Bn_blocks

call_Bn_blocks(hidden_states: Tensor, *args, **kwargs)

call_Fn_blocks

call_Fn_blocks(hidden_states: Tensor, *args, **kwargs)

call_Mn_blocks

call_Mn_blocks(hidden_states: Tensor, *args, **kwargs)

call_blocks

call_blocks(hidden_states: Tensor, *args, **kwargs)

cache_summary

cache_summary(pipeline: Any, details: bool = True) -> None

enable_cache_for_bagel

enable_cache_for_bagel(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for Bagel model (via OmniDiffusion pipeline).

Parameters:

Name Type Description Default
pipeline Any

The OmniDiffusion pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns:

Type Description
Callable[[int], None]

A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_cosmos3

enable_cache_for_cosmos3(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for Cosmos3.

Cosmos3 has a dual-pathway architecture (UND + GEN) but only the GEN pathway (gen_layers) runs at every denoising step. The UND pathway computes once and its K/V are cached by the pipeline itself; no cache-dit needed there. We wrap only gen_layers via BlockAdapter.

Parameters:

Name Type Description Default
pipeline Any

The Cosmos3 pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns:

Type Description
Callable[[int], None]

A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_dit

enable_cache_for_dit(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for regular single-transformer DiT models.

Parameters:

Name Type Description Default
pipeline Any

The diffusion pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns:

Type Description
Callable[[int], None]

A refresh function that can be called to update cache context with new num_inference_steps.
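Every `enable_cache_for_*` function shares this shape: it sets up caching once and hands back a callable that takes a new step count. A minimal sketch of that closure pattern, with hypothetical names throughout (`enable_cache_sketch`, `ToyPipeline`, the `cache_state` attribute, and the `example_option` key are illustrative, and the real functions delegate to cache-dit rather than a dict):

```python
from typing import Any, Callable


def enable_cache_sketch(pipeline: Any, cache_config: dict) -> Callable[[int], None]:
    """Attach toy cache state to the pipeline and return a refresh closure."""
    # Hypothetical per-pipeline state; real code would configure cache-dit here.
    state = {"num_inference_steps": None, **cache_config}
    pipeline.cache_state = state

    def refresh(num_inference_steps: int) -> None:
        # The closure captures `state`, so callers only pass the new step count.
        state["num_inference_steps"] = num_inference_steps

    return refresh


class ToyPipeline:
    pass


pipe = ToyPipeline()
refresh = enable_cache_sketch(pipe, {"example_option": 4})
refresh(28)
```

Returning a closure keeps pipeline-specific details out of the caller, which only needs to remember the callable.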

enable_cache_for_dreamid_omni

enable_cache_for_dreamid_omni(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for DreamID-Omni fused pipeline.

Parameters:

Name Type Description Default
pipeline Any

The DreamIDOmni pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns:

Type Description
Callable[[int], None]

A refresh function that can be called with a new num_inference_steps to update the cache context for the pipeline.

enable_cache_for_ernie_image

enable_cache_for_ernie_image(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for ERNIE-Image pipeline.

ERNIE-Image blocks have the signature

forward(x, rotary_pos_emb, temb, attention_mask) -> x

where x is hidden_states (concatenated image and text tokens). This matches Pattern_3, which expects:

- Input: hidden_states only
- Output: hidden_states only

Parameters:

Name Type Description Default
pipeline Any

The ERNIE-Image pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns: A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_flux

enable_cache_for_flux(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for Flux.1-dev pipeline.

Parameters:

Name Type Description Default
pipeline Any

The Flux pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns: A refresh function that can be called with a new num_inference_steps to update the cache context for the pipeline.

enable_cache_for_flux2

enable_cache_for_flux2(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for Flux.2-dev pipeline.

Parameters:

Name Type Description Default
pipeline Any

The Flux2 pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns: A refresh function that can be called with a new num_inference_steps to update the cache context for the pipeline.

enable_cache_for_flux2_klein

enable_cache_for_flux2_klein(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for FLUX.2-klein-4B pipeline.

Parameters:

Name Type Description Default
pipeline Any

The FLUX.2-klein-4B pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns: A refresh function that can be called with a new num_inference_steps to update the cache context for the pipeline.

enable_cache_for_glm_image

enable_cache_for_glm_image(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for GlmImage pipeline.

Parameters:

Name Type Description Default
pipeline Any

The GlmImage pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns: A refresh function that can be called with a new num_inference_steps to update the cache context for the pipeline.

enable_cache_for_helios

enable_cache_for_helios(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for Helios pipeline.

Helios extends Wan2.2 with multi-term memory patches and guidance cross-attention. Its transformer blocks have the same single-output signature as Wan2.2 blocks (returns hidden_states only), so we use ForwardPattern.Pattern_2.

Parameters:

Name Type Description Default
pipeline Any

The HeliosPipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns:

Type Description
Callable[[int], None]

A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_hunyuan_image3

enable_cache_for_hunyuan_image3(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for HunyuanImage3 pipeline.

HunyuanImage3 stores its main transformer stack at pipeline.model with decoder blocks in pipeline.model.layers.

enable_cache_for_hunyuan_video_15

enable_cache_for_hunyuan_video_15(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for HunyuanVideo 1.5 pipeline.

HunyuanVideo 1.5 uses a single transformer with has_separate_cfg=True (separate conditional/unconditional forward passes). The _sp_plan scatter is applied at the transformer input boundary (empty-string key), so CacheDiT sees already-sharded hidden_states throughout transformer_blocks.

enable_cache_for_longcat_image

enable_cache_for_longcat_image(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for LongCatImage pipeline.

Parameters:

Name Type Description Default
pipeline Any

The LongCatImage pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

enable_cache_for_ltx2

enable_cache_for_ltx2(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for LTX2 pipelines (audio-video transformer blocks).

enable_cache_for_sd3

enable_cache_for_sd3(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for StableDiffusion3Pipeline.

Parameters:

Name Type Description Default
pipeline Any

The StableDiffusion3 pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

enable_cache_for_sensenova_u1

enable_cache_for_sensenova_u1(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for SenseNova-U1 model.

Parameters:

Name Type Description Default
pipeline Any

The SenseNova-U1 pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns:

Type Description
Callable[[int], None]

A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_stable_audio_open

enable_cache_for_stable_audio_open(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for Stable Audio Open pipeline.

Parameters:

Name Type Description Default
pipeline Any

The StableAudioPipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns:

Type Description
Callable[[int], None]

A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_wan22

enable_cache_for_wan22(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for Wan2.2 single or dual-transformer architecture.

A Wan2.2 pipeline has either a single transformer or two (transformer and transformer_2); each transformer present is enabled through BlockAdapter.

Parameters:

Name Type Description Default
pipeline Any

The Wan2.2 pipeline instance.

required
cache_config Any

DiffusionCacheConfig instance with cache configuration.

required

Returns:

Type Description
Callable[[int], None]

A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_wan22_s2v

enable_cache_for_wan22_s2v(
    pipeline: Any, cache_config: Any
) -> Callable[[int], None]

Enable cache-dit for Wan2.2 S2V.

S2V uses a single transformer, but unlike the other Wan2.2 variants its block loop calls each block as block(hidden_states, **kwargs) and keeps the timestep modulation state in e rather than a second positional tensor. CacheDiT Pattern_3 matches that contract: cache hidden states only and pass the remaining conditioning through kwargs unchanged.

The S2V transformer has an after_transformer_block method that injects audio embeddings after specific layers. The cached blocks wrapper (Wan22S2VCachedBlocks._run_block) calls the original internally, so we permanently replace it with a no-op on the transformer to prevent double injection from the main forward loop.
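The double-injection problem and its fix can be shown with a toy model. In this sketch `ToyTransformer`, `enable_toy_cache`, and the integer "hidden state" are hypothetical; only the idea of capturing the original hook and shadowing it with a no-op comes from the description above:

```python
class ToyTransformer:
    """Hypothetical transformer whose forward loop calls a per-layer hook."""

    def __init__(self) -> None:
        self.injections = 0

    def after_transformer_block(self, layer_idx: int, hidden: int) -> int:
        self.injections += 1
        return hidden + 1  # stands in for adding audio embeddings

    def forward(self, hidden: int, run_block) -> int:
        for layer_idx in range(2):
            hidden = run_block(layer_idx, hidden)
            hidden = self.after_transformer_block(layer_idx, hidden)
        return hidden


def enable_toy_cache(model: ToyTransformer):
    # Keep a reference to the original bound method for the wrapper to use.
    original_hook = model.after_transformer_block

    def run_block(layer_idx: int, hidden: int) -> int:
        # The wrapper performs the injection itself, once per layer.
        return original_hook(layer_idx, hidden)

    # Shadow the method on the instance so the outer loop no longer injects.
    model.after_transformer_block = lambda layer_idx, hidden: hidden
    return run_block


model = ToyTransformer()
run_block = enable_toy_cache(model)
result = model.forward(0, run_block)
```

Without the no-op replacement, the hook would run once inside the wrapper and once more in the outer loop for every layer.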

may_enable_cache_dit

may_enable_cache_dit(
    pipeline: Any, od_config: OmniDiffusionConfig
) -> Optional[CacheDiTBackend]

Enable cache-dit on the pipeline if configured (convenience function).

This is a convenience function that creates and enables a CacheDiTBackend. For new code, consider using CacheDiTBackend directly.

Parameters:

Name Type Description Default
pipeline Any

The diffusion pipeline instance.

required
od_config OmniDiffusionConfig

OmniDiffusionConfig with cache configuration.

required

Returns:

Type Description
Optional[CacheDiTBackend]

A CacheDiTBackend instance if cache-dit is enabled, None otherwise.