vllm_omni.diffusion.cache.cache_dit_backend

cache-dit integration backend for vllm-omni.

This module provides the CacheDiTBackend class, which enables cache-dit acceleration on diffusion pipelines in vllm-omni and supports both single- and dual-transformer architectures.

CUSTOM_DIT_ENABLERS module-attribute

CUSTOM_DIT_ENABLERS: dict[str, Callable] = {}
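
The listing above gives only the registry's type. As a minimal sketch of how such a registry might be used, the example below assumes (this page does not confirm it) that keys are pipeline class names and values are enabler functions that return a refresh callable. `TOY_ENABLERS`, `ToyWanPipeline`, `ToyOtherPipeline`, `enable_toy_wan`, `enable_generic`, and `pick_enabler` are hypothetical names used for illustration only.

```python
from typing import Any, Callable

# Hypothetical stand-in for CUSTOM_DIT_ENABLERS; keying by pipeline class
# name is an assumption made for this sketch.
TOY_ENABLERS: dict[str, Callable] = {}


def enable_generic(pipeline: Any, cache_config: Any) -> Callable[[Any, int, bool], None]:
    """Fallback enabler used when no custom entry is registered."""
    def refresh(pipeline: Any, num_inference_steps: int, verbose: bool) -> None:
        pipeline.refreshed_with = ("generic", num_inference_steps)
    return refresh


def enable_toy_wan(pipeline: Any, cache_config: Any) -> Callable[[Any, int, bool], None]:
    """Model-specific enabler registered under the pipeline's class name."""
    def refresh(pipeline: Any, num_inference_steps: int, verbose: bool) -> None:
        pipeline.refreshed_with = ("wan", num_inference_steps)
    return refresh


class ToyWanPipeline:
    refreshed_with = None


class ToyOtherPipeline:
    refreshed_with = None


TOY_ENABLERS["ToyWanPipeline"] = enable_toy_wan


def pick_enabler(pipeline: Any) -> Callable:
    # Look up a custom enabler by class name, else fall back to the generic one.
    return TOY_ENABLERS.get(type(pipeline).__name__, enable_generic)


wan, other = ToyWanPipeline(), ToyOtherPipeline()
pick_enabler(wan)(wan, None)(wan, 20, False)
pick_enabler(other)(other, None)(other, 20, False)
```

A dictionary lookup with a generic fallback keeps model-specific handling out of the common path; the real dispatch logic may differ.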

RefreshCacheContextFunc module-attribute

RefreshCacheContextFunc: TypeAlias = Callable[
    [Any, int, bool], None
]

logger module-attribute

logger = init_logger(__name__)

BagelCachedAdapter

Bases: CachedAdapter

Custom CachedAdapter for Bagel that uses BagelCachedContextManager and BagelCachedBlocks.

collect_unified_blocks classmethod

collect_unified_blocks(
    block_adapter: BlockAdapter, contexts_kwargs: list[dict]
) -> list[dict[str, ModuleList]]

create_context classmethod

create_context(
    block_adapter: BlockAdapter, **context_kwargs
) -> tuple[list[str], list[dict[str, Any]]]

BagelCachedBlocks

Bases: CachedBlocks_Pattern_0_1_2

Custom CachedBlocks for Bagel that safely handles NaiveCache objects by adding isinstance checks in call_Mn_blocks and compute_or_prune.

call_Mn_blocks

call_Mn_blocks(
    hidden_states: Tensor,
    encoder_hidden_states: Tensor,
    *args,
    **kwargs,
)

compute_or_prune

compute_or_prune(
    block_id: int,
    block,
    hidden_states: Tensor,
    encoder_hidden_states: Tensor,
    *args,
    **kwargs,
)

BagelCachedContextManager

Bases: CachedContextManager

Custom CachedContextManager for Bagel that safely handles NaiveCache objects (mapped to encoder_hidden_states) by skipping tensor operations on them.

apply_cache

apply_cache(
    hidden_states: Tensor,
    encoder_hidden_states: Tensor = None,
    prefix: str = "Bn",
    encoder_prefix: str = "Bn_encoder",
) -> tuple[Tensor, Tensor | None]

CacheDiTAdapterConfig dataclass

Configuration for building a cache-dit block adapter. To enable cache-dit, most models only need to define an instance of this class as a class variable on the DiT.

block_forward_patterns instance-attribute

block_forward_patterns: dict[str, ForwardPattern]

cached_adapter_cls class-attribute instance-attribute

cached_adapter_cls: type[CachedAdapter] | None = None

check_forward_pattern class-attribute instance-attribute

check_forward_pattern: bool = True

has_separate_cfg class-attribute instance-attribute

has_separate_cfg: bool = False
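
The fields above can be pictured with a stand-in dataclass. This is a sketch, not the real class: `AdapterConfigSketch`, `ToyDiT`, `find_adapter_config`, and the stand-in `ForwardPattern` enum are hypothetical, and treating the keys of `block_forward_patterns` as block attribute names is an assumption. The class-variable name `_cache_dit_adapter_config` is the one mentioned under maybe_build_block_adapter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional


class ForwardPattern(Enum):
    """Stand-in for cache-dit's ForwardPattern; only two members are sketched."""
    Pattern_0 = "Pattern_0"
    Pattern_3 = "Pattern_3"


@dataclass
class AdapterConfigSketch:
    """Mirrors the documented fields and defaults of CacheDiTAdapterConfig."""
    block_forward_patterns: dict[str, ForwardPattern]
    cached_adapter_cls: Optional[type] = None
    check_forward_pattern: bool = True
    has_separate_cfg: bool = False


class ToyDiT:
    # Declaring the config as a class variable is how a model would opt in.
    _cache_dit_adapter_config = AdapterConfigSketch(
        block_forward_patterns={"transformer_blocks": ForwardPattern.Pattern_0},
    )


def find_adapter_config(module: Any) -> Optional[AdapterConfigSketch]:
    # Returns the config if the module declares one, otherwise None.
    return getattr(module, "_cache_dit_adapter_config", None)


config = find_adapter_config(ToyDiT())
missing = find_adapter_config(object())
```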

CacheDiTBackend

Bases: CacheBackend

Backend class for cache-dit acceleration on diffusion pipelines.

This class implements cache-dit acceleration (DBCache, SCM, TaylorSeer) using the cache-dit library. It inherits from CacheBackend and provides a unified interface for managing cache-dit acceleration on diffusion models.

Attributes:

config: Cache configuration (DiffusionCacheConfig instance), inherited from CacheBackend.

enabled: Whether cache-dit is enabled on this pipeline, inherited from CacheBackend.

_refresh_func (Callable[[Any, int, bool], None] | None): Internal refresh function for updating cache context.

_last_num_inference_steps (int | None): Last num_inference_steps used for refresh optimization.
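
One plausible reading of the "refresh optimization" is that a refresh is skipped when the step count has not changed since the last call. The sketch below illustrates that idea under that assumption; `ToyBackend` and `recorded` are hypothetical, and the real class may apply different conditions.

```python
from typing import Any, Callable, Optional


class ToyBackend:
    """Toy backend that remembers the last step count it refreshed with."""

    def __init__(self, refresh_func: Optional[Callable[[Any, int, bool], None]]):
        self._refresh_func = refresh_func
        self._last_num_inference_steps: Optional[int] = None

    def refresh(self, pipeline: Any, num_inference_steps: int, verbose: bool = True) -> None:
        if self._refresh_func is None:
            return  # nothing was enabled, so there is nothing to refresh
        if num_inference_steps == self._last_num_inference_steps:
            return  # same step count as last time: skip the redundant refresh
        self._refresh_func(pipeline, num_inference_steps, verbose)
        self._last_num_inference_steps = num_inference_steps


recorded: list[int] = []
backend = ToyBackend(lambda pipeline, steps, verbose: recorded.append(steps))
backend.refresh(None, 30)
backend.refresh(None, 30)  # skipped, step count unchanged
backend.refresh(None, 50)
```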

enable

enable(pipeline: Any) -> None

Enable cache-dit on the pipeline if configured.

This method applies cache-dit acceleration to the appropriate transformer(s) in the pipeline. It handles both single-transformer and dual-transformer architectures (e.g., Wan2.2).

Parameters:

pipeline (Any, required): The diffusion pipeline instance.

is_enabled

is_enabled() -> bool

Check if cache-dit is enabled on this pipeline.

Returns:

bool: True if cache-dit is enabled, False otherwise.

maybe_build_block_adapter staticmethod

maybe_build_block_adapter(pipeline) -> BlockAdapter | None

If a module defines _cache_dit_adapter_config, build the corresponding block adapter.

maybe_get_cached_adapter_cls staticmethod

maybe_get_cached_adapter_cls(
    pipeline,
) -> type[CachedAdapter] | None

If a module has a custom cached adapter type registered, e.g., SenseNova, retrieve it from the transformer's CacheDiTAdapterConfig.

refresh

refresh(
    pipeline: Any,
    num_inference_steps: int,
    verbose: bool = True,
) -> None

Refresh cache context with new num_inference_steps.

This method updates the cache context when num_inference_steps changes during inference. For dual-transformer models (e.g., Wan2.2), it automatically splits the steps based on boundary_ratio.

Parameters:

pipeline (Any, required): The diffusion pipeline instance.

num_inference_steps (int, required): New number of inference steps.

verbose (bool, default True): Whether to log refresh operations.
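
To make the boundary_ratio split concrete, here is a minimal sketch. It assumes the split counts scheduler timesteps at or above `boundary_ratio * num_train_timesteps` as belonging to the first (high-noise) transformer and the rest to the second; that rule, the function name `split_inference_steps`, and the default of 1000 training timesteps are assumptions for illustration, not taken from this page.

```python
def split_inference_steps(
    timesteps: list[float],
    boundary_ratio: float,
    num_train_timesteps: int = 1000,
) -> tuple[int, int]:
    """Return (high_noise_steps, low_noise_steps) for a two-transformer model."""
    boundary = boundary_ratio * num_train_timesteps
    # Steps at or above the boundary go to the first transformer.
    high = sum(1 for t in timesteps if t >= boundary)
    # Everything else goes to the second transformer.
    return high, len(timesteps) - high


# Ten evenly spaced toy timesteps: 1000, 900, ..., 100.
timesteps = [1000 - i * 100 for i in range(10)]
high_steps, low_steps = split_inference_steps(timesteps, boundary_ratio=0.875)
```

Each transformer's cache context would then be refreshed with its own share of the steps rather than the full count.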

SensenovaCachedAdapter

Bases: CachedAdapter

Custom CachedAdapter for SenseNova-U1 that uses SensenovaCachedBlocks.

collect_unified_blocks classmethod

collect_unified_blocks(
    block_adapter: BlockAdapter, contexts_kwargs: list[dict]
) -> list[dict[str, ModuleList]]

SensenovaCachedBlocks

Bases: CachedBlocks_Pattern_3_4_5

Custom CachedBlocks for SenseNova-U1 that only caches image-token hidden states during denoising.

forward

forward(hidden_states: Tensor, *args, **kwargs)

Wan22S2VCachedAdapter

Bases: CachedAdapter

CacheDiT adapter that uses Wan22S2VCachedBlocks for S2V audio injection.

This adapter overrides only collect_unified_blocks, so that it uses Wan22S2VCachedBlocks (which calls after_transformer_block internally for each layer). The base class mock_transformer handles the forward wrapping; after_transformer_block is permanently replaced with a no-op in enable_cache_for_wan22_s2v() to prevent double injection.

collect_unified_blocks classmethod

collect_unified_blocks(
    block_adapter: BlockAdapter, contexts_kwargs: list[dict]
) -> list[dict[str, ModuleList]]

Wan22S2VCachedBlocks

Bases: CachedBlocks_Pattern_3_4_5

CacheDiT blocks wrapper that preserves S2V per-layer audio injection.

call_Bn_blocks

call_Bn_blocks(hidden_states: Tensor, *args, **kwargs)

call_Fn_blocks

call_Fn_blocks(hidden_states: Tensor, *args, **kwargs)

call_Mn_blocks

call_Mn_blocks(hidden_states: Tensor, *args, **kwargs)

call_blocks

call_blocks(hidden_states: Tensor, *args, **kwargs)

build_cache_context_refresh

build_cache_context_refresh(
    cache_config: DiffusionCacheConfig,
    get_pipeline_transformer: Callable[
        [Any], Any
    ] = default_get_pipeline_transformer,
) -> RefreshCacheContextFunc

Build the cache context refresh func for a single Transformer.

cache_summary

cache_summary(pipeline: Any, details: bool = True) -> None

default_get_pipeline_transformer

default_get_pipeline_transformer(pipeline: Any) -> Any

enable_cache_for_cosmos3

enable_cache_for_cosmos3(
    pipeline: Any, cache_config: Any
) -> RefreshCacheContextFunc

Enable cache-dit for Cosmos3.

Cosmos3 has a dual-pathway architecture (UND + GEN), but only the GEN pathway (gen_layers) runs at every denoising step. The UND pathway is computed once and its K/V are cached by the pipeline itself, so it needs no cache-dit acceleration. Only gen_layers is wrapped via BlockAdapter.

Parameters:

pipeline (Any, required): The Cosmos3 pipeline instance.

cache_config (Any, required): DiffusionCacheConfig instance with cache configuration.

Returns:

RefreshCacheContextFunc: A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_dit

enable_cache_for_dit(
    pipeline: Any,
    cache_config: Any,
    block_adapter: BlockAdapter | None = None,
    adapter_cls: type[CachedAdapter] | None = None,
) -> RefreshCacheContextFunc

Enable cache-dit for regular single-transformer DiT models.

Parameters:

pipeline (Any, required): The diffusion pipeline instance.

cache_config (Any, required): DiffusionCacheConfig instance with cache configuration.

block_adapter (BlockAdapter | None, default None): Custom block adapter for a specific model architecture.

adapter_cls (type[CachedAdapter] | None, default None): Custom cached adapter class for a specific model architecture.

Returns:

RefreshCacheContextFunc: A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_wan22

enable_cache_for_wan22(
    pipeline: Any, cache_config: Any
) -> RefreshCacheContextFunc

Enable cache-dit for Wan2.2 single- or dual-transformer architectures.

Wan2.2 can use a single transformer or two transformers (transformer and transformer_2); in either case they are enabled through BlockAdapter.

Parameters:

pipeline (Any, required): The Wan2.2 pipeline instance.

cache_config (Any, required): DiffusionCacheConfig instance with cache configuration.

Returns:

RefreshCacheContextFunc: A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_wan22_s2v

enable_cache_for_wan22_s2v(
    pipeline: Any, cache_config: Any
) -> RefreshCacheContextFunc

Enable cache-dit for Wan2.2 S2V.

S2V uses a single transformer, but unlike the other Wan2.2 variants its block loop calls each block as block(hidden_states, **kwargs) and keeps the timestep modulation state in e rather than a second positional tensor. CacheDiT Pattern_3 matches that contract: cache hidden states only and pass the remaining conditioning through kwargs unchanged.

The S2V transformer has an after_transformer_block method that injects audio embeddings after specific layers. The cached blocks wrapper (Wan22S2VCachedBlocks._run_block) calls the original internally, so we permanently replace it with a no-op on the transformer to prevent double injection from the main forward loop.
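
The double-injection problem and the no-op fix can be shown with a toy model. This is a sketch of the pattern only: `ToyS2VTransformer`, `ToyCachedBlocks`, and `enable_toy_cache` are hypothetical stand-ins, integers stand in for tensors, and no caching is performed.

```python
class ToyS2VTransformer:
    """Toy transformer whose forward loop calls a per-layer injection hook."""

    def __init__(self):
        self.blocks = [lambda h: h + 1, lambda h: h + 1]
        self.injections = 0

    def after_transformer_block(self, layer_idx, hidden_states):
        # Stands in for the audio injection; here it only counts calls.
        self.injections += 1
        return hidden_states

    def forward(self, hidden_states):
        for layer_idx, block in enumerate(self.blocks):
            hidden_states = block(hidden_states)
            hidden_states = self.after_transformer_block(layer_idx, hidden_states)
        return hidden_states


class ToyCachedBlocks:
    """Wrapper that runs the original blocks and injects after each one."""

    def __init__(self, blocks, inject):
        self.blocks = blocks
        self.inject = inject  # original hook, captured before it is patched

    def __call__(self, hidden_states):
        for layer_idx, block in enumerate(self.blocks):
            hidden_states = block(hidden_states)
            hidden_states = self.inject(layer_idx, hidden_states)
        return hidden_states


def enable_toy_cache(transformer):
    original_hook = transformer.after_transformer_block
    wrapper = ToyCachedBlocks(list(transformer.blocks), original_hook)
    transformer.blocks = [wrapper]
    # Replace the hook on the instance so the outer loop no longer injects.
    transformer.after_transformer_block = lambda layer_idx, hidden_states: hidden_states


model = ToyS2VTransformer()
enable_toy_cache(model)
output = model.forward(0)
```

Without the no-op replacement, the outer loop would call the hook once more after the wrapper returns, giving three injections for two layers in this toy.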

may_enable_cache_dit

may_enable_cache_dit(
    pipeline: Any, od_config: OmniDiffusionConfig
) -> Optional[CacheDiTBackend]

Enable cache-dit on the pipeline if configured (convenience function).

This is a convenience function that creates and enables a CacheDiTBackend. For new code, consider using CacheDiTBackend directly.

Parameters:

pipeline (Any, required): The diffusion pipeline instance.

od_config (OmniDiffusionConfig, required): OmniDiffusionConfig with cache configuration.

Returns:

Optional[CacheDiTBackend]: A CacheDiTBackend instance if cache-dit is enabled, None otherwise.
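
The create-then-enable shape of this convenience function can be sketched with stand-ins. `ToyCacheBackend`, `may_enable_toy_cache`, and the `"use_cache"` key are hypothetical; the real function takes an OmniDiffusionConfig, whose fields are not listed on this page.

```python
from typing import Any, Optional


class ToyCacheBackend:
    """Stand-in for a backend with the enable / is_enabled shape listed above."""

    def __init__(self, config: dict):
        self.config = config
        self.enabled = False

    def enable(self, pipeline: Any) -> None:
        # The real backend applies cache-dit here; the toy only flips a flag.
        self.enabled = True

    def is_enabled(self) -> bool:
        return self.enabled


def may_enable_toy_cache(pipeline: Any, od_config: dict) -> Optional[ToyCacheBackend]:
    # "use_cache" is a hypothetical key, not a real OmniDiffusionConfig field.
    if not od_config.get("use_cache", False):
        return None
    backend = ToyCacheBackend(od_config)
    backend.enable(pipeline)
    return backend if backend.is_enabled() else None


enabled_backend = may_enable_toy_cache(object(), {"use_cache": True})
disabled_backend = may_enable_toy_cache(object(), {"use_cache": False})
```

Returning None when caching is off lets callers guard later refresh calls with a simple `is not None` check.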