vllm_omni.diffusion.cache.cache_dit_backend ¶

cache-dit integration backend for vllm-omni.

This module provides a CacheDiTBackend class to enable cache-dit acceleration on diffusion pipelines in vllm-omni, supporting both single and dual-transformer architectures.

CUSTOM_DIT_ENABLERS `module-attribute` ¶

CUSTOM_DIT_ENABLERS: dict[str, Callable] = {}
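
A registry like this is typically consulted before falling back to a generic path. The sketch below shows one way such a lookup could work. It assumes that entries are keyed by pipeline class name and that each value has the same shape as the `enable_cache_for_*` functions in this module; neither is confirmed here. `DemoPipeline` and `enable_cache_for_demo` are hypothetical names.

```python
from typing import Any, Callable

# Local stand-in for the module-level registry; the real one lives in
# vllm_omni.diffusion.cache.cache_dit_backend.
CUSTOM_DIT_ENABLERS: dict[str, Callable] = {}


def enable_cache_for_demo(
    pipeline: Any, cache_config: Any
) -> Callable[[Any, int, bool], None]:
    """Hypothetical enabler shaped like the enable_cache_for_* functions."""

    def refresh(pipeline: Any, num_inference_steps: int, verbose: bool) -> None:
        # A real refresh function would update the cache context here.
        pipeline.steps = num_inference_steps

    return refresh


class DemoPipeline:
    """Hypothetical pipeline class used only for this sketch."""

    steps = 0


# Assumption: entries are keyed by pipeline class name.
CUSTOM_DIT_ENABLERS["DemoPipeline"] = enable_cache_for_demo

pipe = DemoPipeline()
enabler = CUSTOM_DIT_ENABLERS.get(type(pipe).__name__)
if enabler is not None:
    refresh_func = enabler(pipe, cache_config=None)
    refresh_func(pipe, 50, False)
```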

RefreshCacheContextFunc `module-attribute` ¶

RefreshCacheContextFunc: TypeAlias = Callable[
    [Any, int, bool], None
]

logger `module-attribute` ¶

logger = init_logger(__name__)

BagelCachedAdapter ¶

Bases: CachedAdapter

Custom CachedAdapter for Bagel that uses BagelCachedContextManager and BagelCachedBlocks.

collect_unified_blocks `classmethod` ¶

collect_unified_blocks(
    block_adapter: BlockAdapter, contexts_kwargs: list[dict]
) -> list[dict[str, ModuleList]]

create_context `classmethod` ¶

create_context(
    block_adapter: BlockAdapter, **context_kwargs
) -> tuple[list[str], list[dict[str, Any]]]

BagelCachedBlocks ¶

Bases: CachedBlocks_Pattern_0_1_2

Custom CachedBlocks for Bagel that safely handles NaiveCache objects by adding isinstance checks in call_Mn_blocks and compute_or_prune.

call_Mn_blocks ¶

call_Mn_blocks(
    hidden_states: Tensor,
    encoder_hidden_states: Tensor,
    *args,
    **kwargs,
)

compute_or_prune ¶

compute_or_prune(
    block_id: int,
    block,
    hidden_states: Tensor,
    encoder_hidden_states: Tensor,
    *args,
    **kwargs,
)

BagelCachedContextManager ¶

Bases: CachedContextManager

Custom CachedContextManager for Bagel that safely handles NaiveCache objects (mapped to encoder_hidden_states) by skipping tensor operations on them.

apply_cache ¶

apply_cache(
    hidden_states: Tensor,
    encoder_hidden_states: Tensor = None,
    prefix: str = "Bn",
    encoder_prefix: str = "Bn_encoder",
) -> tuple[Tensor, Tensor | None]
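
The guard described for `BagelCachedContextManager` can be sketched without PyTorch. Here plain lists of floats stand in for tensors, and "applying the cache" is modelled as adding a cached residual, which is an assumption about the underlying operation. `apply_cache_sketch` and `add_residual` are hypothetical names, and `NaiveCache` is a local stand-in.

```python
from typing import Optional, Union

Vector = list[float]  # stand-in for torch.Tensor in this sketch


class NaiveCache:
    """Stand-in for Bagel's NaiveCache, a non-tensor cache container."""


def add_residual(states: Vector, residual: Vector) -> Vector:
    """Element-wise addition, standing in for a tensor operation."""
    return [s + r for s, r in zip(states, residual)]


def apply_cache_sketch(
    hidden_states: Vector,
    encoder_hidden_states: Union[Vector, NaiveCache, None],
    residual: Vector,
    encoder_residual: Optional[Vector] = None,
) -> tuple[Vector, Union[Vector, NaiveCache, None]]:
    hidden_states = add_residual(hidden_states, residual)
    # Guard: only tensor-like values take part in arithmetic.
    # A NaiveCache (or None) is returned untouched.
    if isinstance(encoder_hidden_states, list) and encoder_residual is not None:
        encoder_hidden_states = add_residual(encoder_hidden_states, encoder_residual)
    return hidden_states, encoder_hidden_states


cache = NaiveCache()
out, enc = apply_cache_sketch([1.0, 2.0], cache, [0.5, 0.5])
```

The `NaiveCache` instance comes back as the same object, so downstream code that expects it in the `encoder_hidden_states` slot keeps working.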

CacheDiTAdapterConfig `dataclass` ¶

Configuration for building a cache-dit block adapter. To enable cache-dit, most models only need to define an instance of this class as a class variable on the DiT.
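
A minimal sketch of the class-variable approach follows. To stay runnable without vllm-omni or cache-dit installed, it uses stand-ins that mirror only the documented fields and defaults. The attribute name `_cache_dit_adapter_config` is taken from the description of `maybe_build_block_adapter`; the `"blocks"` key and `DemoTransformer` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ForwardPattern(Enum):
    """Stand-in for cache-dit's ForwardPattern; only two members are mirrored."""

    Pattern_0 = "Pattern_0"
    Pattern_3 = "Pattern_3"


@dataclass
class CacheDiTAdapterConfig:
    """Stand-in mirroring the documented fields and defaults."""

    block_forward_patterns: dict[str, ForwardPattern]
    cached_adapter_cls: Optional[type] = None
    check_forward_pattern: bool = True
    has_separate_cfg: bool = False


class DemoTransformer:
    """Hypothetical DiT. Using 'blocks' as the key is an assumption about
    how the block list is named on the model."""

    _cache_dit_adapter_config = CacheDiTAdapterConfig(
        block_forward_patterns={"blocks": ForwardPattern.Pattern_3},
    )


# A backend could discover the config with a plain attribute lookup.
config = getattr(DemoTransformer, "_cache_dit_adapter_config", None)
```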

block_forward_patterns `instance-attribute` ¶

block_forward_patterns: dict[str, ForwardPattern]

cached_adapter_cls `class-attribute` `instance-attribute` ¶

cached_adapter_cls: type[CachedAdapter] | None = None

check_forward_pattern `class-attribute` `instance-attribute` ¶

check_forward_pattern: bool = True

has_separate_cfg `class-attribute` `instance-attribute` ¶

has_separate_cfg: bool = False

CacheDiTBackend ¶

Bases: CacheBackend

Backend class for cache-dit acceleration on diffusion pipelines.

This class implements cache-dit acceleration (DBCache, SCM, TaylorSeer) using the cache-dit library. It inherits from CacheBackend and provides a unified interface for managing cache-dit acceleration on diffusion models.

Attributes:

Name	Type	Description
`config`		Cache configuration (DiffusionCacheConfig instance), inherited from CacheBackend.
`enabled`		Whether cache-dit is enabled on this pipeline, inherited from CacheBackend.
`_refresh_func`	`Callable[[Any, int, bool], None] \| None`	Internal refresh function for updating cache context.
`_last_num_inference_steps`	`int \| None`	Last num_inference_steps used for refresh optimization.

enable ¶

enable(pipeline: Any) -> None

Enable cache-dit on the pipeline if configured.

This method applies cache-dit acceleration to the appropriate transformer(s) in the pipeline. It handles both single-transformer and dual-transformer architectures (e.g., Wan2.2).

Parameters:

Name	Type	Description	Default
`pipeline`	`Any`	The diffusion pipeline instance.	required

is_enabled ¶

is_enabled() -> bool

Check if cache-dit is enabled on this pipeline.

Returns:

Type	Description
`bool`	True if cache-dit is enabled, False otherwise.

maybe_build_block_adapter `staticmethod` ¶

maybe_build_block_adapter(pipeline) -> BlockAdapter | None

If a module defines _cache_dit_adapter_config, build the corresponding block adapter.

maybe_get_cached_adapter_cls `staticmethod` ¶

maybe_get_cached_adapter_cls(
    pipeline,
) -> type[CachedAdapter] | None

If the model registers a custom cached adapter type (e.g., SenseNova), retrieve it from the transformer's CacheDiTAdapterConfig.

refresh ¶

refresh(
    pipeline: Any,
    num_inference_steps: int,
    verbose: bool = True,
) -> None

Refresh cache context with new num_inference_steps.

This method updates the cache context when num_inference_steps changes during inference. For dual-transformer models (e.g., Wan2.2), it automatically splits the steps based on boundary_ratio.

Parameters:

Name	Type	Description	Default
`pipeline`	`Any`	The diffusion pipeline instance.	required
`num_inference_steps`	`int`	New number of inference steps.	required
`verbose`	`bool`	Whether to log refresh operations.	`True`
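
The `_last_num_inference_steps` attribute suggests that a refresh can be skipped when the step count has not changed. The sketch below shows that memoization pattern under that assumption; it is not the backend's actual logic, and `DemoBackend` is a hypothetical stand-in.

```python
from typing import Any, Callable, Optional


class DemoBackend:
    """Hypothetical stand-in for a backend that remembers the last step count."""

    def __init__(self, refresh_func: Callable[[Any, int, bool], None]) -> None:
        self._refresh_func = refresh_func
        self._last_num_inference_steps: Optional[int] = None

    def refresh(
        self, pipeline: Any, num_inference_steps: int, verbose: bool = True
    ) -> None:
        # Assumption: an unchanged step count means the context is still valid.
        if num_inference_steps == self._last_num_inference_steps:
            return
        self._refresh_func(pipeline, num_inference_steps, verbose)
        self._last_num_inference_steps = num_inference_steps


calls: list[int] = []
backend = DemoBackend(lambda pipeline, steps, verbose: calls.append(steps))
for steps in (50, 50, 30):
    backend.refresh(pipeline=None, num_inference_steps=steps, verbose=False)
```

With the sequence 50, 50, 30 the underlying refresh function runs twice: once for 50 and once for 30.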

SensenovaCachedAdapter ¶

Bases: CachedAdapter

Custom CachedAdapter for SenseNova-U1 that uses SensenovaCachedBlocks.

collect_unified_blocks `classmethod` ¶

collect_unified_blocks(
    block_adapter: BlockAdapter, contexts_kwargs: list[dict]
) -> list[dict[str, ModuleList]]

SensenovaCachedBlocks ¶

Bases: CachedBlocks_Pattern_3_4_5

Custom CachedBlocks for SenseNova-U1 that only caches image-token hidden states during denoising.

forward ¶

forward(hidden_states: Tensor, *args, **kwargs)

Wan22S2VCachedAdapter ¶

Bases: CachedAdapter

CacheDiT adapter that uses Wan22S2VCachedBlocks for S2V audio injection.

Overrides only collect_unified_blocks so that it uses Wan22S2VCachedBlocks (which calls after_transformer_block per layer internally). The base class mock_transformer handles the forward wrapping. To prevent double injection, enable_cache_for_wan22_s2v() permanently replaces after_transformer_block with a no-op.

collect_unified_blocks `classmethod` ¶

collect_unified_blocks(
    block_adapter: BlockAdapter, contexts_kwargs: list[dict]
) -> list[dict[str, ModuleList]]

Wan22S2VCachedBlocks ¶

Bases: CachedBlocks_Pattern_3_4_5

CacheDiT blocks wrapper that preserves S2V per-layer audio injection.

call_Bn_blocks ¶

call_Bn_blocks(hidden_states: Tensor, *args, **kwargs)

call_Fn_blocks ¶

call_Fn_blocks(hidden_states: Tensor, *args, **kwargs)

call_Mn_blocks ¶

call_Mn_blocks(hidden_states: Tensor, *args, **kwargs)

call_blocks ¶

call_blocks(hidden_states: Tensor, *args, **kwargs)

build_cache_context_refresh ¶

build_cache_context_refresh(
    cache_config: DiffusionCacheConfig,
    get_pipeline_transformer: Callable[
        [Any], Any
    ] = default_get_pipeline_transformer,
) -> RefreshCacheContextFunc

Build the cache context refresh function for a single transformer.

cache_summary ¶

cache_summary(pipeline: Any, details: bool = True) -> None

default_get_pipeline_transformer ¶

default_get_pipeline_transformer(pipeline: Any) -> Any

enable_cache_for_cosmos3 ¶

enable_cache_for_cosmos3(
    pipeline: Any, cache_config: Any
) -> RefreshCacheContextFunc

Enable cache-dit for Cosmos3.

Cosmos3 has a dual-pathway architecture (UND + GEN) but only the GEN pathway (gen_layers) runs at every denoising step. The UND pathway computes once and its K/V are cached by the pipeline itself; no cache-dit needed there. We wrap only gen_layers via BlockAdapter.

Parameters:

Name	Type	Description	Default
`pipeline`	`Any`	The Cosmos3 pipeline instance.	required
`cache_config`	`Any`	DiffusionCacheConfig instance with cache configuration.	required

Returns:

Type	Description
`RefreshCacheContextFunc`	A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_dit ¶

enable_cache_for_dit(
    pipeline: Any,
    cache_config: Any,
    block_adapter: BlockAdapter | None = None,
    adapter_cls: type[CachedAdapter] | None = None,
) -> RefreshCacheContextFunc

Enable cache-dit for regular single-transformer DiT models.

Parameters:

Name	Type	Description	Default
`pipeline`	`Any`	The diffusion pipeline instance.	required
`cache_config`	`Any`	DiffusionCacheConfig instance with cache configuration.	required
`block_adapter`	`BlockAdapter \| None`	Custom block adapters for specific model architectures.	`None`
`adapter_cls`	`type[CachedAdapter] \| None`	Custom cached adapter class for specific model architectures.	`None`

Returns:

Type	Description
`RefreshCacheContextFunc`	A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_krea2 ¶

enable_cache_for_krea2(
    pipeline: Any, cache_config: Any
) -> RefreshCacheContextFunc

Enable cache-dit for Krea 2.

Krea 2 is a single-stream MMDiT: each Krea2TransformerBlock takes and returns only hidden_states (text is fused into the token stream), so the blocks follow ForwardPattern.Pattern_3.

has_separate_cfg is checkpoint-dependent, which is why this needs a custom enabler rather than a static _cache_dit_adapter_config: the distilled Turbo checkpoint runs no-CFG (a single transformer forward per denoise step), while the Raw checkpoint runs CFG as two separate forwards. cache-dit tells cond/uncond apart purely by transformer-forward parity, so the flag must match the actual per-step forward count — True only for the CFG (Raw) path. The pipeline exposes this via is_distilled (read from model_index.json).

Parameters:

Name	Type	Description	Default
`pipeline`	`Any`	The Krea2Pipeline instance.	required
`cache_config`	`Any`	DiffusionCacheConfig instance with cache configuration.	required

Returns:

Type	Description
`RefreshCacheContextFunc`	A refresh function that can be called to update cache context with new num_inference_steps.
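
The checkpoint-dependent flag described for Krea 2 reduces to a small rule, sketched here with stand-ins. `DemoKrea2Pipeline`, `resolve_has_separate_cfg`, `forwards_per_step`, and `label_forwards` are hypothetical names. The assumption that the conditional forward comes first in each pair is made only for illustration.

```python
class DemoKrea2Pipeline:
    """Hypothetical stand-in exposing the is_distilled flag."""

    def __init__(self, is_distilled: bool) -> None:
        self.is_distilled = is_distilled


def resolve_has_separate_cfg(pipeline: DemoKrea2Pipeline) -> bool:
    """True only when CFG runs as two separate forwards (the Raw path)."""
    return not pipeline.is_distilled


def forwards_per_step(has_separate_cfg: bool) -> int:
    """Transformer forwards expected per denoising step."""
    return 2 if has_separate_cfg else 1


def label_forwards(num_steps: int, has_separate_cfg: bool) -> list[str]:
    """Label each forward by call parity, as a parity-based scheme would.

    Assumption: the conditional forward comes first within each step.
    """
    labels = []
    for call_index in range(num_steps * forwards_per_step(has_separate_cfg)):
        if has_separate_cfg and call_index % 2 == 1:
            labels.append("uncond")
        else:
            labels.append("cond")
    return labels


turbo = DemoKrea2Pipeline(is_distilled=True)
raw = DemoKrea2Pipeline(is_distilled=False)
turbo_labels = label_forwards(2, resolve_has_separate_cfg(turbo))
raw_labels = label_forwards(2, resolve_has_separate_cfg(raw))
```

If the flag were set to True for the Turbo checkpoint, every second denoising step would be treated as an unconditional forward, which is why the flag has to match the real per-step forward count.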

enable_cache_for_wan22 ¶

enable_cache_for_wan22(
    pipeline: Any, cache_config: Any
) -> RefreshCacheContextFunc

Enable cache-dit for Wan2.2 single or dual-transformer architecture.

Wan2.2 pipelines use either a single transformer or two transformers (transformer and transformer_2). Each one has to be enabled through a BlockAdapter.

Parameters:

Name	Type	Description	Default
`pipeline`	`Any`	The Wan2.2 pipeline instance.	required
`cache_config`	`Any`	DiffusionCacheConfig instance with cache configuration.	required

Returns:

Type	Description
`RefreshCacheContextFunc`	A refresh function that can be called to update cache context with new num_inference_steps.

enable_cache_for_wan22_s2v ¶

enable_cache_for_wan22_s2v(
    pipeline: Any, cache_config: Any
) -> RefreshCacheContextFunc

Enable cache-dit for Wan2.2 S2V.

S2V uses a single transformer, but unlike the other Wan2.2 variants its block loop calls each block as block(hidden_states, **kwargs) and keeps the timestep modulation state in e rather than a second positional tensor. CacheDiT Pattern_3 matches that contract: cache hidden states only and pass the remaining conditioning through kwargs unchanged.

The S2V transformer has an after_transformer_block method that injects audio embeddings after specific layers. The cached blocks wrapper (Wan22S2VCachedBlocks._run_block) calls the original internally, so we permanently replace it with a no-op on the transformer to prevent double injection from the main forward loop.
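
The double-injection problem and its fix can be reproduced with a toy model. In this sketch floats stand in for hidden states and "injection" adds 1.0. `DemoTransformer` and `run_block` are hypothetical, and the wrapper function only imitates the role the text assigns to `Wan22S2VCachedBlocks._run_block`.

```python
class DemoTransformer:
    """Hypothetical transformer whose forward loop injects after each layer."""

    def __init__(self) -> None:
        self.injections: list[int] = []

    def after_transformer_block(self, layer_idx: int, hidden_states: float) -> float:
        self.injections.append(layer_idx)
        return hidden_states + 1.0

    def forward(self, hidden_states: float, run_block) -> float:
        for layer_idx in range(3):
            hidden_states = run_block(layer_idx, hidden_states)
            hidden_states = self.after_transformer_block(layer_idx, hidden_states)
        return hidden_states


transformer = DemoTransformer()

# Keep a reference to the original bound method for the wrapper to call.
original_after_block = transformer.after_transformer_block


def run_block(layer_idx: int, hidden_states: float) -> float:
    """Stand-in for the cached-blocks wrapper: run the block, then inject."""
    hidden_states = hidden_states * 1.0  # placeholder for the real block
    return original_after_block(layer_idx, hidden_states)


# Replace the method on the instance with a no-op so the forward loop
# no longer injects a second time.
transformer.after_transformer_block = lambda layer_idx, hidden_states: hidden_states

result = transformer.forward(0.0, run_block)
```

Without the no-op replacement, each layer would inject twice and the result would be 6.0 instead of 3.0.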

may_enable_cache_dit ¶

may_enable_cache_dit(
    pipeline: Any, od_config: OmniDiffusionConfig
) -> Optional[CacheDiTBackend]

Enable cache-dit on the pipeline if configured (convenience function).

This is a convenience function that creates and enables a CacheDiTBackend. For new code, consider using CacheDiTBackend directly.

Parameters:

Name	Type	Description	Default
`pipeline`	`Any`	The diffusion pipeline instance.	required
`od_config`	`OmniDiffusionConfig`	OmniDiffusionConfig with cache configuration.	required

Returns:

Type	Description
`Optional[CacheDiTBackend]`	A CacheDiTBackend instance if cache-dit is enabled, None otherwise.
