vllm_omni.diffusion.cache.cache_dit_backend ¶
cache-dit integration backend for vllm-omni.
This module provides a CacheDiTBackend class to enable cache-dit acceleration on diffusion pipelines in vllm-omni, supporting both single and dual-transformer architectures.
BagelCachedAdapter ¶
Bases: CachedAdapter
Custom CachedAdapter for Bagel that uses BagelCachedContextManager and BagelCachedBlocks.
BagelCachedBlocks ¶
Bases: CachedBlocks_Pattern_0_1_2
Custom CachedBlocks for Bagel that safely handles NaiveCache objects by adding isinstance checks in call_Mn_blocks and compute_or_prune.
BagelCachedContextManager ¶
Bases: CachedContextManager
Custom CachedContextManager for Bagel that safely handles NaiveCache objects (mapped to encoder_hidden_states) by skipping tensor operations on them.
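The guard described for the Bagel classes can be sketched as follows. This is a minimal illustration, not the actual implementation: the NaiveCache class here is a hypothetical stand-in, maybe_apply is an invented helper name, and plain Python lists stand in for tensors.

```python
from typing import Any, Callable


class NaiveCache:
    """Hypothetical stand-in for Bagel's NaiveCache (not the real class)."""


def maybe_apply(value: Any, op: Callable[[Any], Any]) -> Any:
    """Apply op to tensor-like values; pass NaiveCache objects through untouched."""
    if isinstance(value, NaiveCache):
        # A cache object is not a tensor, so skip the tensor operation.
        return value
    return op(value)


def double(values):
    # Stand-in for a tensor operation such as scaling or a residual update.
    return [v * 2 for v in values]


cache = NaiveCache()
passed_through = maybe_apply(cache, double)
doubled = maybe_apply([1.0, 2.0], double)
```

The same isinstance check, placed in front of each tensor operation, is what lets a non-tensor object travel in the encoder_hidden_states slot without raising.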
CacheDiTBackend ¶
Bases: CacheBackend
Backend class for cache-dit acceleration on diffusion pipelines.
This class implements cache-dit acceleration (DBCache, SCM, TaylorSeer) using the cache-dit library. It inherits from CacheBackend and provides a unified interface for managing cache-dit acceleration on diffusion models.
Attributes:
| Name | Type | Description |
|---|---|---|
config | | Cache configuration (DiffusionCacheConfig instance), inherited from CacheBackend. |
enabled | | Whether cache-dit is enabled on this pipeline, inherited from CacheBackend. |
_refresh_func | Callable[[Any, int, bool], None] \| None | Internal refresh function for updating cache context. |
_last_num_inference_steps | int \| None | Last num_inference_steps used for refresh optimization. |
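As a rough illustration of how _refresh_func and _last_num_inference_steps might work together, the sketch below uses a hypothetical ToyCacheBackend class. It assumes that the "refresh optimization" means skipping the refresh when the step count has not changed; the source does not spell this out, so treat it as one plausible reading.

```python
from typing import Any, Callable, Optional


class ToyCacheBackend:
    """Hypothetical stand-in that mirrors the attributes listed above."""

    def __init__(self, refresh_func: Callable[[Any, int, bool], None]) -> None:
        self.enabled = True
        self._refresh_func: Optional[Callable[[Any, int, bool], None]] = refresh_func
        self._last_num_inference_steps: Optional[int] = None

    def refresh(self, pipeline: Any, num_inference_steps: int, verbose: bool = True) -> None:
        if not self.enabled or self._refresh_func is None:
            return
        # Assumed optimization: skip the refresh when the step count is unchanged.
        if num_inference_steps == self._last_num_inference_steps:
            return
        self._refresh_func(pipeline, num_inference_steps, verbose)
        self._last_num_inference_steps = num_inference_steps


calls = []
backend = ToyCacheBackend(lambda pipeline, steps, verbose: calls.append(steps))
for steps in (50, 50, 30):
    backend.refresh(pipeline=None, num_inference_steps=steps, verbose=False)
```

Here the second request with 50 steps does not reach the refresh function, so only two refreshes are recorded.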
enable ¶
enable(pipeline: Any) -> None
Enable cache-dit on the pipeline if configured.
This method applies cache-dit acceleration to the appropriate transformer(s) in the pipeline. It handles both single-transformer and dual-transformer architectures (e.g., Wan2.2).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The diffusion pipeline instance. | required |
is_enabled ¶
is_enabled() -> bool
Check if cache-dit is enabled on this pipeline.
Returns:
| Type | Description |
|---|---|
bool | True if cache-dit is enabled, False otherwise. |
refresh ¶
Refresh cache context with new num_inference_steps.
This method updates the cache context when num_inference_steps changes during inference. For dual-transformer models (e.g., Wan2.2), it automatically splits the steps based on boundary_ratio.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The diffusion pipeline instance. | required |
num_inference_steps | int | New number of inference steps. | required |
verbose | bool | Whether to log refresh operations. | True |
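One way to picture the boundary_ratio split for dual-transformer models is sketched below. The exact rule used by vllm-omni is not shown on this page; this sketch assumes a timestep threshold of boundary_ratio * num_train_timesteps and an illustrative linear schedule. The function name split_steps and the num_train_timesteps default are hypothetical.

```python
from typing import Tuple


def split_steps(
    num_inference_steps: int,
    boundary_ratio: float,
    num_train_timesteps: int = 1000,  # assumed value, for illustration only
) -> Tuple[int, int]:
    """Return (high_noise_steps, low_noise_steps) for a dual-transformer model."""
    boundary = boundary_ratio * num_train_timesteps
    # Illustrative linear schedule from num_train_timesteps down toward 0.
    timesteps = [
        num_train_timesteps * (1 - i / num_inference_steps)
        for i in range(num_inference_steps)
    ]
    # Steps at or above the boundary go to the first (high-noise) transformer.
    high = sum(1 for t in timesteps if t >= boundary)
    return high, num_inference_steps - high


high_steps, low_steps = split_steps(40, 0.875)
```

Under these assumptions each transformer's cache context would be refreshed with its own share of the steps rather than with the full num_inference_steps.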
SensenovaCachedAdapter ¶
SensenovaCachedBlocks ¶
Bases: CachedBlocks_Pattern_3_4_5
Custom CachedBlocks for SenseNova-U1 that only caches image-token hidden states during denoising.
Wan22S2VCachedAdapter ¶
Bases: CachedAdapter
CacheDiT adapter that uses Wan22S2VCachedBlocks for S2V audio injection.
Only overrides collect_unified_blocks to use Wan22S2VCachedBlocks (which calls after_transformer_block per layer internally). The base class mock_transformer handles the forward wrapping. To prevent double injection, after_transformer_block is permanently replaced with a no-op in enable_cache_for_wan22_s2v().
Wan22S2VCachedBlocks ¶
Bases: CachedBlocks_Pattern_3_4_5
CacheDiT blocks wrapper that preserves S2V per-layer audio injection.
enable_cache_for_bagel ¶
Enable cache-dit for Bagel model (via OmniDiffusion pipeline).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The OmniDiffusion pipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns:
| Type | Description |
|---|---|
Callable[[int], None] | A refresh function that can be called to update cache context with new num_inference_steps. |
enable_cache_for_cosmos3 ¶
Enable cache-dit for Cosmos3.
Cosmos3 has a dual-pathway architecture (UND + GEN) but only the GEN pathway (gen_layers) runs at every denoising step. The UND pathway computes once and its K/V are cached by the pipeline itself; no cache-dit needed there. We wrap only gen_layers via BlockAdapter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The Cosmos3 pipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns:
| Type | Description |
|---|---|
Callable[[int], None] | A refresh function that can be called to update cache context with new num_inference_steps. |
enable_cache_for_dit ¶
Enable cache-dit for regular single-transformer DiT models.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The diffusion pipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns:
| Type | Description |
|---|---|
Callable[[int], None] | A refresh function that can be called to update cache context with new num_inference_steps. |
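The enable_cache_for_* functions share one shape: they set up caching on a pipeline and return a Callable[[int], None] that refreshes the cache context. A minimal sketch of that shape follows, using a plain dict as a hypothetical pipeline stand-in; enable_toy_cache and the cache_context key are invented names.

```python
from typing import Callable, Dict


def enable_toy_cache(pipeline: Dict) -> Callable[[int], None]:
    """Attach a toy cache context and return a refresh closure over it."""
    pipeline["cache_context"] = {"num_inference_steps": None}

    def refresh(num_inference_steps: int) -> None:
        # Rebuild the context so cached state from an earlier run is dropped.
        pipeline["cache_context"] = {"num_inference_steps": num_inference_steps}

    return refresh


toy_pipeline: Dict = {}
refresh_cache = enable_toy_cache(toy_pipeline)
refresh_cache(28)
```

Returning a closure keeps the model-specific details inside the enable function, so a caller only needs the new step count.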
enable_cache_for_dreamid_omni ¶
Enable cache-dit for DreamID-Omni fused pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The DreamIDOmni pipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns:
| Type | Description |
|---|---|
Callable[[int], None] | A refresh function that can be called with a new num_inference_steps to update the cache context for the pipeline. |
enable_cache_for_ernie_image ¶
Enable cache-dit for ERNIE-Image pipeline.
ERNIE-Image blocks have the signature
forward(x, rotary_pos_emb, temb, attention_mask) -> x
where x is hidden_states (concatenated image + text tokens). This matches Pattern_3, which expects:
- Input: hidden_states only
- Output: hidden_states only
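The block contract above (one hidden-states tensor in, one out, with conditioning passed alongside) can be sketched with toy blocks. The names toy_block and run_blocks are hypothetical, and floats stand in for tensors; this illustrates the calling convention only, not cache-dit's caching logic.

```python
from typing import Any, Callable, Sequence


def toy_block(x: float, rotary_pos_emb: Any, temb: float, attention_mask: Any) -> float:
    # Same argument order as the signature above; returns hidden states only.
    return x + temb


def run_blocks(blocks: Sequence[Callable[..., float]], x: float, **conditioning: Any) -> float:
    """Thread hidden states through the blocks; conditioning passes through unchanged."""
    for block in blocks:
        out = block(x, **conditioning)
        if isinstance(out, tuple):
            # A tuple output would mean a different forward pattern.
            raise TypeError("expected hidden states only")
        x = out
    return x


result = run_blocks(
    [toy_block, toy_block],
    1.0,
    rotary_pos_emb=None,
    temb=0.5,
    attention_mask=None,
)
```

Because only x changes between blocks, a cache wrapper needs to track a single tensor per block.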
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The ERNIE-Image pipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns: A refresh function that can be called to update cache context with new num_inference_steps.
enable_cache_for_flux ¶
Enable cache-dit for Flux.1-dev pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The Flux pipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns: A refresh function that can be called with a new num_inference_steps to update the cache context for the pipeline.
enable_cache_for_flux2 ¶
Enable cache-dit for Flux.2-dev pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The Flux2 pipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns: A refresh function that can be called with a new num_inference_steps to update the cache context for the pipeline.
enable_cache_for_flux2_klein ¶
Enable cache-dit for FLUX.2-klein-4B pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The FLUX.2-klein-4B pipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns: A refresh function that can be called with a new num_inference_steps to update the cache context for the pipeline.
enable_cache_for_glm_image ¶
Enable cache-dit for GlmImage pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The GlmImage pipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns: A refresh function that can be called with a new num_inference_steps to update the cache context for the pipeline.
enable_cache_for_helios ¶
Enable cache-dit for Helios pipeline.
Helios extends Wan2.2 with multi-term memory patches and guidance cross-attention. Its transformer blocks have the same single-output signature as Wan2.2 blocks (returns hidden_states only), so we use ForwardPattern.Pattern_2.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The HeliosPipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns:
| Type | Description |
|---|---|
Callable[[int], None] | A refresh function that can be called to update cache context with new num_inference_steps. |
enable_cache_for_hunyuan_image3 ¶
Enable cache-dit for HunyuanImage3 pipeline.
HunyuanImage3 stores its main transformer stack at pipeline.model with decoder blocks in pipeline.model.layers.
enable_cache_for_hunyuan_video_15 ¶
Enable cache-dit for HunyuanVideo 1.5 pipeline.
HunyuanVideo 1.5 uses a single transformer with has_separate_cfg=True (separate conditional/unconditional forward passes). The _sp_plan scatter is applied at the transformer input boundary (empty-string key), so CacheDiT sees already-sharded hidden_states throughout transformer_blocks.
enable_cache_for_longcat_image ¶
enable_cache_for_ltx2 ¶
Enable cache-dit for LTX2 pipelines (audio-video transformer blocks).
enable_cache_for_sd3 ¶
enable_cache_for_sensenova_u1 ¶
Enable cache-dit for SenseNova-U1 model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The SenseNova-U1 pipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns:
| Type | Description |
|---|---|
Callable[[int], None] | A refresh function that can be called to update cache context with new num_inference_steps. |
enable_cache_for_stable_audio_open ¶
Enable cache-dit for Stable Audio Open pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The StableAudioPipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns:
| Type | Description |
|---|---|
Callable[[int], None] | A refresh function that can be called to update cache context with new num_inference_steps. |
enable_cache_for_wan22 ¶
Enable cache-dit for Wan2.2 single or dual-transformer architecture.
Wan2.2 can use a single transformer or dual transformers (transformer and transformer_2); each transformer is enabled through BlockAdapter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The Wan2.2 pipeline instance. | required |
cache_config | Any | DiffusionCacheConfig instance with cache configuration. | required |
Returns:
| Type | Description |
|---|---|
Callable[[int], None] | A refresh function that can be called to update cache context with new num_inference_steps. |
enable_cache_for_wan22_s2v ¶
Enable cache-dit for Wan2.2 S2V.
S2V uses a single transformer, but unlike the other Wan2.2 variants its block loop calls each block as block(hidden_states, **kwargs) and keeps the timestep modulation state in e rather than a second positional tensor. CacheDiT Pattern_3 matches that contract: cache hidden states only and pass the remaining conditioning through kwargs unchanged.
The S2V transformer has an after_transformer_block method that injects audio embeddings after specific layers. The cached blocks wrapper (Wan22S2VCachedBlocks._run_block) calls the original internally, so we permanently replace it with a no-op on the transformer to prevent double injection from the main forward loop.
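The no-op replacement described above can be sketched with toy classes. Everything here is a hypothetical stand-in (ToyS2VTransformer, enable_toy_s2v_cache, integer "hidden states"); it only shows why capturing the original hook before replacing it avoids a second injection.

```python
class ToyS2VTransformer:
    """Hypothetical transformer whose hook injects 'audio' after a layer."""

    def __init__(self) -> None:
        self.injections = []

    def after_transformer_block(self, layer_idx: int, hidden_states: int) -> int:
        self.injections.append(layer_idx)
        return hidden_states + 1  # stand-in for adding audio embeddings


def enable_toy_s2v_cache(transformer: ToyS2VTransformer):
    # Capture the original bound method before it is replaced.
    original = transformer.after_transformer_block

    def run_block(layer_idx: int, hidden_states: int) -> int:
        # Stand-in block compute, then the injection via the captured hook.
        return original(layer_idx, hidden_states * 2)

    # Replace the hook on the instance so the outer forward loop does nothing.
    transformer.after_transformer_block = lambda layer_idx, hidden_states: hidden_states
    return run_block


transformer = ToyS2VTransformer()
run_block = enable_toy_s2v_cache(transformer)
x = run_block(0, 1)                            # injected once inside the wrapper
x = transformer.after_transformer_block(0, x)  # outer loop call is now a no-op
```

Without the replacement, the outer call would inject a second time for the same layer.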
may_enable_cache_dit ¶
may_enable_cache_dit(
pipeline: Any, od_config: OmniDiffusionConfig
) -> Optional[CacheDiTBackend]
Enable cache-dit on the pipeline if configured (convenience function).
This is a convenience function that creates and enables a CacheDiTBackend. For new code, consider using CacheDiTBackend directly.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
pipeline | Any | The diffusion pipeline instance. | required |
od_config | OmniDiffusionConfig | OmniDiffusionConfig with cache configuration. | required |
Returns:
| Type | Description |
|---|---|
Optional[CacheDiTBackend] | A CacheDiTBackend instance if cache-dit is enabled, None otherwise. |
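The "create, enable, return or None" pattern of this convenience function can be sketched as below. ToyConfig, its cache_backend field, ToyBackend, and may_enable_toy_cache are all hypothetical names used for illustration; the real OmniDiffusionConfig fields are not listed on this page.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToyConfig:
    cache_backend: Optional[str] = None  # hypothetical field name


class ToyBackend:
    def __init__(self, config: ToyConfig) -> None:
        self.config = config
        self.enabled = False

    def enable(self, pipeline: Any) -> None:
        # Only turn on when the configuration asks for this backend.
        self.enabled = self.config.cache_backend == "cache_dit"

    def is_enabled(self) -> bool:
        return self.enabled


def may_enable_toy_cache(pipeline: Any, config: ToyConfig) -> Optional[ToyBackend]:
    """Create and enable a backend; return it only if it ended up enabled."""
    backend = ToyBackend(config)
    backend.enable(pipeline)
    return backend if backend.is_enabled() else None


disabled = may_enable_toy_cache(object(), ToyConfig())
active = may_enable_toy_cache(object(), ToyConfig(cache_backend="cache_dit"))
```

Callers can then branch on a None check instead of inspecting the configuration themselves.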