vllm_omni.diffusion.cache.magcache.backend
MagCache backend implementation.
This module provides the MagCache backend that implements the CacheBackend interface using the hooks-based MagCache system.
MagCacheBackend
Bases: CacheBackend
MagCache implementation using hooks.
MagCache (Magnitude-based Cache) is an adaptive caching technique that speeds up diffusion inference by reusing transformer block computations based on accumulated magnitude error between timesteps.
The backend applies MagCache hooks to the transformer which intercept the forward pass and implement the caching logic transparently.
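The skip decision described above can be illustrated with a small, framework-free sketch. It is not the backend's actual code: the class `MagCacheSketchState` and the function `should_skip` are hypothetical names, and the rule shown (multiply the per-step magnitude ratios, accumulate the deviation from 1, and skip while the accumulated error and the number of consecutive skips stay within their limits) is an assumption about how the threshold, max-skip, and retention-ratio settings interact.

```python
from dataclasses import dataclass


@dataclass
class MagCacheSketchState:
    """Hypothetical per-generation accumulators (not the library's class)."""

    accumulated_ratio: float = 1.0
    accumulated_err: float = 0.0
    accumulated_steps: int = 0


def should_skip(state, step, mag_ratios, num_steps,
                threshold, max_skip_steps, retention_ratio):
    """Return True if this step may reuse the cached residual (assumed rule)."""
    # The first retention_ratio fraction of steps is always computed.
    if step < int(num_steps * retention_ratio):
        return False

    # Track how far the output magnitude has drifted since the last
    # fully computed step.
    state.accumulated_ratio *= mag_ratios[step]
    state.accumulated_steps += 1
    state.accumulated_err += abs(1.0 - state.accumulated_ratio)

    if (state.accumulated_err < threshold
            and state.accumulated_steps <= max_skip_steps):
        return True

    # Error or skip budget exhausted: compute this step and start over.
    state.accumulated_ratio = 1.0
    state.accumulated_err = 0.0
    state.accumulated_steps = 0
    return False


# Made-up magnitude ratios for a 10-step run (illustrative values only).
toy_ratios = [1.0, 0.99, 0.98, 0.97, 0.96, 0.95, 0.94, 0.93, 0.92, 0.91]
state = MagCacheSketchState()
decisions = [
    should_skip(state, i, toy_ratios, num_steps=10, threshold=0.24,
                max_skip_steps=5, retention_ratio=0.1)
    for i in range(len(toy_ratios))
]
print(decisions)
```

In this sketch a lower threshold or a smaller skip limit forces more full computations, trading speed for fidelity.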
Example

    from vllm_omni.diffusion.data import DiffusionCacheConfig
    from vllm_omni.diffusion.cache.magcache.backend import MagCacheBackend
    from vllm_omni.diffusion.cache.magcache.config import MagCacheConfig
    from vllm_omni.diffusion.cache.magcache.strategy import FluxMagCacheStrategy

    # Cache settings for MagCache.
    cache_config = DiffusionCacheConfig(
        mag_ratios=FluxMagCacheStrategy.FLUX_MAG_RATIOS,
        num_inference_steps=28,
        mag_threshold=0.24,
        mag_max_skip_steps=5,
        mag_retention_ratio=0.1,
    )

    # `pipeline` is assumed to be an already constructed diffusion pipeline.
    backend = MagCacheBackend(cache_config)
    backend.enable(pipeline)

    # Reset the cache state before a generation.
    backend.refresh(pipeline, num_inference_steps=50)
enable
enable(pipeline: Any) -> None
Enable MagCache on the pipeline's transformer using hooks.
This creates a MagCacheConfig from the backend's DiffusionCacheConfig and applies the MagCache hook to the transformer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pipeline | Any | Diffusion pipeline instance. Extracts the transformer from `pipeline.transformer` and the transformer type from `pipeline.transformer.__class__.__name__`. | required |
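
As a rough illustration of the two lookups in the table and of what "applying a hook" can mean, the sketch below uses plain Python objects. `ToyTransformer`, `extract_transformer`, `apply_forward_hook`, and `logging_hook` are hypothetical stand-ins; the real backend's hook mechanism is not shown in this page and may work differently.

```python
from types import SimpleNamespace


class ToyTransformer:
    """Stand-in for a real diffusion transformer (hypothetical)."""

    def forward(self, x):
        return x + 1


def extract_transformer(pipeline):
    """Perform the two lookups described in the parameter table."""
    transformer = pipeline.transformer
    transformer_type = transformer.__class__.__name__
    return transformer, transformer_type


def apply_forward_hook(module, hook):
    """Hypothetical helper: route calls to module.forward through hook."""
    original_forward = module.forward  # bound method of the instance

    def hooked_forward(*args, **kwargs):
        return hook(original_forward, *args, **kwargs)

    # Shadow the class method with an instance attribute.
    module.forward = hooked_forward


calls = []


def logging_hook(original_forward, x):
    """Record the input, then run the original forward pass."""
    calls.append(x)
    return original_forward(x)


pipeline = SimpleNamespace(transformer=ToyTransformer())
transformer, transformer_type = extract_transformer(pipeline)
apply_forward_hook(transformer, logging_hook)
result = transformer.forward(1)
print(transformer_type, result, calls)
```

A caching hook would follow the same shape as `logging_hook`, except that it could return a stored result instead of calling `original_forward`.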
refresh
Refresh MagCache state for a new generation.
Clears all cached residuals and resets the counters and accumulators. Call it before each generation to ensure a clean state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pipeline | Any | Diffusion pipeline instance. Extracts transformer via pipeline.transformer. | required |
| num_inference_steps | int | Number of inference steps for the current generation. May be used for cache context updates. | required |
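
To make "clears all cached residuals and resets counters/accumulators" concrete, here is a minimal sketch of a resettable state object. `ToyMagCacheState` and its field names are assumptions chosen for illustration; they are not the attributes the backend actually stores.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToyMagCacheState:
    """Hypothetical container for per-generation MagCache state."""

    cached_residual: Optional[Any] = None
    step_index: int = 0
    accumulated_ratio: float = 1.0
    accumulated_err: float = 0.0
    accumulated_steps: int = 0
    num_inference_steps: int = 0

    def reset(self, num_inference_steps: int) -> None:
        """Drop cached data and restore every counter to its initial value."""
        self.cached_residual = None
        self.step_index = 0
        self.accumulated_ratio = 1.0
        self.accumulated_err = 0.0
        self.accumulated_steps = 0
        self.num_inference_steps = num_inference_steps


state = ToyMagCacheState()
state.reset(num_inference_steps=28)

# Simulate leftovers from a finished generation.
state.cached_residual = [0.1, 0.2]
state.step_index = 28
state.accumulated_err = 0.3

# Without this reset, the next run would start from stale values.
state.reset(num_inference_steps=50)
print(state)
```

Skipping the reset in this sketch would let a residual from the previous prompt leak into the first skipped step of the next one, which is why the call belongs before every generation.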