vllm_omni.diffusion.cache.magcache.backend

MagCache backend implementation.

This module provides the MagCache backend that implements the CacheBackend interface using the hooks-based MagCache system.

logger module-attribute

logger = init_logger(__name__)

MagCacheBackend

Bases: CacheBackend

MagCache implementation using hooks.

MagCache (Magnitude-based Cache) is an adaptive caching technique that speeds up diffusion inference by reusing transformer block computations based on accumulated magnitude error between timesteps.

The backend applies MagCache hooks to the transformer. These hooks intercept the forward pass and apply the caching logic transparently.
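To make "accumulated magnitude error" concrete, here is a minimal sketch of one plausible skip rule. It is not taken from the vllm_omni source. It assumes the parameters named in the example configuration (mag_ratios, mag_threshold, mag_max_skip_steps, mag_retention_ratio) behave as in the commonly described MagCache scheme: early steps are always computed, and later steps are skipped while the accumulated error stays below the threshold and the skip streak stays within the limit. The names SkipState, should_skip, and plan_skips are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SkipState:
    """Running accumulators for a toy magnitude-based skip rule (hypothetical)."""
    accumulated_ratio: float = 1.0
    accumulated_err: float = 0.0
    consecutive_skips: int = 0


def should_skip(state, step, mag_ratios, num_inference_steps,
                mag_threshold, mag_max_skip_steps, mag_retention_ratio):
    """Return True if this step may reuse cached output (assumed rule)."""
    # Assumption: the first fraction of steps is always computed in full.
    if step < int(mag_retention_ratio * num_inference_steps):
        return False

    # Accumulate the magnitude ratio and its deviation from 1.
    state.accumulated_ratio *= mag_ratios[step]
    state.accumulated_err += abs(1.0 - state.accumulated_ratio)
    state.consecutive_skips += 1

    if (state.accumulated_err < mag_threshold
            and state.consecutive_skips <= mag_max_skip_steps):
        return True

    # Error or skip budget exhausted: compute this step and reset.
    state.accumulated_ratio = 1.0
    state.accumulated_err = 0.0
    state.consecutive_skips = 0
    return False


def plan_skips(mag_ratios, mag_threshold, mag_max_skip_steps,
               mag_retention_ratio):
    """Evaluate the rule over all steps and return one bool per step."""
    state = SkipState()
    n = len(mag_ratios)
    return [
        should_skip(state, step, mag_ratios, n, mag_threshold,
                    mag_max_skip_steps, mag_retention_ratio)
        for step in range(n)
    ]


# Toy ratios, not real model statistics.
plan = plan_skips([1.0, 1.0, 0.98, 0.98, 0.98, 0.5],
                  mag_threshold=0.1, mag_max_skip_steps=2,
                  mag_retention_ratio=0.2)
```

In this toy run, a ratio far from 1 (the final 0.5) forces a full computation, while ratios near 1 allow short runs of skipped steps.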

Example

    >>> from vllm_omni.diffusion.data import DiffusionCacheConfig
    >>> from vllm_omni.diffusion.cache.magcache.backend import MagCacheBackend
    >>> from vllm_omni.diffusion.cache.magcache.config import MagCacheConfig
    >>> from vllm_omni.diffusion.cache.magcache.strategy import FluxMagCacheStrategy
    >>> cache_config = DiffusionCacheConfig(
    ...     mag_ratios=FluxMagCacheStrategy.FLUX_MAG_RATIOS,
    ...     num_inference_steps=28,
    ...     mag_threshold=0.24,
    ...     mag_max_skip_steps=5,
    ...     mag_retention_ratio=0.1,
    ... )
    >>> backend = MagCacheBackend(cache_config)
    >>> # `pipeline` is an already constructed diffusion pipeline.
    >>> backend.enable(pipeline)
    >>> backend.refresh(pipeline, num_inference_steps=50)

enable

enable(pipeline: Any) -> None

Enable MagCache on transformer using hooks.

This creates a MagCacheConfig from the backend's DiffusionCacheConfig and applies the MagCache hook to the transformer.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pipeline | Any | Diffusion pipeline instance. The transformer is read from `pipeline.transformer` and the transformer type from `pipeline.transformer.__class__.__name__`. | required |
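The hook mechanism can be illustrated with a small self-contained sketch. It is not the vllm_omni hook system: ToyTransformer, apply_toy_cache_hook, and toy_enable are hypothetical names, and the fixed skip_plan stands in for whatever decision logic the real hook uses. The sketch only shows the general idea of reading the transformer and its type name from the pipeline and wrapping forward so that skipped steps reuse a cached residual.

```python
from types import SimpleNamespace


class ToyTransformer:
    """Stand-in for a diffusion transformer; not a vllm_omni class."""

    def __init__(self):
        self.calls = 0

    def forward(self, hidden_states):
        self.calls += 1
        return hidden_states + 1.0


def apply_toy_cache_hook(transformer, skip_plan):
    """Wrap transformer.forward so planned steps reuse the cached residual."""
    original_forward = transformer.forward
    state = {"step": 0, "residual": None}

    def hooked_forward(hidden_states):
        step = state["step"]
        state["step"] += 1
        if skip_plan[step] and state["residual"] is not None:
            # Skip the real computation and reapply the last residual.
            return hidden_states + state["residual"]
        output = original_forward(hidden_states)
        state["residual"] = output - hidden_states
        return output

    # The instance attribute shadows the class method.
    transformer.forward = hooked_forward


def toy_enable(pipeline, skip_plan):
    """Hypothetical analogue of enable(): extract, inspect, and hook."""
    transformer = pipeline.transformer
    transformer_type = transformer.__class__.__name__
    apply_toy_cache_hook(transformer, skip_plan)
    return transformer_type


pipeline = SimpleNamespace(transformer=ToyTransformer())
transformer_type = toy_enable(pipeline, skip_plan=[False, True, True, False])
outputs = [pipeline.transformer.forward(float(x)) for x in range(4)]
```

Callers keep invoking forward as before, which is what makes the caching transparent: in this run the underlying transformer executes only twice for four calls.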

is_enabled

is_enabled() -> bool

Check if MagCache is enabled.

Returns:

| Type | Description |
| --- | --- |
| bool | True if enabled, False otherwise. |

refresh

refresh(pipeline: Any, num_inference_steps: int) -> None

Refresh MagCache state for new generation.

Clears all cached residuals and resets counters and accumulators. Call it before each generation to ensure a clean state.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| pipeline | Any | Diffusion pipeline instance. Extracts transformer via pipeline.transformer. | required |
| num_inference_steps | int | Number of inference steps for the current generation. May be used for cache context updates. | required |
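As an illustration of why a refresh before each generation matters, the following sketch resets a hypothetical per-generation state object between two runs. ToyCacheState and its fields are assumptions made for this example and do not describe the backend's actual internals. The reset call plays the role that refresh plays for the real backend.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCacheState:
    """Hypothetical per-generation state; field names are illustrative."""
    num_inference_steps: int = 0
    step: int = 0
    accumulated_err: float = 0.0
    residuals: dict = field(default_factory=dict)

    def reset(self, num_inference_steps):
        """Drop cached residuals and zero all counters and accumulators."""
        self.num_inference_steps = num_inference_steps
        self.step = 0
        self.accumulated_err = 0.0
        self.residuals.clear()


state = ToyCacheState()
steps_seen = []
for num_steps in (28, 50):
    state.reset(num_steps)        # analogous to backend.refresh(...)
    for _ in range(num_steps):    # stand-in for the denoising loop
        state.step += 1
        state.accumulated_err += 0.01
        state.residuals["block_0"] = state.step
    steps_seen.append(state.step)
```

Without the reset, the second generation would start with the step counter and error accumulator left over from the first, so skip decisions would be made against stale state.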