vllm_omni.diffusion.cache.magcache.backend

MagCache backend implementation.

This module provides the MagCache backend that implements the CacheBackend interface using the hooks-based MagCache system.

logger `module-attribute`

logger = init_logger(__name__)

MagCacheBackend

Bases: CacheBackend

MagCache implementation using hooks.

MagCache (Magnitude-based Cache) is an adaptive caching technique that speeds up diffusion inference by reusing transformer block computations based on accumulated magnitude error between timesteps.
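The skip decision can be sketched as follows. This is a minimal illustration based on the general MagCache idea, not the code in this module: the `SkipState` class and `should_skip` function are hypothetical names, and the exact accumulation rule is an assumption. Only the parameter names (`mag_ratios`, `mag_threshold`, `mag_max_skip_steps`, `mag_retention_ratio`, `num_inference_steps`) come from the configuration shown on this page.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SkipState:
    """Accumulators tracked across denoising steps (hypothetical names)."""

    accumulated_ratio: float = 1.0
    accumulated_error: float = 0.0
    consecutive_skips: int = 0


def should_skip(
    step: int,
    mag_ratios: Sequence[float],
    state: SkipState,
    *,
    num_inference_steps: int,
    mag_threshold: float,
    mag_max_skip_steps: int,
    mag_retention_ratio: float,
) -> bool:
    """Return True if the cached residual may be reused at `step` (assumed rule)."""
    # The first fraction of steps is always computed in full.
    if step < int(num_inference_steps * mag_retention_ratio):
        return False
    # Fold this step's magnitude ratio into the running product and
    # accumulate its deviation from 1 as the estimated error.
    state.accumulated_ratio *= mag_ratios[step]
    state.accumulated_error += abs(1.0 - state.accumulated_ratio)
    state.consecutive_skips += 1
    if (
        state.accumulated_error < mag_threshold
        and state.consecutive_skips <= mag_max_skip_steps
    ):
        return True
    # Error or skip budget exceeded: compute this step and start over.
    state.accumulated_ratio = 1.0
    state.accumulated_error = 0.0
    state.consecutive_skips = 0
    return False
```

Under this rule, a lower `mag_threshold` or `mag_max_skip_steps` forces full computation more often, trading speed for fidelity.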

The backend applies MagCache hooks to the transformer. These hooks intercept the forward pass and implement the caching logic transparently, so the pipeline code itself does not change.
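To illustrate what "intercepting the forward pass" means, here is a small sketch of a wrapper that either runs the wrapped block or reuses the residual it cached on the previous call. The `CachingHook` class is hypothetical and uses plain floats in place of tensors; the real hook mechanism in this package may differ.

```python
from typing import Callable, Optional


class CachingHook:
    """Wraps a block's forward callable and can reuse its last residual."""

    def __init__(self, forward: Callable[[float], float]) -> None:
        self._forward = forward
        self._residual: Optional[float] = None
        self.skip_next = False  # set by the caching policy before each call

    def __call__(self, hidden: float) -> float:
        if self.skip_next and self._residual is not None:
            # Reuse: approximate the block output as input + cached residual.
            return hidden + self._residual
        output = self._forward(hidden)
        # Cache what the block added to its input.
        self._residual = output - hidden
        return output

    def reset(self) -> None:
        """Drop the cached residual, as a refresh between generations would."""
        self._residual = None
        self.skip_next = False
```

Because the caller still invokes the wrapped object as if it were the original block, the caching stays invisible to the surrounding pipeline.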

Example

>>> from vllm_omni.diffusion.data import DiffusionCacheConfig
>>> from vllm_omni.diffusion.cache.magcache.backend import MagCacheBackend
>>> from vllm_omni.diffusion.cache.magcache.config import MagCacheConfig
>>> from vllm_omni.diffusion.cache.magcache.strategy import FluxMagCacheStrategy
>>> cache_config = DiffusionCacheConfig(
...     mag_ratios=FluxMagCacheStrategy.FLUX_MAG_RATIOS,
...     num_inference_steps=28,
...     mag_threshold=0.24,
...     mag_max_skip_steps=5,
...     mag_retention_ratio=0.1,
... )
>>> backend = MagCacheBackend(cache_config)
>>> backend.enable(pipeline)
>>> backend.refresh(pipeline, num_inference_steps=50)

enable

enable(pipeline: Any) -> None

Enable MagCache on the transformer using hooks.

This creates a MagCacheConfig from the backend's DiffusionCacheConfig and applies the MagCache hook to the transformer.

Parameters:

Name	Type	Description	Default
`pipeline`	`Any`	Diffusion pipeline instance. The backend reads the transformer from `pipeline.transformer` and the transformer type from `pipeline.transformer.__class__.__name__`.	required

is_enabled

is_enabled() -> bool

Check if MagCache is enabled.

Returns:

Type	Description
`bool`	True if enabled, False otherwise.

refresh

refresh(pipeline: Any, num_inference_steps: int) -> None

Refresh MagCache state for a new generation.

Clears all cached residuals and resets counters and accumulators. Call this before each generation to ensure a clean state.
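The intended call order (enable once, refresh before every generation) can be sketched with a small helper. `generate_with_cache` and `RecordingBackend` are hypothetical: the stand-in class only mimics the three method signatures documented on this page so the sketch runs without the real package.

```python
from typing import Any, Callable, List, Tuple


def generate_with_cache(
    backend: Any,
    pipeline: Any,
    run: Callable[[Any, int], Any],
    num_inference_steps: int,
) -> Any:
    """Run one generation, making sure the cache state is clean first."""
    if not backend.is_enabled():
        backend.enable(pipeline)  # install the hooks once
    backend.refresh(pipeline, num_inference_steps)  # reset state per generation
    return run(pipeline, num_inference_steps)


class RecordingBackend:
    """Stand-in that records calls; not the real MagCacheBackend."""

    def __init__(self) -> None:
        self._enabled = False
        self.calls: List[Tuple[str, Any]] = []

    def enable(self, pipeline: Any) -> None:
        self._enabled = True
        self.calls.append(("enable", None))

    def is_enabled(self) -> bool:
        return self._enabled

    def refresh(self, pipeline: Any, num_inference_steps: int) -> None:
        self.calls.append(("refresh", num_inference_steps))
```

Passing the step count to `refresh` on every call lets consecutive generations use different values of `num_inference_steps` without stale state carrying over.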