vllm_omni.diffusion.cache.magcache.hook

Hook-based MagCache implementation for vLLM-Omni.

This module implements a diffusers-style hook system for MagCache (Magnitude-based Cache), providing adaptive caching for diffusion model inference.

MagCache speeds up inference by skipping transformer block computations when the accumulated magnitude error is below a threshold, reusing cached residuals instead.
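The skip decision described above can be illustrated with a minimal, framework-free sketch. It assumes the bookkeeping used in the upstream MagCache reference code (a running product of per-step magnitude ratios, an error accumulated as its distance from 1, and a cap on consecutive skips). The names `SketchState`, `should_skip`, `threshold`, and `max_skips` are hypothetical and are not this module's API.

```python
from dataclasses import dataclass


@dataclass
class SketchState:
    """Hypothetical stand-in for cache state (not the real MagCacheState)."""
    accumulated_ratio: float = 1.0
    accumulated_error: float = 0.0
    consecutive_skips: int = 0


def should_skip(state: SketchState, mag_ratio: float,
                threshold: float, max_skips: int) -> bool:
    """Update the accumulators and report whether this step may reuse the cache."""
    state.accumulated_ratio *= mag_ratio
    state.accumulated_error += abs(1.0 - state.accumulated_ratio)
    state.consecutive_skips += 1
    if state.accumulated_error < threshold and state.consecutive_skips <= max_skips:
        return True
    # Drift is too large or too many steps were skipped in a row:
    # compute this step for real and start accumulating from scratch.
    state.accumulated_ratio = 1.0
    state.accumulated_error = 0.0
    state.consecutive_skips = 0
    return False


state = SketchState()
ratios = [0.99, 0.98, 0.97, 0.85]  # made-up per-step magnitude ratios
decisions = [should_skip(state, r, threshold=0.1, max_skips=2) for r in ratios]
```

With these made-up ratios, the first two steps are skipped, the third is computed because the skip cap is reached, and the fourth is computed because a single step with ratio 0.85 already exceeds the threshold.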

Based on: https://github.com/Zehong-Ma/MagCache

Reference: diffusers/src/diffusers/hooks/mag_cache.py

Architecture:

- MagCacheStrategy: Model-specific strategy for preprocessing/postprocessing
- MagCacheState: Per-step state tracking residuals and accumulated error
- MagCacheHeadHook: Decides whether to skip based on accumulated error
- MagCacheBlockHook: Computes and stores residuals at tail block
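The head/tail split can be pictured with a small sketch that uses plain floats instead of tensors. It assumes the cached residual is the difference between the tail block's output and the head block's input; `ResidualCache` and `run_block_stack` are hypothetical names for illustration, not classes or functions from this module.

```python
class ResidualCache:
    """Hypothetical holder for the residual stored at the tail block."""

    def __init__(self):
        self.residual = None


def run_block_stack(blocks, hidden, cache, skip):
    """Run all blocks, or reuse the cached residual when skipping is allowed."""
    # Head role: decide whether the whole stack can be bypassed.
    if skip and cache.residual is not None:
        return hidden + cache.residual
    head_input = hidden
    for block in blocks:
        hidden = block(hidden)
    # Tail role: remember how much the stack changed its input.
    cache.residual = hidden - head_input
    return hidden


blocks = [lambda v: v * 2.0, lambda v: v + 1.0]  # toy stand-ins for transformer blocks
cache = ResidualCache()
full = run_block_stack(blocks, 3.0, cache, skip=False)   # computes every block
approx = run_block_stack(blocks, 5.0, cache, skip=True)  # reuses the stored residual
```

In this toy run the skipped call returns the new input plus the stored residual, which approximates the full computation without executing any block.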

logger module-attribute

logger = init_logger(__name__)

MagCacheBlockHook

Bases: ModelHook

Block hook for MagCache - computes residuals at tail block.

config instance-attribute

config = config

is_tail instance-attribute

is_tail = is_tail

state_manager instance-attribute

state_manager = state_manager

advance_step

advance_step(state: MagCacheState) -> None

initialize_hook

initialize_hook(module: Module) -> Module

log_residual_computed

log_residual_computed(
    state: MagCacheState,
    residual: Tensor | tuple[Tensor, Tensor],
) -> None

new_forward

new_forward(module: Module, *args, **kwargs)

perform_calibration

perform_calibration(
    state: MagCacheState,
    current_residual: Tensor | tuple[Tensor, Tensor],
) -> None

reset_state

reset_state(module: Module) -> Module

MagCacheHeadHook

Bases: ModelHook

Head block hook for MagCache - decides whether to skip computation.

config instance-attribute

config = config

state_manager instance-attribute

state_manager = state_manager

initialize_hook

initialize_hook(module: Module) -> Module

log_cache_hit

log_cache_hit(state: MagCacheState, output, ret)

log_cache_miss

log_cache_miss(state: MagCacheState, output)

new_forward

new_forward(module: Module, *args, **kwargs)

reset_state

reset_state(module: Module) -> Module

apply_mag_cache_hook

apply_mag_cache_hook(
    module: Module,
    config: MagCacheConfig,
    strategy: MagCacheStrategy | None = None,
) -> None

Apply MagCache optimization to a transformer module.

Parameters:

- module (Module, required): Transformer model to optimize (e.g., FluxTransformer2DModel).
- config (MagCacheConfig, required): MagCacheConfig specifying caching parameters.
- strategy (MagCacheStrategy | None, default: None): Optional strategy to use. If None, it is looked up from the registry.
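A call might look like the sketch below. It relies only on the signature shown above. The wrapper name `enable_mag_cache` is hypothetical, and because the function returns None the sketch assumes the hooks are attached to the module in place. How a MagCacheConfig is constructed is not documented in this section, so the caller is expected to supply one.

```python
def enable_mag_cache(transformer, config, strategy=None):
    """Hypothetical convenience wrapper around apply_mag_cache_hook."""
    # Imported lazily so this sketch can be defined without vLLM-Omni installed.
    from vllm_omni.diffusion.cache.magcache.hook import apply_mag_cache_hook

    # strategy=None defers to the registry lookup described above.
    apply_mag_cache_hook(transformer, config, strategy=strategy)
    # Assumption: the hooks were attached in place, so the same object is returned.
    return transformer
```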