vllm_omni.diffusion.cache.magcache.hook ¶

Hook-based MagCache implementation for vLLM-Omni.

This module implements a diffusers-style hook system for MagCache (Magnitude-based Cache), providing adaptive caching for diffusion model inference.

MagCache speeds up inference by skipping transformer block computations when the accumulated magnitude error is below a threshold, reusing cached residuals instead.
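The skip rule can be illustrated with a minimal sketch. It assumes the error is accumulated as the deviation from 1 of a running product of per-step magnitude ratios, and that the number of consecutive skips is capped. `ToyMagState`, `should_skip`, `threshold`, and `max_skips` are hypothetical names chosen for illustration and are not this module's API.

```python
from dataclasses import dataclass


@dataclass
class ToyMagState:
    """Hypothetical per-run state; not the real MagCacheState."""
    accumulated_ratio: float = 1.0
    accumulated_error: float = 0.0
    consecutive_skips: int = 0


def should_skip(state: ToyMagState, mag_ratio: float,
                threshold: float, max_skips: int) -> bool:
    """Return True if this step may reuse the cached residual."""
    # Tentatively fold this step's magnitude ratio into the running product.
    ratio = state.accumulated_ratio * mag_ratio
    error = state.accumulated_error + abs(1.0 - ratio)

    if error < threshold and state.consecutive_skips < max_skips:
        # Error budget not exhausted: commit the update and skip.
        state.accumulated_ratio = ratio
        state.accumulated_error = error
        state.consecutive_skips += 1
        return True

    # Budget exhausted: compute the blocks for real and start a fresh budget.
    state.accumulated_ratio = 1.0
    state.accumulated_error = 0.0
    state.consecutive_skips = 0
    return False
```

In this sketch, ratios close to 1 allow a few steps to be skipped in a row, while a ratio far from 1 forces a full computation immediately.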

Based on: https://github.com/Zehong-Ma/MagCache

Reference: diffusers/src/diffusers/hooks/mag_cache.py

Architecture:

- MagCacheStrategy: Model-specific strategy for preprocessing/postprocessing
- MagCacheState: Per-step state tracking residuals and accumulated error
- MagCacheHeadHook: Decides whether to skip based on accumulated error
- MagCacheBlockHook: Computes and stores residuals at tail block
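The division of labor between the head and tail hooks can be sketched with plain numbers standing in for tensors. This is a simplified illustration, not the module's implementation: `ToyState` and `run_blocks` are hypothetical names, the skip decision is set by hand rather than derived from accumulated error, and the real hooks wrap each block's forward instead of looping over blocks in one function.

```python
class ToyState:
    """Hypothetical shared state between the head and tail roles."""

    def __init__(self):
        self.head_input = None        # input seen by the first (head) block
        self.cached_residual = None   # tail output minus head input
        self.skip = False             # set by hand in this sketch


def run_blocks(blocks, x, state):
    # Head role: on a skip, bypass every block and reuse the cached residual.
    if state.skip and state.cached_residual is not None:
        return x + state.cached_residual

    # Otherwise remember the head input and run the full block stack.
    state.head_input = x
    out = x
    for block in blocks:
        out = block(out)

    # Tail role: store the residual across the whole stack for later reuse.
    state.cached_residual = out - state.head_input
    return out
```

A skipped step returns an approximation: it adds the residual measured on an earlier step to the current input, which is only accurate while consecutive steps change slowly.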

logger `module-attribute` ¶

logger = init_logger(__name__)

MagCacheBlockHook ¶

Bases: ModelHook

Block hook for MagCache - computes residuals at tail block.

config `instance-attribute` ¶

config = config

is_tail `instance-attribute` ¶

is_tail = is_tail

state_manager `instance-attribute` ¶

state_manager = state_manager

advance_step ¶

advance_step(state: MagCacheState) -> None

initialize_hook ¶

initialize_hook(module: Module) -> Module

log_residual_computed ¶

log_residual_computed(
    state: MagCacheState,
    residual: Tensor | tuple[Tensor, Tensor],
) -> None

new_forward ¶

new_forward(module: Module, *args, **kwargs)

perform_calibration ¶

perform_calibration(
    state: MagCacheState,
    current_residual: Tensor | tuple[Tensor, Tensor],
) -> None

reset_state ¶

reset_state(module: Module) -> Module

MagCacheHeadHook ¶

Bases: ModelHook

Head block hook for MagCache - decides whether to skip computation.

config `instance-attribute` ¶

config = config

state_manager `instance-attribute` ¶

state_manager = state_manager

initialize_hook ¶

initialize_hook(module: Module) -> Module

log_cache_hit ¶

log_cache_hit(state: MagCacheState, output, ret)

log_cache_miss ¶

log_cache_miss(state: MagCacheState, output)

new_forward ¶

new_forward(module: Module, *args, **kwargs)

reset_state ¶

reset_state(module: Module) -> Module

apply_mag_cache_hook ¶

apply_mag_cache_hook(
    module: Module,
    config: MagCacheConfig,
    strategy: MagCacheStrategy | None = None,
) -> None

Apply MagCache optimization to a transformer module.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | Transformer model to optimize (e.g., FluxTransformer2DModel) | required |
| `config` | `MagCacheConfig` | Configuration specifying caching parameters | required |
| `strategy` | `MagCacheStrategy \| None` | Optional strategy to use. If `None`, the strategy is looked up from the registry. | `None` |
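The fallback behavior of the `strategy` parameter can be pictured with a toy registry. How the real registry is keyed is not stated here, so keying by the module's class name is an assumption; `_STRATEGY_REGISTRY`, `register_strategy`, and `resolve_strategy` are hypothetical names and not part of this module.

```python
# Hypothetical registry mapping a model class name to its strategy object.
_STRATEGY_REGISTRY = {}


def register_strategy(model_cls_name, strategy):
    """Associate a strategy with a model class name (assumed key)."""
    _STRATEGY_REGISTRY[model_cls_name] = strategy


def resolve_strategy(module, strategy=None):
    """Prefer an explicit strategy; otherwise look one up for the module."""
    if strategy is not None:
        return strategy
    name = type(module).__name__
    try:
        return _STRATEGY_REGISTRY[name]
    except KeyError:
        raise ValueError(f"No strategy registered for {name}") from None
```

Passing a strategy explicitly is useful for models that have no registered entry, since the lookup in this sketch fails loudly rather than guessing.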
