vllm_omni.diffusion.cache.magcache.config

MagCacheConfig dataclass

Configuration for MagCache applied to transformer models.

MagCache (Magnitude-based Cache) is an adaptive caching technique that speeds up diffusion model inference by reusing transformer block computations based on magnitude ratios between consecutive timesteps.

Reference: https://github.com/Zehong-Ma/MagCache
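The skip decision described above can be sketched in plain Python. This is a minimal illustration modeled on the accumulated-error rule used in the referenced MagCache repository, not the vllm_omni implementation. The function name `plan_skips`, the floor rounding of the retention window, and the exact reset behavior are assumptions.

```python
def plan_skips(mag_ratios, threshold=0.24, max_skip_steps=5, retention_ratio=0.1):
    """Return one boolean per step: True where the cached result would be reused.

    Hypothetical helper for illustration only; the real cache logic may differ.
    """
    num_steps = len(mag_ratios)
    # Assumption: the retention window is rounded down to a whole number of steps.
    retention_steps = int(retention_ratio * num_steps)

    acc_ratio, acc_err, acc_steps = 1.0, 0.0, 0
    decisions = []
    for step, ratio in enumerate(mag_ratios):
        skip = False
        if step >= retention_steps:
            # Track how far the magnitude has drifted since the last full computation.
            acc_ratio *= ratio
            acc_steps += 1
            acc_err += abs(1.0 - acc_ratio)
            if acc_err < threshold and acc_steps <= max_skip_steps:
                skip = True
            else:
                # Compute this step in full and start accumulating afresh.
                acc_ratio, acc_err, acc_steps = 1.0, 0.0, 0
        decisions.append(skip)
    return decisions
```

With ratios close to 1.0 the accumulated error grows slowly, so the number of consecutive skips is bounded mainly by `max_skip_steps`. With ratios far from 1.0 the threshold is exceeded immediately and no step is skipped.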

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| threshold | float | Accumulated error threshold. Higher = more aggressive skipping (faster, lower quality). | 0.24 |
| max_skip_steps | int | Max consecutive skip steps (K). | 5 |
| retention_ratio | float | Fraction of initial steps where skipping is disabled (stability). | 0.1 |
| num_inference_steps | int | Total inference steps. Required for retention step calculation. | 28 |
| mag_ratios | Tensor \| list[float] \| None | Pre-computed magnitude ratios per step. Calibrate or use strategy defaults. | None |
| mag_calibrate | bool | If True, runs without skipping and logs norm_ratios for calibration. | False |
| transformer_type | str | Transformer class name for logging. | 'FluxTransformer2DModel' |
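As a usage illustration, the snippet below builds a few configurations from a stand-in dataclass that mirrors the documented fields and defaults. `MagCacheConfigSketch` is hypothetical and exists only so the example runs without the package installed. With the real package, the class would presumably be imported from `vllm_omni.diffusion.cache.magcache.config` as `MagCacheConfig`. The alternative values (0.12, 2) are arbitrary examples, not recommendations.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass
class MagCacheConfigSketch:
    """Hypothetical stand-in mirroring the documented fields and defaults."""

    threshold: float = 0.24
    max_skip_steps: int = 5
    retention_ratio: float = 0.1
    num_inference_steps: int = 28
    mag_ratios: list[float] | None = None  # the documented type also allows a Tensor
    mag_calibrate: bool = False
    transformer_type: str = "FluxTransformer2DModel"


# All defaults, as listed in the parameter table.
default_cfg = MagCacheConfigSketch()

# A more conservative variant: lower threshold and fewer consecutive skips.
quality_cfg = replace(default_cfg, threshold=0.12, max_skip_steps=2)

# A calibration run: skipping is disabled and ratios are logged instead.
calib_cfg = replace(default_cfg, mag_calibrate=True)
```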

mag_calibrate class-attribute instance-attribute

mag_calibrate: bool = False
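To make the calibration output concrete, here is a simplified sketch of how a per-step norm ratio could be computed from values recorded during a run without skipping. It assumes one flat vector per step and uses 1.0 for the first step. The function name `norm_ratios` and the input layout are hypothetical, and the actual calibration code may aggregate norms differently (for example, per token).

```python
import math


def norm_ratios(residuals):
    """Ratio of each step's vector norm to the previous step's norm.

    residuals: list of per-step vectors (lists of floats), in step order.
    Hypothetical helper; assumes no recorded vector has zero norm.
    """
    ratios = [1.0]  # assumption: the first step has no predecessor
    for prev, cur in zip(residuals, residuals[1:]):
        prev_norm = math.sqrt(sum(x * x for x in prev))
        cur_norm = math.sqrt(sum(x * x for x in cur))
        ratios.append(cur_norm / prev_norm)
    return ratios
```

A list produced this way has one entry per step, which matches the shape expected by `mag_ratios`.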

mag_ratios class-attribute instance-attribute

mag_ratios: Tensor | list[float] | None = None

max_skip_steps class-attribute instance-attribute

max_skip_steps: int = 5

num_inference_steps class-attribute instance-attribute

num_inference_steps: int = 28

retention_ratio class-attribute instance-attribute

retention_ratio: float = 0.1
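A small worked example of how `retention_ratio` and `num_inference_steps` combine. The floor rounding and the helper name `retention_steps` are assumptions. The actual implementation may round differently.

```python
def retention_steps(retention_ratio: float, num_inference_steps: int) -> int:
    """Number of initial steps during which skipping stays disabled.

    Hypothetical helper; assumes the product is rounded down.
    """
    return int(retention_ratio * num_inference_steps)


# With the documented defaults (0.1 and 28), 2.8 rounds down to 2 protected steps.
default_retention = retention_steps(0.1, 28)
```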

threshold class-attribute instance-attribute

threshold: float = 0.24

transformer_type class-attribute instance-attribute

transformer_type: str = 'FluxTransformer2DModel'