vllm_omni.diffusion.cache.magcache.config
MagCacheConfig dataclass
Configuration for MagCache applied to transformer models.
MagCache (Magnitude-based Cache) is an adaptive caching technique that speeds up diffusion model inference by reusing transformer block computations based on magnitude ratios between consecutive timesteps.
Reference: https://github.com/Zehong-Ma/MagCache
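The skip rule described above can be sketched roughly as follows. This is a minimal illustration based on the general MagCache idea of accumulating error from per-step magnitude ratios. It is not the vllm_omni implementation. The function name `plan_skips`, its loop structure, and the way the retention window is rounded are assumptions made for this example.

```python
# Hypothetical sketch of a magnitude-ratio skip schedule (not the library's code).
def plan_skips(mag_ratios, threshold=0.24, max_skip_steps=5, retention_ratio=0.1):
    """Return one bool per step: True means the cached result would be reused."""
    num_steps = len(mag_ratios)
    # Assumption: the retention window is the truncated product of the two values.
    retention_steps = int(num_steps * retention_ratio)

    acc_ratio = 1.0   # product of ratios since the last full computation
    acc_err = 0.0     # accumulated estimated error since the last full computation
    acc_steps = 0     # consecutive steps considered for skipping

    decisions = []
    for step, ratio in enumerate(mag_ratios):
        skip = False
        if step >= retention_steps:
            acc_ratio *= ratio
            acc_steps += 1
            acc_err += abs(1.0 - acc_ratio)
            if acc_err < threshold and acc_steps <= max_skip_steps:
                skip = True
            else:
                # Error budget or skip limit exceeded: compute fully and reset.
                acc_ratio, acc_err, acc_steps = 1.0, 0.0, 0
        decisions.append(skip)
    return decisions


if __name__ == "__main__":
    # Ratios of exactly 1.0 never add error, so only max_skip_steps forces a recompute.
    print(plan_skips([1.0] * 10))
```

In this sketch a higher `threshold` lets more error accumulate before a full computation is forced, which matches the "faster, lower quality" trade-off noted in the parameter table.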
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `threshold` | `float` | Accumulated error threshold. Higher = more aggressive skipping (faster, lower quality). | `0.24` |
| `max_skip_steps` | `int` | Max consecutive skip steps (K). | `5` |
| `retention_ratio` | `float` | Fraction of initial steps where skipping is disabled (stability). | `0.1` |
| `num_inference_steps` | `int` | Total inference steps. Required for retention step calculation. | `28` |
| `mag_ratios` | `Tensor \| list[float] \| None` | Pre-computed magnitude ratios per step. Calibrate or use strategy defaults. | `None` |
| `mag_calibrate` | `bool` | If True, runs without skipping and logs norm_ratios for calibration. | `False` |
| `transformer_type` | `str` | Transformer class name for logging. | `'FluxTransformer2DModel'` |
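
To make the defaults concrete, here is a small stand-in dataclass that mirrors the documented fields. It is a sketch for illustration only: `MagCacheConfigSketch` and its `retention_steps` helper are hypothetical names, the `Tensor` option for `mag_ratios` is omitted to keep the example dependency-free, and truncating the product with `int()` is an assumption about how the retention window is derived.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MagCacheConfigSketch:
    """Hypothetical mirror of the documented fields; not the real class."""

    threshold: float = 0.24
    max_skip_steps: int = 5
    retention_ratio: float = 0.1
    num_inference_steps: int = 28
    mag_ratios: Optional[list[float]] = None  # the real field also accepts a Tensor
    mag_calibrate: bool = False
    transformer_type: str = "FluxTransformer2DModel"

    def retention_steps(self) -> int:
        # Assumption: number of initial steps where skipping is disabled,
        # obtained by truncating num_inference_steps * retention_ratio.
        return int(self.num_inference_steps * self.retention_ratio)


if __name__ == "__main__":
    default_cfg = MagCacheConfigSketch()
    print(default_cfg.retention_steps())

    # A more conservative setup: lower error budget, longer warm-up window.
    careful_cfg = MagCacheConfigSketch(
        threshold=0.12, retention_ratio=0.2, num_inference_steps=50
    )
    print(careful_cfg.retention_steps())
```

Under these assumptions, `num_inference_steps` has to match the actual sampler step count, since the retention window is derived from it.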