vllm_omni.diffusion.cache.teacache.config
TeaCacheConfig dataclass
Configuration for TeaCache applied to transformer models.
TeaCache (Timestep Embedding Aware Cache) is an adaptive caching technique that speeds up diffusion model inference by reusing transformer block computations when consecutive timestep embeddings are similar.
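The reuse decision described above can be sketched in a few lines. This is a minimal illustration, not the vllm_omni implementation: the `CacheDecider` class, the `rel_l1` and `polyval` helpers, the highest-degree-first coefficient order, and the choice to always compute on the first step are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def rel_l1(curr: Sequence[float], prev: Sequence[float]) -> float:
    """Relative L1 distance between two embeddings: |curr - prev|_1 / |prev|_1."""
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    den = sum(abs(p) for p in prev)
    return num / den if den else float("inf")


def polyval(coeffs: Sequence[float], x: float) -> float:
    """Evaluate a polynomial with Horner's rule (assumed order: highest degree first)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


class CacheDecider:
    """Hypothetical helper: decides per step whether to recompute or reuse the cache."""

    def __init__(self, rel_l1_thresh: float = 0.2,
                 coefficients: Sequence[float] = (1.0, 0.0)) -> None:
        # (1.0, 0.0) is the identity rescaling, used here only as a placeholder.
        self.rel_l1_thresh = rel_l1_thresh
        self.coefficients = coefficients
        self.prev: Optional[List[float]] = None
        self.accumulated = 0.0

    def should_compute(self, emb: Sequence[float]) -> bool:
        if self.prev is None:
            # Nothing is cached yet, so the first step must run the blocks.
            compute = True
        else:
            # Accumulate the rescaled distance since the last full computation.
            self.accumulated += polyval(self.coefficients, rel_l1(emb, self.prev))
            compute = self.accumulated >= self.rel_l1_thresh
        if compute:
            self.accumulated = 0.0  # A fresh computation resets the accumulator.
        self.prev = list(emb)
        return compute


decider = CacheDecider(rel_l1_thresh=0.2)
steps = [[1.0, 1.0], [1.05, 1.05], [1.1, 1.1], [1.3, 1.3]]
decisions = [decider.should_compute(e) for e in steps]
print(decisions)  # [True, False, False, True]
```

In this toy run the two small embedding changes are served from the cache, and the larger change at the last step pushes the accumulated distance past the threshold and forces a recomputation.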
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| rel_l1_thresh | float | Threshold for the accumulated relative L1 distance. While the accumulated distance stays below the threshold, the cached residual is reused. Values in [0.1, 0.3] work best. Reference points: 0.2 gives ~1.5x speedup with minimal quality loss; 0.4 gives ~1.8x speedup with slight quality loss; 0.6 gives ~2.0x speedup with noticeable quality loss. | 0.2 |
| coefficients | list[float] \| None | Polynomial coefficients for rescaling the L1 distance. If None, model-specific defaults are selected based on transformer_type. | None |
| transformer_type | str | Transformer class name (e.g., "QwenImageTransformer2DModel"). Auto-detected from `pipeline.transformer.__class__.__name__` in the backend. | 'QwenImageTransformer2DModel' |
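To make the interaction between the three fields concrete, here is a self-contained stand-in that mirrors the documented names and defaults. It is a sketch only: `TeaCacheConfigSketch`, the `_PLACEHOLDER_COEFFICIENTS` table, its coefficient values, and the validation in `__post_init__` are hypothetical and are not taken from vllm_omni.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical lookup table. The values are placeholders (identity rescaling),
# NOT the real model-specific coefficients shipped with the library.
_PLACEHOLDER_COEFFICIENTS: Dict[str, List[float]] = {
    "QwenImageTransformer2DModel": [1.0, 0.0],
}


@dataclass
class TeaCacheConfigSketch:
    """Stand-in mirroring the documented fields and defaults of TeaCacheConfig."""

    rel_l1_thresh: float = 0.2
    coefficients: Optional[List[float]] = None
    transformer_type: str = "QwenImageTransformer2DModel"

    def __post_init__(self) -> None:
        # Assumed validation: a non-positive threshold would never allow reuse.
        if self.rel_l1_thresh <= 0:
            raise ValueError("rel_l1_thresh must be positive")
        # When no coefficients are given, fall back to per-model defaults.
        if self.coefficients is None:
            if self.transformer_type not in _PLACEHOLDER_COEFFICIENTS:
                raise KeyError(f"no default coefficients for {self.transformer_type!r}")
            self.coefficients = list(_PLACEHOLDER_COEFFICIENTS[self.transformer_type])


default_cfg = TeaCacheConfigSketch()
custom_cfg = TeaCacheConfigSketch(rel_l1_thresh=0.4, coefficients=[2.0, 0.5])
print(default_cfg.coefficients)  # [1.0, 0.0] (placeholder default)
print(custom_cfg.coefficients)   # [2.0, 0.5] (explicit value is kept)
```

Explicit `coefficients` take precedence; `transformer_type` only matters for the fallback when `coefficients` is None.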