vllm_omni.diffusion.cache.teacache.backend ¶
TeaCache backend implementation.
This module provides the TeaCache backend that implements the CacheBackend interface using the hooks-based TeaCache system.
CUSTOM_TEACACHE_ENABLERS module-attribute ¶
CUSTOM_TEACACHE_ENABLERS = {
    "BagelPipeline": enable_bagel_teacache,
    "Flux2KleinPipeline": enable_flux2_klein_teacache,
    "HunyuanImage3Pipeline": enable_hunyuan_image3_teacache,
}
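The keys of this mapping look like pipeline class names, which suggests a lookup-by-name dispatch. The following is a minimal sketch of that pattern, not the module's actual code: the enabler body is a stand-in, and `apply_teacache` and `enable_default_teacache` are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, Dict, List

calls: List[str] = []  # records which enabler ran (illustration only)


def enable_bagel_teacache(pipeline: Any, config: Any) -> None:
    """Stand-in for the real model-specific enabler."""
    calls.append("bagel")


def enable_default_teacache(pipeline: Any, config: Any) -> None:
    """Hypothetical fallback for pipelines without a custom enabler."""
    calls.append("default")


CUSTOM_TEACACHE_ENABLERS: Dict[str, Callable[[Any, Any], None]] = {
    "BagelPipeline": enable_bagel_teacache,
}


def apply_teacache(pipeline: Any, config: Any) -> None:
    """Hypothetical dispatcher: pick the enabler by the pipeline's class name."""
    name = type(pipeline).__name__
    enabler = CUSTOM_TEACACHE_ENABLERS.get(name, enable_default_teacache)
    enabler(pipeline, config)


class BagelPipeline:
    """Toy pipeline whose class name matches a registry key."""


class OtherPipeline:
    """Toy pipeline with no custom enabler."""


apply_teacache(BagelPipeline(), config=None)
apply_teacache(OtherPipeline(), config=None)
print(calls)  # ['bagel', 'default']
```

Keying on the class name keeps model-specific setup out of the generic backend; whether the real backend falls back to a default path in this way is an assumption.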
TeaCacheBackend ¶
Bases: CacheBackend
TeaCache implementation using hooks.
TeaCache (Timestep Embedding Aware Cache) is an adaptive caching technique that speeds up diffusion inference by reusing transformer block computations when consecutive timestep embeddings are similar.
The backend applies TeaCache hooks to the transformer. The hooks intercept the forward pass and implement the caching logic transparently, so the pipeline code itself does not change.
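The caching decision described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the backend's implementation: embeddings are plain lists of floats, the class name `CacheDecision` is hypothetical, and the change between steps is accumulated as a relative L1 distance and compared against `rel_l1_thresh`. Any model-specific rescaling of the distance is omitted.

```python
from typing import List, Optional


def rel_l1(current: List[float], previous: List[float]) -> float:
    """Relative L1 distance between two embeddings."""
    num = sum(abs(c - p) for c, p in zip(current, previous))
    den = sum(abs(p) for p in previous)
    return num / den if den > 0 else float("inf")


class CacheDecision:
    """Hypothetical helper: decides per step whether to recompute or reuse."""

    def __init__(self, rel_l1_thresh: float) -> None:
        self.rel_l1_thresh = rel_l1_thresh
        self.accumulated = 0.0
        self.previous: Optional[List[float]] = None

    def should_compute(self, embedding: List[float]) -> bool:
        if self.previous is None:
            compute = True  # nothing cached yet, so the first step always computes
        else:
            self.accumulated += rel_l1(embedding, self.previous)
            compute = self.accumulated >= self.rel_l1_thresh
        if compute:
            self.accumulated = 0.0  # start accumulating again after a full pass
        self.previous = list(embedding)
        return compute


decision = CacheDecision(rel_l1_thresh=0.2)
embeddings = [[1.0, 1.0], [1.05, 1.0], [1.1, 1.0], [1.5, 1.0], [1.5, 1.0]]
results = [decision.should_compute(e) for e in embeddings]
print(results)  # [True, False, False, True, False]
```

A larger threshold skips more steps, trading output fidelity for speed; a threshold of zero forces a full computation whenever the embedding changes at all.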
Example
from vllm_omni.diffusion.cache.teacache.backend import TeaCacheBackend
from vllm_omni.diffusion.data import DiffusionCacheConfig

backend = TeaCacheBackend(DiffusionCacheConfig(rel_l1_thresh=0.2))
backend.enable(pipeline)

# Generate with cache enabled
backend.refresh(pipeline, num_inference_steps=50)  # Refresh before each generation

# Access config attributes: backend.config.rel_l1_thresh
enable ¶
enable(pipeline: Any) -> None
Enable TeaCache on transformer using hooks.
This creates a TeaCacheConfig from the backend's DiffusionCacheConfig and applies the TeaCache hook to the transformer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pipeline | Any | Diffusion pipeline instance. Extracts transformer and transformer_type: transformer is pipeline.transformer; transformer_type is pipeline.transformer.__class__.__name__. | required |
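As a rough illustration of the hook idea, the sketch below extracts the transformer and its type name from a pipeline and wraps the transformer's `forward` so that a cached residual can be reused. Everything here is an assumption made for the example: `enable_sketch`, `ToyTransformer`, and the injected `should_compute` policy are hypothetical, and the real hook system may be structured differently.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Optional


class ToyTransformer:
    """Stand-in transformer that counts how often the real forward runs."""

    def __init__(self) -> None:
        self.forward_calls = 0

    def forward(self, x: float, temb: float) -> float:
        self.forward_calls += 1
        return x + 0.5 * temb  # placeholder for the expensive blocks


def enable_sketch(pipeline: Any, should_compute: Callable[[float], bool]) -> str:
    """Hypothetical enable(): wrap forward and return the transformer type name."""
    transformer = pipeline.transformer
    transformer_type = transformer.__class__.__name__
    original_forward = transformer.forward
    cache: Dict[str, Optional[float]] = {"residual": None}

    def hooked_forward(x: float, temb: float) -> float:
        # Run the real forward when nothing is cached or the policy asks for it.
        if cache["residual"] is None or should_compute(temb):
            out = original_forward(x, temb)
            cache["residual"] = out - x
            return out
        # Otherwise reuse the cached residual and skip the expensive blocks.
        return x + cache["residual"]

    transformer.forward = hooked_forward
    return transformer_type


pipeline = SimpleNamespace(transformer=ToyTransformer())
decisions = iter([False, False, True])  # policy answers for steps 2 to 4
kind = enable_sketch(pipeline, lambda temb: next(decisions))
outputs = [pipeline.transformer.forward(1.0, t) for t in (1.0, 1.1, 1.2, 2.0)]
print(kind, outputs, pipeline.transformer.forward_calls)
# ToyTransformer [1.5, 1.5, 1.5, 2.0] 2
```

Because the wrapper replaces `forward` on the instance, callers keep invoking the transformer as before, which is what makes the caching transparent to the pipeline.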
refresh ¶
refresh(pipeline: Any, num_inference_steps: int, verbose: bool = True) -> None
Refresh TeaCache state for new generation.
Clears all cached residuals and resets counters/accumulators. Should be called before each generation to ensure clean state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pipeline | Any | Diffusion pipeline instance. Extracts transformer via pipeline.transformer. | required |
| num_inference_steps | int | Number of inference steps for the current generation. Currently not used by TeaCache but accepted for interface consistency. | required |
| verbose | bool | Whether to log refresh operations. | True |
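To make the reset behaviour concrete, here is a minimal sketch of per-generation state and a refresh function that clears it. The `CacheState` fields, the `_cache_state` attribute, and `refresh_sketch` are hypothetical names chosen for this example; the actual state layout is not described in this page.

```python
import logging
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, Optional

logger = logging.getLogger("teacache_sketch")


@dataclass
class CacheState:
    """Hypothetical per-generation state kept by the hook."""

    step_count: int = 0
    accumulated_distance: float = 0.0
    cached_residual: Optional[float] = None

    def reset(self) -> None:
        self.step_count = 0
        self.accumulated_distance = 0.0
        self.cached_residual = None


def refresh_sketch(pipeline: Any, num_inference_steps: int, verbose: bool = True) -> None:
    """Clear cached residuals and counters before a new generation."""
    # num_inference_steps is accepted but unused, mirroring the documented interface.
    state = getattr(pipeline.transformer, "_cache_state", None)  # hypothetical attribute
    if state is None:
        return  # caching was never enabled on this transformer
    state.reset()
    if verbose:
        logger.info("TeaCache state refreshed")


# State left over from a previous 50-step generation.
stale = CacheState(step_count=50, accumulated_distance=0.13, cached_residual=0.5)
pipeline = SimpleNamespace(transformer=SimpleNamespace(_cache_state=stale))
refresh_sketch(pipeline, num_inference_steps=50, verbose=False)
print(stale)
# CacheState(step_count=0, accumulated_distance=0.0, cached_residual=None)
```

Skipping the refresh would let a residual from the previous prompt leak into the first steps of the next generation, which is why the reset belongs before every run.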
enable_bagel_teacache ¶
enable_bagel_teacache(
pipeline: Any, config: DiffusionCacheConfig
) -> None
Enable TeaCache for Bagel model.
enable_flux2_klein_teacache ¶
enable_flux2_klein_teacache(
pipeline: Any, config: DiffusionCacheConfig
) -> None
Enable TeaCache for Flux2 Klein model.
enable_hunyuan_image3_teacache ¶
enable_hunyuan_image3_teacache(
pipeline: Any, config: DiffusionCacheConfig
) -> None
Enable TeaCache for HunyuanImage3 model.
HunyuanImage3 uses a GPT-based architecture with KV cache, which is incompatible with the standard hook-based TeaCache approach. Instead, we store the TeaCacheConfig on the pipeline so the denoising loop can implement caching directly.
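The loop-level alternative described above might look roughly like the sketch below. It is illustrative only: the attribute name `_teacache_config`, the `TeaCacheConfigSketch` class, and the `denoise` function are hypothetical, and the scalar timestep is used as a crude proxy for the timestep embedding.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any, List, Optional


@dataclass
class TeaCacheConfigSketch:
    """Hypothetical stand-in for the real TeaCacheConfig."""

    rel_l1_thresh: float = 0.2


def enable_on_pipeline(pipeline: Any, config: TeaCacheConfigSketch) -> None:
    """Store the config on the pipeline instead of installing a hook."""
    pipeline._teacache_config = config  # hypothetical attribute name


def denoise(pipeline: Any, timesteps: List[float]) -> int:
    """Toy denoising loop; returns the number of full model evaluations."""
    config = getattr(pipeline, "_teacache_config", None)
    full_evals = 0
    accumulated = 0.0
    prev_t: Optional[float] = None
    cached: Optional[float] = None
    for t in timesteps:
        reuse = False
        if config is not None and cached is not None and prev_t is not None:
            # Accumulate the relative change since the last full evaluation.
            accumulated += abs(t - prev_t) / max(abs(prev_t), 1e-8)
            reuse = accumulated < config.rel_l1_thresh
        if not reuse:
            cached = pipeline.model(t)  # full evaluation
            full_evals += 1
            accumulated = 0.0
        prev_t = t
    return full_evals


pipeline = SimpleNamespace(model=lambda t: 0.1 * t)
timesteps = [1000.0, 950.0, 900.0, 600.0]

baseline_evals = denoise(pipeline, timesteps)  # no config stored yet
enable_on_pipeline(pipeline, TeaCacheConfigSketch(rel_l1_thresh=0.2))
cached_evals = denoise(pipeline, timesteps)
print(baseline_evals, cached_evals)  # 4 2
```

Keeping the decision inside the loop lets the loop coordinate skipped steps with other per-step state, such as a KV cache, which a forward hook on the transformer cannot see.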