
vllm_omni.diffusion.cache.teacache.backend

TeaCache backend implementation.

This module provides the TeaCache backend that implements the CacheBackend interface using the hooks-based TeaCache system.

CUSTOM_TEACACHE_ENABLERS module-attribute

CUSTOM_TEACACHE_ENABLERS = {
    "BagelPipeline": enable_bagel_teacache,
    "Flux2KleinPipeline": enable_flux2_klein_teacache,
    "HunyuanImage3Pipeline": enable_hunyuan_image3_teacache,
}
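The mapping is keyed by pipeline class name, which suggests a dispatch step: look up a custom enabler for the pipeline and fall back to the generic hook-based path when none is registered. The sketch below assumes that behaviour, which this page does not state outright. The stub classes, `enable_generic_teacache`, and `dispatch_enable` are hypothetical and are not part of vllm_omni.

```python
from typing import Any, Callable, Dict


# Hypothetical stand-ins for the real enablers; each returns a tag naming the path taken.
def enable_bagel_teacache(pipeline: Any, config: dict) -> str:
    return "bagel"


def enable_generic_teacache(pipeline: Any, config: dict) -> str:
    # Assumed fallback: the standard hook-based TeaCache path.
    return "generic"


# Mirrors the shape of CUSTOM_TEACACHE_ENABLERS: class name -> enabler.
ENABLERS: Dict[str, Callable[[Any, dict], str]] = {
    "BagelPipeline": enable_bagel_teacache,
}


def dispatch_enable(pipeline: Any, config: dict) -> str:
    """Pick a custom enabler by pipeline class name, else use the generic path."""
    name = type(pipeline).__name__
    enabler = ENABLERS.get(name, enable_generic_teacache)
    return enabler(pipeline, config)


class BagelPipeline:  # hypothetical stub
    pass


class SomeOtherPipeline:  # hypothetical stub
    pass


print(dispatch_enable(BagelPipeline(), {}))      # bagel
print(dispatch_enable(SomeOtherPipeline(), {}))  # generic
```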

logger module-attribute

logger = init_logger(__name__)

TeaCacheBackend

Bases: CacheBackend

TeaCache implementation using hooks.

TeaCache (Timestep Embedding Aware Cache) is an adaptive caching technique that speeds up diffusion inference by reusing transformer block computations when consecutive timestep embeddings are similar.
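The skip decision can be sketched with the accumulated relative-L1 rule from the original TeaCache method. At each step, the relative L1 change between the current and previous timestep-modulated inputs is added to a running total. The transformer blocks are recomputed only once that total reaches `rel_l1_thresh`, and the total then resets. This is a simplified sketch: it omits the model-specific polynomial rescaling TeaCache applies to the distance, always computes the first step, and uses plain lists in place of tensors. `TeaCacheDecider` is a hypothetical name.

```python
from typing import List, Optional


def rel_l1(curr: List[float], prev: List[float]) -> float:
    """Mean absolute change, relative to the mean magnitude of the previous input."""
    num = sum(abs(c - p) for c, p in zip(curr, prev)) / len(curr)
    den = sum(abs(p) for p in prev) / len(prev)
    return num / den


class TeaCacheDecider:
    """Hypothetical helper deciding, per step, whether to recompute or reuse."""

    def __init__(self, rel_l1_thresh: float) -> None:
        self.rel_l1_thresh = rel_l1_thresh
        self.accumulated = 0.0
        self.prev: Optional[List[float]] = None

    def should_compute(self, modulated_input: List[float]) -> bool:
        if self.prev is None:
            compute = True  # nothing cached yet on the first step
        else:
            self.accumulated += rel_l1(modulated_input, self.prev)
            compute = self.accumulated >= self.rel_l1_thresh
            if compute:
                self.accumulated = 0.0  # reset after a full recompute
        self.prev = modulated_input  # always compare against the latest input
        return compute


decider = TeaCacheDecider(rel_l1_thresh=0.2)
steps = [[1.0, 1.0], [1.05, 1.05], [1.1, 1.1], [1.5, 1.5]]
decisions = [decider.should_compute(x) for x in steps]
print(decisions)  # [True, False, False, True]
```

Small drifts (about 5% per step here) accumulate until the threshold is crossed, so a larger `rel_l1_thresh` skips more steps at some cost in fidelity.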

The backend applies TeaCache hooks to the transformer. These hooks intercept the forward pass and implement the caching logic transparently, so the pipeline code itself does not change.
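The hook idea can be illustrated in plain Python. A wrapper replaces a module's `forward`, calls the original only when a decision function asks for it, and otherwise adds the residual cached from the last full computation back onto the input. This assumes residual caching (output minus input), as in the TeaCache reference implementations. `ToyTransformer`, `ResidualCacheHook`, and `apply_hook` are hypothetical and do not reflect vllm_omni's actual hook classes.

```python
from typing import Callable, List, Optional

Vector = List[float]


class ToyTransformer:
    """Stand-in for the transformer blocks; counts how often it really runs."""

    def __init__(self) -> None:
        self.calls = 0

    def forward(self, x: Vector) -> Vector:
        self.calls += 1
        return [v * 2.0 + 1.0 for v in x]


class ResidualCacheHook:
    """Hypothetical hook: reuse the cached residual when told to skip."""

    def __init__(self, should_compute: Callable[[Vector], bool]) -> None:
        self.should_compute = should_compute
        self.residual: Optional[Vector] = None

    def wrap(self, original_forward: Callable[[Vector], Vector]) -> Callable[[Vector], Vector]:
        def new_forward(x: Vector) -> Vector:
            compute = self.should_compute(x)
            if compute or self.residual is None:
                out = original_forward(x)
                # Cache what the blocks added on top of their input.
                self.residual = [o - i for o, i in zip(out, x)]
                return out
            # Skip the blocks: approximate the output as input + cached residual.
            return [i + r for i, r in zip(x, self.residual)]

        return new_forward


def apply_hook(module: ToyTransformer, hook: ResidualCacheHook) -> None:
    # Shadow the bound method on this instance only; callers still call module.forward.
    module.forward = hook.wrap(module.forward)


model = ToyTransformer()
decisions = iter([True, False, True])
apply_hook(model, ResidualCacheHook(lambda x: next(decisions)))
outs = [model.forward([1.0]), model.forward([2.0]), model.forward([3.0])]
print(outs, model.calls)  # [[3.0], [4.0], [7.0]] 2
```

The second call never reaches `ToyTransformer.forward`; it returns `2.0 + 2.0` from the cached residual, which is the approximation TeaCache trades for speed.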

Example

from vllm_omni.diffusion.cache.teacache.backend import TeaCacheBackend
from vllm_omni.diffusion.data import DiffusionCacheConfig

backend = TeaCacheBackend(DiffusionCacheConfig(rel_l1_thresh=0.2))
backend.enable(pipeline)

# Generate with cache enabled
backend.refresh(pipeline, num_inference_steps=50)  # Refresh before each generation

# Access config attributes
backend.config.rel_l1_thresh

enable

enable(pipeline: Any) -> None

Enable TeaCache on transformer using hooks.

This creates a TeaCacheConfig from the backend's DiffusionCacheConfig and applies the TeaCache hook to the transformer.

Parameters:

pipeline (Any, required): Diffusion pipeline instance. The transformer and its type are extracted from it as follows:
- transformer: pipeline.transformer
- transformer_type: pipeline.transformer.__class__.__name__
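The extraction described above amounts to two attribute lookups. The sketch shows them on a stub pipeline; `StubPipeline` and `FluxTransformer2DModel` are hypothetical names used only for illustration.

```python
from typing import Any, Tuple


def extract_transformer(pipeline: Any) -> Tuple[Any, str]:
    """Return the transformer and its class name, as the enable() docs describe."""
    transformer = pipeline.transformer
    transformer_type = transformer.__class__.__name__
    return transformer, transformer_type


class FluxTransformer2DModel:  # hypothetical transformer class
    pass


class StubPipeline:  # hypothetical pipeline exposing a .transformer attribute
    def __init__(self) -> None:
        self.transformer = FluxTransformer2DModel()


transformer, transformer_type = extract_transformer(StubPipeline())
print(transformer_type)  # FluxTransformer2DModel
```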

refresh

refresh(
    pipeline: Any,
    num_inference_steps: int,
    verbose: bool = True,
) -> None

Refresh TeaCache state for new generation.

Clears all cached residuals and resets counters and accumulators. Call this before each generation to ensure a clean state.

Parameters:

pipeline (Any, required): Diffusion pipeline instance. The transformer is extracted via pipeline.transformer.
num_inference_steps (int, required): Number of inference steps for the current generation. Currently unused by TeaCache; accepted for interface consistency.
verbose (bool, default True): Whether to log refresh operations.
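A minimal sketch of what a refresh does to per-generation state, assuming that state consists of a step counter, an L1 accumulator, the previous modulated input, and a cached residual; the exact fields in vllm_omni may differ. The real method takes the pipeline and locates the state through pipeline.transformer, whereas this sketch takes the state object directly. `TeaCacheState` and the logger name are hypothetical; `num_inference_steps` is accepted and ignored, mirroring the documented signature.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional

logger = logging.getLogger("teacache_sketch")


@dataclass
class TeaCacheState:
    """Hypothetical per-generation state held by a TeaCache hook."""

    cnt: int = 0
    accumulated_rel_l1: float = 0.0
    previous_modulated_input: Optional[List[float]] = None
    previous_residual: Optional[List[float]] = None

    def reset(self) -> None:
        self.cnt = 0
        self.accumulated_rel_l1 = 0.0
        self.previous_modulated_input = None
        self.previous_residual = None


def refresh(state: TeaCacheState, num_inference_steps: int, verbose: bool = True) -> None:
    # num_inference_steps is unused, matching the documented behaviour.
    state.reset()
    if verbose:
        logger.info("TeaCache state refreshed")


# State left over from a previous 50-step generation.
state = TeaCacheState(cnt=37, accumulated_rel_l1=0.15, previous_residual=[0.1])
refresh(state, num_inference_steps=50, verbose=False)
```

Without the reset, a stale residual from the previous prompt could be reused on the first steps of the next one, which is why the docs ask for a refresh before every generation.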

enable_bagel_teacache

enable_bagel_teacache(
    pipeline: Any, config: DiffusionCacheConfig
) -> None

Enable TeaCache for Bagel model.

enable_flux2_klein_teacache

enable_flux2_klein_teacache(
    pipeline: Any, config: DiffusionCacheConfig
) -> None

Enable TeaCache for Flux2 Klein model.

enable_hunyuan_image3_teacache

enable_hunyuan_image3_teacache(
    pipeline: Any, config: DiffusionCacheConfig
) -> None

Enable TeaCache for HunyuanImage3 model.

HunyuanImage3 uses a GPT-based architecture with KV cache, which is incompatible with the standard hook-based TeaCache approach. Instead, we store the TeaCacheConfig on the pipeline so the denoising loop can implement caching directly.
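The pattern described above, attaching the config to the pipeline and letting the denoising loop consult it instead of installing a hook, might look like the sketch below. The attribute name `_teacache_config`, the `StubTeaCacheConfig` fields, the per-step `changes` list (standing in for precomputed relative-L1 changes), and `denoise` are all hypothetical; the real attribute name and loop structure are not given on this page.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional


@dataclass
class StubTeaCacheConfig:
    """Hypothetical config; only the threshold is modelled."""

    rel_l1_thresh: float


class StubHunyuanPipeline:
    def __init__(self) -> None:
        self._teacache_config: Optional[StubTeaCacheConfig] = None


def enable_on_pipeline(pipeline: Any, rel_l1_thresh: float) -> None:
    # No hook is installed; the config is simply stored for the loop to read.
    pipeline._teacache_config = StubTeaCacheConfig(rel_l1_thresh)


def denoise(pipeline: Any, changes: List[float], step_fn: Callable[[int], Any]) -> List[int]:
    """Run the loop; return indices of steps where step_fn actually ran."""
    cfg = pipeline._teacache_config
    computed: List[int] = []
    accumulated = 0.0
    for i, change in enumerate(changes):
        if cfg is not None and i > 0:
            accumulated += change
            if accumulated < cfg.rel_l1_thresh:
                continue  # reuse the previous output; skip the expensive step
            accumulated = 0.0
        step_fn(i)
        computed.append(i)
    return computed


plain = StubHunyuanPipeline()
cached = StubHunyuanPipeline()
enable_on_pipeline(cached, rel_l1_thresh=0.2)
changes = [0.0, 0.05, 0.05, 0.3]
print(denoise(plain, changes, lambda i: None))   # [0, 1, 2, 3]
print(denoise(cached, changes, lambda i: None))  # [0, 3]
```

Keeping the decision inside the loop lets the model manage its own KV cache state around skipped steps, which a generic forward hook cannot see.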