vllm_omni.diffusion.cache.teacache.backend

TeaCache backend implementation.

This module provides the TeaCache backend, which implements the CacheBackend interface using the hooks-based TeaCache system.

CUSTOM_TEACACHE_ENABLERS `module-attribute`

CUSTOM_TEACACHE_ENABLERS = {
    "BagelPipeline": enable_bagel_teacache,
    "Flux2KleinPipeline": enable_flux2_klein_teacache,
    "HunyuanImage3Pipeline": enable_hunyuan_image3_teacache,
    "SenseNovaU1Pipeline": enable_sensenova_u1_teacache,
}
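The keys of this mapping are pipeline class names. That suggests the backend routes these pipelines to a dedicated enabler and uses the generic hook path for everything else. The sketch below illustrates that dispatch pattern under this assumption only. `dispatch_enable`, `apply_generic_teacache_hooks`, `ENABLERS`, and the `teacache_path` attribute are hypothetical. The enabler bodies are placeholders, not the real vllm_omni functions.

```python
from typing import Any, Callable, Dict


# Hypothetical stand-ins: the real enablers are vllm_omni functions with the
# signature (pipeline, config: DiffusionCacheConfig) -> None.
def enable_bagel_teacache(pipeline: Any, config: Any) -> None:
    pipeline.teacache_path = "custom:bagel"


def apply_generic_teacache_hooks(pipeline: Any, config: Any) -> None:
    pipeline.teacache_path = "hooks"


# Registry keyed by pipeline class name, shaped like CUSTOM_TEACACHE_ENABLERS.
ENABLERS: Dict[str, Callable[[Any, Any], None]] = {
    "BagelPipeline": enable_bagel_teacache,
}


def dispatch_enable(pipeline: Any, config: Any) -> None:
    """Use a registered custom enabler if one exists, else the generic hooks."""
    enabler = ENABLERS.get(type(pipeline).__name__, apply_generic_teacache_hooks)
    enabler(pipeline, config)


class BagelPipeline:  # dummy pipeline whose class name is registered
    pass


class SomeOtherPipeline:  # dummy pipeline with no custom enabler
    pass


bagel, other = BagelPipeline(), SomeOtherPipeline()
dispatch_enable(bagel, config=None)
dispatch_enable(other, config=None)
print(bagel.teacache_path, other.teacache_path)  # custom:bagel hooks
```

With this design, supporting a new architecture that cannot use hooks only requires adding one entry to the mapping.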

logger `module-attribute`

logger = init_logger(__name__)

TeaCacheBackend

Bases: CacheBackend

TeaCache implementation using hooks.

TeaCache (Timestep Embedding Aware Cache) is an adaptive caching technique that speeds up diffusion inference by reusing transformer block computations when consecutive timestep embeddings are similar.

The backend applies TeaCache hooks to the transformer. The hooks intercept its forward pass and implement the caching logic transparently.

Example

from vllm_omni.diffusion.cache.teacache.backend import TeaCacheBackend
from vllm_omni.diffusion.data import DiffusionCacheConfig
# `pipeline` is an already-constructed diffusion pipeline
backend = TeaCacheBackend(DiffusionCacheConfig(rel_l1_thresh=0.2))
backend.enable(pipeline)
# Generate with cache enabled
backend.refresh(pipeline, num_inference_steps=50)  # Refresh before each generation
# Access config attributes
thresh = backend.config.rel_l1_thresh
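The caching decision described above can be sketched in plain Python. This sketch follows the scheme of the original TeaCache method as we understand it:

- Accumulate the relative L1 distance between consecutive modulated (timestep-conditioned) inputs.
- Run the full transformer once the running total reaches `rel_l1_thresh`. The first and last steps always run it.
- On every other step, add back the residual cached from the last full run.

`TeaCacheSketch` and its attributes are hypothetical, and lists stand in for tensors. The sketch omits the polynomial rescaling that some TeaCache variants apply to the distance. It is not the vllm_omni hook code.

```python
from typing import Callable, List, Optional

Vector = List[float]


def rel_l1(curr: Vector, prev: Vector) -> float:
    """Relative L1 distance: sum|curr - prev| / sum|prev|."""
    num = sum(abs(c - p) for c, p in zip(curr, prev))
    den = sum(abs(p) for p in prev)
    return num / den if den > 0 else float("inf")


class TeaCacheSketch:
    """Hypothetical per-generation state for the skip/compute decision."""

    def __init__(self, rel_l1_thresh: float, num_steps: int) -> None:
        self.rel_l1_thresh = rel_l1_thresh
        self.num_steps = num_steps
        self.step = 0
        self.accumulated = 0.0
        self.prev_modulated: Optional[Vector] = None
        self.cached_residual: Optional[Vector] = None

    def should_compute(self, modulated: Vector) -> bool:
        # The first and last steps always run the full transformer.
        if self.prev_modulated is None or self.step == self.num_steps - 1:
            compute = True
        else:
            # Accumulate drift since the last full computation.
            self.accumulated += rel_l1(modulated, self.prev_modulated)
            compute = self.accumulated >= self.rel_l1_thresh
        if compute:
            self.accumulated = 0.0
        self.prev_modulated = modulated
        self.step += 1
        return compute

    def forward(self, hidden: Vector, modulated: Vector,
                blocks: Callable[[Vector], Vector]) -> Vector:
        if self.should_compute(modulated):
            out = blocks(hidden)
            # Cache the residual the transformer blocks added on this step.
            self.cached_residual = [o - h for o, h in zip(out, hidden)]
            return out
        # Skip the blocks and reuse the cached residual instead.
        return [h + r for h, r in zip(hidden, self.cached_residual)]


state = TeaCacheSketch(rel_l1_thresh=0.1, num_steps=5)
signals = [[1.0, 1.0], [1.05, 1.0], [1.1, 1.0], [2.0, 1.0], [2.1, 1.0]]
print([state.should_compute(s) for s in signals])
# [True, False, False, True, True]
```

A larger `rel_l1_thresh` lets more drift accumulate before a full run, so more steps are skipped at some cost in fidelity.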

enable

enable(pipeline: Any) -> None

Enable TeaCache on transformer using hooks.

This creates a TeaCacheConfig from the backend's DiffusionCacheConfig and applies the TeaCache hook to the transformer.

Parameters:

Name	Type	Description	Default
`pipeline`	`Any`	Diffusion pipeline instance. The transformer is read from pipeline.transformer, and transformer_type from pipeline.transformer.__class__.__name__.	required
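`enable` performs two steps: it derives a TeaCacheConfig from the backend's config, then attaches a hook that intercepts the transformer's forward pass. A minimal sketch of that shape follows. The stub config classes, `build_teacache_config`, and `apply_forward_hook` are hypothetical. vllm_omni uses its own hook mechanism, whose API is not shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class CacheConfigStub:  # hypothetical stand-in for DiffusionCacheConfig
    rel_l1_thresh: float = 0.2


@dataclass
class TeaCacheConfigStub:  # hypothetical stand-in for TeaCacheConfig
    rel_l1_thresh: float
    transformer_type: str


def build_teacache_config(pipeline: Any, cfg: CacheConfigStub) -> TeaCacheConfigStub:
    transformer = pipeline.transformer
    # transformer_type is derived from the transformer's class name.
    return TeaCacheConfigStub(
        rel_l1_thresh=cfg.rel_l1_thresh,
        transformer_type=transformer.__class__.__name__,
    )


def apply_forward_hook(module: Any, hook: Callable[..., Any]) -> None:
    """Wrap module.forward so every call goes through hook(original, *args)."""
    original = module.forward

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        return hook(original, *args, **kwargs)

    module.forward = wrapped  # instance attribute shadows the class method


class DummyTransformer:
    def forward(self, x: int) -> int:
        return x + 1


class DummyPipeline:
    def __init__(self) -> None:
        self.transformer = DummyTransformer()


pipe = DummyPipeline()
tea_cfg = build_teacache_config(pipe, CacheConfigStub(rel_l1_thresh=0.2))
seen = []


def logging_hook(original: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    seen.append(args)  # a real TeaCache hook would decide compute vs. reuse here
    return original(*args, **kwargs)


apply_forward_hook(pipe.transformer, logging_hook)
print(tea_cfg.transformer_type, pipe.transformer.forward(1), seen)
# DummyTransformer 2 [(1,)]
```

Wrapping `forward` leaves the pipeline's calling code unchanged, which is what makes the hook-based approach transparent to the denoising loop.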

refresh

refresh(
    pipeline: Any,
    num_inference_steps: int,
    verbose: bool = True,
) -> None

Refresh TeaCache state for new generation.

Clears all cached residuals and resets counters and accumulators. Call it before each generation so that every run starts from a clean state.

Parameters:

Name	Type	Description	Default
`pipeline`	`Any`	Diffusion pipeline instance. Extracts transformer via pipeline.transformer.	required
`num_inference_steps`	`int`	Number of inference steps for the current generation. Currently not used by TeaCache but accepted for interface consistency.	required
`verbose`	`bool`	Whether to log the refresh operation.	`True`
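Here is a minimal sketch of what a refresh has to reset. It assumes the hook keeps its per-generation state in one object attached to the transformer: the step counter, accumulated distance, previous input, and cached residual. `TeaCacheHookState`, its field names, and the `teacache_state` attribute are hypothetical.

```python
import logging
from dataclasses import dataclass
from typing import Any, List, Optional

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("teacache_sketch")


@dataclass
class TeaCacheHookState:  # hypothetical per-transformer state
    cnt: int = 0
    accumulated_rel_l1: float = 0.0
    previous_modulated_input: Optional[List[float]] = None
    cached_residual: Optional[List[float]] = None

    def reset(self) -> None:
        self.cnt = 0
        self.accumulated_rel_l1 = 0.0
        self.previous_modulated_input = None
        self.cached_residual = None


def refresh(pipeline: Any, num_inference_steps: int, verbose: bool = True) -> None:
    # num_inference_steps is accepted for interface parity but unused, as documented.
    state = getattr(pipeline.transformer, "teacache_state", None)
    if state is not None:
        state.reset()
    if verbose:
        logger.info("TeaCache state refreshed")


class DummyTransformer:
    def __init__(self) -> None:
        # Leftover state from a previous generation.
        self.teacache_state = TeaCacheHookState(
            cnt=7, accumulated_rel_l1=0.05, cached_residual=[0.1, 0.2]
        )


class DummyPipeline:
    def __init__(self) -> None:
        self.transformer = DummyTransformer()


pipe = DummyPipeline()
refresh(pipe, num_inference_steps=50)
print(pipe.transformer.teacache_state)
```

If this state is not cleared, the first step of a new generation would compare against the previous run's last input and could reuse a stale residual.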

enable_bagel_teacache

enable_bagel_teacache(
    pipeline: Any, config: DiffusionCacheConfig
) -> None

Enable TeaCache for the Bagel model.

enable_flux2_klein_teacache

enable_flux2_klein_teacache(
    pipeline: Any, config: DiffusionCacheConfig
) -> None

Enable TeaCache for the Flux2 Klein model.

enable_hunyuan_image3_teacache

enable_hunyuan_image3_teacache(
    pipeline: Any, config: DiffusionCacheConfig
) -> None

Enable TeaCache for the HunyuanImage3 model.

HunyuanImage3 uses a GPT-based architecture with a KV cache, which is incompatible with the standard hook-based TeaCache approach. Instead, this function stores the TeaCacheConfig on the pipeline so that the denoising loop can implement caching directly.
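Because the config is only stored, the caching decision has to live in the pipeline's own denoising loop. The following is a sketch under stated assumptions. `HunyuanLikePipeline`, the `teacache_config` attribute, and the scalar per-step signals are hypothetical, and the real HunyuanImage3 loop may measure step-to-step change differently.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class TeaCacheConfigStub:  # hypothetical stand-in for TeaCacheConfig
    rel_l1_thresh: float


class HunyuanLikePipeline:  # hypothetical pipeline, not HunyuanImage3Pipeline
    def __init__(self) -> None:
        # Attribute name is an assumption; None means TeaCache is disabled.
        self.teacache_config: Optional[TeaCacheConfigStub] = None


def enable_on_pipeline(pipeline: Any, rel_l1_thresh: float) -> None:
    # No hooks: just store the config where the denoising loop can read it.
    pipeline.teacache_config = TeaCacheConfigStub(rel_l1_thresh)


def denoising_loop(pipeline: Any, step_signals: List[float]) -> List[bool]:
    """Return, per step, whether the full model forward would run."""
    cfg = pipeline.teacache_config
    decisions: List[bool] = []
    prev: Optional[float] = None
    accumulated = 0.0
    last = len(step_signals) - 1
    for i, signal in enumerate(step_signals):
        if cfg is None or prev is None or i == last:
            compute = True
        else:
            accumulated += abs(signal - prev) / max(abs(prev), 1e-8)
            compute = accumulated >= cfg.rel_l1_thresh
        if compute:
            accumulated = 0.0
            # Real loop: run the full model forward here and cache its output.
        # Otherwise: reuse the cached output instead of calling the model.
        decisions.append(compute)
        prev = signal
    return decisions


pipe = HunyuanLikePipeline()
signals = [1.0, 1.01, 1.02, 1.5]
print(denoising_loop(pipe, signals))  # [True, True, True, True]
enable_on_pipeline(pipe, rel_l1_thresh=0.1)
print(denoising_loop(pipe, signals))  # [True, False, False, True]
```

Keeping the decision inside the loop lets the pipeline manage its own KV cache around skipped steps, which a generic forward hook would not know about.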

enable_sensenova_u1_teacache

enable_sensenova_u1_teacache(
    pipeline: Any, config: DiffusionCacheConfig
) -> None

Enable TeaCache for the SenseNova-U1 denoising forward passes.