vllm_omni.diffusion.cache.teacache.hook

Hook-based TeaCache implementation for vLLM-Omni.

This module implements a diffusers-style hook system that completely intercepts the transformer forward pass, eliminating the need for any TeaCache-specific code in model definitions. Model developers only need to add an extractor function to support new models.
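To make "intercepting the forward pass" concrete, here is a minimal, framework-free sketch. `ToyModule`, `ToyHook`, and `attach_hook` are hypothetical names used only for illustration; they are not part of vLLM-Omni or diffusers, and the real hook operates on `torch.nn.Module` instances rather than plain Python objects.

```python
from typing import Any, Callable, Optional


class ToyModule:
    """Hypothetical stand-in for a transformer module."""

    def forward(self, x: float) -> float:
        return x * 2.0

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        # Looks up `forward` on the instance first, so a patched
        # instance attribute takes precedence over the class method.
        return self.forward(*args, **kwargs)


class ToyHook:
    """Hypothetical hook that receives every call before the original forward."""

    def __init__(self) -> None:
        self.calls = 0
        self.original_forward: Optional[Callable[..., Any]] = None

    def new_forward(self, module: ToyModule, *args: Any, **kwargs: Any) -> Any:
        # A caching hook would decide here whether to run the model at all.
        self.calls += 1
        assert self.original_forward is not None
        return self.original_forward(*args, **kwargs)


def attach_hook(module: ToyModule, hook: ToyHook) -> None:
    """Replace the module's forward with a call routed through the hook."""
    hook.original_forward = module.forward  # keep the bound original
    module.forward = lambda *a, **kw: hook.new_forward(module, *a, **kw)
```

Because the replacement happens on the instance, the model class itself stays untouched, which is the property the module description relies on.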

TeaCacheHook

Bases: ModelHook

ModelHook implementing TeaCache for transformer models.

This hook completely intercepts the transformer's forward pass and implements adaptive caching based on timestep embedding similarity. It's model-agnostic and supports multiple model types through extractor functions.

Key features:

- Zero changes to model code
- CFG-aware with separate states for positive/negative branches
- CFG-parallel compatible: properly detects branch identity across ranks
- Model-specific polynomial rescaling
- Auto-detection of model types
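The adaptive caching decision can be sketched as follows. This is an illustration of the rule commonly described for TeaCache (accumulate a polynomial-rescaled relative L1 distance and recompute once it crosses a threshold), not the code of this module. `ToyCacheDecision`, `relative_l1`, and the coefficients used in the usage note are hypothetical; the pure-Python `polyval` helper stands in for a `poly1d`-style polynomial with the highest-degree coefficient first.

```python
from typing import List, Optional, Sequence


def polyval(coefficients: Sequence[float], x: float) -> float:
    """Evaluate a polynomial (highest-degree coefficient first) by Horner's rule."""
    result = 0.0
    for c in coefficients:
        result = result * x + c
    return result


def relative_l1(current: Sequence[float], previous: Sequence[float]) -> float:
    """Mean absolute change, normalized by the mean magnitude of the previous input.

    Assumes `previous` is not all zeros.
    """
    diff = sum(abs(c - p) for c, p in zip(current, previous)) / len(previous)
    scale = sum(abs(p) for p in previous) / len(previous)
    return diff / scale


class ToyCacheDecision:
    """Hypothetical decision state for one inference run (one CFG branch)."""

    def __init__(self, coefficients: Sequence[float], rel_l1_thresh: float) -> None:
        self.coefficients = coefficients
        self.rel_l1_thresh = rel_l1_thresh
        self.previous: Optional[List[float]] = None
        self.accumulated = 0.0

    def should_compute(self, modulated_input: Sequence[float]) -> bool:
        if self.previous is None:
            compute = True  # first step: nothing cached yet
        else:
            distance = relative_l1(modulated_input, self.previous)
            self.accumulated += polyval(self.coefficients, distance)
            compute = self.accumulated >= self.rel_l1_thresh
        if compute:
            self.accumulated = 0.0  # restart accumulation after a full pass
        self.previous = list(modulated_input)
        return compute
```

With identity coefficients `[1.0, 0.0]` and a threshold of `0.2`, feeding inputs that drift slowly and then jump would skip the slow steps and recompute on the jump. Real coefficients are fitted per model, which is why the rescaling is model-specific.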

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `config` | | TeaCache configuration with thresholds and callbacks |
| `rescale_func` | | Polynomial function for rescaling L1 distances |
| `state_manager` | | Manages TeaCacheState across forward passes |
| `extractor_fn` | | Model-specific function to extract modulated input |

config instance-attribute

config = config

extractor_fn instance-attribute

extractor_fn = None

rescale_func instance-attribute

rescale_func = poly1d(coefficients)

state_manager instance-attribute

state_manager = StateManager(TeaCacheState)
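As a rough illustration of why a state manager is useful for CFG, the sketch below keeps one independent state object per branch. `ToyState`, `ToyStateManager`, and the `"positive"`/`"negative"` keys are assumptions made for this example; the actual `StateManager` and `TeaCacheState` interfaces are not documented on this page and may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, Optional, TypeVar


@dataclass
class ToyState:
    """Hypothetical per-branch cache state."""

    step: int = 0
    accumulated_distance: float = 0.0
    previous_residual: Optional[List[float]] = None


S = TypeVar("S")


class ToyStateManager(Generic[S]):
    """Hypothetical manager that lazily creates one state per branch key."""

    def __init__(self, state_factory: Callable[[], S]) -> None:
        self._factory = state_factory
        self._states: Dict[str, S] = {}

    def get(self, branch: str) -> S:
        # Each CFG branch gets its own state, so the positive and
        # negative passes never overwrite each other's cache.
        if branch not in self._states:
            self._states[branch] = self._factory()
        return self._states[branch]

    def reset(self) -> None:
        # Drop all states, e.g. before a new inference run.
        self._states.clear()
```

Usage: `manager = ToyStateManager(ToyState)` followed by `manager.get("positive")` and `manager.get("negative")` returns two distinct objects; calling `manager.reset()` discards both.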

initialize_hook

initialize_hook(module: Module) -> Module

Initialize the hook by selecting the extractor function that matches the transformer model type given in the config.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | The module to initialize the hook for. | required |

Returns:

| Type | Description |
| --- | --- |
| `Module` | The initialized module. |

new_forward

new_forward(
    module: Module, *args: Any, **kwargs: Any
) -> Any

Generic forward handler that works for ANY model.

This method is completely model-agnostic. All model-specific logic is encapsulated in the extractor function that returns a CacheContext.

The extractor does:

- Model-specific preprocessing
- Extraction of modulated input for cache decision
- Providing transformer execution callable
- Providing postprocessing callable

This hook does:

- CFG-aware state management
- Cache decision logic (generic)
- Residual caching and reuse
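The division of labor between extractor and hook might look like the sketch below. All names (`ToyCacheContext`, its fields, `toy_extractor`, `toy_new_forward`) are hypothetical; the fields of the real `CacheContext` are not listed on this page. For brevity the decision uses a single-step relative change instead of an accumulated, rescaled distance.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class ToyCacheContext:
    """Hypothetical bundle returned by a model-specific extractor."""

    modulated_input: Vector                       # drives the cache decision
    hidden_states: Vector                         # input to the transformer blocks
    run_transformer_blocks: Callable[[], Vector]  # the expensive part
    postprocess: Callable[[Vector], Vector]       # model-specific output handling


@dataclass
class ToyState:
    previous_modulated_input: Optional[Vector] = None
    previous_residual: Optional[Vector] = None


def toy_extractor(x: Vector, timestep_scale: float) -> ToyCacheContext:
    """Model-specific side: preprocessing plus the callables the hook needs."""
    hidden = [v + 1.0 for v in x]  # stand-in for preprocessing
    return ToyCacheContext(
        modulated_input=[v * timestep_scale for v in hidden],
        hidden_states=hidden,
        run_transformer_blocks=lambda: [v * 3.0 for v in hidden],
        postprocess=lambda h: [v - 1.0 for v in h],
    )


def toy_new_forward(ctx: ToyCacheContext, state: ToyState, threshold: float) -> Vector:
    """Generic side: decide, then either reuse the residual or run the blocks."""
    prev = state.previous_modulated_input
    reuse = False
    if prev is not None and state.previous_residual is not None:
        change = sum(abs(a - b) for a, b in zip(ctx.modulated_input, prev))
        change /= sum(abs(b) for b in prev)
        reuse = change < threshold
    if reuse:
        # Skip the blocks: approximate their effect with the cached residual.
        hidden = [h + r for h, r in zip(ctx.hidden_states, state.previous_residual)]
    else:
        hidden = ctx.run_transformer_blocks()
        state.previous_residual = [o - h for o, h in zip(hidden, ctx.hidden_states)]
    state.previous_modulated_input = ctx.modulated_input
    return ctx.postprocess(hidden)
```

Caching the residual (block output minus block input) rather than the output itself lets a skipped step still reflect the current hidden states, which is the design choice the "residual caching and reuse" bullet refers to.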

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | Transformer module (any architecture) | required |
| `*args` | `Any` | Positional arguments for model forward | `()` |
| `**kwargs` | `Any` | Keyword arguments for model forward | `{}` |

Returns:

| Type | Description |
| --- | --- |
| `Any` | Model output (format depends on model) |

reset_state

reset_state(module: Module) -> Module

Reset all cached states for a new inference run.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | The module to reset state for. | required |

Returns:

| Type | Description |
| --- | --- |
| `Module` | The module with reset state. |

apply_teacache_hook

apply_teacache_hook(
    module: Module, config: TeaCacheConfig
) -> None

Apply TeaCache optimization to a transformer module.

This function registers a TeaCacheHook that completely intercepts the module's forward pass, implementing adaptive caching without any changes to the model code.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `module` | `Module` | Transformer model to optimize (e.g., QwenImageTransformer2DModel) | required |
| `config` | `TeaCacheConfig` | TeaCacheConfig specifying caching parameters | required |

Example

    config = TeaCacheConfig(
        rel_l1_thresh=0.2,
        transformer_type="QwenImageTransformer2DModel"
    )
    apply_teacache_hook(transformer, config)
    # Transformer bound to the pipeline now uses TeaCache automatically,
    # no code changes needed!