vllm_omni.diffusion.cache.teacache.hook ¶

Hook-based TeaCache implementation for vLLM-Omni.

This module implements a diffusers-style hook system that completely intercepts the transformer forward pass, eliminating the need for any TeaCache-specific code in model definitions. Model developers only need to add an extractor function to support new models.
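The interception idea can be illustrated with a minimal sketch. It assumes nothing about the real `ModelHook` base class: `TinyModel`, `LoggingHook`, and `attach_hook` are hypothetical names, and a plain Python object stands in for a `torch.nn.Module`. It only shows how a hook can take over `forward` without the model class being edited.

```python
from typing import Any


class TinyModel:
    """Stand-in for a transformer module; a real one would be a torch.nn.Module."""

    def forward(self, x: float) -> float:
        return x * 2.0


class LoggingHook:
    """Hypothetical hook: counts calls, then delegates to the original forward."""

    def __init__(self) -> None:
        self.calls = 0

    def new_forward(self, module: Any, *args: Any, **kwargs: Any) -> Any:
        self.calls += 1
        # A caching hook could decide here to skip the original computation.
        return module._original_forward(*args, **kwargs)


def attach_hook(module: Any, hook: LoggingHook) -> None:
    """Replace module.forward with a wrapper that routes every call through the hook."""
    module._original_forward = module.forward  # keep the bound original method

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        return hook.new_forward(module, *args, **kwargs)

    module.forward = wrapped  # instance attribute shadows the class method


model = TinyModel()
hook = LoggingHook()
attach_hook(model, hook)
result = model.forward(3.0)  # goes through LoggingHook.new_forward
```

Because the wrapper is installed on the instance, the `TinyModel` class itself is untouched, which mirrors the "no TeaCache-specific code in model definitions" goal described above.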

TeaCacheHook ¶

Bases: ModelHook

ModelHook implementing TeaCache for transformer models.

This hook completely intercepts the transformer's forward pass and implements adaptive caching based on timestep embedding similarity. It's model-agnostic and supports multiple model types through extractor functions.

Key features:

- Zero changes to model code
- CFG-aware, with separate states for the positive and negative branches
- CFG-parallel compatible: properly detects branch identity across ranks
- Model-specific polynomial rescaling
- Auto-detection of model types
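The "separate states for positive/negative branches" feature can be pictured with a small sketch. It is not the actual `StateManager` or `TeaCacheState` API: `BranchState` and `PerBranchStates` are hypothetical names, and the branch is passed in explicitly because this page does not describe how the hook detects it.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BranchState:
    """Hypothetical per-branch cache state (fields are illustrative)."""

    step: int = 0
    accumulated_distance: float = 0.0
    cached_residual: Optional[float] = None


class PerBranchStates:
    """Keeps one independent BranchState per CFG branch name."""

    def __init__(self) -> None:
        self._states: Dict[str, BranchState] = {}

    def get(self, branch: str) -> BranchState:
        # Create the state lazily the first time a branch is seen.
        return self._states.setdefault(branch, BranchState())

    def reset(self) -> None:
        # Drop every branch state, e.g. before a new inference run.
        self._states.clear()


states = PerBranchStates()
states.get("positive").accumulated_distance += 0.3
states.get("negative").step += 1
```

Keeping the branches apart means a cache decision made for the positive pass never leaks into the negative pass, which would otherwise compare inputs from two different conditioning signals.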

Attributes:

Name	Type	Description
`config`		TeaCache configuration with thresholds and callbacks
`rescale_func`		Polynomial function for rescaling L1 distances
`state_manager`		Manages TeaCacheState across forward passes
`extractor_fn`		Model-specific function to extract modulated input

config `instance-attribute` ¶

config = config

extractor_fn `instance-attribute` ¶

extractor_fn = None

rescale_func `instance-attribute` ¶

rescale_func = np.poly1d(config.coefficients)
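`np.poly1d` interprets its coefficient list from the highest power down to the constant term and returns a callable polynomial. The sketch below evaluates a polynomial with the same ordering using Horner's rule in plain Python, so it runs without NumPy. The coefficients are illustrative only and are not taken from any model configuration; `evaluate_polynomial` is a hypothetical helper.

```python
from typing import Sequence


def evaluate_polynomial(coefficients: Sequence[float], x: float) -> float:
    """Evaluate a polynomial whose coefficients are ordered highest power first."""
    result = 0.0
    for c in coefficients:
        # Horner's rule: fold in one coefficient per step.
        result = result * x + c
    return result


# Illustrative coefficients only: 1*x**2 + 2*x + 3.
example_coefficients = [1.0, 2.0, 3.0]
rescaled = evaluate_polynomial(example_coefficients, 2.0)  # 1*4 + 2*2 + 3
```

With NumPy available, `np.poly1d(example_coefficients)(2.0)` evaluates the same polynomial.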

state_manager `instance-attribute` ¶

state_manager = StateManager(TeaCacheState)

initialize_hook ¶

initialize_hook(module: Module) -> Module

Initialize the hook with the extractor for the transformer model type specified in the config.

Parameters:

Name	Type	Description	Default
`module`	`Module`	The module to initialize the hook for.	required

Returns:

Type	Description
`Module`	The initialized module.

new_forward ¶

new_forward(
    module: Module, *args: Any, **kwargs: Any
) -> Any

Generic forward handler that works for ANY model.

This method is completely model-agnostic. All model-specific logic is encapsulated in the extractor function that returns a CacheContext.

The extractor does:

- Model-specific preprocessing
- Extraction of modulated input for cache decision
- Providing transformer execution callable
- Providing postprocessing callable

This hook does:

- CFG-aware state management
- Cache decision logic (generic)
- Residual caching and reuse
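The generic cache decision can be sketched as follows. This is a simplified illustration of the usual TeaCache scheme (accumulate a rescaled relative L1 distance between consecutive modulated inputs and recompute once it reaches the threshold), not the hook's actual code. `CacheDecider` and `relative_l1` are hypothetical names, plain lists stand in for tensors, and the identity function stands in for the model-specific polynomial.

```python
from typing import Callable, List, Optional


def relative_l1(current: List[float], previous: List[float]) -> float:
    """Relative L1 distance; assumes `previous` is not all zeros."""
    num = sum(abs(c - p) for c, p in zip(current, previous))
    den = sum(abs(p) for p in previous)
    return num / den


class CacheDecider:
    """Decides per step whether the transformer blocks must be recomputed."""

    def __init__(self, threshold: float, rescale: Callable[[float], float]) -> None:
        self.threshold = threshold
        self.rescale = rescale
        self.previous: Optional[List[float]] = None
        self.accumulated = 0.0

    def should_compute(self, modulated: List[float]) -> bool:
        if self.previous is None:
            compute = True  # first step: nothing is cached yet
        else:
            distance = relative_l1(modulated, self.previous)
            self.accumulated += self.rescale(distance)
            compute = self.accumulated >= self.threshold
        if compute:
            self.accumulated = 0.0  # start accumulating again after a full pass
        self.previous = list(modulated)
        return compute


decider = CacheDecider(threshold=0.2, rescale=lambda d: d)
inputs = ([1.0, 1.0], [1.05, 1.0], [1.3, 1.2])
decisions = [decider.should_compute(x) for x in inputs]
```

On steps where `should_compute` returns `False`, a hook of this kind would reuse the cached residual instead of running the transformer blocks; on the other steps it would run them and store a fresh residual.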

Parameters:

Name	Type	Description	Default
`module`	`Module`	Transformer module (any architecture)	required
`*args`	`Any`	Positional arguments for model forward	`()`
`**kwargs`	`Any`	Keyword arguments for model forward	`{}`

Returns:

Type	Description
`Any`	Model output (format depends on model)

reset_state ¶

reset_state(module: Module) -> Module

Reset all cached states for a new inference run.

Parameters:

Name	Type	Description	Default
`module`	`Module`	The module to reset state for.	required

Returns:

Type	Description
`Module`	The module with reset state.

apply_teacache_hook ¶

apply_teacache_hook(
    module: Module, config: TeaCacheConfig
) -> None

Apply TeaCache optimization to a transformer module.

This function registers a TeaCacheHook that completely intercepts the module's forward pass, implementing adaptive caching without any changes to the model code.

Parameters:

Name	Type	Description	Default
`module`	`Module`	Transformer model to optimize (e.g., QwenImageTransformer2DModel)	required
`config`	`TeaCacheConfig`	TeaCacheConfig specifying caching parameters	required

Example

config = TeaCacheConfig(
    rel_l1_thresh=0.2,
    transformer_type="QwenImageTransformer2DModel",
)
apply_teacache_hook(transformer, config)
# The transformer bound to the pipeline now uses TeaCache automatically;
# no code changes needed!
