vllm_omni.diffusion.cache.teacache.hook ¶
Hook-based TeaCache implementation for vLLM-Omni.
This module implements a diffusers-style hook system that completely intercepts the transformer forward pass, eliminating the need for any TeaCache-specific code in model definitions. Model developers only need to add an extractor function to support new models.
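To illustrate what forward interception means here, the sketch below wraps a toy module's `forward` so that every call is routed through a hook object. It is a minimal, framework-free sketch: `ToyModule`, `ToyHook`, and `register_toy_hook` are hypothetical names invented for this example and are not part of vLLM-Omni or diffusers.

```python
from typing import Any


class ToyModule:
    """Stand-in for a transformer (a real model would be a torch.nn.Module)."""

    def forward(self, x: float) -> float:
        return x * 2.0

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.forward(*args, **kwargs)


class ToyHook:
    """Hypothetical hook that receives every call in place of the original forward."""

    def __init__(self) -> None:
        self.calls = 0

    def new_forward(self, module: ToyModule, *args: Any, **kwargs: Any) -> Any:
        self.calls += 1
        # A caching hook would decide here whether to run or skip the computation.
        return module._original_forward(*args, **kwargs)


def register_toy_hook(module: ToyModule, hook: ToyHook) -> None:
    """Replace the instance's forward with a wrapper that goes through the hook."""
    module._original_forward = module.forward  # keep the bound original method

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        return hook.new_forward(module, *args, **kwargs)

    module.forward = wrapped  # the instance attribute shadows the class method


model = ToyModule()
hook = ToyHook()
register_toy_hook(model, hook)
result = model(3.0)  # routed through ToyHook.new_forward
```

The model class itself is untouched; only the instance's `forward` attribute is swapped, which is the sense in which no model code needs to change.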
TeaCacheHook ¶
Bases: ModelHook
ModelHook implementing TeaCache for transformer models.
This hook completely intercepts the transformer's forward pass and implements adaptive caching based on timestep embedding similarity. It's model-agnostic and supports multiple model types through extractor functions.
Key features:
- Zero changes to model code
- CFG-aware with separate states for positive/negative branches
- CFG-parallel compatible: properly detects branch identity across ranks
- Model-specific polynomial rescaling
- Auto-detection of model types
Attributes:
| Name | Type | Description |
|---|---|---|
| config | | TeaCache configuration with thresholds and callbacks |
| rescale_func | | Polynomial function for rescaling L1 distances |
| state_manager | | Manages TeaCacheState across forward passes |
| extractor_fn | | Model-specific function to extract modulated input |
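One plausible reading of how a polynomial rescaling function and a threshold combine into a cache decision, following the general TeaCache idea, is sketched below. The relative L1 distance between consecutive modulated inputs is rescaled, accumulated, and compared against the threshold. Only `rel_l1_thresh` appears on this page; `make_polynomial`, `relative_l1`, `ToyCacheDecision`, and the identity coefficients are assumptions for illustration, and the actual implementation may differ in detail.

```python
from typing import Callable, List, Optional, Sequence


def make_polynomial(coefficients: Sequence[float]) -> Callable[[float], float]:
    """Build p(x); coefficients run from the highest degree to the constant term."""

    def poly(x: float) -> float:
        result = 0.0
        for c in coefficients:  # Horner's rule
            result = result * x + c
        return result

    return poly


def relative_l1(current: Sequence[float], previous: Sequence[float]) -> float:
    """Sum of absolute differences, normalised by the magnitude of the previous input."""
    diff = sum(abs(c - p) for c, p in zip(current, previous))
    norm = sum(abs(p) for p in previous)
    return diff / norm if norm > 0.0 else float("inf")


class ToyCacheDecision:
    """Hypothetical decision rule: recompute once accumulated change reaches the threshold."""

    def __init__(self, rel_l1_thresh: float, rescale_func: Callable[[float], float]) -> None:
        self.rel_l1_thresh = rel_l1_thresh
        self.rescale_func = rescale_func
        self.previous: Optional[List[float]] = None
        self.accumulated = 0.0

    def should_compute(self, modulated_input: Sequence[float]) -> bool:
        if self.previous is None:
            compute = True  # nothing cached yet, so the first step always runs
        else:
            distance = relative_l1(modulated_input, self.previous)
            self.accumulated += self.rescale_func(distance)
            compute = self.accumulated >= self.rel_l1_thresh
        if compute:
            self.accumulated = 0.0  # start accumulating afresh after a full pass
        self.previous = list(modulated_input)
        return compute


# Identity polynomial p(x) = x, chosen only to keep the arithmetic easy to follow.
decision = ToyCacheDecision(rel_l1_thresh=0.2, rescale_func=make_polynomial([1.0, 0.0]))
steps = [[1.0, 1.0], [1.1, 1.0], [1.5, 1.0]]
decisions = [decision.should_compute(step) for step in steps]
```

In this run the second step changes the input by only 5% and is skipped; the third step pushes the accumulated distance past 0.2 and triggers a full recomputation.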
initialize_hook ¶
Initialize the hook with the extractor selected by the transformer model type given in the config.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| module | Module | The module to initialize the hook for. | required |
Returns:
| Type | Description |
|---|---|
| Module | The initialized module. |
new_forward ¶
Generic forward handler that works for ANY model.
This method is completely model-agnostic. All model-specific logic is encapsulated in the extractor function that returns a CacheContext.
The extractor does:
- Model-specific preprocessing
- Extraction of modulated input for cache decision
- Providing transformer execution callable
- Providing postprocessing callable

This hook does:
- CFG-aware state management
- Cache decision logic (generic)
- Residual caching and reuse
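The division of labour described above can be sketched roughly as follows. This is an illustrative sketch only: the page does not list the fields of `CacheContext`, so `SketchCacheContext` and its field names, `ToyTransformer`, `toy_extractor`, and `sketch_forward` are all hypothetical, and scalars stand in for tensors.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class SketchCacheContext:
    """Hypothetical shape of what an extractor returns (field names are assumptions)."""

    modulated_input: float                       # signal used for the cache decision
    hidden_states: float                         # input to the transformer blocks
    run_transformer_blocks: Callable[[], float]  # executes the expensive part
    postprocess: Callable[[float], Any]          # formats the final model output


class ToyTransformer:
    """Toy model with a cheap embedding step and an 'expensive' block stack."""

    def __init__(self) -> None:
        self.block_calls = 0

    def embed(self, x: float) -> float:
        return x + 1.0

    def blocks(self, h: float) -> float:
        self.block_calls += 1
        return h * 3.0


def toy_extractor(module: ToyTransformer, x: float) -> SketchCacheContext:
    """Model-specific part: preprocessing plus the callables the hook will need."""
    hidden = module.embed(x)
    return SketchCacheContext(
        modulated_input=hidden,
        hidden_states=hidden,
        run_transformer_blocks=lambda: module.blocks(hidden),
        postprocess=lambda h: {"sample": h},
    )


def sketch_forward(
    module: ToyTransformer,
    should_compute: Callable[[float], bool],
    state: Dict[str, float],
    x: float,
) -> Any:
    """Generic part: decide, then either run the blocks or reuse the cached residual."""
    ctx = toy_extractor(module, x)
    if "residual" not in state or should_compute(ctx.modulated_input):
        output = ctx.run_transformer_blocks()
        state["residual"] = output - ctx.hidden_states  # cache what the blocks added
    else:
        output = ctx.hidden_states + state["residual"]  # skip the blocks entirely
    return ctx.postprocess(output)


toy = ToyTransformer()
state: Dict[str, float] = {}
never_recompute = lambda modulated_input: False  # forces reuse after the first pass
first = sketch_forward(toy, never_recompute, state, 1.0)
second = sketch_forward(toy, never_recompute, state, 1.5)
```

Caching the residual rather than the raw output lets a skipped step still reflect its own fresh input, since only the contribution of the blocks is reused.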
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| module | Module | Transformer module (any architecture) | required |
| *args | Any | Positional arguments for model forward | () |
| **kwargs | Any | Keyword arguments for model forward | {} |
Returns:
| Type | Description |
|---|---|
| Any | Model output (format depends on model) |
reset_state ¶
Reset all cached states for a new inference run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| module | Module | The module to reset state for. | required |
Returns:
| Type | Description |
|---|---|
| Module | The module with reset state. |
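As a rough picture of what resetting CFG-aware state could involve, the sketch below keeps one state object per branch and clears them all between inference runs. `SketchState`, `SketchStateManager`, the branch labels, and the state fields are hypothetical; the page does not describe the fields of `TeaCacheState` or how branches are keyed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SketchState:
    """Hypothetical per-branch cache state (field names are assumptions)."""

    step: int = 0
    accumulated_distance: float = 0.0
    previous_residual: Optional[float] = None


class SketchStateManager:
    """Keeps independent state for each CFG branch, created lazily on first use."""

    def __init__(self) -> None:
        self._states: Dict[str, SketchState] = {}

    def get(self, branch: str) -> SketchState:
        return self._states.setdefault(branch, SketchState())

    def reset(self) -> None:
        # Dropping every branch state means the next run starts with an empty cache.
        self._states.clear()


manager = SketchStateManager()
manager.get("positive").step += 3               # the positive branch advances
manager.get("positive").previous_residual = 4.0
negative_step = manager.get("negative").step    # the negative branch is unaffected

manager.reset()
fresh = manager.get("positive")                 # a brand-new state after the reset
```

Keeping the branches separate prevents a residual computed for the conditional pass from being reused for the unconditional one.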
apply_teacache_hook ¶
apply_teacache_hook(
module: Module, config: TeaCacheConfig
) -> None
Apply TeaCache optimization to a transformer module.
This function registers a TeaCacheHook that completely intercepts the module's forward pass, implementing adaptive caching without any changes to the model code.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| module | Module | Transformer model to optimize (e.g., QwenImageTransformer2DModel) | required |
| config | TeaCacheConfig | TeaCacheConfig specifying caching parameters | required |
Example

    >>> config = TeaCacheConfig(
    ...     rel_l1_thresh=0.2,
    ...     transformer_type="QwenImageTransformer2DModel"
    ... )
    >>> apply_teacache_hook(transformer, config)
    >>> # Transformer bound to the pipeline now uses TeaCache automatically,
    ... # no code changes needed!