vllm_omni.diffusion.offloader ¶
Modules:
| Name | Description |
|---|---|
| base | |
| layerwise_backend | |
| module_collector | |
| sequential_backend | |
LayerWiseOffloadBackend ¶
Bases: OffloadBackend
Layer-wise (block-level) offloading backend.
Implements sliding-window offloading, in which only a small number of transformer blocks reside on the GPU at a time. Blocks are prefetched asynchronously while earlier blocks compute, and are freed after use.
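The residency pattern described above can be illustrated with a small device-free simulation. This is only a conceptual sketch: `sliding_window_schedule` and its `window` parameter are hypothetical names, and the real backend works with asynchronous GPU transfers rather than Python lists.

```python
def sliding_window_schedule(num_blocks: int, window: int = 2) -> list[tuple[int, list[int]]]:
    """Simulate which blocks are GPU-resident while each block computes.

    Hypothetical helper for illustration only; it is not part of vllm_omni.
    """
    if num_blocks < 1 or window < 1:
        raise ValueError("num_blocks and window must both be >= 1")

    resident: list[int] = [0]  # the first block is loaded up front
    schedule: list[tuple[int, list[int]]] = []
    for current in range(num_blocks):
        # Fallback synchronous load if the block was not prefetched (window == 1).
        if current not in resident:
            resident.append(current)
        # "Prefetch" upcoming blocks so at most `window` blocks are resident.
        for nxt in range(current + 1, min(current + window, num_blocks)):
            if nxt not in resident:
                resident.append(nxt)
        schedule.append((current, list(resident)))
        # Free the block once its compute step is done.
        resident.remove(current)
    return schedule


for block, on_gpu in sliding_window_schedule(num_blocks=4, window=2):
    print(f"computing block {block}, resident on GPU: {on_gpu}")
```

In this model the number of resident blocks never exceeds `window`, which is the property that keeps peak GPU memory bounded regardless of model depth.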
get_blocks_attr_names staticmethod ¶
Get block attribute names from model class.
ModelLevelOffloadBackend ¶
Bases: OffloadBackend
Model-level (sequential) offloading backend.
Uses SequentialOffloadHook registered via HookRegistry for automatic module swapping.
OffloadBackend ¶
Bases: ABC
Base class for CPU offload backends.
disable abstractmethod ¶
Disable offloading and cleanup resources.
Removes all registered hooks. Does NOT move modules back to their original devices; the caller is responsible for that.
enable abstractmethod ¶
Enable offloading on the pipeline.
Discovers modules, moves them to appropriate devices, and registers forward hooks for swapping/prefetching.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pipeline | Module | Diffusion pipeline model (e.g., Wan22Pipeline) | required |
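The `enable`/`disable` contract can be sketched with a toy backend. Everything here is an assumption made for illustration: `SketchOffloadBackend`, `RecordingBackend`, and `ToyPipeline` are hypothetical, devices are plain strings, and "hooks" are just recorded names rather than real forward hooks.

```python
from abc import ABC, abstractmethod


class SketchOffloadBackend(ABC):
    """Hypothetical stand-in mirroring the documented abstract interface."""

    @abstractmethod
    def enable(self, pipeline) -> None: ...

    @abstractmethod
    def disable(self) -> None: ...


class ToyPipeline:
    """Toy pipeline: maps module names to the device they currently live on."""

    def __init__(self) -> None:
        self.devices = {"transformer": "cuda:0", "text_encoder": "cuda:0"}


class RecordingBackend(SketchOffloadBackend):
    def __init__(self) -> None:
        self.hooks: list[str] = []

    def enable(self, pipeline: ToyPipeline) -> None:
        # Discover modules, move them off the GPU, and record one hook each.
        for name in pipeline.devices:
            pipeline.devices[name] = "cpu"
            self.hooks.append(name)

    def disable(self) -> None:
        # Only hooks are removed; modules stay where they are, so the
        # caller must move them back if needed.
        self.hooks.clear()


pipeline = ToyPipeline()
backend = RecordingBackend()
backend.enable(pipeline)
backend.disable()
print(pipeline.devices)  # modules remain on "cpu" after disable()
```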
OffloadConfig dataclass ¶
from_od_config classmethod ¶
from_od_config(
od_config: OmniDiffusionConfig,
) -> OffloadConfig
Extract and validate offload settings from OmniDiffusionConfig.
For now, enforces mutual exclusion between model-level and layer-wise offloading. Layer-wise takes priority if both are enabled.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| od_config | OmniDiffusionConfig | OmniDiffusionConfig with offload settings | required |
Returns:
| Type | Description |
|---|---|
| OffloadConfig | OffloadConfig with validated settings |
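The priority rule (layer-wise wins when both modes are requested) might look roughly like the sketch below. The classes and field names (`ToyOdConfig`, `enable_model_offload`, `enable_layerwise_offload`, `strategy`) are hypothetical and chosen for illustration; the actual `OmniDiffusionConfig` and `OffloadConfig` fields are not shown in this reference.

```python
from dataclasses import dataclass


@dataclass
class ToyOdConfig:
    """Hypothetical stand-in for the offload-related config flags."""

    enable_model_offload: bool = False
    enable_layerwise_offload: bool = False


@dataclass(frozen=True)
class ToyOffloadConfig:
    """Hypothetical resolved config: exactly one strategy is selected."""

    strategy: str  # "none", "model_level", or "layer_wise"

    @classmethod
    def from_od_config(cls, od_config: ToyOdConfig) -> "ToyOffloadConfig":
        # Layer-wise is checked first, so it takes priority if both are set.
        if od_config.enable_layerwise_offload:
            return cls(strategy="layer_wise")
        if od_config.enable_model_offload:
            return cls(strategy="model_level")
        return cls(strategy="none")


both = ToyOdConfig(enable_model_offload=True, enable_layerwise_offload=True)
print(ToyOffloadConfig.from_od_config(both).strategy)
```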
OffloadStrategy ¶
apply_sequential_offload ¶
apply_sequential_offload(
dit_modules: list[Module],
encoder_modules: list[Module],
device: device,
pin_memory: bool = True,
use_hsdp: bool = False,
) -> None
Apply sequential offloading hooks to DiT and encoder modules.
Registers hooks on modules to implement mutual-exclusion GPU allocation:

- Before DiT runs, encoders are offloaded to CPU.
- Before encoders run, DiT is offloaded to CPU.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dit_modules | list[Module] | DiT/transformer modules to register hooks on | required |
| encoder_modules | list[Module] | Encoder modules to register hooks on | required |
| device | device | Target GPU device for loading | required |
| pin_memory | bool | Whether to pin CPU memory for faster transfers | True |
| use_hsdp | bool | Whether HSDP is enabled (affects non_blocking behavior) | False |
Example
>>> apply_sequential_offload(
...     dit_modules=[pipeline.transformer],
...     encoder_modules=[pipeline.text_encoder, pipeline.vae],
...     device=torch.device("cuda:0"),
... )
>>> # Modules of pipeline now automatically swap between CPU and GPU
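The mutual-exclusion behaviour can be modelled without a GPU. The following is a minimal sketch under stated assumptions: `ToyModule` and `apply_toy_sequential_offload` are hypothetical, devices are strings, and the pre-call hooks only imitate what real forward hooks would do with tensors.

```python
class ToyModule:
    """Toy module that tracks its device and runs pre-call hooks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.device = "cpu"
        self.pre_hooks = []

    def __call__(self, x):
        for hook in self.pre_hooks:
            hook(self)
        return x


def apply_toy_sequential_offload(dit_modules, encoder_modules, device="cuda:0"):
    """Register hooks so the two groups never occupy the GPU together."""

    def make_hook(other_group):
        def hook(module):
            # Offload the opposite group first, then load the running module.
            for other in other_group:
                other.device = "cpu"
            module.device = device

        return hook

    for m in dit_modules:
        m.pre_hooks.append(make_hook(encoder_modules))
    for m in encoder_modules:
        m.pre_hooks.append(make_hook(dit_modules))


transformer = ToyModule("transformer")
text_encoder = ToyModule("text_encoder")
vae = ToyModule("vae")
apply_toy_sequential_offload([transformer], [text_encoder, vae])

text_encoder("prompt")   # encoder loads, transformer stays on CPU
transformer("latents")   # transformer loads, both encoders are offloaded
print({m.name: m.device for m in (transformer, text_encoder, vae)})
```

Removing the hooks in this model amounts to clearing each module's `pre_hooks` list, after which modules stay on whatever device they last occupied.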
get_offload_backend ¶
get_offload_backend(
od_config: OmniDiffusionConfig,
device: device | None = None,
) -> OffloadBackend | None
Create appropriate offload backend based on configuration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| od_config | OmniDiffusionConfig | OmniDiffusionConfig with offload settings | required |
| device | device \| None | Target device (auto-detected if None) | None |
Returns:
| Type | Description |
|---|---|
| OffloadBackend \| None | OffloadBackend instance, or None if offloading is disabled |
Example
>>> backend = get_offload_backend(od_config, device=torch.device("cuda:0"))
>>> if backend:
...     backend.enable(pipeline)
remove_sequential_offload ¶
remove_sequential_offload(modules: list[Module]) -> None
Remove sequential offloading hooks from modules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
modules | list[Module] | Modules to remove hooks from | required |
Example
>>> all_modules = [*dit_modules, *encoder_modules]
>>> remove_sequential_offload(all_modules)