vllm_omni.diffusion.offloader.sequential_backend ¶
ModelLevelOffloadBackend ¶
Bases: OffloadBackend
Model-level (sequential) offloading backend.
Uses SequentialOffloadHook registered via HookRegistry for automatic module swapping.
SequentialOffloadHook ¶
Bases: ModelHook
Hook for sequential offloading with mutual exclusion on encoder and DiT modules.
Intended as a model-level (or "component-level") CPU offloading method. When a module's forward is called, this hook offloads the target modules to CPU and loads the current module onto the GPU.
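The swap that happens on each forward call can be sketched with a toy model. This is a pure-Python simulation, not the vllm_omni implementation: `ToyModule`, `ToySequentialHook`, `pre_forward`, and `run_forward` are hypothetical names, and devices are plain string labels rather than real tensors.

```python
from dataclasses import dataclass


@dataclass
class ToyModule:
    """Hypothetical stand-in for a torch.nn.Module; tracks only a device label."""
    name: str
    device: str = "cpu"

    def to(self, device: str) -> "ToyModule":
        self.device = device
        return self


@dataclass
class ToySequentialHook:
    """Illustrative pre-forward hook: evict the rival group, then load the owner."""
    offload_targets: list
    device: str = "cuda:0"

    def pre_forward(self, module: ToyModule) -> None:
        # Step 1: move every module of the rival group back to CPU.
        for target in self.offload_targets:
            if target.device != "cpu":
                target.to("cpu")
        # Step 2: make sure the module about to run is on the GPU.
        if module.device != self.device:
            module.to(self.device)


def run_forward(module: ToyModule, hook: ToySequentialHook) -> str:
    """Simulate a forward call with the hook firing first."""
    hook.pre_forward(module)
    return f"{module.name} ran on {module.device}"


dit = ToyModule("dit")
encoder = ToyModule("text_encoder")
dit_hook = ToySequentialHook(offload_targets=[encoder])
encoder_hook = ToySequentialHook(offload_targets=[dit])

first = run_forward(encoder, encoder_hook)
after_encoder = (encoder.device, dit.device)
second = run_forward(dit, dit_hook)
after_dit = (encoder.device, dit.device)
```

In this sketch the encoder and the DiT are never labelled as GPU-resident at the same time, which is the mutual-exclusion property the hook description refers to.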
apply_sequential_offload ¶
apply_sequential_offload(
    dit_modules: list[Module],
    encoder_modules: list[Module],
    device: device,
    pin_memory: bool = True,
    use_hsdp: bool = False,
) -> None
Apply sequential offloading hooks to DiT and encoder modules.
Registers hooks on modules to implement mutual-exclusion GPU allocation:
- Before DiT runs, encoders are offloaded to CPU.
- Before encoders run, DiT is offloaded to CPU.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dit_modules | list[Module] | DiT/transformer modules to register hooks on | required |
| encoder_modules | list[Module] | Encoder modules to register hooks on | required |
| device | device | Target GPU device for loading | required |
| pin_memory | bool | Whether to pin CPU memory for faster transfers | True |
| use_hsdp | bool | Whether HSDP is enabled (affects non_blocking behavior) | False |
Example
>>> apply_sequential_offload(
...     dit_modules=[pipeline.transformer],
...     encoder_modules=[pipeline.text_encoder, pipeline.vae],
...     device=torch.device("cuda:0"),
... )
>>> # Modules of pipeline now automatically swap between CPU and GPU
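To see why keeping the two groups mutually exclusive matters, a back-of-the-envelope comparison of peak weight memory may help. The sizes below are hypothetical placeholders, not measurements of any real model, and the arithmetic ignores activations and any transient overlap during a swap.

```python
# Hypothetical weight sizes in GiB; not measurements of any real model.
weights_gib = {"transformer": 24.0, "text_encoder": 9.0, "vae": 0.5}

dit_group = ["transformer"]
encoder_group = ["text_encoder", "vae"]


def group_size(names):
    """Total weight footprint of one group of modules, in GiB."""
    return sum(weights_gib[name] for name in names)


# Without offloading, both groups sit on the GPU at once.
peak_all_resident = group_size(dit_group) + group_size(encoder_group)

# With mutual exclusion, only one group is resident at any time,
# so the peak is set by the larger group.
peak_mutually_exclusive = max(group_size(dit_group), group_size(encoder_group))

saved_gib = peak_all_resident - peak_mutually_exclusive
```

Under these assumed sizes, the saving equals the footprint of the smaller group. The price is a CPU-to-GPU transfer each time execution switches between groups.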
remove_sequential_offload ¶
remove_sequential_offload(modules: list[Module]) -> None
Remove sequential offloading hooks from modules.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| modules | list[Module] | Modules to remove hooks from | required |
Example
>>> all_modules = [*dit_modules, *encoder_modules]
>>> remove_sequential_offload(all_modules)
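The apply/remove lifecycle can be illustrated with a small stand-alone simulation. Everything here is an assumption made for illustration: `ToyModule`, `toy_apply`, `toy_remove`, and `HOOK_NAME` are hypothetical and do not mirror the real HookRegistry API. The point is only that removal takes a single flat list of modules, and that once hooks are gone, forward calls no longer trigger swaps.

```python
HOOK_NAME = "toy_sequential_offload"  # hypothetical registry key


class ToyModule:
    """Hypothetical stand-in for torch.nn.Module with named pre-forward hooks."""

    def __init__(self, name):
        self.name = name
        self.device = "cpu"
        self.pre_hooks = {}

    def forward(self):
        # Fire every registered pre-forward hook, then report the device.
        for hook in list(self.pre_hooks.values()):
            hook(self)
        return self.device


def toy_apply(dit_modules, encoder_modules, device):
    """Register hooks so each group evicts the other before running."""
    def make_hook(rivals):
        def hook(module):
            for rival in rivals:
                rival.device = "cpu"
            module.device = device
        return hook

    for module in dit_modules:
        module.pre_hooks[HOOK_NAME] = make_hook(encoder_modules)
    for module in encoder_modules:
        module.pre_hooks[HOOK_NAME] = make_hook(dit_modules)


def toy_remove(modules):
    """Drop the hook from each module in a flat list."""
    for module in modules:
        module.pre_hooks.pop(HOOK_NAME, None)


dit_modules = [ToyModule("transformer")]
encoder_modules = [ToyModule("text_encoder"), ToyModule("vae")]

toy_apply(dit_modules, encoder_modules, "cuda:0")
encoder_modules[0].forward()
dit_modules[0].forward()
devices_with_hooks = [m.device for m in dit_modules + encoder_modules]

# Unpack both lists into one flat list before removing hooks.
toy_remove([*dit_modules, *encoder_modules])
encoder_modules[0].forward()
devices_after_removal = [m.device for m in dit_modules + encoder_modules]
hooks_left = sum(len(m.pre_hooks) for m in dit_modules + encoder_modules)
```

In this toy version, modules simply stay on whatever device they occupied when the hooks were removed. Whether the real backend relocates modules on removal is not stated in this reference.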