
vllm_omni.diffusion.offloader.sequential_backend

logger module-attribute

logger = init_logger(__name__)

ModelLevelOffloadBackend

Bases: OffloadBackend

Model-level (sequential) offloading backend.

Uses SequentialOffloadHook registered via HookRegistry for automatic module swapping.
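The backend hands swapping off to hooks that run before a module's forward. The pure-Python sketch below illustrates that interception pattern. It assumes a registry that wraps `forward` and calls each hook's `pre_forward(module, *args, **kwargs)` before the original forward runs. `ToyModule`, `ToyHook` and `ToyHookRegistry` are hypothetical stand-ins, not the vllm_omni `HookRegistry` or `ModelHook` API.

```python
# Toy sketch of hook-based forward interception. ToyModule, ToyHook and
# ToyHookRegistry are hypothetical stand-ins, not the vllm_omni classes.
from typing import Any, Callable


class ToyModule:
    """Minimal stand-in for torch.nn.Module with a callable forward."""

    def __init__(self, name: str) -> None:
        self.name = name

    def forward(self, x: Any) -> Any:
        return f"{self.name}({x})"

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.forward(*args, **kwargs)


class ToyHook:
    """Mirrors the pre_forward(module, *args, **kwargs) -> (args, kwargs) shape."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def pre_forward(self, module: ToyModule, *args: Any, **kwargs: Any) -> tuple[tuple, dict]:
        self.calls.append(module.name)  # record which module was about to run
        return args, kwargs


class ToyHookRegistry:
    """Wraps a module's forward so every registered hook runs first."""

    def __init__(self, module: ToyModule) -> None:
        self.module = module
        self.hooks: list[ToyHook] = []
        self._original_forward: Callable[..., Any] = module.forward

        def wrapped(*args: Any, **kwargs: Any) -> Any:
            for hook in self.hooks:
                args, kwargs = hook.pre_forward(self.module, *args, **kwargs)
            return self._original_forward(*args, **kwargs)

        module.forward = wrapped  # instance attribute shadows the class method

    def register(self, hook: ToyHook) -> None:
        self.hooks.append(hook)


m = ToyModule("dit")
registry = ToyHookRegistry(m)
hook = ToyHook()
registry.register(hook)
print(m("latents"))  # dit(latents)
print(hook.calls)  # ['dit']
```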

disable

disable() -> None

enable

enable(pipeline: Module) -> None

SequentialOffloadHook

Bases: ModelHook

Hook for sequential offloading with mutual exclusion on encoder and DiT modules.

Intended as a model-level (or "component-level") CPU offloading method: when a module's forward is called, this hook offloads the target modules to CPU and loads the current module onto the GPU.
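The swap performed in `pre_forward` can be sketched as follows, assuming it first moves every offload target to CPU, then moves the called module to the target device, and passes the inputs through unchanged. Devices are modeled as strings and `ToyModule.to` only records transfers; `ToyModule` and `ToySequentialOffloadHook` are hypothetical. A real hook would move tensors with `torch.nn.Module.to`, possibly using pinned memory and `non_blocking` copies.

```python
# Toy sketch of the pre_forward swap: offload the targets to CPU, then load the
# called module. ToyModule and ToySequentialOffloadHook are hypothetical.
from typing import Any


class ToyModule:
    def __init__(self, name: str, device: str = "cpu") -> None:
        self.name = name
        self.device = device
        self.moves: list[str] = []  # record of transfers, for inspection

    def to(self, device: str) -> "ToyModule":
        if device != self.device:  # skip no-op transfers
            self.moves.append(f"{self.device}->{device}")
            self.device = device
        return self


class ToySequentialOffloadHook:
    def __init__(self, offload_targets: list[ToyModule], device: str) -> None:
        self.offload_targets = offload_targets
        self.device = device

    def pre_forward(self, module: ToyModule, *args: Any, **kwargs: Any) -> tuple[tuple, dict]:
        # Free accelerator memory by moving the other modules to CPU ...
        for target in self.offload_targets:
            target.to("cpu")
        # ... then bring the module that is about to run onto the accelerator.
        module.to(self.device)
        # Inputs pass through unchanged; only placement is affected.
        return args, kwargs


encoder = ToyModule("text_encoder", device="cuda:0")
dit = ToyModule("transformer")
hook = ToySequentialOffloadHook(offload_targets=[encoder], device="cuda:0")
args, kwargs = hook.pre_forward(dit, "latents", timestep=10)
print(encoder.device, dit.device)  # cpu cuda:0
```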

device instance-attribute

device = device

offload_targets instance-attribute

offload_targets = offload_targets

pin_memory instance-attribute

pin_memory = pin_memory

use_hsdp instance-attribute

use_hsdp = use_hsdp

pre_forward

pre_forward(
    module: Module, *args, **kwargs
) -> tuple[tuple, dict]

apply_sequential_offload

apply_sequential_offload(
    dit_modules: list[Module],
    encoder_modules: list[Module],
    device: device,
    pin_memory: bool = True,
    use_hsdp: bool = False,
) -> None

Apply sequential offloading hooks to DiT and encoder modules.

Registers hooks on modules to implement mutual-exclusion GPU allocation:

- Before DiT runs, encoders are offloaded to CPU.
- Before encoders run, DiT is offloaded to CPU.
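The mutual-exclusion rule amounts to cross-wiring two groups: each DiT module's hook targets the encoders, and each encoder's hook targets the DiT modules. The following minimal sketch works under that assumption and uses hypothetical toy classes with string devices. In this sketch, encoders are not offloaded relative to one another; exclusion applies only between the two groups.

```python
# Toy sketch of group-level mutual exclusion. All names are hypothetical
# stand-ins; the real function registers SequentialOffloadHook instances.
from typing import Any, Callable


class ToyModule:
    def __init__(self, name: str) -> None:
        self.name = name
        self.device = "cpu"
        self.pre_hooks: list[Callable[["ToyModule"], None]] = []

    def __call__(self, x: Any) -> str:
        for hook in self.pre_hooks:  # hooks run before the "forward"
            hook(self)
        return f"{self.name}({x})"


def make_offload_hook(targets: list[ToyModule], device: str) -> Callable[[ToyModule], None]:
    def hook(module: ToyModule) -> None:
        for target in targets:
            target.device = "cpu"  # offload the other group
        module.device = device  # load the module about to run
    return hook


def toy_apply_sequential_offload(
    dit_modules: list[ToyModule], encoder_modules: list[ToyModule], device: str
) -> None:
    for module in dit_modules:  # DiT runs -> encoders go to CPU
        module.pre_hooks.append(make_offload_hook(encoder_modules, device))
    for module in encoder_modules:  # encoder runs -> DiT goes to CPU
        module.pre_hooks.append(make_offload_hook(dit_modules, device))


def placement(modules: list[ToyModule]) -> dict[str, str]:
    return {m.name: m.device for m in modules}


text_encoder, vae, transformer = ToyModule("text_encoder"), ToyModule("vae"), ToyModule("transformer")
everything = [text_encoder, vae, transformer]
toy_apply_sequential_offload([transformer], [text_encoder, vae], "cuda:0")

text_encoder("prompt")
print(placement(everything))  # {'text_encoder': 'cuda:0', 'vae': 'cpu', 'transformer': 'cpu'}
transformer("latents")
print(placement(everything))  # {'text_encoder': 'cpu', 'vae': 'cpu', 'transformer': 'cuda:0'}
```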

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| dit_modules | list[Module] | DiT/transformer modules to register hooks on | required |
| encoder_modules | list[Module] | Encoder modules to register hooks on | required |
| device | device | Target GPU device for loading | required |
| pin_memory | bool | Whether to pin CPU memory for faster transfers | True |
| use_hsdp | bool | Whether HSDP is enabled (affects non_blocking behavior) | False |

Example

    >>> apply_sequential_offload(
    ...     dit_modules=[pipeline.transformer],
    ...     encoder_modules=[pipeline.text_encoder, pipeline.vae],
    ...     device=torch.device("cuda:0"),
    ... )
    >>> # Modules of pipeline now automatically swap between CPU and GPU

remove_sequential_offload

remove_sequential_offload(modules: list[Module]) -> None

Remove sequential offloading hooks from modules.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| modules | list[Module] | Modules to remove hooks from | required |

Example

    >>> all_modules = [*dit_modules, *encoder_modules]
    >>> remove_sequential_offload(all_modules)
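`remove_sequential_offload` takes a flat `list[Module]`, so both groups have to be unpacked into a single list before the call. The toy sketch below uses a hypothetical `ToyModule` that stores its hooks in a `pre_hooks` list. It shows removal, and it shows what happens when nested lists are passed instead. It does not model whether the real function also moves modules to a particular device.

```python
# Toy sketch of hook removal; ToyModule and toy_remove_sequential_offload are
# hypothetical stand-ins, not the vllm_omni implementation.
from typing import Any, Callable


class ToyModule:
    def __init__(self, name: str) -> None:
        self.name = name
        self.pre_hooks: list[Callable[..., Any]] = []


def toy_remove_sequential_offload(modules: list[ToyModule]) -> None:
    for module in modules:
        # Each element must be a module; a nested list has no .pre_hooks.
        module.pre_hooks.clear()


dit_modules = [ToyModule("transformer")]
encoder_modules = [ToyModule("text_encoder"), ToyModule("vae")]
for m in [*dit_modules, *encoder_modules]:
    m.pre_hooks.append(lambda module: None)  # placeholder hook

try:
    toy_remove_sequential_offload([dit_modules, encoder_modules])  # nested: wrong
except AttributeError as exc:
    print("nested list rejected:", exc)

toy_remove_sequential_offload([*dit_modules, *encoder_modules])  # flat: correct
print([len(m.pre_hooks) for m in [*dit_modules, *encoder_modules]])  # [0, 0, 0]
```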