Skip to content

vllm_omni.diffusion.distributed.hsdp

logger module-attribute

logger = init_logger(__name__)

HSDPInferenceConfig dataclass

Configuration for HSDP inference.

This is a runtime config created from DiffusionParallelConfig's HSDP settings.

enabled class-attribute instance-attribute

enabled: bool = False

hsdp_replicate_size class-attribute instance-attribute

hsdp_replicate_size: int = 1

hsdp_shard_size class-attribute instance-attribute

hsdp_shard_size: int = -1

output_dtype class-attribute instance-attribute

output_dtype: dtype | None = None

param_dtype class-attribute instance-attribute

param_dtype: dtype = torch.bfloat16

reduce_dtype class-attribute instance-attribute

reduce_dtype: dtype = torch.float32

reshard_after_forward class-attribute instance-attribute

reshard_after_forward: bool = True

apply_hsdp_to_model

apply_hsdp_to_model(
    model: Module,
    hsdp_config: HSDPInferenceConfig,
    target_device: device | None = None,
) -> Module

Apply HSDP sharding to a model that already has weights loaded.

This function redistributes the model's parameters across GPUs using HSDP. The model should already have its weights loaded via the standard load_weights method.

Parameters:

Name Type Description Default
model Module

Model instance with weights already loaded

required
hsdp_config HSDPInferenceConfig

HSDP configuration with HSDP mesh dimensions

required
target_device device | None

Worker's execution device. When the model declares _hsdp_ignored_modules, those modules are excluded from FSDP's mesh-driven device placement, so the caller must specify where to put them. Optional only when there are no ignored modules.

None

Returns:

Type Description
Module

HSDP-wrapped model ready for inference

shard_model

shard_model(
    model: Module,
    *,
    reshard_after_forward: bool = True,
    mp_policy: MixedPrecisionPolicy | None = None,
    mesh: DeviceMesh | None = None,
    hsdp_shard_conditions: list[
        Callable[[str, Module], bool]
    ],
    ignored_params: set[Parameter] | None = None,
) -> None

Apply HSDP sharding to model modules based on shard conditions.

ignored_params (if provided) are excluded from the root fully_shard wrap, so they are not collected into the root flat-parameter, are not subject to MixedPrecisionPolicy, and retain their original dtype. Per-submodule shard wraps do not receive ignored_params because the ignored modules are expected to live at the root level, not inside any block matched by hsdp_shard_conditions.