vllm_omni.diffusion.distributed.hsdp
HSDPInferenceConfig dataclass
Configuration for HSDP inference.
This is a runtime config created from DiffusionParallelConfig's HSDP settings.
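HSDP arranges the GPUs as a 2D mesh. Weights are sharded across the ranks of one shard group and replicated across groups. The following is a minimal sketch that assumes the config carries a replicate size and a shard size. The class HSDPMeshSketch and the field names replicate_size and shard_size are hypothetical and are not the actual HSDPInferenceConfig fields. The sketch computes each rank's mesh coordinates and groups using the row-major layout that torch.distributed.device_mesh.init_device_mesh uses for a 2D mesh shape.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSDPMeshSketch:
    """Hypothetical stand-in for the mesh dimensions held by HSDPInferenceConfig."""

    replicate_size: int  # number of full weight copies (outer mesh dim)
    shard_size: int  # GPUs that split one copy of the weights (inner mesh dim)

    def __post_init__(self) -> None:
        if self.replicate_size < 1 or self.shard_size < 1:
            raise ValueError("mesh dimensions must be positive")

    @property
    def world_size(self) -> int:
        return self.replicate_size * self.shard_size

    def coords(self, rank: int) -> tuple[int, int]:
        """Row-major (replicate_idx, shard_idx) coordinates of a global rank."""
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} outside world of size {self.world_size}")
        return divmod(rank, self.shard_size)

    def shard_group(self, rank: int) -> list[int]:
        """Ranks that together hold one full copy of the sharded weights."""
        replica, _ = self.coords(rank)
        start = replica * self.shard_size
        return list(range(start, start + self.shard_size))

    def replicate_group(self, rank: int) -> list[int]:
        """Ranks that hold the same shard, one per replica."""
        _, shard = self.coords(rank)
        return [r * self.shard_size + shard for r in range(self.replicate_size)]


cfg = HSDPMeshSketch(replicate_size=2, shard_size=4)  # 8 GPUs
print(cfg.coords(5))           # (1, 1)
print(cfg.shard_group(5))      # [4, 5, 6, 7]
print(cfg.replicate_group(5))  # [1, 5]
```

With 2 replicas of 4 shards, rank 5 holds the second quarter of the weights in the second replica. It therefore gathers parameters from ranks 4 to 7 and mirrors rank 1.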
apply_hsdp_to_model
apply_hsdp_to_model(
model: Module,
hsdp_config: HSDPInferenceConfig,
target_device: device | None = None,
) -> Module
Apply HSDP sharding to a model that already has weights loaded.
This function redistributes the model's parameters across GPUs using HSDP. The model should already have its weights loaded via the standard load_weights method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Module | Model instance with weights already loaded | required |
| hsdp_config | HSDPInferenceConfig | HSDP configuration with the HSDP mesh dimensions | required |
| target_device | device \| None | The worker's execution device. If the model declares _hsdp_ignored_modules, those modules are excluded from FSDP's mesh-driven device placement, so the caller must specify where to put them. May be omitted only when the model has no ignored modules. | None |
Returns:
| Type | Description |
|---|---|
| Module | HSDP-wrapped model ready for inference |
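The target_device rule can be expressed as a small check. This is a hypothetical helper (place_ignored_modules is not part of vllm_omni). It uses device strings instead of torch.device so that it runs without a GPU. In a real model, each ignored module would then be moved with nn.Module.to(target_device). The module names in the example are illustrative only.

```python
from typing import Optional


def place_ignored_modules(
    ignored_module_names: list[str],
    target_device: Optional[str],
) -> dict[str, str]:
    """Hypothetical helper: decide where modules excluded from HSDP should live.

    Modules listed in _hsdp_ignored_modules are not placed by FSDP's
    mesh-driven device placement, so the caller must supply a device.
    """
    if not ignored_module_names:
        # Nothing is excluded from FSDP, so target_device may be None.
        return {}
    if target_device is None:
        raise ValueError(
            "target_device is required when the model declares _hsdp_ignored_modules"
        )
    # Every ignored module goes to the worker's execution device.
    return {name: target_device for name in ignored_module_names}


print(place_ignored_modules([], None))                         # {}
print(place_ignored_modules(["text_encoder", "vae"], "cuda:0"))  # {'text_encoder': 'cuda:0', 'vae': 'cuda:0'}
```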
shard_model
shard_model(
model: Module,
*,
reshard_after_forward: bool = True,
mp_policy: MixedPrecisionPolicy | None = None,
mesh: DeviceMesh | None = None,
hsdp_shard_conditions: list[
Callable[[str, Module], bool]
],
ignored_params: set[Parameter] | None = None,
) -> None
Apply HSDP sharding (fully_shard) to each submodule matched by hsdp_shard_conditions and to the root module.
If ignored_params is provided, those parameters are excluded from the root fully_shard wrap. They are not collected into the root flat-parameter, are not subject to MixedPrecisionPolicy, and keep their original dtype. The per-submodule fully_shard wraps do not receive ignored_params, because the ignored modules are expected to live at the root level rather than inside any block matched by hsdp_shard_conditions.
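The sketch below plans which modules receive their own fully_shard wrap. It assumes the usual FSDP2 pattern, in which matched submodules are wrapped bottom-up and the root is wrapped last. StubModule and plan_hsdp_wraps are hypothetical stand-ins, and no torch install or process group is needed. The check on ignored_module_names encodes the expectation that ignored modules sit outside every block matched by hsdp_shard_conditions. In a real run, each returned name would be passed to fully_shard, and only the final root call would receive ignored_params.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StubModule:
    """Hypothetical stand-in for torch.nn.Module; only the kind matters here."""

    kind: str


ShardCondition = Callable[[str, StubModule], bool]


def plan_hsdp_wraps(
    named_modules: list[tuple[str, StubModule]],
    hsdp_shard_conditions: list[ShardCondition],
    ignored_module_names: set[str],
) -> list[str]:
    """Return the fully_shard wrap order: matched submodules first, root ("") last."""
    # The root (named "") is skipped here and wrapped separately at the end.
    matched = [
        name
        for name, module in named_modules
        if name and any(cond(name, module) for cond in hsdp_shard_conditions)
    ]
    # Only the root wrap gets ignored_params, so an ignored module must not
    # sit inside any matched block.
    for ignored in ignored_module_names:
        for block in matched:
            if ignored == block or ignored.startswith(block + "."):
                raise ValueError(
                    f"ignored module {ignored!r} is inside sharded block {block!r}"
                )
    # Deepest names first so nested blocks are wrapped before their parents;
    # sorted() is stable, so sibling order is preserved.
    bottom_up = sorted(matched, key=lambda n: n.count("."), reverse=True)
    return bottom_up + [""]


modules = [
    ("", StubModule("DiT")),
    ("blocks.0", StubModule("TransformerBlock")),
    ("blocks.0.attn", StubModule("Attention")),
    ("blocks.1", StubModule("TransformerBlock")),
    ("pos_embed", StubModule("PosEmbed")),
]


def is_block(name: str, module: StubModule) -> bool:
    return module.kind == "TransformerBlock"


print(plan_hsdp_wraps(modules, [is_block], {"pos_embed"}))  # ['blocks.0', 'blocks.1', '']
```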