vllm_omni.diffusion.distributed.hsdp
HSDPInferenceConfig dataclass
Configuration for HSDP inference.
This is a runtime config created from DiffusionParallelConfig's HSDP settings.
apply_hsdp_to_model
apply_hsdp_to_model(
model: Module, hsdp_config: HSDPInferenceConfig
) -> Module
Apply HSDP sharding to a model that already has weights loaded.
This function redistributes the model's parameters across GPUs using HSDP. The model should already have its weights loaded via the standard load_weights method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| model | Module | Model instance with weights already loaded | required |
| hsdp_config | HSDPInferenceConfig | HSDP configuration, including the HSDP mesh dimensions | required |
Returns:
| Type | Description |
|---|---|
| Module | HSDP-wrapped model ready for inference |
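
The "HSDP mesh dimensions" mentioned above can be pictured as a 2D grid of ranks: one axis replicates the model, the other shards each replica's parameters. The following is a minimal, self-contained sketch of that layout in plain Python. `MeshLayout`, `replicate_size`, and `shard_size` are hypothetical names chosen for illustration; they are not fields of `HSDPInferenceConfig` and not part of the vllm_omni API. The sketch also assumes ranks fill the grid in row-major order, which may differ from what the library actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeshLayout:
    """Hypothetical illustration of a 2D (replicate, shard) rank grid.

    Not part of vllm_omni; the field names are assumptions.
    """

    replicate_size: int  # number of full model replicas
    shard_size: int      # number of ranks each replica is sharded across

    @property
    def world_size(self) -> int:
        return self.replicate_size * self.shard_size

    def coords(self, rank: int) -> tuple[int, int]:
        """Map a global rank to (replica index, shard index), row-major."""
        if not 0 <= rank < self.world_size:
            raise ValueError(f"rank {rank} outside world of size {self.world_size}")
        return divmod(rank, self.shard_size)

    def shard_group(self, rank: int) -> list[int]:
        """Ranks that together hold one complete copy of the parameters."""
        replica, _ = self.coords(rank)
        start = replica * self.shard_size
        return list(range(start, start + self.shard_size))

    def replicate_group(self, rank: int) -> list[int]:
        """Ranks that hold the same parameter shard in different replicas."""
        _, shard = self.coords(rank)
        return list(range(shard, self.world_size, self.shard_size))


# Example: 8 GPUs arranged as 2 replicas, each sharded across 4 ranks.
layout = MeshLayout(replicate_size=2, shard_size=4)
for rank in range(layout.world_size):
    print(rank, layout.coords(rank), layout.shard_group(rank), layout.replicate_group(rank))
```

In this picture, gathering full parameters for a forward pass only needs communication within a shard group, which is the usual motivation for keeping shard groups small (for example, within one node) while replicating across the remaining ranks.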