Expert Parallelism Guide¶

Table of Content¶

Overview
Quick Start
Configuration Parameters
Best Practices
Summary

Overview¶

Unlike Tensor Parallelism which shards every layer's weights, Expert Parallelism (EP) only shards the MoE expert MLP blocks. This significantly reduces the memory footprint of MoE models (e.g., HunyuanImage3.0) while maintaining constant dense-equivalent compute efficiency.

During the forward pass, a gating mechanism routes tokens to their designated experts, requiring all-to-all communication to dispatch tokens to the correct ranks and combine results.

See supported models list in Supported Models.

EP Size Constraint

The effective EP size equals tp × sp × cfg × dp. At least one of TP/SP/CFG/DP must be set when EP is enabled.

Quick Start¶

Basic Usage¶

from vllm_omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams
from vllm_omni.diffusion.data import DiffusionParallelConfig

omni = Omni(
    model="tencent/HunyuanImage-3.0",
    parallel_config=DiffusionParallelConfig(
        tensor_parallel_size=8,
        enable_expert_parallel=True,
    ),
)

outputs = omni.generate(
    "A brown and white dog is running on the grass",
    OmniDiffusionSamplingParams(
        num_inference_steps=50,
        width=1024,
        height=1024,
    ),
)

Configuration Parameters¶

In DiffusionParallelConfig:

Parameter	Type	Default	Description
`enable_expert_parallel`	bool	False	Enable Expert Parallelism for MoE models

EP size is derived automatically as tp × sp × cfg × dp — configure at least one of those to set the EP degree.

Best Practices¶

When to Use¶

Good for:

MoE models (e.g., HunyuanImage3.0) with numbers of experts
Memory-constrained multi-GPU setups where only expert blocks need sharding

Not for:

Dense models (no MoE layers) — EP has no effect
Single GPU setups

Summary¶

✅ Enable EP - Set enable_expert_parallel=True in DiffusionParallelConfig for MoE models
✅ Set parallelism degree - At least one of tensor_parallel_size / ulysses_degree / cfg_parallel_size must be > 1 to define the EP size