vllm_omni.diffusion.models.magi_human.magi_human_dit ¶
Adapter ¶
Bases: Module
audio_embedder instance-attribute ¶
rope instance-attribute ¶
rope = ElementWiseFourierEmbed(
hidden_size // num_attention_heads,
in_pixels=False,
learnable=False,
)
text_embedder instance-attribute ¶
video_embedder instance-attribute ¶
AdapterConfig dataclass ¶
Attention ¶
Bases: Module
linear_gating instance-attribute ¶
linear_gating = ColumnParallelLinear(
input_size=hidden_size,
output_size=num_heads_q,
bias=False,
gather_output=False,
return_bias=False,
)
linear_proj instance-attribute ¶
linear_proj = RowParallelLinear(
input_size=num_heads_q * head_dim,
output_size=hidden_size,
bias=False,
input_is_parallel=True,
return_bias=False,
)
linear_qkv instance-attribute ¶
linear_qkv = QKVParallelLinear(
hidden_size=hidden_size,
head_size=head_dim,
total_num_heads=num_heads_q,
total_num_kv_heads=num_heads_kv,
bias=False,
return_bias=False,
)
pre_norm instance-attribute ¶
pre_norm = MultiModalityRMSNorm(
hidden_size, eps=1e-06, num_modality=num_modality
)
forward ¶
forward(
hidden_states: Tensor,
rope: Tensor,
permute_mapping: Tensor,
inv_permute_mapping: Tensor,
varlen_handler: VarlenHandler,
local_attn_handler: FFAHandler | None,
modality_dispatcher: ModalityDispatcher,
) -> Tensor
AttentionConfig dataclass ¶
BaseLinear ¶
Bases: Module
num_layers_for_initialization instance-attribute ¶
weight instance-attribute ¶
forward ¶
forward(
input: Tensor,
output_dtype: dtype | None = None,
modality_dispatcher: ModalityDispatcher | None = None,
) -> Tensor
DiTModel ¶
Bases: Module
config instance-attribute ¶
config: TransformerConfig = TransformerConfig(
hidden_size=hidden_size,
video_in_channels=video_in_channels,
audio_in_channels=audio_in_channels,
text_in_channels=text_in_channels,
params_dtype=params_dtype,
post_process_dtype=float32,
)
final_linear_audio instance-attribute ¶
final_linear_video instance-attribute ¶
forward ¶
forward(
x: Tensor,
coords_mapping: Tensor,
modality_mapping: Tensor,
varlen_handler: VarlenHandler,
local_attn_handler: FFAHandler | None,
)
ElementWiseFourierEmbed ¶
Bases: Module
FFAHandler dataclass ¶
MLP ¶
Bases: Module
down_proj instance-attribute ¶
down_proj = RowParallelLinear(
input_size=intermediate_size,
output_size=hidden_size,
bias=False,
input_is_parallel=True,
return_bias=False,
)
pre_norm instance-attribute ¶
pre_norm = MultiModalityRMSNorm(
hidden_size, num_modality=num_modality
)
up_gate_proj instance-attribute ¶
up_gate_proj = ColumnParallelLinear(
input_size=hidden_size,
output_size=intermediate_size_up,
bias=False,
gather_output=False,
return_bias=False,
)
MLPActivationType ¶
MLPConfig dataclass ¶
MagiHumanDiTConfig dataclass ¶
checkpoint_qk_layernorm_rope class-attribute instance-attribute ¶
checkpoint_qk_layernorm_rope: bool = False
gelu7_layers class-attribute instance-attribute ¶
local_attn_layers class-attribute instance-attribute ¶
mm_layers class-attribute instance-attribute ¶
post_norm_layers class-attribute instance-attribute ¶
MoEColumnParallelLinear ¶
Bases: Module
Per-expert ColumnParallelLinear with modality dispatch.
Forward: dispatch → per-expert column-parallel matmul → undispatch. Output stays TP-local (no gather).
MoEQKVParallelLinear ¶
Bases: Module
Per-expert QKVParallelLinear with modality dispatch.
Wraps num_experts independent QKVParallelLinear instances. Forward: dispatch tokens by modality → per-expert QKV matmul (TP-sharded) → undispatch.
MoERowParallelLinear ¶
Bases: Module
Per-expert RowParallelLinear with modality dispatch.
Forward: dispatch → per-expert row-parallel matmul (includes all-reduce) → undispatch.
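The dispatch → per-expert matmul → undispatch flow shared by the three MoE parallel-linear classes can be sketched in plain Python. This is only an illustration of the routing pattern. The function name, the integer modality ids, and the toy "experts" are hypothetical; the real classes route through ModalityDispatcher and run tensor-parallel vLLM layers, both of which are omitted here.

```python
from collections import defaultdict


def dispatch_apply_undispatch(tokens, modality_ids, experts):
    """Route each token to the expert of its modality, then restore the order.

    Hypothetical helper for illustration; not part of magi_human_dit.
    """
    # Dispatch: collect the positions of the tokens belonging to each modality.
    positions = defaultdict(list)
    for pos, modality in enumerate(modality_ids):
        positions[modality].append(pos)

    out = [None] * len(tokens)
    for modality, idxs in positions.items():
        # Per-expert computation on the group of tokens for this modality.
        group = [tokens[i] for i in idxs]
        result = experts[modality](group)
        # Undispatch: scatter the results back to the original positions.
        for i, y in zip(idxs, result):
            out[i] = y
    return out


# Three toy "experts", one per modality (stand-ins for per-expert linear layers).
experts = [
    lambda xs: [x * 10 for x in xs],
    lambda xs: [x + 1 for x in xs],
    lambda xs: [-x for x in xs],
]
tokens = [1, 2, 3, 4, 5]
modality_ids = [0, 2, 1, 0, 2]
mixed = dispatch_apply_undispatch(tokens, modality_ids, experts)
print(mixed)
```

Grouping tokens by modality lets each expert process one contiguous batch, and the final scatter keeps the output aligned with the input token order.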
Modality ¶
MultiModalityRMSNorm ¶
Bases: Module
weight instance-attribute ¶
forward_multi_experts ¶
forward_multi_experts(
x: Tensor, modality_dispatcher: ModalityDispatcher
) -> Tensor
forward_single_expert ¶
forward_single_expert(
x: Tensor,
modality_dispatcher: ModalityDispatcher | None = None,
) -> Tensor
NativeMoELinear ¶
Bases: BaseLinear
forward ¶
forward(
input: Tensor,
output_dtype: dtype | None = None,
modality_dispatcher: ModalityDispatcher | None = None,
) -> Tensor
SimplePackedData dataclass ¶
SingleData dataclass ¶
TransFormerLayer ¶
Bases: Module
attn_post_norm instance-attribute ¶
attn_post_norm = MultiModalityRMSNorm(
hidden_size, num_modality=num_modality
)
mlp_post_norm instance-attribute ¶
mlp_post_norm = MultiModalityRMSNorm(
hidden_size, num_modality=num_modality
)
forward ¶
forward(
hidden_states: Tensor,
rope: Tensor,
permute_mapping: Tensor,
inv_permute_mapping: Tensor,
varlen_handler: VarlenHandler,
local_attn_handler: FFAHandler | None,
modality_dispatcher: ModalityDispatcher,
) -> Tensor
TransformerBlock ¶
Bases: Module
forward ¶
forward(
x: Tensor,
rope: Tensor,
permute_mapping: Tensor,
inv_permute_mapping: Tensor,
varlen_handler: VarlenHandler,
local_attn_handler: FFAHandler | None,
modality_dispatcher: ModalityDispatcher,
) -> Tensor
TransformerConfig dataclass ¶
VarlenHandler dataclass ¶
calc_local_attn_ffa_handler ¶
calc_local_attn_ffa_handler(
num_video_tokens,
num_audio_and_txt_tokens,
num_frames,
frame_receptive_field,
)
calc_local_qk_range ¶
calc_local_qk_range(
num_video_tokens,
num_audio_and_txt_tokens,
num_frames,
frame_receptive_field,
)
create_linear ¶
create_linear(
in_features,
out_features,
num_layers=1,
num_experts=1,
bias=True,
device=None,
dtype=None,
) -> BaseLinear | NativeMoELinear
flex_flash_attn_func ¶
flex_flash_attn_func(
query: Tensor,
key: Tensor,
value: Tensor,
q_ranges: Tensor,
k_ranges: Tensor,
) -> tuple[Tensor, Tensor]
flex_flash_attn_no_cp ¶
flex_flash_attn_no_cp(
q: Tensor,
k: Tensor,
v: Tensor,
q_ranges: Tensor,
k_ranges: Tensor,
) -> Tensor
freq_bands ¶
freq_bands(
num_bands: int,
temperature: float = 10000.0,
step: int = 2,
device: device | None = None,
) -> Tensor
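The source gives only the signature of freq_bands. The sketch below assumes it follows the common inverse-frequency schedule used for sinusoidal and rotary embeddings, 1 / temperature^(i / num_bands) for i = 0, step, 2·step, and so on. That formula is an assumption, not confirmed by this page. The sketch also returns a plain Python list rather than a Tensor and drops the device argument.

```python
def freq_bands_sketch(num_bands, temperature=10000.0, step=2):
    """Assumed inverse-frequency schedule; hypothetical stand-in for freq_bands."""
    # One band per exponent index i in 0, step, 2*step, ... below num_bands.
    return [
        1.0 / temperature ** (i / num_bands)
        for i in range(0, num_bands, step)
    ]


bands = freq_bands_sketch(8)
print(len(bands), bands)
```

Under this assumption the bands start at 1.0 and decrease geometrically, so low indices encode fine positional detail and high indices encode coarse position.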
get_coords ¶
get_coords(
shape: list[int],
ref_feat_shape: list[int],
offset_thw: list[int] | None = None,
device: device = device("cpu"),
dtype: dtype = float32,
)
magi_compile ¶
No-op stub: vllm-omni handles execution, so magi compilation is skipped.
validate_magi_human_tp_constraints ¶
validate_magi_human_tp_constraints(
*,
hidden_size: int,
num_heads_q: int,
num_heads_kv: int,
tensor_parallel_size: int,
) -> None
Validate MagiHuman TP divisibility constraints.
Both shared layers (num_modality == 1) and MoE layers (num_modality == 3) support TP via vLLM's parallel linear layers (QKVParallelLinear / ColumnParallelLinear / RowParallelLinear). MoE layers use per-expert parallel layers with modality dispatch.
With the default config (hidden_size=5120, num_heads_q=40, num_heads_kv=8), the supported tensor_parallel_size values are 1, 2, and 4.
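A minimal sketch of what a divisibility check of this kind might look like, assuming the constraint is simply that each dimension divides evenly by the tensor-parallel size. The function name and error messages are hypothetical. The real validate_magi_human_tp_constraints may enforce stricter rules, since the source lists only 1, 2, and 4 as supported, so this sketch should not be read as its implementation.

```python
def check_tp_divisibility(
    *,
    hidden_size: int,
    num_heads_q: int,
    num_heads_kv: int,
    tensor_parallel_size: int,
) -> None:
    """Hypothetical check: every sharded dimension must split evenly across ranks."""
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be >= 1")
    for name, value in (
        ("hidden_size", hidden_size),
        ("num_heads_q", num_heads_q),
        ("num_heads_kv", num_heads_kv),
    ):
        if value % tensor_parallel_size != 0:
            raise ValueError(
                f"{name}={value} is not divisible by "
                f"tensor_parallel_size={tensor_parallel_size}"
            )


# Default config values quoted in the docstring above.
for tp in (1, 2, 4):
    check_tp_divisibility(
        hidden_size=5120, num_heads_q=40, num_heads_kv=8, tensor_parallel_size=tp
    )
```

Failing fast at construction time avoids a less readable shape error later, when a parallel linear layer tries to shard heads that do not split evenly.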