vllm_omni.diffusion.models.helios.helios_transformer ¶
ColumnParallelGELU ¶
Bases: Module
Column parallel linear with GELU activation.
DistributedRMSNorm ¶
HeliosCrossAttention ¶
Bases: Module
Optimized cross-attention for Helios.
attn instance-attribute ¶
attn = Attention(
    num_heads=num_heads,
    head_size=head_dim,
    num_kv_heads=num_heads,
    softmax_scale=1.0 / head_dim**0.5,
    causal=False,
)
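The softmax_scale and causal=False arguments above correspond to ordinary non-causal scaled dot-product attention. The following is a minimal sketch of that computation for a single query vector, written with plain Python lists so it has no dependencies. The function name attend and the toy inputs are illustrative assumptions, not part of the Attention layer's API.

```python
import math


def attend(query, keys, values, head_dim):
    """Non-causal scaled dot-product attention for one query vector (sketch)."""
    scale = 1.0 / head_dim**0.5  # same expression as softmax_scale above
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max before exp for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    # causal=False: every key/value position contributes, no mask is applied.
    return [
        sum(w * value[d] for w, value in zip(weights, values))
        for d in range(len(values[0]))
    ]


# Toy inputs: two identical keys, so both positions get equal weight.
keys = [[1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
values = [[2.0, 0.0], [4.0, 0.0]]
out = attend([1.0, 0.0, 0.0, 0.0], keys, values, head_dim=4)
```

With identical keys the weights are uniform, so the output is the mean of the two value vectors.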
to_k instance-attribute ¶
to_k = ColumnParallelLinear(
    dim,
    inner_dim,
    bias=True,
    gather_output=False,
    return_bias=False,
    quant_config=quant_config,
)
to_out instance-attribute ¶
to_out = RowParallelLinear(
    inner_dim,
    dim,
    bias=True,
    input_is_parallel=True,
    return_bias=False,
    quant_config=quant_config,
)
to_q instance-attribute ¶
to_q = ColumnParallelLinear(
    dim,
    inner_dim,
    bias=True,
    gather_output=False,
    return_bias=False,
    quant_config=quant_config,
)
to_v instance-attribute ¶
to_v = ColumnParallelLinear(
    dim,
    inner_dim,
    bias=True,
    gather_output=False,
    return_bias=False,
    quant_config=quant_config,
)
HeliosFeedForward ¶
Bases: Module
TP-enabled FeedForward network for Helios.
net_0 instance-attribute ¶
net_0 = ColumnParallelGELU(
    dim,
    inner_dim,
    approximate="tanh",
    bias=bias,
    quant_config=quant_config,
)
net_2 instance-attribute ¶
net_2 = RowParallelLinear(
    inner_dim,
    dim_out,
    bias=bias,
    input_is_parallel=True,
    return_bias=False,
    quant_config=quant_config,
)
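A column-parallel projection followed by a row-parallel projection with input_is_parallel=True is the usual tensor-parallel pairing: each rank keeps a slice of the inner dimension, applies the elementwise activation locally, and the partial outputs are summed once at the end. The sketch below checks that this sharded computation matches the unsharded one. It uses plain Python lists, omits biases and quantization, and models the all-reduce as a simple sum. All helper names are hypothetical, and it is an assumption that ColumnParallelGELU behaves like a linear layer followed by tanh-approximated GELU.

```python
import math


def gelu_tanh(x):
    # tanh approximation of GELU, as selected by approximate="tanh"
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x**3)))


def matvec(weight, x):
    # weight is laid out as [out_features][in_features]
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def feed_forward(x, w_up, w_down):
    """Unsharded reference: down(gelu(up(x)))."""
    return matvec(w_down, [gelu_tanh(h) for h in matvec(w_up, x)])


def feed_forward_tp(x, w_up, w_down, world_size):
    """Same computation with the inner dimension split across ranks."""
    shard = len(w_up) // world_size
    partials = []
    for rank in range(world_size):
        lo, hi = rank * shard, (rank + 1) * shard
        # Column-parallel: this rank owns a slice of the up-projection outputs.
        local_hidden = [gelu_tanh(h) for h in matvec(w_up[lo:hi], x)]
        # Row-parallel: this rank owns the matching input slice of the down-projection.
        local_down = [row[lo:hi] for row in w_down]
        partials.append(matvec(local_down, local_hidden))
    # Stand-in for the all-reduce: sum the partial outputs across ranks.
    return [sum(vals) for vals in zip(*partials)]


x = [0.5, -1.0]
w_up = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6], [0.7, 0.8]]
w_down = [[1.0, -1.0, 0.5, 0.25], [0.0, 2.0, -0.5, 1.0]]
dense = feed_forward(x, w_up, w_down)
sharded = feed_forward_tp(x, w_up, w_down, world_size=2)
```

Because GELU is elementwise, it can run on each rank's slice without any communication, which is why gather_output is not needed between the two layers.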
HeliosOutputNorm ¶
Bases: Module
Output normalization that extracts only original_context_length tokens.
HeliosRotaryPosEmbed ¶
HeliosSelfAttention ¶
Bases: Module
Optimized self-attention for Helios with history amplification support.
attn instance-attribute ¶
attn = Attention(
    num_heads=num_heads,
    head_size=head_dim,
    num_kv_heads=num_kv_heads,
    softmax_scale=1.0 / head_dim**0.5,
    causal=False,
)
to_out instance-attribute ¶
to_out = RowParallelLinear(
    inner_dim,
    dim,
    bias=True,
    input_is_parallel=True,
    return_bias=False,
    quant_config=quant_config,
)
to_qkv instance-attribute ¶
to_qkv = QKVParallelLinear(
    hidden_size=dim,
    head_size=head_dim,
    total_num_heads=num_heads,
    bias=True,
    quant_config=quant_config,
)
HeliosTimeTextEmbedding ¶
HeliosTransformer3DModel ¶
Bases: Module
Optimized Helios Transformer model for video generation using vLLM layers.
Helios extends the Wan2.2 architecture with multi-term memory patches, guidance cross-attention, and chunked video generation support.
blocks instance-attribute ¶
blocks = ModuleList(
    [
        HeliosTransformerBlock(
            inner_dim,
            ffn_dim,
            num_attention_heads,
            eps,
            cross_attn_norm,
            guidance_cross_attn=guidance_cross_attn,
            is_amplify_history=is_amplify_history,
            history_scale_mode=history_scale_mode,
            quant_config=quant_config,
        )
        for _ in range(num_layers)
    ]
)
condition_embedder instance-attribute ¶
condition_embedder = HeliosTimeTextEmbedding(
    dim=inner_dim,
    time_freq_dim=freq_dim,
    time_proj_dim=inner_dim * 6,
    text_embed_dim=text_dim,
)
config instance-attribute ¶
config = type(
    "Config",
    (),
    {
        "patch_size": patch_size,
        "num_attention_heads": num_attention_heads,
        "attention_head_dim": attention_head_dim,
        "in_channels": in_channels,
        "out_channels": out_channels,
        "text_dim": text_dim,
        "freq_dim": freq_dim,
        "ffn_dim": ffn_dim,
        "num_layers": num_layers,
        "cross_attn_norm": cross_attn_norm,
        "qk_norm": qk_norm,
        "eps": eps,
        "added_kv_proj_dim": added_kv_proj_dim,
        "rope_dim": rope_dim,
        "rope_theta": rope_theta,
        "guidance_cross_attn": guidance_cross_attn,
        "zero_history_timestep": zero_history_timestep,
        "has_multi_term_memory_patch": has_multi_term_memory_patch,
        "is_amplify_history": is_amplify_history,
        "history_scale_mode": history_scale_mode,
    },
)()
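The config attribute is built with the three-argument form of type(), which creates a class on the fly, and the trailing () instantiates it. The result is an object whose fields are readable as attributes (config.patch_size and so on). A minimal sketch of the same idiom follows. The make_config helper and the field values are illustrative assumptions only.

```python
def make_config(**fields):
    # type(name, bases, namespace) builds a new class; the trailing ()
    # creates one instance of it, mirroring the config attribute above.
    return type("Config", (), dict(fields))()


# Hypothetical values, chosen only to demonstrate attribute access.
config = make_config(patch_size=(1, 2, 2), num_layers=4)

patch_size = config.patch_size
class_name = type(config).__name__
instance_fields = vars(config)  # empty: the fields live on the class itself
```

One consequence of this idiom is that the fields are class attributes, so vars(config) is empty and the values are not captured by instance-level introspection. types.SimpleNamespace from the standard library is a common alternative when instance attributes are preferred.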
has_multi_term_memory_patch instance-attribute ¶
packed_modules_mapping class-attribute instance-attribute ¶
patch_embedding instance-attribute ¶
patch_embedding = Conv3dLayer(
    in_channels=in_channels,
    out_channels=inner_dim,
    kernel_size=patch_size,
    stride=patch_size,
)
patch_long instance-attribute ¶
patch_long = Conv3dLayer(
    in_channels=in_channels,
    out_channels=inner_dim,
    kernel_size=(4, 8, 8),
    stride=(4, 8, 8),
)
patch_mid instance-attribute ¶
patch_mid = Conv3dLayer(
    in_channels=in_channels,
    out_channels=inner_dim,
    kernel_size=(2, 4, 4),
    stride=(2, 4, 4),
)
patch_short instance-attribute ¶
patch_short = Conv3dLayer(
    in_channels=in_channels,
    out_channels=inner_dim,
    kernel_size=(1, 2, 2),
    stride=(1, 2, 2),
)
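The short, mid, and long patch embeddings differ only in kernel size and stride, so the coarser the patch, the fewer tokens a history latent contributes. The sketch below estimates the token count for each, assuming stride equals kernel size (as listed above) and no padding, in which case a 3D convolution yields floor(n / k) positions per axis. The latent shape is a made-up example, and the real model may pad or feed differently sized histories to each embedding.

```python
# Kernel sizes (frames, height, width) copied from the attributes above.
KERNELS = {
    "short": (1, 2, 2),
    "mid": (2, 4, 4),
    "long": (4, 8, 8),
}


def patch_grid(latent_shape, kernel):
    # With stride == kernel and no padding, each axis yields n // k patches.
    return tuple(n // k for n, k in zip(latent_shape, kernel))


def token_count(latent_shape, kernel):
    frames, height, width = patch_grid(latent_shape, kernel)
    return frames * height * width


# Hypothetical latent shape (frames, height, width), for illustration only.
latent = (16, 48, 80)
counts = {name: token_count(latent, kernel) for name, kernel in KERNELS.items()}
# counts == {"short": 15360, "mid": 1920, "long": 240}
```

Each step from short to mid to long doubles the kernel along all three axes, so the token count drops by a factor of 8 per step for the same input.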
forward ¶
forward(
    hidden_states: Tensor,
    timestep: LongTensor,
    encoder_hidden_states: Tensor,
    indices_hidden_states: Tensor | None = None,
    indices_latents_history_short: Tensor | None = None,
    indices_latents_history_mid: Tensor | None = None,
    indices_latents_history_long: Tensor | None = None,
    latents_history_short: Tensor | None = None,
    latents_history_mid: Tensor | None = None,
    latents_history_long: Tensor | None = None,
    return_dict: bool = True,
    attention_kwargs: dict[str, Any] | None = None,
) -> Tensor | Transformer2DModelOutput
HeliosTransformerBlock ¶
Bases: Module
Transformer block with guidance cross-attention and history support.
attn1 instance-attribute ¶
attn1 = HeliosSelfAttention(
    dim=dim,
    num_heads=num_heads,
    head_dim=head_dim,
    eps=eps,
    is_amplify_history=is_amplify_history,
    history_scale_mode=history_scale_mode,
    quant_config=quant_config,
)
attn2 instance-attribute ¶
attn2 = HeliosCrossAttention(
    dim=dim,
    num_heads=num_heads,
    head_dim=head_dim,
    eps=eps,
    quant_config=quant_config,
)
ffn instance-attribute ¶
ffn = HeliosFeedForward(
    dim=dim,
    inner_dim=ffn_dim,
    dim_out=dim,
    quant_config=quant_config,
)
norm2 instance-attribute ¶
apply_rotary_emb_helios ¶
Apply Helios-style rotary embeddings.
freqs_cis contains [cos_t, cos_y, cos_x, sin_t, sin_y, sin_x] concatenated along the last dimension, with shape [B, seq, D*2], where D = DT+DY+DX is the sum of the rotary dimensions for the t, y, and x axes. hidden_states has shape [B, seq, H, head_dim].
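The layout described above can be illustrated with a small dependency-free sketch: split freqs_cis into its cosine and sine halves, broadcast them over the head axis, and rotate each pair of channels. This is not the library's implementation. It assumes head_dim equals D and that adjacent channels (2i, 2i+1) form a rotation pair sharing one angle, neither of which is stated in the description. The function name apply_rotary_sketch is hypothetical, and nested lists stand in for tensors.

```python
import math


def apply_rotary_sketch(hidden_states, freqs_cis):
    """hidden_states: [B][seq][H][head_dim]; freqs_cis: [B][seq][2 * D].

    Assumes head_dim == D and that channels (2i, 2i+1) share one angle.
    """
    out = []
    for batch_x, batch_f in zip(hidden_states, freqs_cis):
        out_batch = []
        for token_heads, freqs in zip(batch_x, batch_f):
            d = len(freqs) // 2
            cos, sin = freqs[:d], freqs[d:]  # first half cos, second half sin
            out_heads = []
            for x in token_heads:  # the same angles apply to every head
                rotated = [0.0] * d
                for i in range(0, d, 2):
                    rotated[i] = x[i] * cos[i] - x[i + 1] * sin[i]
                    rotated[i + 1] = x[i + 1] * cos[i + 1] + x[i] * sin[i + 1]
                out_heads.append(rotated)
            out_batch.append(out_heads)
        out.append(out_batch)
    return out


# One batch, one token, one head, head_dim = 4.
# First pair rotated by pi/2, second pair left unrotated.
angles = [math.pi / 2, math.pi / 2, 0.0, 0.0]
freqs = [[[math.cos(a) for a in angles] + [math.sin(a) for a in angles]]]
hidden = [[[[1.0, 0.0, 0.0, 2.0]]]]
rotated = apply_rotary_sketch(hidden, freqs)[0][0][0]
```

In this example the first pair (1, 0) rotates by a quarter turn to approximately (0, 1), while the second pair (0, 2) is unchanged, so the vector norm is preserved.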