vllm_omni.diffusion.models.dreamid_omni.fusion ¶
FusedBlock ¶
Bases: Module
Wrapper pairing a video block and audio block for layerwise offloading.
Registers both blocks as submodules so their parameters are visible to the offload hooks.
forward ¶
forward(
hidden_states,
encoder_hidden_states,
attn: Attention,
vid_e,
vid_seq_lens,
vid_grid_sizes,
vid_freqs,
vid_context,
vid_context_lens,
vid_ref_lengths,
vid_freqs_scaling,
audio_e,
audio_seq_lens,
audio_grid_sizes,
audio_freqs,
audio_context,
audio_context_lens,
audio_ref_lengths,
audio_freqs_scaling,
)
FusionModel ¶
Bases: Module
attn instance-attribute ¶
attn = Attention(
num_heads=num_heads,
head_size=head_dim,
num_kv_heads=num_heads,
softmax_scale=1.0 / head_dim**0.5,
causal=False,
)
audio_model instance-attribute ¶
audio_model = WanModel(
quant_config=quant_config,
prefix="audio_model",
**audio_config,
)
fused_blocks instance-attribute ¶
fused_blocks = ModuleList(
[
FusedBlock(blocks[i], blocks[i], device)
for i in range(num_blocks)
]
)
packed_modules_mapping class-attribute instance-attribute ¶
video_model instance-attribute ¶
video_model = WanModel(
quant_config=quant_config,
prefix="video_model",
**video_config,
)
forward ¶
forward(
vid,
audio,
t,
vid_context,
audio_context,
vid_seq_len,
audio_seq_len,
ref_ip_lengths=None,
ref_audio_lengths=None,
slg_layer=False,
freqs_scaling=None,
)
inject_cross_attention_kv_projections ¶
load_state_dict ¶
Remap checkpoint keys stored under video_model.blocks.N.* / audio_model.blocks.N.* to the current fused_blocks.N.vid_block.* / fused_blocks.N.audio_block.* layout.
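The key remapping described for load_state_dict can be illustrated with a minimal sketch. The helper name remap_block_keys and the regular expression are hypothetical and not taken from the vllm_omni source. Only the old and new key prefixes come from the description above. Plain values stand in for tensors.

```python
import re
from collections import OrderedDict

# Hypothetical pattern: model prefix, block index N, remainder of the key.
_BLOCK_KEY = re.compile(r"^(video_model|audio_model)\.blocks\.(\d+)\.(.+)$")

# Old model prefix -> submodule name inside each fused block.
_SUBMODULE = {"video_model": "vid_block", "audio_model": "audio_block"}


def remap_block_keys(state_dict):
    """Return a copy of state_dict with per-model block keys moved under fused_blocks.

    Keys that do not match the block pattern are passed through unchanged.
    """
    remapped = OrderedDict()
    for key, value in state_dict.items():
        match = _BLOCK_KEY.match(key)
        if match:
            model, index, rest = match.groups()
            key = f"fused_blocks.{index}.{_SUBMODULE[model]}.{rest}"
        remapped[key] = value
    return remapped
```

A real override would presumably apply such a remapping before delegating to Module.load_state_dict. That step is an assumption and is not shown here.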
merge_kwargs ¶
Keys in each kwarg: e, seq_lens, grid_sizes, freqs, context, context_lens.
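As a rough illustration of how two per-modality kwarg dicts with the keys listed above could be combined into the vid_* / audio_* argument names that FusedBlock.forward accepts, consider the sketch below. The function name merge_kwargs_sketch, its signature, and the prefixing scheme are assumptions made for illustration. The actual merge_kwargs implementation is not shown in this reference and may differ.

```python
# Keys documented for each per-modality kwarg dict.
KEYS = ("e", "seq_lens", "grid_sizes", "freqs", "context", "context_lens")


def merge_kwargs_sketch(vid_kwargs, audio_kwargs):
    """Hypothetical merge: prefix each documented key with 'vid_' or 'audio_'."""
    merged = {}
    for prefix, kwargs in (("vid", vid_kwargs), ("audio", audio_kwargs)):
        # Fail early if a documented key is absent.
        missing = [key for key in KEYS if key not in kwargs]
        if missing:
            raise KeyError(f"{prefix} kwargs missing keys: {missing}")
        for key in KEYS:
            merged[f"{prefix}_{key}"] = kwargs[key]
    return merged
```

Under this assumption the merged dict holds names such as vid_e and audio_context_lens. The remaining forward parameters (vid_ref_lengths, vid_freqs_scaling, and their audio counterparts) are not covered by the documented key list and would need to be supplied separately.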