vllm_omni.diffusion.models.magi_human.pipeline_magi_human ¶
EvalInput dataclass ¶
FlowUniPCMultistepScheduler ¶
Bases: SchedulerMixin, ConfigMixin
convert_model_output ¶
multistep_uni_c_bh_update ¶
multistep_uni_c_bh_update(
this_model_output: Tensor,
*args,
last_sample: Tensor = None,
this_sample: Tensor = None,
order: int | None = None,
**kwargs,
) -> Tensor
multistep_uni_p_bh_update ¶
multistep_uni_p_bh_update(
model_output: Tensor,
*args,
sample: Tensor | None = None,
order: int | None = None,
**kwargs,
) -> Tensor
set_timesteps ¶
set_timesteps(
num_inference_steps: int | None = None,
device: str | device = None,
sigmas: list[float] | None = None,
mu: float | None = None,
shift: float | None = None,
)
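The signature above does not say how `shift` affects the sigma grid. As a rough illustration only, the sketch below applies the time-shift rule commonly used by flow-matching schedulers, sigma' = shift * sigma / (1 + (shift - 1) * sigma), to a uniform grid. The function name `shifted_sigmas` is hypothetical, and whether `FlowUniPCMultistepScheduler.set_timesteps` uses exactly this rule is an assumption, not something this page confirms.

```python
def shifted_sigmas(num_inference_steps: int, shift: float = 1.0) -> list[float]:
    """Return a descending sigma grid with a flow-matching style time shift.

    Hypothetical helper for illustration; not part of vllm_omni.
    """
    # Uniform grid from 1.0 down to 1/num_inference_steps (no trailing zero).
    base = [1.0 - i / num_inference_steps for i in range(num_inference_steps)]
    # Assumed shift rule: shift > 1 keeps more steps near the noisy end,
    # shift == 1 leaves the grid unchanged.
    return [shift * s / (1.0 + (shift - 1.0) * s) for s in base]
```

With `shift=1.0` the grid is returned unchanged; larger values move the intermediate sigmas closer to 1.0.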
step ¶
step(
model_output: Tensor,
timestep: int | Tensor,
sample: Tensor,
return_dict: bool = True,
generator=None,
) -> SchedulerOutput | tuple
step_ddim ¶
step_ddim(
velocity: FloatTensor,
t: int,
curr_state: FloatTensor,
prev_state: FloatTensor | None = None,
generator: Generator | None = None,
)
MagiHumanPipeline ¶
Bases: Module, ProgressBarMixin, DiffusionPipelineProfilerMixin
audio_txt_guidance_scale instance-attribute ¶
audio_vae instance-attribute ¶
audio_vae = SAAudioFeatureExtractor(
device=device,
model_path=_resolve_subdir(
model_path,
"audio_vae",
local_files_only,
required_files=[
"config.json",
"model_config.json",
"model.safetensors",
],
),
)
data_proxy instance-attribute ¶
data_proxy = MagiDataProxy(
patch_size=get("patch_size", 2),
t_patch_size=get("t_patch_size", 1),
frame_receptive_field=get("frame_receptive_field", 11),
spatial_rope_interpolation=get(
"spatial_rope_interpolation", "extra"
),
ref_audio_offset=get("ref_audio_offset", 1000),
text_offset=get("text_offset", 0),
coords_style=get("coords_style", "v2"),
)
num_inference_steps_default instance-attribute ¶
sr_data_proxy instance-attribute ¶
sr_data_proxy = MagiDataProxy(
patch_size=get("patch_size", 2),
t_patch_size=get("t_patch_size", 1),
frame_receptive_field=get("frame_receptive_field", 11),
spatial_rope_interpolation=get(
"spatial_rope_interpolation", "extra"
),
ref_audio_offset=get("ref_audio_offset", 1000),
text_offset=get("text_offset", 0),
coords_style="v1",
)
sr_num_inference_steps_default instance-attribute ¶
sr_video_txt_guidance_scale instance-attribute ¶
t5_gemma_target_length instance-attribute ¶
text_encoder instance-attribute ¶
text_encoder = _T5GemmaEncoder(
model_path=txt_enc_path,
device=device,
weight_dtype=dtype,
subfolder=txt_enc_subfolder,
)
vae_latent_mean instance-attribute ¶
vae_latent_std instance-attribute ¶
video_txt_guidance_scale instance-attribute ¶
weights_sources instance-attribute ¶
weights_sources = [
ComponentSource(
model_or_path=model_path,
subfolder=dit_subfolder,
revision=None,
prefix="dit.",
fall_back_to_pt=True,
),
ComponentSource(
model_or_path=model_path,
subfolder=sr_dit_subfolder,
revision=None,
prefix="sr_dit.",
fall_back_to_pt=True,
),
]
zerosnr_sigmas instance-attribute ¶
zerosnr_sigmas = ZeroSNRDDPMDiscretization()(
1000, do_append_zero=False, flip=True
)
encode_prompt ¶
Encode prompt with the T5-Gemma text encoder and pad to fixed length.
This is the single text-encoder entrypoint so the runner-level prompt-embedding cache (see vllm_omni/diffusion/cache/prompt_embed_cache.py) can transparently memoize results when the same prompt is submitted repeatedly.
Returns:
| Type | Description |
|---|---|
| Tensor | Encoded prompt embedding, padded to the fixed length. |
| int | |
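The pad-then-memoize idea described for `encode_prompt` can be sketched in plain Python. Everything here is a stand-in: `fake_text_encoder`, `encode_prompt_cached`, and `TARGET_LENGTH` are hypothetical names, `functools.lru_cache` substitutes for the runner-level prompt-embedding cache, and returning the unpadded length as the second value is only a guess at what the `int` return represents.

```python
from functools import lru_cache

TARGET_LENGTH = 8  # hypothetical stand-in for t5_gemma_target_length


def fake_text_encoder(prompt: str) -> list[list[float]]:
    """Placeholder encoder: one 2-dim 'embedding' per whitespace token."""
    return [[float(len(tok)), 1.0] for tok in prompt.split()]


@lru_cache(maxsize=128)
def encode_prompt_cached(prompt: str) -> tuple[tuple[tuple[float, ...], ...], int]:
    """Encode, truncate or zero-pad to TARGET_LENGTH, and memoize by prompt."""
    tokens = fake_text_encoder(prompt)[:TARGET_LENGTH]
    original_len = len(tokens)  # assumed meaning of the int return value
    pad = [[0.0, 0.0]] * (TARGET_LENGTH - original_len)
    # Tuples make the cached result immutable, so callers cannot corrupt it.
    padded = tuple(tuple(row) for row in tokens + pad)
    return padded, original_len
```

Keeping a single entrypoint matters for this pattern: if several code paths called the encoder directly, only some of them would benefit from the cache.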
SAAudioFeatureExtractor ¶
ZeroSNRDDPMDiscretization ¶
ZeroSNR DDPM sigma schedule, ported from daVinci-MagiHuman. Used to compute sigma values for SR noise injection.
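For readers unfamiliar with zero-terminal-SNR schedules, the following is a minimal sketch of the general technique, not the ported implementation. It assumes a plain linear beta schedule with the classic DDPM endpoints (1e-4 to 2e-2) and rescales sqrt(alpha_bar) so that the final step carries no signal. The function name and defaults are hypothetical, and the `flip` and `do_append_zero` arguments seen in the `zerosnr_sigmas` call are not modeled.

```python
import math


def zero_terminal_snr_sqrt_alphas(
    num_steps: int = 1000,
    beta_start: float = 1e-4,  # assumed DDPM default, not taken from this page
    beta_end: float = 2e-2,    # assumed DDPM default, not taken from this page
) -> list[float]:
    """Return sqrt(alpha_bar_t) rescaled so the last value is exactly zero."""
    # Linear beta schedule (requires num_steps >= 2).
    betas = [
        beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        for i in range(num_steps)
    ]
    # Cumulative product of (1 - beta), stored as its square root.
    sqrt_abar = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        sqrt_abar.append(math.sqrt(running))
    # Shift so the terminal value becomes 0, then scale so the first value
    # keeps (up to rounding) its original magnitude.
    first, last = sqrt_abar[0], sqrt_abar[-1]
    scale = first / (first - last)
    return [(value - last) * scale for value in sqrt_abar]
```

The returned list decreases monotonically from roughly `sqrt(1 - beta_start)` to `0.0`, which is the property that gives the schedule its name.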