vllm_omni.diffusion.models.dreamzero.pipeline_dreamzero ¶
DreamZero pipeline for vllm-omni.
Entry point: DiffusionEngine.step() calls pipeline.forward(req).
DreamZeroPipeline ¶
Bases: Module, CFGParallelMixin
DreamZero world model pipeline.
- Multi-output: predict_noise() returns (video_pred, action_pred).
- CFG: the video prediction gets standard CFG; the action prediction uses the positive branch only.
- State: DreamZeroState manages the KV cache and frame buffer across forward() calls.
decouple_inference_noise instance-attribute ¶
decouple_inference_noise: bool = ah_config[
"decouple_inference_noise"
]
default_robot_embodiment instance-attribute ¶
default_robot_embodiment = get(
"default_robot_embodiment", DEFAULT_EMBODIMENT
)
embodiment_name_to_id instance-attribute ¶
embodiment_name_to_id: dict[str, int] = get(
"embodiment_name_to_id", DEFAULT_EMBODIMENT_NAME_TO_ID
)
negative_prompt instance-attribute ¶
negative_prompt: str = get(
"negative_prompt", DEFAULT_NEGATIVE_PROMPT
)
num_frame_per_block instance-attribute ¶
num_frame_per_block: int = ah_config['num_frame_per_block']
num_inference_steps instance-attribute ¶
num_inference_steps: int = get(
"num_inference_steps", DEFAULT_NUM_INFERENCE_STEPS
)
scheduler instance-attribute ¶
scheduler = FlowUniPCMultistepScheduler(
num_train_timesteps=1000,
shift=1,
use_dynamic_shifting=False,
)
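The scheduler is built with shift=1 and use_dynamic_shifting=False. As a rough illustration of what that implies, the sketch below applies the static time-shift formula commonly used by flow-matching schedulers. It is an assumption that FlowUniPCMultistepScheduler uses this same formula; shift_sigma is a hypothetical helper, not part of the library.

```python
def shift_sigma(sigma: float, shift: float) -> float:
    """Static time shift often used by flow-matching schedulers (assumed formula)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


# Illustrative noise levels, from pure noise (1.0) toward clean data.
sigmas = [1.0, 0.75, 0.5, 0.25]

# With shift=1 the formula reduces to the identity, so the schedule is unchanged.
unshifted = [shift_sigma(s, 1.0) for s in sigmas]

# A larger shift (5.0 is an arbitrary comparison value) pushes sigmas toward 1.0,
# spending more steps at high noise.
shifted = [shift_sigma(s, 5.0) for s in sigmas]

print(unshifted)
print(shifted)
```

Under this assumption, shift=1 means the sigma schedule is used as-is, with no extra weighting toward high-noise steps.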
video_inference_final_noise instance-attribute ¶
video_inference_final_noise: float = ah_config[
"video_inference_final_noise"
]
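The attributes above use two lookup styles: some index the config directly (ah_config[...]) while others go through get(...) with a default. The sketch below shows the practical difference, assuming the config behaves like a plain dict. All values, and the placeholder default, are hypothetical and not taken from the real DreamZero configuration.

```python
# Hypothetical placeholder; the real DEFAULT_NUM_INFERENCE_STEPS value is not shown here.
DEFAULT_NUM_INFERENCE_STEPS = 16

# Hypothetical config contents, for illustration only.
ah_config = {
    "num_frame_per_block": 2,
    "decouple_inference_noise": False,
}

# Direct indexing: the key is required, so a missing key fails loudly.
num_frame_per_block = ah_config["num_frame_per_block"]

# get() with a default: the key is optional and falls back silently.
num_inference_steps = ah_config.get(
    "num_inference_steps", DEFAULT_NUM_INFERENCE_STEPS
)

# A required key that is absent raises KeyError at construction time.
try:
    ah_config["video_inference_final_noise"]
    missing_required = False
except KeyError:
    missing_required = True

print(num_frame_per_block, num_inference_steps, missing_required)
```

If the config does behave this way, keys such as num_frame_per_block and video_inference_final_noise must be present in the checkpoint config, while negative_prompt and num_inference_steps may be omitted.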
combine_cfg_noise ¶
combine_cfg_noise(
positive_noise_pred: Tensor | tuple[Tensor, ...],
negative_noise_pred: Tensor | tuple[Tensor, ...],
true_cfg_scale: float,
cfg_normalize: bool = False,
) -> Tensor | tuple[Tensor, ...]
Video predictions get standard CFG. Action predictions use the positive (conditional) branch only, with no unconditional blending.
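A minimal sketch of this split behaviour follows. It assumes the common CFG formulation (unconditional prediction plus a scaled difference) and leaves out cfg_normalize; combine_video_action_cfg is a hypothetical stand-in, not the actual method. The function uses only arithmetic operators, so plain floats stand in for tensors here.

```python
def combine_video_action_cfg(positive, negative, scale):
    """Apply CFG to the video prediction only; pass the action through.

    positive and negative are (video_pred, action_pred) pairs.
    """
    pos_video, pos_action = positive
    neg_video, _neg_action = negative  # the negative action branch is ignored

    # Assumed standard CFG: start at the unconditional prediction and
    # move toward the conditional one by `scale`.
    video = neg_video + scale * (pos_video - neg_video)

    # Action: conditional (positive) branch only, no blending.
    return video, pos_action


video, action = combine_video_action_cfg((1.0, 0.5), (0.0, -0.5), scale=5.0)
print(video, action)
```

With scale=1.0 this formulation returns the positive video prediction unchanged, which is a quick sanity check for any CFG implementation.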
decode_video_latents ¶
Decode normalized VAE latents into RGB video tensors.
diffuse ¶
diffuse(
video_latents: Tensor,
action_latents: Tensor,
timesteps_video: Tensor,
timesteps_action: Tensor,
prompt_embeds: Tensor,
negative_prompt_embeds: Tensor | None,
video_action_scheduler: VideoActionScheduler,
do_true_cfg: bool,
state: DreamZeroState,
**kwargs,
) -> tuple[Tensor, Tensor]
Denoising loop with CFG parallel support.
For each timestep:
- Build positive_kwargs / negative_kwargs
- predict_noise_maybe_with_cfg() → (video_pred, action_pred)
- scheduler_step_maybe_with_cfg() → VideoActionScheduler
- _synchronize_cfg_parallel_step_output()
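The per-timestep steps above can be sketched as a single-process toy loop. Everything here is an assumption made for illustration: toy_predict_noise is a hypothetical stand-in for the model, a plain Euler update replaces the real VideoActionScheduler, video and action share one sigma schedule, and the CFG-parallel synchronization step is omitted because nothing is distributed.

```python
def toy_predict_noise(video, action, sigma, conditional):
    """Hypothetical model stand-in returning (video_pred, action_pred) velocities."""
    video_target = 1.0 if conditional else 0.0
    action_target = 0.3 if conditional else 0.0
    return (video - video_target) / sigma, (action - action_target) / sigma


def toy_diffuse(video, action, sigmas, cfg_scale, do_true_cfg):
    """Toy denoising loop: CFG on video, positive branch only for action."""
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        # Positive (conditional) branch is always evaluated.
        pos_video, pos_action = toy_predict_noise(video, action, sigma, True)
        if do_true_cfg:
            # Negative (unconditional) branch only matters for the video.
            neg_video, _ = toy_predict_noise(video, action, sigma, False)
            video_pred = neg_video + cfg_scale * (pos_video - neg_video)
        else:
            video_pred = pos_video
        action_pred = pos_action

        # Euler step standing in for the scheduler update.
        dt = sigma_next - sigma
        video = video + dt * video_pred
        action = action + dt * action_pred
    return video, action


sigmas = [1.0, 0.75, 0.5, 0.25, 0.0]
final_video, final_action = toy_diffuse(
    0.8, -0.4, sigmas, cfg_scale=2.0, do_true_cfg=True
)
print(final_video, final_action)
```

In this toy setup the action converges to the conditional target regardless of cfg_scale, while the video result is extrapolated past its conditional target by the guidance scale.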
forward ¶
forward(
req: OmniDiffusionRequest, **kwargs
) -> DiffusionOutput
Full inference step. Called by DiffusionEngine.step().
load_weights ¶
Load checkpoint weights with key remapping.