vllm_omni.diffusion.models.dreamzero.pipeline_dreamzero

DreamZero pipeline for vllm-omni.

Entry point: DiffusionEngine.step() → pipeline.forward(req)

MAX_DREAMZERO_SESSIONS module-attribute

MAX_DREAMZERO_SESSIONS = 64

logger module-attribute

logger = getLogger(__name__)

DreamZeroPipeline

Bases: Module, CFGParallelMixin

DreamZero world model pipeline.

Multi-output: predict_noise() returns (video_pred, action_pred).

CFG: video gets standard CFG; action takes the positive branch only.

State: DreamZeroState manages the KV cache and frame buffer across forward() calls.

action_horizon instance-attribute

action_horizon: int = ah_config['action_horizon']

action_norm_stats instance-attribute

action_norm_stats = _parse_action_norm_stats(metadata)

cfg_scale instance-attribute

cfg_scale: float = get('cfg_scale', DEFAULT_CFG_SCALE)

decouple_inference_noise instance-attribute

decouple_inference_noise: bool = ah_config[
    "decouple_inference_noise"
]

default_robot_embodiment instance-attribute

default_robot_embodiment = get(
    "default_robot_embodiment", DEFAULT_EMBODIMENT
)

embodiment_name_to_id instance-attribute

embodiment_name_to_id: dict[str, int] = get(
    "embodiment_name_to_id", DEFAULT_EMBODIMENT_NAME_TO_ID
)

image_encoder instance-attribute

image_encoder = DreamZeroImageEncoder()

max_action_dim instance-attribute

max_action_dim: int = ah_config['max_action_dim']

max_state_dim instance-attribute

max_state_dim: int = ah_config['max_state_dim']

negative_prompt instance-attribute

negative_prompt: str = get(
    "negative_prompt", DEFAULT_NEGATIVE_PROMPT
)

num_frame_per_block instance-attribute

num_frame_per_block: int = ah_config['num_frame_per_block']

num_frames instance-attribute

num_frames: int = ah_config['num_frames']

num_inference_steps instance-attribute

num_inference_steps: int = get(
    "num_inference_steps", DEFAULT_NUM_INFERENCE_STEPS
)

od_config instance-attribute

od_config = od_config

relative_action instance-attribute

relative_action: bool = get('relative_action', True)

relative_action_dim instance-attribute

relative_action_dim: int = get('relative_action_dim', 7)

scheduler instance-attribute

scheduler = FlowUniPCMultistepScheduler(
    num_train_timesteps=1000,
    shift=1,
    use_dynamic_shifting=False,
)

seed instance-attribute

seed: int = get('seed', DEFAULT_SEED)

sigma_shift instance-attribute

sigma_shift: float = get("sigma_shift", DEFAULT_SIGMA_SHIFT)

state instance-attribute

state = _get_or_create_state('default')
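The state attribute is obtained through a get-or-create lookup, and the module caps sessions at MAX_DREAMZERO_SESSIONS = 64. The listing does not show how the cap is enforced, so the following is only a minimal sketch of one plausible approach. SessionState, SessionRegistry, and the least-recently-used eviction policy are all hypothetical names and choices, not the module's actual implementation.

```python
from collections import OrderedDict

MAX_SESSIONS = 64  # mirrors MAX_DREAMZERO_SESSIONS from the listing


class SessionState:
    """Hypothetical stand-in for DreamZeroState (KV cache + frame buffer)."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.kv_cache = []
        self.frame_buffer = []


class SessionRegistry:
    """Get-or-create lookup with a hard cap on live sessions."""

    def __init__(self, max_sessions=MAX_SESSIONS):
        self.max_sessions = max_sessions
        self._states = OrderedDict()

    def get_or_create(self, session_id):
        if session_id in self._states:
            # Mark the session as most recently used.
            self._states.move_to_end(session_id)
            return self._states[session_id]
        if len(self._states) >= self.max_sessions:
            # Assumption: evict the least recently used session.
            self._states.popitem(last=False)
        state = SessionState(session_id)
        self._states[session_id] = state
        return state

    def __len__(self):
        return len(self._states)
```

Reusing the same session id returns the same object, which is what lets a KV cache and frame buffer persist across successive forward() calls.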

state_norm_stats instance-attribute

state_norm_stats = _parse_state_norm_stats(metadata)

text_encoder instance-attribute

text_encoder = UMT5EncoderModel(umt5_config)

tokenizer instance-attribute

tokenizer = from_pretrained(tokenizer_source)

transformer instance-attribute

transformer = CausalWanModel(**transformer_kwargs)

vae instance-attribute

vae = from_pretrained(vae_source, torch_dtype=float32)

video_inference_final_noise instance-attribute

video_inference_final_noise: float = ah_config[
    "video_inference_final_noise"
]

weights_sources property

weights_sources

ComponentSource list for DiffusersPipelineLoader.

combine_cfg_noise

combine_cfg_noise(
    positive_noise_pred: Tensor | tuple[Tensor, ...],
    negative_noise_pred: Tensor | tuple[Tensor, ...],
    true_cfg_scale: float,
    cfg_normalize: bool = False,
) -> Tensor | tuple[Tensor, ...]

Video: standard CFG between the positive and negative predictions. Action: the positive (conditional) prediction is returned as-is, with no unconditional blending.
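The asymmetric treatment of the two outputs can be sketched as follows. This is an illustration only: it assumes "standard CFG" means the usual neg + scale * (pos - neg) blend, it ignores cfg_normalize, and combine_cfg_noise_sketch is a hypothetical name. The arithmetic works on any type that supports +, - and *, so plain floats stand in for tensors here.

```python
def combine_cfg_noise_sketch(positive, negative, true_cfg_scale):
    """Blend (video, action) predictions from the two CFG branches.

    Assumes each argument is a (video_pred, action_pred) pair.
    """
    video_pos, action_pos = positive
    video_neg, _action_neg = negative  # negative action branch is discarded

    # Video: classifier-free guidance between the two branches.
    video = video_neg + true_cfg_scale * (video_pos - video_neg)

    # Action: positive branch only, no blending.
    return video, action_pos
```

With true_cfg_scale = 1.0 the video output reduces to the positive prediction, so both outputs then come from the conditional branch.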

decode_video_latents

decode_video_latents(video_latents: Tensor) -> Tensor

Decode normalized VAE latents into RGB video tensors.

diffuse

diffuse(
    video_latents: Tensor,
    action_latents: Tensor,
    timesteps_video: Tensor,
    timesteps_action: Tensor,
    prompt_embeds: Tensor,
    negative_prompt_embeds: Tensor | None,
    video_action_scheduler: VideoActionScheduler,
    do_true_cfg: bool,
    state: DreamZeroState,
    **kwargs,
) -> tuple[Tensor, Tensor]

Denoising loop with CFG parallel support.

For each timestep:
  1. Build positive_kwargs / negative_kwargs
  2. predict_noise_maybe_with_cfg() → (video_pred, action_pred)
  3. scheduler_step_maybe_with_cfg() → VideoActionScheduler
  4. _synchronize_cfg_parallel_step_output()
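The loop structure listed above can be sketched in a single process as follows. This is a simplified illustration, not the pipeline's code: predict and step are hypothetical callables standing in for the transformer and the VideoActionScheduler, the CFG blend is assumed to be neg + scale * (pos - neg), and step 4 (synchronizing outputs across CFG-parallel ranks) is omitted because there is only one rank here.

```python
def diffuse_sketch(video, action, timesteps_video, timesteps_action,
                   predict, step, cfg_scale, do_true_cfg):
    """Toy denoising loop over paired video/action timesteps."""
    for t_video, t_action in zip(timesteps_video, timesteps_action):
        # 1-2. Run the positive branch, and the negative branch if CFG is on.
        pos = predict(video, action, t_video, t_action, "positive")
        if do_true_cfg:
            neg = predict(video, action, t_video, t_action, "negative")
            # Video gets CFG; action keeps the positive prediction.
            video_pred = neg[0] + cfg_scale * (pos[0] - neg[0])
            noise_pred = (video_pred, pos[1])
        else:
            noise_pred = pos

        # 3. One scheduler step updates both latents together.
        video, action = step(noise_pred, (t_video, t_action), (video, action))
    return video, action


# Toy stand-ins: the "model" echoes its inputs on the positive branch and
# predicts zero on the negative branch; the "scheduler" removes half the noise.
def toy_predict(video, action, t_video, t_action, branch):
    return (video, action) if branch == "positive" else (0.0, 0.0)


def toy_step(noise_pred, t, latents):
    return (latents[0] - 0.5 * noise_pred[0], latents[1] - 0.5 * noise_pred[1])
```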

forward

forward(
    req: OmniDiffusionRequest, **kwargs
) -> DiffusionOutput

Full inference step. Called by DiffusionEngine.step().

load_weights

load_weights(
    weights: Iterable[tuple[str, Tensor]],
) -> set[str]

Load checkpoint weights with key remapping.
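As a rough illustration of what key remapping during loading can look like, the sketch below renames checkpoint keys by prefix and reports which parameters were filled. The prefix table, the helper names, and the use of a plain dict in place of module parameters are all assumptions made for the example; the actual remapping rules are not given in this listing.

```python
from typing import Iterable

# Hypothetical remap table: checkpoint prefix -> pipeline prefix.
KEY_PREFIX_MAP = {"model.diffusion_model.": "transformer."}


def remap_key(name: str) -> str:
    """Rewrite a checkpoint key using the first matching prefix rule."""
    for old, new in KEY_PREFIX_MAP.items():
        if name.startswith(old):
            return new + name[len(old):]
    return name


def load_weights_sketch(params: dict,
                        weights: Iterable[tuple[str, object]]) -> set[str]:
    """Copy remapped weights into params and return the names that loaded."""
    loaded = set()
    for name, value in weights:
        target = remap_key(name)
        if target not in params:
            continue  # skip checkpoint entries with no matching parameter
        params[target] = value
        loaded.add(target)
    return loaded
```

Returning the set of loaded names lets the caller check for parameters that the checkpoint never supplied.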

predict_noise

predict_noise(**kwargs) -> tuple[Tensor, Tensor]

Call CausalWanModel, return (video_pred, action_pred).

VideoActionScheduler

Wraps video + action schedulers into single .step() interface.

action_scheduler instance-attribute

action_scheduler = action_scheduler

video_scheduler instance-attribute

video_scheduler = video_scheduler

step

step(
    noise_pred,
    t,
    latents,
    return_dict=False,
    generator=None,
)
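A wrapper with this step() signature could be sketched as below. The listing does not say how noise_pred, t, and latents are packed, so this example assumes each is a (video, action) pair and that the result is returned as a one-element tuple, following the common return_dict=False convention. ToyScheduler and VideoActionSchedulerSketch are hypothetical names used only for illustration.

```python
class ToyScheduler:
    """Stand-in scheduler: removes a fixed fraction of the predicted noise."""

    def __init__(self, rate):
        self.rate = rate

    def step(self, model_output, timestep, sample,
             return_dict=False, generator=None):
        return (sample - self.rate * model_output,)


class VideoActionSchedulerSketch:
    """Route each half of a (video, action) pair to its own scheduler."""

    def __init__(self, video_scheduler, action_scheduler):
        self.video_scheduler = video_scheduler
        self.action_scheduler = action_scheduler

    def step(self, noise_pred, t, latents, return_dict=False, generator=None):
        # Assumption: all three arguments are (video, action) pairs.
        video_pred, action_pred = noise_pred
        t_video, t_action = t
        video_latents, action_latents = latents

        video_out = self.video_scheduler.step(
            video_pred, t_video, video_latents,
            return_dict=False, generator=generator)[0]
        action_out = self.action_scheduler.step(
            action_pred, t_action, action_latents,
            return_dict=False, generator=generator)[0]

        # Assumption: mirror the tuple-style return of return_dict=False.
        return ((video_out, action_out),)
```

Keeping two schedulers behind one step() call lets a denoising loop written for a single scheduler drive both modalities without special cases.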