
vllm_omni.diffusion.models.dreamzero.transform.base

Base transform interface for DreamZero robot policy serving.

Transforms handle dataset-specific concerns ONLY:
  • Observation key mapping
  • Multi-view stitching (embodiment-specific layout)
  • Language template wrapping (embodiment-specific)
  • Raw state extraction (dataset-specific keys)
  • Output action slicing (to actual action_dim)
Model-specific concerns belong in the pipeline:
  • Tokenization (pipeline owns tokenizer)
  • State padding (pipeline knows MAX_STATE_DIM)
  • Negative prompt (pipeline owns the string)
  • Noise generation, encoding, decoding
Flow

raw obs (dataset format) → DreamZeroPipeline selects transform by embodiment → unified dict (stitched video, templated prompt str, raw state) → tokenize, pad, encode, denoise → transform_action_output() → ndarray (N, action_dim)
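
The "pad" step in the flow belongs to the pipeline, not the transform. A minimal sketch of what it might look like, assuming zero-padding on the last axis: the value `64` for `MAX_STATE_DIM` and the helper name `pad_state` are hypothetical and not taken from the DreamZero source.

```python
import numpy as np

MAX_STATE_DIM = 64  # hypothetical value; the real constant is owned by the pipeline


def pad_state(raw_state: np.ndarray, max_dim: int = MAX_STATE_DIM) -> np.ndarray:
    """Zero-pad the last axis of raw_state up to max_dim (assumed padding scheme)."""
    state_dim = raw_state.shape[-1]
    if state_dim > max_dim:
        raise ValueError(f"state dim {state_dim} exceeds max_dim {max_dim}")
    # Pad only the trailing axis; leave any leading axes untouched.
    pad_width = [(0, 0)] * (raw_state.ndim - 1) + [(0, max_dim - state_dim)]
    return np.pad(raw_state, pad_width)


# A transform returns the raw 8-dim state; the pipeline widens it.
padded = pad_state(np.ones((1, 8), dtype=np.float32))
```

Keeping this step in the pipeline means a transform only needs to know its own state layout, never the model's fixed input width.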

TRANSFORMS module-attribute

TRANSFORMS: dict[str, RobotPolicyTransform] = {}

RobotPolicyTransform

Base class for dataset-specific observation transforms.

Subclasses MUST define

  • IMAGE_KEY_MAP (dict): dataset obs keys → unified keys
  • EMBODIMENT_NAME (str): embodiment identity (pipeline maps to numeric ID)
  • ACTION_DIM (int): actual action dimensions (for output slicing)

Subclasses MUST override

  • _stitch_views(): multi-view → single stitched image
  • _language_template(): prompt → embodiment-aware template
  • _extract_raw_state(): obs → raw state ndarray
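
The subclass contract could be sketched as follows. `SketchTransform` is a hypothetical stand-in for `RobotPolicyTransform`, and the hook signatures, the `TwoCameraArm` embodiment, its key names, and its template string are all assumptions for illustration; only the attribute and method names come from this page.

```python
import numpy as np


class SketchTransform:
    """Hypothetical stand-in for RobotPolicyTransform, not the real class."""

    IMAGE_KEY_MAP: dict[str, str]
    EMBODIMENT_NAME: str
    ACTION_DIM: int

    def _stitch_views(self, views: dict[str, np.ndarray]) -> np.ndarray:
        raise NotImplementedError

    def _language_template(self, prompt: str) -> str:
        raise NotImplementedError

    def _extract_raw_state(self, obs: dict) -> np.ndarray:
        raise NotImplementedError


class TwoCameraArm(SketchTransform):
    """Made-up embodiment with two cameras and a 7-dim action space."""

    IMAGE_KEY_MAP = {"cam_left": "left", "cam_right": "right"}
    EMBODIMENT_NAME = "two_camera_arm"
    ACTION_DIM = 7

    def _stitch_views(self, views):
        # Side-by-side layout: concatenate (H, W, C) frames along the width axis.
        return np.concatenate([views["left"], views["right"]], axis=1)

    def _language_template(self, prompt):
        return f"A robot arm performs the task: {prompt}"

    def _extract_raw_state(self, obs):
        # "joint_pos" is an assumed dataset key.
        return np.asarray(obs["joint_pos"], dtype=np.float32)


t = TwoCameraArm()
frame = np.zeros((4, 6, 3), dtype=np.uint8)
stitched = t._stitch_views({"left": frame, "right": frame})
```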

ACTION_DIM instance-attribute

ACTION_DIM: int

EMBODIMENT_NAME instance-attribute

EMBODIMENT_NAME: str

IMAGE_KEY_MAP instance-attribute

IMAGE_KEY_MAP: dict[str, str]

transform_action_output

transform_action_output(actions: Any) -> ndarray

Adapt the model's action output to this transform's action dimensions by slicing it to ACTION_DIM; the result is an ndarray of shape (N, action_dim).
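
A minimal sketch of the slicing step, assuming the model emits a wider action vector whose leading `action_dim` entries are the real actions. The function name `slice_actions` and the choice to flatten leading axes into N are assumptions, not the library's implementation.

```python
import numpy as np


def slice_actions(actions, action_dim: int) -> np.ndarray:
    """Return an (N, action_dim) array from a wider model output (assumed layout)."""
    arr = np.asarray(actions, dtype=np.float32)
    if arr.shape[-1] < action_dim:
        raise ValueError(
            f"model output has {arr.shape[-1]} dims, need at least {action_dim}"
        )
    # Collapse any leading axes into N, then keep the first action_dim columns.
    return arr.reshape(-1, arr.shape[-1])[:, :action_dim]


# Two timesteps of a 6-wide output, sliced down to a 4-dim action space.
sliced = slice_actions(np.arange(12).reshape(2, 6), action_dim=4)
```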

transform_input

transform_input(obs: dict) -> dict

Apply the dataset-specific steps in order: map observation keys, stitch views, wrap the prompt in the language template, and extract the raw state.
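
The ordering of those four steps can be illustrated with plain callables. This is a sketch only: the function name `transform_input_sketch`, the output keys `"video"`, `"prompt"`, and `"state"`, and the `obs["prompt"]` input key are hypothetical, since this page does not list the unified dict's actual keys.

```python
from typing import Any, Callable


def transform_input_sketch(
    obs: dict[str, Any],
    image_key_map: dict[str, str],
    stitch: Callable[[dict[str, Any]], Any],
    template: Callable[[str], str],
    extract_state: Callable[[dict[str, Any]], Any],
) -> dict[str, Any]:
    # 1. Key map: rename dataset image keys to unified view names.
    views = {unified: obs[raw] for raw, unified in image_key_map.items()}
    return {
        "video": stitch(views),               # 2. Stitch views into one image.
        "prompt": template(obs["prompt"]),    # 3. Wrap the prompt in the template.
        "state": extract_state(obs),          # 4. Pull out the raw, unpadded state.
    }


# Strings stand in for image frames to keep the example dependency-free.
unified = transform_input_sketch(
    obs={"cam_left": "L", "cam_right": "R", "prompt": "open the drawer", "joint_pos": [0.1, 0.2]},
    image_key_map={"cam_left": "left", "cam_right": "right"},
    stitch=lambda v: v["left"] + v["right"],
    template=lambda p: f"Task: {p}",
    extract_state=lambda o: o["joint_pos"],
)
```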

get_transform

get_transform(name: str) -> RobotPolicyTransform

register_transform

register_transform(
    name: str, transform: RobotPolicyTransform
) -> None
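
The registry pattern behind `TRANSFORMS`, `register_transform`, and `get_transform` might look like the sketch below. Only the names and signatures come from this page; the bodies are assumptions, including silently overwriting an existing name and raising `KeyError` on an unknown one.

```python
from typing import Any

# Stand-in for the module-level registry; values would be RobotPolicyTransform instances.
TRANSFORMS: dict[str, Any] = {}


def register_transform(name: str, transform: Any) -> None:
    """Store a transform under an embodiment name (assumed: last write wins)."""
    TRANSFORMS[name] = transform


def get_transform(name: str) -> Any:
    """Look up a transform by name (assumed: unknown names raise KeyError)."""
    try:
        return TRANSFORMS[name]
    except KeyError:
        known = ", ".join(sorted(TRANSFORMS)) or "<none>"
        raise KeyError(f"unknown transform {name!r}; registered: {known}") from None


demo = object()  # placeholder for a real transform instance
register_transform("two_camera_arm", demo)
found = get_transform("two_camera_arm")
```

Listing the registered names in the error message makes a misspelled embodiment name easy to spot at serving time.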