vllm_omni.diffusion.models.dreamzero.transform.base ¶
Base transform interface for DreamZero robot policy serving.
Transforms handle dataset-specific concerns ONLY:
- Observation key mapping
- Multi-view stitching (embodiment-specific layout)
- Language template wrapping (embodiment-specific)
- Raw state extraction (dataset-specific keys)
- Output action slicing (to actual action_dim)
Model-specific concerns belong in the pipeline:
- Tokenization (pipeline owns tokenizer)
- State padding (pipeline knows MAX_STATE_DIM)
- Negative prompt (pipeline owns the string)
- Noise generation, encoding, decoding
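As an illustration of why state padding sits on the pipeline side, the following is a minimal sketch of padding a raw state vector up to a fixed width. The value of `MAX_STATE_DIM`, the helper name `pad_state`, and the choice of zero padding are all assumptions made for this example; the source only states that the pipeline knows `MAX_STATE_DIM`.

```python
import numpy as np

# Hypothetical value; the real MAX_STATE_DIM lives in the pipeline.
MAX_STATE_DIM = 64


def pad_state(raw_state, max_dim: int = MAX_STATE_DIM) -> np.ndarray:
    """Right-pad a 1-D raw state with zeros up to max_dim (assumed scheme)."""
    state = np.asarray(raw_state, dtype=np.float32)
    if state.ndim != 1:
        raise ValueError(f"expected a 1-D state, got shape {state.shape}")
    if state.shape[0] > max_dim:
        raise ValueError(f"state has {state.shape[0]} dims, max is {max_dim}")
    padded = np.zeros(max_dim, dtype=np.float32)
    padded[: state.shape[0]] = state
    return padded


# The transform hands over the raw state; the pipeline pads it.
raw = [0.5, -0.25, 1.0]
padded = pad_state(raw)
```

Keeping this step out of the transform means a transform never needs to know the model's internal state width.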
Flow:
raw obs (dataset format)
→ DreamZeroPipeline selects transform by embodiment
→ unified dict (stitched video, templated prompt str, raw state)
→ tokenize, pad, encode, denoise
→ transform_action_output() → ndarray (N, action_dim)
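The "selects transform by embodiment" step in this flow can be pictured as a lookup from an embodiment name to a transform. The sketch below is not the DreamZeroPipeline code: the registry `TRANSFORMS`, the `register` decorator, `select_transform`, the embodiment name, and all observation keys are hypothetical names chosen for illustration.

```python
from typing import Callable, Dict

# Hypothetical registry: embodiment name -> function producing the unified dict.
TRANSFORMS: Dict[str, Callable[[dict], dict]] = {}


def register(name: str):
    """Register a transform function under an embodiment name."""
    def decorator(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        TRANSFORMS[name] = fn
        return fn
    return decorator


@register("example_arm")
def example_arm_transform(obs: dict) -> dict:
    # Map dataset-specific keys to the unified dict described in the flow.
    return {
        "video": obs["cam_front"],
        "prompt": f"Task: {obs['instruction']}",
        "state": obs["joints"],
    }


def select_transform(embodiment: str) -> Callable[[dict], dict]:
    """Look up the transform for an embodiment, failing loudly if unknown."""
    try:
        return TRANSFORMS[embodiment]
    except KeyError:
        raise ValueError(f"no transform registered for {embodiment!r}") from None


raw_obs = {
    "cam_front": "frame-placeholder",
    "instruction": "open the drawer",
    "joints": [0.0, 0.1],
}
unified = select_transform("example_arm")(raw_obs)
```

Everything after the unified dict (tokenizing, padding, encoding, denoising) would stay in the pipeline and is not shown here.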
RobotPolicyTransform ¶
Base class for dataset-specific observation transforms.
Subclasses MUST define:
- IMAGE_KEY_MAP (dict): dataset obs keys → unified keys
- EMBODIMENT_NAME (str): embodiment identity (pipeline maps to numeric ID)
- ACTION_DIM (int): actual action dimensions (for output slicing)
Subclasses MUST override:
- _stitch_views(): multi-view → single stitched image
- _language_template(): prompt → embodiment-aware template
- _extract_raw_state(): obs → raw state ndarray
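To make the subclass contract concrete, here is a minimal sketch of what a subclass might look like. It uses a stand-in base class rather than importing the real `RobotPolicyTransform`, because the actual method signatures are not shown on this page; the argument types, the side-by-side stitching layout, the prompt wording, and every dataset key below are assumptions.

```python
import numpy as np


class RobotPolicyTransformSketch:
    """Stand-in for the real base class; signatures here are assumed."""

    IMAGE_KEY_MAP: dict = {}
    EMBODIMENT_NAME: str = ""
    ACTION_DIM: int = 0

    def _stitch_views(self, views: dict) -> np.ndarray:
        raise NotImplementedError

    def _language_template(self, prompt: str) -> str:
        raise NotImplementedError

    def _extract_raw_state(self, obs: dict) -> np.ndarray:
        raise NotImplementedError


class TwoCameraTransform(RobotPolicyTransformSketch):
    # Hypothetical dataset keys and embodiment, for illustration only.
    IMAGE_KEY_MAP = {"cam_left": "view_0", "cam_right": "view_1"}
    EMBODIMENT_NAME = "example_arm"
    ACTION_DIM = 7

    def _stitch_views(self, views: dict) -> np.ndarray:
        # Place the two (H, W, C) views side by side along the width axis.
        return np.concatenate([views["view_0"], views["view_1"]], axis=1)

    def _language_template(self, prompt: str) -> str:
        return f"A robot arm performs the task: {prompt}"

    def _extract_raw_state(self, obs: dict) -> np.ndarray:
        return np.asarray(obs["joint_positions"], dtype=np.float32)


transform = TwoCameraTransform()
obs = {
    "cam_left": np.zeros((4, 6, 3), dtype=np.uint8),
    "cam_right": np.ones((4, 6, 3), dtype=np.uint8),
    "joint_positions": [0.1] * 7,
}
# Rename dataset keys to unified keys before stitching.
views = {unified: obs[raw] for raw, unified in transform.IMAGE_KEY_MAP.items()}
stitched = transform._stitch_views(views)  # shape (4, 12, 3)
prompt = transform._language_template("pick up the cup")
state = transform._extract_raw_state(obs)  # shape (7,)
```

Note that the sketch returns the raw, unpadded state and a plain prompt string; padding and tokenization are left to the pipeline.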
transform_action_output ¶
Adapt model action output to this transform's action dimensions.
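One plausible reading of "adapt to this transform's action dimensions" is slicing a padded model output down to `ACTION_DIM` columns, giving the `(N, action_dim)` array mentioned in the flow. The sketch below assumes the real action values occupy the leading columns of the model output; the helper name `slice_action_output` and both dimension values are hypothetical.

```python
import numpy as np

ACTION_DIM = 7          # hypothetical embodiment action size
MODEL_ACTION_DIM = 32   # hypothetical padded width produced by the model


def slice_action_output(actions: np.ndarray, action_dim: int) -> np.ndarray:
    """Keep the first action_dim columns of an (N, padded_dim) array."""
    if actions.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {actions.shape}")
    if actions.shape[1] < action_dim:
        raise ValueError(
            f"model output has {actions.shape[1]} dims, need {action_dim}"
        )
    return actions[:, :action_dim]


# Four action steps, each padded to the model's width.
padded = np.arange(4 * MODEL_ACTION_DIM, dtype=np.float32).reshape(
    4, MODEL_ACTION_DIM
)
sliced = slice_action_output(padded, ACTION_DIM)  # shape (4, 7)
```

The slice is a NumPy view on the original array; call `.copy()` on the result if the caller needs to mutate it independently.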