vllm_omni.diffusion.models.dreamzero.action_encoder ¶
Action encoder/decoder for DreamZero.
CategorySpecificLinear ¶
CategorySpecificMLP ¶
Bases: Module
Two-layer MLP with category-specific weights: layer1, then ReLU, then layer2.
layer1 instance-attribute ¶
layer1 = CategorySpecificLinear(
num_categories, input_dim, hidden_dim
)
layer2 instance-attribute ¶
layer2 = CategorySpecificLinear(
num_categories, hidden_dim, output_dim
)
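The two-layer structure above can be sketched as follows. This is a minimal illustration, not the module's actual code: the page does not document the internals of CategorySpecificLinear, so the sketch assumes it stores one weight matrix and one bias per category and selects them with a per-sample category id. The class names `CategorySpecificLinearSketch` and `CategorySpecificMLPSketch`, the parameter names `W` and `b`, and the 0.02 initialization scale are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CategorySpecificLinearSketch(nn.Module):
    """Hypothetical stand-in: one weight matrix and bias per category."""

    def __init__(self, num_categories: int, input_dim: int, output_dim: int):
        super().__init__()
        # Assumed layout: weights stacked along a leading category axis.
        self.W = nn.Parameter(0.02 * torch.randn(num_categories, input_dim, output_dim))
        self.b = nn.Parameter(torch.zeros(num_categories, output_dim))

    def forward(self, x: torch.Tensor, cat_ids: torch.Tensor) -> torch.Tensor:
        # x: (B, T, input_dim); cat_ids: (B,) integer category per sample.
        W = self.W[cat_ids]               # (B, input_dim, output_dim)
        b = self.b[cat_ids].unsqueeze(1)  # (B, 1, output_dim)
        return torch.bmm(x, W) + b        # (B, T, output_dim)


class CategorySpecificMLPSketch(nn.Module):
    """layer1 -> ReLU -> layer2, mirroring the attributes listed above."""

    def __init__(self, num_categories: int, input_dim: int, hidden_dim: int, output_dim: int):
        super().__init__()
        self.layer1 = CategorySpecificLinearSketch(num_categories, input_dim, hidden_dim)
        self.layer2 = CategorySpecificLinearSketch(num_categories, hidden_dim, output_dim)

    def forward(self, x: torch.Tensor, cat_ids: torch.Tensor) -> torch.Tensor:
        hidden = F.relu(self.layer1(x, cat_ids))
        return self.layer2(hidden, cat_ids)
```

Under these assumptions, two samples in the same batch with different category ids pass through entirely separate weights, while the batch is still processed with a single batched matrix multiply.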
MultiEmbodimentActionEncoder ¶
Bases: Module
Encodes actions using embodiment-specific weights and a sinusoidal timestep encoding.
Flow: actions → W1 → concat(a_emb, pos_enc(timesteps)) → W2 (swish) → W3, where a_emb is the output of W1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| action_dim | int | Action vector dimension (e.g. 32) | required |
| hidden_size | int | Output/hidden dimension (e.g. 5120, the model dimension) | required |
| num_embodiments | int | Number of robot types (e.g. 32) | required |
W2 instance-attribute ¶
W2 = CategorySpecificLinear(
num_embodiments, 2 * hidden_size, hidden_size
)
forward ¶
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| actions | Tensor | Actions, shape (B, T, action_dim) | required |
| timesteps | Tensor | Per-token timestep, shape (B, T) | required |
| cat_ids | Tensor | Embodiment id per sample, shape (B,) | required |
Returns: Tensor of shape (B, T, hidden_size)
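The documented flow and tensor shapes can be traced with a small sketch. Only the W2 dimensions (2 * hidden_size to hidden_size) and the input/output shapes come from this page. The W1 and W3 dimensions, the per-category weight layout in `PerCategoryLinear`, and the exact sinusoidal formula in `sinusoidal_encoding` are assumptions made for illustration, and all three names (plus `ActionEncoderSketch`) are hypothetical.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class PerCategoryLinear(nn.Module):
    """Hypothetical stand-in for CategorySpecificLinear (assumed internals)."""

    def __init__(self, num_categories: int, input_dim: int, output_dim: int):
        super().__init__()
        self.W = nn.Parameter(0.02 * torch.randn(num_categories, input_dim, output_dim))
        self.b = nn.Parameter(torch.zeros(num_categories, output_dim))

    def forward(self, x: torch.Tensor, cat_ids: torch.Tensor) -> torch.Tensor:
        # Select each sample's weights by its category id, then batch-multiply.
        return torch.bmm(x, self.W[cat_ids]) + self.b[cat_ids].unsqueeze(1)


def sinusoidal_encoding(timesteps: torch.Tensor, dim: int) -> torch.Tensor:
    """Assumed transformer-style encoding: (B, T) -> (B, T, dim), dim even."""
    half = dim // 2
    idx = torch.arange(half, dtype=torch.float32, device=timesteps.device)
    freqs = torch.exp(-math.log(10000.0) * idx / half)   # (half,)
    angles = timesteps.float().unsqueeze(-1) * freqs     # (B, T, half)
    return torch.cat([angles.sin(), angles.cos()], dim=-1)


class ActionEncoderSketch(nn.Module):
    def __init__(self, action_dim: int, hidden_size: int, num_embodiments: int):
        super().__init__()
        self.hidden_size = hidden_size
        # W1 and W3 dimensions are assumptions; W2 matches the attribute above.
        self.W1 = PerCategoryLinear(num_embodiments, action_dim, hidden_size)
        self.W2 = PerCategoryLinear(num_embodiments, 2 * hidden_size, hidden_size)
        self.W3 = PerCategoryLinear(num_embodiments, hidden_size, hidden_size)

    def forward(self, actions, timesteps, cat_ids):
        a_emb = self.W1(actions, cat_ids)                          # (B, T, H)
        t_emb = sinusoidal_encoding(timesteps, self.hidden_size)   # (B, T, H)
        x = torch.cat([a_emb, t_emb.to(a_emb.dtype)], dim=-1)      # (B, T, 2H)
        x = F.silu(self.W2(x, cat_ids))                            # swish
        return self.W3(x, cat_ids)                                 # (B, T, H)


# Usage with small, arbitrary sizes (not the model's real dimensions).
enc = ActionEncoderSketch(action_dim=32, hidden_size=64, num_embodiments=4)
actions = torch.randn(2, 5, 32)        # (B, T, action_dim)
timesteps = torch.rand(2, 5) * 1000    # (B, T)
cat_ids = torch.tensor([1, 3])         # (B,)
out = enc(actions, timesteps, cat_ids)
```

Concatenating the action embedding with a timestep encoding of the same width is what makes the W2 input 2 * hidden_size wide, which is consistent with the W2 attribute shown above.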