vllm_omni.diffusion.models.gr00t.modeling.gr00t_n1d7 ¶
Gr00tN1d7 ¶
Bases: PreTrainedModel
Gr00tN1d7: VLA model with Cosmos-Reason2-2B (Qwen3-VL) backbone.
backbone instance-attribute ¶
backbone = backbone_cls(
    model_name=config.model_name,
    select_layer=config.select_layer,
    backbone_embedding_dim=config.backbone_embedding_dim,
    load_bf16=config.load_bf16,
    transformers_loading_kwargs=transformers_loading_kwargs,
)
collator instance-attribute ¶
collator = Gr00tN1d7DataCollator(
    model_name=config.model_name,
    model_type=config.backbone_model_type,
    transformers_loading_kwargs=transformers_loading_kwargs,
)
supports_gradient_checkpointing class-attribute instance-attribute ¶
forward ¶
forward(inputs: dict) -> BatchFeature
Forward pass through the complete model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| inputs | dict | Dictionary containing action inputs (state, action, embodiment_id, etc.) | required |
Returns:
| Type | Description |
|---|---|
| BatchFeature | BatchFeature containing the loss and other outputs |
get_action ¶
Generate actions using the complete model.
Gr00tN1d7ActionHead ¶
Bases: Module
Action head component for flow matching diffusion policy.
action_decoder instance-attribute ¶
action_decoder = CategorySpecificMLP(
    num_categories=config.max_num_embodiments,
    input_dim=self.hidden_size,
    hidden_dim=self.hidden_size,
    output_dim=self.action_dim,
)
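The `num_categories=config.max_num_embodiments` argument suggests that the MLP keeps a separate set of weights per embodiment and picks one by embodiment ID. The sketch below illustrates that idea in plain Python. It is an assumption about the behavior, not the library's code: the class names `CategorySpecificLinearSketch` and `CategorySpecificMLPSketch`, the ReLU activation, and the constant toy weights are all hypothetical.

```python
class CategorySpecificLinearSketch:
    """Hypothetical stand-in: one weight matrix and bias per category."""

    def __init__(self, num_categories, input_dim, output_dim):
        # Toy constant weights scaled by category index, so the effect of the
        # category choice is easy to see. Real weights would be learned.
        self.weights = [
            [[0.1 * (cat + 1)] * output_dim for _ in range(input_dim)]
            for cat in range(num_categories)
        ]
        self.biases = [[0.0] * output_dim for _ in range(num_categories)]

    def __call__(self, x, category_ids):
        # x: list of B input vectors; category_ids: list of B integer IDs.
        out = []
        for vec, cat in zip(x, category_ids):
            w, b = self.weights[cat], self.biases[cat]
            out.append([
                sum(vec[i] * w[i][j] for i in range(len(vec))) + b[j]
                for j in range(len(b))
            ])
        return out


class CategorySpecificMLPSketch:
    """Two category-specific linear layers with a ReLU in between (assumed)."""

    def __init__(self, num_categories, input_dim, hidden_dim, output_dim):
        self.layer1 = CategorySpecificLinearSketch(num_categories, input_dim, hidden_dim)
        self.layer2 = CategorySpecificLinearSketch(num_categories, hidden_dim, output_dim)

    def __call__(self, x, category_ids):
        hidden = self.layer1(x, category_ids)
        hidden = [[max(0.0, h) for h in row] for row in hidden]  # ReLU
        return self.layer2(hidden, category_ids)


mlp = CategorySpecificMLPSketch(num_categories=3, input_dim=4, hidden_dim=8, output_dim=2)
features = [[1.0, 0.5, -0.5, 2.0], [1.0, 0.5, -0.5, 2.0]]
# Same input vector, two different embodiment IDs.
decoded = mlp(features, category_ids=[0, 2])
```

Both rows of `features` are identical, yet the two rows of `decoded` differ because each sample is routed through the weights of its own category.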
action_encoder instance-attribute ¶
action_encoder = MultiEmbodimentActionEncoder(
    action_dim=self.action_dim,
    hidden_size=self.input_embedding_dim,
    num_embodiments=config.max_num_embodiments,
)
model instance-attribute ¶
model = AlternateVLDiT(
    **(config.diffusion_model_cfg),
    cross_attention_dim=config.backbone_embedding_dim,
    attend_text_every_n_blocks=config.attend_text_every_n_blocks,
)
num_inference_timesteps instance-attribute ¶
position_embedding instance-attribute ¶
state_encoder instance-attribute ¶
state_encoder = CategorySpecificMLP(
    num_categories=config.max_num_embodiments,
    input_dim=config.max_state_dim * config.state_history_length,
    hidden_dim=self.hidden_size,
    output_dim=self.input_embedding_dim,
)
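The `input_dim` above is the product `config.max_state_dim * config.state_history_length`, which is consistent with a state history being flattened into a single vector before encoding. The sketch below shows that arithmetic under two assumptions that the source does not confirm: that each state is zero-padded up to `max_state_dim`, and that history steps are concatenated in order. The helper `flatten_state_history` is hypothetical.

```python
def flatten_state_history(history, max_state_dim):
    """Zero-pad each state to max_state_dim and concatenate the history steps.

    history: list of state vectors, one per history step (assumed layout).
    Returns a flat list of length max_state_dim * len(history).
    """
    flat = []
    for state in history:
        if len(state) > max_state_dim:
            raise ValueError("state is longer than max_state_dim")
        padding = [0.0] * (max_state_dim - len(state))
        flat.extend(list(state) + padding)
    return flat


# Two history steps of a 2-dimensional state, padded to max_state_dim=3.
flat_state = flatten_state_history([[1.0, 2.0], [3.0, 4.0]], max_state_dim=3)
```

Here `len(flat_state)` equals `3 * 2 = 6`, matching the `max_state_dim * state_history_length` product used for `input_dim`.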
supports_gradient_checkpointing class-attribute instance-attribute ¶
vl_self_attention instance-attribute ¶
vl_self_attention = SelfAttentionTransformer(
    **vl_self_attention_cfg
)
vlln instance-attribute ¶
get_action ¶
get_action(
    backbone_output: BatchFeature,
    action_input: BatchFeature,
    options: dict[str, Any] | None = None,
) -> BatchFeature
Generate actions using the flow matching diffusion process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| backbone_output | BatchFeature | Output from the backbone model containing backbone_features: [B, seq_len, backbone_embedding_dim] and backbone_attention_mask: [B, seq_len] | required |
| action_input | BatchFeature | Input containing state: [B, state_dim] and embodiment_id: [B] (embodiment IDs) | required |
Returns:
| Type | Description |
|---|---|
| BatchFeature | BatchFeature containing action_pred: [B, action_horizon, action_dim] predicted actions |
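Flow-matching inference is commonly implemented by starting from Gaussian noise and integrating a learned velocity field over a fixed number of Euler steps. The sketch below shows that generic procedure in plain Python and returns a dictionary with an `action_pred` entry shaped `[B, action_horizon, action_dim]`. It is not the method's actual implementation: `sample_actions`, `toy_velocity`, and `TARGET` are hypothetical, and the toy velocity function stands in for the diffusion model conditioned on backbone features, state, and embodiment ID.

```python
import random


def sample_actions(velocity_fn, batch_size, action_horizon, action_dim,
                   num_inference_timesteps=4, seed=0):
    """Integrate a velocity field from noise (t=0) toward actions (t=1)."""
    rng = random.Random(seed)
    # Initial actions are pure noise with shape [B, action_horizon, action_dim].
    actions = [
        [[rng.gauss(0.0, 1.0) for _ in range(action_dim)]
         for _ in range(action_horizon)]
        for _ in range(batch_size)
    ]
    dt = 1.0 / num_inference_timesteps
    for step in range(num_inference_timesteps):
        t = step * dt
        # In a real model this call would also receive the conditioning inputs.
        velocity = velocity_fn(actions, t)
        # Euler update: x <- x + dt * v(x, t)
        actions = [
            [[a + dt * v for a, v in zip(a_row, v_row)]
             for a_row, v_row in zip(a_seq, v_seq)]
            for a_seq, v_seq in zip(actions, velocity)
        ]
    return {"action_pred": actions}


# Hypothetical target action that every horizon step should converge to.
TARGET = [0.5, -0.25]


def toy_velocity(actions, t):
    """Velocity of the straight path from the current point to TARGET."""
    return [
        [[(goal - a) / (1.0 - t) for a, goal in zip(row, TARGET)] for row in seq]
        for seq in actions
    ]


result = sample_actions(toy_velocity, batch_size=2, action_horizon=3, action_dim=2)
```

With this toy velocity field, the final Euler step lands on `TARGET` for every horizon step, which makes the integration loop easy to check. More steps trade latency for a finer integration of the learned field; in this class that count is presumably what `num_inference_timesteps` controls.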
get_action_with_features ¶
get_action_with_features(
    backbone_features: Tensor,
    state_features: Tensor,
    embodiment_id: Tensor,
    backbone_output: BatchFeature,
    action_input: BatchFeature,
    options: dict[str, Any] | None = None,
) -> BatchFeature
Generate actions using the flow matching diffusion process, given backbone and state features that have already been computed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| backbone_features | Tensor | [B, seq_len, backbone_embedding_dim] | required |
| state_features | Tensor | [B, state_horizon, input_embedding_dim] | required |
| embodiment_id | Tensor | [B] (embodiment IDs) | required |
| backbone_output | BatchFeature | Output from the backbone model | required |