vllm_omni.diffusion.models.gr00t.modeling.gr00t_n1d7

logger module-attribute

logger = logging.getLogger(__name__)

Gr00tN1d7

Bases: PreTrainedModel

Gr00tN1d7: vision-language-action (VLA) model with a Cosmos-Reason2-2B (Qwen3-VL) backbone.

action_head instance-attribute

action_head = Gr00tN1d7ActionHead(config)

all_tied_weights_keys property

all_tied_weights_keys: dict[str, Any]

backbone instance-attribute

backbone = backbone_cls(
    model_name=config.model_name,
    select_layer=config.select_layer,
    backbone_embedding_dim=config.backbone_embedding_dim,
    load_bf16=config.load_bf16,
    transformers_loading_kwargs=transformers_loading_kwargs,
)

collator instance-attribute

collator = Gr00tN1d7DataCollator(
    model_name=config.model_name,
    model_type=config.backbone_model_type,
    transformers_loading_kwargs=transformers_loading_kwargs,
)

config instance-attribute

config = config

config_class class-attribute instance-attribute

config_class = Gr00tN1d7Config

device property

device

dtype property

dtype

supports_gradient_checkpointing class-attribute instance-attribute

supports_gradient_checkpointing = True

forward

forward(inputs: dict) -> BatchFeature

Forward pass through the complete model.

Parameters:

Name Type Description Default
inputs dict

Dictionary containing the action inputs (state, action, embodiment_id, etc.).

required

Returns:

Type Description
BatchFeature

BatchFeature containing loss and other outputs

get_action

get_action(
    inputs: dict, options: dict[str, Any] | None = None
) -> BatchFeature

Generate actions using the complete model.

prepare_input

prepare_input(
    inputs: dict,
) -> tuple[BatchFeature, BatchFeature]

Prepare inputs for backbone and action head.
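
The signatures above suggest that get_action first splits the inputs with prepare_input, then runs the backbone, then hands both results to the action head. The sketch below illustrates that flow with stub classes. StubBackbone, StubActionHead, and StubPolicy are hypothetical stand-ins, the "prompt" key is invented for illustration, and plain dicts replace BatchFeature; the real method may do more (for example dtype or device handling).

```python
from __future__ import annotations

from typing import Any


class StubBackbone:
    """Hypothetical stand-in for the vision-language backbone."""

    def __call__(self, backbone_inputs: dict) -> dict:
        # A real backbone would return token features; this stub returns a tag.
        return {"backbone_features": f"features({backbone_inputs['prompt']})"}


class StubActionHead:
    """Hypothetical stand-in for Gr00tN1d7ActionHead."""

    def get_action(
        self,
        backbone_output: dict,
        action_input: dict,
        options: dict[str, Any] | None = None,
    ) -> dict:
        # Combine both inputs so the data flow is visible in the result.
        return {
            "action_pred": (
                backbone_output["backbone_features"],
                action_input["state"],
            )
        }


class StubPolicy:
    """Hypothetical stand-in for Gr00tN1d7 (assumed call order only)."""

    def __init__(self) -> None:
        self.backbone = StubBackbone()
        self.action_head = StubActionHead()

    def prepare_input(self, inputs: dict) -> tuple[dict, dict]:
        # Split one flat dict into (backbone inputs, action-head inputs).
        backbone_inputs = {"prompt": inputs["prompt"]}
        action_inputs = {
            "state": inputs["state"],
            "embodiment_id": inputs["embodiment_id"],
        }
        return backbone_inputs, action_inputs

    def get_action(self, inputs: dict, options: dict[str, Any] | None = None) -> dict:
        backbone_inputs, action_inputs = self.prepare_input(inputs)
        backbone_output = self.backbone(backbone_inputs)
        return self.action_head.get_action(backbone_output, action_inputs, options)


result = StubPolicy().get_action(
    {"prompt": "pick up the cup", "state": [0.0, 1.0], "embodiment_id": 0}
)
```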

Gr00tN1d7ActionHead

Bases: Module

Action head component for flow matching diffusion policy.
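
As background for the term "flow matching", the following is a minimal pure-Python sketch of how a training pair is commonly built: a noisy sample on a straight path between noise and the ground-truth action, plus the velocity the network should predict. The convention used here (t = 0 is pure noise, t = 1 is the clean action) and the helper names flow_matching_pair and mse are assumptions for illustration, not taken from this module.

```python
import random


def flow_matching_pair(action, noise, t):
    """Return (noisy_action, target_velocity) on a linear noise-to-action path."""
    # Point on the straight line between noise (t=0) and action (t=1).
    noisy = [(1.0 - t) * n + t * a for a, n in zip(action, noise)]
    # The velocity along a straight line is constant: action minus noise.
    velocity = [a - n for a, n in zip(action, noise)]
    return noisy, velocity


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(pred)


rng = random.Random(0)
action = [0.5, -0.25, 1.0]
noise = [rng.gauss(0.0, 1.0) for _ in action]

noisy, velocity = flow_matching_pair(action, noise, t=0.3)
# A model that predicts the target velocity exactly has zero loss.
perfect_loss = mse(velocity, velocity)
```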

action_decoder instance-attribute

action_decoder = CategorySpecificMLP(
    num_categories=config.max_num_embodiments,
    input_dim=self.hidden_size,
    hidden_dim=self.hidden_size,
    output_dim=self.action_dim,
)

action_dim instance-attribute

action_dim = config.max_action_dim

action_encoder instance-attribute

action_encoder = MultiEmbodimentActionEncoder(
    action_dim=self.action_dim,
    hidden_size=self.input_embedding_dim,
    num_embodiments=config.max_num_embodiments,
)

action_horizon instance-attribute

action_horizon = config.action_horizon

config instance-attribute

config = config

device property

device

dtype property

dtype

hidden_size instance-attribute

hidden_size = config.hidden_size

input_embedding_dim instance-attribute

input_embedding_dim = config.input_embedding_dim

model instance-attribute

model = AlternateVLDiT(
    **(config.diffusion_model_cfg),
    cross_attention_dim=config.backbone_embedding_dim,
    attend_text_every_n_blocks=config.attend_text_every_n_blocks,
)
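
The parameter name attend_text_every_n_blocks suggests that cross-attention blocks alternate between text and image tokens. This page does not document the exact rule, so the sketch below shows only one possible reading: odd-indexed blocks use self-attention, and among the even-indexed cross-attention blocks every n-th one attends to text while the rest attend to image tokens. The function cross_attention_targets is hypothetical.

```python
def cross_attention_targets(num_blocks: int, attend_text_every_n_blocks: int) -> list[str]:
    """Label each block as 'self', 'text', or 'image' (assumed schedule)."""
    targets = []
    for idx in range(num_blocks):
        if idx % 2 == 1:
            # Assumption: odd blocks are plain self-attention.
            targets.append("self")
        elif idx % (2 * attend_text_every_n_blocks) == 0:
            # Assumption: every n-th cross-attention block looks at text tokens.
            targets.append("text")
        else:
            targets.append("image")
    return targets


schedule = cross_attention_targets(num_blocks=8, attend_text_every_n_blocks=2)
```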

num_inference_timesteps instance-attribute

num_inference_timesteps = config.num_inference_timesteps

num_timestep_buckets instance-attribute

num_timestep_buckets = config.num_timestep_buckets

position_embedding instance-attribute

position_embedding = nn.Embedding(
    config.max_seq_len, self.input_embedding_dim
)

state_encoder instance-attribute

state_encoder = CategorySpecificMLP(
    num_categories=config.max_num_embodiments,
    input_dim=config.max_state_dim
    * config.state_history_length,
    hidden_dim=self.hidden_size,
    output_dim=self.input_embedding_dim,
)
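
The state encoder's input_dim is max_state_dim * state_history_length, which implies that a history of states is flattened into a single vector per sample. A minimal sketch of that flattening follows. Zero-padding each state up to max_state_dim is an assumption, and flatten_state_history is a hypothetical helper, not part of this module.

```python
def flatten_state_history(history, max_state_dim):
    """Zero-pad each state to max_state_dim and concatenate the history."""
    flat = []
    for state in history:
        if len(state) > max_state_dim:
            raise ValueError("state is wider than max_state_dim")
        flat.extend(state)
        # Assumption: unused trailing dimensions are filled with zeros.
        flat.extend([0.0] * (max_state_dim - len(state)))
    return flat


max_state_dim = 4
state_history_length = 2
history = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # two 3-dimensional states

flat = flatten_state_history(history, max_state_dim)
# The flattened length matches the encoder's input_dim.
input_dim = max_state_dim * state_history_length
```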

supports_gradient_checkpointing class-attribute instance-attribute

supports_gradient_checkpointing = True

vl_self_attention instance-attribute

vl_self_attention = SelfAttentionTransformer(
    **vl_self_attention_cfg
)

vlln instance-attribute

vlln = (
    nn.LayerNorm(config.backbone_embedding_dim)
    if config.use_vlln
    else nn.Identity()
)

get_action

get_action(
    backbone_output: BatchFeature,
    action_input: BatchFeature,
    options: dict[str, Any] | None = None,
) -> BatchFeature

Generate actions using the flow matching diffusion process.

Parameters:

Name Type Description Default
backbone_output BatchFeature

Output from the backbone model, containing:
- backbone_features: [B, seq_len, backbone_embedding_dim]
- backbone_attention_mask: [B, seq_len]

required
action_input BatchFeature

Input containing:
- state: [B, state_dim]
- embodiment_id: [B] (embodiment IDs)

required

Returns:

Type Description
BatchFeature

BatchFeature containing action_pred: [B, action_horizon, action_dim], the predicted actions.
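
Flow matching inference is typically a short numerical integration from noise to actions. The sketch below uses fixed-step Euler integration in pure Python and shows how num_inference_timesteps and num_timestep_buckets could enter the loop. The discretisation int(t * num_timestep_buckets), the function sample_actions, and the toy velocity field are assumptions for illustration; the real method runs the DiT model on tensors and conditions on backbone features.

```python
def sample_actions(velocity_fn, noise, num_inference_timesteps, num_timestep_buckets):
    """Integrate a velocity field from t=0 (noise) to t=1 with Euler steps."""
    actions = list(noise)
    dt = 1.0 / num_inference_timesteps
    for step in range(num_inference_timesteps):
        t = step / num_inference_timesteps
        # Assumption: continuous time is mapped to an integer bucket index
        # before being passed to the network as the timestep.
        bucket = int(t * num_timestep_buckets)
        velocity = velocity_fn(actions, t, bucket)
        actions = [a + dt * v for a, v in zip(actions, velocity)]
    return actions


target = [0.5, -1.0]


def toy_velocity(actions, t, bucket):
    # Toy field that points straight at a fixed target; stands in for the model.
    return [(g - a) / (1.0 - t) for g, a in zip(target, actions)]


pred = sample_actions(
    toy_velocity,
    noise=[2.0, 2.0],
    num_inference_timesteps=4,
    num_timestep_buckets=1000,
)
```

With this toy field the Euler steps land on the target after the final step, which makes the loop easy to check. A learned velocity field would only approximate this.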

get_action_with_features

get_action_with_features(
    backbone_features: Tensor,
    state_features: Tensor,
    embodiment_id: Tensor,
    backbone_output: BatchFeature,
    action_input: BatchFeature,
    options: dict[str, Any] | None = None,
) -> BatchFeature

Generate actions using the flow matching diffusion process, starting from precomputed backbone and state features.

Parameters:

Name Type Description Default
backbone_features Tensor

[B, seq_len, backbone_embedding_dim]

required
state_features Tensor

[B, state_horizon, input_embedding_dim]

required
embodiment_id Tensor

[B] (embodiment IDs)

required
backbone_output BatchFeature

Output from the backbone model

required

prepare_input

prepare_input(batch: dict) -> BatchFeature

Prepare input batch for the action head.

process_backbone_output

process_backbone_output(
    backbone_output: BatchFeature,
) -> BatchFeature

get_backbone_cls

get_backbone_cls(config: Gr00tN1d7Config)
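
This page gives no description for get_backbone_cls. Since the model constructs its backbone as backbone_cls(...) and the config exposes backbone_model_type, one plausible design is a lookup from that field to a class. The sketch below illustrates the pattern only: BACKBONE_REGISTRY, QwenLikeBackbone, ToyConfig, the "qwen" key, and get_backbone_cls_sketch are all hypothetical names.

```python
from dataclasses import dataclass


class QwenLikeBackbone:
    """Hypothetical backbone class used only for this illustration."""


# Hypothetical registry mapping a model-type string to a backbone class.
BACKBONE_REGISTRY = {"qwen": QwenLikeBackbone}


@dataclass
class ToyConfig:
    """Hypothetical stand-in for Gr00tN1d7Config with a single field."""

    backbone_model_type: str


def get_backbone_cls_sketch(config: ToyConfig) -> type:
    """Return the backbone class registered for config.backbone_model_type."""
    try:
        return BACKBONE_REGISTRY[config.backbone_model_type]
    except KeyError:
        raise ValueError(
            f"Unsupported backbone_model_type: {config.backbone_model_type!r}"
        ) from None


backbone_cls = get_backbone_cls_sketch(ToyConfig(backbone_model_type="qwen"))
```

Failing with an explicit error on an unknown type surfaces a misconfigured checkpoint early, before any weights are loaded.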