vllm_omni.diffusion.models.nextstep_1_1.modeling_nextstep
NextStepModel
Bases: Module
embed_tokens instance-attribute
image_head instance-attribute
image_head = FlowMatchingHead(
    input_dim=token_dim,
    cond_dim=config.hidden_size,
    dim=config.fm_head_dim,
    layers=config.fm_head_layers,
)
image_in_projector instance-attribute
image_out_projector instance-attribute
layers instance-attribute
layers = nn.ModuleList(
    [
        LlamaDecoderLayer(config, layer_idx)
        for layer_idx in range(config.num_hidden_layers)
    ]
)
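The `nn.ModuleList` pattern above can be illustrated with a minimal sketch. It assumes PyTorch is installed and uses a hypothetical `TinyLayer` as a stand-in for `LlamaDecoderLayer`; `TinyStack` and its `hidden_size` argument are likewise illustrative names, not part of this module. The sketch only shows how a list comprehension over `layer_idx` registers each layer's parameters and how the layers are applied in order.

```python
import torch
from torch import nn


class TinyLayer(nn.Module):
    """Hypothetical stand-in for LlamaDecoderLayer (not the real layer)."""

    def __init__(self, hidden_size: int, layer_idx: int):
        super().__init__()
        self.layer_idx = layer_idx  # each layer keeps its position in the stack
        self.norm = nn.LayerNorm(hidden_size)
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Pre-norm residual block: shape of hidden_states is preserved.
        return hidden_states + self.proj(self.norm(hidden_states))


class TinyStack(nn.Module):
    """Hypothetical container mirroring the `layers` attribute."""

    def __init__(self, hidden_size: int, num_hidden_layers: int):
        super().__init__()
        # nn.ModuleList registers every layer as a submodule, so their
        # parameters are visible to .parameters(), .to(), and state_dict().
        self.layers = nn.ModuleList(
            [
                TinyLayer(hidden_size, layer_idx)
                for layer_idx in range(num_hidden_layers)
            ]
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Apply the layers sequentially, feeding each output to the next.
        for layer in self.layers:
            hidden_states = layer(hidden_states)
        return hidden_states


stack = TinyStack(hidden_size=16, num_hidden_layers=3)
out = stack(torch.randn(2, 5, 16))  # (batch, sequence, hidden)
```

A plain Python list would not register the layers as submodules, which is presumably why `nn.ModuleList` is used here.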
lm_head instance-attribute
forward_model
forward_model(
    inputs_embeds: FloatTensor,
    attention_mask: Tensor | None = None,
    past_key_values: Cache | list[FloatTensor] | None = None,
    use_cache: bool | None = None,
    output_attentions: bool | None = None,
    output_hidden_states: bool | None = None,
    cache_position: LongTensor | None = None,
) -> BaseModelOutputWithPast