vllm.model_executor.models.olmo
Inference-only OLMo model compatible with HuggingFace weights.
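Classes:
- OlmoAttention – The attention block, whose output is the Attention(LN(x)) term in MLP(LN(x + Attention(LN(x)))).
- OlmoDecoderLayer – A typical transformer block, whose output is computed as MLP(LN(x + Attention(LN(x)))) (plus another skip connection).
- OlmoForCausalLM – Extremely barebones HF model wrapper.
- OlmoMLP – The MLP block, whose output is the outermost MLP(LN(·)) term in MLP(LN(x + Attention(LN(x)))).
- OlmoModel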
OlmoAttention
Bases: Module
The attention block. Its output is the Attention(LN(x)) term of the full block computation MLP(LN(x + Attention(LN(x)))) (plus another skip connection).
Source code in vllm/model_executor/models/olmo.py
OlmoDecoderLayer
Bases: Module
This is a typical transformer block where the output is computed as MLP(LN(x + Attention(LN(x)))) (plus another skip connection).
Source code in vllm/model_executor/models/olmo.py
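The formula in the OlmoDecoderLayer description can be unrolled into two residual steps. The following is a minimal pure-Python sketch of that data flow, not the vLLM implementation: `layer_norm`, `decoder_block`, and the `attention` / `mlp` callables are hypothetical stand-ins, and the layer norm is assumed to have no learned scale or bias.

```python
import math
from typing import Callable, List

Vector = List[float]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize x to zero mean and unit variance (no learned parameters assumed)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def decoder_block(
    x: Vector,
    attention: Callable[[Vector], Vector],
    mlp: Callable[[Vector], Vector],
) -> Vector:
    """Pre-norm transformer block with two skip connections."""
    # First skip connection: h = x + Attention(LN(x))
    h = [a + b for a, b in zip(x, attention(layer_norm(x)))]
    # Second skip connection: out = h + MLP(LN(h))
    return [a + b for a, b in zip(h, mlp(layer_norm(h)))]


if __name__ == "__main__":
    # Toy sublayers: attention passes its input through, the MLP outputs zeros.
    out = decoder_block([1.0, 2.0, 3.0], lambda v: v, lambda v: [0.0] * len(v))
    print(out)
```

The inner expression `x + Attention(LN(x))` is the input to the second layer norm, and the "another skip connection" in the description corresponds to adding `h` back to the MLP output.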
OlmoForCausalLM
Bases: Module, SupportsPP, SupportsLoRA
Extremely barebones HF model wrapper.
Source code in vllm/model_executor/models/olmo.py
OlmoMLP
Bases: Module
The MLP block. Its output is the outermost MLP(LN(·)) term of the full block computation MLP(LN(x + Attention(LN(x)))) (plus another skip connection).
Source code in vllm/model_executor/models/olmo.py
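This page does not spell out what happens inside the MLP. As an illustration only, the sketch below assumes a SiLU-gated feed-forward layer (gate, up, and down projections), a common design in recent decoder-only models. The function names and weight layout are hypothetical and are not taken from the vLLM source.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(v: float) -> float:
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def gated_mlp(x: Vector, w_gate: Matrix, w_up: Matrix, w_down: Matrix) -> Vector:
    """Assumed form: down(silu(gate(x)) * up(x)), with no bias terms."""
    gate = [silu(v) for v in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    return matvec(w_down, hidden)


if __name__ == "__main__":
    # hidden size 2, intermediate size 1
    print(gated_mlp([1.0, 2.0], [[1.0, 0.0]], [[0.0, 1.0]], [[1.0], [2.0]]))
```

In the decoder layer, the input to this function would be the layer-normalized hidden state, and the result would be added back to the residual stream.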
OlmoModel
Bases: Module
Methods:
- forward
Source code in vllm/model_executor/models/olmo.py
forward(input_ids, positions, intermediate_tensors, inputs_embeds=None)
Parameters: