llmcompressor.modeling.deepseekv32.model
Classes:
- Block – Transformer block combining attention and feed-forward layers.
- DeepseekV32ForCausalLM
- Expert – Expert layer for Mixture-of-Experts (MoE) models.
- Gate – Gating mechanism for routing inputs in a mixture-of-experts (MoE) model.
- LayerNorm – Layer Normalization.
- MLA – Multi-Head Latent Attention (MLA) Layer.
- MLP – Multi-Layer Perceptron (MLP) used as a feed-forward layer.
- MoE – Mixture-of-Experts (MoE) module.
- ParallelEmbedding – Embedding layer with parallelism support across distributed processes.
- RMSNorm – Root Mean Square Layer Normalization (RMSNorm).
- Transformer – Transformer model with positional embeddings, multiple layers, and output projection.
Functions:
- apply_rotary_emb – Applies rotary positional embeddings to the input tensor.
- precompute_freqs_cis – Precomputes frequency-based complex exponential values for rotary positional embeddings.
Block
Bases: Module
Transformer block combining attention and feed-forward layers.
Attributes:
- self_attn (nn.Module): Attention layer (MLA).
- mlp (nn.Module): Feed-forward network (MLP or MoE).
- input_layernorm (nn.Module): Layer normalization for attention.
- post_attention_layernorm (nn.Module): Layer normalization for feed-forward network.
Initializes the Transformer block.
Args:
- layer_id (int): Layer index in the transformer.
- args (ModelConfig): Model arguments containing block parameters.
Methods:
- forward – Forward pass for the Transformer block.
Source code in src/llmcompressor/modeling/deepseekv32/model.py
forward
forward(
x: Tensor,
residual: Tensor,
start_pos: int,
freqs_cis: Tensor,
mask: Optional[Tensor],
) -> torch.Tensor
Forward pass for the Transformer block.
Args:
- x (torch.Tensor): Input tensor.
- residual (torch.Tensor): Residual tensor passed alongside x.
- start_pos (int): Starting position in the sequence.
- freqs_cis (torch.Tensor): Precomputed complex exponential values for rotary embeddings.
- mask (Optional[torch.Tensor]): Mask tensor to exclude certain positions from attention.
Returns: torch.Tensor: Output tensor after block computation.
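A block that combines attention and a feed-forward network, each with its own normalization layer, is commonly wired as a pre-norm residual block. The sketch below illustrates that general pattern in plain Python on a single feature vector. It is an assumption about the wiring, not the module's actual code: the name `block_forward` and the toy callables are hypothetical, and how the separate `residual` argument is threaded through is not shown in this page.

```python
def block_forward(x, attn, mlp, attn_norm, ffn_norm):
    """Pre-norm residual block over a single feature vector (list of floats)."""
    # Attention sub-layer: normalize, attend, add back to the input.
    h = [a + b for a, b in zip(x, attn(attn_norm(x)))]
    # Feed-forward sub-layer: normalize, transform, add back again.
    return [a + b for a, b in zip(h, mlp(ffn_norm(h)))]


# Toy stand-ins (hypothetical): identity norms, "attention" doubles, "MLP" negates.
identity = lambda v: list(v)
out = block_forward(
    [1.0, 2.0],
    attn=lambda v: [2.0 * e for e in v],
    mlp=lambda v: [-e for e in v],
    attn_norm=identity,
    ffn_norm=identity,
)
print(out)
```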
DeepseekV32ForCausalLM
Bases: DeepseekV32PreTrainedModel, GenerationMixin
Methods:
- forward – Forward pass for causal language modeling; returns a CausalLMOutputWithPast.
forward
forward(
input_ids: LongTensor | None = None,
labels: LongTensor | None = None,
logits_to_keep: int | Tensor = 0,
**kwargs: Unpack[TransformersKwargs],
) -> CausalLMOutputWithPast
Example:
>>> # Note: the checkpoint name below appears to be a template placeholder;
>>> # substitute the path or hub ID of an actual DeepSeek checkpoint.
>>> from transformers import AutoTokenizer, DeepseekV3ForCausalLM
>>> model = DeepseekV3ForCausalLM.from_pretrained("meta-deepseek_v3/DeepseekV3-2-7b-hf")
>>> tokenizer = AutoTokenizer.from_pretrained("meta-deepseek_v3/DeepseekV3-2-7b-hf")
>>> prompt = "Hey, are you conscious? Can you talk to me?"
>>> inputs = tokenizer(prompt, return_tensors="pt")
>>> # Generate
>>> generate_ids = model.generate(inputs.input_ids, max_length=30)
>>> tokenizer.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
"Hey, are you conscious? Can you talk to me?\nI'm not conscious, but I can talk to you."
Expert
Bases: Module
Expert layer for Mixture-of-Experts (MoE) models.
Attributes:
- gate_proj (nn.Module): Linear layer for input-to-hidden transformation.
- down_proj (nn.Module): Linear layer for hidden-to-output transformation.
- up_proj (nn.Module): Additional linear layer for feature transformation.
Initializes the Expert layer.
Args:
- dim (int): Input and output dimensionality.
- inter_dim (int): Hidden layer dimensionality.
Methods:
- forward – Forward pass for the Expert layer.
forward
Forward pass for the Expert layer.
Args: x (torch.Tensor): Input tensor.
Returns: torch.Tensor: Output tensor after expert computation.
Gate
Bases: Module
Gating mechanism for routing inputs in a mixture-of-experts (MoE) model.
Attributes:
- dim (int): Dimensionality of input features.
- topk (int): Number of top experts activated for each input.
- n_groups (int): Number of groups for routing.
- topk_groups (int): Number of groups to route inputs to.
- score_func (str): Scoring function ('softmax' or 'sigmoid').
- route_scale (float): Scaling factor for routing weights.
- weight (torch.nn.Parameter): Learnable weights for the gate.
- bias (Optional[torch.nn.Parameter]): Optional bias term for the gate.
Initializes the Gate module.
Args: args (ModelConfig): Model arguments containing gating parameters.
Methods:
- forward – Forward pass for the gating mechanism.
forward
Forward pass for the gating mechanism.
Args: x (torch.Tensor): Input tensor.
Returns: Tuple[torch.Tensor, torch.Tensor]: Routing weights and selected expert indices.
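To make the roles of `topk`, `score_func`, and `route_scale` concrete, here is a minimal plain-Python sketch of top-k routing for one token. It assumes the gate has already produced one logit per expert, and it leaves out group-limited routing (`n_groups`, `topk_groups`) and the optional bias. The function name `route` and the renormalization of sigmoid scores are assumptions for illustration, not taken from the module's source.

```python
import math


def route(logits, topk, score_func="softmax", route_scale=1.0):
    """Return (weights, indices) of the top-k experts for one token."""
    if score_func == "softmax":
        m = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        scores = [e / total for e in exps]
    elif score_func == "sigmoid":
        scores = [1.0 / (1.0 + math.exp(-v)) for v in logits]
    else:
        raise ValueError(f"unknown score_func: {score_func!r}")

    # Pick the k highest-scoring experts.
    indices = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:topk]
    weights = [scores[i] for i in indices]

    # Assumption: sigmoid scores are renormalized over the selected experts.
    if score_func == "sigmoid":
        norm = sum(weights)
        weights = [w / norm for w in weights]

    return [w * route_scale for w in weights], indices


weights, indices = route([0.1, 2.0, -1.0, 1.5], topk=2)
print(indices, weights)
```

The returned pair mirrors the documented return value: routing weights first, then the selected expert indices.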
LayerNorm
Bases: Module
Layer Normalization.
MLA
Bases: Module
Multi-Head Latent Attention (MLA) Layer.
Attributes:
- dim (int): Dimensionality of the input features.
- n_heads (int): Number of attention heads.
- n_local_heads (int): Number of local attention heads for distributed systems.
- q_lora_rank (int): Rank for low-rank query projection.
- kv_lora_rank (int): Rank for low-rank key/value projection.
- qk_nope_head_dim (int): Dimensionality of non-positional query/key projections.
- qk_rope_head_dim (int): Dimensionality of rotary-positional query/key projections.
- qk_head_dim (int): Total dimensionality of query/key projections.
- v_head_dim (int): Dimensionality of value projections.
- softmax_scale (float): Scaling factor for softmax in attention computation.
Methods:
- forward – Forward pass for the Multi-Head Latent Attention (MLA) Layer.
forward
Forward pass for the Multi-Head Latent Attention (MLA) Layer.
Args:
- x (torch.Tensor): Input tensor of shape (batch_size, seq_len, dim).
- start_pos (int): Starting position in the sequence for caching.
- freqs_cis (torch.Tensor): Precomputed complex exponential values for rotary embeddings.
- mask (Optional[torch.Tensor]): Mask tensor to exclude certain positions from attention.
Returns: torch.Tensor: Output tensor with the same shape as the input.
MLP
Bases: Module
Multi-Layer Perceptron (MLP) used as a feed-forward layer.
Attributes:
- gate_proj (nn.Module): Linear layer for input-to-hidden transformation.
- down_proj (nn.Module): Linear layer for hidden-to-output transformation.
- up_proj (nn.Module): Additional linear layer for feature transformation.
Initializes the MLP layer.
Args:
- dim (int): Input and output dimensionality.
- inter_dim (int): Hidden layer dimensionality.
Methods:
- forward – Forward pass for the MLP layer.
forward
Forward pass for the MLP layer.
Args: x (torch.Tensor): Input tensor.
Returns: torch.Tensor: Output tensor after MLP computation.
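The three projections listed above (`gate_proj`, `up_proj`, `down_proj`) match the usual gated feed-forward layout. The plain-Python sketch below shows how such a layer is typically evaluated for a single vector. The SiLU activation and the helper names `linear`, `silu`, and `gated_mlp` are assumptions for illustration; the page does not state which activation the module uses.

```python
import math


def linear(x, w):
    """Multiply vector x by weight matrix w, given as a list of output rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def gated_mlp(x, gate_w, up_w, down_w):
    """down_proj(silu(gate_proj(x)) * up_proj(x)) for one input vector."""
    gate = [silu(v) for v in linear(x, gate_w)]  # dim -> inter_dim
    up = linear(x, up_w)                         # dim -> inter_dim
    hidden = [g * u for g, u in zip(gate, up)]   # element-wise gating
    return linear(hidden, down_w)                # inter_dim -> dim


# Toy weights with dim=2 and inter_dim=3 (values chosen for readability).
gate_w = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
up_w = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
down_w = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
y = gated_mlp([1.0, 0.0], gate_w, up_w, down_w)
print(y)
```

The output has the same dimensionality as the input, which is what lets the layer sit inside a residual block.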
MoE
Bases: Module
Mixture-of-Experts (MoE) module.
Attributes:
- dim (int): Dimensionality of input features.
- n_routed_experts (int): Total number of experts in the model.
- n_local_experts (int): Number of experts handled locally in distributed systems.
- n_activated_experts (int): Number of experts activated for each input.
- gate (nn.Module): Gating mechanism to route inputs to experts.
- experts (nn.ModuleList): List of expert modules.
- shared_experts (nn.Module): Shared experts applied to all inputs.
Initializes the MoE module.
Args: args (ModelConfig): Model arguments containing MoE parameters.
Methods:
- forward – Forward pass for the MoE module.
forward
Forward pass for the MoE module.
Args: x (torch.Tensor): Input tensor.
Returns: torch.Tensor: Output tensor after expert routing and computation.
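As a rough illustration of how routed and shared experts combine, the sketch below mixes the outputs of the selected experts using the routing weights and then adds the shared-expert output. It works on a single token in plain Python. The function name `moe_forward` and the toy experts are hypothetical, and the real module may batch tokens per expert and handle distributed placement differently.

```python
def moe_forward(x, experts, shared_expert, weights, indices):
    """Weighted sum of the selected experts' outputs plus the shared expert."""
    out = [0.0] * len(x)
    for w, i in zip(weights, indices):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + w * v for o, v in zip(out, y)]
    shared = shared_expert(x)  # the shared expert sees every token
    return [o + s for o, s in zip(out, shared)]


# Toy experts (hypothetical): expert k scales its input by k + 1.
experts = [lambda v, k=k: [(k + 1) * e for e in v] for k in range(4)]
shared = lambda v: list(v)

# Weights and indices as a gate would return them for this token.
y = moe_forward([1.0, 2.0], experts, shared, weights=[0.5, 0.25], indices=[2, 0])
print(y)
```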
ParallelEmbedding
Bases: Module
Embedding layer with parallelism support across distributed processes.
Args:
- vocab_size (int): Vocabulary size.
- dim (int): Embedding dimension.
Methods:
- forward – Forward pass for parallel embedding layer.
forward
Forward pass for parallel embedding layer.
Args: x (torch.Tensor): Input tensor containing token indices.
Returns: torch.Tensor: Embedded representations.
Raises:
ValueError: If world_size is not defined.
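One common way to parallelize an embedding layer is to split the vocabulary across processes: each rank looks up only the tokens it owns, emits zeros for the rest, and the partial results are summed across ranks. The sketch below simulates that scheme in a single process with plain Python. It is an assumption about the general technique, not a description of this class's internals; `shard_lookup` and `all_reduce_sum` are hypothetical names, and `all_reduce_sum` merely stands in for a distributed all-reduce.

```python
def shard_lookup(token_ids, shard, rank, world_size, vocab_size):
    """Look up token_ids in one rank's slice of the embedding table."""
    if vocab_size % world_size != 0:
        raise ValueError("vocab_size must be divisible by world_size")
    part = vocab_size // world_size
    start, end = rank * part, (rank + 1) * part
    dim = len(shard[0])
    # Tokens owned by another rank contribute a zero vector here.
    return [
        list(shard[t - start]) if start <= t < end else [0.0] * dim
        for t in token_ids
    ]


def all_reduce_sum(per_rank):
    """Element-wise sum across ranks, standing in for a distributed all-reduce."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*per_rank)]


# Full table with vocab_size=4 and dim=2, split evenly over two simulated ranks.
table = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
world_size = 2
shards = [table[0:2], table[2:4]]
tokens = [3, 0, 2]

partials = [
    shard_lookup(tokens, shards[r], r, world_size, len(table))
    for r in range(world_size)
]
embedded = all_reduce_sum(partials)
print(embedded)
```

After the reduction, every rank would hold the same embeddings it would have obtained from the full table.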
RMSNorm
Bases: Module
Root Mean Square Layer Normalization (RMSNorm).
Args:
- dim (int): Dimension of the input tensor.
- eps (float): Epsilon value for numerical stability. Defaults to 1e-6.
Methods:
- forward – Forward pass for RMSNorm.
forward
Forward pass for RMSNorm.
Args: x (torch.Tensor): Input tensor.
Returns: torch.Tensor: Normalized tensor with the same shape as input.
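For reference, RMSNorm divides each feature vector by its root mean square and then applies a learned per-feature scale. The plain-Python sketch below shows the computation for one vector. The function name `rms_norm` and the explicit `weight` argument are illustrative assumptions, and the module itself operates on tensors rather than lists.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Normalize one feature vector by its root mean square, then scale."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)  # eps guards against division by zero
    return [v * inv_rms * w for v, w in zip(x, weight)]


x = [1.0, 2.0, 3.0, 4.0]
y = rms_norm(x, weight=[1.0] * len(x))
print(y)
```

Unlike standard layer normalization, no mean is subtracted; only the scale of the vector changes.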
Transformer
Bases: Module
Transformer model with positional embeddings, multiple layers, and output projection.
Attributes:
- max_seq_len (int): Maximum sequence length for the transformer.
- embed_tokens (nn.Module): Embedding layer for input tokens.
- layers (torch.nn.ModuleList): List of transformer blocks.
- norm (nn.Module): Layer normalization applied after all blocks.
- lm_head (nn.Module): Output projection layer mapping to vocabulary size.
- freqs_cis (torch.Tensor): Precomputed complex exponential values for rotary embeddings.
Initializes the Transformer model.
Args: args (ModelConfig): Model arguments containing transformer parameters.
Methods:
- forward – Forward pass for the Transformer model.
forward
Forward pass for the Transformer model.
Args:
- input_ids (torch.Tensor): Input tensor of token IDs with shape (batch_size, seq_len).
- start_pos (int, optional): Starting position in the sequence for rotary embeddings. Defaults to 0.
Returns: torch.Tensor: Logits tensor of shape (batch_size, vocab_size).
apply_rotary_emb
Applies rotary positional embeddings to the input tensor.
Args:
- x (torch.Tensor): Input tensor with positional embeddings to be applied.
- freqs_cis (torch.Tensor): Precomputed complex exponential values for positional embeddings.
Returns: torch.Tensor: Tensor with rotary embeddings applied.
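Rotary embeddings treat pairs of features as complex numbers and multiply each pair by a unit-magnitude complex exponential, which rotates the pair without changing its length. The plain-Python sketch below shows this for one position. It assumes consecutive (even, odd) features form a pair; the actual function may pair features differently, and the name `rotate_pairs` is hypothetical.

```python
import cmath
import math


def rotate_pairs(x, freqs_cis):
    """Rotate consecutive (even, odd) pairs of x by unit complex numbers."""
    out = []
    for k, rot in enumerate(freqs_cis):
        z = complex(x[2 * k], x[2 * k + 1]) * rot  # complex multiply = 2-D rotation
        out.extend([z.real, z.imag])
    return out


# First pair is rotated by 90 degrees, second pair is left unchanged.
rots = [cmath.exp(1j * math.pi / 2), complex(1.0, 0.0)]
y = rotate_pairs([1.0, 0.0, 0.0, 2.0], rots)
print(y)
```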
precompute_freqs_cis
Precomputes frequency-based complex exponential values for rotary positional embeddings.
Args: args (ModelConfig): Model arguments containing positional embedding parameters.
Returns: torch.Tensor: Precomputed complex exponential values for positional embeddings.
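The basic form of this precomputation builds one rotation frequency per feature pair and evaluates a complex exponential at every position. The plain-Python sketch below shows that basic form only. It takes explicit `head_dim`, `seq_len`, and `base` arguments, which are assumptions for illustration (the documented function reads its parameters from `args`), and it leaves out any long-context frequency scaling the real function may apply. The name `basic_freqs_cis` is hypothetical.

```python
import cmath


def basic_freqs_cis(head_dim, seq_len, base=10000.0):
    """Return table[t][k] = exp(i * t * base**(-2k / head_dim))."""
    # One frequency per feature pair, decreasing geometrically with k.
    freqs = [base ** (-2.0 * k / head_dim) for k in range(head_dim // 2)]
    # One row of unit complex numbers per position t.
    return [[cmath.exp(1j * t * f) for f in freqs] for t in range(seq_len)]


table = basic_freqs_cis(head_dim=4, seq_len=3)
print(len(table), len(table[0]))
```

Every entry has magnitude 1, so applying it to a feature pair changes the pair's angle but not its norm.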