llmcompressor.modeling.deepseekv32.config
Classes:
- ModelConfig: Data class for defining model arguments and hyperparameters.
ModelConfig
ModelConfig(
max_batch_size: int = 8,
max_seq_len: int = 4096 * 4,
dtype: Literal["bf16", "fp8"] = "bf16",
scale_fmt: Optional[str] = None,
vocab_size: int = 102400,
dim: int = 2048,
inter_dim: int = 10944,
moe_inter_dim: int = 1408,
n_layers: int = 27,
n_dense_layers: int = 1,
n_heads: int = 16,
n_routed_experts: int = 64,
n_shared_experts: int = 2,
n_activated_experts: int = 6,
n_expert_groups: int = 1,
n_limited_groups: int = 1,
score_func: Literal["softmax", "sigmoid"] = "softmax",
route_scale: float = 1.0,
q_lora_rank: int = 0,
kv_lora_rank: int = 512,
qk_nope_head_dim: int = 128,
qk_rope_head_dim: int = 64,
v_head_dim: int = 128,
original_seq_len: int = 4096,
rope_theta: float = 10000.0,
rope_factor: float = 40,
beta_fast: int = 32,
beta_slow: int = 1,
mscale: float = 1.0,
index_n_heads: int = 64,
index_head_dim: int = 128,
index_topk: int = 2048,
**kwargs,
)
Bases: PretrainedConfig
Data class for defining model arguments and hyperparameters.
Attributes:
- max_batch_size (int): Maximum batch size.
- max_seq_len (int): Maximum sequence length.
- dtype (Literal["bf16", "fp8"]): Data type for computations.
- scale_fmt (Optional[str]): Format for quantization scale.
- vocab_size (int): Vocabulary size.
- dim (int): Model dimension.
- inter_dim (int): Intermediate dimension for MLP layers.
- moe_inter_dim (int): Intermediate dimension for MoE layers.
- n_layers (int): Number of transformer layers.
- n_dense_layers (int): Number of dense layers in the model.
- n_heads (int): Number of attention heads.
- n_routed_experts (int): Number of routed experts for MoE layers.
- n_shared_experts (int): Number of shared experts for MoE layers.
- n_activated_experts (int): Number of activated experts in MoE layers.
- n_expert_groups (int): Number of expert groups.
- n_limited_groups (int): Number of limited groups for MoE routing.
- score_func (Literal["softmax", "sigmoid"]): Scoring function for MoE routing.
- route_scale (float): Scaling factor for routing scores.
- q_lora_rank (int): LoRA rank for query projections.
- kv_lora_rank (int): LoRA rank for key-value projections.
- qk_nope_head_dim (int): Dimension for query-key projections without positional embeddings.
- qk_rope_head_dim (int): Dimension for query-key projections with rotary embeddings.
- v_head_dim (int): Dimension for value projections.
- original_seq_len (int): Original sequence length.
- rope_theta (float): Base for rotary positional encoding.
- rope_factor (float): Scaling factor for extended sequence lengths.
- beta_fast (int): Fast beta correction factor.
- beta_slow (int): Slow beta correction factor.
- mscale (float): Scaling factor for extended attention.
- index_n_heads (int): Number of heads for the index head (listed in the signature but not described in the original docstring).
- index_head_dim (int): Dimension for index head.
- index_topk (int): Top-k for index head.
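To make the default values above more concrete, the following minimal sketch derives a few quantities from them. `ConfigSketch` and `derived` are hypothetical names used only for illustration; they are not part of llmcompressor. The sketch assumes the usual DeepSeek-style conventions, which this page does not confirm: the per-head query-key dimension is the sum of the non-positional and rotary parts, every layer that is not dense is an MoE layer, and shared experts are active for every token alongside the routed ones.

```python
from dataclasses import dataclass


@dataclass
class ConfigSketch:
    """Hypothetical stand-in mirroring a few ModelConfig defaults (not the real class)."""

    max_seq_len: int = 4096 * 4
    original_seq_len: int = 4096
    n_layers: int = 27
    n_dense_layers: int = 1
    n_shared_experts: int = 2
    n_activated_experts: int = 6
    qk_nope_head_dim: int = 128
    qk_rope_head_dim: int = 64


def derived(cfg: ConfigSketch) -> dict:
    """Compute illustrative derived quantities under the assumptions stated above."""
    return {
        # Assumed: query/key head = non-positional part + rotary part.
        "qk_head_dim": cfg.qk_nope_head_dim + cfg.qk_rope_head_dim,
        # Assumed: all layers that are not dense are MoE layers.
        "n_moe_layers": cfg.n_layers - cfg.n_dense_layers,
        # Assumed: shared experts run for every token in addition to routed ones.
        "experts_per_token": cfg.n_activated_experts + cfg.n_shared_experts,
        # True when the configured context exceeds the original sequence length.
        "extends_context": cfg.max_seq_len > cfg.original_seq_len,
    }


summary = derived(ConfigSketch())
print(summary)
```

With the defaults listed in the signature, `max_seq_len` (16384) is four times `original_seq_len` (4096), which is the situation in which the rotary scaling parameters (`rope_factor`, `beta_fast`, `beta_slow`, `mscale`) would typically matter.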
Source code in src/llmcompressor/modeling/deepseekv32/config.py