llmcompressor.modeling.glm_moe_dsa

Classes:

CalibrationGlmMoeDsaMoE – Calibration version of GlmMoeDsaMoE that unpacks experts for sequential processing.

CalibrationGlmMoeDsaMoE

CalibrationGlmMoeDsaMoE(
    original: GlmMoeDsaMoE,
    config: GlmMoeDsaConfig,
    calibrate_all_experts: bool = True,
)

Bases: MoECalibrationModule

Calibration version of GlmMoeDsaMoE that unpacks experts for sequential processing.

This module:

1. Unpacks the packed expert weights (3D -> 2D) for calibration
2. Optionally sends all tokens to all experts during calibration
3. Stays in unpacked form (permanent) for vLLM compatibility
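The first two behaviours can be illustrated with a minimal pure-Python sketch. It is a toy model, not the llmcompressor implementation: `unpack_experts`, `matvec`, and `moe_forward` are hypothetical names, nested lists stand in for tensors, and the assumption that unrouted tokens are run through an expert but excluded from the combined output is ours.

```python
def unpack_experts(packed):
    """Split a packed 3D weight [num_experts][out][in] into per-expert 2D matrices."""
    return [[list(row) for row in expert] for expert in packed]


def matvec(weight, x):
    """Multiply a 2D weight (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def moe_forward(tokens, experts, routes, calibrate_all_experts=True):
    """routes[i] is a list of (expert_index, routing_weight) pairs for token i.

    Returns the combined outputs and, per expert, how many tokens it processed.
    """
    out_dim = len(experts[0])
    outputs = [[0.0] * out_dim for _ in tokens]
    tokens_seen = []
    for e, weight in enumerate(experts):
        # Tokens the router actually assigned to expert e, with their weights.
        routed = {i: w for i, r in enumerate(routes) for idx, w in r if idx == e}
        # Assumption: in calibration mode the expert runs on every token so that
        # its activation statistics are populated even if it is rarely selected.
        seen = range(len(tokens)) if calibrate_all_experts else sorted(routed)
        expert_out = {i: matvec(weight, tokens[i]) for i in seen}
        tokens_seen.append(len(expert_out))
        # Only routed tokens contribute to the final output in either mode.
        for i, scale in routed.items():
            outputs[i] = [o + scale * y for o, y in zip(outputs[i], expert_out[i])]
    return outputs, tokens_seen


packed = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0
    [[2.0, 0.0], [0.0, 2.0]],  # expert 1
    [[0.0, 1.0], [1.0, 0.0]],  # expert 2 (never routed below)
]
experts = unpack_experts(packed)
tokens = [[1.0, 2.0], [3.0, 4.0]]
routes = [[(0, 1.0)], [(1, 0.5)]]
outputs, tokens_seen = moe_forward(tokens, experts, routes, calibrate_all_experts=True)
```

In this sketch the combined output is the same in both modes; the flag only changes how many tokens each expert processes, which is what matters when collecting calibration statistics.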

Subclasses (e.g. CalibrationGlm4MoeLiteMoE) override _get_num_experts and _make_experts to handle model-specific config fields and MLP classes, while inheriting the shared routing and forward logic.
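The override pattern described above can be sketched roughly as follows. Only the hook names `_get_num_experts` and `_make_experts` come from this page; the `Toy...` classes, the `num_local_experts` field, and the tuple-based "experts" are hypothetical stand-ins chosen so the example runs without any dependencies.

```python
from types import SimpleNamespace


class ToyCalibrationMoE:
    """Hypothetical base class: the constructor calls two overridable hooks."""

    def __init__(self, original, config):
        self.num_experts = self._get_num_experts(config)
        self.experts = self._make_experts(config, original.experts)

    def _get_num_experts(self, config):
        # Base variant reads the expert count from n_routed_experts.
        return config.n_routed_experts

    def _make_experts(self, config, packed_experts):
        # Stand-in for building one MLP module per expert.
        return [("base_mlp", w) for w in packed_experts[: self.num_experts]]


class ToyLiteCalibrationMoE(ToyCalibrationMoE):
    """Hypothetical subclass: different config field and MLP class."""

    def _get_num_experts(self, config):
        return config.num_local_experts  # assumed field name, illustration only

    def _make_experts(self, config, packed_experts):
        return [("lite_mlp", w) for w in packed_experts[: self.num_experts]]


base = ToyCalibrationMoE(
    SimpleNamespace(experts=["w0", "w1", "w2"]),
    SimpleNamespace(n_routed_experts=3),
)
lite = ToyLiteCalibrationMoE(
    SimpleNamespace(experts=["w0", "w1"]),
    SimpleNamespace(num_local_experts=2),
)
```

Because the constructor only talks to the two hooks, a subclass does not need to redefine `__init__` or any routing logic.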

Source code in src/llmcompressor/modeling/glm_moe_dsa.py

def __init__(
    self,
    original: GlmMoeDsaMoE,
    config: GlmMoeDsaConfig,
    calibrate_all_experts: bool = True,
):
    super().__init__()
    self.top_k = config.num_experts_per_tok
    self.num_experts = self._get_num_experts(config)
    self.n_routed_experts = config.n_routed_experts
    self.n_group = config.n_group
    self.topk_group = config.topk_group
    self.norm_topk_prob = config.norm_topk_prob
    self.routed_scaling_factor = config.routed_scaling_factor

    self.experts = self._make_experts(config, original.experts)
    self.gate = original.gate
    self.shared_experts = original.shared_experts
    self.calibrate_all_experts = calibrate_all_experts
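
The constructor stores several routing parameters without showing how they are used. As a rough illustration only, the sketch below shows one common convention for `top_k`, `norm_topk_prob`, and `routed_scaling_factor` in routers of this style: pick the highest-scoring experts, optionally renormalise their weights to sum to 1, then scale them. This behaviour is an assumption, not something confirmed by the listing above; `route_token` is a hypothetical helper, and group-limited selection (`n_group`, `topk_group`) is deliberately left out.

```python
def route_token(scores, top_k, norm_topk_prob=True, routed_scaling_factor=1.0):
    """Return (expert_index, weight) pairs for one token's router scores."""
    # Indices of the top_k highest-scoring experts, best first.
    top = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)[:top_k]
    weights = [scores[e] for e in top]
    if norm_topk_prob:
        # Renormalise so the selected weights sum to 1.
        total = sum(weights)
        weights = [w / total for w in weights]
    # Apply the global scaling factor to the routed contribution.
    weights = [w * routed_scaling_factor for w in weights]
    return list(zip(top, weights))


pairs = route_token([0.1, 0.6, 0.3, 0.2], top_k=2, routed_scaling_factor=2.0)
```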