llmcompressor.modeling.qwen3_5_moe
Classes:
- CalibrationQwen3_5MoeSparseMoeBlock – Calibration version of Qwen3_5MoeSparseMoeBlock that unfuses 3D expert parameters into individual MLP modules
- SequentialQwen3_5MoeExperts – Unfuses 3D expert parameter tensors into individual Qwen3_5MoeMLP modules
CalibrationQwen3_5MoeSparseMoeBlock
CalibrationQwen3_5MoeSparseMoeBlock(
    original: Qwen3_5MoeSparseMoeBlock,
    config,
    calibrate_all_experts: bool = True,
)
Bases: MoECalibrationModule
Calibration version of Qwen3_5MoeSparseMoeBlock that unfuses the 3D expert parameters into individual MLP modules built from nn.Linear layers, so that each expert can be quantized individually. When calibrate_all_experts is True (the default), all tokens are sent to all experts during calibration.
is_permanent is set to True because the unfused structure must persist after calibration for quantization to target the individual nn.Linear expert weights.
Source code in src/llmcompressor/modeling/qwen3_5_moe.py
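The "all tokens to all experts" behaviour can be illustrated with a minimal PyTorch sketch. It is not the library's implementation: `moe_forward`, the toy router and the plain `nn.Linear` experts are hypothetical stand-ins, and the top-k softmax routing is an assumption. The point it shows is that every expert runs on every token when `calibrate_all_experts` is enabled, while only the routed tokens contribute to the block's output.

```python
import torch
from torch import nn


def moe_forward(x, router, experts, top_k, calibrate_all_experts=True):
    """Toy MoE forward pass (hypothetical helper, not the llmcompressor code)."""
    probs = torch.softmax(router(x), dim=-1)
    topk_w, topk_idx = torch.topk(probs, top_k, dim=-1)
    out = torch.zeros_like(x)
    tokens_seen = []  # number of tokens each expert actually processed
    for e, expert in enumerate(experts):
        # Tokens routed to expert e, and which top-k slot selected it.
        token_idx, slot = torch.where(topk_idx == e)
        if calibrate_all_experts:
            # Run the expert on every token so calibration sees all of them,
            # then keep only the rows that were actually routed here.
            tokens_seen.append(x.shape[0])
            expert_out = expert(x)[token_idx]
        else:
            tokens_seen.append(token_idx.numel())
            expert_out = expert(x[token_idx])
        weighted = expert_out * topk_w[token_idx, slot].unsqueeze(-1)
        out.index_add_(0, token_idx, weighted)
    return out, tokens_seen


torch.manual_seed(0)
hidden, num_experts, top_k = 16, 4, 2
router = nn.Linear(hidden, num_experts, bias=False)
experts = nn.ModuleList(
    [nn.Linear(hidden, hidden, bias=False) for _ in range(num_experts)]
)
x = torch.randn(6, hidden)

with torch.no_grad():
    out_all, seen_all = moe_forward(x, router, experts, top_k, True)
    out_routed, seen_routed = moe_forward(x, router, experts, top_k, False)
```

In this sketch both modes produce the same output; the difference is only in how many tokens each expert observes, which is what matters to calibration observers attached to the expert layers.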
SequentialQwen3_5MoeExperts
Bases: ModuleList
Unfuses 3D expert parameter tensors into individual Qwen3_5MoeMLP modules so that each expert's weights are nn.Linear and can be targeted by quantization with targets="Linear".
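As a rough illustration of what unfusing means, the sketch below splits a fused 3D weight tensor into one `nn.Linear` per expert. The helper name `unfuse_experts` and the `(num_experts, out_features, in_features)` layout are assumptions made for this example; the actual Qwen3_5Moe parameter names and layout may differ, and the real class builds full Qwen3_5MoeMLP modules rather than single linear layers.

```python
import torch
from torch import nn


def unfuse_experts(fused_weight: torch.Tensor) -> nn.ModuleList:
    """Split an assumed (num_experts, out_features, in_features) tensor
    into a ModuleList of bias-free nn.Linear modules (hypothetical helper)."""
    num_experts, out_features, in_features = fused_weight.shape
    experts = nn.ModuleList()
    for idx in range(num_experts):
        linear = nn.Linear(in_features, out_features, bias=False)
        with torch.no_grad():
            # Copy this expert's 2D slice into the Linear weight.
            linear.weight.copy_(fused_weight[idx])
        experts.append(linear)
    return experts


torch.manual_seed(0)
fused = torch.randn(4, 8, 16)  # hypothetical: 4 experts mapping 16 -> 8
experts = unfuse_experts(fused)
x = torch.randn(3, 16)

# Each unfused expert reproduces the matmul against its slice of the fused tensor.
matches = [
    torch.allclose(expert(x), x @ fused[idx].T, atol=1e-5)
    for idx, expert in enumerate(experts)
]

# After unfusing, every expert weight lives in an nn.Linear that
# class-based targeting can find.
linear_names = [
    name for name, module in experts.named_modules()
    if isinstance(module, nn.Linear)
]
```

Once the weights are held by `nn.Linear` modules, a module walk such as `named_modules()` finds them by class, which is the property that `targets="Linear"` relies on.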