Key Models
The following models are among the most commonly used with LLM Compressor: DeepSeek V4, Qwen3.5, Qwen3.6, Kimi-K2.6, Gemma 4, Llama 4, and Mistral Large 3. Each model page provides quantization examples with tested configurations and recommended parameters.
-
DeepSeek V4
DeepSeek V4 with HCA, CSA, and mHC, quantized to FP8 + NVFP4.
-
Qwen3.5
Qwen3.5 vision-language and sparse MoE models.
-
Qwen3.6
Qwen3.6-35B-A3B sparse MoE model.
-
Kimi-K2.6
Moonshot AI's latest multimodal agentic model.
-
Gemma 4
Google's latest multimodal model.
-
Llama 4
Meta's Llama 4 Scout multimodal model.
-
Mistral Large 3
Mistral's 675B-parameter model.