LLM Compressor Docs

vllm-project/llm-compressor

Key Models

The following models are among the most commonly used with LLM Compressor: DeepSeek V4, Qwen3.5, Qwen3.6, Kimi-K2.6, Gemma 4, Llama 4, and Mistral Large 3. Each model page contains quantization examples with tested configurations and recommended parameters.

DeepSeek V4

DeepSeek V4 with HCA, CSA, and mHC, quantized to FP8 + NVFP4.

Qwen3.5

Qwen3.5 vision-language and sparse MoE models.

Qwen3.6

Qwen3.6-35B-A3B sparse MoE model.

Kimi-K2.6

Moonshot AI's latest multimodal agentic model.

Gemma 4

Google's latest multimodal model.

Llama 4

Meta's Llama 4 Scout multimodal model.

Mistral Large 3

Mistral's 675B-parameter model.
