LLM Compressor Docs
llmcompressor.entrypoints.model_free.lifecycle