# Quantization

Quantization trades off model precision for a smaller memory footprint, allowing large models to be run on a wider range of devices.

Contents:

- Supported Hardware
- AutoAWQ
- BitsAndBytes
- BitBLAS
- GGUF
- GPTQModel
- INT4 W4A16
- INT8 W8A8
- FP8 W8A8
- NVIDIA TensorRT Model Optimizer
- AMD Quark
- Quantized KV Cache
- TorchAO
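As a minimal illustration of the precision/memory trade-off, the sketch below performs symmetric absmax int8 quantization of a float32 tensor. This is a simplified stand-in for the schemes listed above, not the implementation any of them uses:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    # Symmetric absmax quantization: map [-max|w|, max|w|] onto [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover an approximation of the original values.
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 0.9], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
# q occupies 1 byte per element vs. 4 for float32 (4x smaller),
# at the cost of a rounding error bounded by scale / 2 per element.
```

Real schemes such as GPTQ, AWQ, or FP8 use more sophisticated calibration and per-group scaling, but the same storage-versus-accuracy trade-off applies.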