llmcompressor.entrypoints.model_free
Modules:
- helpers
- microscale
- process
- reindex_fused_weights
- save_utils
Functions:
- model_free_ptq – Quantize a model without the need for a model definition.
model_free_ptq
model_free_ptq(
    model_stub: str | PathLike,
    save_directory: str | PathLike,
    scheme: QuantizationScheme | str,
    ignore: Iterable[str] = tuple(),
    max_workers: int = 1,
    device: Optional[device | str] = None,
    converter: Converter | None = None,
)
Quantize a model without the need for a model definition. This function operates on a model stub or folder containing weights saved in safetensors files.
For microscale schemes (NVFP4, MXFP4), fused weight sets (q/k/v, gate/up) are handled correctly even when their members are split across shards. Each shard job receives a precomputed inverse_weight_map that specifies exactly which tensors to load from which files. This allows partial reads of each file, with no runtime discovery and no redundant tensor reads.
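The idea of a precomputed inverse weight map can be illustrated with a small, self-contained sketch. This is not the library's implementation: the helper names (`fused_key`, `build_inverse_weight_maps`), the `FUSED_GROUPS` table, and the rule that the shard holding the first member of a fused set owns the whole set are all assumptions made for illustration. The only input assumed is a tensor-name-to-file mapping like the `weight_map` found in a sharded checkpoint's `model.safetensors.index.json`.

```python
from collections import defaultdict

# Hypothetical table of projection names that form fused sets.
FUSED_GROUPS = (("q_proj", "k_proj", "v_proj"), ("gate_proj", "up_proj"))


def fused_key(name: str) -> str:
    """Return a key shared by every member of a fused set, or the name itself."""
    parts = name.split(".")
    for group in FUSED_GROUPS:
        for i, part in enumerate(parts):
            if part in group:
                parts[i] = "+".join(group)
                return ".".join(parts)
    return name


def build_inverse_weight_maps(
    weight_map: dict[str, str],
) -> dict[str, dict[str, list[str]]]:
    """Map each owning shard job to {source file: [tensor names to read]}."""
    # Group tensor names into units that must be processed together.
    units: dict[str, list[str]] = defaultdict(list)
    for name in sorted(weight_map):
        units[fused_key(name)].append(name)

    # Assumption: the shard holding the first member owns the whole unit,
    # so every tensor is assigned to exactly one job.
    jobs: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for members in units.values():
        owner = weight_map[members[0]]
        for name in members:
            jobs[owner][weight_map[name]].append(name)
    return {owner: dict(files) for owner, files in jobs.items()}


# Toy index: v_proj lives in a different shard than q_proj and k_proj.
weight_map = {
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "model-00002-of-00002.safetensors",
    "model.layers.0.mlp.down_proj.weight": "model-00002-of-00002.safetensors",
}
inverse_maps = build_inverse_weight_maps(weight_map)
```

In this toy case the job for the first shard is told to read `k_proj` and `q_proj` from its own file and `v_proj` from the second file, while the job for the second shard reads only `down_proj`. Because the plan is computed up front, no job has to scan other files to find its fused partners.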
Parameters:
- model_stub (str | PathLike) – Hugging Face Hub model stub or path to local weight files
- save_directory (str | PathLike) – directory to save the quantized weights to
- scheme (QuantizationScheme | str) – weight quantization scheme or preset scheme name
- ignore (Iterable[str], default: tuple()) – modules to ignore. Modules ending with "norm" are automatically ignored
- max_workers (int, default: 1) – number of worker threads used to process files
- device (Optional[device | str], default: None) – GPU device used to accelerate quantization
- converter (Converter | None, default: None) – optional converter applied to the checkpoint to convert it to compressed-tensors format before running model-free PTQ
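A call might look like the following sketch, which uses only the parameters documented above. The model stub, output directory, `"lm_head"` ignore entry, worker count, and device string are placeholder values, and whether the `"NVFP4"` preset name is accepted depends on the installed version. The import is deferred into a hypothetical wrapper, `run_model_free_ptq`, so the snippet can be loaded without the library installed.

```python
from pathlib import Path


def build_ptq_kwargs(model_stub: str, save_directory: Path) -> dict:
    """Collect keyword arguments for model_free_ptq (placeholder values)."""
    return {
        "model_stub": model_stub,          # Hub stub or local weights folder
        "save_directory": save_directory,  # where quantized shards are written
        "scheme": "NVFP4",                 # assumed preset name, taken from the text above
        "ignore": ["lm_head"],             # placeholder; "norm" modules are skipped automatically
        "max_workers": 4,                  # worker threads for processing files
        "device": "cuda:0",                # placeholder GPU device string
    }


def run_model_free_ptq(kwargs: dict) -> None:
    """Hypothetical wrapper: import lazily, then call the documented entrypoint."""
    from llmcompressor.entrypoints.model_free import model_free_ptq

    model_free_ptq(**kwargs)


# "org/model-name" is a placeholder, not a real checkpoint.
kwargs = build_ptq_kwargs("org/model-name", Path("model-name-NVFP4"))
# run_model_free_ptq(kwargs)  # uncomment once llmcompressor and a real stub are available
```

The `converter` argument is left at its default here, since it is only needed when the checkpoint must first be converted to compressed-tensors format.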