llmcompressor.transformers.compression.compressed_tensors_utils
Functions:
- modify_save_pretrained – Overrides a PreTrainedModel's save_pretrained() method with a wrapped version that supports compression.
get_model_compressor
get_model_compressor(
model: Module,
sparsity_config: SparsityCompressionConfig | None = None,
quantization_format: str | None = None,
save_compressed: bool = True,
skip_sparsity_compression_stats: bool = True,
disable_sparse_compression: bool = False,
)
Obtain the compressor based on the config and the quantization_format
Parameters:
- model (Module) – torch model
- sparsity_config (SparsityCompressionConfig | None, default: None) – sparsity compression config
- quantization_format (str | None, default: None) – format that the model was quantized to. If not provided, it will be extrapolated from infer_quantization_format
- save_compressed (bool, default: True) – whether to save in a compressed format
- skip_sparsity_compression_stats (bool, default: True) – whether to skip the sparsity compression stats that are otherwise reported on stdout
- disable_sparse_compression (bool, default: False) – whether to skip sparse compression
Source code in src/llmcompressor/transformers/compression/compressed_tensors_utils.py
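As a hedged illustration of the signature above, the sketch below calls get_model_compressor with every documented keyword spelled out at its default. The helper name build_compressor is hypothetical and not part of the library; the import path is taken from the module name at the top of this page, and the return value is passed through untouched because this page does not describe it.

```python
from typing import Optional


def build_compressor(model, quantization_format: Optional[str] = None):
    """Hypothetical helper: call get_model_compressor with the documented defaults."""
    # Imported lazily so this snippet can be loaded even where llmcompressor
    # is not installed; the import only runs when the helper is called.
    from llmcompressor.transformers.compression.compressed_tensors_utils import (
        get_model_compressor,
    )

    return get_model_compressor(
        model=model,                               # a torch Module
        sparsity_config=None,                      # no explicit sparsity config
        quantization_format=quantization_format,   # None lets the library infer it
        save_compressed=True,
        skip_sparsity_compression_stats=True,
        disable_sparse_compression=False,
    )
```

Passing quantization_format=None relies on the inference behavior described in the parameter list, so the helper only needs the model in the common case.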
modify_save_pretrained
Overrides a PreTrainedModel's save_pretrained() method with a wrapped version that supports compression. The new save_pretrained function performs the following saving operations:
- Saves the model state, potentially in a compressed format
- Saves the recipe, appending any current recipes to existing recipe files
- Copies any necessary Python files from the model cache
Source code in src/llmcompressor/transformers/compression/compressed_tensors_utils.py
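The wrapping pattern described for modify_save_pretrained can be sketched with the standard library alone. This is a simplified illustration, not the library's implementation: TinyModel, wrap_save_pretrained, and the file names model.json and recipe.yaml are all hypothetical stand-ins, compression is left out, and the step that copies Python files from the model cache is omitted.

```python
import functools
import json
import tempfile
from pathlib import Path


class TinyModel:
    """Hypothetical stand-in for a PreTrainedModel (not a transformers class)."""

    def __init__(self):
        self.state = {"weight": [0.0, 1.5, 0.0, 2.0]}

    def save_pretrained(self, save_directory, **kwargs):
        out = Path(save_directory)
        out.mkdir(parents=True, exist_ok=True)
        (out / "model.json").write_text(json.dumps(self.state))


def wrap_save_pretrained(model, recipe_text):
    """Replace model.save_pretrained with a wrapper that also appends a recipe."""
    original_save = model.save_pretrained  # keep a handle on the bound method

    @functools.wraps(original_save)
    def save_pretrained_wrapper(save_directory, **kwargs):
        # Step 1: save the model state through the original method.
        original_save(save_directory, **kwargs)
        # Step 2: append the current recipe to any recipe file already present.
        recipe_path = Path(save_directory) / "recipe.yaml"
        with recipe_path.open("a") as handle:
            handle.write(recipe_text)

    # The instance attribute shadows the class method, so later calls to
    # model.save_pretrained(...) go through the wrapper.
    model.save_pretrained = save_pretrained_wrapper
    return model


with tempfile.TemporaryDirectory() as tmp:
    model = wrap_save_pretrained(TinyModel(), "demo_stage: {}\n")
    model.save_pretrained(tmp)
    saved_files = sorted(p.name for p in Path(tmp).iterdir())
    # saved_files is ['model.json', 'recipe.yaml']
```

Because the wrapper is installed on the instance, callers keep using the familiar save_pretrained() name and pick up the extra saving steps without changing their code.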
update_and_save_recipe
Save a recipe on top of any existing recipe files located at model_stub
Parameters:
- model_stub (str) – path to existing model or model stub which may contain an existing recipe
- save_directory (str) – path to save combined existing recipe and current recipe