llmcompressor.utils.helpers
General utility helper functions. Common functions for interfacing with Python primitives and directories/files.
Functions:
- DisableQuantization – Disable quantization during forward passes after applying a quantization config.
- calibration_forward_context – Context in which all calibration forward passes should occur.
- disable_cache – Temporarily disable the key-value cache for transformer models.
- disable_hf_kernels – In transformers>=4.50.0, some module forward methods may be replaced by calls to hf hub kernels.
- disable_lm_head – Disable the lm_head of a model by moving it to the meta device.
- eval_context – Disable PyTorch training mode for the given module.
- import_from_path – Import a function or class from a path of the form module:name.
DisableQuantization
Disable quantization during forward passes after applying a quantization config
Source code in src/llmcompressor/utils/helpers.py
calibration_forward_context
Context in which all calibration forward passes should occur.
- Remove gradient calculations
- Disable the KV cache
- Disable train mode and enable eval mode
- Disable hf kernels which could bypass hooks
- Disable lm head (input and weights can still be calibrated, output will be meta)
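The list above describes one context that bundles several smaller ones. A minimal sketch of that composition pattern follows, using only the standard library. The names `recording_context` and `combined_context` are hypothetical, the string labels merely echo the bullets above, and the stand-in contexts only record when they are entered and exited. This is an illustration of the idea, not the library's implementation.

```python
import contextlib

events = []


@contextlib.contextmanager
def recording_context(name):
    """Hypothetical stand-in for one of the individual contexts."""
    events.append(f"enter {name}")
    try:
        yield
    finally:
        # Runs even if the body of the with-block raises
        events.append(f"exit {name}")


@contextlib.contextmanager
def combined_context(names):
    """Enter every named context in order; exit them in reverse order."""
    with contextlib.ExitStack() as stack:
        for name in names:
            stack.enter_context(recording_context(name))
        yield


with combined_context(["no_grad", "disable_cache", "eval_mode"]):
    events.append("forward pass")
```

`contextlib.ExitStack` unwinds the contexts in reverse order of entry, so whichever setting was changed last is restored first.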
disable_cache
Temporarily disable the key-value cache for transformer models. Used to prevent excess memory use in one-shot cases where the model only performs the prefill phase and not the generation phase.
Example:
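    import torch
    from transformers import AutoModel
    from llmcompressor.utils.helpers import disable_cache

    model = AutoModel.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
    input_ids = torch.randint(0, 32, size=(1, 32))
    with disable_cache(model):
        output = model(input_ids)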
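The "temporarily" in the description implies a save-and-restore pattern. A minimal sketch of that pattern, without any model, might look like the following. The helper `temporarily_set` is a hypothetical name, and the `SimpleNamespace` object is a stand-in for a model config. Toggling a `use_cache` flag is an assumption about how such a context could work, not a statement of what `disable_cache` does internally.

```python
import contextlib
from types import SimpleNamespace


@contextlib.contextmanager
def temporarily_set(obj, name, value):
    """Set obj.<name> to value inside the block, then restore the old value."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        # Restore the original value even if the block raised
        setattr(obj, name, previous)


# Stand-in for a model config with a cache flag (assumed attribute name)
config = SimpleNamespace(use_cache=True)

with temporarily_set(config, "use_cache", False):
    inside = config.use_cache  # value seen during the forward pass

after = config.use_cache  # value once the block has exited
```

Putting the restore step in `finally` means an exception during the forward pass still leaves the object in its original state.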
disable_hf_kernels
In transformers>=4.50.0, some module forward methods may be replaced by calls to hf hub kernels. These replacements can bypass hooks added by LLM Compressor.
disable_lm_head
Disable the lm_head of a model by moving it to the meta device. This function does not untie parameters, and it restores the model's proper loading upon exit.
eval_context
Disable PyTorch training mode for the given module.
import_from_path
Import a function or class from a path that gives the module and the object name, separated by a colon (:).
Examples:
    path = "/path/to/file.py:func_or_class_name"
    path = "/path/to/file:focn"
    path = "path.to.file:focn"
Parameters:
- path (str) – path including the file path and object name
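To illustrate the three path forms in the examples, here is a minimal standard-library sketch of a loader that accepts a file path (with or without `.py`) or a dotted module path, followed by a colon and an object name. The function `load_object` is a hypothetical name, and the way it resolves file paths is an assumption made for illustration. It is not the implementation of `import_from_path`.

```python
import importlib
import importlib.util
import os


def load_object(path: str):
    """Hypothetical loader for '<module or file>:<name>' strings."""
    # Split on the last colon so drive letters such as 'C:' stay in the module part
    module_part, sep, name = path.rpartition(":")
    if not sep or not module_part or not name:
        raise ValueError(f"expected '<module>:<name>', got {path!r}")

    looks_like_file = (
        "/" in module_part or os.sep in module_part or module_part.endswith(".py")
    )
    if looks_like_file:
        # The '.py' suffix is optional in the input
        file_path = module_part if module_part.endswith(".py") else module_part + ".py"
        module_name = os.path.splitext(os.path.basename(file_path))[0]
        spec = importlib.util.spec_from_file_location(module_name, file_path)
        if spec is None or spec.loader is None:
            raise ImportError(f"cannot load module from {file_path!r}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    else:
        # Dotted form such as 'package.module'
        module = importlib.import_module(module_part)

    return getattr(module, name)


# Dotted-path form, resolved through the normal import system
join = load_object("os.path:join")
```

Loading from a file location avoids any need for the file to sit on `sys.path`, while the dotted form relies on the regular import machinery.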