llmcompressor.modifiers.quantization.calibration
Functions:
- calibrate_input_hook – Hook to calibrate input activations by accumulating statistics in the observer.
- calibrate_output_hook – Hook to calibrate output activations by accumulating statistics in the observer.
- freeze_module_quantization – Deletes observers when calibration is complete.
- get_modules – Extract all modules from parent modules and return a deduplicated list.
- initialize_observer – Initialize observer module and attach as submodule.
- observe – Run observers to accumulate statistics on modules.
- update_qparams – Compute quantization parameters from observer statistics and store on module.
calibrate_input_hook
Hook to calibrate input activations by accumulating statistics in the observer.
Source code in src/llmcompressor/modifiers/quantization/calibration.py
calibrate_output_hook
Hook to calibrate output activations by accumulating statistics in the observer.
Source code in src/llmcompressor/modifiers/quantization/calibration.py
freeze_module_quantization
Deletes observers when calibration is complete.
Apply to the full model with model.apply(freeze_module_quantization).
Parameters:
- module (Module) – module to freeze quantization for
Source code in src/llmcompressor/modifiers/quantization/calibration.py
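The model.apply(...) pattern mentioned above can be illustrated with a small, library-free sketch. Node is a hypothetical stand-in for torch.nn.Module, freeze_sketch is a hypothetical stand-in for freeze_module_quantization, and the "_observer" attribute suffix is an assumption made for illustration, not something this page confirms.

```python
class Node:
    """Hypothetical stand-in for torch.nn.Module with a recursive apply()."""

    def __init__(self, *children):
        self.children = list(children)

    def apply(self, fn):
        # Like torch.nn.Module.apply: visit every child first, then self.
        for child in self.children:
            child.apply(fn)
        fn(self)
        return self


def freeze_sketch(module):
    # Assumption: observers live in attributes whose names end in "_observer".
    for name in [n for n in vars(module) if n.endswith("_observer")]:
        delattr(module, name)


leaf = Node()
leaf.weight_observer = object()
leaf.input_observer = object()
model = Node(leaf)

# One call on the root reaches every nested module.
model.apply(freeze_sketch)
remaining = [n for n in vars(leaf) if n.endswith("_observer")]
```

Because apply() walks the whole tree, the per-module function only needs to handle a single module at a time.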
get_modules
Extract all modules from parent modules and return a deduplicated list preserving iteration order.
This is critical for DDP: all ranks must process modules in the same order to avoid NCCL deadlocks when collective operations (e.g., all_reduce) are called during observer synchronization.
Parameters:
- parents (Iterable[Module]) – iterable of parent modules
Returns:
- list[Module] – deduplicated list of all modules in iteration order
Source code in src/llmcompressor/modifiers/quantization/calibration.py
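As a rough illustration of order-preserving deduplication, the sketch below collects modules from several parents and keeps each one at the position where it was first seen. Node and dedup_modules are hypothetical names used only here; the actual implementation lives in the source file referenced above and may differ.

```python
from typing import Iterable, Iterator


class Node:
    """Hypothetical stand-in for torch.nn.Module (only what the sketch needs)."""

    def __init__(self, name: str, children: Iterable["Node"] = ()):
        self.name = name
        self.children = list(children)

    def modules(self) -> Iterator["Node"]:
        # Pre-order traversal: the node itself, then its descendants.
        yield self
        for child in self.children:
            yield from child.modules()


def dedup_modules(parents: Iterable[Node]) -> list:
    # Dict keys keep insertion order, so the first sighting of a module
    # fixes its position and later repeats are ignored.
    seen = {}
    for parent in parents:
        for module in parent.modules():
            seen.setdefault(id(module), module)
    return list(seen.values())


shared = Node("shared")
a = Node("a", [shared])
b = Node("b", [shared, Node("b_only")])
order = [m.name for m in dedup_modules([a, b])]
```

An unordered set would also remove duplicates, but its iteration order is not tied to traversal order, which is the property the DDP note above depends on.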
initialize_observer
Initialize an observer module and attach it as a submodule. The observer name is fetched from the quantization_args and used to load the observer from the registry; the loaded observer is then attached to the module. The attribute under which the observer is attached is named using the provided base_name.
This function always initializes memoryless observers for weights.
Parameters:
- module (Module) – torch.nn.Module that the observer is being attached to
- base_name (str) – str used to name the observer attribute
Source code in src/llmcompressor/modifiers/quantization/calibration.py
observe
Run observers to accumulate statistics on modules. Must be called before update_qparams.
Parameters:
- module (Module | Iterable[Module]) – module or iterable of modules with observer attributes
- base_name (str) – substring used to fetch the observer and value to observe
Source code in src/llmcompressor/modifiers/quantization/calibration.py
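The ordering requirement (observe first, then compute parameters) can be sketched with a generic running min/max observer. MinMaxObserver and its methods are hypothetical, and the asymmetric unsigned-integer formula below is one common scheme chosen for illustration, not necessarily what the library's observers compute.

```python
class MinMaxObserver:
    """Hypothetical running min/max observer (not the llmcompressor class)."""

    def __init__(self):
        self.min_val = None
        self.max_val = None

    def observe(self, values):
        # Accumulate statistics across calibration batches.
        lo, hi = min(values), max(values)
        self.min_val = lo if self.min_val is None else min(self.min_val, lo)
        self.max_val = hi if self.max_val is None else max(self.max_val, hi)

    def qparams(self, num_bits=8):
        if self.min_val is None:
            raise RuntimeError("observe() must run before qparams()")
        qmin, qmax = 0, 2**num_bits - 1
        # Include zero in the range so that 0.0 is exactly representable.
        lo, hi = min(self.min_val, 0.0), max(self.max_val, 0.0)
        # Fall back to 1.0 when the observed range is empty.
        scale = (hi - lo) / (qmax - qmin) or 1.0
        zero_point = round(qmin - lo / scale)
        return scale, zero_point


obs = MinMaxObserver()
obs.observe([-1.0, 0.5])
obs.observe([0.25, 3.0])
scale, zero_point = obs.qparams()
```

Calling qparams() on a fresh observer raises, which mirrors why statistics must be accumulated before parameters are computed.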
update_qparams
update_qparams(
module: Module | Iterable[Module],
base_name: str | Iterable[str],
only_update_onload: bool = False,
)
Compute quantization parameters from observer statistics and store on module.
For dynamic quantization, scale and zero-point (zp) updates are skipped, since both are computed at inference time. For non-TENSOR_GROUP strategies, global_scale is None and is therefore skipped.
only_update_onload: option to update only the onloaded value. This is useful for a temporary update, or in DDP situations where only one rank should update both the offload and the onload, to avoid multiple writes to the offload (the remaining ranks update only the onload).
Parameters:
- module (Module | Iterable[Module]) – torch.nn.Module with attached observer (or iterable of modules)
- base_name (str | Iterable[str]) – substring used to fetch the observer, scales, and zp. Can be a string or iterable of strings.
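Since base_name accepts either a string or an iterable of strings, a caller-side helper that normalizes both forms might look like the sketch below. as_list is a hypothetical helper, and how the library itself normalizes its arguments is not shown on this page.

```python
def as_list(value):
    # A str is itself iterable, so test for it first; otherwise "weight"
    # would be split into individual characters.
    if isinstance(value, str):
        return [value]
    return list(value)


single = as_list("weight")
several = as_list(("input", "output"))
```

Checking for str before iterating is the key design point: both forms then flow through the same loop over names.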