llmcompressor.modifiers.pruning.sparsegpt.base
Classes:
- SparseGPTModifier – Modifier for applying the one-shot SparseGPT algorithm to a model
SparseGPTModifier
Bases: SparsityModifierBase
Modifier for applying the one-shot SparseGPT algorithm to a model
Sample yaml:
test_stage:
  obcq_modifiers:
    SparseGPTModifier:
      sparsity: 0.5
      mask_structure: "2:4"
      dampening_frac: 0.001
      block_size: 128
      targets: ['Linear']
      ignore: ['re:.*lm_head']
Lifecycle:
- on_initialize
    - register_hook(module, calibrate_module, "forward")
- on_sequential_batch_end
    - sparsify_weight
- on_finalize
    - remove_hooks()
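The ordering in the lifecycle above can be sketched with a toy stand-in. This is a minimal illustration, not llmcompressor's implementation: `ToyModule`, `ToyModifier`, and all method bodies are hypothetical, and only the hook names and their ordering come from the list above.

```python
from typing import Callable, Dict, List


class ToyModule:
    """Hypothetical stand-in for a layer that supports forward hooks."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.forward_hooks: List[Callable] = []

    def forward(self, x: List[float]) -> List[float]:
        out = list(x)  # identity layer, enough for the illustration
        for hook in self.forward_hooks:
            hook(self, (x,), out)
        return out


class ToyModifier:
    """Mirrors the documented ordering; the bodies are assumptions."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.num_samples: Dict[str, int] = {}
        self._hooked: List[ToyModule] = []

    def calibrate_module(self, module, args, _output) -> None:
        # A real calibration hook would accumulate a Hessian here;
        # this toy version only counts forward passes per module.
        self.num_samples[module.name] = self.num_samples.get(module.name, 0) + 1

    def on_initialize(self, modules: List[ToyModule]) -> None:
        # Register the calibration hook on every targeted module.
        for module in modules:
            module.forward_hooks.append(self.calibrate_module)
            self._hooked.append(module)
        self.events.append("on_initialize")

    def on_sequential_batch_end(self) -> None:
        # Stand-in for sparsify_weight on every calibrated module.
        self.events.append(f"sparsify {sorted(self.num_samples)}")
        self.num_samples.clear()

    def on_finalize(self) -> None:
        # Remove every hook that on_initialize registered.
        for module in self._hooked:
            module.forward_hooks.remove(self.calibrate_module)
        self._hooked.clear()
        self.events.append("on_finalize")


layer = ToyModule("layer0")
modifier = ToyModifier()
modifier.on_initialize([layer])
layer.forward([1.0, 2.0])  # calibration passes trigger the hook
layer.forward([3.0, 4.0])
modifier.on_sequential_batch_end()
modifier.on_finalize()
```

After this run, `modifier.events` records the three stages in order and `layer.forward_hooks` is empty again, which is the property the `remove_hooks()` step is there to guarantee.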
Parameters:
- sparsity – Target sparsity for the compressed model, given as a fraction (e.g. 0.5)
- sparsity_profile – Can be set to 'owl' to use Outlier Weighed Layerwise Sparsity (OWL). For details, see the paper: https://arxiv.org/pdf/2310.05175
- mask_structure – String defining the structure of the mask to apply. Must be of the form N:M, where N and M are integers that define a custom block shape. Defaults to 0:0, which represents an unstructured mask.
- owl_m – Number of outliers to use for OWL
- owl_lmbda – Lambda value to use for OWL
- block_size – Number of columns to compress in one pass
- dampening_frac – Amount of dampening to apply to H, as a fraction of the diagonal norm
- preserve_sparsity_mask – Whether to preserve the existing sparsity mask when applying SparseGPT. Useful when starting from a previously pruned model. Defaults to False.
- offload_hessians – Set to True for decreased memory usage but increased runtime.
- targets – List of layer names to compress during SparseGPT, or 'ALL' to compress every layer in the model
- ignore – Optional list of module class names or submodule names to leave uncompressed even if they match a target. Defaults to an empty list.
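As an illustration of `mask_structure`, the sketch below parses an `N:M` string and builds a keep-mask for one row of weights. It rests on two assumptions not stated above: that `N` counts the weights zeroed in each group of `M` consecutive weights (for "2:4" either reading gives the same pattern), and that weights are ranked by magnitude, whereas SparseGPT ranks them with a Hessian-based saliency score. The helper names `parse_mask_structure` and `nm_magnitude_mask` are hypothetical.

```python
from typing import List, Tuple


def parse_mask_structure(mask_structure: str) -> Tuple[int, int]:
    """Split an "N:M" string into integers; "0:0" means unstructured."""
    n_str, m_str = mask_structure.split(":")
    n, m = int(n_str), int(m_str)
    if n < 0 or m < 0 or n > m:
        raise ValueError(f"invalid mask structure: {mask_structure!r}")
    return n, m


def nm_magnitude_mask(row: List[float], n: int, m: int) -> List[bool]:
    """Return True where a weight is kept, pruning n of every m weights."""
    if m == 0:
        return [True] * len(row)  # unstructured: no N:M constraint applied here
    if len(row) % m != 0:
        raise ValueError("row length must be a multiple of m")
    keep = [True] * len(row)
    for start in range(0, len(row), m):
        group = range(start, start + m)
        # Drop the n smallest-magnitude weights in this group (assumption:
        # magnitude ranking instead of SparseGPT's saliency ranking).
        for idx in sorted(group, key=lambda i: abs(row[i]))[:n]:
            keep[idx] = False
    return keep


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.01]
n, m = parse_mask_structure("2:4")
mask = nm_magnitude_mask(row, n, m)
pruned = [w if k else 0.0 for w, k in zip(row, mask)]
```

Each group of four consecutive weights ends up with exactly two zeros, so the row as a whole reaches 50% sparsity, matching the `sparsity: 0.5` and `mask_structure: "2:4"` pairing in the sample YAML.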
Methods:
- calibrate_module – Calibration hook used to accumulate the Hessian of the input to the module
- compress_modules – Sparsify modules which have been calibrated
calibrate_module
Calibration hook used to accumulate the Hessian of the input to the module
Parameters:
- module (Module) – module being calibrated
- args (tuple[Tensor, ...]) – inputs to the module, the first element of which is the canonical input
- _output (Tensor) – uncompressed module output, unused
Source code in src/llmcompressor/modifiers/pruning/sparsegpt/base.py
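To make the calibration step concrete, the sketch below accumulates a Hessian-like matrix from layer inputs and then applies dampening. It assumes the formulation from the SparseGPT paper, H = (2 / n) * sum(x x^T) over the n calibration inputs x, and it reads "fraction of the diagonal norm" as a fraction of the mean diagonal entry of H. Both are assumptions about this library rather than statements taken from this page, and the function names are hypothetical. Plain Python lists stand in for tensors.

```python
from typing import List

Matrix = List[List[float]]


def accumulate_hessian(H: Matrix, num_samples: int, batch: List[List[float]]) -> int:
    """Update H in place so it equals (2 / n) * sum(x x^T) over all inputs seen.

    Returns the new sample count n.
    """
    total = num_samples + len(batch)
    scale_old = num_samples / total  # re-weight the running average
    dim = len(H)
    for i in range(dim):
        for j in range(dim):
            new = sum(x[i] * x[j] for x in batch)
            H[i][j] = H[i][j] * scale_old + 2.0 * new / total
    return total


def dampen(H: Matrix, dampening_frac: float) -> None:
    """Add dampening_frac * mean(diag(H)) to every diagonal entry, in place."""
    dim = len(H)
    damp = dampening_frac * sum(H[i][i] for i in range(dim)) / dim
    for i in range(dim):
        H[i][i] += damp


H = [[0.0, 0.0], [0.0, 0.0]]
n = 0
n = accumulate_hessian(H, n, [[1.0, 0.0], [0.0, 2.0]])  # first calibration batch
n = accumulate_hessian(H, n, [[1.0, 1.0], [2.0, 0.0]])  # second calibration batch
dampen(H, dampening_frac=0.5)  # large value chosen only to make the effect visible
```

Accumulating batch by batch gives the same matrix as computing the sum over all four inputs at once, which is why the hook can run once per forward pass. Dampening only touches the diagonal, which keeps H well conditioned before it is inverted.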
compress_modules
Sparsify modules which have been calibrated