llmcompressor.pytorch.utils
Generic code used as utilities and helpers for PyTorch
Modules:
-
helpers–Utility / helper functions
-
sparsification–Helper functions for retrieving information related to model sparsification
-
sparsification_info–
Classes:
-
ModuleSparsificationInfo–Helper class for providing information related to torch Module parameters
Functions:
-
get_quantized_layers–Returns the names and modules of the quantized layers (Embedding, Linear, Conv2d, Conv3d) in a module
-
tensor_sparsity–Computes the sparsity of a tensor, i.e. the fraction of values that are zero
-
tensors_module_forward–Default function for calling into a model with data for a forward execution.
-
tensors_to_device–Default function for putting a tensor or collection of tensors to the proper device.
-
tensors_to_precision–Converts tensors to full (float32) or half (float16) precision
ModuleSparsificationInfo
Helper class that reports information about a torch Module's parameters and the amount of sparsification applied. Covers both pruning and quantization.
Parameters:
-
module(Module) –torch Module to analyze
-
state_dict(Optional[Dict[str, Tensor]], default:None) –optional state_dict to analyze in place of the torch model. This is used when analyzing an FSDP model, where the full weights may not be accessible
Attributes:
-
params_quantized(int) –number of parameters across quantized layers
-
params_quantized_percent(float) –percentage of parameters that have been quantized
-
params_sparse(int) –total number of sparse (0) trainable parameters in the model
-
params_sparse_percent(float) –percent of sparsified parameters in the entire model
-
params_total(int) –total number of trainable parameters in the model
Source code in src/llmcompressor/pytorch/utils/sparsification.py
params_quantized
property
Returns:
-
int–number of parameters across quantized layers
params_quantized_percent
property
Returns:
-
float–percentage of parameters that have been quantized
params_sparse
property
Returns:
-
int–total number of sparse (0) trainable parameters in the model
params_sparse_percent
property
Returns:
-
float–percent of sparsified parameters in the entire model
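As a rough illustration of what the sparsity attributes measure, the sketch below counts trainable parameters and how many of them are exactly zero, using plain PyTorch. The helper `count_sparse_params` is hypothetical and is not the class's implementation; reporting the percentage on a 0 to 100 scale is an assumption.

```python
import torch
from torch import nn


def count_sparse_params(module: nn.Module) -> tuple[int, int, float]:
    """Hypothetical helper: (total, zero-valued, percent zero) over trainable params."""
    total = 0
    zeros = 0
    for param in module.parameters():
        if not param.requires_grad:
            continue  # only trainable parameters are counted
        total += param.numel()
        zeros += int((param == 0).sum().item())
    # Assumption: percent is expressed on a 0-100 scale.
    percent = 100.0 * zeros / total if total else 0.0
    return total, zeros, percent


layer = nn.Linear(4, 2, bias=False)
with torch.no_grad():
    layer.weight.fill_(1.0)
    layer.weight[0].zero_()  # zero out one of the two weight rows

total, zeros, percent = count_sparse_params(layer)
print(total, zeros, percent)
```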
get_quantized_layers
Parameters:
-
module(Module) –the module to get the quantized layers from
Returns:
-
list[tuple[str, Module]]–a list containing the names and modules of the quantized layers (Embedding, Linear, Conv2d, Conv3d)
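A minimal sketch of how such a lookup could work, assuming a layer counts as quantized when some predicate says so. The function `find_quantized_layers`, the `is_quantized` predicate, and the `quantization_scheme` marker attribute used in the example are illustrative assumptions, not the library's actual criteria.

```python
from typing import Callable

from torch import nn

# Layer types named in the return description above.
QUANTIZABLE_TYPES = (nn.Embedding, nn.Linear, nn.Conv2d, nn.Conv3d)


def find_quantized_layers(
    module: nn.Module,
    is_quantized: Callable[[nn.Module], bool],
) -> list[tuple[str, nn.Module]]:
    """Hypothetical helper: collect (name, layer) pairs that pass the predicate."""
    return [
        (name, layer)
        for name, layer in module.named_modules()
        if isinstance(layer, QUANTIZABLE_TYPES) and is_quantized(layer)
    ]


model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
# Assumed marker: pretend only the last layer has been quantized.
model[2].quantization_scheme = object()

found = find_quantized_layers(model, lambda m: hasattr(m, "quantization_scheme"))
names = [name for name, _ in found]
print(names)
```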
Source code in src/llmcompressor/pytorch/utils/helpers.py
tensor_sparsity
Parameters:
-
tens(Tensor) –the tensor to calculate the sparsity for
-
dim(None | int | list[int] | tuple[int, ...], default:None) –the dimension(s) to split the calculations over, e.g. batch, channels, or a combination of them
Returns:
-
Tensor–the sparsity of the input tens, i.e. the fraction of values that are zero
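The calculation can be sketched in plain PyTorch as follows. `sparsity_sketch` is a hypothetical stand-in, not the library function, and it assumes that `dim` names the dimensions to keep, so that sparsity is averaged over all remaining dimensions.

```python
from typing import Optional, Sequence, Union

import torch


def sparsity_sketch(
    tens: torch.Tensor,
    dim: Optional[Union[int, Sequence[int]]] = None,
) -> torch.Tensor:
    """Hypothetical helper: fraction of zero entries, optionally per kept dimension."""
    zero_mask = (tens == 0).float()
    if dim is None:
        return zero_mask.mean()  # one scalar for the whole tensor
    dims = [dim] if isinstance(dim, int) else list(dim)
    kept = {d % tens.dim() for d in dims}  # normalize negative indices
    reduced = [d for d in range(tens.dim()) if d not in kept]
    if not reduced:
        return zero_mask  # every dimension kept: per-element 0/1 mask
    return zero_mask.mean(dim=reduced)


tens = torch.tensor([[0.0, 1.0, 0.0, 2.0], [0.0, 0.0, 0.0, 3.0]])
overall = sparsity_sketch(tens)         # 5 of 8 values are zero
per_row = sparsity_sketch(tens, dim=0)  # one value per row
print(overall, per_row)
```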
Source code in src/llmcompressor/pytorch/utils/helpers.py
tensors_module_forward
tensors_module_forward(
tensors: Tensor
| Iterable[Tensor]
| Mapping[Any, Tensor],
module: Module,
check_feat_lab_inp: bool = True,
) -> Any
Default function for calling into a model with data for a forward execution. Returns the model result. If tensors is an iterable, the features to pass into the model are taken from index 0 and the other indices are treated as labels.
Supported use cases: a single tensor, or an iterable whose first tensor is taken as the features to pass into the model
Parameters:
-
tensors(Tensor | Iterable[Tensor] | Mapping[Any, Tensor]) –the data to be passed into the model; if it is an iterable, the features are taken from index 0 and the other indices are treated as labels
-
module(Module) –the module to pass the data into
-
check_feat_lab_inp(bool, default:True) –True to check whether the incoming tensors look like features and labels, i.e. a tuple or list with 2 items (typical output from a data loader), and if so call into the model with only the first element, assuming it holds the features; False to skip the check
Returns:
-
Any–the result of calling into the model for a forward pass
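The dispatch described above can be sketched as follows. `forward_sketch` is a hypothetical stand-in, not the library function; unpacking a mapping as keyword arguments and any other iterable as positional arguments are assumptions, since the text above only spells out the single-tensor and features/labels cases.

```python
from collections.abc import Iterable, Mapping
from typing import Any

import torch
from torch import nn


def forward_sketch(tensors: Any, module: nn.Module, check_feat_lab_inp: bool = True) -> Any:
    """Hypothetical helper: call module on a tensor or a collection of tensors."""
    # A 2-item tuple or list is treated as (features, labels): keep the features.
    if check_feat_lab_inp and isinstance(tensors, (tuple, list)) and len(tensors) == 2:
        tensors = tensors[0]
    if isinstance(tensors, torch.Tensor):
        return module(tensors)
    if isinstance(tensors, Mapping):
        return module(**tensors)  # assumption: mapping keys are keyword arguments
    if isinstance(tensors, Iterable):
        return module(*tensors)  # assumption: items are positional arguments
    raise TypeError(f"unsupported input type: {type(tensors).__name__}")


model = nn.Linear(3, 1)
features = torch.ones(5, 3)
labels = torch.zeros(5)

# Typical data loader output: the labels are dropped before the forward pass.
out = forward_sketch((features, labels), model)
print(tuple(out.shape))
```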
Source code in src/llmcompressor/pytorch/utils/helpers.py
tensors_to_device
tensors_to_device(
tensors: Tensor | Iterable[Tensor] | dict[Any, Tensor],
device: str,
) -> Tensor | Iterable[Tensor] | dict[Any, Tensor]
Default function for putting a tensor or collection of tensors to the proper device. Returns the tensor references after being placed on the proper device.
Supported use cases:
- single tensor
- dictionary of single tensors
- dictionary of iterable of tensors
- dictionary of dictionary of tensors
- iterable of single tensors
- iterable of iterable of tensors
- iterable of dictionary of tensors
Parameters:
-
tensors(Tensor | Iterable[Tensor] | dict[Any, Tensor]) –the tensors or collection of tensors to put onto a device
-
device(str) –the string representing the device to put the tensors on, e.g. 'cpu', 'cuda', 'cuda:1'
Returns:
-
Tensor | Iterable[Tensor] | dict[Any, Tensor]–the tensors or collection of tensors after being placed on the device
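The nested use cases listed above suggest a recursive traversal. The sketch below shows one way to write it; `to_device_sketch` is a hypothetical stand-in, not the library function, and returning generic iterables as lists is an assumption.

```python
from collections.abc import Iterable, Mapping
from typing import Any

import torch


def to_device_sketch(tensors: Any, device: str) -> Any:
    """Hypothetical helper: move every tensor in a nested collection to device."""
    if isinstance(tensors, torch.Tensor):
        return tensors.to(device)
    if isinstance(tensors, Mapping):
        return {key: to_device_sketch(value, device) for key, value in tensors.items()}
    if isinstance(tensors, tuple):
        return tuple(to_device_sketch(item, device) for item in tensors)
    # Strings are iterable too, so exclude them to avoid endless recursion.
    if isinstance(tensors, Iterable) and not isinstance(tensors, (str, bytes)):
        return [to_device_sketch(item, device) for item in tensors]
    raise TypeError(f"unsupported input type: {type(tensors).__name__}")


batch = {
    "input_ids": torch.ones(2, 3),
    "extras": [torch.zeros(1), (torch.zeros(2),)],
}
moved = to_device_sketch(batch, "cpu")  # "cpu" keeps the example runnable anywhere
print(moved["input_ids"].device)
```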
Source code in src/llmcompressor/pytorch/utils/helpers.py
tensors_to_precision
tensors_to_precision(
tensors: Tensor | Iterable[Tensor] | dict[Any, Tensor],
full_precision: bool,
) -> Tensor | Iterable[Tensor] | dict[Any, Tensor]
Parameters:
-
tensors(Tensor | Iterable[Tensor] | dict[Any, Tensor]) –the tensors to change the precision of
-
full_precision(bool) –True for full precision (float32) and False for half precision (float16)
Returns:
-
Tensor | Iterable[Tensor] | dict[Any, Tensor]–the tensors converted to the desired precision
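A minimal sketch of the conversion, assuming the flag maps directly to `torch.float32` and `torch.float16`. `to_precision_sketch` is a hypothetical stand-in, not the library function, and it casts every tensor it finds regardless of the tensor's original dtype.

```python
from typing import Any

import torch


def to_precision_sketch(tensors: Any, full_precision: bool) -> Any:
    """Hypothetical helper: cast tensors in a dict, list, or tuple to float32 or float16."""
    dtype = torch.float32 if full_precision else torch.float16
    if isinstance(tensors, torch.Tensor):
        return tensors.to(dtype)
    if isinstance(tensors, dict):
        return {key: to_precision_sketch(value, full_precision) for key, value in tensors.items()}
    if isinstance(tensors, (list, tuple)):
        # Rebuild the same container type around the converted items.
        return type(tensors)(to_precision_sketch(item, full_precision) for item in tensors)
    raise TypeError(f"unsupported input type: {type(tensors).__name__}")


batch = {"x": torch.ones(2, 2), "pair": (torch.zeros(1), torch.zeros(1))}
halved = to_precision_sketch(batch, full_precision=False)
restored = to_precision_sketch(halved, full_precision=True)
print(halved["x"].dtype, restored["x"].dtype)
```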