llmcompressor.pytorch.utils

Generic code used as utilities and helpers for PyTorch

Modules:

helpers –

Utility / helper functions
sparsification –

Helper functions for retrieving information related to model sparsification
sparsification_info –

Classes:

ModuleSparsificationInfo –

Helper class for providing information related to torch Module parameters and the amount of sparsification applied

Functions:

get_quantized_layers –

Return the names and modules of layers that carry a weight quantization scheme
tensor_sparsity –

Calculate the sparsity of a tensor (the fraction of elements equal to zero), optionally split over one or more dimensions
tensors_module_forward –

Default function for calling into a model with data for a forward execution.
tensors_to_device –

Default function for putting a tensor or collection of tensors to the proper device.
tensors_to_precision –

Convert a tensor or collection of tensors to full (float32) or half (float16) precision

ModuleSparsificationInfo

ModuleSparsificationInfo(
    module: Module,
    state_dict: Optional[Dict[str, Tensor]] = None,
)

Helper class for providing information related to torch Module parameters and the amount of sparsification applied. Includes information for both pruning and quantization.

Parameters:

module (Module) –

torch Module to analyze
state_dict (Optional[Dict[str, Tensor]], default: None ) –

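optional state_dict to analyze in place of the torch model. This is used when analyzing an FSDP model, where the full weights may not be accessible. Because a state_dict does not distinguish trainable from non-trainable entries (it can contain buffers), parameter counts derived from it may be overestimated.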

Attributes:

params_quantized (int) –

number of parameters across quantized layers
params_quantized_percent (float) –

percentage of parameters that have been quantized
params_sparse (int) –

total number of sparse (0) trainable parameters in the model
params_sparse_percent (float) –

percent of sparsified parameters in the entire model
params_total (int) –

total number of trainable parameters in the model

Source code in src/llmcompressor/pytorch/utils/sparsification.py

def __init__(
    self, module: Module, state_dict: Optional[Dict[str, torch.Tensor]] = None
):
    self.module = module

    if state_dict is not None:
        # when analyzing an FSDP model, the state_dict does not differentiate
        # between trainable and non-trainable parameters
        # (e.g. it can contain buffers), which means that
        # self.trainable_params may be overestimated
        self.trainable_params = state_dict
    else:
        if hasattr(module, "_hf_hook"):
            self.trainable_params = get_state_dict_offloaded_model(module)
        else:
            self.trainable_params = {
                k: v for k, v in self.module.named_parameters() if v.requires_grad
            }
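To make the two main branches of `__init__` concrete, here is a minimal plain-PyTorch sketch. `select_params` is a hypothetical helper (not part of llmcompressor) that mirrors only the `state_dict` and `named_parameters` branches; the `_hf_hook` offloading branch is left out. It shows why counts taken from a state_dict can come out larger: frozen parameters and buffers such as BatchNorm running statistics are included.

```python
import torch
from torch import nn


def select_params(module: nn.Module, state_dict=None) -> dict[str, torch.Tensor]:
    """Hypothetical helper mirroring two branches of ModuleSparsificationInfo.__init__
    (the accelerate `_hf_hook` branch is omitted)."""
    if state_dict is not None:
        # Every entry counts: buffers and frozen weights included
        return dict(state_dict)
    # Only parameters that require gradients count as "trainable"
    return {k: v for k, v in module.named_parameters() if v.requires_grad}


model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
model[0].bias.requires_grad_(False)  # freeze one parameter

from_params = select_params(model)
from_state = select_params(model, state_dict=model.state_dict())

print(sorted(from_params))  # ['0.weight', '1.bias', '1.weight']
print(len(from_state))      # 7: 4 parameters + running_mean, running_var, num_batches_tracked
```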

params_quantized `property`

params_quantized: int

Returns:

int –

number of parameters across quantized layers

params_quantized_percent `property`

params_quantized_percent: float

Returns:

float –

percentage of parameters that have been quantized

params_sparse `property`

params_sparse: int

Returns:

int –

total number of sparse (0) trainable parameters in the model

params_sparse_percent `property`

params_sparse_percent: float

Returns:

float –

percent of sparsified parameters in the entire model

params_total `property`

params_total: int

Returns:

int –

total number of trainable parameters in the model
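The sparsity properties above are counts over the trainable parameters collected in `__init__`. A minimal sketch of how such totals can be derived from a name-to-tensor dict like `self.trainable_params`; `count_sparsity` is hypothetical, and it returns a fraction in [0, 1] because this page does not state whether `params_sparse_percent` is scaled to 0 to 100.

```python
import torch


def count_sparsity(params: dict[str, torch.Tensor]) -> tuple[int, int, float]:
    """Hypothetical helper: (total elements, zero-valued elements, zero fraction)."""
    total = sum(t.numel() for t in params.values())
    zeros = sum(int((t == 0).sum()) for t in params.values())
    return total, zeros, (zeros / total if total else 0.0)


params = {
    "layer.weight": torch.tensor([[0.0, 1.5], [0.0, 0.0]]),
    "layer.bias": torch.tensor([0.3, 0.0]),
}
total, zeros, frac = count_sparsity(params)
print(total, zeros, round(frac, 3))  # 6 4 0.667
```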

get_quantized_layers

get_quantized_layers(
    module: Module,
) -> list[tuple[str, Module]]

Parameters:

module (Module) –

the module to get the quantized layers from

Returns:

list[tuple[str, Module]] –

a list of (name, module) pairs for every submodule that has a quantization_scheme with a non-None weights scheme and a weight attribute (typically Embedding, Linear, Conv2d, Conv3d)

Source code in src/llmcompressor/pytorch/utils/helpers.py

def get_quantized_layers(module: Module) -> list[tuple[str, Module]]:
    """
    :param module: the module to get the quantized layers from
    :return: a list containing the names and modules of the quantized layers
        (Embedding, Linear, Conv2d, Conv3d)
    """

    quantized_layers = []
    for name, mod in module.named_modules():
        if hasattr(mod, "quantization_scheme"):
            weight_scheme = getattr(mod.quantization_scheme, "weights", None)
            if weight_scheme is not None and hasattr(mod, "weight"):
                quantized_layers.append((name, mod))

    return quantized_layers
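In `get_quantized_layers`, a layer counts as quantized when it has a `quantization_scheme` whose `weights` entry is not `None` and it exposes a `weight` attribute. The sketch below exercises that rule on a toy model. The schemes are `types.SimpleNamespace` stand-ins (an assumption, not the real scheme class), and `quantized_layers` is a hypothetical copy of the listing's filter so the block runs without llmcompressor installed.

```python
from types import SimpleNamespace

from torch import nn


def quantized_layers(module: nn.Module) -> list[tuple[str, nn.Module]]:
    """Hypothetical copy of the filter used by get_quantized_layers."""
    found = []
    for name, mod in module.named_modules():
        scheme = getattr(mod, "quantization_scheme", None)
        # A layer qualifies only if its scheme quantizes weights and it has a weight
        if getattr(scheme, "weights", None) is not None and hasattr(mod, "weight"):
            found.append((name, mod))
    return found


model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
# Stand-in schemes (SimpleNamespace, not the real scheme class)
model[0].quantization_scheme = SimpleNamespace(weights="w-scheme", input_activations=None)
model[2].quantization_scheme = SimpleNamespace(weights=None, input_activations="a-scheme")

names = [name for name, _ in quantized_layers(model)]
print(names)  # ['0']: the second Linear only quantizes activations
```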

tensor_sparsity

tensor_sparsity(
    tens: Tensor,
    dim: None | int | list[int] | tuple[int, ...] = None,
) -> Tensor

Parameters:

tens (Tensor) –

the tensor to calculate the sparsity for
dim (None | int | list[int] | tuple[int, ...], default: None ) –

the dimension(s) to split the calculation over, e.g. to get one value per batch item, per channel, or per combination of dimensions

Returns:

Tensor –

the sparsity of the input tens, i.e. the fraction of elements that are zero

Source code in src/llmcompressor/pytorch/utils/helpers.py

def tensor_sparsity(
    tens: Tensor, dim: None | int | list[int] | tuple[int, ...] = None
) -> Tensor:
    """
    :param tens: the tensor to calculate the sparsity for
    :param dim: the dimension(s) to split the calculations over;
        ex, can split over batch, channels, or combos
    :return: the sparsity of the input tens, ie the fraction of numbers that are zero
    """
    if dim is None:
        zeros = (tens.cpu() == 0).sum()
        total = tens.numel()

        return zeros.float() / float(total)

    if isinstance(dim, int):
        dim = [dim]

    if max(dim) >= len(tens.shape):
        raise ValueError(
            "Unsupported dim given of {} in {} for tensor shape {}".format(
                max(dim), dim, tens.shape
            )
        )

    sum_dims = [ind for ind in range(len(tens.shape)) if ind not in dim]
    zeros = (tens == 0).sum(dim=sum_dims) if sum_dims else tens == 0
    total = numpy.prod(
        [tens.shape[ind] for ind in range(len(tens.shape)) if ind not in dim]
    )

    permute_order = sorted(
        ((d, len(dim) - i - 1) for i, d in enumerate(dim)), reverse=True
    )
    permute = [d[1] for d in permute_order]

    if permute != [i for i in range(len(permute))]:
        # need to permute to get desired dimensions at the front
        zeros = zeros.permute(*permute).contiguous()

    return zeros.float() / float(total)
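For dims given in ascending order, `tensor_sparsity(tens, dim)` amounts to the mean of `tens == 0` over every dimension not listed in `dim`. The plain-PyTorch sketch below computes the same quantities directly, as an illustration rather than a call into the library; the comments name the `tensor_sparsity` call each line corresponds to.

```python
import torch

w = torch.tensor([[0.0, 1.0, 0.0, 2.0],
                  [0.0, 0.0, 0.0, 3.0],
                  [4.0, 5.0, 6.0, 7.0]])
is_zero = (w == 0).float()

overall = is_zero.mean()        # ~ tensor_sparsity(w)
per_row = is_zero.mean(dim=1)   # ~ tensor_sparsity(w, dim=0): one value per row
per_col = is_zero.mean(dim=0)   # ~ tensor_sparsity(w, dim=1): one value per column

# 4D conv weight [out, in, kh, kw]: sparsity of each (out, in) kernel
conv_w = torch.randn(8, 4, 3, 3)
conv_w[conv_w.abs() < 0.5] = 0.0
per_kernel = (conv_w == 0).float().mean(dim=(2, 3))  # ~ tensor_sparsity(conv_w, dim=[0, 1])

print(overall.item())    # about 0.4167 (5 zeros out of 12)
print(per_row.tolist())  # [0.5, 0.75, 0.0]
print(per_kernel.shape)  # torch.Size([8, 4])
```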

tensors_module_forward

tensors_module_forward(
    tensors: Tensor
    | Iterable[Tensor]
    | Mapping[Any, Tensor],
    module: Module,
    check_feat_lab_inp: bool = True,
) -> Any

Default function for calling into a model with data for a forward execution. Returns the model result. If tensors is a tuple or list of exactly two items and check_feat_lab_inp is True, it is treated as a (features, labels) pair and only the features at index 0 are passed into the model.

Supported use cases: a single tensor (passed as the only positional argument), a mapping (unpacked as keyword arguments), and any other iterable (unpacked as positional arguments).

Parameters:

tensors (Tensor | Iterable[Tensor] | Mapping[Any, Tensor]) –

the data to be passed into the model. When check_feat_lab_inp is True, a tuple or list of two items is treated as (features, labels) and only the features at index 0 are passed in
module (Module) –

the module to pass the data into
check_feat_lab_inp (bool, default: True ) –

True to check whether the incoming tensors look like a (features, labels) pair, i.e. a tuple or list with 2 items (typical output from a data loader), and if so to call into the model with just the first element; False to skip this check

Returns:

Any –

the result of calling into the model for a forward pass

Source code in src/llmcompressor/pytorch/utils/helpers.py

def tensors_module_forward(
    tensors: Tensor | Iterable[Tensor] | Mapping[Any, Tensor],
    module: Module,
    check_feat_lab_inp: bool = True,
) -> Any:
    """
    Default function for calling into a model with data for a forward execution.
    Returns the model result.
    Note, if an iterable the features to be passed into the model are considered
    to be at index 0 and other indices are for labels.

    Supported use cases: single tensor,
    iterable with first tensor taken as the features to pass into the model

    :param tensors: the data to be passed into the model, if an iterable the features
        to be passed into the model are considered to be at index 0 and other indices
        are for labels
    :param module: the module to pass the data into
    :param check_feat_lab_inp: True to check if the incoming tensors looks like
        it's made up of features and labels ie a tuple or list with 2 items
        (typical output from a data loader) and will call into the model with just
        the first element assuming it's the features False to not check
    :return: the result of calling into the model for a forward pass
    """
    if isinstance(tensors, (tuple, list)) and len(tensors) == 2 and check_feat_lab_inp:
        # assume if this is a list or tuple of 2 items that it is made up of
        # (features, labels) pass the features into a recursive call for the model
        return tensors_module_forward(tensors[0], module, check_feat_lab_inp=False)

    match tensors:
        case Tensor():
            return module(tensors)

        case Mapping():
            return module(**tensors)

        case Iterable():
            return module(*tensors)

        case _:
            raise ValueError(
                f"unrecognized type for data given of {tensors.__class__.__name__}"
            )

tensors_to_device

tensors_to_device(
    tensors: Tensor | Iterable[Tensor] | dict[Any, Tensor],
    device: str,
) -> Tensor | Iterable[Tensor] | dict[Any, Tensor]

Default function for putting a tensor or collection of tensors on the proper device. Returns the tensors after they have been placed on the device. OrderedDicts and tuples keep their type, other mappings are returned as dicts, and other iterables are returned as lists.

Supported use cases:

- single tensor
- dictionary of single tensors
- dictionary of iterables of tensors
- dictionary of dictionaries of tensors
- iterable of single tensors
- iterable of iterables of tensors
- iterable of dictionaries of tensors

Parameters:

tensors (Tensor | Iterable[Tensor] | dict[Any, Tensor]) –

the tensors or collection of tensors to put onto a device
device (str) –

the string representing the device to put the tensors on, e.g. 'cpu', 'cuda', 'cuda:1'

Returns:

Tensor | Iterable[Tensor] | dict[Any, Tensor] –

the tensors or collection of tensors after being placed on the device

Source code in src/llmcompressor/pytorch/utils/helpers.py

def tensors_to_device(
    tensors: Tensor | Iterable[Tensor] | dict[Any, Tensor], device: str
) -> Tensor | Iterable[Tensor] | dict[Any, Tensor]:
    """
    Default function for putting a tensor or collection of tensors to the proper device.
    Returns the tensor references after being placed on the proper device.

    Supported use cases:
        - single tensor
        - Dictionary of single tensors
        - Dictionary of iterable of tensors
        - Dictionary of dictionary of tensors
        - Iterable of single tensors
        - Iterable of iterable of tensors
        - Iterable of dictionary of tensors

    :param tensors: the tensors or collection of tensors to put onto a device
    :param device: the string representing the device to put the tensors on,
        ex: 'cpu', 'cuda', 'cuda:1'
    :return: the tensors or collection of tensors after being placed on the device
    """
    match tensors:
        case Tensor():
            return tensors.to(device)

        case OrderedDict():
            return OrderedDict(
                [
                    (key, tensors_to_device(tens, device))
                    for key, tens in tensors.items()
                ]
            )

        case Mapping():
            return {
                key: tensors_to_device(tens, device) for key, tens in tensors.items()
            }

        case tuple():
            return tuple(tensors_to_device(tens, device) for tens in tensors)

        case Iterable():
            return [tensors_to_device(tens, device) for tens in tensors]

        case _:
            raise ValueError(
                f"unrecognized type for tensors given of {tensors.__class__.__name__}"
            )

tensors_to_precision

tensors_to_precision(
    tensors: Tensor | Iterable[Tensor] | dict[Any, Tensor],
    full_precision: bool,
) -> Tensor | Iterable[Tensor] | dict[Any, Tensor]

Parameters:

tensors (Tensor | Iterable[Tensor] | dict[Any, Tensor]) –

the tensors to change the precision of
full_precision (bool) –

True for full precision (float32) and False for half precision (float16)

Returns:

Tensor | Iterable[Tensor] | dict[Any, Tensor] –

the tensors converted to the desired precision

Source code in src/llmcompressor/pytorch/utils/helpers.py

def tensors_to_precision(
    tensors: Tensor | Iterable[Tensor] | dict[Any, Tensor], full_precision: bool
) -> Tensor | Iterable[Tensor] | dict[Any, Tensor]:
    """
    :param tensors: the tensors to change the precision of
    :param full_precision: True for full precision (float 32) and
        False for half (float 16)
    :return: the tensors converted to the desired precision
    """
    match tensors:
        case Tensor():
            return tensors.float() if full_precision else tensors.half()

        case OrderedDict():
            return OrderedDict(
                [
                    (key, tensors_to_precision(tens, full_precision))
                    for key, tens in tensors.items()
                ]
            )

        case Mapping():
            return {
                key: tensors_to_precision(tens, full_precision)
                for key, tens in tensors.items()
            }

        case tuple():
            return tuple(tensors_to_precision(tens, full_precision) for tens in tensors)

        case Iterable():
            return [tensors_to_precision(tens, full_precision) for tens in tensors]

        case _:
            raise ValueError(
                f"unrecognized type for tensors given of {tensors.__class__.__name__}"
            )
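Because the per-tensor rule calls `.float()` or `.half()` on every tensor regardless of its dtype, integer tensors such as token ids are converted to floating point too. The sketch below shows this on a dict batch; `to_precision` is a hypothetical one-tensor version of the rule, and the batch keys are illustrative.

```python
import torch


def to_precision(t: torch.Tensor, full_precision: bool) -> torch.Tensor:
    """Hypothetical helper applying the same per-tensor rule as tensors_to_precision."""
    return t.float() if full_precision else t.half()


batch = {
    "input_ids": torch.tensor([[101, 2049, 102]]),  # int64 token ids
    "pixel_values": torch.rand(1, 3, 2, 2, dtype=torch.float64),
}

full = {k: to_precision(v, full_precision=True) for k, v in batch.items()}
half = {k: to_precision(v, full_precision=False) for k, v in batch.items()}

print(full["input_ids"].dtype, full["pixel_values"].dtype)  # torch.float32 torch.float32
print(half["input_ids"].dtype)                              # torch.float16
# float16 has an 11-bit significand, so 2049 is not representable after the cast
print(int(half["input_ids"][0, 1]) != 2049)                 # True
```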
