llmcompressor.observers.base
Classes:
- Observer – Base class for observers which compute quantization parameters given observations of weights, activations, or attention states.
- QParamsDict – Dictionary containing quantization parameters.
Observer
Bases: InternalModule, RegistryMixin
Base class for observers which compute quantization parameters given observations of weights, activations, or attention states.
Parameters:
- base_name (str) – string used to name the observer attribute
- args (QuantizationArgs) – quantization args used to calibrate and quantize the observed value
- **observer_kwargs – keyword arguments for observer initialization
Methods:
- attach – Called when the observer is attached to a module.
- detach – Called before the observer is deleted from a module.
- forward – Update observer statistics from observed value.
- fuse – Link all observers in the list with each other for shared global_scale.
- get_qparams – Compute quantization parameters from accumulated statistics.
- sync_activation_stats – All-reduce accumulated activation statistics across DDP ranks.
- update_statistics_from_observed – Update internal observer statistics (min_vals, max_vals) from observed tensor.
Source code in src/llmcompressor/observers/base.py
attach
Called when the observer is attached to a module. Subclasses can override to register hooks or initialize state.
Parameters:
- module (Module) – the module this observer is being attached to
detach
Called before the observer is deleted from a module. Subclasses can override to remove hooks and clean up module attributes.
Parameters:
- module (Module) – the module this observer is being removed from
forward
Update observer statistics from observed value.
Parameters:
- observed (Tensor) – value being observed
Returns:
- Observer – self, for method chaining
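Returning self from forward lets calls be chained. The following is a minimal sketch of that pattern in plain Python, not the library's implementation: `ToyMinMaxObserver` is a hypothetical stand-in, it operates on lists rather than tensors, and the running min/max update is an assumed statistic chosen for illustration.

```python
class ToyMinMaxObserver:
    """Hypothetical stand-in for an observer; not the llmcompressor class."""

    def __init__(self):
        self.min_val = None  # no statistics until the first observation
        self.max_val = None

    def forward(self, observed):
        # Fold the new batch into the running min/max.
        batch_min, batch_max = min(observed), max(observed)
        self.min_val = batch_min if self.min_val is None else min(self.min_val, batch_min)
        self.max_val = batch_max if self.max_val is None else max(self.max_val, batch_max)
        return self  # returning self is what enables chaining


obs = ToyMinMaxObserver().forward([0.5, -1.0]).forward([2.0, 0.25])
print(obs.min_val, obs.max_val)  # -1.0 2.0
```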
fuse
staticmethod
Link all observers in the list with each other for shared global_scale.
Parameters:
- observers (Iterable[Observer]) – list of observers to fuse together
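One way to picture "linking all observers with each other" is that each observer ends up holding references to every other observer in the group. The sketch below assumes that representation; `ToyObserver`, its `fused` attribute, and the `q`/`k`/`v` names are hypothetical and only illustrate the idea.

```python
from typing import Iterable, List


class ToyObserver:
    """Hypothetical observer that only tracks its name and its fused peers."""

    def __init__(self, name: str):
        self.name = name
        self.fused: List["ToyObserver"] = []

    @staticmethod
    def fuse(observers: Iterable["ToyObserver"]) -> None:
        observers = list(observers)  # the iterable may be a generator
        for obs in observers:
            # Every observer keeps a reference to all of the others.
            obs.fused = [other for other in observers if other is not obs]


q, k, v = ToyObserver("q"), ToyObserver("k"), ToyObserver("v")
ToyObserver.fuse([q, k, v])
print([o.name for o in q.fused])  # ['k', 'v']
```

Because fuse is a static method, it is called on the class with the whole group rather than on one observer.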
get_qparams
Compute quantization parameters from accumulated statistics.
For TENSOR_GROUP, global_scale is computed from the absmax of this observer and all fused observers. Fused observers must already have statistics — call observe_weight on all modules before calling get_qparams on any of them.
Returns:
- QParamsDict – dict with keys "scale", "zero_point", and "global_scale"
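The ordering requirement for TENSOR_GROUP follows from the shared absmax: it cannot be computed until every fused observer has statistics. A minimal sketch of that dependency, assuming each observer's statistics can be summarized as a min and a max; the dict layout, the `global_absmax` helper, and the error raised are hypothetical, and how global_scale is then derived from the absmax is not shown here.

```python
def global_absmax(observer, fused):
    """Largest absolute value seen by `observer` or any observer fused with it."""
    group = [observer, *fused]
    missing = [o["name"] for o in group if o["min"] is None or o["max"] is None]
    if missing:
        # Assumed failure mode: every fused observer needs statistics first.
        raise ValueError(f"no statistics yet for: {missing}")
    return max(max(abs(o["min"]), abs(o["max"])) for o in group)


q = {"name": "q", "min": -0.5, "max": 1.5}
k = {"name": "k", "min": -3.0, "max": 2.0}
v = {"name": "v", "min": None, "max": None}  # not observed yet

print(global_absmax(q, [k]))  # 3.0
```

Calling `global_absmax(q, [k, v])` in this sketch raises, because `v` has not been observed yet.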
sync_activation_stats
All-reduce accumulated activation statistics across DDP ranks.
Note: weight statistics do not need to be synced, since weights are already synced across ranks; only the data (activations) differs by rank.
Returns:
- List[Work] – list of async communication handles
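The effect of an all-reduce on activation statistics can be simulated without a distributed backend. The sketch below is an illustration only: it assumes the statistics are per-channel min/max values reduced with min and max respectively, and the `all_reduce` helper is a hypothetical single-process stand-in. The real method communicates across ranks and returns asynchronous handles rather than values.

```python
def all_reduce(per_rank_values, op):
    """Simulate an all-reduce: every rank ends up with the same reduced list."""
    reduced = [op(column) for column in zip(*per_rank_values)]
    return [list(reduced) for _ in per_rank_values]


# Each rank calibrated on different data, so its activation min/max differ.
rank_min_vals = [[-1.0, -0.2], [-0.5, -0.9]]
rank_max_vals = [[0.8, 1.1], [1.4, 0.3]]

synced_min = all_reduce(rank_min_vals, min)
synced_max = all_reduce(rank_max_vals, max)
print(synced_min[0], synced_max[0])  # [-1.0, -0.9] [1.4, 1.1]
```

After the reduction every rank holds identical statistics, so each would compute the same quantization parameters.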
update_statistics_from_observed
abstractmethod
Update internal observer statistics (min_vals, max_vals) from observed tensor.
Parameters:
- observed (Tensor) – flattened observed value of shape (num_observations, *qparam_shape, group_size)
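The documented shape suggests that statistics are reduced over the first (num_observations) and last (group_size) dimensions, leaving one value per entry of qparam_shape. A minimal sketch of that reduction, assuming a one-dimensional qparam_shape of (num_groups,) and nested lists in place of a tensor; `reduce_min_max` is a hypothetical helper, not a library function.

```python
def reduce_min_max(observed):
    """observed: nested lists shaped (num_observations, num_groups, group_size)."""
    num_groups = len(observed[0])
    min_vals, max_vals = [], []
    for g in range(num_groups):
        # Gather every value belonging to group g across all observations.
        values = [x for observation in observed for x in observation[g]]
        min_vals.append(min(values))
        max_vals.append(max(values))
    return min_vals, max_vals  # each shaped like qparam_shape == (num_groups,)


observed = [
    [[0.1, -0.4], [2.0, 1.0]],   # observation 0: two groups of size 2
    [[0.3, 0.9], [-1.5, 0.5]],   # observation 1
]
min_vals, max_vals = reduce_min_max(observed)
print(min_vals, max_vals)  # [-0.4, -1.5] [0.9, 2.0]
```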
QParamsDict
Bases: TypedDict
Dictionary containing quantization parameters.
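As an illustration of the TypedDict pattern, the sketch below declares a dictionary type with the three keys that get_qparams is documented to return. `ToyQParamsDict` is hypothetical, and the value types are assumptions made so the example runs without tensors; the actual field types are not stated on this page.

```python
from typing import List, Optional, TypedDict


class ToyQParamsDict(TypedDict):
    """Hypothetical mirror of the documented keys; value types are assumed."""

    scale: List[float]
    zero_point: List[int]
    global_scale: Optional[float]  # assumed to be unset outside TENSOR_GROUP


qparams: ToyQParamsDict = {"scale": [0.02], "zero_point": [0], "global_scale": None}
print(sorted(qparams))  # ['global_scale', 'scale', 'zero_point']
```

A TypedDict is an ordinary dict at runtime; the class only gives static type checkers the expected keys and value types.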