Adding a New Modifier
Modifiers are the core extension point in LLM Compressor. Each compression algorithm — GPTQ, AWQ, SmoothQuant, and others — is implemented as a modifier. This tutorial walks through the modifier contract, lifecycle, and how to implement a custom modifier.
What is a Modifier?
A modifier is a subclass of the Modifier base class that hooks into the compression pipeline at well-defined lifecycle points. When you call oneshot, LLM Compressor:
- Instantiates modifiers from the recipe
- Calls `initialize` on each modifier, which calls `on_initialize`
- For each calibration pipeline:
  - Dispatches the model
  - For each calibration epoch:
    - Fires the `CALIBRATION_START` event, calling `on_calibration_start`
    - Runs calibration forward passes (quantization disabled)
    - Fires the `SEQUENTIAL_EPOCH_END` event after each layer group (sequential pipeline) or once for the entire model (basic/data-free pipelines), calling `on_sequential_epoch_end`
    - Fires the `CALIBRATION_END` event, calling `on_calibration_end`
- Calls `finalize` on each modifier, which calls `on_finalize`
Modifiers express what they want to do at each stage by implementing lifecycle hooks.
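To make the ordering concrete, here is a toy, dependency-free simulation of the hook order listed above. `ToyModifier` and `run_toy_oneshot` are hypothetical names invented for this sketch. The real hooks also receive `state` and `event` arguments, and the real pipelines do far more work between events.

```python
# Toy simulation of the documented hook order for ONE calibration pipeline.
# ToyModifier and run_toy_oneshot are hypothetical, not LLM Compressor APIs.


class ToyModifier:
    def __init__(self):
        self.calls = []  # hook names, in the order they were called

    def on_initialize(self):
        self.calls.append("on_initialize")

    def on_calibration_start(self):
        self.calls.append("on_calibration_start")

    def on_sequential_epoch_end(self):
        self.calls.append("on_sequential_epoch_end")

    def on_calibration_end(self):
        self.calls.append("on_calibration_end")

    def on_finalize(self):
        self.calls.append("on_finalize")


def run_toy_oneshot(modifiers, num_layer_groups, sequential=True, epochs=1):
    for m in modifiers:
        m.on_initialize()
    for _ in range(epochs):
        for m in modifiers:
            m.on_calibration_start()
        # Sequential pipeline: one SEQUENTIAL_EPOCH_END per layer group.
        # Basic/data-free pipelines: a single one covering the whole model.
        groups = num_layer_groups if sequential else 1
        for _ in range(groups):
            # (calibration forward passes for this group would run here)
            for m in modifiers:
                m.on_sequential_epoch_end()
        for m in modifiers:
            m.on_calibration_end()
    for m in modifiers:
        m.on_finalize()
```

With three layer groups, the sequential variant calls `on_sequential_epoch_end` three times per epoch, while `sequential=False` calls it once.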
The Modifier Contract
All modifiers subclass llmcompressor.modifiers.Modifier and must implement on_initialize. All other lifecycle hooks are optional.
```python
from llmcompressor.core import Event, State
from llmcompressor.modifiers import Modifier


class MyModifier(Modifier):
    # Pydantic fields: declare your parameters here
    my_param: float = 1.0

    def on_initialize(self, state: State, **kwargs) -> bool:
        # Called once before calibration begins.
        # Set up hooks, attach attributes to modules, etc.
        # Return True if initialization succeeded.
        ...
        return True

    def on_finalize(self, state: State, **kwargs) -> bool:
        # Called after calibration completes.
        # Clean up hooks, apply final transformations, etc.
        # Return True if finalization succeeded.
        ...
        return True

    def on_event(self, state: State, event: Event, **kwargs):
        # Called on every event, unconditionally, before lifecycle events are
        # dispatched. Override to respond to custom event types or to implement
        # cross-cutting behavior.
        ...

    ## Training lifecycle events ##

    def on_start(self, state: State, event: Event, **kwargs):
        # Called when the modifier starts based on the `start` parameter.
        # The base class automatically dispatches this when
        # `start <= event.current_index`.
        # For training scenarios with explicit start/end steps.
        ...

    def on_update(self, state: State, event: Event, **kwargs):
        # Called on every event while the modifier is active (between on_start
        # and on_end). Rarely needed: only useful for per-batch callbacks such
        # as dynamic pruning schedules. Compression modifiers (GPTQ, AWQ, etc.)
        # do not use this hook.
        ...

    def on_end(self, state: State, event: Event, **kwargs):
        # Called when the modifier ends based on the `end` parameter.
        # The base class automatically dispatches this once
        # `event.current_index >= end`.
        # For training scenarios with explicit start/end steps.
        ...

    ## Calibration lifecycle events ##

    def on_calibration_start(self, state: State, event: Event, **kwargs):
        # Called at the start of each calibration epoch.
        # This is where most compression modifiers initialize their state for
        # the calibration pass (e.g., set up observers, reset statistics).
        ...

    def on_sequential_epoch_end(self, state: State, event: Event, **kwargs):
        # Called at the end of a sequential layer group (sequential pipeline) or
        # once for the entire model (basic/data-free pipelines).
        # This is where quantization modifiers compute weight and activation
        # quantization parameters.
        ...

    def on_calibration_end(self, state: State, event: Event, **kwargs):
        # Called at the end of each calibration epoch.
        # This is where most compression modifiers finalize their calibration
        # (e.g., apply compression, enable quantization).
        ...
```
Lifecycle Summary
| Hook | When it runs | Required |
|---|---|---|
| `on_initialize` | Once, during `initialize()`, before calibration | Yes |
| `on_finalize` | Once, during `finalize()`, after all pipelines complete | No |
| `on_event` | Every event, unconditionally (before lifecycle dispatch) | No |
| `on_calibration_start` | At the start of each calibration epoch (fired by the `CALIBRATION_START` event) | No |
| `on_sequential_epoch_end` | After each layer group (sequential) or once for the entire model (basic/data-free) (fired by the `SEQUENTIAL_EPOCH_END` event) | No |
| `on_calibration_end` | At the end of each calibration epoch (fired by the `CALIBRATION_END` event) | No |
| `on_start` | When `start <= event.current_index` (training lifecycle, auto-dispatched) | No |
| `on_update` | Every event while active (between `on_start` and `on_end`); rarely used | No |
| `on_end` | When `event.current_index >= end` (training lifecycle, auto-dispatched) | No |
Note on calibration vs. training lifecycle: The base `Modifier` class provides first-class support for both calibration and training lifecycles:

- Calibration lifecycle (`on_calibration_start`, `on_sequential_epoch_end`, `on_calibration_end`): these hooks are automatically dispatched when the calibration pipeline fires the corresponding event types. All compression modifiers (GPTQ, AWQ, SmoothQuant, SparseGPT, etc.) use these hooks.
- Training lifecycle (`on_start`, `on_update`, `on_end`): these hooks are auto-dispatched based on the `start` and `end` parameters. They are primarily used for training scenarios with explicit step ranges (e.g., dynamic pruning schedules).
- `on_event`: called on every event before any lifecycle dispatch. Use it for cross-cutting behavior or custom event types.

Note on `SEQUENTIAL_EPOCH_END`: This event is fired by all pipelines (sequential, basic, and data-free), not just the sequential pipeline. For the sequential pipeline, it fires after each layer group with a subgraph scoped to that group. For basic and data-free pipelines, it fires once with a subgraph covering the entire model. Built-in quantization modifiers (`QuantizationModifier`, `GPTQModifier`) use this event to compute weight and activation quantization parameters: DDP synchronization of activation statistics, weight observation, and qparam computation (for both activations and weights) all happen here.

Quantization is disabled during calibration. All pipelines disable quantization during calibration forward passes via `DisableQuantization`, so calibration hooks see unquantized activations. Quantization parameters are computed at `SEQUENTIAL_EPOCH_END`, and quantization is enabled at `CALIBRATION_END`.
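The routing described in the first note can be pictured with a small sketch. Everything here (`ToyEventType`, `ToyEvent`, `ToyModifierBase`, `dispatch`, `Recorder`) is hypothetical and stdlib-only. It illustrates the documented behavior (`on_event` first, then the matching calibration hook) and is not the actual `Modifier` implementation.

```python
# Simplified sketch of event-to-hook routing. All names are hypothetical
# stand-ins, not the real llmcompressor.core / Modifier code.
from dataclasses import dataclass
from enum import Enum, auto


class ToyEventType(Enum):
    CALIBRATION_START = auto()
    SEQUENTIAL_EPOCH_END = auto()
    CALIBRATION_END = auto()


@dataclass
class ToyEvent:
    type_: ToyEventType


class ToyModifierBase:
    # Event type -> name of the calibration hook it should trigger
    _ROUTES = {
        ToyEventType.CALIBRATION_START: "on_calibration_start",
        ToyEventType.SEQUENTIAL_EPOCH_END: "on_sequential_epoch_end",
        ToyEventType.CALIBRATION_END: "on_calibration_end",
    }

    def dispatch(self, event):
        # on_event runs first for every event, routed or not
        self.on_event(event)
        hook_name = self._ROUTES.get(event.type_)
        if hook_name is not None:
            getattr(self, hook_name)(event)

    # Default hooks are no-ops, so subclasses override only what they need
    def on_event(self, event):
        pass

    def on_calibration_start(self, event):
        pass

    def on_sequential_epoch_end(self, event):
        pass

    def on_calibration_end(self, event):
        pass


class Recorder(ToyModifierBase):
    """Records which hooks fire; overrides on_event and one calibration hook."""

    def __init__(self):
        self.log = []

    def on_event(self, event):
        self.log.append(("on_event", event.type_.name))

    def on_calibration_end(self, event):
        self.log.append(("on_calibration_end", event.type_.name))
```

Dispatching a `SEQUENTIAL_EPOCH_END` event to a `Recorder` logs only the `on_event` call, because its `on_sequential_epoch_end` is the inherited no-op.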
The State Object
state.model gives you the torch.nn.Module being compressed. This is the primary object you will interact with in most hooks.
Pydantic Parameters
The `Modifier` base class is a subclass of the Pydantic `BaseModel` class, meaning that all algorithm parameters are declared as class-level fields. This structure allows modifiers to be instantiated directly as Python objects or loaded from a YAML recipe.
```python
from pydantic import Field

from llmcompressor.modifiers import Modifier


class MyModifier(Modifier):
    targets: list[str] = Field(default_factory=lambda: ["Linear"])
    scale_factor: float = 0.5
    ignore: list[str] = Field(default_factory=list)
```
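Because the real base class is a Pydantic model, the snippet above needs LLM Compressor installed to run. The following stdlib sketch uses a `dataclass` as a stand-in to show the same idea: declared fields with defaults (including per-instance list defaults via a factory) and construction from either keyword arguments or a mapping such as a parsed recipe. `ToyModifierParams` is hypothetical, and unlike Pydantic it validates only what its hand-written check covers.

```python
# Stdlib stand-in for the Pydantic field pattern. ToyModifierParams is
# hypothetical; real modifiers subclass Modifier (a Pydantic BaseModel).
from dataclasses import dataclass, field


@dataclass
class ToyModifierParams:
    targets: list[str] = field(default_factory=lambda: ["Linear"])
    scale_factor: float = 0.5
    ignore: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Minimal hand-written check; Pydantic validates field types for you
        if not isinstance(self.scale_factor, (int, float)):
            raise TypeError("scale_factor must be a number")


# 1) Direct instantiation in Python
direct = ToyModifierParams(scale_factor=0.25, ignore=["lm_head"])

# 2) Instantiation from a mapping, shaped like parsed recipe parameters
recipe_args = {"scale_factor": 0.25, "ignore": ["lm_head"]}
from_recipe = ToyModifierParams(**recipe_args)
```

Both paths produce equal objects, and each instance gets its own `ignore` list because the default comes from a factory.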
Attaching Hooks with HooksMixin
Modifier inherits from HooksMixin, which provides a managed way to register PyTorch forward hooks. Hooks registered through HooksMixin are automatically removed when finalize is called.
```python
from llmcompressor.core import State
from llmcompressor.modifiers import Modifier


class MyModifier(Modifier):
    def on_initialize(self, state: State, **kwargs) -> bool:
        # Attach a forward hook to every Linear-like module
        for name, module in state.model.named_modules():
            if "Linear" in type(module).__name__:
                self.register_hook(
                    module,
                    self._forward_hook,
                    "forward",
                )
        return True

    def _forward_hook(self, module, inputs, output):
        """
        Runs after every forward pass through this module.

        Using the `HooksMixin` interface ensures that your modifier's hooks are
        enabled during calibration passes through the model and disabled during
        propagation passes through the model. See
        [Sequential Pipeline](src/llmcompressor/pipelines/sequential/pipeline.py)
        for more information.
        """
        ...
```
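A minimal, hypothetical model of the bookkeeping behind `HooksMixin`: hooks registered through the mixin are tracked so they can all be removed at once, and they can be switched off temporarily. `ToyModule`, `ToyHooksMixin`, `disable_hooks`, and `remove_hooks` are names chosen for this sketch, and `ToyModule` only imitates the shape of a PyTorch forward-hook call, `(module, inputs, output)`.

```python
# Toy model of managed hooks. ToyModule and ToyHooksMixin are hypothetical,
# not the real torch.nn.Module or llmcompressor HooksMixin.
from contextlib import contextmanager


class ToyModule:
    def __init__(self):
        self._forward_hooks = {}
        self._next_id = 0

    def register_forward_hook(self, fn):
        hook_id = self._next_id
        self._next_id += 1
        self._forward_hooks[hook_id] = fn
        # Return a removal callback, similar in spirit to a removable handle
        return lambda: self._forward_hooks.pop(hook_id, None)

    def __call__(self, x):
        output = x * 2  # stand-in for the real forward computation
        for fn in list(self._forward_hooks.values()):
            fn(self, (x,), output)
        return output


class ToyHooksMixin:
    def __init__(self):
        self._removers = []
        self._enabled = True

    def register_hook(self, module, fn):
        def guarded(mod, inputs, output):
            if self._enabled:  # skip the hook while hooks are disabled
                fn(mod, inputs, output)

        self._removers.append(module.register_forward_hook(guarded))

    @contextmanager
    def disable_hooks(self):
        self._enabled = False
        try:
            yield
        finally:
            self._enabled = True

    def remove_hooks(self):
        # Remove every hook this mixin registered, in one call
        for remove in self._removers:
            remove()
        self._removers.clear()
```

Keeping every removal handle in one place is what lets cleanup at finalize happen in a single call instead of each hook being tracked and removed by hand.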
Example: A Weight-Clamping Modifier
The following modifier clamps the stored weight tensors of all Linear layers to a fixed absolute magnitude after calibration completes (CALIBRATION_END).
```python
import torch
from compressed_tensors.utils import match_named_modules
from pydantic import Field, PrivateAttr

from llmcompressor.core import Event, State
from llmcompressor.modifiers import Modifier


class WeightClampModifier(Modifier):
    """
    Clamps the magnitude of Linear layer weight tensors to a maximum absolute
    value. All target modules are clamped at once on CALIBRATION_END.

    :param max_weight_magnitude: maximum allowed absolute value for any weight
    :param targets: module types to target
    :param ignore: module names to skip
    """

    max_weight_magnitude: float = 1.0
    targets: list[str] = Field(default_factory=lambda: ["Linear"])
    ignore: list[str] = Field(default_factory=list)

    # Names of modules already clamped, so later calibration epochs skip them
    _clamped: set[str] = PrivateAttr(default_factory=set)

    def on_initialize(self, state: State, **kwargs) -> bool:
        if self.max_weight_magnitude <= 0:
            raise ValueError("max_weight_magnitude must be positive")

        # Verify that at least one target module exists in the model
        matched = list(match_named_modules(state.model, self.targets, self.ignore))
        if not matched:
            raise ValueError(
                f"No modules matched targets={self.targets} ignore={self.ignore}"
            )
        return True

    def on_calibration_end(self, state: State, event: Event, **kwargs):
        # Clamp all target modules at the end of calibration
        for name, module in match_named_modules(
            state.model, self.targets, self.ignore
        ):
            if name in self._clamped:
                continue
            with torch.no_grad():
                module.weight.clamp_(
                    -self.max_weight_magnitude,
                    self.max_weight_magnitude,
                )
            self._clamped.add(name)

    def on_finalize(self, state: State, **kwargs) -> bool:
        self._clamped.clear()
        return True
```
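To see the effect of `clamp_(-m, m)` without PyTorch, here is a pure-Python sketch on nested lists. `clamp_matrix` is a hypothetical helper: values outside `[-m, m]` saturate at the nearest bound, and values inside are left unchanged.

```python
# Pure-Python illustration of elementwise clamping to [-max_magnitude, max_magnitude].
# clamp_matrix is a hypothetical helper on nested lists, not a torch API.


def clamp_matrix(weights, max_magnitude):
    if max_magnitude <= 0:
        raise ValueError("max_magnitude must be positive")
    # Returns a new nested list; the input is not modified
    return [
        [min(max(w, -max_magnitude), max_magnitude) for w in row]
        for row in weights
    ]


weights = [[-2.0, 0.3], [0.7, -0.1]]
clamped = clamp_matrix(weights, 0.5)
# clamped == [[-0.5, 0.3], [0.5, -0.1]]
```

Because clamping is idempotent, re-running it on already clamped weights changes nothing. In `WeightClampModifier`, the `_clamped` set therefore saves redundant work across calibration epochs rather than protecting correctness.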
Using the Modifier with oneshot
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot

# WeightClampModifier is the class defined in the previous example
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

oneshot(
    model=model,
    recipe=[
        WeightClampModifier(
            max_weight_magnitude=0.5, targets=["Linear"], ignore=["lm_head"]
        )
    ],
)

# Save the clamped model and its tokenizer
model.save_pretrained("Qwen3-0.6B-clamped")
tokenizer.save_pretrained("Qwen3-0.6B-clamped")
```
Using the Modifier from a YAML Recipe
```yaml
weight_clamp_stage:
  weight_clamp_modifiers:
    WeightClampModifier:
      max_weight_magnitude: 0.5
      targets:
        - Linear
      ignore:
        - lm_head
```
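The nesting in the YAML recipe (stage, then modifier group, then modifier class name, then parameters) can be walked with a few lines of Python. This sketch starts from the dict a YAML parser would return and resolves class names through a hypothetical `TOY_REGISTRY`. `ToyWeightClamp` and `build_modifiers` are also invented here, and LLM Compressor's own recipe loading is more involved.

```python
# Sketch: turning the parsed recipe dict into modifier instances.
# ToyWeightClamp, TOY_REGISTRY and build_modifiers are hypothetical.
from dataclasses import dataclass, field


@dataclass
class ToyWeightClamp:
    max_weight_magnitude: float = 1.0
    targets: list[str] = field(default_factory=lambda: ["Linear"])
    ignore: list[str] = field(default_factory=list)


TOY_REGISTRY = {"WeightClampModifier": ToyWeightClamp}

# The structure a YAML parser would produce for the recipe above
recipe = {
    "weight_clamp_stage": {
        "weight_clamp_modifiers": {
            "WeightClampModifier": {
                "max_weight_magnitude": 0.5,
                "targets": ["Linear"],
                "ignore": ["lm_head"],
            }
        }
    }
}


def build_modifiers(recipe, registry):
    """Walk stage -> modifier group -> class name -> parameters."""
    modifiers = []
    for stage in recipe.values():
        for group in stage.values():
            for class_name, params in group.items():
                modifiers.append(registry[class_name](**params))
    return modifiers
```

The parameters under the class name become constructor keyword arguments, which is why declaring them as fields lets the same modifier come from Python or from YAML.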
Tips
- Only override what you need. The default implementations of all lifecycle hooks except `on_initialize` are no-ops or return `True`; you do not need to call `super()` for these.
- Use `match_named_modules` (from `compressed_tensors.utils`) to filter modules by type name or path pattern, consistent with how other modifiers handle `targets` and `ignore`.
- Keep `on_initialize` lightweight. Expensive operations (e.g., full-model passes) should be deferred to calibration lifecycle hooks or `on_finalize`.
- `on_update` is rarely needed. Only override it if you need a per-batch callback while the modifier is active; for example, `MagnitudeModifier` uses it to update sparsity each batch. Compression modifiers (GPTQ, AWQ, SmoothQuant, etc.) do not use it.
- Modifiers never fire events; the pipeline does. All lifecycle events (`CALIBRATION_START`, `SEQUENTIAL_EPOCH_END`, `CALIBRATION_END`) are fired by the calibration pipeline. Your modifier only reacts to them by implementing the corresponding hooks, and the base `Modifier` class automatically routes these events to the appropriate lifecycle methods.
- All calibration pipelines fire `SEQUENTIAL_EPOCH_END`. The sequential pipeline fires it between layer groups (scoped to each group), while the basic and data-free pipelines fire it once for the entire model. Modifiers like GPTQ, SparseGPT, and QuantizationModifier use this event to trigger compression and qparam computation.
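The targeting rules mentioned in the tips can be approximated in a few lines. This is a simplified, hypothetical sketch (`toy_match_named_modules` is not a real API): an entry matches when it equals the module's class name or full name, or, with a `re:` prefix as seen in LLM Compressor recipes, when the regex matches the module name. In real code, use `match_named_modules` from `compressed_tensors.utils`, since its exact matching rules may differ.

```python
# Simplified sketch of targets/ignore matching. toy_match_named_modules is
# hypothetical; the real helper is compressed_tensors.utils.match_named_modules.
import re


def _matches(name, class_name, patterns):
    for pattern in patterns:
        if pattern.startswith("re:"):
            # Regex entries are matched against the module name in this sketch
            if re.match(pattern[3:], name):
                return True
        elif pattern in (name, class_name):
            return True
    return False


def toy_match_named_modules(named_modules, targets, ignore=()):
    """named_modules: iterable of (name, class_name) pairs; yields matching names."""
    for name, class_name in named_modules:
        if _matches(name, class_name, targets) and not _matches(
            name, class_name, ignore
        ):
            yield name


modules = [
    ("model.layers.0.self_attn.q_proj", "Linear"),
    ("model.layers.0.mlp.down_proj", "Linear"),
    ("model.norm", "RMSNorm"),
    ("lm_head", "Linear"),
]
```

With `targets=["Linear"]` and `ignore=["lm_head"]`, only the two projection layers are yielded, and a regex target such as `re:.*down_proj$` narrows the match to `down_proj`.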