llmcompressor.modifiers.transform.spinquant
Event dataclass
A class for defining an event that can be triggered during sparsification.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
type_ | Optional[EventType] | The type of event. | None |
steps_per_epoch | Optional[int] | The number of steps per epoch. | None |
batches_per_step | Optional[int] | The number of batches per step where step is an optimizer step invocation. For most pathways, these are the same. See the invocations_per_step parameter for more details when they are not. | None |
invocations_per_step | int | The number of invocations of the step wrapper before optimizer.step was called. Generally can be left as 1 (default). For older amp pathways, this is the number of times the scaler wrapper was invoked before the wrapped optimizer step function was called to handle accumulation in fp16. | 1 |
global_step | int | The current global step. | 0 |
global_batch | int | The current global batch. | 0 |
Source code in llmcompressor/core/events/event.py
current_index property writable
Calculates the current index of the event.
Returns:
| Type | Description |
|---|---|
float | The current index of the event, which is either the global step or the epoch with the fraction of the current step. |
Raises:
| Type | Description |
|---|---|
ValueError | if the event is not epoch based or if the steps per epoch are too many. |
epoch property
Calculates the current epoch.
Returns:
| Type | Description |
|---|---|
int | The current epoch. |
Raises:
| Type | Description |
|---|---|
ValueError | if the event is not epoch based. |
epoch_based property
Determines if the event is based on epochs.
Returns:
| Type | Description |
|---|---|
bool | True if the event is based on epochs, False otherwise. |
epoch_batch property
Calculates the current batch within the current epoch.
Returns:
| Type | Description |
|---|---|
int | The current batch within the current epoch. |
Raises:
| Type | Description |
|---|---|
ValueError | if the event is not epoch based. |
epoch_full property
Calculates the current epoch with the fraction of the current step.
Returns:
| Type | Description |
|---|---|
float | The current epoch with the fraction of the current step. |
Raises:
| Type | Description |
|---|---|
ValueError | if the event is not epoch based. |
epoch_step property
Calculates the current step within the current epoch.
Returns:
| Type | Description |
|---|---|
int | The current step within the current epoch. |
Raises:
| Type | Description |
|---|---|
ValueError | if the event is not epoch based. |
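The epoch properties above can be illustrated with a small stand-in class. This is a minimal sketch, not the library's `Event`: the name `EventSketch`, the helper `_require_epochs`, and the exact arithmetic (integer division, modulo, and true division of `global_step` by `steps_per_epoch`) are assumptions chosen to match the descriptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EventSketch:
    """Hypothetical stand-in for Event; not the library class."""

    steps_per_epoch: Optional[int] = None
    global_step: int = 0

    @property
    def epoch_based(self) -> bool:
        # Assumption: an event is epoch based once steps_per_epoch is set.
        return self.steps_per_epoch is not None

    def _require_epochs(self) -> int:
        # Hypothetical helper: mirrors the documented ValueError.
        if self.steps_per_epoch is None:
            raise ValueError("event is not epoch based")
        return self.steps_per_epoch

    @property
    def epoch(self) -> int:
        # Number of whole epochs completed so far.
        return self.global_step // self._require_epochs()

    @property
    def epoch_step(self) -> int:
        # Step offset inside the current epoch.
        return self.global_step % self._require_epochs()

    @property
    def epoch_full(self) -> float:
        # Epoch plus the fraction of the current epoch already done.
        return self.global_step / self._require_epochs()


event = EventSketch(steps_per_epoch=100, global_step=250)
print(event.epoch, event.epoch_step, event.epoch_full)  # 2 50 2.5
```

Under these assumptions, step 250 with 100 steps per epoch falls halfway through the third epoch (index 2).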
new_instance(**kwargs)
Creates a new instance of the event with the provided keyword arguments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
kwargs | | Keyword arguments to set in the new instance. | {} |
Returns:
| Type | Description |
|---|---|
Event | A new instance of the event with the provided kwargs. |
Source code in llmcompressor/core/events/event.py
should_update(start, end, update)
Determines if the event should trigger an update.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
start | Optional[float] | The start index to check against, set to None to ignore start. | required |
end | Optional[float] | The end index to check against, set to None to ignore end. | required |
update | Optional[float] | The update interval, set to None or 0.0 to always update, otherwise must be greater than 0.0, defaults to None. | required |
Returns:
| Type | Description |
|---|---|
bool | True if the event should trigger an update, False otherwise. |
Source code in llmcompressor/core/events/event.py
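The start, end, and update checks described for should_update can be sketched as a standalone function. This is an illustration under stated assumptions rather than the library's code: `should_update_sketch` is a hypothetical name, the current index is passed in explicitly, the update interval is measured from index 0, and the tolerance `tol` is an invented parameter for floating-point comparison.

```python
from typing import Optional


def should_update_sketch(
    current: float,
    start: Optional[float],
    end: Optional[float],
    update: Optional[float],
    tol: float = 1e-10,
) -> bool:
    """Return True if an update should fire at index `current`."""
    # Outside the [start, end] window: never update.
    if start is not None and current < start:
        return False
    if end is not None and current > end:
        return False
    # None or 0.0 means "always update" inside the window.
    if update is None or update <= 0.0:
        return True
    # Assumption: updates land on multiples of `update`, counted from 0.
    remainder = current % update
    return remainder < tol or update - remainder < tol


print(should_update_sketch(4.0, start=2.0, end=10.0, update=2.0))  # True
print(should_update_sketch(5.0, start=2.0, end=10.0, update=2.0))  # False
```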
EventType
Bases: Enum
An Enum for defining the different types of events that can be triggered during model compression lifecycles. The purpose of each EventType is to trigger the corresponding modifier callback during training or post training pipelines.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
INITIALIZE | | Event type for initialization. | required |
FINALIZE | | Event type for finalization. | required |
BATCH_START | | Event type for the start of a batch. | required |
LOSS_CALCULATED | | Event type for when loss is calculated. | required |
BATCH_END | | Event type for the end of a batch. | required |
CALIBRATION_EPOCH_START | | Event type for the start of a calibration epoch. | required |
SEQUENTIAL_EPOCH_END | | Event type for the end of a layer calibration epoch, specifically used by | required |
CALIBRATION_EPOCH_END | | Event type for the end of a calibration epoch. | required |
OPTIM_PRE_STEP | | Event type for pre-optimization step. | required |
OPTIM_POST_STEP | | Event type for post-optimization step. | required |
Source code in llmcompressor/core/events/event.py
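The idea that each event type triggers a corresponding callback can be sketched with a plain dispatch table. Everything here is hypothetical: `EventTypeSketch` reproduces only three of the member names listed above, and `handlers` and `fire` are invented for illustration; the library routes events through its modifiers rather than through a dictionary like this.

```python
from enum import Enum, auto
from typing import Callable, Dict, List


class EventTypeSketch(Enum):
    """Hypothetical subset of the event types listed above."""

    BATCH_START = auto()
    LOSS_CALCULATED = auto()
    BATCH_END = auto()


calls: List[str] = []

# Hypothetical callbacks keyed by event type.
handlers: Dict[EventTypeSketch, Callable[[], None]] = {
    EventTypeSketch.BATCH_START: lambda: calls.append("batch_start"),
    EventTypeSketch.BATCH_END: lambda: calls.append("batch_end"),
}


def fire(event_type: EventTypeSketch) -> None:
    """Run the callback registered for event_type, if there is one."""
    handler = handlers.get(event_type)
    if handler is not None:
        handler()


for event_type in (
    EventTypeSketch.BATCH_START,
    EventTypeSketch.LOSS_CALCULATED,  # no handler registered, so ignored
    EventTypeSketch.BATCH_END,
):
    fire(event_type)

print(calls)  # ['batch_start', 'batch_end']
```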
Modifier
Bases: ModifierInterface, HooksMixin
A base class for all modifiers to inherit from. Modifiers are used to modify the training process for a model. Defines base attributes and methods available to all modifiers.
Lifecycle:
1. initialize
2. on_event ->
   * on_start if self.start <= event.current_index
   * on_end if self.end >= event.current_index
3. finalize
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
index | | The index of the modifier in the list of modifiers for the model | required |
group | | The group name for the modifier | required |
start | | The start step for the modifier | required |
end | | The end step for the modifier | required |
update | | The update step for the modifier | required |
Source code in llmcompressor/modifiers/modifier.py
finalized property
Returns:
| Type | Description |
|---|---|
bool | True if the modifier has been finalized |
initialized property
Returns:
| Type | Description |
|---|---|
bool | True if the modifier has been initialized |
finalize(state, **kwargs)
Finalize the modifier for the given model and state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | State | The current state of the model | required |
kwargs | | Additional arguments for finalizing the modifier | {} |
Raises:
| Type | Description |
|---|---|
RuntimeError | if the modifier has not been initialized |
Source code in llmcompressor/modifiers/modifier.py
initialize(state, **kwargs)
Initialize the modifier for the given model and state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | State | The current state of the model | required |
kwargs | | Additional arguments for initializing the modifier | {} |
Raises:
| Type | Description |
|---|---|
RuntimeError | if the modifier has already been finalized |
Source code in llmcompressor/modifiers/modifier.py
on_end(state, event, **kwargs)
on_end is called when the modifier ends and must be implemented by the inheriting modifier.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | State | The current state of the model | required |
event | Event | The event that triggered the end | required |
kwargs | | Additional arguments for ending the modifier | {} |
Source code in llmcompressor/modifiers/modifier.py
on_event(state, event, **kwargs)
on_event is called whenever an event is triggered.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | State | The current state of the model | required |
event | Event | The event that triggered the update | required |
kwargs | | Additional arguments for updating the model | {} |
Source code in llmcompressor/modifiers/modifier.py
on_finalize(state, **kwargs)
on_finalize is called on modifier finalization and must be implemented by the inheriting modifier.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | State | The current state of the model | required |
kwargs | | Additional arguments for finalizing the modifier | {} |
Returns:
| Type | Description |
|---|---|
bool | True if the modifier was finalized successfully, False otherwise |
Source code in llmcompressor/modifiers/modifier.py
on_initialize(state, **kwargs) abstractmethod
on_initialize is called on modifier initialization and must be implemented by the inheriting modifier.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | State | The current state of the model | required |
kwargs | | Additional arguments for initializing the modifier | {} |
Returns:
| Type | Description |
|---|---|
bool | True if the modifier was initialized successfully, False otherwise |
Source code in llmcompressor/modifiers/modifier.py
on_start(state, event, **kwargs)
on_start is called when the modifier starts and must be implemented by the inheriting modifier.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | State | The current state of the model | required |
event | Event | The event that triggered the start | required |
kwargs | | Additional arguments for starting the modifier | {} |
Source code in llmcompressor/modifiers/modifier.py
on_update(state, event, **kwargs)
on_update is called when the model in question must be updated based on passed in event. Must be implemented by the inheriting modifier.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | State | The current state of the model | required |
event | Event | The event that triggered the update | required |
kwargs | | Additional arguments for updating the model | {} |
Source code in llmcompressor/modifiers/modifier.py
should_end(event)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
event | Event | The event to check if the modifier should end | required |
Returns:
| Type | Description |
|---|---|
bool | True if the modifier should end based on the given event |
Source code in llmcompressor/modifiers/modifier.py
should_start(event)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
event | Event | The event to check if the modifier should start | required |
Returns:
| Type | Description |
|---|---|
bool | True if the modifier should start based on the given event |
Source code in llmcompressor/modifiers/modifier.py
update_event(state, event, **kwargs)
Update the modifier based on the given event. In turn calls on_start, on_update, and on_end based on the event and modifier settings. Returns immediately if the modifier is not initialized.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
state | State | The current state of sparsification | required |
event | Event | The event to update the modifier with | required |
kwargs | | Additional arguments for updating the modifier | {} |
Raises:
| Type | Description |
|---|---|
RuntimeError | if the modifier has been finalized |
Source code in llmcompressor/modifiers/modifier.py
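The dispatch performed by update_event, together with the guards documented for initialize and finalize, can be sketched with a toy class. This is a simplified illustration and not the library's `Modifier`: `ModifierSketch`, its `log` list, and the use of a bare index in place of `State` and `Event` objects are hypothetical, and the sketch assumes the modifier starts once the index reaches `start` and ends once it reaches `end`.

```python
from typing import List, Optional, Tuple


class ModifierSketch:
    """Hypothetical, simplified modifier; not the library's Modifier."""

    def __init__(self, start: Optional[float] = None, end: Optional[float] = None):
        self.start = start
        self.end = end
        self.initialized = False
        self.finalized = False
        self.started = False
        self.ended = False
        self.log: List[Tuple[str, float]] = []  # records which callback fired

    def initialize(self) -> None:
        if self.finalized:
            raise RuntimeError("cannot initialize a finalized modifier")
        self.initialized = True

    def finalize(self) -> None:
        if not self.initialized:
            raise RuntimeError("cannot finalize an uninitialized modifier")
        self.finalized = True

    def update_event(self, current_index: float) -> None:
        if not self.initialized:
            return  # nothing to do before initialization
        if self.finalized:
            raise RuntimeError("cannot update a finalized modifier")
        # Assumption: start once the index reaches `start` (or at once if None).
        if not self.started and (self.start is None or current_index >= self.start):
            self.started = True
            self.log.append(("on_start", current_index))
        if self.started and not self.ended:
            # Assumption: end once the index reaches `end`.
            if self.end is not None and current_index >= self.end:
                self.ended = True
                self.log.append(("on_end", current_index))
            else:
                self.log.append(("on_update", current_index))


modifier = ModifierSketch(start=1, end=3)
modifier.update_event(5)  # ignored: not initialized yet
modifier.initialize()
for index in range(5):
    modifier.update_event(index)
print(modifier.log)
```

In this sketch the modifier stays silent at index 0, starts and updates at index 1, updates again at index 2, ends at index 3, and ignores index 4.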
NormMapping
Bases: BaseModel
SpinQuant needs to know where every norm layer exists in the model, as well as all the subsequent Linear layers the norm passes into. This is because the norm layer weights need to be normalized before transforms can be fused into Linear layers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
norm | | name or regex that matches norm layer in model | required |
linears | | list of names or regexes of Linear layers that receive input from norm. | required |
Source code in llmcompressor/modifiers/transform/spinquant/norm_mappings.py
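How a norm pattern and its linear patterns might resolve against module names can be sketched with the standard `re` module. This is a hedged illustration only: `NormMappingSketch` and `resolve_mapping` are hypothetical, the module names follow a Llama-style layout chosen for the example, and grouping linears by the norm's parent module is an assumption rather than the library's matching logic.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NormMappingSketch:
    """Hypothetical stand-in with the same two fields as NormMapping."""

    norm: str
    linears: List[str]


def resolve_mapping(
    mapping: NormMappingSketch, module_names: List[str]
) -> Dict[str, List[str]]:
    """Map each matching norm name to the linears in the same parent module."""
    resolved: Dict[str, List[str]] = {}
    for name in module_names:
        if not re.fullmatch(mapping.norm, name):
            continue
        parent = name.rsplit(".", 1)[0]  # e.g. "model.layers.0"
        resolved[name] = [
            other
            for other in module_names
            if other.startswith(parent + ".")
            and any(re.fullmatch(pattern, other) for pattern in mapping.linears)
        ]
    return resolved


# Assumed Llama-style module names, for illustration only.
module_names = [
    "model.layers.0.input_layernorm",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.1.input_layernorm",
    "model.layers.1.self_attn.q_proj",
    "model.layers.1.self_attn.k_proj",
]
mapping = NormMappingSketch(
    norm=r".*input_layernorm",
    linears=[r".*self_attn\.(q|k|v)_proj"],
)
resolved = resolve_mapping(mapping, module_names)
for norm_name, linear_names in resolved.items():
    print(norm_name, "->", linear_names)
```

Note that `o_proj` is left out of the result because it does not read directly from the input norm.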
SpinQuantMapping
Bases: BaseModel
SpinQuant needs to know the entire architecture of the model, as R1, R2, R3, and R4 rotations need to be applied to specific layers (https://arxiv.org/pdf/2405.16406 Fig. 1).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
embedding | | name or regex of embedding layer | required |
attn_q | | name or regex of q_proj layer in attention block | required |
attn_k | | name or regex of k_proj layer in attention block | required |
attn_v | | name or regex of v_proj layer in attention block | required |
attn_o | | name or regex of o_proj layer in attention block | required |
attn_head_dim | | head_dim of the attention module, needed because R2 needs to be applied "head-wisely" to v_proj and o_proj | required |
mlp_in | | list of names or regexes for the mlp blocks that receive the input to the MLP block, usually up_proj and gate_proj | required |
mlp_out | | list of names or regexes for the mlp blocks that constitute the output of the MLP block, usually down_proj | required |
Source code in llmcompressor/modifiers/transform/spinquant/mappings.py
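Why attn_head_dim matters for R2 can be seen in a small pure-Python sketch of a head-wise rotation: the same small rotation is applied independently to each head_dim-sized slice of a vector, which is equivalent to a block-diagonal rotation. The function `rotate_headwise`, the 2x2 normalized Hadamard matrix, and the sample values are assumptions made for illustration and do not reflect how the library applies R2.

```python
import math
from typing import List


def rotate_headwise(
    vector: List[float], rotation: List[List[float]], head_dim: int
) -> List[float]:
    """Apply `rotation` separately to each head_dim-sized slice of `vector`."""
    if len(vector) % head_dim != 0:
        raise ValueError("vector length must be a multiple of head_dim")
    rotated: List[float] = []
    for start in range(0, len(vector), head_dim):
        head = vector[start : start + head_dim]
        rotated.extend(
            sum(rotation[i][j] * head[j] for j in range(head_dim))
            for i in range(head_dim)
        )
    return rotated


s = 1.0 / math.sqrt(2.0)
hadamard2 = [[s, s], [s, -s]]  # orthogonal and symmetric, so it is its own inverse

values = [1.0, 0.0, 3.0, 4.0]  # two heads with head_dim = 2
rotated = rotate_headwise(values, hadamard2, head_dim=2)
restored = rotate_headwise(rotated, hadamard2, head_dim=2)
print(rotated)
print(restored)
```

Each head keeps its own norm (1 and 5 here), and applying the rotation a second time recovers the original values up to floating-point error.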
SpinQuantModifier
Bases: Modifier
Implements the transforms according to "SpinQuant: LLM quantization with learned rotations" (https://arxiv.org/abs/2405.16406).
Transforms (rotations) are extra layers added to a model which reduce the accuracy loss induced by quantization. This is achieved by "rotating" weights and activations into a space with a smaller dynamic range of values, thus decreasing the range of scales required for quantization.
The SpinQuant authors describe four different rotations which can be applied to a model. R1 and R2 are "offline" rotations, meaning that they can be fused into existing weights and therefore do not induce runtime cost. R3 and R4 are "online" rotations, meaning that they require additional computation at runtime.
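Both claims, that a rotation can shrink the dynamic range and that an offline rotation can be fused into weights at no runtime cost, can be checked on a toy example. This is a minimal sketch with assumptions: it uses a fixed 4x4 normalized Hadamard matrix rather than a learned rotation, plain Python lists instead of tensors, and hypothetical helpers `matmul` and `transpose`.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-rows matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(column) for column in zip(*m)]


# Normalized 4x4 Hadamard matrix: orthogonal, so H^T H = I.
H: Matrix = [
    [0.5 * sign for sign in row]
    for row in ([1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1])
]

x: Matrix = [[8.0], [0.0], [0.0], [0.0]]  # activation with one large outlier
W: Matrix = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]]

x_rot = matmul(H, x)  # rotated activation: the outlier is spread out
W_fused = matmul(W, transpose(H))  # rotation folded into the weight offline

y_ref = matmul(W, x)
y_rot = matmul(W_fused, x_rot)  # (W H^T)(H x) = W x

print(max(abs(row[0]) for row in x), max(abs(row[0]) for row in x_rot))  # 8.0 4.0
print(y_ref, y_rot)
```

The largest activation magnitude drops from 8 to 4 while the layer output is unchanged, which is the property that lets such a rotation be merged into existing weights.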
Lifecycle:
- on_initialize
  - infer SpinQuantMappings & NormMappings
  - as needed, create transform schemes for R1, R2, R3, & R4
- on_start
  - normalize embeddings
  - fuse norm layers into subsequent Linear layers
  - apply TransformConfig
    - fuse transforms into weights for mergeable transforms
    - add hooks for online transforms
- on sequential epoch end
- on_end
- on_finalize
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
rotations | | A list containing the names of rotations to apply to the model. Possible rotations include R1, R2, R3, and R4 | required |
transform_type | | The type of transform to apply to the model. | required |
randomize | | if True, create distinct transforms for each application | required |
learnable | | if True, attach gradients to transform weights for training | required |
precision | | Precision at which all transforms should be applied. This applies to both weight fusing and online rotations | required |
mappings | | Specifies layers within a model to target for transforms. A mapping will be inferred if None is provided | required |
norm_mappings | | Specifies layers within a model to target for norm fusing. A mapping will be inferred if None is provided | required |
transform_config | | Optional transform config for overriding provided arguments | required |
Source code in llmcompressor/modifiers/transform/spinquant/base.py
State dataclass
State class holds information about the current compression state.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model | Any | The model being used for compression | None |
teacher_model | Any | The teacher model being used for compression | None |
optimizer | Any | The optimizer being used for training | None |
optim_wrapped | bool | Whether or not the optimizer has been wrapped | None |
loss | Any | The loss function being used for training | None |
batch_data | Any | The current batch of data being used for compression | None |
data | Data | The data sets being used for training, validation, testing, and/or calibration, wrapped in a Data instance | Data() |
hardware | Hardware | Hardware instance holding info about the target hardware being used | Hardware() |
loggers | Optional[LoggerManager] | LoggerManager instance holding all the loggers to log | None |
model_log_cadence | Optional[float] | The cadence to log model information w.r.t epochs. If 1, logs every epoch. If 2, logs every other epoch, etc. Default is 1. | None |
Source code in llmcompressor/core/state.py
compression_ready property
Check if the model and optimizer are set for compression.
Returns:
| Type | Description |
|---|---|
bool | True if model and optimizer are set, False otherwise |
update(model=None, teacher_model=None, optimizer=None, attach_optim_callbacks=True, train_data=None, val_data=None, test_data=None, calib_data=None, copy_data=True, start=None, steps_per_epoch=None, batches_per_step=None, loggers=None, model_log_cadence=None, **kwargs)
Update the state with the given parameters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
model | Any | The model to update the state with | None |
teacher_model | Any | The teacher model to update the state with | None |
optimizer | Any | The optimizer to update the state with | None |
attach_optim_callbacks | bool | Whether or not to attach optimizer callbacks | True |
train_data | Any | The training data to update the state with | None |
val_data | Any | The validation data to update the state with | None |
test_data | Any | The testing data to update the state with | None |
calib_data | Any | The calibration data to update the state with | None |
copy_data | bool | Whether or not to copy the data | True |
start | float | The start index to update the state with | None |
steps_per_epoch | int | The steps per epoch to update the state with | None |
batches_per_step | int | The batches per step to update the state with | None |
loggers | Union[None, LoggerManager, List[BaseLogger]] | The metrics manager to setup logging important info and milestones to, also accepts a list of BaseLogger(s) | None |
model_log_cadence | Optional[float] | The cadence to log model information w.r.t epochs. If 1, logs every epoch. If 2, logs every other epoch, etc. Default is 1. | None |
kwargs | | Additional keyword arguments to update the state with | {} |
Returns:
| Type | Description |
|---|---|
Dict | The updated state as a dictionary |
Source code in llmcompressor/core/state.py
center_embeddings(embedding)
Shift each embedding to have a mean of zero
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
embedding | Module | embedding module containing embeddings to center | required |
Source code in llmcompressor/modeling/fuse.py
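Centering can be sketched on a plain list-of-rows matrix. This is an illustration under assumptions: `center_rows` is a hypothetical helper, each row stands for one token's embedding vector, and the mean is taken across the hidden dimension of each row; the library function operates on a module's weight tensor instead.

```python
from typing import List


def center_rows(weight: List[List[float]]) -> List[List[float]]:
    """Subtract each row's mean so that every row averages to zero."""
    centered = []
    for row in weight:
        mean = sum(row) / len(row)
        centered.append([value - mean for value in row])
    return centered


embeddings = [[1.0, 2.0, 3.0], [10.0, 10.0, 40.0]]
centered = center_rows(embeddings)
print(centered)  # [[-1.0, 0.0, 1.0], [-10.0, -10.0, 20.0]]
```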
fuse_norm_linears(norm, linears)
Fuse the scaling operation of a norm layer into the subsequent linear layers. This is useful for ensuring transform invariance between norm and linear layers.
Note that unitary transforms (rotations) commute with normalization, but not with scaling.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
norm | Module | norm layer whose weight will be fused into subsequent linears | required |
linears | Iterable[Linear] | linear layers which directly follow the norm layer | required |
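The fusion can be checked numerically on a toy RMS norm followed by one linear layer: multiplying each weight column by the matching norm scale and resetting the scale to 1 leaves the output unchanged. This is a minimal sketch under assumptions: `rms_norm`, `linear`, and `fuse` are hypothetical pure-Python helpers, the norm is assumed to be an RMS norm without bias, and the library version works on `torch` modules in place.

```python
import math
from typing import List, Tuple


def rms_norm(x: List[float], gamma: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize x by its root mean square, then scale elementwise by gamma."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for v, g in zip(x, gamma)]


def linear(weight: List[List[float]], x: List[float]) -> List[float]:
    """Bias-free linear layer: one dot product per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fuse(
    gamma: List[float], weight: List[List[float]]
) -> Tuple[List[float], List[List[float]]]:
    """Fold gamma into the weight columns and return an all-ones gamma."""
    fused = [[w * g for w, g in zip(row, gamma)] for row in weight]
    return [1.0] * len(gamma), fused


x = [1.0, -2.0, 3.0]
gamma = [0.5, 2.0, 1.5]
W = [[1.0, 0.0, 2.0], [-1.0, 3.0, 0.5]]

before = linear(W, rms_norm(x, gamma))
new_gamma, W_fused = fuse(gamma, W)
after = linear(W_fused, rms_norm(x, new_gamma))
print(before, after)
```

Once the scale lives in the linear weights, the remaining norm is a pure normalization, which is the part that commutes with a rotation.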