llmcompressor.modifiers.transform.spinquant.mappings
Classes:
- SpinQuantMapping – SpinQuant needs to know the entire architecture of the model.
SpinQuantMapping
Bases: BaseModel
SpinQuant needs to know the entire architecture of the model because the R1, R2, R3, and R4 rotations must be applied to specific layers (see Fig. 1 of https://arxiv.org/pdf/2405.16406).
Parameters:
- embedding – name or regex of the embedding layer
- attn – name or regex of the attention block in the decoder layer
- attn_q – name or regex of the q_proj layer in the attention block
- attn_k – name or regex of the k_proj layer in the attention block
- attn_v – name or regex of the v_proj layer in the attention block
- attn_o – name or regex of the o_proj layer in the attention block
- attn_head_dim – head_dim of the attention module; needed because R2 is applied per head ("head-wise") to v_proj and o_proj
- mlp_in – list of names or regexes for the layers that receive the input to the MLP block, usually up_proj and gate_proj
- mlp_out – list of names or regexes for the layers that produce the output of the MLP block, usually down_proj
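To make the parameters above concrete, here is a minimal sketch of how such a mapping could pick out layers by name. It does not use the library: `MappingSketch` is a hypothetical stand-in that only mirrors the documented field names, the module names are assumed Llama-style names, `attn_head_dim=128` is an illustrative value, and the matching uses plain Python `re`, which may differ from how the library itself interprets "name or regex".

```python
import re
from dataclasses import dataclass
from typing import List, Union


@dataclass
class MappingSketch:
    """Hypothetical stand-in mirroring the documented SpinQuantMapping fields."""

    embedding: str
    attn: str
    attn_q: str
    attn_k: str
    attn_v: str
    attn_o: str
    attn_head_dim: int
    mlp_in: List[str]
    mlp_out: List[str]


# Assumed Llama-style module names (illustrative only, one decoder layer).
MODULE_NAMES = [
    "model.embed_tokens",
    "model.layers.0.self_attn",
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.self_attn.k_proj",
    "model.layers.0.self_attn.v_proj",
    "model.layers.0.self_attn.o_proj",
    "model.layers.0.mlp.up_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.0.mlp.down_proj",
]

mapping = MappingSketch(
    embedding=r".*embed_tokens$",
    attn=r".*self_attn$",
    attn_q=r".*q_proj$",
    attn_k=r".*k_proj$",
    attn_v=r".*v_proj$",
    attn_o=r".*o_proj$",
    attn_head_dim=128,  # illustrative value, not taken from any real config
    mlp_in=[r".*up_proj$", r".*gate_proj$"],
    mlp_out=[r".*down_proj$"],
)


def resolve(patterns: Union[str, List[str]], names: List[str]) -> List[str]:
    """Return the names (in their original order) that fully match any pattern."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return [n for n in names if any(re.fullmatch(p, n) for p in patterns)]


# Layers that would receive the MLP input and produce the MLP output.
mlp_in_layers = resolve(mapping.mlp_in, MODULE_NAMES)
mlp_out_layers = resolve(mapping.mlp_out, MODULE_NAMES)
# The attention block itself, as opposed to its projection sublayers.
attn_blocks = resolve(mapping.attn, MODULE_NAMES)
```

The single-string fields (`embedding`, `attn`, `attn_q`, ...) identify one kind of layer each, while `mlp_in` and `mlp_out` are lists because an MLP block can have several input projections.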