llmcompressor.modifiers.transform.awq.dynamic_mappings
Dynamic AWQ mapping builders for hybrid attention models.
Models with hybrid attention (a mix of full self-attention and linear/Gated DeltaNet attention) need layer-index-specific AWQ mappings that vary with model size. This module provides runtime detection and mapping generation for such architectures (e.g., Qwen3Next, Qwen3.5).
Functions:
- get_layer_mappings_from_model – Infer AWQ mappings from a model. Checks the dynamic mapping registry first, then falls back to the static registry, then to default mappings.
build_hybrid_attention_mappings
Dynamically build AWQ mappings for models with hybrid attention (full self-attention + linear/Gated DeltaNet attention), such as Qwen3Next and Qwen3.5.
Reads layer_types from the model config to determine which layers use full vs linear attention, then inspects the model's module names to detect the correct linear attention projection names and MLP structure.
Returns None if the model is not a hybrid attention model.
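The first two steps described above (splitting layers by `layer_types`, then probing module names for the linear attention projections) can be sketched as follows. This is a minimal illustration, not the module's code. It assumes that `layer_types` is a list of strings such as `"full_attention"` and `"linear_attention"` and that module names contain a `layers.<idx>.` segment. The helpers `split_attention_layers` and `detect_projection`, along with the candidate projection names used in the usage lines, are hypothetical.

```python
from typing import Optional, Sequence


def split_attention_layers(
    layer_types: Optional[Sequence[str]],
) -> Optional[tuple[list[int], list[int]]]:
    """Hypothetical helper: partition layer indices into (full, linear).

    Returns None when the config does not describe a hybrid model,
    mirroring the "return None if not hybrid" behaviour.
    Assumption: layer type strings are "full_attention" / "linear_attention".
    """
    if not layer_types:
        return None
    full = [i for i, t in enumerate(layer_types) if t == "full_attention"]
    linear = [i for i, t in enumerate(layer_types) if t == "linear_attention"]
    # A hybrid model needs at least one layer of each kind.
    if not full or not linear:
        return None
    return full, linear


def detect_projection(
    module_names: Sequence[str], layer_idx: int, candidates: Sequence[str]
) -> Optional[str]:
    """Hypothetical helper: return the first candidate projection name that
    appears as a module-name suffix under layer `layer_idx`, else None."""
    prefix = f"layers.{layer_idx}."  # trailing dot keeps layer 1 from matching 11
    for cand in candidates:
        if any(prefix in name and name.endswith(cand) for name in module_names):
            return cand
    return None


# Illustrative usage with made-up layer types and module names.
layer_types = ["linear_attention"] * 3 + ["full_attention"]
print(split_attention_layers(layer_types))  # ([3], [0, 1, 2])

names = [
    "model.layers.0.linear_attn.in_proj_qkvz",
    "model.layers.0.linear_attn.out_proj",
    "model.layers.3.self_attn.q_proj",
]
print(detect_projection(names, 0, ("in_proj_qkv", "in_proj_qkvz")))  # in_proj_qkvz
```

Probing a list of candidate suffixes, rather than hard-coding one name, is one way to cope with architectures that name their projections differently. In practice the names would come from something like `model.named_modules()`.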
Source code in src/llmcompressor/modifiers/transform/awq/dynamic_mappings.py
get_layer_mappings_from_model
Infer AWQ mappings from a model. Checks the dynamic mapping registry first (for models needing runtime-generated mappings), then falls back to the static registry, then to default mappings.
Parameters:
- model (Module) – the model to infer mappings for
Returns:
- list[AWQMapping] – the list of AWQMapping entries for the model
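The lookup order (dynamic registry, then static registry, then defaults) can be sketched as below. This sketch rests on several assumptions. It treats both registries as keyed by the model's class name. It assumes a dynamic builder returns `None` to signal "not applicable", as `build_hybrid_attention_mappings` does for non-hybrid models. It also uses plain tuples in place of `AWQMapping` objects. `resolve_mappings`, `MappingSpec`, and the registry arguments are hypothetical names and are not the library's API.

```python
from typing import Callable, Optional

# Hypothetical stand-in for AWQMapping: (smooth_layer, balance_layers).
MappingSpec = tuple[str, tuple[str, ...]]
# Hypothetical builder type: returns None when it does not apply to the model.
DynamicBuilder = Callable[[object], Optional[list[MappingSpec]]]


def resolve_mappings(
    model: object,
    dynamic_registry: dict[str, DynamicBuilder],
    static_registry: dict[str, list[MappingSpec]],
    default: list[MappingSpec],
) -> list[MappingSpec]:
    """Sketch of a dynamic -> static -> default mapping lookup."""
    key = type(model).__name__  # assumption: registries are keyed by class name
    builder = dynamic_registry.get(key)
    if builder is not None:
        mappings = builder(model)
        # None means the builder does not apply (e.g. not hybrid): fall through.
        if mappings is not None:
            return mappings
    if key in static_registry:
        return static_registry[key]
    return default
```

Letting a dynamic builder decline by returning `None` keeps the static and default paths reachable for models that share a class name but lack the hybrid layout.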