llmcompressor.entrypoints.model_free.process
Functions:
- process_file – Quantize and compress tensors in a given safetensors file.
- process_file_microscale_scheme – Quantize and compress tensors for a single output shard using a microscale scheme (NVFP4, MXFP4).
- validate_file – Validate that each quantizable tensor in a safetensors file can be quantized.
process_file
process_file(
inverse_weight_map: InverseWeightMap,
save_path: str | PathLike,
scheme: QuantizationScheme,
ignore: Iterable[str],
device: str | device,
converter: Converter | None = None,
) -> tuple[int, dict[str, str]]
Quantize and compress tensors in a given safetensors file.
Parameters:
- inverse_weight_map (InverseWeightMap) – mapping of source file path -> tensor names. In standard mode, {resolved_path: None} means that all tensors in the file are loaded and processed.
- save_path (str | PathLike) – save path of the file with quantized weights.
- scheme (QuantizationScheme) – quantization scheme to apply to tensors.
- ignore (Iterable[str]) – modules to ignore. Modules whose names end with "norm" are ignored automatically.
- device (str | device) – device used to quantize and compress weights.
- converter (Converter | None, default: None) – optional converter to apply to the checkpoint, e.g. to convert some layers from another format to compressed-tensors.
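The two parameters that decide which tensors are touched, inverse_weight_map and ignore, can be illustrated with a small pure-Python sketch. The helpers resolve_tensor_names and is_ignored are hypothetical, and so is the assumption that ignore entries are either exact module names or "re:"-prefixed regular expressions; llmcompressor's actual matching logic may differ.

```python
import re
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for InverseWeightMap: source file -> tensor names,
# where None means "every tensor in that file" (standard mode).
InverseMap = dict[str, Optional[list[str]]]


def resolve_tensor_names(
    inverse_weight_map: InverseMap,
    list_keys: Callable[[str], list[str]],
) -> list[tuple[str, str]]:
    """Expand the map into (file, tensor_name) pairs."""
    pairs: list[tuple[str, str]] = []
    for path, names in inverse_weight_map.items():
        # None is the standard-mode sentinel: take every key in the file
        if names is None:
            names = list_keys(path)
        pairs.extend((path, name) for name in names)
    return pairs


def is_ignored(tensor_name: str, ignore: Iterable[str]) -> bool:
    """Assumed rule: skip "*norm" modules and modules matched by `ignore`."""
    module_name = tensor_name.rsplit(".", 1)[0]  # drop ".weight" / ".bias"
    if module_name.endswith("norm"):
        return True
    for pattern in ignore:
        # Assumption: "re:" marks a regex, anything else is an exact module name
        if pattern.startswith("re:"):
            if re.match(pattern[3:], module_name):
                return True
        elif pattern == module_name:
            return True
    return False


# Usage: a fake key listing stands in for reading a safetensors file
file_keys = {
    "shard0.safetensors": [
        "model.layers.0.input_layernorm.weight",
        "model.layers.0.self_attn.q_proj.weight",
        "lm_head.weight",
    ]
}
pairs = resolve_tensor_names({"shard0.safetensors": None}, file_keys.__getitem__)
to_quantize = [name for _, name in pairs if not is_ignored(name, ["lm_head"])]
print(to_quantize)  # ['model.layers.0.self_attn.q_proj.weight']
```

Passing a key-listing callable instead of opening files keeps the sketch testable; in practice it would be backed by whatever reads the safetensors header.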
Source code in src/llmcompressor/entrypoints/model_free/process.py
process_file_microscale_scheme
process_file_microscale_scheme(
inverse_weight_map: InverseWeightMap,
save_path: str | PathLike,
scheme: QuantizationScheme,
ignore: Iterable[str],
device: str | device,
converter: Converter | None = None,
) -> tuple[int, dict[str, str]]
Quantize and compress tensors for a single output shard using a microscale scheme (NVFP4, MXFP4).
Accepts a precomputed inverse_weight_map that specifies exactly which tensors to load from which source files, including any fused partner tensors from other shards that are needed for global scale computation. Precomputing the map avoids discovering fused partners at runtime and reading the same tensors more than once.
Partner tensors fetched from other shards are re-saved into this shard's output. The caller updates the safetensors index to reflect new locations.
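To see why partner tensors must be loaded before a shard can be quantized, consider a global scale shared by a fused group such as q_proj/k_proj/v_proj. The sketch below assumes an NVFP4-style convention, global_scale = (FP8 E4M3 max * FP4 E2M1 max) / amax, with amax taken over every tensor in the group; the helper names and the plain-list weights are illustrative only and are not llmcompressor's code.

```python
from typing import Sequence

# Largest finite magnitudes of FP8 E4M3 and FP4 E2M1
FP8_E4M3_MAX = 448.0
FP4_E2M1_MAX = 6.0


def abs_max(weight: Sequence[Sequence[float]]) -> float:
    """Largest absolute value in a 2D weight given as nested sequences."""
    return max((abs(v) for row in weight for v in row), default=0.0)


def shared_global_scale(*weights: Sequence[Sequence[float]]) -> float:
    """Assumed NVFP4-style global scale computed over a whole fused group."""
    amax = max(abs_max(w) for w in weights)
    if amax == 0.0:
        raise ValueError("cannot derive a global scale from all-zero weights")
    return (FP8_E4M3_MAX * FP4_E2M1_MAX) / amax


q = [[0.5, -2.0]]
k = [[4.0, 1.0]]  # imagine this tensor lives in a different shard
v = [[-1.0, 0.25]]
print(shared_global_scale(q))        # 1344.0 (q alone)
print(shared_global_scale(q, k, v))  # 672.0 (whole group, needs k from the other shard)
```

Quantizing q_proj with only its own tensors would give a different scale than the group value, which is why the map has to name the partners up front.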
Parameters:
- inverse_weight_map (InverseWeightMap) – mapping of resolved source file path -> list of tensor names to load from that file. Example: {"/path/shard0.safetensors": ["q_proj.weight"], "/path/shard1.safetensors": ["k_proj.weight", "v_proj.weight"]}
- save_path (str | PathLike) – output path for this shard's compressed weights.
- scheme (QuantizationScheme) – microscale quantization scheme (NVFP4, MXFP4).
- ignore (Iterable[str]) – modules to ignore. Modules whose names end with "norm" are ignored automatically.
- device (str | device) – device used to quantize and compress weights.
- converter (Converter | None, default: None) – optional converter to apply to the checkpoint, e.g. to convert some layers from another format to compressed-tensors.
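A precomputed inverse_weight_map for one output shard could be derived from the weight_map in a checkpoint's model.safetensors.index.json (tensor name -> shard file). For illustration only, the sketch assumes that fused partners are q_proj/k_proj/v_proj and gate_proj/up_proj under the same parent module; FUSED_GROUPS, fused_partners, and build_inverse_weight_map are hypothetical and are not llmcompressor's planner.

```python
from collections import defaultdict

# Assumed fusion groups whose members share one global scale (illustrative)
FUSED_GROUPS = [("q_proj", "k_proj", "v_proj"), ("gate_proj", "up_proj")]


def fused_partners(tensor_name: str) -> list[str]:
    """Names of tensors assumed to share a global scale with `tensor_name`."""
    parts = tensor_name.rsplit(".", 2)
    if len(parts) != 3:
        return []
    parent, module, suffix = parts
    for group in FUSED_GROUPS:
        if module in group:
            return [f"{parent}.{m}.{suffix}" for m in group if m != module]
    return []


def build_inverse_weight_map(
    weight_map: dict[str, str], shard_file: str
) -> dict[str, list[str]]:
    """Map source file -> tensor names needed to process `shard_file`."""
    needed = {name for name, path in weight_map.items() if path == shard_file}
    for name in list(needed):
        # Pull in partners even when they live in another shard
        needed.update(p for p in fused_partners(name) if p in weight_map)
    inverse: dict[str, list[str]] = defaultdict(list)
    for name in sorted(needed):
        inverse[weight_map[name]].append(name)
    return dict(inverse)


# weight_map as stored in model.safetensors.index.json: tensor name -> shard file
weight_map = {
    "model.layers.0.self_attn.q_proj.weight": "shard0.safetensors",
    "model.layers.0.self_attn.k_proj.weight": "shard1.safetensors",
    "model.layers.0.self_attn.v_proj.weight": "shard1.safetensors",
    "lm_head.weight": "shard1.safetensors",
}
print(build_inverse_weight_map(weight_map, "shard0.safetensors"))
```

Run for shard1, this naive helper would also pull in q_proj, so a real planner has to decide which output shard owns each fused group; that bookkeeping, and the index update performed by the caller, are left out.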
Source code in src/llmcompressor/entrypoints/model_free/process.py
split_fused_moe_experts
Find fused MoE experts (with gate_up_proj/down_proj) and split them from 3D tensors into individual 2D expert tensors.
Parameters:
- tensors – dictionary of loaded tensors from a safetensors file.
Returns:
- split_tensors – new dictionary with the split expert weights.
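Splitting a fused expert tensor amounts to indexing along the expert dimension. The sketch assumes the expert dimension is dim 0 (shape [num_experts, out_features, in_features]) and invents an output key scheme (...experts.{i}.down_proj.weight); the key pattern and the helper split_fused_experts_sketch are hypothetical. Because fused[i] and len(fused) behave the same for torch.Tensor, NumPy arrays, and nested lists, the example runs without either library.

```python
import re
from typing import Any

# Hypothetical key pattern for fused MoE weights,
# e.g. "model.layers.3.mlp.experts.gate_up_proj"
FUSED_KEY = re.compile(r"^(?P<prefix>.*\.experts)\.(?P<proj>gate_up_proj|down_proj)$")


def split_fused_experts_sketch(tensors: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which 3D fused expert tensors become 2D per-expert entries."""
    split_tensors: dict[str, Any] = {}
    for name, tensor in tensors.items():
        match = FUSED_KEY.match(name)
        if match is None:
            split_tensors[name] = tensor  # everything else passes through unchanged
            continue
        prefix, proj = match["prefix"], match["proj"]
        # Assumes dim 0 is the expert dimension, so tensor[i] is expert i's 2D weight
        for i in range(len(tensor)):
            split_tensors[f"{prefix}.{i}.{proj}.weight"] = tensor[i]
    return split_tensors


fused = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # 2 experts, each 2 x 2
out = split_fused_experts_sketch(
    {"model.layers.0.mlp.experts.down_proj": fused, "model.norm.weight": [1.0]}
)
print(sorted(out))
```

Real checkpoints may store these weights transposed or need gate_up_proj further divided into gate and up halves; the sketch handles neither.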
Source code in src/llmcompressor/entrypoints/model_free/process.py
validate_file
validate_file(
inverse_weight_map: InverseWeightMap,
save_path: str | PathLike,
scheme: QuantizationScheme,
ignore: Iterable[str],
device: str | device,
converter: Converter | None = None,
)
Validate that each quantizable tensor in a safetensors file can be quantized.
Parameters:
- inverse_weight_map (InverseWeightMap) – mapping of source file path -> tensor names to validate.
- save_path (str | PathLike) – save path of the file with quantized weights.
- scheme (QuantizationScheme) – quantization scheme to apply to tensors.
- ignore (Iterable[str]) – modules to ignore. Modules whose names end with "norm" are ignored automatically.
- device (str | device) – device used to quantize and compress weights.
- converter (Converter | None, default: None) – optional converter to apply to the checkpoint, e.g. to convert some layers from another format to compressed-tensors.
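One kind of check a validation pass could make is whether each weight splits evenly into quantization groups. The sketch assumes group-wise quantization along the last (input) dimension with the standard block sizes of 16 elements for NVFP4 and 32 for MXFP4; check_quantizable and validate_shapes are hypothetical helpers, and the checks validate_file actually performs may differ.

```python
from collections.abc import Sequence

# Assumed block sizes for the microscale formats named in this module
GROUP_SIZES = {"NVFP4": 16, "MXFP4": 32}


def check_quantizable(name: str, shape: Sequence[int], scheme: str) -> None:
    """Raise ValueError if a weight of `shape` cannot be split into whole groups."""
    if len(shape) != 2:
        raise ValueError(f"{name}: expected a 2D weight, got shape {tuple(shape)}")
    group_size = GROUP_SIZES[scheme]
    in_features = shape[-1]
    if in_features % group_size != 0:
        raise ValueError(
            f"{name}: in_features={in_features} is not divisible by "
            f"{scheme} group size {group_size}"
        )


def validate_shapes(shapes: dict[str, Sequence[int]], scheme: str) -> list[str]:
    """Collect every error message instead of stopping at the first failure."""
    errors = []
    for name, shape in shapes.items():
        try:
            check_quantizable(name, shape, scheme)
        except ValueError as exc:
            errors.append(str(exc))
    return errors


shapes = {"a.q_proj.weight": (4096, 4096), "b.odd_proj.weight": (64, 40)}
for message in validate_shapes(shapes, "NVFP4"):
    print(message)  # only b.odd_proj.weight fails: 40 is not a multiple of 16
```

Collecting all failures before raising lets a single validation pass report every problematic tensor rather than stopping at the first one.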