vllm.model_executor.model_loader.reload.torchao_decorator
Functions:
- support_quantized_model_reload_from_hp_weights: Decorator for the load_weights method used with AutoWeightsLoader.load_weights to support reloading high-precision weights into an already quantized model.
support_quantized_model_reload_from_hp_weights(original_load_weights)
Decorator for the load_weights method used with AutoWeightsLoader.load_weights to support reloading high-precision (bfloat16/float16/float32) weights into an already quantized model. This involves restoring the weights to high precision and then quantizing them online.
Only applies to torchao-quantized models. Assumes that all model weights are loaded within a single weights iterator; batched updates are not supported.
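The wrap, restore, load, re-quantize pattern described above can be illustrated with a minimal pure-Python sketch. It is not vLLM's implementation: a toy symmetric int8 quantizer stands in for torchao, and ToyQuantizedModel, quantize_int8, dequantize_int8, and support_reload_from_hp_weights_sketch are all hypothetical names chosen for illustration.

```python
import functools
from typing import Callable, Iterable


def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    # Hypothetical stand-in for torchao online quantization:
    # symmetric int8 with one per-tensor scale.
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    # Restore approximate high-precision values from int8 codes.
    return [x * scale for x in q]


def support_reload_from_hp_weights_sketch(original_load_weights: Callable) -> Callable:
    """Hypothetical decorator mirroring the documented behavior."""

    @functools.wraps(original_load_weights)
    def wrapper(self, weights: Iterable[tuple[str, list[float]]]):
        # 1. Restore every quantized weight to a high-precision buffer, so
        #    weights absent from the iterator keep their current values.
        self.high_precision = {
            name: dequantize_int8(q, s) for name, (q, s) in self.quantized.items()
        }
        # 2. Let the unmodified loader consume the single weights iterator.
        loaded = original_load_weights(self, weights)
        # 3. Quantize online once, after all new weights have been copied in.
        self.quantized = {
            name: quantize_int8(w) for name, w in self.high_precision.items()
        }
        self.high_precision = {}
        return loaded

    return wrapper


class ToyQuantizedModel:
    """Hypothetical model that stores each weight as (int8 codes, scale)."""

    def __init__(self, weights: dict[str, list[float]]):
        self.quantized = {name: quantize_int8(w) for name, w in weights.items()}
        self.high_precision: dict[str, list[float]] = {}

    @support_reload_from_hp_weights_sketch
    def load_weights(self, weights: Iterable[tuple[str, list[float]]]) -> set[str]:
        # Plain high-precision loader: copy values into the restored buffers.
        loaded = set()
        for name, value in weights:
            self.high_precision[name] = list(value)
            loaded.add(name)
        return loaded


model = ToyQuantizedModel({"w": [0.5, -1.0, 0.25], "b": [3.0]})
print(model.load_weights([("w", [2.0, -4.0, 1.0])]))  # only "w" is reloaded
print(dequantize_int8(*model.quantized["w"]))
```

In this sketch, quantization runs exactly once, after the wrapped loader returns, which matches the single-iterator assumption: splitting an update across several calls would run the full dequantize/quantize cycle once per call.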