LLaMA-Factory
Introduction
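LLaMA-Factory is an easy-to-use and efficient platform for training and fine-tuning large language models. With LLaMA-Factory, you can fine-tune hundreds of pre-trained models locally without writing any code.
After fine-tuning, LLaMA-Factory users need to evaluate the model and run inference with it.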
Business challenge
LLaMA-Factory uses Transformers to perform inference on Ascend NPUs, but inference with this backend is slow.
Benefits with vLLM Ascend
Through the joint efforts of LLaMA-Factory and vLLM Ascend (LLaMA-Factory#7739), LLaMA-Factory has achieved significant performance gains in model inference. Benchmark results show that inference is now up to 2× faster than with the Transformers implementation.
Learn more
See more details about LLaMA-Factory and how it uses vLLM Ascend for inference on Ascend NPUs in LLaMA-Factory Ascend NPU Inference.