Installation#

This document describes how to install vllm-ascend manually.

Requirements#

  • OS: Linux

  • Python: >= 3.9, < 3.12

  • Hardware with an Ascend NPU, usually the Atlas 800 A2 series.

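  • Software:

    | Software   | Supported version | Note                                                                                      |
    |------------|-------------------|-------------------------------------------------------------------------------------------|
    | Ascend HDK | Refer to here     | Required for CANN                                                                         |
    | CANN       | == 8.3.RC2        | Required for vllm-ascend and torch-npu                                                    |
    | torch-npu  | == 2.7.1.post1    | Required for vllm-ascend; no manual install needed, it is installed automatically later   |
    | torch      | == 2.7.1          | Required for torch-npu and vllm                                                           |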
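
The Python and OS requirements above can be checked before going further. The following is a minimal sketch, not part of vllm-ascend; the function names `python_version_supported` and `os_supported` are hypothetical helpers introduced here for illustration.

```python
import platform
import sys

# Bounds taken from the requirements list above: Python >= 3.9 and < 3.12.
MIN_PYTHON = (3, 9)
MAX_PYTHON_EXCLUSIVE = (3, 12)


def python_version_supported(version=None):
    """Return True if (major, minor) lies in the range [3.9, 3.12)."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PYTHON <= major_minor < MAX_PYTHON_EXCLUSIVE


def os_supported(system=None):
    """Return True on Linux, the only operating system listed above."""
    return (system or platform.system()) == "Linux"


print("Python version supported:", python_version_supported())
print("Operating system supported:", os_supported())
```

Passing an explicit tuple such as `(3, 12)` lets you check a target interpreter version without running under it.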

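There are two installation methods:

  • Using pip: prepare the environment manually or with the CANN image, then install vllm-ascend with pip.

  • Using Docker: use the pre-built vllm-ascend Docker image directly.

Set up a new environment#

Before installing, make sure the firmware/driver and CANN are installed correctly. For more details, refer to the Ascend Environment Setup Guide.

Configure the hardware environment#

To verify that the Ascend NPU firmware and driver are installed correctly, run: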

npu-smi info

For more information, refer to the Ascend Environment Setup Guide.
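
If you want to run the same check from a script, a minimal sketch could look like the following. It only wraps the `npu-smi info` command shown above; the helper name `run_npu_smi` is hypothetical, and treating a non-zero exit status as "driver not usable" is an assumption rather than documented npu-smi behavior.

```python
import shutil
import subprocess


def run_npu_smi(command="npu-smi"):
    """Run `<command> info`; return its output, or None if it is missing or fails."""
    path = shutil.which(command)
    if path is None:
        # The tool is not on PATH, so the driver is probably not installed.
        return None
    result = subprocess.run(
        [path, "info"], capture_output=True, text=True, check=False
    )
    if result.returncode != 0:
        # Assumption: a non-zero exit status means the check failed.
        return None
    return result.stdout


output = run_npu_smi()
print("npu-smi is not usable on this machine" if output is None else output)
```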

Configure the software environment#

The easiest way to prepare the software environment is to use the CANN image directly:

# Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci7
# Update the CANN image
export IMAGE=quay.io/ascend/cann:8.3.rc2-910b-ubuntu22.04-py3.11
docker run --rm \
    --name vllm-ascend-env \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -it $IMAGE bash
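
The command above exposes a single NPU. As a hedged sketch, the same argument list can be generated for several devices by repeating the `--device` flag. The helper `docker_run_command` is hypothetical; the mounts and shared devices are copied from the command above.

```python
import shlex

# Shared device nodes and host paths copied from the docker command above.
SHARED_DEVICES = ["/dev/davinci_manager", "/dev/devmm_svm", "/dev/hisi_hdc"]
MOUNTS = [
    "/usr/local/dcmi",
    "/usr/local/bin/npu-smi",
    "/usr/local/Ascend/driver/lib64/",
    "/usr/local/Ascend/driver/version.info",
    "/etc/ascend_install.info",
    "/root/.cache",
]


def docker_run_command(image, device_ids, name="vllm-ascend-env"):
    """Build the `docker run` argument list for the given NPU ids."""
    args = ["docker", "run", "--rm", "--name", name]
    for device_id in device_ids:
        args += ["--device", f"/dev/davinci{device_id}"]
    for device in SHARED_DEVICES:
        args += ["--device", device]
    for path in MOUNTS:
        # Each path is mounted at the same location inside the container.
        args += ["-v", f"{path}:{path}"]
    args += ["-it", image, "bash"]
    return args


command = docker_run_command(
    "quay.io/ascend/cann:8.3.rc2-910b-ubuntu22.04-py3.11", [0, 1]
)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.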

Alternatively, install CANN manually:

# Create a virtual environment.
python -m venv vllm-ascend-env
source vllm-ascend-env/bin/activate

# Install required Python packages.
pip3 install -i https://pypi.tuna.tsinghua.edu.cn/simple attrs 'numpy<2.0.0' decorator sympy cffi pyyaml pathlib2 psutil protobuf scipy requests absl-py wheel typing_extensions

# Download and install the CANN package.
wget --header="Referer: https://www.hiascend.com/" https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.3.RC2/Ascend-cann-toolkit_8.3.RC2_linux-"$(uname -i)".run
chmod +x ./Ascend-cann-toolkit_8.3.RC2_linux-"$(uname -i)".run
./Ascend-cann-toolkit_8.3.RC2_linux-"$(uname -i)".run --full
# https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/Milan-ASL/Milan-ASL%20V100R001C22B800TP052/Ascend-cann-kernels-910b_8.3.rc2_linux-aarch64.run

source /usr/local/Ascend/ascend-toolkit/set_env.sh
wget --header="Referer: https://www.hiascend.com/" https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.3.RC2/Ascend-cann-kernels-910b_8.3.RC2_linux-"$(uname -i)".run
chmod +x ./Ascend-cann-kernels-910b_8.3.RC2_linux-"$(uname -i)".run
./Ascend-cann-kernels-910b_8.3.RC2_linux-"$(uname -i)".run --install

wget --header="Referer: https://www.hiascend.com/" https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN/CANN%208.3.RC2/Ascend-cann-nnal_8.3.RC2_linux-"$(uname -i)".run
chmod +x ./Ascend-cann-nnal_8.3.RC2_linux-"$(uname -i)".run
./Ascend-cann-nnal_8.3.RC2_linux-"$(uname -i)".run --install

source /usr/local/Ascend/nnal/atb/set_env.sh
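
The three download URLs above differ only in the package name and the machine architecture. The sketch below rebuilds them in Python, using `platform.machine()` for the architecture, which can help on systems where `uname -i` reports `unknown`. The helper `cann_package_urls` is hypothetical, and the assumption that other CANN versions follow the same URL layout is not confirmed by this document.

```python
import platform

BASE_URL = "https://ascend-repo.obs.cn-east-2.myhuaweicloud.com/CANN"


def cann_package_urls(version="8.3.RC2", arch=None):
    """Return the toolkit, kernels and nnal URLs used in the steps above."""
    arch = arch or platform.machine()  # for example "aarch64" or "x86_64"
    release_dir = f"CANN%20{version}"  # "%20" is the URL-encoded space
    names = [
        f"Ascend-cann-toolkit_{version}_linux-{arch}.run",
        f"Ascend-cann-kernels-910b_{version}_linux-{arch}.run",
        f"Ascend-cann-nnal_{version}_linux-{arch}.run",
    ]
    return [f"{BASE_URL}/{release_dir}/{name}" for name in names]


for url in cann_package_urls():
    print(url)
```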

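If you are using the pre-built vllm-ascend Docker image, no extra steps are required.

Once the steps above are done, you can start setting up vllm and vllm-ascend.

Install vllm and vllm-ascend#

First install the system dependencies and configure the pip mirror: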

# Using apt-get with mirror
sed -i 's|ports.ubuntu.com|mirrors.tuna.tsinghua.edu.cn|g' /etc/apt/sources.list
apt-get update -y && apt-get install -y gcc g++ cmake libnuma-dev wget git curl jq
# Or using yum
# yum update -y && yum install -y gcc gcc-c++ cmake numactl-devel wget git curl jq
# Config pip mirror
pip config set global.index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

[Optional] If you are working on an x86 machine or using a development version of torch-npu, configure an extra index for pip:

# For torch-npu post version or x86 machine
pip config set global.extra-index-url "https://download.pytorch.org/whl/cpu/ https://mirrors.huaweicloud.com/ascend/repos/pypi"

Then install vllm (from source, at the tagged release) and vllm-ascend (from the pre-built wheel on PyPI):

# Install vllm-project/vllm. The newest supported version is v0.11.0.
# v0.11.0 has not been published to PyPI, so install it from source.
git clone --depth 1 --branch v0.11.0 https://github.com/vllm-project/vllm
cd vllm
VLLM_TARGET_DEVICE=empty pip install -v -e .
cd ..

# Install vllm-project/vllm-ascend from pypi.
pip install vllm-ascend==0.11.0

Or build from source:

# Install vLLM.
git clone --depth 1 --branch v0.11.0 https://github.com/vllm-project/vllm
cd vllm
VLLM_TARGET_DEVICE=empty pip install -v -e .
cd ..

# Install vLLM Ascend.
git clone --depth 1 --branch v0.11.0 https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -v -e .
cd ..
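
After either installation path, it can be useful to confirm that the installed packages match the versions in the requirements table. The following sketch uses the standard `importlib.metadata` module. The helper names are hypothetical, and stripping a local version suffix such as `+cpu` before comparing is an assumption about how the wheels are labeled.

```python
from importlib import metadata

# Versions taken from the requirements table in this document.
EXPECTED = {"torch": "2.7.1", "torch-npu": "2.7.1.post1"}


def installed_version(package):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_mismatches(expected=EXPECTED):
    """Map each mismatching package to (wanted, found)."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        # Ignore a local version suffix such as "+cpu" when comparing.
        comparable = found.split("+", 1)[0] if found else None
        if comparable != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


for package, (wanted, found) in version_mismatches().items():
    print(f"{package}: expected {wanted}, found {found}")
```

An empty result means every listed package is installed at the expected version.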

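vllm-ascend builds custom operators by default. If you do not need them, set the environment variable COMPILE_CUSTOM_KERNELS=0 to disable the build.

Note

If you want to use the sleep mode feature, set COMPILE_CUSTOM_KERNELS=1 manually. Building the custom operators requires gcc/g++ higher than version 8 with support for C++17 or later. If you hit a torch-npu version conflict when running pip install -e ., use pip install --no-build-isolation -e . instead to build against the system environment. Other build failures are usually caused by an unexpected compiler; set the environment variables CXX_COMPILER and C_COMPILER to the paths of g++ and gcc before building.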
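
The compiler requirement in the note can be checked ahead of a build. This is a rough sketch: `parse_major` and `compiler_major` are hypothetical helpers, the `-dumpfullversion -dumpversion` flags are standard GCC options, and reading "higher than 8" as a major version of at least 9 is an interpretation, not something this document states.

```python
import re
import shutil
import subprocess

# Assumption: "higher than 8" is read as major version >= 9.
MIN_MAJOR = 9


def parse_major(version_text):
    """Extract the leading major version number from text like '11.4.0'."""
    match = re.match(r"\s*(\d+)", version_text)
    return int(match.group(1)) if match else None


def compiler_major(compiler="gcc"):
    """Return the compiler's major version, or None if it cannot be determined."""
    path = shutil.which(compiler)
    if path is None:
        return None
    result = subprocess.run(
        [path, "-dumpfullversion", "-dumpversion"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return parse_major(result.stdout)


for name in ("gcc", "g++"):
    major = compiler_major(name)
    ok = major is not None and major >= MIN_MAJOR
    print(f"{name}: major version {major}, sufficient: {ok}")
```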

With the Docker method, you can pull the pre-built image directly and run it with bash.


Or build the image from source:

git clone https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
docker build -t vllm-ascend-dev-image:latest -f ./Dockerfile .

# Run the image. Update DEVICE according to your device (/dev/davinci[0-7])
export DEVICE=/dev/davinci7
# Update the vllm-ascend image (use vllm-ascend-dev-image:latest to run the image built above)
export IMAGE=quay.io/ascend/vllm-ascend:v0.11.0
docker run --rm \
    --name vllm-ascend-env \
    --device $DEVICE \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -it $IMAGE bash

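The default working directory is /workspace. The vLLM and vLLM Ascend code is placed in /vllm-workspace and installed in development mode (pip install -e), so that code changes take effect immediately without reinstalling.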

Extra information#

Verify installation#

Create and run a simple inference test. An example example.py:

from vllm import LLM, SamplingParams

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
# Create an LLM.
llm = LLM(model="Qwen/Qwen3-0.6B")

# Generate texts from the prompts.
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Then run:

# Try `export VLLM_USE_MODELSCOPE=true` and `pip install modelscope`
# to speed up download if huggingface is not reachable.
python example.py
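
The ModelScope hint in the comment above can also be applied from Python when launching the test. This is a sketch under assumptions: `modelscope_env` and `run_example` are hypothetical helpers, and the `modelscope` package is assumed to be installed already with `pip install modelscope`.

```python
import os
import subprocess
import sys


def modelscope_env(base_env=None):
    """Return a copy of the environment with VLLM_USE_MODELSCOPE enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["VLLM_USE_MODELSCOPE"] = "true"
    return env


def run_example(script="example.py", use_modelscope=False):
    """Run the test script with the current interpreter; return its exit code."""
    env = modelscope_env() if use_modelscope else dict(os.environ)
    completed = subprocess.run([sys.executable, script], env=env, check=False)
    return completed.returncode
```

Calling `run_example(use_modelscope=True)` is then equivalent to exporting the variable in the shell before running `python example.py`.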

The output should look similar to the following:

Prompt: 'Hello, my name is', Generated text: " Shinji, a teenage boy from New York City. I'm a computer science"
Prompt: 'The president of the United States is', Generated text: ' a very important person. When he or she is elected, many people think that'
Prompt: 'The capital of France is', Generated text: ' Paris. The oldest part of the city is Saint-Germain-des-Pr'
Prompt: 'The future of AI is', Generated text: ' not bright\n\nThere is no doubt that the evolution of AI will have a huge'