NPU
vLLM-Omni supports Ascend NPUs through the vLLM Ascend Plugin (vllm-ascend), a community-maintained hardware plugin for running vLLM on Ascend NPUs.
Requirements
- OS: Linux
- Python: 3.12
Note
vLLM-Omni is currently not natively supported on Windows.
For detailed hardware and software requirements, please refer to the vllm-ascend installation documentation.
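Before installing, you can check a host against the requirements above. The sketch below is a hypothetical pre-flight script (the function names are illustrative, not part of vLLM-Omni); it only tests the OS and the Python version and does not check the Ascend driver stack, which the vllm-ascend documentation covers.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for the requirements listed above.
# It only covers the OS and the Python version, not the NPU driver stack.

# Exit status 0 when running on Linux.
is_linux() {
  [ "$(uname -s)" = "Linux" ]
}

# Print the "major.minor" version of a Python interpreter (default: python3).
python_minor_version() {
  "${1:-python3}" -c 'import sys; print(f"{sys.version_info[0]}.{sys.version_info[1]}")'
}

# Exit status 0 when the interpreter is Python 3.12.
python_is_312() {
  [ "$(python_minor_version "${1:-python3}")" = "3.12" ]
}

if is_linux; then
  echo "OS: Linux (ok)"
else
  echo "OS: $(uname -s) (not supported)"
fi

if python_is_312; then
  echo "Python: 3.12 (ok)"
else
  echo "Python: $(python_minor_version) (3.12 expected)"
fi
```

On a host with the Ascend driver installed, `npu-smi info` additionally lists the NPUs that are visible.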
Installation
Set up using Docker
vLLM-Omni provides Docker images for Ascend NPU deployment. You can pull a prebuilt image from the quay.io/ascend/vllm-omni repository and start a bash shell in it.
The following images are available:
| Image tag | Hardware | OS |
|---|---|---|
| image-tag | Atlas A2 | Ubuntu |
| image-tag-a3 | Atlas A3 | Ubuntu |
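If you script the deployment, the table above can be encoded as a small lookup. This is a hypothetical helper, not something shipped with the images; the default tag `v0.18.0` is taken from the example command below and can be overridden with a second argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map the Atlas hardware generation to the matching
# prebuilt vLLM-Omni image (A2 -> <tag>, A3 -> <tag>-a3).
omni_image() {
  local hw="$1"
  local tag="${2:-v0.18.0}"
  local repo="quay.io/ascend/vllm-omni"
  case "$hw" in
    a2|A2) echo "${repo}:${tag}" ;;
    a3|A3) echo "${repo}:${tag}-a3" ;;
    *) echo "unknown hardware: $hw (expected a2 or a3)" >&2; return 1 ;;
  esac
}

# Example: select the Atlas A2 image.
IMAGE="$(omni_image a2)"
export IMAGE
echo "$IMAGE"
```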
The following example deployment command has been verified on 4 NPUs:
# Atlas A2:
# export IMAGE=quay.io/ascend/vllm-omni:v0.18.0
# Atlas A3:
# export IMAGE=quay.io/ascend/vllm-omni:v0.18.0-a3
export IMAGE=quay.io/ascend/vllm-omni:v0.18.0
docker run --rm \
--name vllm-omni-npu \
--shm-size=1g \
--device /dev/davinci0 \
--device /dev/davinci1 \
--device /dev/davinci2 \
--device /dev/davinci3 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-p 8000:8000 \
-it $IMAGE bash
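The command above hardcodes four NPUs (`/dev/davinci0` to `/dev/davinci3`). To use a different number, you could generate the `--device` arguments instead. The sketch below is a hypothetical helper that assumes the NPUs you want are numbered contiguously from 0 (check `ls /dev/davinci*` on the host) and always adds the three management devices used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the docker --device arguments, one per line,
# for NPUs /dev/davinci0 .. /dev/davinci<N-1> plus the management devices.
npu_device_args() {
  local count="$1" i dev
  for ((i = 0; i < count; i++)); do
    printf '%s\n' --device "/dev/davinci${i}"
  done
  for dev in /dev/davinci_manager /dev/devmm_svm /dev/hisi_hdc; do
    printf '%s\n' --device "$dev"
  done
}

# Collect the arguments for 4 NPUs into an array (one element per line).
mapfile -t DEVICE_ARGS < <(npu_device_args 4)
printf '%s ' "${DEVICE_ARGS[@]}"
echo
```

The array can then replace the individual `--device` lines, for example `docker run --rm --name vllm-omni-npu --shm-size=1g "${DEVICE_ARGS[@]}"` followed by the same `-v`, `-p`, and `-it "$IMAGE" bash` options. Emitting one token per line and reading it with `mapfile` keeps each argument a separate array element without relying on word splitting.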
Tip
You can use this Docker image to serve models the same way you would with vLLM. To do so, override the default entrypoint (vllm serve --omni), which works only for models supported by the vLLM-Omni project.
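As a sketch of that tip, the script below uses Docker's standard `--entrypoint` flag to replace the default entrypoint with plain `vllm` and passes `serve <model>` as its arguments. It assumes the image defines its default command through an ENTRYPOINT; the model name, container name, single-NPU device list, and the `DRY_RUN` switch (which only prints the command unless set to `0`) are illustrative assumptions, not part of the vLLM-Omni docs.

```shell
#!/usr/bin/env bash
# Sketch: run plain `vllm serve` from the vLLM-Omni image by overriding the
# default entrypoint. MODEL, the container name and DRY_RUN are hypothetical.
IMAGE="${IMAGE:-quay.io/ascend/vllm-omni:v0.18.0}"
MODEL="${MODEL:-Qwen/Qwen2.5-0.5B-Instruct}"   # example model; use your own

# Same devices and mounts as the example above, reduced to one NPU.
# `--entrypoint vllm` replaces the default `vllm serve --omni`; everything
# after the image name becomes the arguments passed to `vllm`.
docker_args=(
  run --rm --name vllm-npu --shm-size=1g
  --device /dev/davinci0
  --device /dev/davinci_manager
  --device /dev/devmm_svm
  --device /dev/hisi_hdc
  -v /usr/local/dcmi:/usr/local/dcmi
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi
  -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info
  -v /etc/ascend_install.info:/etc/ascend_install.info
  -v /root/.cache:/root/.cache
  -p 8000:8000
  --entrypoint vllm
  "$IMAGE"
  serve "$MODEL"
)

if [ "${DRY_RUN:-1}" = "1" ]; then
  # Print the fully quoted command instead of running it.
  printf '%q ' docker "${docker_args[@]}"
  echo
else
  docker "${docker_args[@]}"
fi
```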
Alternatively, build vLLM-Omni from source as described in the next section.
Build wheel from source
The recommended way to build vLLM-Omni from source on NPU is to start from a prebuilt vllm-ascend Docker image:
# Select the vllm-ascend image that matches your hardware
# Atlas A2:
# export IMAGE=quay.io/ascend/vllm-ascend:v0.18.0rc1
# Atlas A3:
# export IMAGE=quay.io/ascend/vllm-ascend:v0.18.0rc1-a3
export IMAGE=quay.io/ascend/vllm-ascend:v0.18.0rc1
docker run --rm \
--name vllm-omni-npu \
--shm-size=1g \
--device /dev/davinci0 \
--device /dev/davinci1 \
--device /dev/davinci2 \
--device /dev/davinci3 \
--device /dev/davinci_manager \
--device /dev/devmm_svm \
--device /dev/hisi_hdc \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
-v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /root/.cache:/root/.cache \
-p 8000:8000 \
-it $IMAGE bash
# Inside the container, install vLLM-Omni from source
cd /vllm-workspace
git clone -b v0.18.0 https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
pip install -v -e . --no-build-isolation
# or VLLM_OMNI_TARGET_DEVICE=npu pip install -v -e .
export VLLM_WORKER_MULTIPROC_METHOD=spawn
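After the editable install, a quick sanity check is to confirm that Python can locate each package. This sketch assumes the import names `vllm`, `vllm_ascend`, and `vllm_omni`, and the helper name is hypothetical. Note that `importlib.util.find_spec` only locates a package without executing it, so a full `python3 -c 'import vllm_omni'` is the stricter test.

```shell
#!/usr/bin/env bash
# Hypothetical post-install check: exit status 0 if Python can locate
# the given top-level module (the module is not imported or executed).
has_module() {
  python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$1"
}

# Assumed import names for vLLM, vLLM-Ascend and vLLM-Omni.
for mod in vllm vllm_ascend vllm_omni; do
  if has_module "$mod"; then
    echo "$mod: found"
  else
    echo "$mod: NOT found"
  fi
done
```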
The default working directory is /workspace. The vLLM, vLLM-Ascend, and vLLM-Omni source trees are placed in /vllm-workspace and installed in development (editable) mode.
For other installation methods (pip installation, building from source, custom Docker builds), please refer to the vllm-ascend installation guide.
Issue #886 tracks the aligned versions of vLLM, vLLM-Ascend, and vLLM-Omni and also outlines the Q1 roadmap.
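To compare your environment with the aligned versions listed in issue #886, you can print the installed distribution versions. This is a hypothetical helper; it assumes the distributions are named `vllm`, `vllm-ascend`, and `vllm-omni`, and it relies on the standard-library `importlib.metadata.version` function.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the installed version of a Python distribution,
# or "not installed" if it cannot be found.
dist_version() {
  python3 - "$1" <<'EOF'
import sys
from importlib.metadata import PackageNotFoundError, version

try:
    print(version(sys.argv[1]))
except PackageNotFoundError:
    print("not installed")
EOF
}

# Assumed distribution names for vLLM, vLLM-Ascend and vLLM-Omni.
for dist in vllm vllm-ascend vllm-omni; do
  printf '%-12s %s\n' "$dist" "$(dist_version "$dist")"
done
```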
You can also build vLLM-Omni from the latest main branch to get the newest features and bug fixes. Note that the main branch may occasionally be broken on NPU for a while; check issue #886 for the status of the latest main-branch commit on NPU.
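# Work in /vllm-workspace (create it first if it does not exist)
mkdir -p /vllm-workspace && cd /vllm-workspace
# Pin vLLM version to 0.18.0; VLLM_TARGET_DEVICE=empty skips vLLM's own device kernels, since vllm-ascend supplies the NPU backend
git clone -b v0.18.0 https://github.com/vllm-project/vllm.git
cd vllm
VLLM_TARGET_DEVICE=empty pip install -v -e .
cd ..
# Pin vLLM-Ascend version to 0.18.0rc1
git clone -b v0.18.0rc1 https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -v -e .
cd ..
# Install vLLM-Omni from the latest main branch
git clone https://github.com/vllm-project/vllm-omni.git
cd /vllm-workspace/vllm-omni
pip install -v -e . --no-build-isolation
# or VLLM_OMNI_TARGET_DEVICE=npu pip install -v -e .
export VLLM_WORKER_MULTIPROC_METHOD=spawn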