NPU

vLLM-Omni supports Ascend NPUs through the vLLM Ascend plugin (vllm-ascend), a community-maintained hardware plugin for running vLLM on NPUs.

Requirements

OS: Linux
Python: 3.12
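Before installing on a bare-metal host, the two requirements above can be checked from a shell. This is a minimal sketch: the function names `is_supported_python` and `check_requirements` are hypothetical (not part of vLLM-Omni), and it assumes the interpreter you will use is the `python3` on your PATH. The Docker images described below ship their own Python, so the check mostly matters outside containers.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with vLLM-Omni) for the
# requirements listed above: Linux and Python 3.12.

# Succeeds if the given version string is 3.12 or 3.12.x.
is_supported_python() {
  case "$1" in
    3.12 | 3.12.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Checks the current OS and the python3 found on PATH.
check_requirements() {
  local os py
  os=$(uname -s)
  if [ "$os" != "Linux" ]; then
    echo "unsupported OS: $os (Linux required)" >&2
    return 1
  fi
  # Ask python3 for its major.minor version; empty if python3 is missing.
  py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)
  if ! is_supported_python "$py"; then
    echo "unsupported Python: ${py:-not found} (3.12 required)" >&2
    return 1
  fi
  echo "ok: Linux, Python $py"
}
```

Run `check_requirements` on the host; a non-zero exit status means one of the requirements is not met.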

Note

vLLM-Omni is currently not natively supported on Windows.

NPU

For detailed hardware and software requirements, please refer to the vLLM-Ascend installation documentation.

Installation

Set up using Docker

NPU

vLLM-Omni provides Docker images for Ascend NPU deployment. Pull a prebuilt image from the quay.io/ascend/vllm-omni repository and start it with a bash shell.

The following images are supported:

Image tag	Hardware	OS
<version> (e.g. v0.24.0)	Atlas A2	Ubuntu
<version>-a3 (e.g. v0.24.0-a3)	Atlas A3	Ubuntu

The following deployment command has been verified on 4 NPUs:

# Atlas A2: quay.io/ascend/vllm-omni:v0.24.0
# Atlas A3: quay.io/ascend/vllm-omni:v0.24.0-a3
docker run --rm \
    --name vllm-omni-a3 \
    --shm-size=4g \
    --device /dev/davinci0 \
    --device /dev/davinci1 \
    --device /dev/davinci2 \
    --device /dev/davinci3 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v ~/.cache:/root/.cache \
    -p 8091:8091 \
    -it quay.io/ascend/vllm-omni:v0.24.0-a3 bash
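The command above passes one `--device /dev/davinciN` flag per NPU plus three shared management devices. As a sketch, assuming the NPUs you want are numbered contiguously from /dev/davinci0, a hypothetical helper `npu_device_flags` (not part of vLLM-Omni or Docker) can generate these flags for any NPU count:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the --device flags for NPUs 0..N-1 followed by
# the shared Ascend management devices used in the docker run example.
npu_device_flags() {
  local count="$1"
  local i
  # Reject anything that is not a non-negative integer.
  if ! [[ "$count" =~ ^[0-9]+$ ]]; then
    echo "usage: npu_device_flags <number-of-npus>" >&2
    return 1
  fi
  for ((i = 0; i < count; i++)); do
    printf -- '--device /dev/davinci%d ' "$i"
  done
  # Management devices needed in addition to the per-NPU device nodes.
  printf -- '--device /dev/davinci_manager --device /dev/devmm_svm --device /dev/hisi_hdc'
  printf '\n'
}
```

In the `docker run` command, `$(npu_device_flags 4)` would replace the seven `--device` lines. The substitution is intentionally left unquoted so the shell splits it into separate arguments, which is safe here because the device paths contain no spaces.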

Tip

You can use this Docker image to serve models the same way you would with vLLM. To do so, override the default entrypoint (vllm serve --omni), which works only for models supported by the vLLM-Omni project.
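One way to override the entrypoint is Docker's `--entrypoint` flag. The sketch below is a hypothetical dry-run helper that only prints the command for review; it assumes that `vllm` is on the PATH inside the image, that the server should listen on the same port that is published (8091 by default, matching the example above), and that `org/model` stands in for a real model ID. The `--device` and `-v` flags from the example above are omitted for brevity and must be added for a real run.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints (does not run) a docker command that
# replaces the image's default entrypoint (vllm serve --omni) with plain
# `vllm`, so a model can be served as with upstream vLLM.
# NOTE: the --device and -v flags from the example above are omitted.
plain_vllm_serve_cmd() {
  local image="$1" model="$2" port="${3:-8091}"
  local -a cmd=(
    docker run --rm
    --entrypoint vllm
    -p "${port}:${port}"
    "$image"
    serve "$model" --port "$port"
  )
  # Join the argument list with spaces so it can be reviewed.
  echo "${cmd[*]}"
}
```

For example, `plain_vllm_serve_cmd quay.io/ascend/vllm-omni:v0.24.0-a3 org/model` prints the command with `serve org/model --port 8091` appended after the image name; everything after the image becomes arguments to the overridden `vllm` entrypoint.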

Alternatively, build the image from source:

git clone https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
# A2
docker build -t vllm-omni-dev-image:latest -f ./docker/Dockerfile.npu .
# A3
# docker build -t vllm-omni-dev-image:latest -f ./docker/Dockerfile.npu.a3 .
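To avoid editing the commented-out line when switching hardware, the Dockerfile choice can be wrapped in a small function. This is a sketch with a hypothetical name, `npu_dockerfile`; the two Dockerfile paths are the ones used in the build commands above, and it assumes you run it from the root of the vllm-omni checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an Atlas hardware generation to the Dockerfile
# used in the build commands above (paths are relative to the repo root).
npu_dockerfile() {
  case "$1" in
    a2 | A2) echo "./docker/Dockerfile.npu" ;;
    a3 | A3) echo "./docker/Dockerfile.npu.a3" ;;
    *)
      echo "unknown hardware '$1' (expected a2 or a3)" >&2
      return 1
      ;;
  esac
}
```

Usage: `docker build -t vllm-omni-dev-image:latest -f "$(npu_dockerfile a3)" .` builds the A3 image; an unknown value prints an error instead of silently selecting the A2 Dockerfile.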

Build wheel from source

NPU release

The recommended way to build vLLM-Omni from source on NPU is to start from a prebuilt vLLM-Ascend Docker image and install vLLM-Omni inside the container:

# vLLM-Ascend image
# Atlas A2: quay.io/atlas-ci/vllm-ascend:v0.24.0
# Atlas A3: quay.io/atlas-ci/vllm-ascend:v0.24.0-a3
docker run --rm \
    --name vllm-omni-npu \
    --shm-size=1g \
    --device /dev/davinci0 \
    --device /dev/davinci1 \
    --device /dev/davinci2 \
    --device /dev/davinci3 \
    --device /dev/davinci_manager \
    --device /dev/devmm_svm \
    --device /dev/hisi_hdc \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/Ascend/driver/lib64/:/usr/local/Ascend/driver/lib64/ \
    -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
    -v /etc/ascend_install.info:/etc/ascend_install.info \
    -v /root/.cache:/root/.cache \
    -p 8000:8000 \
    -it quay.io/atlas-ci/vllm-ascend:v0.24.0-a3 bash

# Inside the container, install vLLM-Omni from source
cd /vllm-workspace
git clone -b v0.24.0 https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
pip install -v -e . --no-build-isolation
# or VLLM_OMNI_TARGET_DEVICE=npu pip install -v -e .

export VLLM_WORKER_MULTIPROC_METHOD=spawn

The default working directory is /workspace. The vLLM, vLLM-Ascend, and vLLM-Omni source trees are placed in /vllm-workspace and installed in development (editable) mode.
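An `export` only lasts for the current shell session, so a new shell opened later (for example with `docker exec`) will not have VLLM_WORKER_MULTIPROC_METHOD set. A minimal check, using a hypothetical function name, could look like this:

```shell
#!/usr/bin/env bash
# Hypothetical check: confirm the worker multiprocessing method exported in
# the instructions above is set in the current shell.
check_vllm_multiproc_method() {
  if [ "${VLLM_WORKER_MULTIPROC_METHOD:-}" = "spawn" ]; then
    echo "ok: VLLM_WORKER_MULTIPROC_METHOD=spawn"
    return 0
  fi
  echo "warning: VLLM_WORKER_MULTIPROC_METHOD is '${VLLM_WORKER_MULTIPROC_METHOD:-<unset>}', expected 'spawn'" >&2
  return 1
}
```

To make the setting persist across interactive bash shells in the container, one option is `echo 'export VLLM_WORKER_MULTIPROC_METHOD=spawn' >> ~/.bashrc`.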

For other installation methods (pip installation, building from source, custom Docker builds), please refer to the vllm-ascend installation guide.

NPU from main

Issue #886 tracks the aligned versions of vLLM, vLLM-Ascend, and vLLM-Omni, and also outlines the Q1 roadmap.

You can also build vLLM-Omni from the latest main branch to get the newest features and bug fixes. Note that main may occasionally be broken on NPU; check issue #886 for the status of the latest vLLM-Omni main commit on NPU.

# Pin vLLM version to 0.18.0
git clone -b v0.18.0 https://github.com/vllm-project/vllm.git
cd vllm
VLLM_TARGET_DEVICE=empty pip install -v -e .
cd ..

# Install the matching vLLM-Ascend release
git clone -b v0.18.0rc1 https://github.com/vllm-project/vllm-ascend.git
cd vllm-ascend
pip install -v -e .
cd ..

# Install vLLM-Omni from the latest main branch
git clone https://github.com/vllm-project/vllm-omni.git
cd vllm-omni
pip install -v -e . --no-build-isolation
# or VLLM_OMNI_TARGET_DEVICE=npu pip install -v -e .
export VLLM_WORKER_MULTIPROC_METHOD=spawn
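Because builds from main depend on the three projects staying aligned, it helps to compare what is actually installed against the versions listed in issue #886. This is a sketch with a hypothetical function name; it relies on `pip show`, whose output contains a `Version: X` line, and assumes the installed distributions are named vllm, vllm-ascend, and vllm-omni.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<package> <version>" for each distribution,
# or "<package> not installed" if pip cannot find it (or pip is missing).
report_versions() {
  local pkg version
  for pkg in "$@"; do
    # Extract the value of the "Version:" line from `pip show`.
    version=$(pip show "$pkg" 2>/dev/null | awk -F': ' '$1 == "Version" { print $2 }')
    printf '%s %s\n' "$pkg" "${version:-not installed}"
  done
}
```

Usage: `report_versions vllm vllm-ascend vllm-omni` prints one line per package, which can then be checked against the aligned versions in issue #886.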