Supported Features

vLLM Ascend aims to keep its feature support aligned with upstream vLLM. We are also actively collaborating with the community to accelerate support.

You can check the support status of the vLLM V1 Engine. The feature support status of vLLM Ascend is listed below:

| Feature | Status | Next Step |
|---|---|---|
| Chunked Prefill | 🟢 Functional | Functional, see detailed note: Chunked Prefill |
| Automatic Prefix Caching | 🟢 Functional | Functional, see detailed note: Automatic Prefix Caching |
| LoRA | 🟢 Functional | Functional, see detailed note: LoRA |
| Speculative decoding | 🟢 Functional | Basic support |
| Pooling | 🟢 Functional | CI needed to adapt to more models |
| Enc-dec | 🟡 Planned | vLLM should support this feature first. |
| Multi Modality | 🟢 Functional | Tutorial, optimizing and adapting more models |
| LogProbs | 🟢 Functional | CI needed |
| Prompt logProbs | 🟢 Functional | CI needed |
| Async output | 🟢 Functional | CI needed |
| Beam search | 🟢 Functional | CI needed |
| Guided Decoding | 🟢 Functional | See detailed note: Structured Output Guide |
| Tensor Parallel | 🟢 Functional | Make TP >4 work with graph mode. |
| Pipeline Parallel | 🟡 Planned | Broken in this version; to be fixed in the next release. |
| Expert Parallel | 🟢 Functional | See detailed note: Expert Load Balance (EPLB) |
| Data Parallel | 🟢 Functional | Data Parallel support for Qwen3 MoE. |
| Prefill Decode Disaggregation | 🟢 Functional | Functional, xPyD is supported. |
| Quantization | 🟢 Functional | See detailed note: Quantization Guide |
| Graph Mode | 🟢 Functional | See detailed note: Graph Mode Guide |
| Sleep Mode | 🟢 Functional | See detailed note: Sleep Mode |

  • 🟢 Functional: Fully operational, with ongoing optimizations.

  • 🔵 Experimental: Experimental support, interfaces and functions may change.

  • 🚧 WIP: Under active development, will be supported soon.

  • 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).

  • 🔴 No plan/Deprecated: No plan to support, or deprecated by vLLM.
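
As a rough illustration of how the status table might be used, the sketch below checks a feature's status before turning it into an engine argument. The statuses are copied from the table above. The keyword names (`enable_chunked_prefill`, `enable_prefix_caching`, `enable_lora`, `tensor_parallel_size`, `pipeline_parallel_size`) are upstream vLLM engine arguments; that vLLM Ascend accepts them unchanged is an assumption here, and `FEATURE_STATUS`, `ENGINE_ARGS`, and `build_engine_kwargs` are hypothetical names introduced only for this example.

```python
# Status of a few features, copied from the table above.
FEATURE_STATUS = {
    "Chunked Prefill": "Functional",
    "Automatic Prefix Caching": "Functional",
    "LoRA": "Functional",
    "Tensor Parallel": "Functional",
    "Pipeline Parallel": "Planned",
}

# Assumed mapping from feature name to an upstream vLLM engine argument.
# Whether vLLM Ascend honors each argument identically is an assumption
# of this sketch, not something the table states.
ENGINE_ARGS = {
    "Chunked Prefill": ("enable_chunked_prefill", True),
    "Automatic Prefix Caching": ("enable_prefix_caching", True),
    "LoRA": ("enable_lora", True),
    "Tensor Parallel": ("tensor_parallel_size", 2),
    "Pipeline Parallel": ("pipeline_parallel_size", 2),
}


def build_engine_kwargs(features):
    """Return engine keyword arguments for the requested features.

    Raises ValueError for any feature whose status is not "Functional".
    """
    kwargs = {}
    for name in features:
        status = FEATURE_STATUS[name]
        if status != "Functional":
            raise ValueError(f"{name} is {status}, not Functional")
        key, value = ENGINE_ARGS[name]
        kwargs[key] = value
    return kwargs


kwargs = build_engine_kwargs(["Chunked Prefill", "Automatic Prefix Caching"])
print(kwargs)
```

The resulting dictionary could then be passed on as keyword arguments when constructing an engine, for example `vllm.LLM(model=..., **kwargs)`. Requesting "Pipeline Parallel" raises `ValueError`, since the table marks it as Planned.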