Supported Features
vLLM Ascend aims to stay aligned with the features of upstream vLLM. We are also actively collaborating with the community to accelerate support.
For the upstream picture, check the support status of the vLLM V1 Engine. The table below lists the feature support status of vLLM Ascend:
| Feature | Status | Next Step |
|---|---|---|
| Chunked Prefill | 🟢 Functional | Functional, see detailed note: Chunked Prefill |
| Automatic Prefix Caching | 🟢 Functional | Functional, see detailed note: Automatic Prefix Caching |
| LoRA | 🟢 Functional | Functional, see detailed note: LoRA |
| Speculative decoding | 🟢 Functional | Basic support |
| Pooling | 🟢 Functional | CI needed to adapt to more models |
| Enc-dec | 🟡 Planned | vLLM should support this feature first. |
| Multi Modality | 🟢 Functional | Tutorial, optimizing and adapting more models |
| LogProbs | 🟢 Functional | CI needed |
| Prompt logProbs | 🟢 Functional | CI needed |
| Async output | 🟢 Functional | CI needed |
| Beam search | 🟢 Functional | CI needed |
| Guided Decoding | 🟢 Functional | See detailed note: Structured Output Guide |
| Tensor Parallel | 🟢 Functional | Make TP >4 work with graph mode. |
| Pipeline Parallel | 🟡 Planned | Broken in this version, will fix in next release. |
| Expert Parallel | 🟢 Functional | See detailed note: Expert Load Balance (EPLB) |
| Data Parallel | 🟢 Functional | Data Parallel support for Qwen3 MoE. |
| Prefill Decode Disaggregation | 🟢 Functional | Functional, xPyD is supported. |
| Quantization | 🟢 Functional | See detailed note: Quantization Guide |
| Graph Mode | 🟢 Functional | See detailed note: Graph Mode Guide |
| Sleep Mode | 🟢 Functional | See detailed note: Sleep Mode |

🟢 Functional: Fully operational, with ongoing optimizations.
🔵 Experimental: Experimental support, interfaces and functions may change.
🚧 WIP: Under active development, will be supported soon.
🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
🔴 No plan/Deprecated: No plan to support, or deprecated by vLLM.
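
As a rough illustration of how a few of the features listed above are usually switched on, the sketch below assembles a `vllm serve` command line using upstream vLLM flags (`--tensor-parallel-size`, `--enable-prefix-caching`, `--enable-chunked-prefill`, `--enable-lora`). The helper `build_serve_command` and the model name are hypothetical placeholders, not part of vLLM Ascend. Whether a given flag is needed, enabled by default, or supported in your installed version is an assumption to verify against the detailed notes for each feature.

```python
import shlex
from typing import List


def build_serve_command(
    model: str,
    tensor_parallel_size: int = 1,
    enable_prefix_caching: bool = False,
    enable_chunked_prefill: bool = False,
    enable_lora: bool = False,
) -> List[str]:
    """Assemble a `vllm serve` argument list (hypothetical helper).

    Only builds the command; it does not launch anything.
    """
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be >= 1")

    cmd = ["vllm", "serve", model]

    # Tensor Parallel: shard the model across several devices.
    if tensor_parallel_size > 1:
        cmd += ["--tensor-parallel-size", str(tensor_parallel_size)]

    # Automatic Prefix Caching: reuse KV cache for shared prompt prefixes.
    if enable_prefix_caching:
        cmd.append("--enable-prefix-caching")

    # Chunked Prefill: split long prefills into smaller chunks.
    if enable_chunked_prefill:
        cmd.append("--enable-chunked-prefill")

    # LoRA: allow serving LoRA adapters on top of the base model.
    if enable_lora:
        cmd.append("--enable-lora")

    return cmd


if __name__ == "__main__":
    # The model name is a placeholder; substitute the model you deploy.
    command = build_serve_command(
        "Qwen/Qwen2.5-7B-Instruct",
        tensor_parallel_size=2,
        enable_prefix_caching=True,
        enable_chunked_prefill=True,
    )
    print(shlex.join(command))
```

Building the argument list separately from launching the server keeps the chosen feature set easy to review and log before anything is started.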