Supported Features

vLLM Ascend aims to keep its feature support aligned with vLLM. We are also actively collaborating with the community to accelerate support.

You can check the support status of the vLLM V1 Engine. The table below lists the feature support status of vLLM Ascend:

Feature	Status	Next Step
Chunked Prefill	🟢 Functional	See detailed note: Chunked Prefill
Automatic Prefix Caching	🟢 Functional	See detailed note: Automatic Prefix Caching
LoRA	🟢 Functional	See detailed note: LoRA
Speculative decoding	🟢 Functional	Basic support
Pooling	🟢 Functional	CI needed to adapt to more models
Enc-dec	🟡 Planned	vLLM should support this feature first.
Multi Modality	🟢 Functional	Tutorial, optimizing and adapting more models
LogProbs	🟢 Functional	CI needed
Prompt logProbs	🟢 Functional	CI needed
Async output	🟢 Functional	CI needed
Beam search	🟢 Functional	CI needed
Guided Decoding	🟢 Functional	See detailed note: Structured Output Guide
Tensor Parallel	🟢 Functional	Make TP >4 work with graph mode.
Pipeline Parallel	🟡 Planned	Broken in this version; will be fixed in the next release.
Expert Parallel	🟢 Functional	See detailed note: Expert Load Balance (EPLB)
Data Parallel	🟢 Functional	Data Parallel support for Qwen3 MoE.
Prefill Decode Disaggregation	🟢 Functional	xPyD is supported.
Quantization	🟢 Functional	See detailed note: Quantization Guide
Graph Mode	🟢 Functional	See detailed note: Graph Mode Guide
Sleep Mode	🟢 Functional	See detailed note: Sleep Mode

🟢 Functional: Fully operational, with ongoing optimizations.
🔵 Experimental: Experimental support, interfaces and functions may change.
🚧 WIP: Under active development, will be supported soon.
🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
🔴 No plan/Deprecated: No plan or deprecated by vLLM.
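
Several features in the table correspond to engine arguments in upstream vLLM. The sketch below shows how a few of them (Chunked Prefill, Automatic Prefix Caching, Tensor Parallel, Sleep Mode) might be switched on together. It is an illustration under assumptions, not a configuration verified on Ascend hardware: the argument names are upstream vLLM engine arguments, while the model name, the helper functions `build_engine_kwargs` and `create_llm`, and the chosen values are hypothetical placeholders. Consult the detailed notes referenced in the table for Ascend-specific requirements.

```python
# Sketch: map a few features from the table to upstream vLLM engine arguments.
# The model name and values are placeholders (assumptions), not recommendations.


def build_engine_kwargs(tensor_parallel_size: int = 1) -> dict:
    """Return keyword arguments for vllm.LLM that enable selected features."""
    return {
        "model": "Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model (assumption)
        "enable_chunked_prefill": True,   # Chunked Prefill
        "enable_prefix_caching": True,    # Automatic Prefix Caching
        "tensor_parallel_size": tensor_parallel_size,  # Tensor Parallel
        "enable_sleep_mode": True,        # Sleep Mode
    }


def create_llm(**overrides):
    """Build the engine; vLLM is imported lazily so the kwargs can be
    inspected on a machine where vLLM is not installed."""
    from vllm import LLM

    kwargs = build_engine_kwargs()
    kwargs.update(overrides)  # caller-supplied values take precedence
    return LLM(**kwargs)
```

Calling `create_llm(tensor_parallel_size=2)` would request two-way tensor parallelism. Keeping the arguments in a plain dictionary makes it easy to review which features are requested before any device is touched.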