Feature Support
vLLM Ascend's feature-support principle is to stay aligned with vLLM. We are also actively collaborating with the community to accelerate support.
vLLM Ascend supports most vLLM features, and usage is the same as in vLLM apart from the limits noted in the table below.
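Because usage is meant to match vLLM, a minimal sketch of enabling a few of the features listed on this page through the standard vLLM Python API might look like the following. The helper names `build_engine_kwargs` and `run_demo` are hypothetical, the model name is whatever you pass in, and it is an assumption that these engine arguments behave the same on Ascend hardware (Automatic Prefix Caching, for example, carries usage limits).

```python
from typing import Any, Dict, List


def build_engine_kwargs(model: str, tensor_parallel_size: int = 1) -> Dict[str, Any]:
    """Collect standard vLLM engine arguments (hypothetical helper).

    The keys are regular vLLM engine arguments; whether each one is
    fully supported on Ascend depends on the feature table on this page.
    """
    if tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be >= 1")
    return {
        "model": model,
        "enable_prefix_caching": True,   # Automatic Prefix Caching
        "enable_chunked_prefill": True,  # Chunked Prefill
        "tensor_parallel_size": tensor_parallel_size,  # Tensor Parallel
    }


def run_demo(model: str, prompts: List[str]) -> List[str]:
    """Generate completions; needs vLLM (and vLLM Ascend) installed."""
    # Imported lazily so the helper above can be used without vLLM present.
    from vllm import LLM, SamplingParams

    llm = LLM(**build_engine_kwargs(model))
    # logprobs=1 requests the top log-probability per generated token.
    params = SamplingParams(temperature=0.0, max_tokens=32, logprobs=1)
    outputs = llm.generate(prompts, params)
    return [out.outputs[0].text for out in outputs]
```

On a machine with vLLM Ascend installed, `run_demo("<your-model>", ["Hello, my name is"])` would return one completion per prompt; nothing runs until you call it.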
| Feature | vLLM Ascend | vLLM Ascend (+ MindIE Turbo) | Notes |
|---|---|---|---|
| V1Engine | 🔵 Experimental | 🔵 Experimental | Will enhance in v0.8.x |
| Chunked Prefill | 🟢 Functional | 🟢 Functional | / |
| Automatic Prefix Caching | 🟢 Functional | 🟢 Functional | [Usage Limits]#732 |
| LoRA | 🟢 Functional | 🟡 Planned | / |
| Prompt adapter | 🟡 Planned | 🟡 Planned | / |
| Speculative decoding | 🟢 Functional | 🟢 Functional | [Usage Limits]#734 |
| Pooling | 🟢 Functional | 🟢 Functional | / |
| Enc-dec | 🟡 Planned | 🟡 Planned | / |
| Multi Modality | 🟢 Functional | 🟢 Functional | / |
| LogProbs | 🟢 Functional | 🟢 Functional | / |
| Prompt logProbs | 🟢 Functional | 🟢 Functional | / |
| Async output | 🟢 Functional | 🟢 Functional | / |
| Multi step scheduler | 🟢 Functional | 🟢 Functional | / |
| Best of | 🟢 Functional | 🟢 Functional | / |
| Beam search | 🟢 Functional | 🟢 Functional | / |
| Guided Decoding | 🟢 Functional | 🟢 Functional | / |
| Tensor Parallel | 🟢 Functional | ⚡ Optimized | / |
| Pipeline Parallel | 🟢 Functional | ⚡ Optimized | / |
| Expert Parallel | 🟡 Planned | 🟡 Planned | Will support in v0.8.x |
| Data Parallel | 🟡 Planned | 🟡 Planned | Will support in v0.8.x |
| Prefill Decode Disaggregation | 🟡 Planned | 🟡 Planned | Will support in v0.8.x |
| Quantization | 🟡 Planned | 🟢 Functional | Will support in v0.8.x |
| Graph Mode | 🟡 Planned | 🟡 Planned | Will support in v0.8.x |
| Sleep Mode | 🟢 Functional | 🟢 Functional | [Usage Limits]#733 |
| MTP | 🟢 Functional | 🟢 Functional | [Usage Limits]#734 |
| Custom Scheduler | 🟢 Functional | 🟢 Functional | [Usage Limits]#788 |

MindIE Turbo is an LLM inference engine acceleration plug-in library on Ascend hardware. Find more information here.
🟢 Functional: Fully operational, with ongoing optimizations.
🔵 Experimental: Experimental support; interfaces and functions may change.
🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).