Supported Models
For the latest support status, see vllm-project/vllm-ascend#1608.
Legend:
✅ = Supported model/feature
🔵 = Experimentally supported model/feature
❌ = Unsupported model/feature
🟡 = Not tested or verified
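The legend is a fixed mapping from symbol to support level. If you consume this matrix programmatically, a minimal sketch like the one below can classify a single cell. The `Status` enum and `parse_status` helper are hypothetical names for illustration only (they are not part of vLLM or vllm-ascend), and the sketch assumes each cell arrives as text such as `✅ |`.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    """Legend symbols used in this support matrix (hypothetical names)."""
    SUPPORTED = "✅"
    EXPERIMENTAL = "🔵"
    NOT_SUPPORTED = "❌"
    NOT_TESTED = "🟡"


def parse_status(cell: str) -> Optional[Status]:
    """Map one table cell to a Status.

    Returns None for empty cells and for cells that are not legend symbols,
    such as hardware names ("A2/A3") or context lengths ("128k").
    """
    # Drop surrounding whitespace, the column separator, and the emoji
    # variation selector (U+FE0F) that some renderers append to symbols.
    symbol = cell.strip().strip("|").strip().replace("\ufe0f", "")
    try:
        return Status(symbol)
    except ValueError:
        return None
```

For example, `parse_status("🔵 |")` returns `Status.EXPERIMENTAL`, while `parse_status("A2/A3 |")` returns `None`. Returning `None` instead of raising lets non-legend columns (hardware, notes, max-model-len) pass through the same code path.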
Text-Only Language Models
Generative Models
Core Supported Models
Model |
Support |
Note |
BF16 |
Supported Hardware |
W8A8 |
Chunked Prefill |
Automatic Prefix Cache |
LoRA |
Speculative Decoding |
Async Scheduling |
Tensor Parallel |
Pipeline Parallel |
Expert Parallel |
Data Parallel |
Prefill-decode Disaggregation |
Piecewise AclGraph |
Fullgraph AclGraph |
max-model-len |
MLP Weight Prefetch |
Doc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
DeepSeek V4-Flash |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
1M |
||||||
DeepSeek V4-Pro |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
1M |
||||||
DeepSeek V3/3.1 |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
240k |
|||||
DeepSeek V3.2 |
🔵 |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
160k |
✅ |
||
DeepSeek R1 |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
128k |
|||||
Qwen3 |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
128k |
✅ |
|||||||
Qwen3-Coder |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
||||||||||
Qwen3-Moe |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
256k |
||||||
Qwen3-Next |
🔵 |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
||||||||||||
GLM-4.x |
🔵 |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
198k |
||||||
GLM-5 |
🔵 |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
198k |
||||
Kimi-K2-Thinking |
🔵 |
A2/A3 |
||||||||||||||||||
DeepseekOCR2 |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
|||||||||||||||
MiniMax-M2.5 |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
❌ |
✅ |
✅ |
✅ |
🟡 |
✅ |
✅ |
✅ |
✅ |
✅ |
192k |
🟡 |
||
Qwen2.5-Math-RM-72B |
✅ |
vllm-rm, tensor_parallel_size=4, max_model_len=4096 |
✅ |
A2 |
✅ |
🟡 |
🟡 |
❌ |
🟡 |
✅ |
✅ |
🟡 |
🟡 |
🟡 |
🟡 |
🟡 |
🟡 |
4096 |
🟡 |
Extended Compatible Models
| Model | Support | Note | Supported Hardware |
|---|---|---|---|
| DeepSeek Distill (Qwen/Llama) | ✅ | | A2/A3 |
| Qwen3-based | ✅ | | A2/A3 |
| Qwen2 | ✅ | | A2/A3 |
| Qwen2.5 | ✅ | | A2/A3 |
| Qwen2-based | ✅ | | A2/A3 |
| QwQ-32B | ✅ | | A2/A3 |
| Llama2/3/3.1/3.2 | ✅ | | A2/A3 |
| InternLM | 🔵 | | A2/A3 |
| Baichuan | 🔵 | | A2/A3 |
| Baichuan2 | 🔵 | | A2/A3 |
| Phi-4-mini | 🔵 | | A2/A3 |
| MiniCPM | 🔵 | | A2/A3 |
| MiniCPM3 | 🔵 | | A2/A3 |
| Ernie4.5 | 🔵 | | A2/A3 |
| Ernie4.5-MoE | 🔵 | | A2/A3 |
| Gemma-2 | 🔵 | | A2/A3 |
| Gemma-3 | 🔵 | | A2/A3 |
| Phi-3/4 | 🔵 | | A2/A3 |
| Mistral/Mistral-Instruct | 🔵 | | A2/A3 |
| Hy3-preview | 🔵 | | A3 |
| DeepSeek V2.5 | 🟡 | Needs testing | |
| Mllama | 🟡 | Needs testing | |
| MiniMax-Text | 🟡 | Needs testing | |
Pooling Models
| Model | Support | Note | Supported Hardware | Doc |
|---|---|---|---|---|
| Qwen3-Embedding | 🔵 | | A2/A3 | |
| Qwen3-VL-Embedding | 🔵 | | A2/A3 | |
| Qwen3-Reranker | 🔵 | | A2/A3 | |
| Qwen3-VL-Reranker | 🔵 | | A2/A3 | |
| Molmo | 🔵 | | A2/A3 | |
| XLM-RoBERTa-based | 🔵 | | A2/A3 | |
| BERT | 🔵 | | A2/A3 | |
| Qwen2.5-Math-RM-72B | ✅ | Reward Model, gsm8k_correctness accuracy=0.80 | A2 | |
Multimodal Language Models
Generative Models
Core Supported Models
Model |
Support |
Note |
BF16 |
Supported Hardware |
W8A8 |
Chunked Prefill |
Automatic Prefix Cache |
LoRA |
Speculative Decoding |
Async Scheduling |
Tensor Parallel |
Pipeline Parallel |
Expert Parallel |
Data Parallel |
Prefill-decode Disaggregation |
Piecewise AclGraph |
Fullgraph AclGraph |
max-model-len |
MLP Weight Prefetch |
Doc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Qwen3-VL |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
|||||||||||||||
Qwen3-VL-MOE |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
256k |
||||||
Qwen3.5-397B-A17B |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
1010000 |
|||||
Qwen3.5-27B |
✅ |
✅ |
A2/A3 |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
✅ |
1010000 |
|||||
Qwen3-Omni-30B-A3B-Thinking |
🔵 |
A2/A3 |
✅ |
✅ |
Extended Compatible Models
| Model | Support | Note | Supported Hardware |
|---|---|---|---|
| Qwen2-VL | ✅ | | A2/A3 |
| Qwen3-Omni | 🔵 | | A2/A3 |
| QVQ | 🔵 | | A2/A3 |
| Qwen2-Audio | 🔵 | | A2/A3 |
| Aria | 🔵 | | A2/A3 |
| LLaVA-Next | 🔵 | | A2/A3 |
| LLaVA-Next-Video | 🔵 | | A2/A3 |
| MiniCPM-V | 🔵 | | A2/A3 |
| Mistral3 | 🔵 | | A2/A3 |
| Phi-3-Vision/Phi-3.5-Vision | 🔵 | | A2/A3 |
| Gemma3 | 🔵 | | A2/A3 |
| Llama3.2 | 🔵 | | A2/A3 |
| PaddleOCR-VL | 🔵 | | A2/A3 |
| Llama4 | ❌ | | |
| Keye-VL-8B-Preview | ❌ | | |
| Florence-2 | ❌ | | |
| GLM-4V | ❌ | | |
| InternVL2.0/2.5/3.0 | ❌ | | |
| Whisper | ❌ | | |
| Ultravox | 🟡 | Needs testing | |