Skip to main content
Back to top
Ctrl
+
K
You are viewing the latest stable docs.
Getting Started
Quickstart
Installation
Tutorials
Single NPU (Qwen3 8B)
Single NPU (Qwen2.5-VL 7B)
Single NPU (Qwen2-Audio 7B)
Single NPU (Qwen3-Embedding-8B)
Single-NPU (Qwen3 8B W4A8)
Prefill-Decode Disaggregation Llmdatadist Verification (Qwen2.5-VL)
Multi-NPU (Qwen3-Next)
Multi-NPU (QwQ 32B)
Multi-NPU (Pangu Pro MoE)
Multi-NPU (Qwen3-30B-A3B)
Multi-NPU (QwQ 32B W8A8)
Single Node (Atlas 300I Series)
Multi-Node (DeepSeek V3.2)
Multi-Node-DP (DeepSeek)
Multi-Node-DP (Kimi-K2)
Multi-Node-DP (Qwen3-VL-235B-A22B)
Prefill-Decode Disaggregation Llmdatadist Verification (Qwen)
Prefill-Decode Disaggregation Mooncake Verification (Qwen)
Multi-Node-Ray (Qwen/Qwen3-235B-A22B)
FAQs
User Guide
Features and Models
Supported Models
Supported Features
Configuration Guide
Environment Variables
Additional Configuration
Feature Guide
Graph Mode Guide
Quantization Guide
Sleep Mode Guide
Structured Output Guide
LoRA Adapters Guide
Expert Load Balance (EPLB)
Mooncacke Store Deployment Guide
Release Notes
Developer Guide
Contributing
Testing
Feature Guide
Patch in vLLM Ascend
Prepare inputs for model forwarding
Disaggregated-prefill
Expert Parallelism Load Balancer (EPLB)
Multi Token Prediction (MTP)
ACL Graph
KV Cache Pool
Adding a custom aclnn operation
Accuracy
Using EvalScope
Using lm-eval
Using OpenCompass
Accuracy Report
deepseek-ai/DeepSeek-V2-Lite
Qwen/Qwen2.5-VL-7B-Instruct
Qwen/Qwen3-30B-A3B
Qwen/Qwen3-8B-Base
Performance
Performance Benchmark
Profile Execute Duration
Optimization and Tuning
Modeling
Adding a New Model
Adding a New Multimodal Model
Community
Governance
Maintainers and contributors
Versioning Policy
User stories
LLaMA-Factory
Index