You are viewing the latest stable docs.
Getting Started
Quickstart
Installation
Tutorials
Qwen2.5-Omni-7B
Qwen2.5-7B
Qwen3-Dense (Qwen3-0.6B/8B/32B)
Qwen-VL-Dense (Qwen2.5VL-3B/7B, Qwen3-VL-2B/4B/8B/32B)
Qwen3-30B-A3B
Qwen3-235B-A22B
Qwen3-VL-235B-A22B-Instruct
Qwen3-Coder-30B-A3B
Qwen3-Embedding
Qwen3-Reranker
Qwen3-8B-W4A8
Qwen3-32B-W4A4
Qwen3-Next
Qwen3-Omni-30B-A3B-Thinking
DeepSeek-V3/3.1
DeepSeek-V3.2
DeepSeek-R1
DeepSeek-V4
GLM-4.5/4.6/4.7
Kimi-K2-Thinking
PaddleOCR-VL
PD-Colocated with Mooncake Multi-Instance
Prefill-Decode Disaggregation (Qwen2.5-VL)
Prefill-Decode Disaggregation (DeepSeek)
Long-Sequence Context Parallel (Qwen3-235B-A22B)
Long-Sequence Context Parallel (DeepSeek)
Ray Distributed (Qwen3-235B-A22B)
Atlas 300I
FAQs
User Guide
Features and Models
Supported Models
Supported Features
Configuration Guide
Environment Variables
Additional Configuration
Feature Guide
Graph Mode Guide
Quantization Guide
Sleep Mode Guide
Structured Output Guide
LoRA Adapters Guide
Expert Load Balance (EPLB)
Netloader Guide
Multi Token Prediction (MTP)
Dynamic Batch
Ascend Store Deployment Guide
External DP
Distributed DP Server With Large Scale Expert Parallelism
UCM-Enhanced Prefix Caching Deployment Guide
Fine-Grained Tensor Parallelism (Finegrained TP)
Layer Sharding Linear Guide
Speculative Decoding Guide
Context Parallel Guide
Deployment Guide
Using Volcano Kthena
Release Notes
Developer Guide
Contributing
Testing
Multi Node Test
Feature Guide
Patch in vLLM Ascend
Prepare inputs for model forwarding
Disaggregated-prefill
Expert Parallelism Load Balancer (EPLB)
ACL Graph
KV Cache Pool
Adding a custom aclnn operation
Context Parallel (CP)
Quantization Adaptation Guide
Accuracy
Using EvalScope
Using lm-eval
Using AISBench
Using OpenCompass
Performance and Debug
Performance Benchmark
Profile Execute Duration
Optimization and Tuning
Service Profiling Guide
MSProbe Debugging Guide
Community
Governance
Maintainers and Contributors
Versioning Policy
User Stories
LLaMA-Factory
Index