You are viewing the latest developer preview docs. The latest stable release is v0.18.0.

Feature Guide

This section provides detailed usage guides for vLLM Ascend features.

Graph Mode Guide
CPU Binding
AI QoS Feature
Quantization Guide
Sleep Mode Guide
Structured Output Guide
LoRA Adapters Guide
Expert Load Balance (EPLB)
Netloader Guide
RFork Guide
Multi Token Prediction (MTP)
Dynamic Batch
Disaggregated-encoder
Ascend Store Deployment Guide
KV Cache CPU Offload Guide
External DP
Distributed DP Server With Large-Scale Expert Parallelism
UCM Store Deployment Guide
Fine-Grained Tensor Parallelism (Fine-grained TP)
Layer Sharding Linear Guide
Speculative Decoding Guide
Context Parallel Guide
Weight Prefetch Guide
Sequence Parallelism
Batch Invariance
LMCache-Ascend Deployment Guide
Dynamic Chunked Pipeline Parallel
Flash Attention 3

By the vllm-ascend team

© Copyright 2025, vllm-ascend team.