Feature Guide
This section provides detailed usage guides for vLLM Ascend features.
- Graph Mode Guide
- CPU Binding
- AI QoS Feature
- Quantization Guide
- Sleep Mode Guide
- Structured Output Guide
- LoRA Adapters Guide
- Expert Load Balance (EPLB)
- Netloader Guide
- RFork Guide
- Multi Token Prediction (MTP)
- Dynamic Batch
- Disaggregated Encoder
- Ascend Store Deployment Guide
- KV Cache CPU Offload Guide
- External DP
- Distributed DP Server With Large-Scale Expert Parallelism
- UCM Store Deployment Guide
- Fine-Grained Tensor Parallelism (Fine-grained TP)
- Layer Sharding Linear Guide
- Speculative Decoding Guide
- Context Parallel Guide
- Weight Prefetch Guide
- Sequence Parallelism
- Batch Invariance
- LMCache-Ascend Deployment Guide
- Dynamic Chunked Pipeline Parallel
- Flash Attention 3