Accuracy

  • Using EvalScope
  • Using lm-eval
  • Using AISBench
  • Using OpenCompass


By the vllm-ascend team

© Copyright 2025, vllm-ascend team.