Graph Mode Guide

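Note

This feature is currently experimental. Its configuration, coverage, and performance behavior may change in future releases.

This guide explains how to use Ascend graph mode with vLLM Ascend. Graph mode is available only on the V1 engine, and as of v0.9.0rc1 only the Qwen and DeepSeek model series have been well tested. We plan to make it more stable and more general in the next release.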

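Getting Started

Starting with v0.9.1rc1, vLLM Ascend runs models in graph mode by default when the V1 engine is used, matching the behavior of vLLM. If you run into any problems, feel free to open an issue on GitHub, and temporarily switch back to eager mode by setting enforce_eager=True when initializing the model.

vLLM Ascend supports two graph modes:

ACLGraph: the default graph mode supported by vLLM Ascend. In v0.9.1rc1, only the Qwen model series is well tested.
TorchAirGraph: the GE graph mode. In v0.9.1rc1, only the DeepSeek model series is supported.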
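The choice between the two modes can be sketched as a small helper that maps a model name to the extra keyword arguments passed to `vllm.LLM`. This is only an illustration: `graph_mode_kwargs` is a hypothetical name, detecting the model family by substring is an assumption, and sending every other family to eager mode is a conservative choice made for this sketch, not something vLLM Ascend does on its own.

```python
def graph_mode_kwargs(model_name: str) -> dict:
    """Return extra keyword arguments for vllm.LLM based on the model family.

    Hypothetical helper: substring matching on the model name is an assumption.
    """
    name = model_name.lower()
    if "deepseek" in name:
        # DeepSeek series: TorchAirGraph, which needs additional configuration.
        return {
            "additional_config": {
                "torchair_graph_config": {"enabled": True},
                "ascend_scheduler_config": {"enabled": True},
            }
        }
    if "qwen" in name:
        # Qwen series: ACLGraph is the default, so nothing extra is needed.
        return {}
    # Other families are not listed as well tested; this sketch picks eager mode.
    return {"enforce_eager": True}


if __name__ == "__main__":
    for name in (
        "Qwen/Qwen2-7B-Instruct",
        "deepseek-ai/DeepSeek-R1-0528",
        "someother_model_weight",
    ):
        print(name, graph_mode_kwargs(name))
```

With vLLM installed, the result could be unpacked directly, for example `LLM(model=name, **graph_mode_kwargs(name))`.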

Note

By default, ACLGraph improves performance for all models. TorchAirGraph will be deprecated in the next release; v0.11.0 will be the last stable release that supports TorchAirGraph.

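Using ACLGraph

ACLGraph is enabled by default. Taking a Qwen series model as an example, running on the V1 engine is all that is required.

Offline example:

from vllm import LLM

# ACLGraph is enabled by default on the V1 engine; no extra configuration is needed.
model = LLM(model="Qwen/Qwen2-7B-Instruct")
outputs = model.generate("Hello, how are you?")

Online example: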

vllm serve Qwen/Qwen2-7B-Instruct

Using TorchAirGraph

To run DeepSeek series models in graph mode, use TorchAirGraph. This requires additional configuration.

Offline example:

from vllm import LLM

# TorchAirGraph currently works only without chunked prefill.
model = LLM(
    model="deepseek-ai/DeepSeek-R1-0528",
    additional_config={
        "torchair_graph_config": {"enabled": True},
        "ascend_scheduler_config": {"enabled": True},
    },
)
outputs = model.generate("Hello, how are you?")

Online example:

vllm serve deepseek-ai/DeepSeek-R1-0528 --additional-config='{"torchair_graph_config": {"enabled": true}, "ascend_scheduler_config": {"enabled": true}}'
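Hand-written JSON on the command line is easy to get wrong: a trailing comma or a Python-style `True` makes it invalid. One way to avoid this is to build the `--additional-config` value from a Python dict with `json.dumps`. The sketch below assumes the same configuration keys as the offline example; `build_serve_command` is a hypothetical helper name, and the command is only printed, not executed.

```python
import json
import shlex

# Same keys as the offline TorchAirGraph example.
TORCHAIR_CONFIG = {
    "torchair_graph_config": {"enabled": True},
    "ascend_scheduler_config": {"enabled": True},
}


def build_serve_command(model: str, additional_config: dict) -> list[str]:
    """Return the argv list for `vllm serve` with a JSON-encoded additional config."""
    # json.dumps writes lowercase true/false and never emits trailing commas.
    payload = json.dumps(additional_config)
    return ["vllm", "serve", model, f"--additional-config={payload}"]


if __name__ == "__main__":
    cmd = build_serve_command("deepseek-ai/DeepSeek-R1-0528", TORCHAIR_CONFIG)
    # shlex.join quotes the JSON so the line can be pasted into a POSIX shell.
    print(shlex.join(cmd))
```

The returned list can also be passed to `subprocess.run` unchanged, which avoids shell quoting altogether.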

See the additional configuration documentation for more details.

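Falling Back to Eager Mode

If neither ACLGraph nor TorchAirGraph works, fall back to eager mode.

Offline example:

from vllm import LLM

# enforce_eager=True disables graph mode and runs the model eagerly.
model = LLM(model="someother_model_weight", enforce_eager=True)
outputs = model.generate("Hello, how are you?")

Online example: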

vllm serve Qwen/Qwen2-7B-Instruct --enforce-eager
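The fallback described in this section can also be automated in a script. The following is a minimal sketch, not part of vLLM Ascend: `load_with_eager_fallback` is a hypothetical helper that takes the model constructor as an argument, so it can be exercised without vLLM installed. It assumes that a graph mode failure surfaces as a Python exception during construction; in practice a failed initialization may leave device memory or worker processes behind, so retrying in a fresh process may be safer.

```python
from typing import Any, Callable


def load_with_eager_fallback(factory: Callable[..., Any], **kwargs: Any) -> Any:
    """Try the default (graph) mode first; on failure, retry with enforce_eager=True."""
    try:
        return factory(**kwargs)
    except Exception as exc:  # broad on purpose: this is only a sketch
        print(f"Graph mode failed ({exc!r}); retrying in eager mode")
        return factory(**{**kwargs, "enforce_eager": True})


# Usage with vLLM installed (not executed here):
#   from vllm import LLM
#   model = load_with_eager_fallback(LLM, model="someother_model_weight")
```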
