多节点测试#
多节点 CI 旨在测试超大规模模型的分布式场景,例如:跨多个节点的解耦式预填充(disaggregated_prefill)多数据并行(DP)等。
工作原理#
下图展示了多节点 CI 机制的基本部署视图,说明了 GitHub Action 如何与 lws(一种 Kubernetes CRD 资源)交互。

从工作流的角度,我们可以看到最终测试脚本是如何执行的。关键在于 lws.yaml 和 run.sh 这两个文件。前者定义了如何拉起我们的 k8s 集群,后者定义了 Pod 启动时的入口脚本。每个节点根据 LWS_WORKER_INDEX 环境变量执行不同的逻辑,从而使多个节点能够组成分布式集群来执行任务。

如何贡献#
上传自定义权重
如果您需要自定义权重,例如为 DeepSeek-V3 量化了 w8a8 权重并希望在 CI 上运行,欢迎将权重上传至 ModelScope 的 vllm-ascend 组织。如果您没有上传权限,请联系 @Potabk。
添加配置文件
如入口脚本 run.sh 所示,k8s Pod 启动时会遍历 配置目录 中的所有 *.yaml 文件,并根据不同的配置读取和执行。因此,我们需要做的就是添加像 DeepSeek-V3.yaml 这样的配置文件。
假设您有 2个节点 运行 1P1D 配置(1个预填充器 + 1个解码器):
您可以添加一个类似这样的配置文件:
test_name: "test DeepSeek-V3 disaggregated_prefill" # the model being tested model: "vllm-ascend/DeepSeek-V3-W8A8" # how large the cluster is num_nodes: 2 npu_per_node: 16 # All env vars you need should add it here env_common: VLLM_USE_MODELSCOPE: true OMP_PROC_BIND: false OMP_NUM_THREADS: 100 HCCL_BUFFSIZE: 1024 SERVER_PORT: 8080 disaggregated_prefill: enabled: true # node index(a list) which meet all the conditions: # - prefiller # - no headless(have api server) prefiller_host_index: [0] # node index(a list) which meet all the conditions: # - decoder decoder_host_index: [1] # Add each node's vllm serve cli command just like you run locally # Add each node's individual envs like follow deployment: - envs: # fill with envs like: <key>:<value> server_cmd: > vllm serve ... - envs: # fill with envs like: <key>:<value> server_cmd: > vllm serve ... benchmarks: perf: # fill with performance test kwargs acc: # fill with accuracy test kwargs
将测试用例添加到夜间工作流中。当前多节点测试工作流定义在 nightly_test_a3.yaml 中。
multi-node-tests: name: multi-node if: always() && (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch') strategy: fail-fast: false max-parallel: 1 matrix: test_config: - name: multi-node-deepseek-pd config_file_path: DeepSeek-V3.yaml size: 2 - name: multi-node-qwen3-dp config_file_path: Qwen3-235B-A22B.yaml size: 2 - name: multi-node-qwenw8a8-2node config_file_path: Qwen3-235B-W8A8.yaml size: 2 - name: multi-node-qwenw8a8-2node-eplb config_file_path: Qwen3-235B-W8A8-EPLB.yaml size: 2 uses: ./.github/workflows/_e2e_nightly_multi_node.yaml with: soc_version: a3 runner: linux-aarch64-a3-0 image: 'swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-a3' replicas: 1 size: ${{ matrix.test_config.size }} config_file_path: ${{ matrix.test_config.config_file_path }} secrets: KUBECONFIG_B64: ${{ secrets.KUBECONFIG_B64 }}
上方的矩阵定义了添加一个多机器用例所需的所有参数。值得关注的参数(如果您要添加一个新用例)是 size 和 YAML 配置文件的路径。前者定义了您的用例所需的节点数量,后者定义了您在步骤2中完成的配置文件的路径。
本地运行多节点测试#
1.使用 Kubernetes#
本节假定您本地已有一个 Kubernetes NPU 集群环境。这样您就可以轻松一键启动我们的测试。
步骤 1.安装 LWS CRD 资源
参考 https://lws.sigs.k8s.io/docs/installation/
步骤 2.按需部署以下
lws.yaml文件apiVersion: leaderworkerset.x-k8s.io/v1 kind: LeaderWorkerSet metadata: name: test-server namespace: vllm-project spec: replicas: 1 leaderWorkerTemplate: size: 2 restartPolicy: None leaderTemplate: metadata: labels: role: leader spec: containers: - name: vllm-leader imagePullPolicy: Always image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-a3 env: - name: CONFIG_YAML_PATH value: DeepSeek-V3.yaml - name: WORKSPACE value: "/vllm-workspace" - name: FAIL_TAG value: FAIL_TAG command: - sh - -c - | bash /vllm-workspace/vllm-ascend/tests/e2e/nightly/multi_node/scripts/run.sh resources: limits: huawei.com/ascend-1980: 16 memory: 512Gi ephemeral-storage: 100Gi requests: huawei.com/ascend-1980: 16 memory: 512Gi ephemeral-storage: 100Gi cpu: 125 ports: - containerPort: 8080 # readinessProbe: # tcpSocket: # port: 8080 # initialDelaySeconds: 15 # periodSeconds: 10 volumeMounts: - mountPath: /root/.cache name: shared-volume - mountPath: /usr/local/Ascend/driver/tools name: driver-tools - mountPath: /dev/shm name: dshm volumes: - name: dshm emptyDir: medium: Memory sizeLimit: 15Gi - name: shared-volume persistentVolumeClaim: claimName: nv-action-vllm-benchmarks-v2 - name: driver-tools hostPath: path: /usr/local/Ascend/driver/tools workerTemplate: spec: containers: - name: vllm-worker imagePullPolicy: Always image: swr.cn-southwest-2.myhuaweicloud.com/base_image/ascend-ci/vllm-ascend:nightly-a3 env: - name: CONFIG_YAML_PATH value: DeepSeek-V3.yaml - name: WORKSPACE value: "/vllm-workspace" - name: FAIL_TAG value: FAIL_TAG command: - sh - -c - | bash /vllm-workspace/vllm-ascend/tests/e2e/nightly/multi_node/scripts/run.sh resources: limits: huawei.com/ascend-1980: 16 memory: 512Gi ephemeral-storage: 100Gi requests: huawei.com/ascend-1980: 16 ephemeral-storage: 100Gi cpu: 125 volumeMounts: - mountPath: /root/.cache name: shared-volume - mountPath: /usr/local/Ascend/driver/tools name: driver-tools - mountPath: /dev/shm name: dshm volumes: - name: dshm emptyDir: medium: Memory sizeLimit: 15Gi - name: shared-volume persistentVolumeClaim: claimName: nv-action-vllm-benchmarks-v2 - name: driver-tools hostPath: path: /usr/local/Ascend/driver/tools --- apiVersion: v1 kind: Service metadata: name: vllm-leader namespace: vllm-project spec: ports: - name: http port: 8080 protocol: TCP targetPort: 8080 selector: leaderworkerset.sigs.k8s.io/name: vllm role: leader type: ClusterIP
kubectl apply -f lws.yaml
验证 Pod 状态:
kubectl get pods -n vllm-project
应该会得到类似以下的输出:
NAME READY STATUS RESTARTS AGE vllm-0 1/1 Running 0 2s vllm-0-1 1/1 Running 0 2s
验证分布式推理是否正常工作:
kubectl logs -f vllm-0 -n vllm-project
应该会得到类似以下的结果:
INFO 12-30 11:00:57 [__init__.py:43] Available plugins for group vllm.platform_plugins: INFO 12-30 11:00:57 [__init__.py:45] - ascend -> vllm_ascend:register INFO 12-30 11:00:57 [__init__.py:48] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 12-30 11:00:57 [__init__.py:217] Platform plugin ascend is activated INFO 12-30 11:00:57 [importing.py:68] Triton not installed or not compatible; certain GPU-related functions will not be available. ================================================================================================== test session starts =================================================================================================== platform linux -- Python 3.11.13, pytest-8.4.2, pluggy-1.6.0 -- /usr/local/python3.11.13/bin/python3 cachedir: .pytest_cache rootdir: /vllm-workspace/vllm-ascend configfile: pyproject.toml plugins: cov-7.0.0, asyncio-1.3.0, mock-3.15.1, anyio-4.12.0 asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 1 item tests/e2e/nightly/multi_node/scripts/test_multi_node.py::test_multi_node [2025-12-30 11:01:01] INFO multi_node_config.py:294: Loading config yaml: tests/e2e/nightly/multi_node/config/DeepSeek-V3.yaml [2025-12-30 11:01:01] INFO multi_node_config.py:348: Resolving cluster IPs via DNS... [2025-12-30 11:01:01] INFO multi_node_config.py:212: Node 0 envs: {'VLLM_USE_MODELSCOPE': 'True', 'OMP_PROC_BIND': 'False', 'OMP_NUM_THREADS': '100', 'HCCL_BUFFSIZE': '1024', 'SERVER_PORT': '8080', 'NUMEXPR_MAX_THREADS': '128', 'DISAGGREGATED_PREFILL_PROXY_SCRIPT': 'examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py', 'HCCL_IF_IP': '10.0.0.102', 'HCCL_SOCKET_IFNAME': 'eth0', 'GLOO_SOCKET_IFNAME': 'eth0', 'TP_SOCKET_IFNAME': 'eth0', 'LOCAL_IP': '10.0.0.102', 'NIC_NAME': 'eth0', 'MASTER_IP': '10.0.0.102'} [2025-12-30 11:01:01] INFO multi_node_config.py:159: Launching proxy: python examples/disaggregated_prefill_v1/load_balance_proxy_server_example.py --host 10.0.0.102 --port 6000 --prefiller-hosts 10.0.0.102 --prefiller-ports 8080 --decoder-hosts 10.0.0.138 --decoder-ports 8080 [2025-12-30 11:01:01] INFO conftest.py:107: Starting server with command: vllm serve vllm-ascend/DeepSeek-V3-W8A8 --host 0.0.0.0 --port 8080 --data-parallel-size 2 --data-parallel-size-local 2 --tensor-parallel-size 8 --seed 1024 --enforce-eager --enable-expert-parallel --max-num-seqs 16 --max-model-len 8192 --max-num-batched-tokens 8192 --quantization ascend --trust-remote-code --no-enable-prefix-caching --gpu-memory-utilization 0.9 --kv-transfer-config {"kv_connector": "MooncakeConnectorV1", "kv_role": "kv_producer", "kv_port": "30000", "engine_id": "0", "kv_connector_module_path": "vllm_ascend.distributed.mooncake_connector", "kv_connector_extra_config": { "prefill": { "dp_size": 2, "tp_size": 8 }, "decode": { "dp_size": 2, "tp_size": 8 } } }
2.不使用 Kubernetes 进行测试#
由于我们的脚本是 Kubernetes 友好的,如果您没有 Kubernetes 环境,则需要主动传入一些集群信息。
步骤 1.在配置文件中添加 cluster_hosts
在每个集群节点上进行修改,参照 DeepSeek-V3.yaml,在配置项
num_nodes之后添加,例如:cluster_hosts: ["xxx.xxx.xxx.188", "xxx.xxx.xxx.212"]步骤 2.安装开发环境
在每个集群节点上安装 vllm-ascend 开发包
cd /vllm-workspace/vllm-ascend python3 -m pip install -r requirements-dev.txt
在 cluster_hosts 列表中的第一个主机(主节点)上安装 AISBench
export AIS_BENCH_TAG="v3.0-20250930-master" export AIS_BENCH_URL="https://gitee.com/aisbench/benchmark.git" export BENCHMARK_HOME=/vllm-workspace/benchmark git clone -b ${AIS_BENCH_TAG} --depth 1 ${AIS_BENCH_URL} $BENCHMARK_HOME cd $BENCHMARK_HOME pip install -e . -r requirements/api.txt -r requirements/extra.txt ```
步骤 3.本地运行测试
分别在每个节点上 运行脚本
export WORKSPACE=/vllm-workspace # Change it to your path locally export CONFIG_YAML_PATH="DeepSeek-V3.yaml" # Replace with the config case you added cd $WORKSPACE/vllm-ascend bash tests/e2e/nightly/multi_node/scripts/run.sh