Contributing to vLLM Metal

Thanks for your interest in contributing! This plugin targets Apple Silicon Macs only — you'll need an M-series Mac running macOS to build, test, and run it.

Development setup

git clone https://github.com/vllm-project/vllm-metal.git
cd vllm-metal

# Creates ./.venv-vllm-metal/, installs vLLM core + the plugin, and
# prebuilds the native Metal kernels from your checkout
./install.sh

# Activate the virtualenv
source .venv-vllm-metal/bin/activate

# Install dev dependencies (pytest, ruff, mypy, ...)
pip install -e ".[dev]"

Editing the Metal kernels

Release wheels ship the native paged-attention extension and its Metal shader libraries prebuilt, so end users never compile them. To edit the kernels — the .metal shaders or paged_ops.cpp — and run from your local source, set:

VLLM_METAL_BUILD_FROM_SOURCE=1 vllm serve ...   # or: pytest, your script, etc.

In this mode the C++ extension is recompiled when its inputs change (the build is hash-checked, so unchanged sources are skipped) and the shaders are compiled in-process by MLX from the .metal source at runtime. There is no manual .metallib rebuild step: edit a kernel, restart the Python process, and the change is picked up.

Requirements:

  • Xcode Command Line Tools (xcode-select --install) — clang++ rebuilds the .so; keep it current enough for the pinned MLX headers.
  • No Metal toolchain needed: MLX compiles the .metal shaders in-process.

Without VLLM_METAL_BUILD_FROM_SOURCE, the prebuilt artifacts are loaded as-is. If you edited a kernel source after building them locally, loading fails loudly on the stale-hash mismatch rather than silently running the old kernel — set the variable, or rerun python -m vllm_metal.metal.build to refresh the prebuilt artifacts. (A plain wheel install ships no hash stamps, so end users never hit this.)

Run lint locally

Mirrors the lint job in CI (ruff, ruff format --check, mypy, shellcheck):

scripts/lint.sh

Run CI locally

Mirrors the test job in CI — serving smoke tests plus the non-slow pytest suite:

scripts/test.sh

For a faster inner loop while iterating, run pytest directly:

pytest -m "not slow" tests/ -v --tb=short

🎉 Congratulations! You have completed the development environment setup.


Before you open the PR

Two conditional checks apply depending on what your PR touches:

If your PR adds or modifies a model, include a deterministic test that asserts the generated tokens match the mlx_lm reference under greedy sampling (temperature=0). See tools/gen_golden_token_ids_for_deterministics.py for how to generate golden token IDs for a new model.

If your PR claims a performance improvement, attach before/after benchmark results. For example, using vllm bench serve with the sonnet dataset:

curl -O https://raw.githubusercontent.com/vllm-project/vllm/main/benchmarks/sonnet.txt

# 1. Start the server
VLLM_METAL_USE_PAGED_ATTENTION=1 VLLM_METAL_MEMORY_FRACTION=0.8 \
  vllm serve Qwen/Qwen3-0.6B --port 8000 --max-model-len 2048

# 2. Run the benchmark
vllm bench serve \
  --backend openai \
  --base-url http://localhost:8000 \
  --model Qwen/Qwen3-0.6B \
  --dataset-name sonnet \
  --dataset-path sonnet.txt \
  --num-prompts 100 \
  --request-rate inf \
  --percentile-metrics ttft,tpot,e2el \
  --metric-percentiles 50,99
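
Step 2 only produces meaningful numbers once the server from step 1 is accepting requests. A minimal sketch of a readiness wait, assuming the server answers on vLLM's /health endpoint at the port used above; the wait_for_server helper and its 120-second default are made up for this example:

```shell
# Hypothetical helper: poll URL $1 once per second until it answers
# successfully, giving up after $2 seconds (default 120).
wait_for_server() {
  url=$1
  timeout=${2:-120}
  elapsed=0
  until curl -sf -o /dev/null "$url"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "server at $url not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
}
```

Running wait_for_server http://localhost:8000/health in the benchmarking terminal before step 2 avoids measuring requests that fail because the model is still loading.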

Developer Certificate of Origin (DCO)

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.
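
If a commit was made without -s, the sign-off can be checked and added after the fact. A minimal sketch using standard git commands; the has_signoff and add_signoff function names are made up for this example:

```shell
# Succeed if the most recent commit message contains a Signed-off-by line.
has_signoff() {
  git log -1 --format=%B | grep -q '^Signed-off-by: '
}

# Add the Signed-off-by line to the most recent commit, keeping its
# message otherwise unchanged.
add_signoff() {
  git commit --amend --signoff --no-edit
}
```

For a branch with several unsigned commits, git rebase --signoff upstream/main re-applies them with the sign-off added, assuming an upstream remote is configured. Either approach rewrites history, so a branch that was already pushed needs a force push afterwards.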

Submit your changes

  1. Fork the repository on GitHub.

  2. Re-point origin to your fork and add upstream:

git remote set-url origin https://github.com/<your-username>/vllm-metal.git
git remote add upstream https://github.com/vllm-project/vllm-metal.git

  3. Create a feature branch:

git checkout -b my-feature

  4. Commit your changes using -s (adds the DCO sign-off automatically):

git commit -sm "your commit info"

  5. Push to your fork:

git push -u origin my-feature

  6. Open a pull request against main in the upstream repository.