Documentation writing guide

Guide to Writing Model Tutorial Documentation

To write a new model deployment tutorial, start from the template at docs/source/_templates/Model-Deployment-Tutorial-Template.md. Copy it and adapt it to your model.

Testable documentation code block generation (`model-code`)

This section covers:

- For documentation authors: how to insert testable command blocks into docs
- For developers: how to add a new converter

Built-in `converter_tag` values:

| `converter_tag` | Renders | YAML source |
| --- | --- | --- |
| `single_node` | One `vllm serve` script for a single node | `test_cases[case_index]` |
| `multi_node` | One host's `vllm serve` script | `deployment[host_index]` |
| `external_dp_template` | One external-DP node's env exports + `vllm serve` command | `templates[host_index]` (+ top-level `model`) |
| `external_dp_launch` | One `launch_online_dp.py` line per node | `config[]` |
| `external_dp_proxy` | The load-balance proxy launch command | `config[]` + `routing` |
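The Python sketch below illustrates the YAML source column by showing the lookup each converter performs. `select_source` is a hypothetical helper written for this guide and is not part of the codegen package. It also skips the validation that the real converters do.

```python
from __future__ import annotations

from typing import Any


def select_source(tag: str, data: dict[str, Any], index: int = 0) -> Any:
    """Return the part of a loaded YAML mapping that a built-in converter renders."""
    if tag == "single_node":
        return data["test_cases"][index]
    if tag == "multi_node":
        return data["deployment"][index]
    if tag == "external_dp_template":
        # One per-node template, plus the shared top-level model name.
        return {"model": data["model"], **data["templates"][index]}
    if tag == "external_dp_launch":
        # The whole cluster: every entry in config.
        return data["config"]
    if tag == "external_dp_proxy":
        return {"config": data["config"], "routing": data["routing"]}
    raise ValueError(f"unknown converter_tag: {tag!r}")
```

The three external-DP tags all read the same file. They differ only in which keys they consume.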

For authors: add a block

Important

By default, the generator scans only the .md files under docs/source/tutorials/models/ and produces artifacts only for those files. If you put model-code blocks in other directories, Sphinx builds will not generate the corresponding scripts.
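Here is a minimal Python sketch of the default scan scope. It assumes a non-recursive `*.md` glob and a plain text search for the model-code fence opener. `docs_with_model_code` and `DEFAULT_SCAN_ROOT` are hypothetical names, and the real scanner in tools/docs_codegen/scanner.py may match files differently.

```python
from __future__ import annotations

from pathlib import Path

# Assumptions: this default root and a non-recursive "*.md" glob.
DEFAULT_SCAN_ROOT = Path("docs/source/tutorials/models")
# Three backticks followed by {model-code}, built up to keep this line fence-safe.
FENCE_OPENER = "`" * 3 + "{model-code}"


def docs_with_model_code(root: Path = DEFAULT_SCAN_ROOT) -> list[Path]:
    """List the .md files directly under root that contain a model-code block."""
    found = []
    for path in sorted(root.glob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        if any(line.strip().startswith(FENCE_OPENER) for line in lines):
            found.append(path)
    return found
```

A .md file outside the scan root is never visited, so its model-code blocks yield no scripts.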

Single node (`single_node`)

Template 1: minimal (metadata only)

```{model-code}
:block_name: your_unique_block_name
:converter_tag: single_node
:test_case_path: tests/e2e/nightly/single_node/models/configs/your_model.yaml
```

Template 2: with text (use `{{ generated }}` placeholder)

```{model-code}
:block_name: your_unique_block_name
:converter_tag: single_node
:test_case_path: tests/e2e/nightly/single_node/models/configs/your_model.yaml
:case_index: 0

# You can add any extra content here, e.g. code, explanations, or comments.
{{ generated }}
```
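The two templates differ only in whether the block has a body. The Python sketch below shows one plausible way to split a block into options and body, and then fill in `{{ generated }}`. `parse_block` and `render_block` are hypothetical. Treating an empty body as "emit only the generated script" is an assumption about how Template 1 behaves.

```python
from __future__ import annotations

import re

OPTION_RE = re.compile(r"^:(\w+):\s*(.*)$")  # e.g. ":case_index: 0"
PLACEHOLDER = "{{ generated }}"


def parse_block(lines: list[str]) -> tuple[dict[str, str], list[str]]:
    """Split the lines inside a model-code fence into options and body."""
    options: dict[str, str] = {}
    index = 0
    # Leading ":name: value" lines are directive options.
    while index < len(lines):
        match = OPTION_RE.match(lines[index])
        if match is None:
            break
        options[match.group(1)] = match.group(2).strip()
        index += 1
    # Everything after the options (minus leading blank lines) is the body.
    body = lines[index:]
    while body and not body[0].strip():
        body = body[1:]
    return options, body


def render_block(body: list[str], generated: str) -> str:
    """Assumed behavior: no body -> generated script only; else fill the placeholder."""
    if not any(line.strip() for line in body):
        return generated
    return "\n".join(body).replace(PLACEHOLDER, generated)
```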

Options

| Option | Required | Default | Description |
| --- | --- | --- | --- |
| `block_name` | Yes | None | Block name; must be unique within the current document |
| `converter_tag` | Yes | None | Must be `single_node` |
| `test_case_path` | Yes | None | Repository-relative path that must stay inside the repository (no `..` escapes); the file must exist |
| `case_index` | No | `0` | Use `test_cases[case_index]` from the YAML as the rendering source |
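The Python sketch below illustrates the `test_case_path` and `case_index` rules. `resolve_test_case_path` and `select_case` are hypothetical helpers. The containment check resolves symlinks and `..` segments before comparing the result against the repository root, which is one common way to enforce "no `..` escape".

```python
from __future__ import annotations

from pathlib import Path
from typing import Any


def resolve_test_case_path(repo_root: Path, rel_path: str) -> Path:
    """Resolve test_case_path, rejecting paths outside the repo and missing files."""
    root = repo_root.resolve()
    # resolve() collapses ".." segments and symlinks before the containment check.
    candidate = (root / rel_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"{rel_path!r} escapes the repository")
    if not candidate.is_file():
        raise FileNotFoundError(f"test_case_path does not exist: {rel_path}")
    return candidate


def select_case(data: dict[str, Any], case_index: int = 0) -> dict[str, Any]:
    """Return test_cases[case_index], with a clear error for a bad index."""
    cases = data.get("test_cases")
    if not isinstance(cases, list) or not 0 <= case_index < len(cases):
        raise IndexError(f"case_index {case_index} is out of range")
    return cases[case_index]
```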

YAML reference

See the existing files under tests/e2e/nightly/single_node/models/configs/.

single_node reads test_cases[case_index]. Common fields include:

- model: model name (rendered as vllm serve <model> ...)
- envs: scalar values, rendered as export ... lines
- server_cmd: arguments appended to vllm serve <model> (shell string or token list)
- server_cmd_extra (optional): additional arguments to append
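The Python sketch below shows how these fields could map to shell. It assumes that envs values are emitted unquoted (matching the sample output in this guide) and that string commands are split with Python's `shlex`. `render_exports`, `to_tokens`, and `single_node_command` are hypothetical; the real converter uses the helpers in tools/docs_codegen/utils.py.

```python
from __future__ import annotations

import shlex
from typing import Any


def render_exports(envs: dict[str, Any]) -> list[str]:
    """Turn a mapping of scalar env values into `export KEY=VALUE` lines."""
    lines = []
    for key, value in envs.items():
        # An export line can only carry a scalar, so nested values are rejected.
        if isinstance(value, (dict, list)):
            raise TypeError(f"env {key!r} must be a scalar, got {type(value).__name__}")
        lines.append(f"export {key}={value}")
    return lines


def to_tokens(cmd: str | list[Any] | None) -> list[str]:
    """Accept either a shell string or a token list and return a token list."""
    if cmd is None:
        return []
    if isinstance(cmd, str):
        return shlex.split(cmd)
    return [str(token) for token in cmd]


def single_node_command(case: dict[str, Any]) -> list[str]:
    """vllm serve <model>, followed by server_cmd and then server_cmd_extra tokens."""
    return [
        "vllm", "serve", case["model"],
        *to_tokens(case.get("server_cmd")),
        *to_tokens(case.get("server_cmd_extra")),
    ]
```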

Multi node (`multi_node`)

Template 1: minimal (metadata only)

```{model-code}
:block_name: your_unique_block_name_0
:converter_tag: multi_node
:test_case_path: tests/e2e/nightly/multi_node/config/your_model.yaml
:host_index: 0
```

```{model-code}
:block_name: your_unique_block_name_1
:converter_tag: multi_node
:test_case_path: tests/e2e/nightly/multi_node/config/your_model.yaml
:host_index: 1
```

Template 2: with text (use `{{ generated }}` placeholder)

```{model-code}
:block_name: your_unique_block_name_0
:converter_tag: multi_node
:test_case_path: tests/e2e/nightly/multi_node/config/your_model.yaml
:host_index: 0

# You can add any extra content here, e.g. code, explanations, or comments.
{{ generated }}
```

```{model-code}
:block_name: your_unique_block_name_1
:converter_tag: multi_node
:test_case_path: tests/e2e/nightly/multi_node/config/your_model.yaml
:host_index: 1

# You can add any extra content here, e.g. code, explanations, or comments.
{{ generated }}
```

Options

| Option | Required | Default | Description |
| --- | --- | --- | --- |
| `block_name` | Yes | None | Block name; must be unique within the current document |
| `converter_tag` | Yes | None | Must be `multi_node` |
| `test_case_path` | Yes | None | Repository-relative path that must stay inside the repository (no `..` escapes); the file must exist |
| `host_index` | Yes | None | Use `deployment[host_index]` from the YAML as the rendering source |

YAML reference

See the existing files under tests/e2e/nightly/multi_node/config/.

multi_node reads deployment[host_index]. Common fields include:

- envs: scalar values, rendered as export ... lines
- server_cmd: a complete command that must start with vllm serve <model> (shell multi-line string or token list)
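Because multi_node expects server_cmd to contain the full command already, a converter has to check the command's prefix rather than prepend one. Below is a minimal Python sketch. It assumes that backslash-newline continuations in multi-line strings should be joined the way a shell would join them. `multi_node_command` is hypothetical.

```python
from __future__ import annotations

import shlex
from typing import Any


def multi_node_command(host: dict[str, Any]) -> list[str]:
    """Tokenize server_cmd and require it to start with `vllm serve <model>`."""
    cmd = host["server_cmd"]
    if isinstance(cmd, str):
        # Join "\<newline>" continuations first; shlex.split would otherwise
        # keep each escaped newline as a stray token.
        tokens = shlex.split(cmd.replace("\\\n", " "))
    else:
        tokens = [str(token) for token in cmd]
    if tokens[:2] != ["vllm", "serve"] or len(tokens) < 3 or tokens[2].startswith("-"):
        raise ValueError("server_cmd must start with 'vllm serve <model>'")
    return tokens
```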

External data parallel (`external_dp_template` / `external_dp_launch` / `external_dp_proxy`)

These three converters read one shared external-DP YAML (see tests/e2e/nightly/multi_node/external_dp/config/) and each render a different part of the deployment. They are tightly coupled to that schema by design.

The shared YAML provides:

- model: model name (top level)
- config: a list of per-node settings (port_start, dp_rpc_port, dp_size, dp_size_local, dp_rank_start, tp_size, dp_address, ...)
- routing: a type plus groups (e.g. prefiller / decoder lists of config indices)
- templates: a list of per-node entries, each holding envs and a server_cmd_template

server_cmd_template uses braced ${VAR} placeholders, which external_dp_template rewrites to the positional shell parameters consumed by run_dp_template.sh:

| `${VAR}` | Positional |
| --- | --- |
| `${VISIBLE_DEVICES}` | `$1` |
| `${PORT}` | `$2` |
| `${DP_SIZE}` | `$3` |
| `${DP_RANK}` | `$4` |
| `${DP_ADDRESS}` | `$5` |
| `${DP_RPC_PORT}` | `$6` |
| `${TP_SIZE}` | `$7` |

Unbraced references such as $SERVER_PORT and unknown braced variables are left untouched so they remain live shell expansions.
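The rewrite rule can be expressed as a single regular expression, as in the Python sketch below. The mapping comes from the table above. `rewrite_placeholders` is a hypothetical name, and the real converter may implement the substitution differently.

```python
import re

# Braced placeholder name -> positional parameter of run_dp_template.sh.
POSITIONAL = {
    "VISIBLE_DEVICES": "$1",
    "PORT": "$2",
    "DP_SIZE": "$3",
    "DP_RANK": "$4",
    "DP_ADDRESS": "$5",
    "DP_RPC_PORT": "$6",
    "TP_SIZE": "$7",
}
# Only the braced form ${NAME} matches; bare $NAME is never touched.
BRACED = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def rewrite_placeholders(template: str) -> str:
    """Replace known ${VAR} placeholders; keep $VAR and unknown ${VAR} as-is."""
    return BRACED.sub(lambda m: POSITIONAL.get(m.group(1), m.group(0)), template)
```

Returning `m.group(0)` for unknown names reproduces the original text exactly, so those names stay live shell expansions.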

Templates

external_dp_template renders one node's template, selected with host_index:

```{model-code}
:block_name: your_unique_block_name_prefill_node0
:converter_tag: external_dp_template
:test_case_path: tests/e2e/nightly/multi_node/external_dp/config/your_model.yaml
:host_index: 0
```

external_dp_launch (one python launch_online_dp.py ... line per config node) and external_dp_proxy (the load-balance proxy command, e.g. python load_balance_proxy_server_example.py ...) read the whole cluster, so they take no index option:

```{model-code}
:block_name: your_unique_block_name_launch
:converter_tag: external_dp_launch
:test_case_path: tests/e2e/nightly/multi_node/external_dp/config/your_model.yaml
```

```{model-code}
:block_name: your_unique_block_name_proxy
:converter_tag: external_dp_proxy
:test_case_path: tests/e2e/nightly/multi_node/external_dp/config/your_model.yaml
```

Options

| Option | Required | Default | Description |
| --- | --- | --- | --- |
| `block_name` | Yes | None | Block name; must be unique within the current document |
| `converter_tag` | Yes | None | One of `external_dp_template`, `external_dp_launch`, `external_dp_proxy` |
| `test_case_path` | Yes | None | Repository-relative path that must stay inside the repository (no `..` escapes); the file must exist |
| `host_index` | `external_dp_template` only | None | Use `templates[host_index]` from the YAML as the rendering source |

Note

external_dp_proxy currently supports only routing.type: disaggregated_prefill. It reads the node lists in routing.groups.prefiller and routing.groups.decoder.
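The Python sketch below makes the routing lookup concrete by resolving routing.groups.prefiller and routing.groups.decoder (lists of config indices) to node entries. `proxy_backends` is hypothetical. Pairing each node's dp_address with its port_start is an assumption: the real converter may expand one backend per DP rank. It also emits the actual proxy flags, which this sketch does not attempt.

```python
from __future__ import annotations

from typing import Any


def proxy_backends(cluster: dict[str, Any]) -> dict[str, list[tuple[str, int]]]:
    """Resolve prefiller/decoder config indices to (dp_address, port_start) pairs."""
    routing = cluster["routing"]
    # Mirror the documented restriction: only disaggregated_prefill is handled.
    if routing.get("type") != "disaggregated_prefill":
        raise ValueError(f"unsupported routing.type: {routing.get('type')!r}")
    nodes = cluster["config"]
    groups = routing["groups"]
    return {
        role: [(nodes[i]["dp_address"], int(nodes[i]["port_start"])) for i in groups[role]]
        for role in ("prefiller", "decoder")
    }
```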

Local debugging and generation

Generate only (without building the full site)

```bash
# Generate all model-code artifacts under docs/source/tutorials/models/
python3 tools/docs_codegen/cli.py

# Generate artifacts for a single document
python3 tools/docs_codegen/cli.py --doc docs/source/tutorials/models/Kimi-K2-Thinking.md

# Generate a single block and print it (no files written)
python3 tools/docs_codegen/cli.py \
  --block docs/source/tutorials/models/Kimi-K2-Thinking.md::kimi_k2_thinking_single_node \
  --dry-run --stdout
```

By default, artifacts are written to docs/_build/doc_codegen/<doc_stem>/<block_name>.sh.
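The --block argument and the default artifact location follow simple string rules, sketched below in Python. `parse_block_spec` and `artifact_path` are hypothetical helpers, and splitting on the last `::` is an assumption.

```python
from __future__ import annotations

from pathlib import PurePosixPath

OUTPUT_ROOT = PurePosixPath("docs/_build/doc_codegen")


def parse_block_spec(spec: str) -> tuple[str, str]:
    """Split a --block value of the form <doc>::<block_name>."""
    doc, sep, block = spec.rpartition("::")
    if not sep or not doc or not block:
        raise ValueError(f"expected <doc>::<block_name>, got {spec!r}")
    return doc, block


def artifact_path(doc: str, block_name: str) -> PurePosixPath:
    """Build docs/_build/doc_codegen/<doc_stem>/<block_name>.sh."""
    return OUTPUT_ROOT / PurePosixPath(doc).stem / f"{block_name}.sh"
```

For the Kimi-K2-Thinking block, this produces the same path that the dry run prints on its first line.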

Note

After generating a script, check that it actually runs. Pay particular attention to environment variables and command-line parameters.

Concrete YAML-to-shell example

The following model-code block reads the first test case from tests/e2e/nightly/single_node/models/configs/Kimi-K2-Thinking.yaml:

```{model-code}
:block_name: kimi_k2_thinking_single_node
:converter_tag: single_node
:test_case_path: tests/e2e/nightly/single_node/models/configs/Kimi-K2-Thinking.yaml
```

The YAML fields read by the converter look like this:

```yaml
test_cases:
  - name: "Kimi-K2-Thinking-TP16-Case"
    model: "moonshotai/Kimi-K2-Thinking"
    envs:
      HCCL_BUFFSIZE: "1024"
      TASK_QUEUE_ENABLE: "1"
      OMP_PROC_BIND: "false"
      HCCL_OP_EXPANSION_MODE: "AIV"
      PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
      SERVER_PORT: "DEFAULT_PORT"
    server_cmd:
      - "--tensor-parallel-size"
      - "16"
      - "--port"
      - "$SERVER_PORT"
      - "--max-model-len"
      - "8192"
      - "--max-num-batched-tokens"
      - "8192"
      - "--max-num-seqs"
      - "12"
      - "--gpu-memory-utilization"
      - "0.9"
      - "--trust-remote-code"
      - "--enable-expert-parallel"
      - "--no-enable-prefix-caching"
```

Run the block in dry-run mode to see the generated shell without writing files:

```bash
python3 tools/docs_codegen/cli.py \
  --block docs/source/tutorials/models/Kimi-K2-Thinking.md::kimi_k2_thinking_single_node \
  --dry-run --stdout
```

The first line is the artifact path. The remaining lines are the generated shell content:

```bash
# docs/_build/doc_codegen/Kimi-K2-Thinking/kimi_k2_thinking_single_node.sh
export HCCL_BUFFSIZE=1024
export TASK_QUEUE_ENABLE=1
export OMP_PROC_BIND=false
export HCCL_OP_EXPANSION_MODE=AIV
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export SERVER_PORT=8000

vllm serve moonshotai/Kimi-K2-Thinking \
  --tensor-parallel-size 16 \
  --port $SERVER_PORT \
  --max-model-len 8192 \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 12 \
  --gpu-memory-utilization 0.9 \
  --trust-remote-code \
  --enable-expert-parallel \
  --no-enable-prefix-caching
```

In this example, envs is rendered as export lines and model becomes vllm serve <model>. server_cmd is appended as command-line arguments, with each option (and its value, if any) on its own continuation line. The placeholder value SERVER_PORT: "DEFAULT_PORT" is resolved to the default single-node port, 8000.
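The short Python sketch below reproduces this transformation and rebuilds the expected output shown above. `render_case` and `group_args` are hypothetical. Starting a new continuation line at every token that begins with `--` is inferred from this one example. Replacing any env value of DEFAULT_PORT with 8000 is an assumption based on the sentence above.

```python
from __future__ import annotations

from typing import Any

DEFAULT_SINGLE_NODE_PORT = 8000  # assumption: the value DEFAULT_PORT resolves to


def group_args(tokens: list[str]) -> list[str]:
    """Start a new line at each "--option"; any following values join that line."""
    lines: list[str] = []
    for token in tokens:
        if token.startswith("--") or not lines:
            lines.append(token)
        else:
            lines[-1] += f" {token}"
    return lines


def render_case(case: dict[str, Any]) -> str:
    """Render one test case as export lines, a blank line, and a vllm serve command."""
    exports = []
    for key, value in case.get("envs", {}).items():
        if value == "DEFAULT_PORT":
            value = DEFAULT_SINGLE_NODE_PORT
        exports.append(f"export {key}={value}")
    args = group_args([str(token) for token in case.get("server_cmd", [])])
    # Each argument line is prefixed with "  " and the previous line ends with " \".
    command = " \\\n  ".join([f"vllm serve {case['model']}", *args])
    return "\n".join(exports) + "\n\n" + command
```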

Build the site & preview locally

```bash
# Install documentation build dependencies
python3 -m pip install -r docs/requirements-docs.txt

# (Optional) Clean previous builds
make -C docs clean

# Build the English site
make -C docs html

# (Optional) Build the Chinese site
make -C docs intl

# Preview locally
python3 -m http.server -d docs/_build/html 8000

# Then open in a browser:
# http://localhost:8000
```

For developers: add a new converter

A new converter lets converter_tag: <name> render a given YAML structure into a script (a GeneratedScript).

What to change

- In tools/docs_codegen/converters.py:
  - Add a BaseConverter subclass that implements convert(loaded_yaml, *, block) -> GeneratedScript.
  - Give the converter a unique name. This is the value authors use for converter_tag in docs.
  - Register it in build_default_converters().
  - Reuse the shared validation/rendering helpers in tools/docs_codegen/utils.py (require_yaml_mapping, require_mapping, require_scalar_mapping, require_indexed_mapping, parse_command_tokens, render_cli_command, ...) instead of re-validating the YAML shape inline.
- If your converter needs new directive options (e.g. :foo_index:):
  - Add the option name to MODEL_CODE_OPTION_NAMES in tools/docs_codegen/scanner.py.
  - Add the option name to ModelCodeDirective.option_spec in tools/docs_codegen/sphinx_extension.py.
- Add a real example snippet to a model doc and point it at a YAML file that exists (preferably under tests/). Prefer a doc under docs/source/tutorials/models/, the directory the generator scans by default.
- At minimum, validate with the CLI:
  - python3 tools/docs_codegen/cli.py --doc <your_doc>, or --block <doc>::<block_name>
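The self-contained Python sketch below illustrates the pattern. `GeneratedScript`, `BaseConverter`, and `build_default_converters` are hypothetical stand-ins that only mirror the names in converters.py, so check the real definitions for their actual fields and signatures. A real converter should also use the utils.py helpers rather than the inline `isinstance` check. `EnvOnlyConverter` and its `env_only` tag are invented for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-ins; the real classes live in tools/docs_codegen/converters.py.
@dataclass
class GeneratedScript:
    content: str


class BaseConverter(ABC):
    name: str = ""

    @abstractmethod
    def convert(self, loaded_yaml: Any, *, block: Any) -> GeneratedScript:
        ...


class EnvOnlyConverter(BaseConverter):
    """Example converter: render only the envs of test_cases[0] as exports."""

    name = "env_only"  # the value authors would put in :converter_tag:

    def convert(self, loaded_yaml: Any, *, block: Any) -> GeneratedScript:
        if not isinstance(loaded_yaml, dict):
            raise TypeError("expected a YAML mapping at the top level")
        envs = loaded_yaml["test_cases"][0].get("envs", {})
        lines = [f"export {key}={value}" for key, value in envs.items()]
        return GeneratedScript(content="\n".join(lines) + "\n")


def build_default_converters() -> dict[str, BaseConverter]:
    """Register converters by name, rejecting duplicate names."""
    registry: dict[str, BaseConverter] = {}
    for converter in [EnvOnlyConverter()]:
        if converter.name in registry:
            raise ValueError(f"duplicate converter name: {converter.name}")
        registry[converter.name] = converter
    return registry
```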
