Documentation writing guide#

Guide to Writing Model Tutorial Documentation#

docs/source/_templates/Model-Deployment-Tutorial-Template.md is a template for writing model deployment tutorials. You can copy and modify it to create new docs.

Testable documentation code block generation (model-code)#

  • For documentation authors: how to insert testable command blocks into docs

  • For developers: how to add a new converter

Built-in supported converter_tag values:

converter_tag

Renders

YAML source

single_node

One vllm serve script for a single node

test_cases[case_index]

multi_node

One host's vllm serve script

deployment[host_index]

external_dp_template

One external-DP node's env exports + vllm serve command

templates[host_index] (+ top-level model)

external_dp_launch

One launch_online_dp.py line per node

config[]

external_dp_proxy

The load-balance proxy launch command

config[] + routing

For authors: add a block#

重要

By default, the generator scans only .md files under docs/source/tutorials/models/ and produces artifacts. If you put model-code blocks in other directories, Sphinx builds will not automatically generate the corresponding scripts.

Single node (single_node)#

Template 1: minimal (metadata only)#
```{model-code}
:block_name: your_unique_block_name
:converter_tag: single_node
:test_case_path: tests/e2e/nightly/single_node/models/configs/your_model.yaml
```
Template 2: with text (use {{ generated }} placeholder)#
```{model-code}
:block_name: your_unique_block_name
:converter_tag: single_node
:test_case_path: tests/e2e/nightly/single_node/models/configs/your_model.yaml
:case_index: 0

# You can add any extra content here, e.g. code, explanations, or comments.
{{ generated }}
```
Options#

Option

Required

Default

Description

block_name

Yes

None

Block name; must be unique within the current document

converter_tag

Yes

None

Must be single_node

test_case_path

Yes

None

Repository-relative path that stays within the repo (no .. escape); file must exist

case_index

No

0

Use test_cases[case_index] from the YAML as the rendering source

YAML reference#

See existing files under tests/e2e/nightly/single_node/models/configs/.

single_node reads test_cases[case_index]. Common fields include:

  • model: model name (ultimately renders vllm serve <model> ...)

  • envs: rendered as export ... (scalar values)

  • server_cmd: arguments appended to vllm serve <model> (shell string or token list)

  • server_cmd_extra (optional): extra appended arguments

Multi node (multi_node)#

Template 1: minimal (metadata only)#
```{model-code}
:block_name: your_unique_block_name_0
:converter_tag: multi_node
:test_case_path: tests/e2e/nightly/multi_node/config/your_model.yaml
:host_index: 0
```
```{model-code}
:block_name: your_unique_block_name_1
:converter_tag: multi_node
:test_case_path: tests/e2e/nightly/multi_node/config/your_model.yaml
:host_index: 1
```
Template 2: with text (use {{ generated }} placeholder)#
```{model-code}
:block_name: your_unique_block_name_0
:converter_tag: multi_node
:test_case_path: tests/e2e/nightly/multi_node/config/your_model.yaml
:host_index: 0

# You can add any extra content here, e.g. code, explanations, or comments.
{{ generated }}
```
```{model-code}
:block_name: your_unique_block_name_1
:converter_tag: multi_node
:test_case_path: tests/e2e/nightly/multi_node/config/your_model.yaml
:host_index: 1

# You can add any extra content here, e.g. code, explanations, or comments.
{{ generated }}
```
Options#

Option

Required

Default

Description

block_name

Yes

None

Block name; must be unique within the current document

converter_tag

Yes

None

Must be multi_node

test_case_path

Yes

None

Repository-relative path that stays within the repo (no .. escape); file must exist

host_index

Yes

None

Use deployment[host_index] from the YAML as the rendering source

YAML reference#

See existing files under tests/e2e/nightly/multi_node/config/.

multi_node reads deployment[host_index]. Common fields include:

  • envs: rendered as export ... (scalar values)

  • server_cmd: a complete command (must start with vllm serve <model>; shell multi-line string or token list)

External data parallel (external_dp_template / external_dp_launch / external_dp_proxy)#

These three converters read one shared external-DP YAML (see tests/e2e/nightly/multi_node/external_dp/config/) and each render a different part of the deployment. They are tightly coupled to that schema by design.

The shared YAML provides:

  • model: model name (top level)

  • config: a list of per-node settings (port_start, dp_rpc_port, dp_size, dp_size_local, dp_rank_start, tp_size, dp_address, ...)

  • routing: type plus groups (e.g. prefiller / decoder lists of config indices)

  • templates: a list of per-node envs and server_cmd_template entries

server_cmd_template uses braced ${VAR} placeholders that external_dp_template rewrites to the positional shell parameters consumed by run_dp_template.sh:

${VAR}

Positional

${VISIBLE_DEVICES}

$1

${PORT}

$2

${DP_SIZE}

$3

${DP_RANK}

$4

${DP_ADDRESS}

$5

${DP_RPC_PORT}

$6

${TP_SIZE}

$7

Unbraced references such as $SERVER_PORT and unknown braced variables are left untouched so they remain live shell expansions.

Templates#
```{model-code}
:block_name: your_unique_block_name_prefill_node0
:converter_tag: external_dp_template
:test_case_path: tests/e2e/nightly/multi_node/external_dp/config/your_model.yaml
:host_index: 0
```

external_dp_launch (one python launch_online_dp.py ... line per config node) and external_dp_proxy (the load-balance proxy command, e.g. python load_balance_proxy_server_example.py ...) read the whole cluster, so they take no index option:

```{model-code}
:block_name: your_unique_block_name_launch
:converter_tag: external_dp_launch
:test_case_path: tests/e2e/nightly/multi_node/external_dp/config/your_model.yaml
```
```{model-code}
:block_name: your_unique_block_name_proxy
:converter_tag: external_dp_proxy
:test_case_path: tests/e2e/nightly/multi_node/external_dp/config/your_model.yaml
```
Options#

Option

Required

Default

Description

block_name

Yes

None

Block name; must be unique within the current document

converter_tag

Yes

None

One of external_dp_template, external_dp_launch, external_dp_proxy

test_case_path

Yes

None

Repository-relative path that stays within the repo (no .. escape); file must exist

host_index

external_dp_template only

None

Use templates[host_index] from the YAML as the rendering source

备注

external_dp_proxy currently supports only routing.type: disaggregated_prefill and reads its routing.groups.prefiller / routing.groups.decoder node lists.

Local debugging and generation#

Generate only (without building the full site)#

# Generate all model-code artifacts under docs/source/tutorials/models/
python3 tools/docs_codegen/cli.py

# Generate artifacts for a single document
python3 tools/docs_codegen/cli.py --doc docs/source/tutorials/models/Kimi-K2-Thinking.md

# Generate a single block and print it (no files written)
python3 tools/docs_codegen/cli.py \
  --block docs/source/tutorials/models/Kimi-K2-Thinking.md::kimi_k2_thinking_single_node \
  --dry-run --stdout

By default, artifacts are written to: docs/_build/doc_codegen/<doc_stem>/<block_name>.sh.

备注

After the script is generated, please make sure to check whether the generated content is runnable, especially key parts such as environment variables and command-line parameters.

Concrete YAML-to-shell example#

The following model-code block reads the first test case from tests/e2e/nightly/single_node/models/configs/Kimi-K2-Thinking.yaml:

```{model-code}
:block_name: kimi_k2_thinking_single_node
:converter_tag: single_node
:test_case_path: tests/e2e/nightly/single_node/models/configs/Kimi-K2-Thinking.yaml
```

The YAML fields read by the converter look like this:

test_cases:
  - name: "Kimi-K2-Thinking-TP16-Case"
    model: "moonshotai/Kimi-K2-Thinking"
    envs:
      HCCL_BUFFSIZE: "1024"
      TASK_QUEUE_ENABLE: "1"
      OMP_PROC_BIND: "false"
      HCCL_OP_EXPANSION_MODE: "AIV"
      PYTORCH_NPU_ALLOC_CONF: "expandable_segments:True"
      SERVER_PORT: "DEFAULT_PORT"
    server_cmd:
      - "--tensor-parallel-size"
      - "16"
      - "--port"
      - "$SERVER_PORT"
      - "--max-model-len"
      - "8192"
      - "--max-num-batched-tokens"
      - "8192"
      - "--max-num-seqs"
      - "12"
      - "--gpu-memory-utilization"
      - "0.9"
      - "--trust-remote-code"
      - "--enable-expert-parallel"
      - "--no-enable-prefix-caching"

Run the block in dry-run mode to see the generated shell without writing files:

python3 tools/docs_codegen/cli.py \
  --block docs/source/tutorials/models/Kimi-K2-Thinking.md::kimi_k2_thinking_single_node \
  --dry-run --stdout

The first line is the artifact path. The remaining lines are the generated shell content:

# docs/_build/doc_codegen/Kimi-K2-Thinking/kimi_k2_thinking_single_node.sh
export HCCL_BUFFSIZE=1024
export TASK_QUEUE_ENABLE=1
export OMP_PROC_BIND=false
export HCCL_OP_EXPANSION_MODE=AIV
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export SERVER_PORT=8000

vllm serve moonshotai/Kimi-K2-Thinking \
  --tensor-parallel-size 16 \
  --port $SERVER_PORT \
  --max-model-len 8192 \
  --max-num-batched-tokens 8192 \
  --max-num-seqs 12 \
  --gpu-memory-utilization 0.9 \
  --trust-remote-code \
  --enable-expert-parallel \
  --no-enable-prefix-caching

In this example, envs is rendered as export lines, model becomes vllm serve <model>, and server_cmd is appended as formatted command-line arguments. SERVER_PORT: "DEFAULT_PORT" is resolved to the default single-node port 8000.

Build the site & preview locally#

# Install documentation build dependencies
python3 -m pip install -r docs/requirements-docs.txt

# (Optional) Clean previous builds
make -C docs clean

# Build the English site
make -C docs html

# (Optional) Build the Chinese site
make -C docs intl

# Preview locally
python3 -m http.server -d docs/_build/html 8000

# Then open in a browser:
# http://localhost:8000

For developers: add a new converter#

The goal of adding a converter is to make converter_tag: <name> render a given YAML structure into a script (GeneratedScript).

What to change#

  1. In tools/docs_codegen/converters.py:

    • Add a BaseConverter subclass that implements convert(loaded_yaml, *, block) -> GeneratedScript

    • Give the converter a unique name (the value used by converter_tag in docs)

    • Register it in build_default_converters()

    • Reuse the shared validation/rendering helpers in tools/docs_codegen/utils.py (require_yaml_mapping, require_mapping, require_scalar_mapping, require_indexed_mapping, parse_command_tokens, render_cli_command, ...) rather than re-validating the YAML shape inline

  2. If your converter needs new directive options (e.g. :foo_index:):

    • Add the option name to MODEL_CODE_OPTION_NAMES in tools/docs_codegen/scanner.py

    • Add the option name to ModelCodeDirective.option_spec in tools/docs_codegen/sphinx_extension.py

  3. Add a real example snippet in any model doc (recommended under docs/source/tutorials/models/) and point it to a YAML file that exists (recommended under tests/).

  4. Minimal validation via CLI:

    • python3 tools/docs_codegen/cli.py --doc <your_doc> or --block <doc>::<block_name>