Doc writing guide#

Guide to Writing Model Tutorial Doc#

docs/source/_templates/Model-Deployment-Tutorial-Template.md is a template for writing model deployment tutorials. You can copy and modify it to create new docs.

Testable doc code block generation (model-code)#

  • For documentation authors: how to insert testable command blocks into docs

  • For developers: how to add a new converter

Built-in supported converter_tag values:

converter_tag

Renders

YAML source

single_node

A single node's env exports + vllm serve script

test_cases[case_index]

multi_node

One host's env exports + vllm serve script

deployment[host_index]

external_dp_template

One external-DP node's env exports + vllm serve command

templates[host_index]

external_dp_launch

One launch_online_dp.py line per node

config[]

external_dp_proxy

The load-balance proxy launch command

config[] + routing

For authors: add a block#

重要

By default, the generator scans only .md files under docs/source/tutorials/models/ and produces artifacts. If you put model-code blocks in other directories, Sphinx builds will not automatically generate the corresponding scripts.

All model-code blocks need:

Option

Required

Description

block_name

Yes

Block name; must be unique within the current document

converter_tag

Yes

Selects one of the built-in converters

test_case_path

Yes

Repository-relative YAML path that stays within the repo; the file must exist

Use the body of the block to add shell wrapper lines such as set -eux. Always place the {{ generated }} placeholder where the converter output should be inserted.

converter_tag: single_node#

single_node reads one item from test_cases. The optional case_index metadata selects the item; when omitted, it defaults to 0.

Only the fields read by this converter are expanded below. Other test metadata can be left in the YAML and is ignored by this converter.

test_cases:
  - name: qwen3-8b-single
    model: Qwen/Qwen3-8B
    envs:
      HCCL_BUFFSIZE: "1024"
      SERVER_PORT: DEFAULT_PORT
    server_cmd:
      - --tensor-parallel-size
      - "1"
      - --port
      - $SERVER_PORT
      - --trust-remote-code
    server_cmd_extra:
      - --enable-expert-parallel
    benchmarks: ...

envs is rendered as export lines. SERVER_PORT: DEFAULT_PORT is resolved to the default single-node port 8000. model becomes vllm serve <model>, and server_cmd plus optional server_cmd_extra become command arguments. Both command fields can be either a shell string or a flat token list.

Write the doc block like this:

```{model-code}
:block_name: qwen3_8b_single_node
:converter_tag: single_node
:test_case_path: tests/e2e/nightly/single_node/models/configs/your_model.yaml
:case_index: 0

set -eux
{{ generated }}
```

Generated shell script:

set -eux
export HCCL_BUFFSIZE=1024
export SERVER_PORT=8000

vllm serve Qwen/Qwen3-8B \
  --tensor-parallel-size 1 \
  --port $SERVER_PORT \
  --trust-remote-code \
  --enable-expert-parallel

converter_tag: multi_node#

multi_node reads one item from deployment. The required host_index metadata selects which host to render.

deployment:
  - envs:
      SERVER_PORT: "8000"
    server_cmd: >
      vllm serve Qwen/Qwen3-235B-A22B
      --host 0.0.0.0
      --port $SERVER_PORT
      --data-parallel-size 2
      --tensor-parallel-size 8
      --data-parallel-address $LOCAL_IP
  - envs:
      SERVER_PORT: "8000"
    server_cmd: >
      vllm serve Qwen/Qwen3-235B-A22B
      --headless
      --port $SERVER_PORT
      --data-parallel-size 2
      --tensor-parallel-size 8
      --data-parallel-start-rank 1
      --data-parallel-address $MASTER_IP
benchmarks: ...

server_cmd must be a complete command starting with vllm serve <model>. It can be written as a shell string or a flat token list.

Write the doc block like this:

```{model-code}
:block_name: qwen3_235b_worker_1
:converter_tag: multi_node
:test_case_path: tests/e2e/nightly/multi_node/internal_dp/config/your_model.yaml
:host_index: 1

set -eux
{{ generated }}
```

Generated shell script for host_index: 1:

set -eux
export MASTER_IP=192.168.1.10
export SERVER_PORT=8000

vllm serve Qwen/Qwen3-235B-A22B \
  --headless \
  --port $SERVER_PORT \
  --data-parallel-size 2 \
  --tensor-parallel-size 8 \
  --data-parallel-start-rank 1 \
  --data-parallel-address $MASTER_IP

converter_tag: external_dp_template#

external_dp_template reads one item from templates. The required host_index metadata selects which template to render. The top-level model field is also required because the converter builds vllm serve <model>.

model: Eco-Tech/GLM-Test
templates:
  - node_index: 0
    envs:
      HCCL_BUFFSIZE: "1024"
      ASCEND_RT_VISIBLE_DEVICES: "${VISIBLE_DEVICES}"
    server_cmd_template:
      - --host
      - 0.0.0.0
      - --port
      - ${PORT}
      - --data-parallel-size
      - ${DP_SIZE}
      - --data-parallel-rank
      - ${DP_RANK}
      - --data-parallel-address
      - ${DP_ADDRESS}
      - --data-parallel-rpc-port
      - ${DP_RPC_PORT}
      - --tensor-parallel-size
      - ${TP_SIZE}
      - --trust-remote-code
config: ...
routing: ...

Known braced template variables are rewritten to the positional shell arguments that run_dp_template.sh receives from launch_online_dp.py:

Template variable

Rendered positional

${VISIBLE_DEVICES}

$1

${PORT}

$2

${DP_SIZE}

$3

${DP_RANK}

$4

${DP_ADDRESS}

$5

${DP_RPC_PORT}

$6

${TP_SIZE}

$7

Unknown braced variables and unbraced shell references such as $SERVER_PORT are left unchanged.

Write the doc block like this:

```{model-code}
:block_name: glm_external_dp_template_node0
:converter_tag: external_dp_template
:test_case_path: tests/e2e/nightly/multi_node/external_dp/config/your_model.yaml
:host_index: 0

set -eux
export HCCL_IF_IP=$local_ip
export GLOO_SOCKET_IFNAME=$nic_name
export TP_SOCKET_IFNAME=$nic_name
export HCCL_SOCKET_IFNAME=$nic_name

{{ generated }}
```

Generated shell script for host_index: 0:

set -eux
export HCCL_IF_IP=$local_ip
export GLOO_SOCKET_IFNAME=$nic_name
export TP_SOCKET_IFNAME=$nic_name
export HCCL_SOCKET_IFNAME=$nic_name

export HCCL_BUFFSIZE=1024
export ASCEND_RT_VISIBLE_DEVICES=$1

vllm serve Eco-Tech/GLM-Test \
  --host 0.0.0.0 \
  --port $2 \
  --data-parallel-size $3 \
  --data-parallel-rank $4 \
  --data-parallel-address $5 \
  --data-parallel-rpc-port $6 \
  --tensor-parallel-size $7 \
  --trust-remote-code

converter_tag: external_dp_launch#

external_dp_launch reads the full config list and renders one launch_online_dp.py command per node. It does not take an index option.

config:
  - node_index: 0
    port_start: 7100
    dp_rpc_port: 12321
    dp_size: 2
    dp_size_local: 2
    dp_rank_start: 0
    tp_size: 8
    dp_address: "${NODE_0_IP}"
  - node_index: 1
    port_start: 7200
    dp_rpc_port: 12321
    dp_size: 4
    dp_size_local: 4
    dp_rank_start: 0
    tp_size: 4
    dp_address: "${NODE_1_IP}"
templates: ...
routing: ...

Write the doc block like this:

```{model-code}
:block_name: glm_external_dp_launch
:converter_tag: external_dp_launch
:test_case_path: tests/e2e/nightly/multi_node/external_dp/config/your_model.yaml

set -eux
{{ generated }}
```

Generated shell script:

set -eux
python launch_online_dp.py --dp-size 2 --tp-size 8 --dp-size-local 2 --dp-rank-start 0 --dp-address ${NODE_0_IP} --dp-rpc-port 12321 --vllm-start-port 7100

python launch_online_dp.py --dp-size 4 --tp-size 4 --dp-size-local 4 --dp-rank-start 0 --dp-address ${NODE_1_IP} --dp-rpc-port 12321 --vllm-start-port 7200

converter_tag: external_dp_proxy#

external_dp_proxy reads config and routing. It renders the load_balance_proxy_server_example.py command for routing.type: disaggregated_prefill. It does not take an index option.

routing:
  type: disaggregated_prefill
  groups:
    prefiller: [0]
    decoder: [1]
config:
  - node_index: 0
    port_start: 7100
    dp_size_local: 2
    dp_rpc_port: 12321
    dp_size: 2
    dp_rank_start: 0
    tp_size: 8
    dp_address: "${NODE_0_IP}"
  - node_index: 1
    port_start: 7200
    dp_size_local: 4
    dp_rpc_port: 12321
    dp_size: 4
    dp_rank_start: 0
    tp_size: 4
    dp_address: "${NODE_1_IP}"
templates: ...

routing.groups.prefiller and routing.groups.decoder contain indices into config. Each referenced node expands to dp_size_local host and port entries. The proxy itself is rendered on ${NODE_0_IP}:1999.

Write the doc block like this:

```{model-code}
:block_name: glm_external_dp_proxy
:converter_tag: external_dp_proxy
:test_case_path: tests/e2e/nightly/multi_node/external_dp/config/your_model.yaml

set -eux
{{ generated }}
```

Generated shell script:

set -eux
python load_balance_proxy_server_example.py \
  --host ${NODE_0_IP} \
  --port 1999 \
  --prefiller-hosts \
    ${NODE_0_IP} \
    ${NODE_0_IP} \
  --prefiller-ports \
    7100 \
    7101 \
  --decoder-hosts \
    ${NODE_1_IP} \
    ${NODE_1_IP} \
    ${NODE_1_IP} \
    ${NODE_1_IP} \
  --decoder-ports \
    7200 \
    7201 \
    7202 \
    7203

Local debugging and generation#

Generate only (without building the full site)#

# Generate all model-code artifacts under docs/source/tutorials/models/
python3 tools/docs_codegen/cli.py

# Generate artifacts for a single document
python3 tools/docs_codegen/cli.py --doc docs/source/tutorials/models/Kimi-K2-Thinking.md

# Generate a single block and print it (no files written)
python3 tools/docs_codegen/cli.py \
  --block docs/source/tutorials/models/Kimi-K2-Thinking.md::kimi_k2_thinking_single_node \
  --dry-run --stdout

By default, artifacts are written to: docs/_build/doc_codegen/<doc_stem>/<block_name>.sh.

备注

After the script is generated, please make sure to check whether the generated content is runnable, especially key parts such as environment variables and command-line parameters.

Build the site & preview locally#

# Install documentation build dependencies
python3 -m pip install -r docs/requirements-docs.txt

# (Optional) Clean previous builds
make -C docs clean

# Build the English site
make -C docs html

# (Optional) Build the Chinese site
make -C docs intl

# Preview locally
python3 -m http.server -d docs/_build/html 8000

# Then open in a browser:
# http://localhost:8000

For developers: add a new converter#

A converter turns one loaded YAML file plus one parsed ModelCodeBlock into a GeneratedScript. The current pipeline is:

  1. BlockScanner parses model-code fences and accepts only options listed in MODEL_CODE_OPTION_NAMES.

  2. YamlLoader loads test_case_path.

  3. get_converter() looks up block.converter_tag from build_default_converters().

  4. The selected converter returns GeneratedScript(content=..., language="shell").

  5. GeneratorService replaces {{ generated }} in the block body, validates that the final script is non-empty, and writes docs/_build/doc_codegen/<doc_stem>/<block_name>.sh.

To add a converter:

  1. In tools/docs_codegen/converters.py, add a BaseConverter subclass with a unique name. That name is the value authors put in :converter_tag:.

  2. Implement convert(self, loaded_yaml, *, block) -> GeneratedScript. Use make_docs_codegen_error(..., block=block) for user-facing validation errors so the CLI and Sphinx output include document context.

  3. Reuse helpers from tools/docs_codegen/utils.py, such as require_mapping, require_mapping_list, require_scalar_mapping, require_indexed_mapping, require_node_field, parse_command_tokens, substitute_template_positionals, and render_cli_command.

  4. Register the converter in build_default_converters(). If it is not registered, get_converter() will reject the new converter_tag.

  5. If the converter needs new directive metadata, add the option name to MODEL_CODE_OPTION_NAMES in tools/docs_codegen/scanner.py and to ModelCodeDirective.option_spec in tools/docs_codegen/sphinx_extension.py. Read the option with block.get_option("<option_name>").

  6. Add or update tests in tests/ut/tools/test_docs_codegen.py. Cover the successful render path, required option validation, YAML shape validation, and any CLI/Sphinx scanner behavior affected by new metadata.

  7. Add a real model-code example in a model tutorial, preferably under docs/source/tutorials/models/, and point it to an existing YAML file under tests/.

  8. Validate with the CLI:

    python3 tools/docs_codegen/cli.py --doc <your_doc> --dry-run
    python3 tools/docs_codegen/cli.py --block <your_doc>::<block_name> --dry-run --stdout
    

If a converter should render something other than shell, set GeneratedScript.language accordingly so Sphinx can highlight the generated literal block correctly.