vLLM-Omni CLI Guide¶

The CLI for vLLM-Omni inherits from vllm with some additional arguments.

serve¶

Starts the vLLM-Omni OpenAI Compatible API server.

Start with a model:

vllm serve Qwen/Qwen2.5-Omni-7B --omni

Specify the port:

vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091

If you have custom stage configs file, launch the server with command below

vllm serve Qwen/Qwen2.5-Omni-7B --omni --stage-configs-path /path/to/stage_configs_file

bench¶

Run benchmark tests for online serving throughput. Available Commands:

vllm bench serve --omni \
    --model Qwen/Qwen2.5-Omni-7B \
    --host server-host \
    --port server-port \
    --random-input-len 32 \
    --random-output-len 4  \
    --num-prompts  5

See vllm bench serve for the full reference of all available arguments.