vLLM-Omni CLI Guide
The vLLM-Omni CLI extends the vllm CLI with some additional arguments.
serve
Starts the vLLM-Omni OpenAI-compatible API server.
Start with a model:
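A minimal sketch of the invocation, assuming the `--omni` flag shown for `vllm bench serve` on this page also applies to `vllm serve`. The model name is borrowed from the benchmark example and is only illustrative. The snippet prints the command rather than executing it, so it is safe to paste:

```shell
# Assumption: `--omni` selects the vLLM-Omni entrypoint for `serve`,
# mirroring the `vllm bench serve --omni` usage on this page.
MODEL="Qwen/Qwen2.5-Omni-7B"   # illustrative model name
SERVE_CMD="vllm serve $MODEL --omni"
echo "$SERVE_CMD"
# To launch the server, run the printed command directly.
```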
Specify the port:
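A hedged example, assuming the standard `--port` option of `vllm serve` is passed through unchanged. The port number 8091 and the model name are placeholders chosen for illustration. The command is printed rather than executed:

```shell
# Assumption: `--port` behaves as in upstream `vllm serve`.
MODEL="Qwen/Qwen2.5-Omni-7B"   # illustrative model name
PORT=8091                      # hypothetical port; pick any free port
SERVE_CMD="vllm serve $MODEL --omni --port $PORT"
echo "$SERVE_CMD"
# To launch the server, run the printed command directly.
```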
If you have a custom stage configs file, launch the server with the command below:
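A sketch of what such a launch might look like. The flag name `--stage-configs-path` is an assumption that does not appear on this page, so confirm it with `vllm serve --help` before relying on it. The file path, port, and model name are placeholders. The command is printed rather than executed:

```shell
# Assumption: the stage configs file is passed via `--stage-configs-path`;
# verify the exact flag name against `vllm serve --help`.
MODEL="Qwen/Qwen2.5-Omni-7B"                 # illustrative model name
PORT=8091                                    # hypothetical port
STAGE_CONFIGS="/path/to/stage_configs_file"  # placeholder path
SERVE_CMD="vllm serve $MODEL --omni --port $PORT --stage-configs-path $STAGE_CONFIGS"
echo "$SERVE_CMD"
# To launch the server, run the printed command directly.
```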
bench
Runs benchmark tests for online serving throughput. Available commands:
vllm bench serve --omni \
--model Qwen/Qwen2.5-Omni-7B \
--host server-host \
--port server-port \
--random-input-len 32 \
--random-output-len 4 \
--num-prompts 5
See vllm bench serve for the full reference of all available arguments.