vllm_omni.entrypoints.cli.serve ¶
Omni serve command for vLLM-Omni.
Supports both multi-stage LLM models (e.g., Qwen2.5-Omni) and diffusion models (e.g., Qwen-Image) through the same CLI interface.
DESCRIPTION module-attribute ¶
DESCRIPTION = "Launch a local OpenAI-compatible API server to serve Omni models\nvia HTTP. Supports both multi-stage LLM models and diffusion models.\n\nThe server automatically detects the model type:\n- LLM models: Served via /v1/chat/completions endpoint\n- Diffusion models: Served via /v1/images/generations endpoint\n\nExamples:\n # Start an Omni LLM server\n vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091\n\n # Start a diffusion model server\n vllm serve Qwen/Qwen-Image --omni --port 8091\n\nSearch by using: `--help=<ConfigGroup>` to explore options by section (e.g.,\n--help=OmniConfig)\n Use `--help=all` to show all available flags at once.\n"
OmniServeCommand ¶
Bases: CLISubcommand
The serve subcommand for the vLLM CLI.
run_headless ¶
run_headless(args: TrackingNamespace) -> None
Run a single stage in headless mode.
Honors --omni-dp-size-local: launches that many replicas locally for --stage-id. Each replica registers with the head's OmniMasterServer (auto-assigned replica id when --omni-dp-size-local > 1 so multiple headless invocations can coexist) and reports heartbeats to the head's OmniCoordinator.