Skip to content

vllm_omni.entrypoints.cli.serve

Omni serve command for vLLM-Omni.

Supports both multi-stage LLM models (e.g., Qwen2.5-Omni) and diffusion models (e.g., Qwen-Image) through the same CLI interface.

DESCRIPTION module-attribute

DESCRIPTION = "Launch a local OpenAI-compatible API server to serve Omni models\nvia HTTP. Supports both multi-stage LLM models and diffusion models.\n\nThe server automatically detects the model type:\n- LLM models: Served via /v1/chat/completions endpoint\n- Diffusion models: Served via /v1/images/generations endpoint\n\nExamples:\n  # Start an Omni LLM server\n  vllm serve Qwen/Qwen2.5-Omni-7B --omni --port 8091\n\n  # Start a diffusion model server\n  vllm serve Qwen/Qwen-Image --omni --port 8091\n\nSearch by using: `--help=<ConfigGroup>` to explore options by section (e.g.,\n--help=OmniConfig)\n  Use `--help=all` to show all available flags at once.\n"

logger module-attribute

logger = init_logger(__name__)

OmniServeCommand

Bases: CLISubcommand

The serve subcommand for the vLLM CLI.

name class-attribute instance-attribute

name = 'serve'

cmd staticmethod

cmd(args: TrackingNamespace) -> None

subparser_init

subparser_init(
    subparsers: _SubParsersAction,
) -> TrackingArgumentParser

validate

validate(args: Namespace) -> None

cmd_init

cmd_init() -> list[CLISubcommand]

run_headless

run_headless(args: TrackingNamespace) -> None

Run a single stage in headless mode.

Honors --omni-dp-size-local: launches that many replicas locally for --stage-id. Each replica registers with the head's OmniMasterServer (auto-assigned replica id when --omni-dp-size-local > 1 so multiple headless invocations can coexist) and reports heartbeats to the head's OmniCoordinator.