
Configuration Options

This section lists the most common options for running vLLM-Omni.

For options within a vLLM engine, please refer to vLLM Configuration.

Currently, the main options are maintained in per-model stage configs.

For a specific example, see the Qwen2.5-Omni deploy config. The matching frozen pipeline topology lives at vllm_omni/model_executor/models/qwen2_5_omni/pipeline.py.

For an introduction to stage configs, see Introduction for stage config.

Memory Configuration

GPU Memory Calculation and Configuration - How to calculate memory requirements and set gpu_memory_utilization accordingly
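As a rough illustration of the kind of arithmetic involved, the sketch below converts an estimated memory requirement into a gpu_memory_utilization value, which vLLM interprets as a fraction of total GPU memory between 0 and 1. The breakdown into weights, KV cache, and headroom, the helper names, and the example numbers are all assumptions made for illustration; they are not taken from the linked guide, which remains the authoritative reference.

```python
def estimate_gpu_memory_utilization(
    weights_gib: float,
    kv_cache_gib: float,
    total_gib: float,
    headroom_gib: float = 2.0,
) -> float:
    """Hypothetical helper: fraction of a GPU to reserve for one stage.

    Assumes the stage needs room for model weights, KV cache, and a
    fixed headroom for activations and other overhead (an assumption
    for this sketch, not the method from the linked guide).
    """
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    required_gib = weights_gib + kv_cache_gib + headroom_gib
    fraction = required_gib / total_gib
    if fraction > 1.0:
        raise ValueError(
            f"stage needs {required_gib:.1f} GiB but the GPU has {total_gib:.1f} GiB"
        )
    return round(fraction, 2)


def fits_on_one_gpu(fractions: list[float]) -> bool:
    """Hypothetical check: stages sharing one GPU must not reserve more than 100%."""
    return sum(fractions) <= 1.0 + 1e-9


if __name__ == "__main__":
    # Illustrative numbers only: 14 GiB of weights, 8 GiB of KV cache, 80 GiB GPU.
    stage_fraction = estimate_gpu_memory_utilization(14, 8, 80)
    print(f"gpu_memory_utilization: {stage_fraction}")
    print("fits with two other stages:", fits_on_one_gpu([stage_fraction, 0.4, 0.2]))
```

When several stages share one device, the per-stage fractions have to be budgeted together, which is why the sketch checks their sum rather than each value alone.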

Multi-Stage Recipes

Prefill-Decode Disaggregation - How to derive a PD-aware Qwen3-Omni stage config from the default config without introducing another bundled YAML

Optimization Features

Diffusion Features Overview - Complete overview of all diffusion model features and supported models