Configuration Options
This section lists the most common options for running vLLM-Omni.
For options within a vLLM engine, please refer to vLLM Configuration.
Currently, the main options are maintained in the stage configs for each model.
For a specific example, see the Qwen2.5-Omni deploy config. The matching frozen pipeline topology lives at vllm_omni/model_executor/models/qwen2_5_omni/pipeline.py.
For an introduction to stage configs, see Introduction for stage config.
Memory Configuration
- GPU Memory Calculation and Configuration - Guide on how to calculate memory requirements and set up gpu_memory_utilization for optimal performance
Multi-Stage Recipes
- Prefill-Decode Disaggregation - How to derive a PD-aware Qwen3-Omni stage config from the default config without introducing another bundled YAML
Optimization Features
- Diffusion Features Overview - Complete overview of all diffusion model features and supported models