Governance
vLLM-Omni's governance is inspired by the vLLM governance process. We share the same commitment to the open-source community and to meritocratic norms.
Values
vLLM-Omni aims to be the fastest and easiest-to-use omni-modality inference and serving engine. Our values are aligned with vLLM's values:
Design Values
- Top performance: System performance is our top priority. We monitor overheads, optimize kernels, and publish benchmarks. We never leave performance on the table.
- Ease of use: vLLM-Omni must be simple to install, configure, and operate. We provide clear documentation, fast startup, clean logs, helpful error messages, and monitoring guides. Many users fork our code or study it deeply, so we keep it readable and modular.
- Wide coverage: vLLM-Omni supports frontier models and high-performance accelerators. We make it easy to add new models and hardware. Together, vLLM-Omni and PyTorch form a simple interface that avoids complexity.
- Production ready: vLLM-Omni runs 24/7 in production. It must be easy to operate and monitor for health issues.
- Extensibility: vLLM-Omni serves as fundamental omni-modality infrastructure. Our codebase cannot cover every use case, so we design for easy forking and customization.
Collaboration Values
- Tightly Knit and Fast-Moving: Our maintainer team is aligned on vision, philosophy, and roadmap. We work closely to unblock each other and move quickly.
- Individual Merit: No one buys their way into governance. Committer status belongs to individuals, not companies. We reward contribution, maintenance, and project stewardship.
Project Maintainers
Lead Maintainers
Lead maintainers are responsible for the overall direction and strategy of the project:
Active Committers
Committers have write access and merge rights. They typically have deep expertise in specific areas of the project and shepherd community contributions:
- @david6666666: Quantization and Community Relations
- @gcanlin: Hardware plugin and NPU integration
- @Isotr0py: Diffusion and Quantization
- @linyueqian: TTS and Omni Support
- @lishunyang12: Quantization and Configuration
- @princepride: Diffusion and Omni Support
- @RuixiangMa: Diffusion models, parallel, cache, and docs
- @SamitHuang: RL, Diffusion, and cache
- @tzhouam: Engine and New Model Support
- @wtomin: Diffusion models, parallel, and docs
- @ZeldaHuang: Omni Support
- @ZJY0516: Diffusion attention backend, kernel fusion, and CustomOp
- @yuanheng-zhao: Diffusion cache, offload, and Omni Support
Diffusion Workload Division
This section breaks down diffusion-related responsibilities. If your PR touches one of these areas, please ping the listed owners for review.
Runtime-related
- Diffusion models: @RuixiangMa, @wtomin
Optimization
- Parallel: @RuixiangMa, @wtomin
- Attention backend: @ZJY0516
- Cache: @yuanheng-zhao, @RuixiangMa, @SamitHuang
- Offload: @yuanheng-zhao
- Quantization: @lishunyang12, @david6666666
- Kernel fusion / communication-computation: @ZJY0516
Docs & Test
- Diffusion docs: @RuixiangMa, @wtomin
Meetings
Committers hold bi-weekly meetings to discuss the project's future direction and collaborations.
Committer Nomination Process
Every month, any active committer can nominate new committer(s) to the project. Up to two new committers will be admitted per month based on the quality and impact of their contributions.