Supported Models¶
vLLM-Omni supports unified multimodal comprehension and generation models across various tasks.
Model Implementation¶
If vLLM-Omni natively supports a model, its implementation can be found in vllm_omni/model_executor/models and vllm_omni/diffusion/models.
List of Supported Models¶
| Architecture | Models | Example HF Models | NVIDIA GPU | AMD GPU | Ascend NPU | Intel GPU |
|---|---|---|---|---|---|---|
Qwen3OmniMoeForConditionalGeneration | Qwen3-Omni | Qwen/Qwen3-Omni-30B-A3B-Instruct | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
Qwen2_5OmniForConditionalGeneration | Qwen2.5-Omni | Qwen/Qwen2.5-Omni-7B, Qwen/Qwen2.5-Omni-3B | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
MingFlashOmniForConditionalGeneration + MingImagePipeline | Ming-flash-omni-2.0 (omni-speech + imagegen1) | Jonathan1909/Ming-flash-omni-2.0 | ✅︎ | |||
BagelForConditionalGeneration | BAGEL (DiT-only) | ByteDance-Seed/BAGEL-7B-MoT | ✅︎ | ✅︎ | ✅︎ | |
InternVLAA1Pipeline | InternVLA-A1 | InternRobotics/InternVLA-A1-3B | ✅︎ | ✅︎ | ||
HunyuanImage3ForCausalMM | HunyuanImage3.0 (DiT-only) | tencent/HunyuanImage-3.0, tencent/HunyuanImage-3.0-Instruct | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
QwenImagePipeline | Qwen-Image | Qwen/Qwen-Image | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
QwenImagePipeline | Qwen-Image-2512 | Qwen/Qwen-Image-2512 | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
QwenImageEditPipeline | Qwen-Image-Edit | Qwen/Qwen-Image-Edit | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
QwenImageEditPlusPipeline | Qwen-Image-Edit-2509 | Qwen/Qwen-Image-Edit-2509 | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
QwenImageLayeredPipeline | Qwen-Image-Layered | Qwen/Qwen-Image-Layered | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
QwenImageEditPlusPipeline | Qwen-Image-Edit-2511 | Qwen/Qwen-Image-Edit-2511 | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
GlmImagePipeline | GLM-Image | zai-org/GLM-Image | ✅︎ | ✅︎ | ||
ZImagePipeline | Z-Image | Tongyi-MAI/Z-Image-Turbo | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
WanPipeline | Wan2.1-T2V, Wan2.2-T2V, Wan2.2-TI2V | Wan-AI/Wan2.1-T2V-1.3B-Diffusers, Wan-AI/Wan2.1-T2V-14B-Diffusers, Wan-AI/Wan2.2-T2V-A14B-Diffusers, Wan-AI/Wan2.2-TI2V-5B-Diffusers | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
WanImageToVideoPipeline | Wan2.2-I2V | Wan-AI/Wan2.2-I2V-A14B-Diffusers | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
Cosmos3OmniDiffusersPipeline | Cosmos3 T2I, T2V, I2V, T2V with sound, action policy | nvidia/Cosmos3-Nano | ✅︎ | |||
WanSpeechToVideoPipeline | Wan2.2-S2V | Wan-AI/Wan2.2-S2V-14B | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
Wan22VACEPipeline | Wan2.1-VACE | Wan-AI/Wan2.1-VACE-1.3B-diffusers, Wan-AI/Wan2.1-VACE-14B-diffusers | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
LTX2Pipeline | LTX-2-T2V | Lightricks/LTX-2 | ✅︎ | ✅︎ | ||
LTX2ImageToVideoPipeline | LTX-2-I2V | Lightricks/LTX-2 | ✅︎ | ✅︎ | ||
LTX2TwoStagesPipeline | LTX-2-T2V | rootonchair/LTX-2-19b-distilled | ✅︎ | ✅︎ | ||
LTX2ImageToVideoTwoStagesPipeline | LTX-2-I2V | rootonchair/LTX-2-19b-distilled | ✅︎ | ✅︎ | ||
LTX23Pipeline | LTX-2.3-T2V | dg845/LTX-2.3-Diffusers | ✅︎ | ✅︎ | ||
LTX23ImageToVideoPipeline | LTX-2.3-I2V | dg845/LTX-2.3-Diffusers | ✅︎ | ✅︎ | ||
DreamZeroPipeline | DreamZero-DROID | GEAR-Dreams/DreamZero-DROID | ✅︎ | |||
HeliosPipeline, HeliosPyramidPipeline | Helios | BestWishYsh/Helios-Base, BestWishYsh/Helios-Mid, BestWishYsh/Helios-Distilled | ✅︎ | ✅︎ | ✅︎ | |
MagiHumanPipeline | MagiHuman | SII-GAIR/daVinci-MagiHuman-Base-1080p | ✅︎ | ✅︎ | ||
OvisImagePipeline | Ovis-Image | OvisAI/Ovis-Image | ✅︎ | ✅︎ | ✅︎ | |
LongcatImagePipeline | LongCat-Image | meituan-longcat/LongCat-Image | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
LongCatImageEditPipeline | LongCat-Image-Edit | meituan-longcat/LongCat-Image-Edit | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
StableDiffusion3Pipeline | Stable-Diffusion-3 | stabilityai/stable-diffusion-3.5-medium | ✅︎ | ✅︎ | ✅︎ | |
CosyVoice3Model | CosyVoice3 | FunAudioLLM/Fun-CosyVoice3-0.5B-2512 | ✅︎ | ✅︎ | ✅︎ | |
MammothModa2ForConditionalGeneration | MammothModa2-Preview | bytedance-research/MammothModa2-Preview | ✅︎ | ✅︎ | ||
Flux2KleinPipeline | FLUX.2-klein | black-forest-labs/FLUX.2-klein-4B, black-forest-labs/FLUX.2-klein-9B | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
FluxKontextPipeline | FLUX.1-Kontext-dev | black-forest-labs/FLUX.1-Kontext-dev | ✅︎ | ✅︎ | ||
FluxPipeline | FLUX.1-dev | black-forest-labs/FLUX.1-dev | ✅︎ | ✅︎ | ✅︎ | |
FluxPipeline | FLUX.1-schnell | black-forest-labs/FLUX.1-schnell | ✅︎ | ✅︎ | ✅︎ | |
OmniGen2Pipeline | OmniGen2 | OmniGen2/OmniGen2 | ✅︎ | ✅︎ | ✅︎ | |
StableAudioPipeline | Stable-Audio-Open | stabilityai/stable-audio-open-1.0 | ✅︎ | ✅︎ | ✅︎ | |
AudioXPipeline | AudioX | zhangj1an/AudioX | ✅︎ | ✅︎ | ||
Qwen3TTSForConditionalGeneration | Qwen3-TTS-12Hz-1.7B-CustomVoice | Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
Qwen3TTSForConditionalGeneration | Qwen3-TTS-12Hz-1.7B-VoiceDesign | Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
Qwen3TTSForConditionalGeneration | Qwen3-TTS-12Hz-1.7B-Base | Qwen/Qwen3-TTS-12Hz-0.6B-Base | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
GLMTTSForConditionalGeneration | GLM-TTS | zai-org/GLM-TTS | ✅︎ | |||
NextStep11Pipeline | NextStep-1.1 | stepfun-ai/NextStep-1.1 | ✅︎ | ✅︎ | ✅︎ | |
MiMoAudioModel | MiMo-Audio-7B-Instruct | XiaomiMiMo/MiMo-Audio-7B-Instruct | ✅︎ | ✅︎ | ||
MiMoV2ASRForCausalLM | MiMo-V2.5-ASR | XiaomiMiMo/MiMo-V2.5-ASR | ✅︎ | ✅︎ | ||
Flux2Pipeline | FLUX.2-dev | black-forest-labs/FLUX.2-dev | ✅︎ | ✅︎ | ||
FishSpeechSlowARForConditionalGeneration | Fish Speech S2 Pro | fishaudio/s2-pro | ✅︎ | ✅︎ | ||
DreamIDOmniPipeline | DreamID-Omni | XuGuo699/DreamID-Omni | ✅︎ | ✅︎ | ||
SenseNovaU1Pipeline | SenseNova-U1 (DiT-only) | SenseNova/SenseNova-U1-8B-MoT | ✅︎ | |||
HunyuanVideo15Pipeline | HunyuanVideo-1.5-T2V | hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_t2v, hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v | ✅︎ | ✅︎ | ||
HunyuanVideo15ImageToVideoPipeline | HunyuanVideo-1.5-I2V | hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-480p_i2v, hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_i2v | ✅︎ | ✅︎ | ||
VoxtralTTSForConditionalGeneration | Voxtral TTS | mistralai/Voxtral-4B-TTS-2603 | ✅︎ | ✅︎ | ||
CovoAudioForConditionalGeneration | Covo-Audio-Chat | tencent/Covo-Audio-Chat | ✅︎ | |||
DyninOmniForConditionalGeneration | Dynin-Omni | snu-aidas/Dynin-Omni | ✅︎ | |||
ErnieImagePipeline | ERNIE-Image | baidu/ERNIE-Image, baidu/ERNIE-Image-Turbo | ✅︎ | ✅︎ | ✅︎ | ✅︎ |
HiDreamImagePipeline | HiDream-I1-Full | HiDream-ai/HiDream-I1-Full | ✅︎ | ✅︎ |
✅︎ indicates the model is supported on that backend. Empty cells mean not listed as supported on that backend.