vLLM-Omni ComfyUI Integration

vLLM-Omni offers a ComfyUI integration on top of its online serving API. It can send model inference requests to either a locally running vLLM-Omni service or a remote one.

Requirements

  • Python 3.12 or above
  • ComfyUI installed
  • vLLM-Omni installed on either the same device or another device reachable over the network.
  • No additional packages beyond those ComfyUI already requires.

Tip

If you run both ComfyUI and vLLM-Omni on the same device, you can create separate virtual environments and use different Python versions for them.

Installation

Copy the apps/ComfyUI-vLLM-Omni folder to the custom_nodes subfolder of your ComfyUI installation. Your directory should look like ComfyUI/custom_nodes/ComfyUI-vLLM-Omni.
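The copy step can be sketched as a small shell helper. This is only an illustration: the function name install_vllm_omni_node and the example paths are hypothetical, so substitute the locations of your own vLLM-Omni checkout and ComfyUI installation.

```shell
# Hypothetical helper: copy the extension folder into ComfyUI's custom_nodes.
# Usage: install_vllm_omni_node <path to apps/ComfyUI-vLLM-Omni> <path to ComfyUI>
install_vllm_omni_node() {
    src="$1"
    dest="$2/custom_nodes/ComfyUI-vLLM-Omni"

    if [ ! -d "$src" ]; then
        echo "source folder not found: $src" >&2
        return 1
    fi
    # Refuse to overwrite: cp -R into an existing folder would nest the copy.
    if [ -e "$dest" ]; then
        echo "already installed at: $dest (remove it first to reinstall)" >&2
        return 1
    fi

    mkdir -p "$2/custom_nodes"
    cp -R "$src" "$dest"
    echo "installed to $dest"
}

# Example with assumed paths:
# install_vllm_omni_node ~/vllm-omni/apps/ComfyUI-vLLM-Omni ~/ComfyUI
```

The helper refuses to overwrite an existing copy, so an older version of the extension is never silently mixed with a newer one.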

If ComfyUI is running while you copy the folder, restart it to load the extension.

Tip

You can use utility websites such as https://download-directory.github.io/ to download a single subdirectory of a repo. Also check out community discussions (e.g., https://stackoverflow.com/questions/7106012/download-a-single-folder-or-directory-from-a-github-repository) for more info.

On the device and in the virtual environment where you run ComfyUI, launch ComfyUI with

cd ComfyUI

# The regular way
python main.py

# If you mainly use the vLLM-Omni nodes (inference runs in the vLLM-Omni service), CPU-only mode launches faster
python main.py --cpu

On the device and in the virtual environment where you run vLLM-Omni, start a model service with

vllm serve The_Model_ID_to_Serve --omni --port 8000
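Before opening ComfyUI, it can help to confirm that the service is reachable from the machine running ComfyUI. The sketch below assumes the service exposes the OpenAI-compatible /v1/models route that upstream vLLM's server provides; that route, the helper name check_vllm_omni_service, and the default address are assumptions for illustration, not something this page documents.

```shell
# Hypothetical helper: succeeds if the service answers at <base_url>/v1/models.
# The /v1/models route is assumed from upstream vLLM's OpenAI-compatible server.
check_vllm_omni_service() {
    base_url="${1:-http://127.0.0.1:8000}"   # port 8000 matches the serve command above

    # -f: fail on HTTP errors, -sS: quiet but still print errors,
    # --max-time: give up after 5 seconds instead of hanging
    if curl -fsS --max-time 5 "$base_url/v1/models" >/dev/null; then
        echo "service reachable at $base_url"
    else
        echo "service NOT reachable at $base_url" >&2
        return 1
    fi
}

# Example (replace the address with your remote host if needed):
# check_vllm_omni_service http://127.0.0.1:8000
```

If the check fails for a remote service, the usual suspects are a firewall, a wrong port, or a service that is still loading the model.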

Check ComfyUI's sidebar -> Node Library. There should be a new folder named vLLM-Omni. If not, check the shell running the ComfyUI process. Error messages may appear before the lines "Import times for custom nodes:" and "To see the GUI go to: http://127.0.0.1:8188".

Quickstart

This extension offers the following nodes based on the output modalities:

  • Generate Image for text-to-image and image-to-image tasks
  • Generate Video for text-to-video and image-to-video tasks
  • Multimodality Understanding for multimodality-to-text and multimodality-to-audio tasks
  • TTS and TTS Voice Clone for TTS tasks

This extension also offers example workflows under ComfyUI sidebar -> Templates -> vLLM-Omni.

Info

The node UI and features are designed to match vLLM-Omni's online serving interfaces, so the extension cannot offer more than those interfaces support.

To build a simple workflow yourself:

  • Drag a generation node onto the canvas.
  • As needed, add built-in multimedia file loader nodes, such as image->Load Image, image->video->Load Video, audio->Load Audio.
  • As needed, add built-in multimedia file preview nodes, such as image->Preview Image, image->video->Save Video, audio->Preview Audio, utils->Preview as Text.
  • To tune sampling parameters, add the corresponding nodes from vLLM-Omni->Sampling Params.
    • For multi-stage models, you can connect multiple AR Sampling Params and Diffusion Sampling Params nodes to a Multi-Stage Sampling Params List node, and connect this node to the generation node.
    • For some multi-stage models like BAGEL, only one stage's sampling parameters are exposed and tunable via vLLM-Omni's online serving API. Thus, these models are treated as single-stage ones. Please check the vLLM-Omni documentation on how to correctly set each model's sampling parameters.
    • For multi-stage models whose stages are all of the same kind (all autoregressive or all diffusion), you can also connect a single Sampling Params node; that one set of sampling parameters is then used for every stage.

Examples & Screenshots

Please read the ComfyUI integration's README for more info.