vllm_omni

vLLM-Omni: Inference and serving for multi-modality models with non-autoregressive structures.

This package extends vLLM beyond traditional text-based, autoregressive generation to support multi-modality models with non-autoregressive structures and non-textual outputs.

Architecture:
- 🟡 Modified: vLLM components modified for multimodal support
- 🔴 Added: New components for multimodal and non-autoregressive processing

Modules:

Name                 Description
attention
config               Configuration module for vLLM-Omni.
core
data_entry_keys      Structured payload types for inter-stage communication.
diffusion
distributed
engine               Engine components for vLLM-Omni.
entrypoints          vLLM-Omni entrypoints module.
inputs
logger
lora
metrics
model_executor
outputs
patch
platforms
plugins
profiler
quantization         Unified quantization framework for vLLM-Omni.
request
sample
tokenizers
transformers_utils
utils
version              Version information for vLLM-Omni.
worker

OmniModelConfig

Bases: ModelConfig

Configuration for Omni models, extending the base ModelConfig.

This configuration class extends the base vLLM ModelConfig with omni-specific fields for multi-stage pipeline processing.

Attributes:
    hf_config: The model's HF Transformers config (default: None)
    hf_text_config: The text_config sub-config of the model's hf_config (default: None)
    stage_id: Identifier for the stage in a multi-stage pipeline (default: 0)
    async_chunk: If True, process chunks asynchronously (default: False)
    model_stage: Stage type identifier, e.g., "thinker" or "talker" (default: "thinker")
    model_arch: Model architecture name, e.g., "Qwen2_5OmniForConditionalGeneration" (default: None)
    worker_type: Model type, e.g., "ar" or "generation" (default: None)
    engine_output_type: Optional output type for the engine, used to route outputs to the appropriate processor (e.g., "image", "audio", "latents"). If None, the output type is inferred.
    stage_connector_config: Stage connector configuration dictionary. Contains "name" (the connector name) and "extra" (extra connector config).
    task_type: Default task type for TTS models (CustomVoice, VoiceDesign, or Base). If not specified, it is inferred from the model path.

Initialize this class from a vLLM config rather than constructing it directly, since most of the value-handling logic lives in ModelConfig's __post_init__.

Example:
    >>> config = OmniModelConfig.from_vllm_model_config(
    ...     vllm_config,
    ...     stage_id=0,
    ...     model_stage="thinker",
    ...     model_arch="Qwen2_5OmniForConditionalGeneration"
    ... )

active_stream_window class-attribute instance-attribute

active_stream_window: int = 0

architectures property

architectures: list[str]

async_chunk class-attribute instance-attribute

async_chunk: bool = False

codec_frame_rate_hz class-attribute instance-attribute

codec_frame_rate_hz: float | None = None

custom_process_next_stage_input_func class-attribute instance-attribute

custom_process_next_stage_input_func: str | None = None

embedding_size property

embedding_size

enable_sleep_mode class-attribute instance-attribute

enable_sleep_mode: bool = False

engine_output_type class-attribute instance-attribute

engine_output_type: str | None = None

has_sampling_extra_args class-attribute instance-attribute

has_sampling_extra_args: bool = False

hf_config_name class-attribute instance-attribute

hf_config_name: str | None = None

model_arch class-attribute instance-attribute

model_arch: str | None = None

model_stage class-attribute instance-attribute

model_stage: str = 'thinker'

omni_kv_config class-attribute instance-attribute

omni_kv_config: dict | None = None

registry property

registry

stage_connector_config class-attribute instance-attribute

stage_connector_config: dict[str, Any] = field(
    default_factory=lambda: {
        "name": "SharedMemoryConnector",
        "extra": {},
    }
)
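The default-factory declaration above can be tried in isolation. The following is a minimal sketch, not vLLM-Omni code: StageConfigSketch is a hypothetical stand-in that copies only this one field, and "HypotheticalConnector" and "demo_key" are made-up values used purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StageConfigSketch:
    """Hypothetical stand-in that copies only the field shown above."""

    stage_connector_config: dict[str, Any] = field(
        default_factory=lambda: {
            "name": "SharedMemoryConnector",
            "extra": {},
        }
    )


first = StageConfigSketch()
second = StageConfigSketch()

# The factory runs once per instance, so mutating one instance's
# config leaves the other instance untouched.
first.stage_connector_config["extra"]["demo_key"] = 1

# Passing an explicit dict replaces the default entirely
# ("HypotheticalConnector" is a made-up name).
custom = StageConfigSketch(
    stage_connector_config={"name": "HypotheticalConnector", "extra": {}}
)
```

Using a factory rather than a literal dict default avoids sharing one mutable dict across every config instance, which dataclasses reject outright for dict defaults.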

stage_id class-attribute instance-attribute

stage_id: int = 0

subtalker_sampling_params class-attribute instance-attribute

subtalker_sampling_params: dict[str, Any] | None = None

task_type class-attribute instance-attribute

task_type: str | None = None

uses_mrope property

uses_mrope: bool

worker_type class-attribute instance-attribute

worker_type: str | None = None

add_defaults_to_omni_kwargs classmethod

add_defaults_to_omni_kwargs(omni_kwargs)

Because OmniModelConfig is instantiated with __new__ to sidestep expensive validation, fields with default factories are not initialized automatically, and reading one would raise an AttributeError.

To work around this, defaults are added explicitly to the provided omni_kwargs dict so that every field is defined.

NOTE: omni_kwargs is mutated in place.
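The pitfall described above can be reproduced with a plain dataclass. This is a minimal sketch under stated assumptions, not the actual implementation: DemoConfig and add_defaults are hypothetical stand-ins for OmniModelConfig and add_defaults_to_omni_kwargs, and the real method may differ in detail.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any


@dataclass
class DemoConfig:
    """Hypothetical stand-in for OmniModelConfig (not the real class)."""

    stage_id: int = 0
    stage_connector_config: dict[str, Any] = field(
        default_factory=lambda: {"name": "SharedMemoryConnector", "extra": {}}
    )


def add_defaults(cls: type, kwargs: dict[str, Any]) -> None:
    """Fill in missing dataclass defaults; mutates kwargs in place."""
    for f in fields(cls):
        if f.name in kwargs:
            continue
        if f.default is not MISSING:
            kwargs[f.name] = f.default
        elif f.default_factory is not MISSING:
            kwargs[f.name] = f.default_factory()


# __new__ skips __init__, so the default factory never runs.
bare = DemoConfig.__new__(DemoConfig)
try:
    bare.stage_connector_config
    factory_field_missing = False
except AttributeError:
    factory_field_missing = True

# Plain defaults still resolve because they live on the class itself.
plain_default = bare.stage_id

# Filling in defaults explicitly makes every field safe to read.
kwargs: dict[str, Any] = {"stage_id": 1}
add_defaults(DemoConfig, kwargs)
fixed = DemoConfig.__new__(DemoConfig)
for name, value in kwargs.items():
    setattr(fixed, name, value)
```

Only factory-backed fields are at risk: a simple default such as stage_id remains a class attribute, while a default_factory value exists only after __init__ (or an explicit assignment) has run.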

draw_hf_text_config

draw_hf_text_config()

from_vllm_model_config classmethod

from_vllm_model_config(
    model_config: ModelConfig, **omni_kwargs
)

Create an OmniModelConfig from an existing vLLM ModelConfig and additional Omni-specific kwargs.

NOTE: Validation and __post_init__ for ModelConfig are expensive. To avoid running them a second time, defaults are retrieved explicitly from the dataclass attributes for any values not passed in omni_kwargs, and the result is used to initialize a new instance. This is significantly faster than creating the OmniModelConfig directly from the ModelConfig, and it avoids having to pass every kwarg to the OmniModelConfig.
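The construction strategy in the note can be sketched as follows. This is an illustration under stated assumptions, not the real method: BaseConfig, DerivedConfig, from_base, and the validations_run counter are all hypothetical names standing in for ModelConfig, OmniModelConfig, from_vllm_model_config, and the expensive validation step.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any


@dataclass
class BaseConfig:
    """Hypothetical stand-in for vLLM's ModelConfig."""

    model: str
    validations_run: int = 0

    def __post_init__(self) -> None:
        # Stands in for the expensive validation step.
        self.validations_run += 1


@dataclass
class DerivedConfig(BaseConfig):
    """Hypothetical stand-in for OmniModelConfig."""

    stage_id: int = 0
    model_stage: str = "thinker"
    extra: dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_base(cls, base: BaseConfig, **overrides: Any) -> "DerivedConfig":
        base_names = {f.name for f in fields(base)}
        # Shallow copy of the already-validated base state.
        values = dict(vars(base))
        for f in fields(cls):
            if f.name in base_names:
                continue
            if f.name in overrides:
                values[f.name] = overrides[f.name]
            elif f.default is not MISSING:
                values[f.name] = f.default
            elif f.default_factory is not MISSING:
                values[f.name] = f.default_factory()
        # __new__ bypasses __init__ and therefore __post_init__.
        obj = cls.__new__(cls)
        obj.__dict__.update(values)
        return obj


base = BaseConfig(model="demo-model")
derived = DerivedConfig.from_base(base, stage_id=1, model_stage="talker")
```

In this sketch validations_run stays at 1 on the derived object, showing that the base state was reused rather than re-validated. Unknown keys in overrides are silently ignored here; a real implementation may well reject them.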

get_model_arch_config

get_model_arch_config()