vllm_omni ¶

vLLM-Omni: Inference and serving for multi-modality models with non-autoregressive structures.

This package extends vLLM beyond traditional text-based, autoregressive generation to support multi-modality models with non-autoregressive structures and non-textual outputs.

Architecture:

- 🟡 Modified: vLLM components modified for multimodal support
- 🔴 Added: New components for multimodal and non-autoregressive processing

Modules:

Name	Description
`attention`
`config`	Configuration module for vLLM-Omni.
`core`
`data_entry_keys`	Structured payload types for inter-stage communication.
`diffusion`
`distributed`
`engine`	Engine components for vLLM-Omni.
`entrypoints`	vLLM-Omni entrypoints module.
`errors`	Request-scoped client error types shared across vLLM-Omni entrypoints.
`experimental`	Experimental vLLM-Omni subsystems.
`inputs`
`logger`
`lora`
`metrics`
`model_executor`
`model_extras`
`outputs`
`patch`
`platforms`
`plugins`
`profiler`
`quantization`	Unified quantization framework for vLLM-Omni.
`reasoning`
`request`
`sample`
`tokenizers`
`transformers_utils`
`utils`
`version`	Version information for vLLM-Omni.
`worker`

OmniModelConfig ¶

Bases: ModelConfig

Configuration for Omni models, extending the base ModelConfig.

This configuration class extends the base vLLM ModelConfig with omni-specific fields for multi-stage pipeline processing.

Attributes:

- hf_config: The model's HF Transformers config (default: None).
- hf_text_config: The sub text_config of the model's hf_config (default: None).
- stage_id: Identifier for the stage in a multi-stage pipeline (default: 0).
- async_chunk: If True, enable async chunk processing (default: False).
- model_stage: Stage type identifier, e.g., "thinker" or "talker" (default: "thinker").
- model_arch: Model architecture name, e.g., "Qwen2_5OmniForConditionalGeneration" (default: None).
- worker_type: Model type, e.g., "ar" or "generation" (default: None).
- engine_output_type: Optional output type specification for the engine. Used to route outputs to appropriate processors (e.g., "image", "audio", "latents"). If None, the output type is inferred.
- stage_connector_config: Stage connector configuration dictionary. Contains "name" (connector name) and "extra" (extra connector config).
- task_type: Default task type for TTS models (CustomVoice, VoiceDesign, or Base). If not specified, it is inferred from the model path.

The correct way to initialize this class is from an existing vLLM config, because most of the logic for handling values lives in ModelConfig's `__post_init__`.

Example:

>>> config = OmniModelConfig.from_vllm_model_config(
...     vllm_config,
...     stage_id=0,
...     model_stage="thinker",
...     model_arch="Qwen2_5OmniForConditionalGeneration",
... )

active_stream_window `class-attribute` `instance-attribute` ¶

active_stream_window: int = 0

architectures `property` ¶

architectures: list[str]

async_chunk `class-attribute` `instance-attribute` ¶

async_chunk: bool = False

codec_frame_rate_hz `class-attribute` `instance-attribute` ¶

codec_frame_rate_hz: float | None = None

custom_process_next_stage_input_func `class-attribute` `instance-attribute` ¶

custom_process_next_stage_input_func: str | None = None

embedding_size `property` ¶

embedding_size

enable_sleep_mode `class-attribute` `instance-attribute` ¶

enable_sleep_mode: bool = False

engine_output_type `class-attribute` `instance-attribute` ¶

engine_output_type: str | None = None

has_sampling_extra_args `class-attribute` `instance-attribute` ¶

has_sampling_extra_args: bool = False

hf_config_name `class-attribute` `instance-attribute` ¶

hf_config_name: str | None = None

model_arch `class-attribute` `instance-attribute` ¶

model_arch: str | None = None

model_stage `class-attribute` `instance-attribute` ¶

model_stage: str = 'thinker'

omni_kv_config `class-attribute` `instance-attribute` ¶

omni_kv_config: dict | None = None

registry `property` ¶

registry

stage_connector_config `class-attribute` `instance-attribute` ¶

stage_connector_config: dict[str, Any] = field(
    default_factory=lambda: {
        "name": "SharedMemoryConnector",
        "extra": {},
    }
)
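
The `default_factory` form gives every instance its own dictionary, so mutating one stage's connector settings cannot leak into another. A minimal sketch of that behavior, using a hypothetical `StageConfig` dataclass as a stand-in (it is not the real `OmniModelConfig`, and the `"note"` key is invented for illustration):

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StageConfig:  # hypothetical stand-in, not the real OmniModelConfig
    stage_connector_config: dict[str, Any] = field(
        default_factory=lambda: {
            "name": "SharedMemoryConnector",
            "extra": {},
        }
    )


a = StageConfig()
b = StageConfig()

# Mutate only a's nested "extra" dict; "note" is an illustrative key.
a.stage_connector_config["extra"]["note"] = "only on a"

# b still holds a fresh, untouched default built by its own factory call.
b_is_untouched = b.stage_connector_config == {
    "name": "SharedMemoryConnector",
    "extra": {},
}
```

Because the factory is a lambda that builds a new literal on each call, even the nested `"extra"` dictionary is distinct per instance.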

stage_id `class-attribute` `instance-attribute` ¶

stage_id: int = 0

subtalker_sampling_params `class-attribute` `instance-attribute` ¶

subtalker_sampling_params: dict[str, Any] | None = None

task_type `class-attribute` `instance-attribute` ¶

task_type: str | None = None

uses_mrope `property` ¶

uses_mrope: bool

worker_type `class-attribute` `instance-attribute` ¶

worker_type: str | None = None

add_defaults_to_omni_kwargs `classmethod` ¶

add_defaults_to_omni_kwargs(omni_kwargs)

Because we create the OmniModelConfig with `__new__` to sidestep expensive validation, fields with default factories are not initialized automatically; accessing one that was never set raises an AttributeError.

To work around this issue, we explicitly add defaults to the omni_kwargs dict provided to ensure all fields are defined correctly.

NOTE: omni_kwargs are mutated in place.
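
The failure mode and the workaround can be reproduced with plain dataclasses. The sketch below is an illustration under assumptions, not the library's implementation: `DemoConfig` and `add_defaults` are hypothetical names, and the real method may handle more cases.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any


@dataclass
class DemoConfig:  # hypothetical stand-in for OmniModelConfig
    stage_id: int = 0
    connector: dict[str, Any] = field(default_factory=lambda: {"name": "demo"})


def add_defaults(cls: type, kwargs: dict[str, Any]) -> None:
    """Fill in any missing dataclass defaults; mutates kwargs in place."""
    for f in fields(cls):
        if f.name in kwargs:
            continue  # caller supplied a value, keep it
        if f.default is not MISSING:
            kwargs[f.name] = f.default
        elif f.default_factory is not MISSING:
            kwargs[f.name] = f.default_factory()


# Creating the object with __new__ skips __init__, so the factory never runs
# and the factory-backed field does not exist on the instance.
bare = DemoConfig.__new__(DemoConfig)
missing_before = not hasattr(bare, "connector")

# Fill defaults explicitly, then assign every value onto the bare instance.
omni_kwargs: dict[str, Any] = {"stage_id": 2}
add_defaults(DemoConfig, omni_kwargs)
for name, value in omni_kwargs.items():
    setattr(bare, name, value)
```

Fields with a plain default (such as `stage_id` here) remain readable through the class attribute even without `__init__`; only factory-backed fields are truly absent, which is why they need the explicit pass.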

draw_hf_text_config ¶

draw_hf_text_config()

from_vllm_model_config `classmethod` ¶

from_vllm_model_config(
    model_config: ModelConfig, **omni_kwargs
)

Create OmniModelConfig from an existing vLLM ModelConfig and additional Omni specific kwargs.

NOTE: The validation and `__post_init__` for ModelConfig are expensive. To avoid running them a second time, we explicitly retrieve defaults from the dataclass attributes for values not passed in omni_kwargs, and use them to initialize a new instance. This is significantly faster than creating the OmniModelConfig directly from the ModelConfig, and saves us from having to pass all kwargs to the OmniModelConfig.
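
The general pattern described in the note, building a subclass instance from an already-validated base instance without re-running validation, might look roughly like the following. Everything here is a hypothetical sketch: `BaseConfig`, `StagedConfig`, and `from_base` are illustrative names, and the real `from_vllm_model_config` may differ in detail.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any

post_init_calls: list[str] = []  # records each (expensive) validation run


@dataclass
class BaseConfig:  # hypothetical stand-in for ModelConfig
    model: str
    max_len: int = 0

    def __post_init__(self) -> None:
        post_init_calls.append(self.model)  # stands in for costly validation
        if self.max_len == 0:
            self.max_len = 4096  # value resolved during validation


@dataclass
class StagedConfig(BaseConfig):  # hypothetical stand-in for OmniModelConfig
    stage_id: int = 0
    model_stage: str = "thinker"

    @classmethod
    def from_base(cls, base: BaseConfig, **extra: Any) -> "StagedConfig":
        # Reuse the values the base instance has already validated.
        values = {f.name: getattr(base, f.name) for f in fields(base)}
        # Add plain defaults for fields that exist only on the subclass.
        for f in fields(cls):
            if f.name not in values and f.default is not MISSING:
                values[f.name] = f.default
        values.update(extra)  # caller overrides win
        obj = cls.__new__(cls)  # skips __init__, hence __post_init__
        for name, value in values.items():
            setattr(obj, name, value)
        return obj


base = BaseConfig(model="demo-model")  # validation runs once, here
staged = StagedConfig.from_base(base, stage_id=1)
```

After this runs, `post_init_calls` contains a single entry, while `staged` still carries the resolved `max_len` from the base instance.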

get_model_arch_config ¶

get_model_arch_config()
