vllm_omni
vLLM-Omni: Multi-modality model inference and serving with non-autoregressive structures.
This package extends vLLM beyond traditional text-based, autoregressive generation to support multi-modality models with non-autoregressive structures and non-textual outputs.
Architecture:

- 🟡 Modified: vLLM components modified for multimodal support
- 🔴 Added: New components for multimodal and non-autoregressive processing
Modules:
| Name | Description |
|---|---|
| attention | |
| config | Configuration module for vLLM-Omni. |
| core | |
| data_entry_keys | Structured payload types for inter-stage communication. |
| diffusion | |
| distributed | |
| engine | Engine components for vLLM-Omni. |
| entrypoints | vLLM-Omni entrypoints module. |
| inputs | |
| logger | |
| lora | |
| metrics | |
| model_executor | |
| outputs | |
| patch | |
| platforms | |
| plugins | |
| profiler | |
| quantization | Unified quantization framework for vLLM-OMNI. |
| request | |
| sample | |
| tokenizers | |
| transformers_utils | |
| utils | |
| version | Version information for vLLM-Omni. |
| worker | |
OmniModelConfig
Bases: ModelConfig
Configuration for Omni models, extending the base ModelConfig.
This configuration class extends the base vLLM ModelConfig with omni-specific fields for multi-stage pipeline processing.
Attributes:

- hf_config: The model's HF Transformers config (default: None).
- hf_text_config: The sub text_config of the model's hf_config (default: None).
- stage_id: Identifier for the stage in a multi-stage pipeline (default: 0).
- async_chunk: If set to True, perform async chunking.
- model_stage: Stage type identifier, e.g., "thinker" or "talker" (default: "thinker").
- model_arch: Model architecture name (default: "Qwen2_5OmniForConditionalGeneration").
- worker_type: Model type, e.g., "ar" or "generation".
- engine_output_type: Optional output type for the engine, used to route outputs to the appropriate processors (e.g., "image", "audio", "latents"). If None, the output type is inferred.
- stage_connector_config: Stage connector configuration dictionary, containing "name" (the connector name) and "extra" (extra connector config).
- task_type: Default task type for TTS models (CustomVoice, VoiceDesign, or Base). If not specified, it is inferred from the model path.
The correct way to initialize this class is from an existing vLLM config via from_vllm_model_config, because most of the value-handling logic lives in ModelConfig's __post_init__.
Example:

```python
>>> config = OmniModelConfig.from_vllm_model_config(
...     vllm_config,
...     stage_id=0,
...     model_stage="thinker",
...     model_arch="Qwen2_5OmniForConditionalGeneration"
... )
```
custom_process_next_stage_input_func class-attribute instance-attribute
```python
custom_process_next_stage_input_func: str | None = None
```
stage_connector_config class-attribute instance-attribute
```python
stage_connector_config: dict[str, Any] = field(
    default_factory=lambda: {
        "name": "SharedMemoryConnector",
        "extra": {},
    }
)
```
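The stage_connector_config default uses field(default_factory=...) rather than a dict literal. The standalone sketch below shows why, using a hypothetical ToyStageConfig dataclass (not part of vLLM-Omni) and a hypothetical "buffer_size" key: the factory builds a fresh dict for every instance, so mutating one stage's connector config does not leak into another. A bare `= {}` default would be rejected by the dataclasses module with a ValueError for exactly this reason.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyStageConfig:
    # Hypothetical stand-in for OmniModelConfig; it only mirrors the shape
    # of the stage_connector_config default shown above.
    stage_id: int = 0
    stage_connector_config: dict[str, Any] = field(
        default_factory=lambda: {
            "name": "SharedMemoryConnector",
            "extra": {},
        }
    )


a = ToyStageConfig(stage_id=0)
b = ToyStageConfig(stage_id=1)

# Each instance received its own dict from the factory, so this change
# affects only `a`. "buffer_size" is an illustrative key, not a real option.
a.stage_connector_config["extra"]["buffer_size"] = 1024

print(a.stage_connector_config)  # {'name': 'SharedMemoryConnector', 'extra': {'buffer_size': 1024}}
print(b.stage_connector_config)  # {'name': 'SharedMemoryConnector', 'extra': {}}
```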
subtalker_sampling_params class-attribute instance-attribute
add_defaults_to_omni_kwargs classmethod
Because we create the OmniModelConfig with __new__ to sidestep expensive validation, we must make sure that fields with default factories are initialized; otherwise, accessing them raises an AttributeError.
To work around this, we explicitly add defaults to the provided omni_kwargs dict so that every field is defined.
NOTE: omni_kwargs are mutated in place.
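A minimal sketch of this idea, assuming a plain stdlib dataclass stands in for OmniModelConfig. ToyOmniConfig and add_defaults are hypothetical names, and this is not the vLLM-Omni implementation. An instance created with __new__ never runs __init__, and the dataclasses module deletes the class attribute for fields that only have a default_factory, so reading such a field fails. Filling the kwargs from dataclasses.fields() (calling the factory where present) before populating the instance avoids that.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyOmniConfig:
    # Hypothetical stand-in for OmniModelConfig.
    stage_id: int = 0
    model_stage: str = "thinker"
    stage_connector_config: dict[str, Any] = field(
        default_factory=lambda: {"name": "SharedMemoryConnector", "extra": {}}
    )


def add_defaults(cls: type, omni_kwargs: dict[str, Any]) -> None:
    """Hypothetical helper: fill missing field defaults, mutating omni_kwargs in place."""
    for f in dataclasses.fields(cls):
        if f.name in omni_kwargs:
            continue  # an explicitly passed value wins
        if f.default is not dataclasses.MISSING:
            omni_kwargs[f.name] = f.default
        elif f.default_factory is not dataclasses.MISSING:
            omni_kwargs[f.name] = f.default_factory()
        # Fields with no default at all are left for the caller to supply.


# Without defaults: __new__ skips __init__, so the factory field was never set.
bare = ToyOmniConfig.__new__(ToyOmniConfig)
try:
    _ = bare.stage_connector_config
except AttributeError:
    print("stage_connector_config is missing")

# With defaults: complete the kwargs first, then populate the new instance.
kwargs = {"stage_id": 1}
add_defaults(ToyOmniConfig, kwargs)
cfg = ToyOmniConfig.__new__(ToyOmniConfig)
for name, value in kwargs.items():
    setattr(cfg, name, value)
print(cfg.stage_connector_config["name"])  # SharedMemoryConnector
```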
from_vllm_model_config classmethod
Create an OmniModelConfig from an existing vLLM ModelConfig and additional Omni-specific kwargs.
NOTE: ModelConfig's validation and __post_init__ are expensive. To avoid running them a second time, we explicitly retrieve defaults from the dataclass attributes for values not passed in omni_kwargs and use them to initialize a new instance. This is significantly faster than creating the OmniModelConfig directly from the ModelConfig, and it saves us from having to pass every kwarg to the OmniModelConfig.
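The pattern described in this note can be sketched with stdlib dataclasses standing in for vLLM's classes. ToyModelConfig, ToyOmniModelConfig, from_model_config and the model name "my-org/my-model" are all hypothetical; this is an illustration under those assumptions, not the vLLM-Omni code. The subclass instance is created with __new__, so __post_init__ (the "expensive" step, here just a counter) runs only once, for the base config. The already-validated base values are copied over, and omni-only fields are filled from their dataclass defaults.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import Any, ClassVar


@dataclass
class ToyModelConfig:
    # Hypothetical stand-in for vLLM's ModelConfig.
    model: str
    max_model_len: int = 4096
    post_init_calls: ClassVar[int] = 0  # not a dataclass field

    def __post_init__(self) -> None:
        # Stands in for expensive validation; counts how often it runs.
        ToyModelConfig.post_init_calls += 1


@dataclass
class ToyOmniModelConfig(ToyModelConfig):
    stage_id: int = 0
    model_stage: str = "thinker"
    stage_connector_config: dict[str, Any] = field(
        default_factory=lambda: {"name": "SharedMemoryConnector", "extra": {}}
    )

    @classmethod
    def from_model_config(cls, base: ToyModelConfig, **omni_kwargs: Any) -> "ToyOmniModelConfig":
        # Fill defaults only for omni-specific fields that were not passed.
        base_names = {f.name for f in dataclasses.fields(ToyModelConfig)}
        for f in dataclasses.fields(cls):
            if f.name in base_names or f.name in omni_kwargs:
                continue
            if f.default is not dataclasses.MISSING:
                omni_kwargs[f.name] = f.default
            elif f.default_factory is not dataclasses.MISSING:
                omni_kwargs[f.name] = f.default_factory()
        # __new__ skips __init__, and therefore skips __post_init__ as well.
        obj = cls.__new__(cls)
        obj.__dict__.update(base.__dict__)  # reuse already-validated base values
        obj.__dict__.update(omni_kwargs)
        return obj


base = ToyModelConfig(model="my-org/my-model")
omni = ToyOmniModelConfig.from_model_config(base, stage_id=1, model_stage="talker")
print(ToyModelConfig.post_init_calls)  # 1
print(omni.model, omni.stage_id, omni.stage_connector_config["name"])  # my-org/my-model 1 SharedMemoryConnector
```

The trade-off is that nothing re-checks the combined object, so any validation the omni-specific fields need has to happen elsewhere.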