vllm_omni.config.model

logger module-attribute

logger = init_logger(__name__)

OmniModelArchConfigConvertor

Bases: ModelArchConfigConvertorBase

Config convertor for Omni multi-stage models.

Pre-quantized checkpoints (e.g. modelopt FP8) store quantization config in a stage-specific sub-config (e.g. thinker_config.text_config.quantization_config) with correct relative prefixes. The legacy hf_quant_config.json sits at the top level with "thinker."-prefixed names that don't match vllm-omni's module names.

This convertor accepts an optional stage_config_name so that only the relevant stage's quantization config is surfaced.
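The stage-scoped lookup described above can be illustrated with a small sketch. It is not the vllm-omni implementation: `resolve_stage_quant_config` is a hypothetical helper, the config is modeled as a plain nested dict rather than an HF config object, and both the dotted form of `stage_config_name` and the fallback to the top-level config when no stage name is given are assumptions made for illustration.

```python
from typing import Any, Optional


def resolve_stage_quant_config(
    hf_config: dict, stage_config_name: Optional[str]
) -> Optional[dict]:
    """Return the quantization config for one stage (hypothetical helper).

    With no stage name, fall back to a top-level "quantization_config".
    """
    node: Any = hf_config
    if stage_config_name:
        # Assumption: the stage is addressed by a dotted path such as
        # "thinker_config.text_config".
        for part in stage_config_name.split("."):
            if not isinstance(node, dict) or part not in node:
                return None
            node = node[part]
    if not isinstance(node, dict):
        return None
    return node.get("quantization_config")


# Toy config: only the thinker's text sub-config carries quantization info.
hf_config = {
    "thinker_config": {
        "text_config": {
            "quantization_config": {"quant_method": "modelopt", "quant_algo": "FP8"}
        }
    },
    "talker_config": {},
}

thinker_quant = resolve_stage_quant_config(hf_config, "thinker_config.text_config")
talker_quant = resolve_stage_quant_config(hf_config, "talker_config")
```

In this toy setup the thinker stage sees its own quantization config, while the talker stage sees none, which is the effect the convertor aims for by surfacing only the relevant stage's settings.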

stage_config_name instance-attribute

stage_config_name = stage_config_name

get_quantization_config

get_quantization_config()

OmniModelConfig

Bases: ModelConfig

Configuration for Omni models, extending the base ModelConfig.

This configuration class extends the base vLLM ModelConfig with omni-specific fields for multi-stage pipeline processing.

Attributes:
    hf_config: The model's HF Transformers config (default: None)
    hf_text_config: The sub text_config of the model's hf_config (default: None)
    stage_id: Identifier for the stage in a multi-stage pipeline (default: 0)
    async_chunk: If set to True, perform async chunk
    model_stage: Stage type identifier, e.g., "thinker" or "talker" (default: "thinker")
    model_arch: Model architecture name (default: "Qwen2_5OmniForConditionalGeneration")
    worker_type: Model Type, e.g., "ar" or "generation"
    engine_output_type: Optional output type specification for the engine. Used to route outputs to appropriate processors (e.g., "image", "audio", "latents"). If None, output type is inferred.
    stage_connector_config: Stage connector configuration dictionary. Contains "name" (connector name), "extra" (extra connector config).
    task_type: Default task type for TTS models (CustomVoice, VoiceDesign, or Base). If not specified, will be inferred from model path.

The correct way to initialize this class is from an existing vLLM config, because most of the logic for handling values lives in ModelConfig's __post_init__.

Example:
    >>> config = OmniModelConfig.from_vllm_model_config(
    ...     vllm_config,
    ...     stage_id=0,
    ...     model_stage="thinker",
    ...     model_arch="Qwen2_5OmniForConditionalGeneration"
    ... )

active_stream_window class-attribute instance-attribute

active_stream_window: int = 0

architectures property

architectures: list[str]

async_chunk class-attribute instance-attribute

async_chunk: bool = False

codec_frame_rate_hz class-attribute instance-attribute

codec_frame_rate_hz: float | None = None

custom_process_next_stage_input_func class-attribute instance-attribute

custom_process_next_stage_input_func: str | None = None

embedding_size property

embedding_size

enable_sleep_mode class-attribute instance-attribute

enable_sleep_mode: bool = False

engine_output_type class-attribute instance-attribute

engine_output_type: str | None = None

has_sampling_extra_args class-attribute instance-attribute

has_sampling_extra_args: bool = False

hf_config_name class-attribute instance-attribute

hf_config_name: str | None = None

model_arch class-attribute instance-attribute

model_arch: str | None = None

model_stage class-attribute instance-attribute

model_stage: str = 'thinker'

omni_kv_config class-attribute instance-attribute

omni_kv_config: dict | None = None

registry property

registry

stage_connector_config class-attribute instance-attribute

stage_connector_config: dict[str, Any] = field(
    default_factory=lambda: {
        "name": "SharedMemoryConnector",
        "extra": {},
    }
)

stage_id class-attribute instance-attribute

stage_id: int = 0

subtalker_sampling_params class-attribute instance-attribute

subtalker_sampling_params: dict[str, Any] | None = None

task_type class-attribute instance-attribute

task_type: str | None = None

uses_mrope property

uses_mrope: bool

worker_type class-attribute instance-attribute

worker_type: str | None = None

add_defaults_to_omni_kwargs classmethod

add_defaults_to_omni_kwargs(omni_kwargs)

Because we create the OmniModelConfig with __new__ to sidestep expensive validation, the dataclass __init__ never runs. Fields with default factories are therefore left unset unless we initialize them ourselves, and accessing one raises an AttributeError.

To work around this issue, we explicitly add defaults to the omni_kwargs dict provided to ensure all fields are defined correctly.

NOTE: omni_kwargs are mutated in place.
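A minimal sketch of why this matters, using only the standard `dataclasses` module. `StageConfig` and `add_defaults` are hypothetical stand-ins, not the vllm-omni class or method. Plain defaults live on the class, so they resolve even when `__init__` is skipped; a `default_factory` field has no class-level value and is simply missing until something sets it.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Optional


@dataclass
class StageConfig:
    # Hypothetical stand-in for a config dataclass.
    stage_id: int = 0
    model_stage: str = "thinker"
    connector: dict = field(
        default_factory=lambda: {"name": "SharedMemoryConnector", "extra": {}}
    )
    worker_type: Optional[str] = None


def add_defaults(cls, kwargs: dict) -> None:
    """Fill in dataclass defaults for keys not supplied. Mutates kwargs in place."""
    for f in fields(cls):
        if f.name in kwargs:
            continue
        if f.default is not MISSING:
            kwargs[f.name] = f.default
        elif f.default_factory is not MISSING:
            # Call the factory so each instance gets its own fresh object.
            kwargs[f.name] = f.default_factory()


kwargs = {"stage_id": 1}
add_defaults(StageConfig, kwargs)

# Build the instance without running __init__ or __post_init__.
cfg = StageConfig.__new__(StageConfig)
for name, value in kwargs.items():
    setattr(cfg, name, value)
```

Without the `add_defaults` step, `cfg.stage_id` would still fall back to the class-level default, but `cfg.connector` would raise an AttributeError because no factory was ever invoked.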

draw_hf_text_config

draw_hf_text_config()

from_vllm_model_config classmethod

from_vllm_model_config(
    model_config: ModelConfig, **omni_kwargs
)

Create OmniModelConfig from an existing vLLM ModelConfig and additional Omni specific kwargs.

NOTE: Validation and __post_init__ for ModelConfig are expensive. To avoid running them a second time, we explicitly retrieve defaults from the dataclass attributes for any values not passed in omni_kwargs, and use those to initialize a new instance. This is significantly faster than constructing the OmniModelConfig directly from the ModelConfig, and saves us from having to pass all kwargs to the OmniModelConfig.
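The pattern in the note above can be sketched with toy classes. This is an illustration under stated assumptions, not the vllm-omni code: `BaseConfig`, `StageConfig`, and `from_base` are hypothetical names, and the counter exists only to show that the expensive hook runs once.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class BaseConfig:
    # Counts how often the (pretend) expensive validation runs.
    post_init_calls: ClassVar[int] = 0
    model: str = "demo"
    max_len: int = 0

    def __post_init__(self) -> None:
        BaseConfig.post_init_calls += 1
        # Stand-in for costly derivation of final values.
        self.max_len = self.max_len or 4096


@dataclass
class StageConfig(BaseConfig):
    stage_id: int = 0
    model_stage: str = "thinker"

    @classmethod
    def from_base(cls, base: BaseConfig, **extra) -> "StageConfig":
        obj = cls.__new__(cls)  # skips __init__ and __post_init__
        obj.__dict__.update(base.__dict__)  # reuse already-validated values
        obj.__dict__.update(extra)  # layer stage-specific fields on top
        return obj


base = BaseConfig(model="demo-model")
stage = StageConfig.from_base(base, stage_id=1, model_stage="talker")
```

Note that the copy here is shallow, so mutable values are shared between `base` and `stage`; whether that is acceptable depends on how the configs are used afterwards.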

get_model_arch_config

get_model_arch_config()