vllm_omni.config.model ¶
OmniModelArchConfigConvertor ¶
Bases: ModelArchConfigConvertorBase
Config convertor for Omni multi-stage models.
Pre-quantized checkpoints (e.g. modelopt FP8) store quantization config in a stage-specific sub-config (e.g. thinker_config.text_config.quantization_config) with correct relative prefixes. The legacy hf_quant_config.json sits at the top level with "thinker."-prefixed names that don't match vllm-omni's module names.
This convertor accepts an optional stage_config_name so that only the relevant stage's quantization config is surfaced.
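The selection described above can be pictured with a small sketch over plain dictionaries. It is only an illustration: the helper name `pick_quant_config`, the dotted-path form of `stage_config_name`, and the sample values are assumptions, not vllm-omni's actual API or config contents.

```python
from __future__ import annotations

from typing import Any


def pick_quant_config(
    hf_config: dict[str, Any], stage_config_name: str | None
) -> dict[str, Any] | None:
    """Return the quantization config for one stage (hypothetical helper).

    `stage_config_name` is treated as a dotted path such as
    "thinker_config.text_config"; None falls back to the top level.
    """
    node: Any = hf_config
    if stage_config_name:
        for key in stage_config_name.split("."):
            if not isinstance(node, dict) or key not in node:
                return None  # the stage sub-config is absent
            node = node[key]
    if not isinstance(node, dict):
        return None
    return node.get("quantization_config")


# Illustrative checkpoint layout; the values are made up for the example.
checkpoint = {
    "quantization_config": {"quant_method": "legacy", "prefix": "thinker."},
    "thinker_config": {
        "text_config": {"quantization_config": {"quant_method": "modelopt"}},
    },
    "talker_config": {},
}

stage_quant = pick_quant_config(checkpoint, "thinker_config.text_config")
talker_quant = pick_quant_config(checkpoint, "talker_config")
```

Here the thinker stage sees only its own sub-config, while a stage without a quantization entry gets `None` rather than the mismatched top-level config.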
OmniModelConfig ¶
Bases: ModelConfig
Configuration for Omni models, extending the base ModelConfig.
This configuration class extends the base vLLM ModelConfig with omni-specific fields for multi-stage pipeline processing.
Attributes:
    hf_config: The model's HF Transformers config (default: None).
    hf_text_config: The text_config sub-config of the model's hf_config (default: None).
    stage_id: Identifier for the stage in a multi-stage pipeline (default: 0).
    async_chunk: If True, enable async chunk processing.
    model_stage: Stage type identifier, e.g., "thinker" or "talker" (default: "thinker").
    model_arch: Model architecture name (default: "Qwen2_5OmniForConditionalGeneration").
    worker_type: Model type, e.g., "ar" or "generation".
    engine_output_type: Optional output type specification for the engine. Used to route outputs to the appropriate processors (e.g., "image", "audio", "latents"). If None, the output type is inferred.
    stage_connector_config: Stage connector configuration dictionary. Contains "name" (connector name) and "extra" (extra connector config).
    task_type: Default task type for TTS models (CustomVoice, VoiceDesign, or Base). If not specified, it is inferred from the model path.
The correct way to initialize this class is from an existing vLLM config, because most of the logic for handling values lives in ModelConfig's __post_init__.
Example:
    >>> config = OmniModelConfig.from_vllm_model_config(
    ...     vllm_config,
    ...     stage_id=0,
    ...     model_stage="thinker",
    ...     model_arch="Qwen2_5OmniForConditionalGeneration"
    ... )
custom_process_next_stage_input_func class-attribute instance-attribute ¶
custom_process_next_stage_input_func: str | None = None
stage_connector_config class-attribute instance-attribute ¶
stage_connector_config: dict[str, Any] = field(
    default_factory=lambda: {
        "name": "SharedMemoryConnector",
        "extra": {},
    }
)
subtalker_sampling_params class-attribute instance-attribute ¶
add_defaults_to_omni_kwargs classmethod ¶
Because we create the OmniModelConfig with __new__ to sidestep expensive validation, fields that rely on default factories are not initialized automatically; accessing one of them would raise an AttributeError.
To work around this issue, we explicitly add defaults to the omni_kwargs dict provided to ensure all fields are defined correctly.
NOTE: omni_kwargs are mutated in place.
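The pitfall and the workaround can be reproduced with a standard-library dataclass. This is a minimal sketch: `StageConfig` and `add_defaults` are hypothetical stand-ins, and the real method may differ in detail.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any


@dataclass
class StageConfig:
    """Stand-in dataclass; not the real OmniModelConfig."""

    stage_id: int = 0
    connector: dict[str, Any] = field(
        default_factory=lambda: {"name": "SharedMemoryConnector", "extra": {}}
    )


def add_defaults(cls: type, kwargs: dict[str, Any]) -> None:
    """Fill in missing dataclass defaults; mutates kwargs in place."""
    for f in fields(cls):
        if f.name in kwargs:
            continue
        if f.default is not MISSING:
            kwargs[f.name] = f.default
        elif f.default_factory is not MISSING:
            kwargs[f.name] = f.default_factory()


# Bypassing __init__ leaves factory-backed fields unset on the instance.
bare = StageConfig.__new__(StageConfig)
has_connector_before = hasattr(bare, "connector")

# Complete the kwargs with defaults, then assign them onto the bare instance.
omni_kwargs: dict[str, Any] = {"stage_id": 1}
add_defaults(StageConfig, omni_kwargs)
for name, value in omni_kwargs.items():
    setattr(bare, name, value)
```

Plain defaults survive as class attributes, which is why only fields with a default factory are at risk when `__init__` is skipped.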
from_vllm_model_config classmethod ¶
Create OmniModelConfig from an existing vLLM ModelConfig and additional Omni specific kwargs.
NOTE: The validation and post_init for ModelConfig is expensive; to avoid calling it a second time, we explicitly retrieve defaults from dataclass attributes for values not passed to omni_kwargs, and use that to initialize a new instance. This is significantly faster than creating the OmniModelConfig directly from the ModelConfig, and saves us from having to pass all kwargs to the OmniModelConfig.
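The idea of reusing an already-validated base config can be sketched as follows. `BaseConfig`, `DerivedConfig`, and `from_base` are hypothetical names chosen for illustration, and the counter merely stands in for the expensive validation step; this is not the library's implementation.

```python
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class BaseConfig:
    model: str
    validated: int = 0

    def __post_init__(self) -> None:
        # Stand-in for expensive validation; counts how often it runs.
        self.validated += 1


@dataclass
class DerivedConfig(BaseConfig):
    stage_id: int = 0
    model_stage: str = "thinker"

    @classmethod
    def from_base(cls, base: BaseConfig, **extra: Any) -> "DerivedConfig":
        new = cls.__new__(cls)  # skips __init__ and __post_init__
        base_names = {f.name for f in fields(base)}
        # Copy the state that the base config has already validated.
        for name in base_names:
            setattr(new, name, getattr(base, name))
        # Subclass-only fields: use the override if given, else the default.
        # This sketch assumes every such field has a plain default.
        for f in fields(cls):
            if f.name not in base_names:
                setattr(new, f.name, extra.get(f.name, f.default))
        return new


base = BaseConfig(model="example-model")
derived = DerivedConfig.from_base(base, stage_id=2)
```

Constructing `DerivedConfig` directly would run `__post_init__` again; the classmethod keeps the validation count at one while still filling in the subclass fields.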