vllm_omni.config ¶
Configuration module for vLLM-Omni.
Modules:
| Name | Description |
|---|---|
| lora | |
| model | |
| pipeline_registry | Central declarative registry of all vllm-omni pipelines. |
| stage_config | Stage configuration system for vLLM-Omni. |
| yaml_util | Centralized OmegaConf wrapper for vLLM-Omni. |
DeployConfig dataclass ¶
Loaded from deploy/
Top-level fields (trust_remote_code, distributed_executor_backend, dtype, quantization, enable_prefix_caching, enable_chunked_prefill, data_parallel_size, pipeline_parallel_size) are pipeline-wide: they apply uniformly to every stage. Fields that legitimately vary per stage live in the individual StageDeployConfig entries under stages:.
distributed_executor_backend class-attribute instance-attribute ¶
distributed_executor_backend: str | None = None
enable_chunked_prefill class-attribute instance-attribute ¶
enable_chunked_prefill: bool | None = None
enable_prefix_caching class-attribute instance-attribute ¶
enable_prefix_caching: bool | None = None
pipeline_parallel_size class-attribute instance-attribute ¶
pipeline_parallel_size: int | None = None
stages class-attribute instance-attribute ¶
stages: list[StageDeployConfig] = field(
default_factory=list
)
ModelPipeline dataclass ¶
Complete pipeline definition for a multi-stage model (legacy).
TODO(@lishunyang12): remove once all models migrate to PipelineConfig.
get_stage ¶
get_stage(stage_id: int) -> StageConfig | None
Look up a stage by its ID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| stage_id | int | The stage ID to search for. | required |
Returns:
| Type | Description |
|---|---|
| StageConfig \| None | The matching StageConfig, or None if not found. |
validate_pipeline ¶
Validate pipeline topology at model integration time (not runtime).
Checks:
- All stage IDs are unique.
- All input_sources reference valid stage IDs.
- There is at least one entry point (a stage with empty input_sources).
Returns:
| Type | Description |
|---|---|
| list[str] | List of validation error messages. Empty list if valid. |
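The three checks listed for validate_pipeline can be sketched as a standalone function. This is an illustration, not the library's code: StageSketch and validate_topology are hypothetical names, and the sketch assumes input_sources is a list of integer stage IDs.

```python
from dataclasses import dataclass, field


@dataclass
class StageSketch:
    # Hypothetical stand-in for StageConfig with only the fields the checks need.
    stage_id: int
    input_sources: list[int] = field(default_factory=list)


def validate_topology(stages: list[StageSketch]) -> list[str]:
    """Return error messages; an empty list means the topology is valid."""
    errors: list[str] = []
    ids = [s.stage_id for s in stages]

    # Check 1: stage IDs must be unique.
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        errors.append(f"duplicate stage IDs: {duplicates}")

    # Check 2: every input source must name an existing stage.
    known = set(ids)
    for s in stages:
        for src in s.input_sources:
            if src not in known:
                errors.append(f"stage {s.stage_id} references unknown stage {src}")

    # Check 3: at least one stage must have no inputs (an entry point).
    if not any(not s.input_sources for s in stages):
        errors.append("no entry point: every stage has input_sources")

    return errors


ok = validate_topology([StageSketch(0), StageSketch(1, [0])])
bad = validate_topology([StageSketch(1, [2]), StageSketch(1, [1])])
```

Collecting all errors in a list, rather than raising on the first one, lets a model integrator see every topology problem in a single pass.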
OmniModelConfig ¶
Bases: ModelConfig
Configuration for Omni models, extending the base ModelConfig.
This configuration class extends the base vLLM ModelConfig with omni-specific fields for multi-stage pipeline processing.
Attributes:
- hf_config: The model's HF Transformers config (default: None).
- hf_text_config: The sub text_config of the model's hf_config (default: None).
- stage_id: Identifier for the stage in a multi-stage pipeline (default: 0).
- async_chunk: If set to True, perform async chunk processing.
- model_stage: Stage type identifier, e.g., "thinker" or "talker" (default: "thinker").
- model_arch: Model architecture name (default: "Qwen2_5OmniForConditionalGeneration").
- worker_type: Model type, e.g., "ar" or "generation".
- engine_output_type: Optional output type specification for the engine. Used to route outputs to the appropriate processors (e.g., "image", "audio", "latents"). If None, the output type is inferred.
- stage_connector_config: Stage connector configuration dictionary. Contains "name" (connector name) and "extra" (extra connector config).
- task_type: Default task type for TTS models (CustomVoice, VoiceDesign, or Base). If not specified, it is inferred from the model path.
The correct way to initialize this class is from an existing vLLM config, because most of the logic for handling values lives in ModelConfig's `__post_init__`.
Example:
>>> config = OmniModelConfig.from_vllm_model_config(
...     vllm_config,
...     stage_id=0,
...     model_stage="thinker",
...     model_arch="Qwen2_5OmniForConditionalGeneration"
... )
custom_process_next_stage_input_func class-attribute instance-attribute ¶
custom_process_next_stage_input_func: str | None = None
stage_connector_config class-attribute instance-attribute ¶
stage_connector_config: dict[str, Any] = field(
default_factory=lambda: {
"name": "SharedMemoryConnector",
"extra": {},
}
)
subtalker_sampling_params class-attribute instance-attribute ¶
add_defaults_to_omni_kwargs classmethod ¶
Because we create the OmniModelConfig with `__new__` to sidestep expensive validation, fields with default factories are not initialized automatically, and accessing one of them raises an AttributeError.
To work around this issue, we explicitly add defaults to the omni_kwargs dict provided to ensure all fields are defined correctly.
NOTE: omni_kwargs are mutated in place.
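The failure mode and the workaround can be reproduced with a plain dataclass. The sketch below is a minimal illustration under stated assumptions: ConfigSketch and add_defaults are hypothetical names rather than the library's implementation, and only the stage_connector_config default is taken from this page.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any


@dataclass
class ConfigSketch:
    # Hypothetical stand-in for OmniModelConfig.
    stage_id: int = 0
    stage_connector_config: dict[str, Any] = field(
        default_factory=lambda: {"name": "SharedMemoryConnector", "extra": {}}
    )


def add_defaults(cls: type, kwargs: dict[str, Any]) -> None:
    """Fill in missing dataclass defaults. Mutates kwargs in place."""
    for f in fields(cls):
        if f.name in kwargs:
            continue
        if f.default is not MISSING:
            kwargs[f.name] = f.default
        elif f.default_factory is not MISSING:
            kwargs[f.name] = f.default_factory()


# Bypassing __init__ leaves default_factory fields unset on the instance.
bare = object.__new__(ConfigSketch)
try:
    bare.stage_connector_config
    missing_attr = False
except AttributeError:
    missing_attr = True

# With defaults added first, every field is defined.
omni_kwargs: dict[str, Any] = {"stage_id": 1}
add_defaults(ConfigSketch, omni_kwargs)
filled = object.__new__(ConfigSketch)
for name, value in omni_kwargs.items():
    setattr(filled, name, value)
```

Plain defaults such as stage_id remain readable through the class attribute even on the bare instance; only factory-backed fields are missing, which is why those are the ones that need explicit handling.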
from_vllm_model_config classmethod ¶
Create OmniModelConfig from an existing vLLM ModelConfig and additional Omni specific kwargs.
NOTE: The validation and `__post_init__` for ModelConfig are expensive. To avoid running them a second time, we explicitly retrieve defaults from the dataclass attributes for values not passed in omni_kwargs and use them to initialize a new instance. This is significantly faster than creating the OmniModelConfig directly from the ModelConfig, and it saves us from having to pass all kwargs to the OmniModelConfig.
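The idea of reusing already-validated state can be sketched with two toy dataclasses. BaseSketch, OmniSketch, and from_base are hypothetical stand-ins, and the counter only simulates the cost of validation; the real method may differ in its details.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Any


@dataclass
class BaseSketch:
    # Hypothetical stand-in for ModelConfig with a costly __post_init__.
    model: str = "demo"
    post_init_calls: int = 0

    def __post_init__(self) -> None:
        self.post_init_calls += 1  # stands in for expensive validation


@dataclass
class OmniSketch(BaseSketch):
    # Hypothetical stand-in for OmniModelConfig.
    stage_id: int = 0
    model_stage: str = "thinker"

    @classmethod
    def from_base(cls, base: BaseSketch, **omni_kwargs: Any) -> "OmniSketch":
        new = object.__new__(cls)  # skips __init__ and __post_init__
        new.__dict__.update(vars(base))  # reuse the already-validated state
        # Take dataclass defaults for fields the base instance does not carry.
        for f in fields(cls):
            if f.name not in new.__dict__ and f.default is not MISSING:
                new.__dict__[f.name] = f.default
        new.__dict__.update(omni_kwargs)  # explicit kwargs win
        return new


base = BaseSketch(model="some/model")
omni = OmniSketch.from_base(base, stage_id=1)
```

The copy here is shallow, so mutable values are shared between the base instance and the new one.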
PipelineConfig dataclass ¶
Complete pipeline topology for a model (frozen).
hf_config_predicate class-attribute instance-attribute ¶
StageConfig dataclass ¶
Per-stage config (legacy path). Used by both new and legacy loaders.
TODO(@lishunyang12): replace with ResolvedStageConfig once all models are migrated.
custom_process_input_func class-attribute instance-attribute ¶
custom_process_input_func: str | None = None
input_sources class-attribute instance-attribute ¶
runtime_overrides class-attribute instance-attribute ¶
yaml_engine_args class-attribute instance-attribute ¶
yaml_extras class-attribute instance-attribute ¶
yaml_runtime class-attribute instance-attribute ¶
StageConfigFactory ¶
Factory that loads pipeline YAML and merges CLI overrides.
Handles both single-stage and multi-stage models.
Pipelines are declared in vllm_omni/config/pipeline_registry.py and loaded lazily via _PIPELINE_REGISTRY; no hardcoded model-type → directory mapping is maintained here. Models with generic HF model_type collisions (e.g. MiMo Audio reports qwen2) should declare hf_architectures=(...) on their PipelineConfig so the factory can disambiguate via hf_config.architectures.
create_default_diffusion classmethod ¶
Single-stage diffusion - no YAML needed.
Creates a default diffusion stage configuration for single-stage diffusion models. Returns a legacy OmegaConf-compatible dict for backward compatibility with OmniStage.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kwargs | dict[str, Any] | Engine arguments from CLI/API. | required |
Returns:
| Type | Description |
|---|---|
| list[dict[str, Any]] | List containing a single config dict for the diffusion stage. |
create_from_model classmethod ¶
create_from_model(
model: str,
cli_overrides: dict[str, Any] | None = None,
deploy_config_path: str | None = None,
) -> list[StageConfig] | None
Load pipeline + deploy config, merge with CLI overrides.
Checks _PIPELINE_REGISTRY first (new path), falls back to legacy YAML.
StageDeployConfig dataclass ¶
Per-stage deployment knobs.
Only fields whose value legitimately varies across stages of the same pipeline live here (e.g. max_num_seqs on thinker vs talker, devices for GPU placement). Pipeline-wide settings (trust_remote_code, distributed_executor_backend, dtype, quantization, prefix/chunked prefill, DP/PP sizes) are declared at the top level of DeployConfig and propagated to every stage.
compilation_config class-attribute instance-attribute ¶
default_sampling_params class-attribute instance-attribute ¶
disable_hybrid_kv_cache_manager class-attribute instance-attribute ¶
disable_hybrid_kv_cache_manager: bool | None = None
enable_expert_parallel class-attribute instance-attribute ¶
enable_expert_parallel: bool | None = None
enable_flashinfer_autotune class-attribute instance-attribute ¶
enable_flashinfer_autotune: bool | None = None
engine_extras class-attribute instance-attribute ¶
gpu_memory_utilization class-attribute instance-attribute ¶
gpu_memory_utilization: float | None = None
input_connectors class-attribute instance-attribute ¶
max_num_batched_tokens class-attribute instance-attribute ¶
max_num_batched_tokens: int | None = None
mm_processor_cache_gb class-attribute instance-attribute ¶
mm_processor_cache_gb: float | None = None
output_connectors class-attribute instance-attribute ¶
sequence_parallel_size class-attribute instance-attribute ¶
sequence_parallel_size: int | None = None
subtalker_sampling_params class-attribute instance-attribute ¶
StageExecutionType ¶
StagePipelineConfig dataclass ¶
Fixed topology for one stage (frozen, not user-configurable).
async_chunk_process_next_stage_input_func class-attribute instance-attribute ¶
async_chunk_process_next_stage_input_func: str | None = None
custom_process_input_func class-attribute instance-attribute ¶
custom_process_input_func: str | None = None
custom_process_next_stage_input_func class-attribute instance-attribute ¶
custom_process_next_stage_input_func: str | None = None
requires_multimodal_data class-attribute instance-attribute ¶
requires_multimodal_data: bool = False
sampling_constraints class-attribute instance-attribute ¶
StageType ¶
create_config ¶
create_config(data: Any) -> DictConfig
Wrap a dict (or list) into a DictConfig.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | Any | Dict, list, or other structure to wrap. | required |
Returns:
| Type | Description |
|---|---|
| DictConfig | OmegaConf DictConfig / ListConfig. |
load_deploy_config ¶
load_deploy_config(path: str | Path) -> DeployConfig
Load a deploy YAML (with optional base_config inheritance).
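One plausible reading of base_config inheritance is that a deploy file names a parent file and overrides selected keys. The sketch below illustrates that reading with in-memory dicts instead of YAML files. The file names, the "extras" key, load_with_base, and deep_merge are all hypothetical, and the merge rule (child wins, nested mappings merged recursively) is an assumption rather than documented behavior; how lists such as stages are merged is not addressed.

```python
from typing import Any

# In-memory stand-ins for deploy YAML files, keyed by a hypothetical path.
FILES: dict[str, dict[str, Any]] = {
    "base.yaml": {"dtype": "bfloat16", "extras": {"a": 1, "b": 2}},
    "child.yaml": {"base_config": "base.yaml", "extras": {"b": 20}},
}


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where override wins and nested dicts are merged."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def load_with_base(path: str, files: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Resolve the base_config chain, applying each child on top of its parent."""
    raw = dict(files[path])
    parent = raw.pop("base_config", None)
    if parent is None:
        return raw
    return deep_merge(load_with_base(parent, files), raw)


resolved = load_with_base("child.yaml", FILES)
```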
load_yaml_config ¶
merge_configs ¶
merge_pipeline_deploy ¶
merge_pipeline_deploy(
pipeline: PipelineConfig,
deploy: DeployConfig,
cli_overrides: dict[str, Any] | None = None,
) -> list[StageConfig]
Merge pipeline + deploy + platform overrides → list[StageConfig].
register_pipeline ¶
register_pipeline(pipeline: PipelineConfig) -> None
Register a pipeline config dynamically.
In-tree pipelines are declared in pipeline_registry._OMNI_PIPELINES and loaded lazily; calling register_pipeline is only needed for out-of-tree plugins or tests that build a PipelineConfig at runtime. A dynamic registration overrides the central-registry entry with the same model_type.