
vllm_omni.config

Configuration module for vLLM-Omni.

Modules:

lora
model
pipeline_registry: Central declarative registry of all vllm-omni pipelines.
stage_config: Stage configuration system for vLLM-Omni.
yaml_util: Centralized OmegaConf wrapper for vLLM-Omni.

DeployConfig dataclass

Loaded from deploy/.yaml — the only config file users edit.

Top-level fields (trust_remote_code, distributed_executor_backend, dtype, quantization, enable_prefix_caching, enable_chunked_prefill, data_parallel_size, pipeline_parallel_size) are pipeline-wide: they apply uniformly to every stage. Fields that legitimately vary per stage live in the individual StageDeployConfig entries under stages:.
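The split between pipeline-wide and per-stage fields can be illustrated with a small, self-contained sketch. The classes `DeploySketch` and `StageSketch` and the helper `resolve_stage_args` are hypothetical stand-ins with only a few fields; they are not the real DeployConfig or StageDeployConfig, and the actual propagation logic may differ.

```python
from dataclasses import dataclass, field, fields
from typing import Any, Optional


@dataclass
class StageSketch:
    """Hypothetical stand-in for a per-stage entry under stages:."""
    stage_id: int
    max_num_seqs: Optional[int] = None


@dataclass
class DeploySketch:
    """Hypothetical stand-in for the top-level deploy config."""
    dtype: Optional[str] = None
    trust_remote_code: Optional[bool] = None
    stages: list = field(default_factory=list)


# Assumed subset of the pipeline-wide fields named in the text above.
PIPELINE_WIDE = ("dtype", "trust_remote_code")


def resolve_stage_args(deploy: DeploySketch) -> list:
    """Return one dict per stage; set top-level values are copied to every stage."""
    shared = {
        name: getattr(deploy, name)
        for name in PIPELINE_WIDE
        if getattr(deploy, name) is not None
    }
    resolved = []
    for stage in deploy.stages:
        per_stage: dict[str, Any] = {
            f.name: getattr(stage, f.name)
            for f in fields(stage)
            if getattr(stage, f.name) is not None
        }
        resolved.append({**shared, **per_stage})
    return resolved


deploy = DeploySketch(
    dtype="bfloat16",
    stages=[StageSketch(0, max_num_seqs=8), StageSketch(1, max_num_seqs=1)],
)
stage_args = resolve_stage_args(deploy)
```

In this sketch, unset (None) values are simply omitted, so a stage only sees the pipeline-wide fields that were actually set.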

active_stream_window class-attribute instance-attribute

active_stream_window: int = 0

async_chunk class-attribute instance-attribute

async_chunk: bool = True

connectors class-attribute instance-attribute

connectors: dict[str, Any] | None = None

custom_voice_dir class-attribute instance-attribute

custom_voice_dir: str | None = None

data_parallel_size class-attribute instance-attribute

data_parallel_size: int | None = None

distributed_executor_backend class-attribute instance-attribute

distributed_executor_backend: str | None = None

dtype class-attribute instance-attribute

dtype: str | None = None

edges class-attribute instance-attribute

edges: list[dict[str, Any]] | None = None

enable_chunked_prefill class-attribute instance-attribute

enable_chunked_prefill: bool | None = None

enable_prefix_caching class-attribute instance-attribute

enable_prefix_caching: bool | None = None

pipeline class-attribute instance-attribute

pipeline: str | None = None

pipeline_parallel_size class-attribute instance-attribute

pipeline_parallel_size: int | None = None

platforms class-attribute instance-attribute

platforms: dict[str, Any] | None = None

quantization class-attribute instance-attribute

quantization: str | None = None

stages class-attribute instance-attribute

stages: list[StageDeployConfig] = field(
    default_factory=list
)

trust_remote_code class-attribute instance-attribute

trust_remote_code: bool | None = None

ModelPipeline dataclass

Complete pipeline definition for a multi-stage model (legacy).

TODO(@lishunyang12): remove once all models migrate to PipelineConfig.

async_chunk class-attribute instance-attribute

async_chunk: bool = False

connectors class-attribute instance-attribute

connectors: dict[str, Any] | None = None

edges class-attribute instance-attribute

edges: list[dict[str, Any]] | None = None

model_type instance-attribute

model_type: str

stages instance-attribute

stages: list[StageConfig]

get_stage

get_stage(stage_id: int) -> StageConfig | None

Look up a stage by its ID.

Parameters:

stage_id (int, required): The stage ID to search for.

Returns:

StageConfig | None: The matching StageConfig, or None if not found.

validate_pipeline

validate_pipeline() -> list[str]

Validate pipeline topology at model integration time (not runtime).

Checks:

- All stage IDs are unique
- All input_sources reference valid stage IDs
- At least one entry point (stage with empty input_sources)

Returns:

list[str]: List of validation error messages. Empty list if valid.
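The three topology checks listed for validate_pipeline can be sketched as follows. This is a minimal illustration, not the library's implementation: `StageSketch` and `validate_topology` are hypothetical names, and the error message wording is invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class StageSketch:
    """Hypothetical stand-in holding only the fields the checks need."""
    stage_id: int
    input_sources: list = field(default_factory=list)


def validate_topology(stages: list) -> list:
    """Return a list of error strings; an empty list means the topology is valid."""
    errors = []
    ids = [s.stage_id for s in stages]

    # Check 1: all stage IDs are unique.
    duplicates = sorted({i for i in ids if ids.count(i) > 1})
    if duplicates:
        errors.append(f"duplicate stage IDs: {duplicates}")

    # Check 2: every input source refers to a known stage ID.
    known = set(ids)
    for stage in stages:
        for src in stage.input_sources:
            if src not in known:
                errors.append(
                    f"stage {stage.stage_id} references unknown input source {src}"
                )

    # Check 3: at least one stage has no input sources (an entry point).
    if not any(not s.input_sources for s in stages):
        errors.append("no entry point: every stage has input_sources")

    return errors


ok = validate_topology([StageSketch(0), StageSketch(1, [0])])
bad = validate_topology([StageSketch(0, [1]), StageSketch(0, [2])])
```

Returning a list of messages rather than raising on the first problem lets a model integrator see every topology error in one pass.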

OmniModelConfig

Bases: ModelConfig

Configuration for Omni models, extending the base ModelConfig.

This configuration class extends the base vLLM ModelConfig with omni-specific fields for multi-stage pipeline processing.

Attributes:

- hf_config: The model's HF Transformers config (default: None)
- hf_text_config: The sub text_config of the model's hf_config (default: None)
- stage_id: Identifier for the stage in a multi-stage pipeline (default: 0)
- async_chunk: If set to True, perform async chunk
- model_stage: Stage type identifier, e.g., "thinker" or "talker" (default: "thinker")
- model_arch: Model architecture name (default: "Qwen2_5OmniForConditionalGeneration")
- worker_type: Model Type, e.g., "ar" or "generation"
- engine_output_type: Optional output type specification for the engine. Used to route outputs to appropriate processors (e.g., "image", "audio", "latents"). If None, output type is inferred.
- stage_connector_config: Stage connector configuration dictionary. Contains "name" (connector name), "extra" (extra connector config).
- task_type: Default task type for TTS models (CustomVoice, VoiceDesign, or Base). If not specified, will be inferred from model path.

The correct way to initialize this class is via the vLLM config, as most of the logic for handling values is in ModelConfig's __post_init__.

Example:

    >>> config = OmniModelConfig.from_vllm_model_config(
    ...     vllm_config,
    ...     stage_id=0,
    ...     model_stage="thinker",
    ...     model_arch="Qwen2_5OmniForConditionalGeneration"
    ... )

active_stream_window class-attribute instance-attribute

active_stream_window: int = 0

architectures property

architectures: list[str]

async_chunk class-attribute instance-attribute

async_chunk: bool = False

codec_frame_rate_hz class-attribute instance-attribute

codec_frame_rate_hz: float | None = None

custom_process_next_stage_input_func class-attribute instance-attribute

custom_process_next_stage_input_func: str | None = None

embedding_size property

embedding_size

enable_sleep_mode class-attribute instance-attribute

enable_sleep_mode: bool = False

engine_output_type class-attribute instance-attribute

engine_output_type: str | None = None

has_sampling_extra_args class-attribute instance-attribute

has_sampling_extra_args: bool = False

hf_config_name class-attribute instance-attribute

hf_config_name: str | None = None

model_arch class-attribute instance-attribute

model_arch: str | None = None

model_stage class-attribute instance-attribute

model_stage: str = 'thinker'

omni_kv_config class-attribute instance-attribute

omni_kv_config: dict | None = None

registry property

registry

stage_connector_config class-attribute instance-attribute

stage_connector_config: dict[str, Any] = field(
    default_factory=lambda: {
        "name": "SharedMemoryConnector",
        "extra": {},
    }
)

stage_id class-attribute instance-attribute

stage_id: int = 0

subtalker_sampling_params class-attribute instance-attribute

subtalker_sampling_params: dict[str, Any] | None = None

task_type class-attribute instance-attribute

task_type: str | None = None

uses_mrope property

uses_mrope: bool

worker_type class-attribute instance-attribute

worker_type: str | None = None

add_defaults_to_omni_kwargs classmethod

add_defaults_to_omni_kwargs(omni_kwargs)

Because we create the OmniModelConfig with __new__ to sidestep expensive validation, fields with default factories are not initialized automatically; accessing one that was never set raises an AttributeError.

To work around this issue, we explicitly add defaults to the omni_kwargs dict provided to ensure all fields are defined correctly.

NOTE: omni_kwargs are mutated in place.
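The problem described here can be reproduced with a plain dataclass. The sketch below is an assumption-laden illustration, not the actual method body: `ConfigSketch` and `add_defaults` are hypothetical names, and only the standard `dataclasses` API is used.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any


@dataclass
class ConfigSketch:
    """Hypothetical stand-in with one plain default and one default factory."""
    stage_id: int = 0
    connector: dict = field(
        default_factory=lambda: {"name": "SharedMemoryConnector", "extra": {}}
    )


def add_defaults(cls: type, kwargs: dict) -> None:
    """Fill keys missing from kwargs using dataclass defaults (mutates kwargs)."""
    for f in fields(cls):
        if f.name in kwargs:
            continue
        if f.default is not MISSING:
            kwargs[f.name] = f.default
        elif f.default_factory is not MISSING:
            # Factories must be called; there is no class attribute to fall back on.
            kwargs[f.name] = f.default_factory()


kwargs: dict[str, Any] = {"stage_id": 1}
add_defaults(ConfigSketch, kwargs)

# Bypass __init__ and __post_init__, then assign every field by hand.
cfg = object.__new__(ConfigSketch)
for name, value in kwargs.items():
    setattr(cfg, name, value)

# Without the defaults step, a factory-backed field would be missing entirely.
bare = object.__new__(ConfigSketch)
has_connector = hasattr(bare, "connector")
```

Fields with plain defaults still resolve through the class attribute on an instance built with `object.__new__`, which is why only factory-backed fields trigger the AttributeError.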

draw_hf_text_config

draw_hf_text_config()

from_vllm_model_config classmethod

from_vllm_model_config(
    model_config: ModelConfig, **omni_kwargs
)

Create OmniModelConfig from an existing vLLM ModelConfig and additional Omni specific kwargs.

NOTE: The validation and __post_init__ for ModelConfig are expensive. To avoid running them a second time, we explicitly retrieve defaults from dataclass attributes for values not passed in omni_kwargs and use them to initialize a new instance. This is significantly faster than creating the OmniModelConfig directly from the ModelConfig, and saves us from having to pass all kwargs to the OmniModelConfig.
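A rough sketch of this pattern, using stand-in classes rather than the real ModelConfig, might look like the following. `BaseSketch`, `OmniSketch`, and `from_base` are hypothetical names; the real classmethod handles many more fields and may be structured differently.

```python
from dataclasses import MISSING, dataclass, fields

# Records each __post_init__ run so the example can show it is not repeated.
POST_INIT_CALLS = []


@dataclass
class BaseSketch:
    """Hypothetical stand-in for a base config with costly validation."""
    model: str
    max_len: int = 0

    def __post_init__(self) -> None:
        POST_INIT_CALLS.append(self.model)  # stand-in for expensive work
        self.max_len = self.max_len or 4096


@dataclass
class OmniSketch(BaseSketch):
    """Hypothetical subclass adding stage-specific fields."""
    stage_id: int = 0
    model_stage: str = "thinker"

    @classmethod
    def from_base(cls, base: BaseSketch, **omni_kwargs) -> "OmniSketch":
        new = object.__new__(cls)           # skips __init__ and __post_init__
        new.__dict__.update(base.__dict__)  # reuse the already-validated values
        for f in fields(cls):
            if f.name in omni_kwargs:
                setattr(new, f.name, omni_kwargs[f.name])
            elif f.name not in new.__dict__ and f.default is not MISSING:
                setattr(new, f.name, f.default)
        return new


base = BaseSketch("some/model")
omni = OmniSketch.from_base(base, stage_id=1)
```

The base instance is validated once; the derived instance inherits its resolved values and only the extra fields are filled in.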

get_model_arch_config

get_model_arch_config()

PipelineConfig dataclass

Complete pipeline topology for a model (frozen).

diffusers_class_name class-attribute instance-attribute

diffusers_class_name: str | None = None

hf_architectures class-attribute instance-attribute

hf_architectures: tuple[str, ...] = ()

hf_config_predicate class-attribute instance-attribute

hf_config_predicate: Callable[[Any], bool] | None = None

model_arch class-attribute instance-attribute

model_arch: str = ''

model_type instance-attribute

model_type: str

stages class-attribute instance-attribute

stages: tuple[StagePipelineConfig, ...] = ()

get_stage

get_stage(stage_id: int) -> StagePipelineConfig | None

Look up a stage by its ID.

validate

validate() -> list[str]

Return list of topology errors (empty if valid).

StageConfig dataclass

Per-stage config (legacy path). Used by both new and legacy loaders.

TODO(@lishunyang12): replace with ResolvedStageConfig once all models are migrated.

custom_process_input_func class-attribute instance-attribute

custom_process_input_func: str | None = None

final_output class-attribute instance-attribute

final_output: bool = False

final_output_type class-attribute instance-attribute

final_output_type: str | None = None

hf_config_name class-attribute instance-attribute

hf_config_name: str | None = None

input_sources class-attribute instance-attribute

input_sources: list[int] = field(default_factory=list)

is_comprehension class-attribute instance-attribute

is_comprehension: bool = False

model_stage instance-attribute

model_stage: str

runtime_overrides class-attribute instance-attribute

runtime_overrides: dict[str, Any] = field(
    default_factory=dict
)

scheduler_cls class-attribute instance-attribute

scheduler_cls: str | None = None

stage_id instance-attribute

stage_id: int

stage_type class-attribute instance-attribute

stage_type: StageType = LLM

worker_type class-attribute instance-attribute

worker_type: str | None = None

yaml_engine_args class-attribute instance-attribute

yaml_engine_args: dict[str, Any] = field(
    default_factory=dict
)

yaml_extras class-attribute instance-attribute

yaml_extras: dict[str, Any] = field(default_factory=dict)

yaml_runtime class-attribute instance-attribute

yaml_runtime: dict[str, Any] = field(default_factory=dict)

to_omegaconf

to_omegaconf() -> Any

TODO(@lishunyang12): remove once engine consumes ResolvedStageConfig directly.

StageConfigFactory

Factory that loads pipeline YAML and merges CLI overrides.

Handles both single-stage and multi-stage models.

Pipelines are declared in vllm_omni/config/pipeline_registry.py and loaded lazily via _PIPELINE_REGISTRY; no hardcoded model-type → directory mapping is maintained here. Models with generic HF model_type collisions (e.g. MiMo Audio reports qwen2) should declare hf_architectures=(...) on their PipelineConfig so the factory can disambiguate via hf_config.architectures.
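One way to picture the disambiguation step is shown below. This is a sketch under stated assumptions: the registry layout, the pipeline names, the architecture string, and the `resolve_pipeline` helper are all hypothetical and do not reflect the contents of _PIPELINE_REGISTRY or the factory's actual lookup order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PipelineSketch:
    """Hypothetical stand-in carrying only the fields used for matching."""
    model_type: str
    hf_architectures: tuple = ()


# Hypothetical registry: one generic entry and one that declares architectures.
REGISTRY = {
    "qwen2": PipelineSketch("qwen2"),
    "hypothetical_audio": PipelineSketch(
        "hypothetical_audio",
        hf_architectures=("HypotheticalAudioForCausalLM",),
    ),
}


def resolve_pipeline(hf_model_type: str, hf_archs: list) -> Optional[PipelineSketch]:
    """Prefer an architecture match; otherwise fall back to the HF model_type."""
    for pipeline in REGISTRY.values():
        if set(pipeline.hf_architectures) & set(hf_archs):
            return pipeline
    return REGISTRY.get(hf_model_type)


# Both checkpoints report model_type "qwen2", but their architectures differ.
audio = resolve_pipeline("qwen2", ["HypotheticalAudioForCausalLM"])
generic = resolve_pipeline("qwen2", ["SomeOtherArchitecture"])
missing = resolve_pipeline("unknown", [])
```

Checking architectures before the model type is what lets two checkpoints with the same generic model_type land on different pipelines.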

create_default_diffusion classmethod

create_default_diffusion(
    kwargs: dict[str, Any],
) -> list[dict[str, Any]]

Single-stage diffusion - no YAML needed.

Creates a default diffusion stage configuration for single-stage diffusion models. Returns a legacy OmegaConf-compatible dict for backward compatibility with OmniStage.

Parameters:

kwargs (dict[str, Any], required): Engine arguments from CLI/API.

Returns:

list[dict[str, Any]]: List containing a single config dict for the diffusion stage.

create_from_model classmethod

create_from_model(
    model: str,
    cli_overrides: dict[str, Any] | None = None,
    deploy_config_path: str | None = None,
) -> list[StageConfig] | None

Load pipeline + deploy config, merge with CLI overrides.

Checks _PIPELINE_REGISTRY first (new path), falls back to legacy YAML.

StageDeployConfig dataclass

Per-stage deployment knobs.

Only fields whose value legitimately varies across stages of the same pipeline live here (e.g. max_num_seqs on thinker vs talker, devices for GPU placement). Pipeline-wide settings (trust_remote_code, distributed_executor_backend, dtype, quantization, prefix/chunked prefill, DP/PP sizes) are declared at the top level of DeployConfig and propagated to every stage.

async_scheduling class-attribute instance-attribute

async_scheduling: bool | None = None

cfg_parallel_size class-attribute instance-attribute

cfg_parallel_size: int | None = None

compilation_config class-attribute instance-attribute

compilation_config: dict[str, Any] | None = None

config_format class-attribute instance-attribute

config_format: str | None = None

default_sampling_params class-attribute instance-attribute

default_sampling_params: dict[str, Any] | None = None

devices class-attribute instance-attribute

devices: str | None = None

disable_hybrid_kv_cache_manager class-attribute instance-attribute

disable_hybrid_kv_cache_manager: bool | None = None

enable_expert_parallel class-attribute instance-attribute

enable_expert_parallel: bool | None = None

enable_flashinfer_autotune class-attribute instance-attribute

enable_flashinfer_autotune: bool | None = None

enforce_eager class-attribute instance-attribute

enforce_eager: bool | None = None

engine_extras class-attribute instance-attribute

engine_extras: dict[str, Any] = field(default_factory=dict)

env class-attribute instance-attribute

env: dict[str, Any] | None = None

gpu_memory_utilization class-attribute instance-attribute

gpu_memory_utilization: float | None = None

hsdp_replicate_size class-attribute instance-attribute

hsdp_replicate_size: int | None = None

hsdp_shard_size class-attribute instance-attribute

hsdp_shard_size: int | None = None

input_connectors class-attribute instance-attribute

input_connectors: dict[str, str] | None = None

load_format class-attribute instance-attribute

load_format: str | None = None

max_model_len class-attribute instance-attribute

max_model_len: int | None = None

max_num_batched_tokens class-attribute instance-attribute

max_num_batched_tokens: int | None = None

max_num_seqs class-attribute instance-attribute

max_num_seqs: int | None = None

mm_processor_cache_gb class-attribute instance-attribute

mm_processor_cache_gb: float | None = None

num_replicas class-attribute instance-attribute

num_replicas: int = 1

output_connectors class-attribute instance-attribute

output_connectors: dict[str, str] | None = None

profiler_config class-attribute instance-attribute

profiler_config: dict[str, Any] | None = None

ring_degree class-attribute instance-attribute

ring_degree: int | None = None

sequence_parallel_size class-attribute instance-attribute

sequence_parallel_size: int | None = None

skip_mm_profiling class-attribute instance-attribute

skip_mm_profiling: bool | None = None

stage_id instance-attribute

stage_id: int

subtalker_sampling_params class-attribute instance-attribute

subtalker_sampling_params: dict[str, Any] | None = None

tensor_parallel_size class-attribute instance-attribute

tensor_parallel_size: int | None = None

tokenizer_mode class-attribute instance-attribute

tokenizer_mode: str | None = None

ulysses_degree class-attribute instance-attribute

ulysses_degree: int | None = None

ulysses_mode class-attribute instance-attribute

ulysses_mode: str | None = None

use_hsdp class-attribute instance-attribute

use_hsdp: bool | None = None

vae_patch_parallel_size class-attribute instance-attribute

vae_patch_parallel_size: int | None = None

StageExecutionType

Bases: str, Enum

Merged StageType + WorkerType — 3 combinations today.

DIFFUSION class-attribute instance-attribute

DIFFUSION = 'diffusion'

LLM_AR class-attribute instance-attribute

LLM_AR = 'llm_ar'

LLM_GENERATION class-attribute instance-attribute

LLM_GENERATION = 'llm_generation'
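The merge of StageType and worker type into one enum can be sketched as a small mapping. The enum members and values below are copied from this page, and the worker type strings "ar" and "generation" come from the OmniModelConfig attribute list; the `to_execution_type` helper and its fallback to LLM_AR when no worker type is given are assumptions for illustration.

```python
from enum import Enum
from typing import Optional


class StageType(str, Enum):
    LLM = "llm"
    DIFFUSION = "diffusion"


class StageExecutionType(str, Enum):
    LLM_AR = "llm_ar"
    LLM_GENERATION = "llm_generation"
    DIFFUSION = "diffusion"


def to_execution_type(
    stage_type: StageType, worker_type: Optional[str]
) -> StageExecutionType:
    """Map a legacy (stage_type, worker_type) pair onto the merged enum."""
    if stage_type is StageType.DIFFUSION:
        return StageExecutionType.DIFFUSION
    if worker_type == "generation":
        return StageExecutionType.LLM_GENERATION
    # Assumed fallback: an LLM stage without an explicit worker type is AR.
    return StageExecutionType.LLM_AR


ar = to_execution_type(StageType.LLM, "ar")
gen = to_execution_type(StageType.LLM, "generation")
diff = to_execution_type(StageType.DIFFUSION, None)
```

Because both enums subclass str, members compare equal to their string values, which keeps them easy to read from and write to YAML.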

StagePipelineConfig dataclass

Fixed topology for one stage (frozen, not user-configurable).

async_chunk_process_next_stage_input_func class-attribute instance-attribute

async_chunk_process_next_stage_input_func: str | None = None

cfg_kv_collect_func class-attribute instance-attribute

cfg_kv_collect_func: str | None = None

custom_process_input_func class-attribute instance-attribute

custom_process_input_func: str | None = None

custom_process_next_stage_input_func class-attribute instance-attribute

custom_process_next_stage_input_func: str | None = None

engine_output_type class-attribute instance-attribute

engine_output_type: str | None = None

execution_type class-attribute instance-attribute

execution_type: StageExecutionType = LLM_AR

extras class-attribute instance-attribute

extras: dict[str, Any] = field(default_factory=dict)

final_output class-attribute instance-attribute

final_output: bool = False

final_output_type class-attribute instance-attribute

final_output_type: str | None = None

hf_config_name class-attribute instance-attribute

hf_config_name: str | None = None

input_sources class-attribute instance-attribute

input_sources: tuple[int, ...] = ()

model_arch class-attribute instance-attribute

model_arch: str | None = None

model_stage instance-attribute

model_stage: str

model_subdir class-attribute instance-attribute

model_subdir: str | None = None

omni_kv_config class-attribute instance-attribute

omni_kv_config: dict[str, Any] | None = None

owns_tokenizer class-attribute instance-attribute

owns_tokenizer: bool = False

prompt_expand_func class-attribute instance-attribute

prompt_expand_func: str | None = None

requires_multimodal_data class-attribute instance-attribute

requires_multimodal_data: bool = False

sampling_constraints class-attribute instance-attribute

sampling_constraints: dict[str, Any] = field(
    default_factory=dict
)

stage_id instance-attribute

stage_id: int

sync_process_input_func class-attribute instance-attribute

sync_process_input_func: str | None = None

tokenizer_subdir class-attribute instance-attribute

tokenizer_subdir: str | None = None

StageType

Bases: str, Enum

Type of processing stage in the Omni pipeline.

DIFFUSION class-attribute instance-attribute

DIFFUSION = 'diffusion'

LLM class-attribute instance-attribute

LLM = 'llm'

create_config

create_config(data: Any) -> DictConfig

Wrap a dict (or list) into a DictConfig.

Parameters:

data (Any, required): Dict, list, or other structure to wrap.

Returns:

DictConfig: OmegaConf DictConfig / ListConfig.

load_deploy_config

load_deploy_config(path: str | Path) -> DeployConfig

Load a deploy YAML (with optional base_config inheritance).

load_yaml_config

load_yaml_config(path: str | Any) -> DictConfig

Load a YAML file and return it as a DictConfig.

Parameters:

path (str | Any, required): Path to the YAML file.

Returns:

DictConfig: OmegaConf DictConfig with attribute-style access.

merge_configs

merge_configs(*cfgs: Any) -> dict

Deep-merge multiple configs and return a plain dict.

Parameters:

*cfgs (Any, default ()): DictConfig or dict objects to merge (left to right).

Returns:

dict: Plain dict with merged, resolved values.
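The left-to-right deep-merge semantics can be illustrated in plain Python. This sketch is not the OmegaConf-backed implementation and does not resolve interpolations; `deep_merge` is a hypothetical helper, and its choice to replace non-dict values (including lists) wholesale is an assumption of the example.

```python
import copy
from typing import Any


def deep_merge(*cfgs: dict) -> dict:
    """Merge dicts left to right; later values win, nested dicts merge recursively."""
    merged: dict[str, Any] = {}
    for cfg in cfgs:
        for key, value in cfg.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                merged[key] = deep_merge(merged[key], value)
            else:
                # Copy so the result never aliases the inputs.
                merged[key] = copy.deepcopy(value)
    return merged


base = {"dtype": "auto", "engine": {"max_num_seqs": 8, "enforce_eager": False}}
override = {"engine": {"max_num_seqs": 1}}
result = deep_merge(base, override)
```

Only the overridden leaf changes; sibling keys under the same nested dict survive the merge, and the inputs are left untouched.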

merge_pipeline_deploy

merge_pipeline_deploy(
    pipeline: PipelineConfig,
    deploy: DeployConfig,
    cli_overrides: dict[str, Any] | None = None,
) -> list[StageConfig]

Merge pipeline + deploy + platform overrides → list[StageConfig].

register_pipeline

register_pipeline(pipeline: PipelineConfig) -> None

Register a pipeline config dynamically.

In-tree pipelines are declared in pipeline_registry._OMNI_PIPELINES and loaded lazily; calling register_pipeline is only needed for out-of-tree plugins or tests that build a PipelineConfig at runtime. A dynamic registration overrides the central-registry entry with the same model_type.
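The precedence rule described above (a dynamic registration overrides the central entry with the same model_type) can be sketched as two lookup tables. Everything here is hypothetical: the `_CENTRAL` and `_DYNAMIC` names, the zero-argument loader convention, and the `lookup` helper are illustrative and not the library's internals.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PipelineSketch:
    """Hypothetical stand-in; source records where the entry came from."""
    model_type: str
    source: str


# Central registry: model_type -> zero-argument loader, so entries build lazily.
_CENTRAL: dict[str, Callable[[], PipelineSketch]] = {
    "demo_model": lambda: PipelineSketch("demo_model", source="in-tree"),
}

# Dynamic registrations from plugins or tests.
_DYNAMIC: dict[str, PipelineSketch] = {}


def register_pipeline_sketch(pipeline: PipelineSketch) -> None:
    """Record a runtime-built pipeline under its model_type."""
    _DYNAMIC[pipeline.model_type] = pipeline


def lookup(model_type: str) -> Optional[PipelineSketch]:
    """Dynamic entries take precedence over the central registry."""
    if model_type in _DYNAMIC:
        return _DYNAMIC[model_type]
    loader = _CENTRAL.get(model_type)
    return loader() if loader is not None else None


before = lookup("demo_model")
register_pipeline_sketch(PipelineSketch("demo_model", source="plugin"))
after = lookup("demo_model")
```

Keeping dynamic entries in a separate table means the in-tree declaration is never mutated, which is convenient for tests that register a pipeline temporarily.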

to_dict

to_dict(obj: Any, *, resolve: bool = True) -> Any

Convert a DictConfig (or similar) to a plain dict.

Parameters:

obj (Any, required): OmegaConf container to convert.
resolve (bool, default True): Whether to resolve interpolations.

Returns:

Any: Plain dict.