vllm_omni.config.omni_config

Structured vLLM-Omni configuration classes.

This module is an additive change for Phase 2 of RFC #4021: it introduces new classes without altering existing consumers. VllmOmniConfig.from_registry builds the structured view directly from the pipeline registry and the deploy config, so that parity can be proven before later PRs cut consumers over to these classes.

StageConfigType module-attribute

BaseVllmOmniStageConfig

Common structured config contract shared by all Omni stage realizations.

cache_config class-attribute instance-attribute

cache_config: OmniStageCacheConfig = field(
    default_factory=OmniStageCacheConfig
)

cfg_kv_collect_func property

cfg_kv_collect_func: str | None

connector_config class-attribute instance-attribute

connector_config: OmniStageConnectorConfig = field(
    default_factory=OmniStageConnectorConfig
)

custom_process_input_func property

custom_process_input_func: str | None

custom_process_next_stage_input_func property

custom_process_next_stage_input_func: str | None

engine_output_type property

engine_output_type: str | None

final_output property

final_output: bool

final_output_type property

final_output_type: str | None

hf_config_name property

hf_config_name: str | None

input_sources property

input_sources: list[int]

is_comprehension property

is_comprehension: bool

load_config class-attribute instance-attribute

load_config: OmniStageLoadConfig = field(
    default_factory=OmniStageLoadConfig
)

model_config class-attribute instance-attribute

model_config: OmniStageModelConfig = field(
    default_factory=OmniStageModelConfig
)

model_stage property

model_stage: str

parallel_config class-attribute instance-attribute

parallel_config: OmniStageParallelConfig = field(
    default_factory=OmniStageParallelConfig
)

prompt_expand_func property

prompt_expand_func: str | None

quantization_config class-attribute instance-attribute

quantization_config: _QuantizationConfigType = None

requires_multimodal_data property

requires_multimodal_data: bool

runtime_config class-attribute instance-attribute

runtime_config: OmniStageRuntimeConfig = field(
    default_factory=OmniStageRuntimeConfig
)

scheduler_cls property

scheduler_cls: str | None

scheduler_config class-attribute instance-attribute

scheduler_config: OmniStageSchedulerConfig = field(
    default_factory=OmniStageSchedulerConfig
)

stage_id property

stage_id: int

stage_pipeline_config instance-attribute

stage_pipeline_config: StagePipelineConfig

stage_type property

stage_type: StageType

worker_type property

worker_type: str | None

OmniStageCacheConfig

Per-stage engine cache and memory behavior.

This is separate from _DiffusionConfigProjection.cache_config, which configures vLLM-Omni diffusion-specific cache backends such as TeaCache and Cache-DiT.

disable_hybrid_kv_cache_manager class-attribute instance-attribute

disable_hybrid_kv_cache_manager: bool = False

enable_prefix_caching class-attribute instance-attribute

enable_prefix_caching: bool = False

gpu_memory_utilization class-attribute instance-attribute

gpu_memory_utilization: float = Field(
    default=0.9, gt=0.0, le=1.0
)

mm_processor_cache_gb class-attribute instance-attribute

mm_processor_cache_gb: float | None = Field(
    default=None, ge=0.0
)
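
The bounds above read as follows: gt=0.0, le=1.0 puts gpu_memory_utilization in the half-open interval (0, 1], and ge=0.0 on an optional field applies only when a value is given. A minimal sketch of those rules, using a hypothetical stdlib stand-in (CacheConfigSketch and is_valid are illustrative names, not part of vllm_omni, and the real class may validate differently):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CacheConfigSketch:
    """Hypothetical stand-in for OmniStageCacheConfig; not the real class."""

    disable_hybrid_kv_cache_manager: bool = False
    enable_prefix_caching: bool = False
    gpu_memory_utilization: float = 0.9
    mm_processor_cache_gb: float | None = None

    def __post_init__(self) -> None:
        # Mirrors gt=0.0, le=1.0: zero is rejected, one is allowed.
        if not 0.0 < self.gpu_memory_utilization <= 1.0:
            raise ValueError("gpu_memory_utilization must be in (0, 1]")
        # Mirrors ge=0.0 on an optional field: None skips the check.
        cache_gb = self.mm_processor_cache_gb
        if cache_gb is not None and cache_gb < 0.0:
            raise ValueError("mm_processor_cache_gb must be >= 0")


def is_valid(**kwargs) -> bool:
    """Return True if the sketch accepts the given field values."""
    try:
        CacheConfigSketch(**kwargs)
        return True
    except ValueError:
        return False
```

Under these assumptions, is_valid(gpu_memory_utilization=1.0) is accepted while is_valid(gpu_memory_utilization=0.0) is rejected.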

OmniStageConnectorConfig

Per-stage inter-stage connector wiring.

input_connectors class-attribute instance-attribute

input_connectors: dict[str, Any] | None = None

output_connectors class-attribute instance-attribute

output_connectors: dict[str, Any] | None = None

stage_connector class-attribute instance-attribute

stage_connector: dict[str, Any] = field(
    default_factory=lambda: {
        "name": "SharedMemoryConnector",
        "extra": {},
    }
)
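
The stage_connector default is built by a default_factory, so each config instance gets its own dictionary instead of sharing one mutable default. A small sketch of that behavior with a hypothetical stand-in dataclass (ConnectorConfigSketch is not the real class, and the "note" key is invented for illustration):

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConnectorConfigSketch:
    """Hypothetical stand-in for OmniStageConnectorConfig."""

    input_connectors: dict[str, Any] | None = None
    output_connectors: dict[str, Any] | None = None
    # The factory runs once per instance, so defaults are never shared.
    stage_connector: dict[str, Any] = field(
        default_factory=lambda: {
            "name": "SharedMemoryConnector",
            "extra": {},
        }
    )


a = ConnectorConfigSketch()
b = ConnectorConfigSketch()

# Mutating one instance's connector settings leaves the other untouched.
a.stage_connector["extra"]["note"] = "only on a"  # hypothetical key
```

After this runs, b.stage_connector still holds the pristine default, which is the reason a plain dict literal is not used as the default value.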

OmniStageDiffusionParallelConfig

Bases: OmniStageParallelConfig

Diffusion-stage distributed parallelism behavior.

cfg_parallel_size class-attribute instance-attribute

cfg_parallel_size: int = Field(default=1, ge=1, le=3)

hsdp_replicate_size class-attribute instance-attribute

hsdp_replicate_size: int = Field(default=1, ge=1)

hsdp_shard_size class-attribute instance-attribute

hsdp_shard_size: int = -1

mask_sp_padding class-attribute instance-attribute

mask_sp_padding: bool = False

ring_degree class-attribute instance-attribute

ring_degree: int = Field(default=1, ge=1)

sequence_parallel_size class-attribute instance-attribute

sequence_parallel_size: int = Field(
    default=1, ge=1, init=False
)

ulysses_degree class-attribute instance-attribute

ulysses_degree: int = Field(default=1, ge=1)

ulysses_mode class-attribute instance-attribute

ulysses_mode: str = 'strict'

use_hsdp class-attribute instance-attribute

use_hsdp: bool = False

vae_patch_parallel_size class-attribute instance-attribute

vae_patch_parallel_size: int = Field(default=1, ge=1)

OmniStageLoadConfig

Per-stage loading behavior.

config_format class-attribute instance-attribute

config_format: str | None = None

load_format class-attribute instance-attribute

load_format: str = 'auto'

skip_mm_profiling class-attribute instance-attribute

skip_mm_profiling: bool | None = None

tokenizer_mode class-attribute instance-attribute

tokenizer_mode: str = 'auto'

OmniStageModelConfig

Per-stage model behavior.

active_stream_window class-attribute instance-attribute

active_stream_window: int = Field(default=0, ge=0)

codec_frame_rate_hz class-attribute instance-attribute

codec_frame_rate_hz: float | None = None

compilation_config class-attribute instance-attribute

compilation_config: dict[str, Any] | None = None

custom_voice_dir class-attribute instance-attribute

custom_voice_dir: str | None = None

default_sampling_params class-attribute instance-attribute

default_sampling_params: dict[str, Any] | None = None

disable_autocast class-attribute instance-attribute

disable_autocast: bool = False

enable_flashinfer_autotune class-attribute instance-attribute

enable_flashinfer_autotune: bool | None = None

enable_multithread_weight_load class-attribute instance-attribute

enable_multithread_weight_load: bool = True

enable_sleep_mode class-attribute instance-attribute

enable_sleep_mode: bool = False

enforce_eager class-attribute instance-attribute

enforce_eager: bool = False

has_sampling_extra_args class-attribute instance-attribute

has_sampling_extra_args: bool = False

num_weight_load_threads class-attribute instance-attribute

num_weight_load_threads: int = Field(default=4, ge=1)

subtalker_sampling_params class-attribute instance-attribute

subtalker_sampling_params: dict[str, Any] | None = None

task_type class-attribute instance-attribute

task_type: str | None = None

OmniStageParallelConfig

Common per-stage distributed parallelism behavior.

data_parallel_size class-attribute instance-attribute

data_parallel_size: int = Field(default=1, ge=1)

enable_expert_parallel class-attribute instance-attribute

enable_expert_parallel: bool = False

pipeline_parallel_size class-attribute instance-attribute

pipeline_parallel_size: int = Field(default=1, ge=1)

tensor_parallel_size class-attribute instance-attribute

tensor_parallel_size: int = Field(default=1, ge=1)

world_size class-attribute instance-attribute

world_size: int = Field(default=1, ge=1, init=False)
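
Because world_size is declared with init=False, callers cannot pass it to the constructor; it is presumably derived from the other sizes (the same pattern appears on sequence_parallel_size in OmniStageDiffusionParallelConfig). The source does not state the formula. The sketch below assumes the upstream vLLM convention of pipeline_parallel_size times tensor_parallel_size, and ParallelConfigSketch is a hypothetical stand-in rather than the real class:

```python
from dataclasses import dataclass, field


@dataclass
class ParallelConfigSketch:
    """Hypothetical stand-in for OmniStageParallelConfig."""

    data_parallel_size: int = 1
    enable_expert_parallel: bool = False
    pipeline_parallel_size: int = 1
    tensor_parallel_size: int = 1
    # init=False removes the field from __init__; it is filled in below.
    world_size: int = field(default=1, init=False)

    def __post_init__(self) -> None:
        # Mirrors the ge=1 constraint on the size fields.
        for name in (
            "data_parallel_size",
            "pipeline_parallel_size",
            "tensor_parallel_size",
        ):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1")
        # ASSUMPTION: product of PP and TP, as in upstream vLLM.
        # The real derivation in vllm_omni may differ.
        self.world_size = (
            self.pipeline_parallel_size * self.tensor_parallel_size
        )
```

With this stand-in, passing world_size=8 to the constructor raises TypeError, since the field is not an accepted keyword argument.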

OmniStageRuntimeConfig

Per-stage process placement and runtime behavior.

devices class-attribute instance-attribute

devices: str | None = None

env class-attribute instance-attribute

env: dict[str, Any] | None = None

log_level class-attribute instance-attribute

log_level: str = 'info'

log_stats class-attribute instance-attribute

log_stats: bool = False

num_gpus class-attribute instance-attribute

num_gpus: int = Field(default=1, ge=1)

num_replicas class-attribute instance-attribute

num_replicas: int = Field(default=1, ge=1)

profiler_config class-attribute instance-attribute

profiler_config: dict[str, Any] | None = None

OmniStageSchedulerConfig

Per-stage request scheduling behavior.

async_scheduling class-attribute instance-attribute

async_scheduling: bool = True

enable_chunked_prefill class-attribute instance-attribute

enable_chunked_prefill: bool = False

max_model_len class-attribute instance-attribute

max_model_len: int | None = Field(default=None, ge=-1)

max_num_batched_tokens class-attribute instance-attribute

max_num_batched_tokens: int | None = Field(
    default=None, ge=1
)

max_num_seqs class-attribute instance-attribute

max_num_seqs: int = Field(default=128, ge=1)

VllmOmniARStageConfig

Bases: BaseVllmOmniStageConfig

Structured config for autoregressive LLM stages.

VllmOmniConfig

Top-level structured Omni config built once from registry inputs.

orchestrator_config class-attribute instance-attribute

orchestrator_config: VllmOmniOrchestratorConfig = field(
    default_factory=VllmOmniOrchestratorConfig
)

pipeline_config instance-attribute

pipeline_config: PipelineConfig

stage_configs instance-attribute

stage_configs: tuple[StageConfigType, ...]

from_registry classmethod

from_registry(
    model_type: str,
    deploy_config_path: str | None = None,
    cli_overrides: dict[str, Any] | None = None,
) -> VllmOmniConfig

Create a structured config from a registered pipeline and deploy YAML.
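
The signature suggests three layers of input: registry defaults for model_type, an optional deploy YAML, and optional CLI overrides. The precedence among them is not documented here. The sketch below assumes the common ordering of defaults, then deploy values, then CLI overrides, with None overrides ignored. layer_settings and the keys used are hypothetical and are not part of vllm_omni:

```python
from __future__ import annotations

from typing import Any


def layer_settings(
    defaults: dict[str, Any],
    deploy: dict[str, Any] | None = None,
    cli_overrides: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Merge three setting layers; later layers win (assumed ordering)."""
    merged = dict(defaults)
    # Values parsed from the deploy file replace registry defaults.
    merged.update(deploy or {})
    # ASSUMPTION: an override of None means "flag not given" and is skipped.
    merged.update(
        {k: v for k, v in (cli_overrides or {}).items() if v is not None}
    )
    return merged


settings = layer_settings(
    defaults={"max_num_seqs": 128, "enforce_eager": False},
    deploy={"max_num_seqs": 64},
    cli_overrides={"enforce_eager": True, "max_num_seqs": None},
)
```

Here settings ends up with max_num_seqs taken from the deploy layer and enforce_eager taken from the CLI layer.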

stage_by_id

stage_by_id(stage_id: int) -> StageConfigType
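
Since stage_configs is a tuple, stage_by_id is most naturally a lookup on each entry's stage_id. A minimal sketch under that assumption follows. StageSketch, OmniConfigSketch, the stage names, and the KeyError raised for an unknown id are all hypothetical; the real method may behave differently on a miss:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSketch:
    """Hypothetical stand-in for a stage config."""

    stage_id: int
    model_stage: str


@dataclass(frozen=True)
class OmniConfigSketch:
    """Hypothetical stand-in for VllmOmniConfig."""

    stage_configs: tuple[StageSketch, ...]

    def stage_by_id(self, stage_id: int) -> StageSketch:
        # Linear scan; pipelines have only a handful of stages.
        for stage in self.stage_configs:
            if stage.stage_id == stage_id:
                return stage
        # ASSUMPTION: the error type for an unknown id is a guess.
        raise KeyError(f"no stage with stage_id={stage_id}")


config = OmniConfigSketch(
    stage_configs=(
        StageSketch(stage_id=0, model_stage="thinker"),  # illustrative names
        StageSketch(stage_id=1, model_stage="talker"),
    )
)
```

Looking up by stage_id rather than by tuple position keeps callers correct even if stage ids are not contiguous or not ordered.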

VllmOmniDiffusionStageConfig

Bases: BaseVllmOmniStageConfig

Structured config for diffusion stages.

diffusion_config class-attribute instance-attribute

diffusion_config: _DiffusionConfigProjection = field(
    default_factory=_DiffusionConfigProjection
)

parallel_config class-attribute instance-attribute

VllmOmniGenerationStageConfig

Bases: BaseVllmOmniStageConfig

Structured config for generation LLM stages.

VllmOmniOrchestratorConfig

Configuration consumed by the orchestrator process only.

batch_timeout class-attribute instance-attribute

batch_timeout: int = Field(default=10, ge=0)

deploy_config_path class-attribute instance-attribute

deploy_config_path: str | None = None

init_timeout class-attribute instance-attribute

init_timeout: int = Field(default=600, ge=1)

omni_dp_size_local class-attribute instance-attribute

omni_dp_size_local: int = Field(default=1, ge=1)

omni_heartbeat_timeout class-attribute instance-attribute

omni_heartbeat_timeout: float = Field(default=30.0, gt=0.0)

omni_lb_policy class-attribute instance-attribute

omni_lb_policy: str = 'random'

omni_master_address class-attribute instance-attribute

omni_master_address: str | None = None

omni_master_port class-attribute instance-attribute

omni_master_port: int | None = None

ray_address class-attribute instance-attribute

ray_address: str | None = None

shm_threshold_bytes class-attribute instance-attribute

shm_threshold_bytes: int = Field(default=65536, ge=0)

stage_init_timeout class-attribute instance-attribute

stage_init_timeout: int = Field(default=300, ge=1)

worker_backend class-attribute instance-attribute

worker_backend: str = 'multi_process'