
vllm_omni.engine.arg_utils

SHARED_FIELDS module-attribute

SHARED_FIELDS: frozenset[str] = frozenset(
    {
        "model",
        "stage_id",
        "log_stats",
        "stage_configs_path",
        "async_chunk",
        "tokenizer",
    }
)

logger module-attribute

logger = init_logger(__name__)

OmniAsyncEngineArgs dataclass

Bases: AsyncEngineArgs, OmniEngineArgs

output_modality property

output_modality: OutputModality

Parse engine_output_type into a type-safe OutputModality flag.
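A minimal sketch of what parsing a string into a type-safe flag can look like, assuming `OutputModality` is an `enum.Flag`. The member names (`TEXT`, `IMAGE`, `AUDIO`, `LATENTS`), the comma-separated syntax, and the helper `parse_output_modality` are hypothetical illustrations, not the library's actual definitions.

```python
from __future__ import annotations

from enum import Flag, auto


class OutputModality(Flag):
    # Hypothetical members; the real OutputModality may define different ones.
    NONE = 0
    TEXT = auto()
    IMAGE = auto()
    AUDIO = auto()
    LATENTS = auto()


def parse_output_modality(engine_output_type: str | None) -> OutputModality:
    """Map an engine_output_type string to an OutputModality flag (sketch).

    None maps to NONE so a caller can fall back to inference; unknown names
    raise ValueError instead of being silently dropped.
    """
    if engine_output_type is None:
        return OutputModality.NONE
    result = OutputModality.NONE
    # Assumption: combinations such as "text,audio" are comma-separated.
    for name in engine_output_type.split(","):
        key = name.strip().upper()
        try:
            result |= OutputModality[key]
        except KeyError:
            raise ValueError(f"unknown engine_output_type: {name!r}") from None
    return result
```

Using a `Flag` rather than raw strings lets callers test membership with `OutputModality.AUDIO in modality` and turns typos into errors at parse time.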

add_cli_args classmethod

add_cli_args(parser: ArgumentParser) -> ArgumentParser

OmniEngineArgs dataclass

Bases: EngineArgs

Engine arguments for omni models, extending the base EngineArgs. Adds omni-specific configuration fields for multi-stage pipeline processing and output type specification.

Args:
    stage_id: Identifier for the stage in a multi-stage pipeline. Defaults to 0 for per-stage engine construction. The CLI-level single-stage selector remains optional on the parsed argparse namespace and should not be forwarded as a nullable per-stage engine argument.
    model_stage: Stage type identifier, e.g. "thinker" or "talker" (default: "thinker").
    model_arch: Model architecture name (default: "Qwen2_5OmniForConditionalGeneration").
    engine_output_type: Optional output type specification for the engine, used to route outputs to the appropriate processors (e.g. "image", "audio", "latents"). If None, the output type is inferred.
    hf_config_name: Optional name of the HF config subkey to extract for this stage, e.g. talker_config. If None, the default HF config is used.
    custom_process_next_stage_input_func: Optional path to a custom function that processes inputs from previous stages. If None, default processing is used.
    stage_connector_spec: Extra configuration for the stage connector.
    async_chunk: If True, enable async chunk processing.
    worker_type: Model type, e.g. "ar" or "generation".
    task_type: Default task type for TTS models (CustomVoice, VoiceDesign, or Base). If not specified, it is inferred from the model path.
    omni_master_address: TCP address on which the OmniMasterServer (running inside AsyncOmniEngine) listens for engine core registrations. Required when single-stage mode is active.
    omni_master_port: TCP port for the OmniMasterServer registration socket. Required when single-stage mode is active.
    stage_configs_path: Optional path to a JSON/YAML file containing stage configurations for the multi-stage pipeline. If None, stage configs are resolved from the model's default configuration.
    output_modalities: Optional list of output modality names to enable (e.g. ["text", "audio"]). If None, all modalities supported by the model are used.
    log_stats: Whether to log engine statistics. Defaults to False.
    custom_pipeline_args: Dictionary of arguments for custom pipeline initialization (e.g. {"pipeline_class": "my.Module"}). Passed through to the diffusion stage engine.
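To make the per-stage fields concrete, here is a sketch of describing a two-stage thinker/talker pipeline. `StageArgs` is a hypothetical stand-in that mirrors only a few OmniEngineArgs fields (defaults taken from the field listing on this page); the real class extends vLLM's EngineArgs and is not constructed this way in any documented workflow shown here. The example values come from the docstring above.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass
class StageArgs:
    """Hypothetical stand-in for a subset of OmniEngineArgs fields."""

    model: str
    stage_id: int = 0
    model_stage: str = "thinker"
    hf_config_name: str | None = None
    engine_output_type: str | None = None
    worker_type: str | None = None


def build_thinker_talker(model: str) -> list[StageArgs]:
    # Stage 0: the thinker emits text using an autoregressive ("ar") worker.
    thinker = StageArgs(model=model, stage_id=0, model_stage="thinker",
                        engine_output_type="text", worker_type="ar")
    # Stage 1: the talker reads its own HF sub-config and emits audio.
    # worker_type is simply carried over from stage 0 here (an assumption).
    talker = replace(thinker, stage_id=1, model_stage="talker",
                     hf_config_name="talker_config", engine_output_type="audio")
    return [thinker, talker]
```

Deriving the second stage with `dataclasses.replace` keeps shared settings such as `model` identical across stages while overriding only the stage-specific fields.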

active_stream_window class-attribute instance-attribute

active_stream_window: int = 0

async_chunk class-attribute instance-attribute

async_chunk: bool = False

custom_pipeline_args class-attribute instance-attribute

custom_pipeline_args: dict[str, Any] | None = None

custom_process_next_stage_input_func class-attribute instance-attribute

custom_process_next_stage_input_func: str | None = None

enable_sleep_mode class-attribute instance-attribute

enable_sleep_mode: bool = False

engine_output_type class-attribute instance-attribute

engine_output_type: str | None = None

force_cutlass_fp8 class-attribute instance-attribute

force_cutlass_fp8: bool | None = None

has_sampling_extra_args class-attribute instance-attribute

has_sampling_extra_args: bool = False

hf_config_name class-attribute instance-attribute

hf_config_name: str | None = None

log_stats class-attribute instance-attribute

log_stats: bool = False

model_arch class-attribute instance-attribute

model_arch: str | None = None

model_stage class-attribute instance-attribute

model_stage: str = 'thinker'

omni class-attribute instance-attribute

omni: bool = False

omni_dp_size_local class-attribute instance-attribute

omni_dp_size_local: int = 1

omni_heartbeat_timeout class-attribute instance-attribute

omni_heartbeat_timeout: float = 30.0

omni_kv_config class-attribute instance-attribute

omni_kv_config: dict | None = None

omni_lb_policy class-attribute instance-attribute

omni_lb_policy: str = 'random'

omni_master_address class-attribute instance-attribute

omni_master_address: str | None = None

omni_master_port class-attribute instance-attribute

omni_master_port: int | None = None

output_modalities class-attribute instance-attribute

output_modalities: list[str] | None = None

quantization_config class-attribute instance-attribute

quantization_config: Any | None = None

stage_configs_path class-attribute instance-attribute

stage_configs_path: str | None = None

stage_connector_spec class-attribute instance-attribute

stage_connector_spec: dict[str, Any] = field(
    default_factory=dict
)

stage_id class-attribute instance-attribute

stage_id: int = 0

subtalker_sampling_params class-attribute instance-attribute

subtalker_sampling_params: dict[str, Any] | None = None

task_type class-attribute instance-attribute

task_type: str | None = None

worker_cls class-attribute instance-attribute

worker_cls: str = None

worker_type class-attribute instance-attribute

worker_type: str | None = None

create_model_config

create_model_config() -> OmniModelConfig

Create an OmniModelConfig from these engine arguments.

Returns:
    An OmniModelConfig instance with all configuration fields set.
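A minimal sketch of one way an args-to-config factory can be written: copy every argument whose name matches a field on the config dataclass. `EngineArgsSubset`, `ModelConfigSubset`, and the name-matching strategy are all assumptions for illustration; the real method may set each field explicitly and also build vLLM's base model config.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass
class EngineArgsSubset:
    # Hypothetical subset of OmniEngineArgs.
    model: str
    stage_id: int = 0
    model_stage: str = "thinker"
    engine_output_type: str | None = None
    log_stats: bool = False  # in this sketch, not part of the model config


@dataclass
class ModelConfigSubset:
    # Hypothetical subset of OmniModelConfig.
    model: str
    stage_id: int = 0
    model_stage: str = "thinker"
    engine_output_type: str | None = None


def create_model_config(args: EngineArgsSubset) -> ModelConfigSubset:
    # Copy each argument whose name matches a config field; ignore the rest.
    config_names = {f.name for f in fields(ModelConfigSubset)}
    kwargs = {f.name: getattr(args, f.name)
              for f in fields(args) if f.name in config_names}
    return ModelConfigSubset(**kwargs)
```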

OrchestratorArgs dataclass

CLI flags consumed by the orchestrator.

Every field here is either:

(a) orchestrator-only (never needed by a stage engine), or
(b) read by the orchestrator and then redistributed (e.g. async_chunk is read from the CLI, written to DeployConfig, and then propagated to every stage via merge_pipeline_deploy rather than by direct kwargs forwarding).

Fields that both the orchestrator and the engine genuinely need (e.g. model, log_stats) should be listed in SHARED_FIELDS.
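Category (b) can be illustrated with a small read-then-redistribute sketch. The function `redistribute` and the `"stages"` / `"engine_args"` keys are hypothetical and do not reflect the real DeployConfig layout or the signature of merge_pipeline_deploy. The sketch only shows the pattern: the flag is read from the CLI once, recorded at the deploy level, then copied into every stage. A value of None is treated as "not set", matching the `bool | None = None` default of OrchestratorArgs.async_chunk.

```python
from __future__ import annotations

import copy
from typing import Any


def redistribute(deploy_config: dict[str, Any], cli: dict[str, Any],
                 key: str = "async_chunk") -> dict[str, Any]:
    """Hypothetical sketch: propagate one CLI flag into every stage config."""
    merged = copy.deepcopy(deploy_config)  # never mutate the caller's config
    value = cli.get(key)
    if value is None:
        # Not set on the CLI: keep whatever the deploy config already says.
        return merged
    merged[key] = value
    for stage in merged.get("stages", []):
        # The CLI value overrides any per-stage setting.
        stage.setdefault("engine_args", {})[key] = value
    return merged
```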

async_chunk class-attribute instance-attribute

async_chunk: bool | None = None

auxiliary_text_encoder class-attribute instance-attribute

auxiliary_text_encoder: str | None = None

batch_timeout class-attribute instance-attribute

batch_timeout: int = 10

boundary_ratio class-attribute instance-attribute

boundary_ratio: float | None = None

cache_backend class-attribute instance-attribute

cache_backend: str = 'none'

cache_config class-attribute instance-attribute

cache_config: str | None = None

cfg_parallel_size class-attribute instance-attribute

cfg_parallel_size: int = 1

default_sampling_params class-attribute instance-attribute

default_sampling_params: str | None = None

deploy_config class-attribute instance-attribute

deploy_config: str | None = None

diffusers_call_kwargs class-attribute instance-attribute

diffusers_call_kwargs: str = '{}'

diffusers_load_kwargs class-attribute instance-attribute

diffusers_load_kwargs: str = '{}'

diffusion_attention_backend class-attribute instance-attribute

diffusion_attention_backend: str | None = None

diffusion_attention_config class-attribute instance-attribute

diffusion_attention_config: str | None = None

diffusion_kv_cache_dtype class-attribute instance-attribute

diffusion_kv_cache_dtype: str | None = None

diffusion_kv_cache_skip_layers class-attribute instance-attribute

diffusion_kv_cache_skip_layers: str | None = None

diffusion_kv_cache_skip_steps class-attribute instance-attribute

diffusion_kv_cache_skip_steps: str | None = None

diffusion_load_format class-attribute instance-attribute

diffusion_load_format: str | None = None

diffusion_quantization_config class-attribute instance-attribute

diffusion_quantization_config: str | None = None

enable_ar_profiler class-attribute instance-attribute

enable_ar_profiler: bool = False

enable_cache_dit_summary class-attribute instance-attribute

enable_cache_dit_summary: bool = False

enable_cpu_offload class-attribute instance-attribute

enable_cpu_offload: bool = False

enable_diffusion_pipeline_profiler class-attribute instance-attribute

enable_diffusion_pipeline_profiler: bool = False

enable_layerwise_offload class-attribute instance-attribute

enable_layerwise_offload: bool = False

enable_multithread_weight_load class-attribute instance-attribute

enable_multithread_weight_load: bool = True

flow_shift class-attribute instance-attribute

flow_shift: float | None = None

hsdp_replicate_size class-attribute instance-attribute

hsdp_replicate_size: int = 1

hsdp_shard_size class-attribute instance-attribute

hsdp_shard_size: int = -1

init_timeout class-attribute instance-attribute

init_timeout: int = 600

log_file class-attribute instance-attribute

log_file: str | None = None

log_stats class-attribute instance-attribute

log_stats: bool = False

max_generated_image_size class-attribute instance-attribute

max_generated_image_size: int | None = None

model_class_name class-attribute instance-attribute

model_class_name: str | None = None

num_gpus class-attribute instance-attribute

num_gpus: int | None = None

num_weight_load_threads class-attribute instance-attribute

num_weight_load_threads: int = 4

omni_replica_address class-attribute instance-attribute

omni_replica_address: str | None = None

parallel_config class-attribute instance-attribute

parallel_config: Any = None

ray_address class-attribute instance-attribute

ray_address: str | None = None

replica_id class-attribute instance-attribute

replica_id: int | None = None

ring_degree class-attribute instance-attribute

ring_degree: int | None = None

shm_threshold_bytes class-attribute instance-attribute

shm_threshold_bytes: int = 65536

stage_configs_path class-attribute instance-attribute

stage_configs_path: str | None = None

stage_id class-attribute instance-attribute

stage_id: int | None = None

stage_init_timeout class-attribute instance-attribute

stage_init_timeout: int = 300

stage_overrides class-attribute instance-attribute

stage_overrides: str | None = None

step_execution class-attribute instance-attribute

step_execution: bool = False

tokenizer class-attribute instance-attribute

tokenizer: str | None = None

tts_max_instructions_length class-attribute instance-attribute

tts_max_instructions_length: int | None = None

ulysses_degree class-attribute instance-attribute

ulysses_degree: int | None = None

ulysses_mode class-attribute instance-attribute

ulysses_mode: str = 'strict'

use_hsdp class-attribute instance-attribute

use_hsdp: bool = False

vae_patch_parallel_size class-attribute instance-attribute

vae_patch_parallel_size: int = 1

vae_use_slicing class-attribute instance-attribute

vae_use_slicing: bool = False

vae_use_tiling class-attribute instance-attribute

vae_use_tiling: bool = False

worker_backend class-attribute instance-attribute

worker_backend: str = 'multi_process'

internal_blacklist_keys

internal_blacklist_keys() -> frozenset[str]

Return the set of CLI keys that must never be forwarded as per-stage engine overrides.

Derived from the OrchestratorArgs fields minus SHARED_FIELDS, so adding a new orchestrator-owned flag is a one-line change to the dataclass; this function picks it up automatically.

orchestrator_field_names

orchestrator_field_names() -> frozenset[str]

Return the names of every field on OrchestratorArgs.
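The two helpers above can be sketched with `dataclasses.fields`, assuming the blacklist is computed as a plain set difference, as the description states. The dataclass here keeps only a handful of the OrchestratorArgs fields listed on this page, and `engine_overrides` is a hypothetical helper showing how the blacklist might be applied before forwarding CLI kwargs to a stage.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any

SHARED_FIELDS: frozenset[str] = frozenset(
    {"model", "stage_id", "log_stats", "stage_configs_path", "async_chunk", "tokenizer"}
)


@dataclass
class OrchestratorArgs:
    # A few of the real fields documented above; the rest are omitted.
    async_chunk: bool | None = None
    deploy_config: str | None = None
    init_timeout: int = 600
    log_stats: bool = False
    stage_id: int | None = None
    tokenizer: str | None = None
    worker_backend: str = "multi_process"


def orchestrator_field_names() -> frozenset[str]:
    return frozenset(f.name for f in fields(OrchestratorArgs))


def internal_blacklist_keys() -> frozenset[str]:
    # Orchestrator-owned flags minus those the stage engines also need.
    return orchestrator_field_names() - SHARED_FIELDS


def engine_overrides(cli_kwargs: dict[str, Any]) -> dict[str, Any]:
    # Hypothetical helper: drop blacklisted keys before forwarding to a stage.
    blocked = internal_blacklist_keys()
    return {k: v for k, v in cli_kwargs.items() if k not in blocked}
```

Adding a field to the dataclass automatically adds it to the blacklist unless it is also listed in SHARED_FIELDS, which is the one-line-change property described above.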

register_omni_models_to_vllm

register_omni_models_to_vllm()