vllm_omni.diffusion.data ¶
AttentionConfig dataclass ¶
Per-role attention backend configuration.
Lookup precedence for a given (role, role_category):
1. per_role[role]: exact match
2. per_role[role_category]: category fallback (e.g. "ltx2.audio_to_video" → "cross")
3. default: global default
4. platform default: unchanged platform logic
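The precedence order can be illustrated with a small stand-alone sketch. `AttentionConfigSketch`, `resolve_backend`, and the backend names are hypothetical and backends are simplified to plain strings; this is not the library's implementation, only one way the four steps could be expressed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AttentionConfigSketch:
    """Hypothetical stand-in for AttentionConfig (backends are plain strings)."""

    per_role: dict[str, str] = field(default_factory=dict)
    default: str | None = None


def resolve_backend(
    config: AttentionConfigSketch,
    role: str,
    role_category: str | None,
    platform_default: str,
) -> str:
    # 1. Exact per-role match.
    if role in config.per_role:
        return config.per_role[role]
    # 2. Category fallback.
    if role_category is not None and role_category in config.per_role:
        return config.per_role[role_category]
    # 3. Global default.
    if config.default is not None:
        return config.default
    # 4. Platform default (assumed to be supplied by the caller in this sketch).
    return platform_default


# "backend_a" and "backend_b" are placeholder names, not real backends.
config = AttentionConfigSketch(per_role={"cross": "backend_a"}, default="backend_b")
cross_choice = resolve_backend(config, "ltx2.audio_to_video", "cross", "platform")
other_choice = resolve_backend(config, "self", "self", "platform")
```

Here `cross_choice` comes from the category fallback and `other_choice` from the global default, because neither role has an exact entry in `per_role`.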
AttentionSpec dataclass ¶
DiffusionCacheConfig dataclass ¶
Configuration for cache adapters (TeaCache, cache-dit, MagCache, etc.).
This dataclass provides a unified interface for cache configuration parameters. It can be initialized from a dictionary and accessed via attributes.
Common parameters
- TeaCache: rel_l1_thresh, coefficients (optional)
- cache-dit: Fn_compute_blocks, Bn_compute_blocks, max_warmup_steps, residual_diff_threshold, enable_taylorseer, taylorseer_order, scm_steps_mask_policy, scm_steps_policy
- MagCache: mag_threshold, mag_max_skip_steps, mag_retention_ratio, mag_ratios, mag_calibrate
Example
# From dict (user-facing API): a partial config uses defaults for missing keys
config = DiffusionCacheConfig.from_dict({"rel_l1_thresh": 0.3})
# Access via attribute
print(config.rel_l1_thresh)  # 0.3 (from dict)
print(config.Fn_compute_blocks)  # 8 (default)
# Empty dict uses all defaults
default_config = DiffusionCacheConfig.from_dict({})
print(default_config.rel_l1_thresh)  # 0.2 (default)
force_refresh_step_hint class-attribute instance-attribute ¶
force_refresh_step_hint: int | None = None
force_refresh_step_policy class-attribute instance-attribute ¶
force_refresh_step_policy: str = 'once'
max_continuous_cached_steps class-attribute instance-attribute ¶
max_continuous_cached_steps: int = 3
from_dict classmethod ¶
from_dict(data: dict[str, Any]) -> DiffusionCacheConfig
Create DiffusionCacheConfig from a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict[str, Any] | Dictionary containing cache configuration parameters | required |
Returns:
| Type | Description |
|---|---|
| DiffusionCacheConfig | DiffusionCacheConfig instance with parameters set from dict |
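A minimal sketch of the "partial dict plus defaults" pattern that from_dict describes. `CacheConfigSketch` is a hypothetical stand-in with only three fields; the default values are the ones this page reports (0.2, 8, 3). How the real class treats unknown keys is not documented here, so ignoring them is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class CacheConfigSketch:
    """Hypothetical, reduced stand-in for DiffusionCacheConfig."""

    rel_l1_thresh: float = 0.2
    Fn_compute_blocks: int = 8
    max_continuous_cached_steps: int = 3

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "CacheConfigSketch":
        known = {f.name for f in fields(cls)}
        # Assumption: keys that are not dataclass fields are silently dropped.
        return cls(**{key: value for key, value in data.items() if key in known})


partial = CacheConfigSketch.from_dict({"rel_l1_thresh": 0.3})
defaults = CacheConfigSketch.from_dict({})
```

Keys present in the dictionary override the dataclass defaults; everything else keeps its default value.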
DiffusionOutput dataclass ¶
Final output (after pipeline completion)
custom_output class-attribute instance-attribute ¶
output class-attribute instance-attribute ¶
post_process_func class-attribute instance-attribute ¶
stage_durations class-attribute instance-attribute ¶
trajectory_decoded class-attribute instance-attribute ¶
trajectory_latents class-attribute instance-attribute ¶
trajectory_log_probs class-attribute instance-attribute ¶
DiffusionParallelConfig dataclass ¶
Configuration for diffusion model distributed execution.
cfg_parallel_size class-attribute instance-attribute ¶
cfg_parallel_size: int = 1
Number of classifier-free guidance (CFG) parallel groups.
data_parallel_size class-attribute instance-attribute ¶
data_parallel_size: int = 1
Number of data parallel groups.
enable_expert_parallel class-attribute instance-attribute ¶
enable_expert_parallel: bool = False
Enable expert parallelism for MoE layers (TP is still used for non-MoE layers).
hsdp_replicate_size class-attribute instance-attribute ¶
hsdp_replicate_size: int = 1
Number of replica groups for HSDP. Each replica holds a full sharded copy.
hsdp_shard_size class-attribute instance-attribute ¶
hsdp_shard_size: int = -1
Number of GPUs to shard weights across within each replica group. -1 means auto-calculate.
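The page does not state how the -1 value is resolved. One plausible rule, shown purely as an assumption, is to divide the available ranks evenly across the replica groups. `resolve_hsdp_shard_size` and `world_size` are hypothetical names for this sketch.

```python
def resolve_hsdp_shard_size(
    world_size: int,
    hsdp_replicate_size: int = 1,
    hsdp_shard_size: int = -1,
) -> int:
    """Return the shard size, auto-calculating it when -1 is given."""
    if hsdp_shard_size != -1:
        # An explicit value is returned unchanged in this sketch.
        return hsdp_shard_size
    if hsdp_replicate_size < 1 or world_size % hsdp_replicate_size != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"hsdp_replicate_size={hsdp_replicate_size}"
        )
    # Assumed rule: each replica group shards across an equal share of the ranks.
    return world_size // hsdp_replicate_size


auto_shard = resolve_hsdp_shard_size(world_size=8, hsdp_replicate_size=2)
```

With 8 ranks and 2 replica groups, this assumed rule yields 4 ranks per shard group.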
pipeline_parallel_size class-attribute instance-attribute ¶
pipeline_parallel_size: int = 1
Number of pipeline parallel stages.
ring_degree class-attribute instance-attribute ¶
ring_degree: int = 1
Number of GPUs used for ring sequence parallelism.
sequence_parallel_size class-attribute instance-attribute ¶
sequence_parallel_size: int | None = None
Number of sequence parallel groups, where sequence_parallel_size = ring_degree * ulysses_degree.
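The relation between the three sequence-parallel fields can be checked with a few lines. This is a sketch under two assumptions that the page does not confirm: a value of None is resolved to the product, and a mismatching explicit value is rejected. `resolve_sequence_parallel_size` is a hypothetical helper.

```python
from __future__ import annotations


def resolve_sequence_parallel_size(
    ring_degree: int = 1,
    ulysses_degree: int = 1,
    sequence_parallel_size: int | None = None,
) -> int:
    """Return a sequence_parallel_size consistent with ring and ulysses degrees."""
    product = ring_degree * ulysses_degree
    if sequence_parallel_size is None:
        # Assumption: an unset size is derived from the two degrees.
        return product
    if sequence_parallel_size != product:
        raise ValueError(
            f"sequence_parallel_size={sequence_parallel_size} != "
            f"ring_degree * ulysses_degree = {product}"
        )
    return sequence_parallel_size


derived = resolve_sequence_parallel_size(ring_degree=2, ulysses_degree=4)
```

For example, a ring degree of 2 combined with a Ulysses degree of 4 corresponds to a sequence parallel size of 8.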
tensor_parallel_size class-attribute instance-attribute ¶
tensor_parallel_size: int = 1
Number of tensor parallel groups.
ulysses_degree class-attribute instance-attribute ¶
ulysses_degree: int = 1
Number of GPUs used for ulysses sequence parallelism.
ulysses_mode class-attribute instance-attribute ¶
ulysses_mode: str = 'strict'
Ulysses sequence-parallel mode.
- "strict": Require divisibility constraints (fastest, default).
- "advanced_uaa": Enable UAA ("Ulysses Anything Attention") to support uneven sequence lengths and non-divisible head counts.
Note:
- Ring attention does not support attention_mask, so models that rely on mask-based auto-padding are still incompatible with Ring.
- When used in hybrid Ulysses+Ring, Ring requires consistent per-rank sequence shapes across the ring group.
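The divisibility constraints of "strict" mode are not spelled out on this page. Reading them from the "advanced_uaa" description, a reasonable assumption is that both the sequence length and the attention head count must divide evenly by ulysses_degree. The helper below is a hypothetical pre-flight check built on that assumption, not library code.

```python
from __future__ import annotations


def check_strict_ulysses(seq_len: int, num_heads: int, ulysses_degree: int) -> list[str]:
    """Return the reasons a shape would violate the assumed strict-mode constraints."""
    problems: list[str] = []
    if seq_len % ulysses_degree != 0:
        problems.append(
            f"sequence length {seq_len} is not divisible by ulysses_degree {ulysses_degree}"
        )
    if num_heads % ulysses_degree != 0:
        problems.append(
            f"head count {num_heads} is not divisible by ulysses_degree {ulysses_degree}"
        )
    return problems


ok = check_strict_ulysses(seq_len=4096, num_heads=24, ulysses_degree=4)
uneven = check_strict_ulysses(seq_len=4097, num_heads=10, ulysses_degree=4)
```

An empty list suggests "strict" is usable for that shape; a non-empty list is the kind of case "advanced_uaa" is described as supporting.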
use_hsdp class-attribute instance-attribute ¶
use_hsdp: bool = False
Enable Hybrid Sharded Data Parallel (HSDP) for model weight sharding.
vae_patch_parallel_size class-attribute instance-attribute ¶
vae_patch_parallel_size: int = 1
Number of ranks used for VAE patch/tile parallelism (decode/encode).
from_dict classmethod ¶
from_dict(data: dict[str, Any]) -> DiffusionParallelConfig
Create DiffusionParallelConfig from a dictionary.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict[str, Any] | Dictionary containing parallel configuration parameters | required |
Returns:
| Type | Description |
|---|---|
| DiffusionParallelConfig | DiffusionParallelConfig instance with parameters set from dict |
DiffusionRequestAbortedError ¶
Bases: RuntimeError
Raised when a diffusion request ends via user-visible abort.
OmniACK dataclass ¶
Handshake payload from Workers to Orchestrator.
metadata class-attribute instance-attribute ¶
Additional telemetry such as:
- max_contiguous_block: for fragmentation analysis.
- cuda_graph_recalled: boolean indicating whether graphs were successfully destroyed/rebuilt.
- latency_ms: time taken for the D2H/H2D transfer.
OmniDiffusionConfig dataclass ¶
additional_config class-attribute instance-attribute ¶
cache_config class-attribute instance-attribute ¶
custom_pipeline_args class-attribute instance-attribute ¶
diffusers_call_kwargs class-attribute instance-attribute ¶
diffusers_load_kwargs class-attribute instance-attribute ¶
diffusers_pipeline_cls class-attribute instance-attribute ¶
diffusers_pipeline_cls: type[DiffusionPipeline] | None = None
diffusion_attention_config class-attribute instance-attribute ¶
diffusion_attention_config: AttentionConfig = field(
    default_factory=lambda: AttentionConfig()
)
diffusion_kv_cache_dtype class-attribute instance-attribute ¶
diffusion_kv_cache_dtype: str | None = None
diffusion_kv_cache_skip_layer_indices class-attribute instance-attribute ¶
diffusion_kv_cache_skip_layers class-attribute instance-attribute ¶
diffusion_kv_cache_skip_layers: str | None = None
diffusion_kv_cache_skip_step_indices class-attribute instance-attribute ¶
diffusion_kv_cache_skip_steps class-attribute instance-attribute ¶
diffusion_kv_cache_skip_steps: str | None = None
distributed_executor_backend class-attribute instance-attribute ¶
distributed_executor_backend: str = 'mp'
enable_cache_dit_summary class-attribute instance-attribute ¶
enable_cache_dit_summary: bool = False
enable_diffusion_pipeline_profiler class-attribute instance-attribute ¶
enable_diffusion_pipeline_profiler: bool = False
enable_layerwise_offload class-attribute instance-attribute ¶
enable_layerwise_offload: bool = False
enable_multithread_weight_load class-attribute instance-attribute ¶
enable_multithread_weight_load: bool = True
enable_prompt_embed_cache class-attribute instance-attribute ¶
enable_prompt_embed_cache: bool = False
enable_stage_verification class-attribute instance-attribute ¶
enable_stage_verification: bool = True
mask_strategy_file_path class-attribute instance-attribute ¶
mask_strategy_file_path: str | None = None
max_multimodal_image_inputs class-attribute instance-attribute ¶
max_multimodal_image_inputs: int | None = None
model_config class-attribute instance-attribute ¶
model_loaded class-attribute instance-attribute ¶
model_loaded: dict[str, bool] = field(
    default_factory=lambda: {
        "transformer": True,
        "vae": True,
    }
)
model_paths class-attribute instance-attribute ¶
omni_kv_config class-attribute instance-attribute ¶
override_transformer_cls_name class-attribute instance-attribute ¶
override_transformer_cls_name: str | None = None
parallel_config class-attribute instance-attribute ¶
parallel_config: DiffusionParallelConfig = field(
    default_factory=DiffusionParallelConfig
)
profiler_config class-attribute instance-attribute ¶
quantization_config class-attribute instance-attribute ¶
supports_multimodal_inputs class-attribute instance-attribute ¶
supports_multimodal_inputs: bool = False
tf_model_config class-attribute instance-attribute ¶
tf_model_config: TransformerConfig = field(
    default_factory=TransformerConfig
)
enrich_config ¶
Load model metadata from HuggingFace and populate config fields.
Diffusers-style models expose model_index.json with _class_name. Non-diffusers models (e.g. Bagel, NextStep) only have config.json, so we fall back to reading that and mapping model_type manually.
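The fallback described above can be sketched for a local model directory. `detect_model_kind` is a hypothetical helper, the example class and type names are placeholders, and the real method also populates config fields and maps model_type to a pipeline, which this sketch does not attempt.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def detect_model_kind(model_dir: str | Path) -> tuple[str, str]:
    """Return (file_used, identifier) for a local model directory."""
    root = Path(model_dir)
    index_path = root / "model_index.json"
    if index_path.is_file():
        # Diffusers-style layout: the pipeline class is named in _class_name.
        class_name = json.loads(index_path.read_text(encoding="utf-8")).get("_class_name")
        if class_name:
            return "model_index.json", class_name
    config_path = root / "config.json"
    if config_path.is_file():
        # Non-diffusers layout: fall back to model_type from config.json.
        model_type = json.loads(config_path.read_text(encoding="utf-8")).get("model_type")
        if model_type:
            return "config.json", model_type
    raise ValueError(f"no usable model metadata found in {root}")


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "model_index.json").write_text(
        json.dumps({"_class_name": "ExamplePipeline"}), encoding="utf-8"
    )
    diffusers_style = detect_model_kind(tmp)

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "config.json").write_text(
        json.dumps({"model_type": "example_type"}), encoding="utf-8"
    )
    config_only = detect_model_kind(tmp)
```

The first directory resolves through model_index.json, the second through the config.json fallback.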
set_tf_model_config ¶
set_tf_model_config(tf_config: TransformerConfig) -> None
Assign tf_model_config and propagate quantization if detected.
In the normal startup flow OmniDiffusionConfig is created before the transformer config.json is loaded from disk, so __post_init__ sees an empty TransformerConfig. Callers that load the config later should use this method instead of bare assignment so that an embedded quant_config is propagated to self.quantization_config automatically.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| tf_config | TransformerConfig | Transformer configuration, typically built via | required |
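Why a setter is preferred over bare assignment can be shown with two reduced stand-in classes. Everything here is hypothetical: the class names, the `params` field, looking up `"quant_config"` as a dictionary key, the `"fp8"` value, and the choice not to overwrite an explicitly set quantization config.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class TransformerConfigSketch:
    """Hypothetical stand-in holding a raw transformer config dictionary."""

    params: dict[str, Any] = field(default_factory=dict)


@dataclass
class DiffusionConfigSketch:
    """Hypothetical stand-in for the two relevant OmniDiffusionConfig fields."""

    tf_model_config: TransformerConfigSketch = field(default_factory=TransformerConfigSketch)
    quantization_config: dict[str, Any] | None = None

    def set_tf_model_config(self, tf_config: TransformerConfigSketch) -> None:
        self.tf_model_config = tf_config
        quant = tf_config.params.get("quant_config")
        # Assumption: propagate only when a quant config is embedded and none was set.
        if quant is not None and self.quantization_config is None:
            self.quantization_config = quant


loaded = TransformerConfigSketch({"quant_config": {"method": "fp8"}})  # illustrative value

via_setter = DiffusionConfigSketch()
via_setter.set_tf_model_config(loaded)

via_assignment = DiffusionConfigSketch()
via_assignment.tf_model_config = loaded  # bare assignment skips propagation
```

After the setter call `via_setter.quantization_config` is populated, while `via_assignment.quantization_config` is still None.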
settle_port ¶
Find an available port with retry logic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| port | int | Initial port to check | required |
| port_inc | int | Port increment for each attempt | 42 |
| max_attempts | int | Maximum number of attempts to find an available port | 100 |
Returns:
| Type | Description |
|---|---|
| int | An available port number |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If no available port is found after max_attempts |
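A minimal sketch of a retry loop matching the documented parameters and defaults. The availability test (a TCP bind on 127.0.0.1) and the upper port bound are assumptions; the real method may probe ports differently. `is_port_free` and `settle_port_sketch` are hypothetical names.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def settle_port_sketch(port: int, port_inc: int = 42, max_attempts: int = 100) -> int:
    """Step from `port` in increments of `port_inc` until a free port is found."""
    candidate = port
    for _ in range(max_attempts):
        if candidate > 65535:
            # Past the valid TCP port range; stop probing.
            break
        if is_port_free(candidate):
            return candidate
        candidate += port_inc
    raise RuntimeError(
        f"no available port found starting at {port} after {max_attempts} attempts"
    )


chosen = settle_port_sketch(20000)
```

Note that a port reported as free can be taken by another process before the caller binds it, so callers should still handle bind failures.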
OmniSleepTask dataclass ¶
OmniWakeTask dataclass ¶
TransformerConfig dataclass ¶
Container for raw transformer configuration dictionaries.
build_attention_config ¶
build_attention_config(
    attention_config: AttentionConfig
    | Mapping[str, Any]
    | None = None,
) -> AttentionConfig
Normalize diffusion attention config — the single authoritative entry point.
Called exactly once in OmniDiffusionConfig.__post_init__. Handles type-conversion and env-var fallback (DIFFUSION_ATTENTION_BACKEND).
parse_attention_config ¶
parse_attention_config(
    attention_config: AttentionConfig
    | Mapping[str, Any]
    | None = None,
    *,
    attention_backend: str | None = None,
) -> AttentionConfig
Pure type-conversion: coerce attention_config to an AttentionConfig.
Optionally merges an attention_backend shorthand into the config's default field. This function does not read environment variables; use build_attention_config for the full normalisation, which should happen exactly once in OmniDiffusionConfig.__post_init__.
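The split between a pure conversion step and an environment-aware wrapper can be sketched as follows. The stand-in class, both function names, and the `env` parameter are hypothetical. Two behaviours are assumptions the page does not confirm: the shorthand only fills an unset default, and an explicit default takes priority over DIFFUSION_ATTENTION_BACKEND.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AttentionConfigSketch:
    """Hypothetical stand-in for AttentionConfig (backends are plain strings)."""

    per_role: dict[str, str] = field(default_factory=dict)
    default: str | None = None


def parse_sketch(
    attention_config: AttentionConfigSketch | Mapping[str, Any] | None = None,
    *,
    attention_backend: str | None = None,
) -> AttentionConfigSketch:
    """Pure type conversion; never touches the environment."""
    if attention_config is None:
        config = AttentionConfigSketch()
    elif isinstance(attention_config, AttentionConfigSketch):
        # Copy so the caller's object is not mutated.
        config = AttentionConfigSketch(
            per_role=dict(attention_config.per_role), default=attention_config.default
        )
    elif isinstance(attention_config, Mapping):
        config = AttentionConfigSketch(
            per_role=dict(attention_config.get("per_role", {})),
            default=attention_config.get("default"),
        )
    else:
        raise TypeError(f"unsupported attention config type: {type(attention_config)!r}")
    # Assumption: the shorthand only fills a default that is not already set.
    if attention_backend is not None and config.default is None:
        config.default = attention_backend
    return config


def build_sketch(
    attention_config: AttentionConfigSketch | Mapping[str, Any] | None = None,
    env: Mapping[str, str] | None = None,
) -> AttentionConfigSketch:
    """Conversion plus env-var fallback; `env` is injectable for testing."""
    environ = os.environ if env is None else env
    return parse_sketch(
        attention_config, attention_backend=environ.get("DIFFUSION_ATTENTION_BACKEND")
    )


from_env = build_sketch(None, env={"DIFFUSION_ATTENTION_BACKEND": "backend_c"})
explicit = build_sketch({"default": "backend_a"}, env={"DIFFUSION_ATTENTION_BACKEND": "backend_c"})
```

Keeping the environment lookup in a single wrapper makes the conversion function deterministic and easy to test, which matches the stated intent of calling the full normalisation exactly once.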