vllm_omni.diffusion.models.interface

ReferenceVideoDecodeSpec `dataclass`

keep `class-attribute` `instance-attribute`

keep: Literal['first', 'last'] = 'first'

max_frames `class-attribute` `instance-attribute`

max_frames: int | None = None
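The page lists the two fields and their defaults but does not describe how they interact. The sketch below mirrors the dataclass locally and shows one plausible reading: when `max_frames` is set, `keep` decides whether the first or the last frames are retained. That interpretation is an assumption, and `select_frames` is a hypothetical helper, not part of `vllm_omni`.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal


@dataclass
class ReferenceVideoDecodeSpec:
    # Local mirror of the documented fields and defaults, for illustration only.
    keep: Literal["first", "last"] = "first"
    max_frames: int | None = None


def select_frames(frames: list, spec: ReferenceVideoDecodeSpec) -> list:
    """Hypothetical helper applying the ASSUMED truncation semantics."""
    # No limit, or already short enough: keep everything.
    if spec.max_frames is None or len(frames) <= spec.max_frames:
        return list(frames)
    if spec.keep == "first":
        return frames[: spec.max_frames]
    # keep == "last": slice from an explicit start index so max_frames=0 yields [].
    return frames[len(frames) - spec.max_frames:]
```

With ten frames and `max_frames=3`, this sketch keeps frames 0 to 2 under `keep="first"` and frames 7 to 9 under `keep="last"`.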

SupportAudioInput

Bases: Protocol

support_audio_input `class-attribute`

support_audio_input: bool = True

SupportAudioOutput

Bases: Protocol

support_audio_output `class-attribute`

support_audio_output: bool = True

SupportImageInput

Bases: Protocol

color_format `class-attribute`

color_format: str = 'RGB'

support_image_input `class-attribute`

support_image_input: bool = True
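The three capability protocols (`SupportAudioInput`, `SupportAudioOutput`, `SupportImageInput`) each expose a boolean class attribute. A minimal sketch of how a pipeline might declare those flags and how a caller might read them follows. The toy classes and `describe_capabilities` are hypothetical, and checking with `getattr` is an assumption about usage rather than the framework's documented mechanism.

```python
class ToyImageToVideoPipeline:
    # Hypothetical pipeline that declares the documented class attributes.
    support_image_input = True
    color_format = "RGB"
    support_audio_output = True


class ToyTextOnlyPipeline:
    # Declares nothing, so every capability falls back to "unsupported".
    pass


def describe_capabilities(pipeline_cls: type) -> dict:
    """Hypothetical helper: read the documented flags with safe defaults."""
    return {
        "image_input": getattr(pipeline_cls, "support_image_input", False),
        "audio_input": getattr(pipeline_cls, "support_audio_input", False),
        "audio_output": getattr(pipeline_cls, "support_audio_output", False),
        "color_format": getattr(pipeline_cls, "color_format", None),
    }
```

Reading the flags from the class rather than an instance works here because they are documented as class attributes.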

SupportsComponentDiscovery

Bases: Protocol

Declares which submodules serve as pipeline components.

Used by the framework to locate DiT, encoder, and VAE modules for CPU offload, HSDP sharding, and other operations that need to know the pipeline's internal structure.

All attribute names support dotted paths for nested submodules (e.g. "pipe.transformer").

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `_dit_modules` | `list[str]` | Denoising submodules (on GPU during diffusion). |
| `_encoder_modules` | `list[str]` | Encoder submodules (offloaded during diffusion). |
| `_vae_modules` | `list[str]` | VAE(s) (always on GPU). |
| `_resident_modules` | `list[str]` | Extra modules pinned on GPU during layerwise offloading. Optional, defaults to `[]`. |
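To make the attribute table and the dotted-path convention concrete, here is a minimal sketch of a pipeline that declares its components and a resolver that follows paths such as `"pipe.transformer"`. `ToyPipeline`, `resolve_path`, and `discover` are hypothetical names, and plain Python objects stand in for real submodules. How the framework itself resolves paths is not shown on this page.

```python
from functools import reduce
from types import SimpleNamespace


class ToyPipeline:
    """Hypothetical pipeline declaring its components by attribute path."""

    _dit_modules = ["pipe.transformer"]
    _encoder_modules = ["pipe.text_encoder"]
    _vae_modules = ["vae"]
    _resident_modules = []  # optional; the documented default is []

    def __init__(self):
        # Plain objects stand in for real submodules.
        self.pipe = SimpleNamespace(transformer=object(), text_encoder=object())
        self.vae = object()


def resolve_path(root, dotted):
    """Follow a dotted attribute path such as 'pipe.transformer'."""
    return reduce(getattr, dotted.split("."), root)


def discover(pipeline):
    """Hypothetical helper: map each documented group to its resolved modules."""
    groups = ("_dit_modules", "_encoder_modules", "_vae_modules", "_resident_modules")
    return {
        group: [resolve_path(pipeline, path) for path in getattr(pipeline, group, [])]
        for group in groups
    }
```

A path that does not exist raises `AttributeError` in this sketch, which surfaces a misdeclared component early instead of silently skipping it.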

SupportsStepExecution

Bases: Protocol

State-driven step-level execution protocol for diffusion pipelines.

Pipelines should split the request-level `forward()` into four methods: `prepare_encode()` (one-time request setup), `denoise_step()` (one denoise forward pass), `step_scheduler()` (one scheduler update), and `post_decode()` (final decode).
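The four-method split can be sketched with a toy pipeline and a driver loop. Everything here is illustrative: `ToyState` stands in for `DiffusionRequestState`, a plain list stands in for `InputBatch`, floats stand in for tensors, and `run_request` is a hypothetical driver, not the actual runner.

```python
from dataclasses import dataclass


@dataclass
class ToyState:
    # Hypothetical stand-in for DiffusionRequestState.
    latent: float = 0.0
    step: int = 0
    num_steps: int = 4


class ToyStepPipeline:
    supports_step_execution = True

    def prepare_encode(self, state, **kwargs):
        # One-time request setup: initialize the latent.
        state.latent = 16.0
        return state

    def denoise_step(self, input_batch, **kwargs):
        # One denoise forward over the batch (toy rule: predict half of each latent).
        return [latent / 2 for latent in input_batch]

    def step_scheduler(self, state, noise_pred, **kwargs):
        # One scheduler update: remove the predicted noise and advance the step.
        state.latent -= noise_pred
        state.step += 1

    def post_decode(self, state, **kwargs):
        # Final decode: package the result.
        return {"output": state.latent, "steps": state.step}


def run_request(pipeline, state):
    """Hypothetical driver: encode once, loop denoise + scheduler, decode once."""
    state = pipeline.prepare_encode(state)
    while state.step < state.num_steps:
        batch = [state.latent]  # a real runner would assemble the batch
        noise_pred = pipeline.denoise_step(batch)
        pipeline.step_scheduler(state, noise_pred[0])
    return pipeline.post_decode(state)
```

Keeping the denoise forward and the scheduler update as separate calls lets an external runner own the loop, which is what makes step-level batching across requests possible in principle.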

supports_step_execution `class-attribute`

supports_step_execution: bool = True

denoise_step

denoise_step(
    input_batch: InputBatch, **kwargs: Any
) -> Tensor | None

Run one denoise forward on the runner-assembled batch.

post_decode

post_decode(
    state: DiffusionRequestState, **kwargs: Any
) -> DiffusionOutput

Decode output after denoise loop or at a partial chunk boundary.

prepare_encode

prepare_encode(
    state: DiffusionRequestState, **kwargs: Any
) -> DiffusionRequestState

Prepare request-level inputs and return initialized state.

step_scheduler

step_scheduler(
    state: DiffusionRequestState,
    noise_pred: Tensor,
    **kwargs: Any,
) -> None

Run one scheduler step.

supports_step_execution

supports_step_execution(pipeline: object) -> bool

Return whether `pipeline` implements `SupportsStepExecution`.
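The page does not show how this check is implemented. One plausible approach, sketched below under that caveat, is to require the class flag to be truthy and all four protocol methods to be callable. `supports_step_execution_sketch` is a hypothetical name chosen to avoid implying this is the library's code.

```python
_STEP_METHODS = ("prepare_encode", "denoise_step", "step_scheduler", "post_decode")


def supports_step_execution_sketch(pipeline: object) -> bool:
    """Hypothetical check: flag is set and every step method is callable."""
    if not getattr(pipeline, "supports_step_execution", False):
        return False
    return all(callable(getattr(pipeline, name, None)) for name in _STEP_METHODS)


class CompletePipeline:
    supports_step_execution = True

    def prepare_encode(self, state, **kwargs):
        return state

    def denoise_step(self, input_batch, **kwargs):
        return None

    def step_scheduler(self, state, noise_pred, **kwargs):
        return None

    def post_decode(self, state, **kwargs):
        return state


class FlagOnlyPipeline:
    # Sets the flag but implements none of the methods.
    supports_step_execution = True
```

Checking the methods as well as the flag guards against a pipeline that sets the attribute without implementing the protocol.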
