Step Execution¶
Step execution is an opt-in diffusion execution mode enabled with step_execution=True when constructing Omni.
It is not a generic diffusion toggle for every pipeline. Only pipelines that implement the stepwise contract support it today.
Quick Start¶
Python API¶
from vllm_omni import Omni
from vllm_omni.inputs.data import OmniDiffusionSamplingParams
omni = Omni(
model="Qwen/Qwen-Image",
step_execution=True,
)
outputs = omni.generate(
"A cat sitting on a windowsill",
OmniDiffusionSamplingParams(
num_inference_steps=50,
),
)
Serving¶
For serving, --step-execution enables the step-wise runtime. Continuous batching only becomes relevant when --max-num-seqs > 1.
Supported Pipelines¶
| Pipeline | Example models | Step execution |
|---|---|---|
QwenImagePipeline | Qwen/Qwen-Image, Qwen/Qwen-Image-2512 | Yes |
| All other diffusion pipelines | QwenImageEditPipeline, QwenImageEditPlusPipeline, QwenImageLayeredPipeline, GLM-Image, Wan, Flux, etc. | No |
Experimental continuous batching
When --step-execution is enabled and max_num_seqs > 1 is configured, the step-wise path can batch compatible requests together. This is experimental. Requests with incompatible sampling parameters are intentionally kept in separate batches, and max_num_seqs=1 remains the conservative default.
Current Limitations¶
- Continuous batching under
step_executionis experimental and only batches compatible requests. cache_backendis not supported together with step execution.- Unsupported pipelines fail early during model loading.
- Request-mode extras such as KV transfer are not wired into step mode yet.
- LoRA is supported in step mode, but each batch must use a single adapter: requests with different
lora_requestorlora_scaleare scheduled into separate batches.
When To Use It¶
Use step execution only when you specifically need the pipeline to run through its stepwise request state machine. For normal diffusion inference, leave it disabled unless your workflow depends on this mode.
For Qwen-Image online serving, the usual progression is:
- start with
--step-execution --max-num-seqs 1if you only need the step-wise path - increase
--max-num-seqsafter that if you want the experimental compatible-request batching behavior
If you are looking for general diffusion speedups, see Diffusion Features Overview.
Troubleshooting¶
If model loading fails with a message mentioning prepare_encode(), denoise_step(), step_scheduler(), and post_decode(), the selected pipeline does not support step execution.
For Model Authors¶
If you want to add step execution support to a new diffusion pipeline, see the implementation guide: Diffusion Step Execution Design.
If you also want that pipeline to participate in the experimental batched step-wise path, see: Continuous Batching for Step-Wise Diffusion.