vLLM Recipes

Stable Diffusion 3.5 Usage Guide

This guide shows how to run Stable Diffusion 3.5 text-to-image models with vLLM-Omni, including optional Cache-DiT acceleration.

Supported Models

stabilityai/stable-diffusion-3.5-large: 8.1B-parameter model
stabilityai/stable-diffusion-3.5-large-turbo: 8.1B-parameter model, timestep-distilled for few-step inference
stabilityai/stable-diffusion-3.5-medium: 2.5B-parameter model

Installing vLLM-Omni

uv venv
source .venv/bin/activate
uv pip install vllm==0.12.0
uv pip install git+https://github.com/vllm-project/vllm-omni.git

The CLI examples below come from the vLLM-Omni repository. To run them directly, clone that repository; the scripts live under its examples/offline_inference directory, and the commands shown use paths relative to the repository root.

Text-to-Image Generation

Basic Usage

from vllm_omni.entrypoints.omni import Omni

omni = Omni(model="stabilityai/stable-diffusion-3.5-medium")

images = omni.generate(
    prompt="a cat wearing sunglasses, cyberpunk style",
    negative_prompt="blurry, low quality",
    height=1024,
    width=1024,
    num_inference_steps=28,
    guidance_scale=7.5,
    num_outputs_per_prompt=2,
)
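
The snippet above leaves the generated images in memory. The source does not state what type `omni.generate` returns, so the following is only a sketch that assumes a list of PIL-style objects, each exposing a `save(path)` method. The helper name `save_images` and the output naming scheme are hypothetical.

```python
from pathlib import Path


def save_images(images, out_dir="outputs", stem="sd35"):
    """Save each image as <out_dir>/<stem>_<index>.png and return the paths.

    Assumption: every item exposes a PIL-style ``save(path)`` method.
    This is not confirmed by the guide; adapt it to the actual return type.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    paths = []
    for index, image in enumerate(images):
        path = out / f"{stem}_{index}.png"
        image.save(path)
        paths.append(path)
    return paths
```

If the assumption holds, `save_images(images, stem="cat")` would write one file per output, which is convenient when `num_outputs_per_prompt` is greater than 1.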

CLI Usage

python examples/offline_inference/text_to_image/text_to_image.py \
  --model stabilityai/stable-diffusion-3.5-medium \
  --prompt "a cat wearing sunglasses, cyberpunk style" \
  --negative-prompt "blurry, low quality" \
  --height 1024 \
  --width 1024 \
  --num-inference-steps 28 \
  --guidance-scale 7.5
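
When the same script has to be run for many prompts, it can help to assemble the command programmatically. The sketch below only builds the argument list, using the script path and flags shown in the command above. The helper name `build_command` is hypothetical, and the mapping from keyword names to flags (underscores become hyphens) is an assumption that matches the flags listed here but may not hold for every option the script accepts.

```python
import sys

SCRIPT = "examples/offline_inference/text_to_image/text_to_image.py"


def build_command(prompt, model="stabilityai/stable-diffusion-3.5-medium", **options):
    """Return the argument list for one run of the text-to-image example script.

    Keyword options are turned into flags, e.g. num_inference_steps=28
    becomes ["--num-inference-steps", "28"].
    """
    cmd = [sys.executable, SCRIPT, "--model", model, "--prompt", prompt]
    for name, value in options.items():
        cmd += ["--" + name.replace("_", "-"), str(value)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the root of a vLLM-Omni checkout.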

Cache-DiT Acceleration

vLLM-Omni supports Cache-DiT acceleration for Stable Diffusion 3.5 models. Cache-DiT speeds up image generation by caching intermediate results across denoising steps and reusing them instead of recomputing them at every step.

Enabling Cache-DiT

from vllm_omni.entrypoints.omni import Omni

omni = Omni(
    model="stabilityai/stable-diffusion-3.5-medium",
    cache_backend="cache_dit",
)

images = omni.generate(
    prompt="a cat wearing sunglasses, cyberpunk style",
    height=1024,
    width=1024,
    num_inference_steps=28,
)

Custom Cache-DiT Configuration

For finer-grained control over the acceleration, pass a cache_config dictionary:

from vllm_omni.entrypoints.omni import Omni

omni = Omni(
    model="stabilityai/stable-diffusion-3.5-medium",
    cache_backend="cache_dit",
    cache_config={
        "Fn_compute_blocks": 8,
        "Bn_compute_blocks": 0,
        "max_warmup_steps": 4,
        "residual_diff_threshold": 0.12,
    },
)
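
The guide does not define these keys. As a rough intuition for `max_warmup_steps` and `residual_diff_threshold`, the toy function below shows how a step-caching scheme might decide whether to reuse cached outputs: never during warmup, and afterwards only when a cheap-to-compute residual has changed little since the previous step. This is a simplified illustration, not the Cache-DiT implementation. The function name, the plain-list inputs and the relative L1 metric are all assumptions made for the example.

```python
def should_reuse_cache(step, prev_residual, curr_residual,
                       max_warmup_steps=4, residual_diff_threshold=0.12):
    """Toy decision rule: True if cached outputs could be reused at this step."""
    # Warmup: the first steps always run the full model (assumed behavior).
    if step < max_warmup_steps or prev_residual is None:
        return False
    # Relative L1 change between the previous and current residuals
    # (a hypothetical metric chosen for illustration).
    diff = sum(abs(c - p) for c, p in zip(curr_residual, prev_residual))
    scale = sum(abs(p) for p in prev_residual)
    if scale == 0:
        return False
    return diff / scale < residual_diff_threshold
```

Under this reading, a higher threshold lets more steps reuse the cache (faster, with more risk of visible quality loss), while more warmup steps keep the early, fast-changing part of denoising uncached.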

Key Parameters

| Parameter | Default | Description |
|---|---|---|
| `height` | 1024 | Image height in pixels (must be a multiple of 16) |
| `width` | 1024 | Image width in pixels (must be a multiple of 16) |
| `num_inference_steps` | 28 | Number of denoising steps |
| `guidance_scale` | 1.0 | Classifier-free guidance scale |
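
Because `height` and `width` are listed as multiples of 16, arbitrary target sizes need to be adjusted before they are passed in. A minimal sketch of one way to do that, rounding to the nearest multiple; the helper name `round_to_multiple` is hypothetical and not part of vLLM-Omni:

```python
def round_to_multiple(value, multiple=16):
    """Round a positive pixel dimension to the nearest multiple of `multiple`.

    Ties round up, and the result is never smaller than `multiple`.
    """
    if value <= 0:
        raise ValueError("dimension must be positive")
    rounded = (value + multiple // 2) // multiple * multiple
    return max(multiple, rounded)
```

For example, a requested height of 1080 would become 1088, while 1024 is left unchanged.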

Additional Resources