vllm_omni.diffusion.models.sd3

Stable Diffusion 3 model components.

Modules:

Name Description
pipeline_sd3
sd3_transformer

SD3Transformer2DModel

Bases: Module

The Transformer model introduced in Stable Diffusion 3.

attention_head_dim instance-attribute

attention_head_dim = attention_head_dim

caption_projection_dim instance-attribute

caption_projection_dim = caption_projection_dim

context_embedder instance-attribute

context_embedder = ReplicatedLinear(
    joint_attention_dim, caption_projection_dim
)

dual_attention_layers instance-attribute

dual_attention_layers = (
    dual_attention_layers
    if hasattr(model_config, "dual_attention_layers")
    else ()
)

in_channels instance-attribute

in_channels = in_channels

inner_dim instance-attribute

inner_dim = num_attention_heads * attention_head_dim

joint_attention_dim instance-attribute

joint_attention_dim = joint_attention_dim

norm_out instance-attribute

norm_out = AdaLayerNormContinuous(
    inner_dim,
    inner_dim,
    elementwise_affine=False,
    eps=1e-06,
)

num_attention_heads instance-attribute

num_attention_heads = num_attention_heads

num_layers instance-attribute

num_layers = num_layers

out_channels instance-attribute

out_channels = out_channels

parallel_config instance-attribute

parallel_config = parallel_config

patch_size instance-attribute

patch_size = patch_size

pooled_projection_dim instance-attribute

pooled_projection_dim = pooled_projection_dim

pos_embed instance-attribute

pos_embed = PatchEmbed(
    height=sample_size,
    width=sample_size,
    patch_size=patch_size,
    in_channels=in_channels,
    embed_dim=inner_dim,
    pos_embed_max_size=pos_embed_max_size,
)

pos_embed_max_size instance-attribute

pos_embed_max_size = pos_embed_max_size

proj_out instance-attribute

proj_out = ReplicatedLinear(
    inner_dim,
    patch_size * patch_size * out_channels,
    bias=True,
)
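
The `inner_dim` and `proj_out` expressions above are plain arithmetic on the constructor arguments. A minimal sketch with illustrative values (assumptions chosen for the example, not read from any checkpoint):

```python
# Illustrative values only; they are assumptions, not read from any checkpoint.
num_attention_heads = 24
attention_head_dim = 64
patch_size = 2
out_channels = 16

# Width of the transformer hidden state, as in the inner_dim attribute.
inner_dim = num_attention_heads * attention_head_dim

# Output features of proj_out: one value per pixel of a patch, per output channel.
proj_out_features = patch_size * patch_size * out_channels

print(inner_dim, proj_out_features)
```

With these values each token of width `inner_dim` is projected back to a flattened `patch_size` x `patch_size` patch with `out_channels` channels.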

qk_norm instance-attribute

qk_norm = (
    qk_norm if hasattr(model_config, "qk_norm") else ""
)

sample_size instance-attribute

sample_size = sample_size

time_text_embed instance-attribute

time_text_embed = CombinedTimestepTextProjEmbeddings(
    embedding_dim=inner_dim,
    pooled_projection_dim=pooled_projection_dim,
)

transformer_blocks instance-attribute

transformer_blocks = ModuleList(
    [
        (
            SD3TransformerBlock(
                dim=inner_dim,
                num_attention_heads=num_attention_heads,
                attention_head_dim=attention_head_dim,
                context_pre_only=i == num_layers - 1,
                qk_norm=qk_norm,
                use_dual_attention=True
                if i in dual_attention_layers
                else False,
            )
        )
        for i in (range(num_layers))
    ]
)
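
The comprehension above gives each block two index-dependent flags. The following sketch reproduces only that flag logic in plain Python; `block_flags` is a hypothetical helper for illustration and does not construct any `SD3TransformerBlock`.

```python
def block_flags(num_layers, dual_attention_layers=()):
    """Return the per-block flags that the comprehension above computes."""
    return [
        {
            "index": i,
            # Only the final block is built with context_pre_only=True.
            "context_pre_only": i == num_layers - 1,
            # Dual attention is enabled only for the listed block indices.
            "use_dual_attention": i in dual_attention_layers,
        }
        for i in range(num_layers)
    ]


# Hypothetical configuration: four blocks, dual attention in the first two.
flags = block_flags(num_layers=4, dual_attention_layers=(0, 1))
for entry in flags:
    print(entry)
```

Note that `True if i in dual_attention_layers else False` in the original expression is equivalent to the bare membership test used here.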

forward

forward(
    hidden_states: Tensor,
    encoder_hidden_states: Tensor,
    pooled_projections: Tensor,
    timestep: LongTensor,
    return_dict: bool = True,
) -> Tensor | Transformer2DModelOutput

The [SD3Transformer2DModel] forward method.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| hidden_states | `torch.Tensor` of shape `(batch_size, image_sequence_length, in_channels)` | Input hidden states. | required |
| encoder_hidden_states | `torch.Tensor` of shape `(batch_size, text_sequence_length, joint_attention_dim)` | Conditional embeddings, computed from the input conditions such as prompts. | required |
| pooled_projections | `torch.Tensor` of shape `(batch_size, projection_dim)` | Embeddings projected from the embeddings of the input conditions. | required |
| timestep | `torch.LongTensor` | Indicates the denoising step. | required |
| return_dict | `bool`, *optional*, defaults to `True` | Whether to return a [~models.transformer_2d.Transformer2DModelOutput] instead of a plain tuple. | True |

Returns:

| Type | Description |
| --- | --- |
| Tensor \| Transformer2DModelOutput | If return_dict is True, a [~models.transformer_2d.Transformer2DModelOutput] is returned; otherwise a tuple whose first element is the sample tensor. |

load_weights

load_weights(
    weights: Iterable[tuple[str, Tensor]],
) -> set[str]
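
The source documents only the signature of `load_weights`: it consumes `(name, tensor)` pairs and returns a set of strings. One plausible reading is that the returned set holds the names that were loaded. The sketch below illustrates that contract with a plain dict standing in for module parameters; `load_weights_sketch` and its skip-unknown-names behaviour are assumptions, not the library's implementation.

```python
from collections.abc import Iterable


def load_weights_sketch(
    params: dict, weights: Iterable[tuple[str, object]]
) -> set[str]:
    """Hypothetical stand-in: copy matching entries and report what was loaded."""
    loaded: set[str] = set()
    for name, tensor in weights:
        if name not in params:
            # Unknown names are skipped here; the real method may remap or raise.
            continue
        params[name] = tensor
        loaded.add(name)
    return loaded


# Toy parameter table and checkpoint stream, for illustration only.
params = {"proj_out.weight": None, "proj_out.bias": None}
checkpoint = [("proj_out.weight", [1.0]), ("unused.weight", [2.0])]

loaded = load_weights_sketch(params, checkpoint)
missing = set(params) - loaded
print(loaded, missing)
```

Under this reading, a caller can compare the returned set with the expected parameter names to detect weights that were never initialized.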

StableDiffusion3Pipeline

Bases: Module, CFGParallelMixin, DiffusionPipelineProfilerMixin

current_timestep property

current_timestep

default_sample_size instance-attribute

default_sample_size = 128

device instance-attribute

device = get_local_device()

guidance_scale property

guidance_scale

image_processor instance-attribute

image_processor = VaeImageProcessor(
    vae_scale_factor=vae_scale_factor
)

interrupt property

interrupt

num_timesteps property

num_timesteps

od_config instance-attribute

od_config = od_config

output_type instance-attribute

output_type = output_type

patch_size instance-attribute

patch_size = 2

scheduler instance-attribute

scheduler = from_pretrained(
    model,
    subfolder="scheduler",
    local_files_only=local_files_only,
)

text_encoder instance-attribute

text_encoder = from_pretrained_with_prefetch(
    from_pretrained,
    model,
    subfolder="text_encoder",
    prefetch_list=sd3_subfolders,
    local_files_only=local_files_only,
    torch_dtype=dtype,
)

text_encoder_2 instance-attribute

text_encoder_2 = from_pretrained_with_prefetch(
    from_pretrained,
    model,
    subfolder="text_encoder_2",
    prefetch_list=sd3_subfolders,
    local_files_only=local_files_only,
    torch_dtype=dtype,
)

text_encoder_3 instance-attribute

text_encoder_3 = from_pretrained_with_prefetch(
    from_pretrained,
    model,
    subfolder="text_encoder_3",
    prefetch_list=sd3_subfolders,
    local_files_only=local_files_only,
    torch_dtype=dtype,
)

tokenizer instance-attribute

tokenizer = from_pretrained(
    model,
    subfolder="tokenizer",
    local_files_only=local_files_only,
)

tokenizer_2 instance-attribute

tokenizer_2 = from_pretrained(
    model,
    subfolder="tokenizer_2",
    local_files_only=local_files_only,
)

tokenizer_3 instance-attribute

tokenizer_3 = from_pretrained(
    model,
    subfolder="tokenizer_3",
    local_files_only=local_files_only,
)

tokenizer_max_length instance-attribute

tokenizer_max_length = (
    model_max_length
    if hasattr(self, "tokenizer") and tokenizer is not None
    else 77
)

transformer instance-attribute

transformer = SD3Transformer2DModel(od_config=od_config)

vae instance-attribute

vae = to(device)

vae_scale_factor instance-attribute

vae_scale_factor = (
    2 ** (len(block_out_channels) - 1)
    if getattr(self, "vae", None)
    else 8
)
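
The `vae_scale_factor` expression above doubles the factor for every VAE stage after the first and falls back to 8 when no VAE is present. A small sketch, where `vae_scale_factor_sketch` is a hypothetical helper and `None` models a pipeline without a VAE:

```python
def vae_scale_factor_sketch(block_out_channels=None):
    """Mirror the expression above; None models a pipeline without a VAE."""
    if block_out_channels is not None:
        return 2 ** (len(block_out_channels) - 1)
    return 8


# Hypothetical four-stage VAE configuration.
factor = vae_scale_factor_sketch([128, 256, 512, 512])
print(factor)
```

A four-entry `block_out_channels` therefore yields the same factor as the fallback value.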

weights_sources instance-attribute

weights_sources = [
    ComponentSource(
        model_or_path=model,
        subfolder="transformer",
        revision=None,
        prefix="transformer.",
        fall_back_to_pt=True,
    )
]

check_inputs

check_inputs(
    prompt,
    prompt_2,
    prompt_3,
    height,
    width,
    negative_prompt=None,
    negative_prompt_2=None,
    negative_prompt_3=None,
    prompt_embeds=None,
    negative_prompt_embeds=None,
    max_sequence_length=None,
)

diffuse

diffuse(
    latents: Tensor,
    timesteps: Tensor,
    prompt_embeds: Tensor,
    pooled_prompt_embeds: Tensor | None,
    negative_prompt_embeds: Tensor | None,
    negative_pooled_prompt_embeds: Tensor | None,
    do_true_cfg: bool,
    guidance_scale: float,
    cfg_normalize: bool = False,
) -> Tensor

Diffusion loop with optional classifier-free guidance.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| latents | Tensor | Noise latents to denoise | required |
| timesteps | Tensor | Diffusion timesteps | required |
| prompt_embeds | Tensor | Positive prompt embeddings | required |
| pooled_prompt_embeds | Tensor \| None | Pooled positive prompt embeddings | required |
| negative_prompt_embeds | Tensor \| None | Negative prompt embeddings | required |
| negative_pooled_prompt_embeds | Tensor \| None | Pooled negative prompt embeddings | required |
| do_true_cfg | bool | Whether to apply CFG | required |
| guidance_scale | float | CFG scale factor | required |
| cfg_normalize | bool | Whether to normalize CFG output (default: False) | False |

Returns:

| Type | Description |
| --- | --- |
| Tensor | Denoised latents |
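
The parameters above describe classifier-free guidance (CFG). The sketch below shows the standard CFG combination on flat lists of floats rather than tensors. The `cfg_normalize` branch is an assumption about what normalization means here (rescaling the guided prediction to the norm of the conditional one); the source does not specify it, and `cfg_combine` is a hypothetical helper.

```python
import math


def cfg_combine(cond, uncond, guidance_scale, cfg_normalize=False):
    """Standard classifier-free guidance on flat lists of floats."""
    combined = [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]
    if cfg_normalize:
        # Assumed behaviour: rescale so the result has the same norm as cond.
        cond_norm = math.sqrt(sum(c * c for c in cond))
        comb_norm = math.sqrt(sum(x * x for x in combined))
        if comb_norm > 0:
            combined = [x * cond_norm / comb_norm for x in combined]
    return combined


guided = cfg_combine(cond=[1.0, 2.0], uncond=[0.0, 1.0], guidance_scale=3.0)
normalized = cfg_combine([1.0, 2.0], [0.0, 1.0], 3.0, cfg_normalize=True)
print(guided, normalized)
```

With `guidance_scale=1.0` the combination reduces to the conditional prediction, which is why a separate `do_true_cfg` switch can skip the unconditional pass entirely.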

encode_prompt

encode_prompt(
    prompt: str | list[str],
    prompt_2: str | list[str],
    prompt_3: str | list[str],
    prompt_embeds: Tensor | None = None,
    max_sequence_length: int = 256,
    num_images_per_prompt: int = 1,
)

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| prompt | `str` or `List[str]`, *optional* | Prompt to be encoded. | required |
| prompt_2 | `str` or `List[str]`, *optional* | The prompt or prompts to be sent to tokenizer_2 and text_encoder_2. If not defined, prompt is used in all text encoders. | required |
| prompt_3 | `str` or `List[str]`, *optional* | The prompt or prompts to be sent to tokenizer_3 and text_encoder_3. If not defined, prompt is used in all text encoders. | required |
| num_images_per_prompt | `int` | Number of images to generate per prompt. | 1 |
| prompt_embeds | `torch.FloatTensor`, *optional* | Pre-generated text embeddings. Can be used to tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the prompt argument. | None |
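
The descriptions above say that `prompt` is reused when `prompt_2` or `prompt_3` is not defined. A minimal sketch of that fallback on plain strings; `resolve_prompts` is a hypothetical helper, and treating an empty string as "not defined" is an assumption.

```python
def resolve_prompts(prompt, prompt_2=None, prompt_3=None):
    """Return one prompt list per text encoder, falling back to prompt."""

    def as_list(value):
        return [value] if isinstance(value, str) else list(value)

    primary = as_list(prompt)
    # An empty or missing secondary prompt reuses the primary prompt.
    second = as_list(prompt_2) if prompt_2 else primary
    third = as_list(prompt_3) if prompt_3 else primary
    return primary, second, third


p1, p2, p3 = resolve_prompts("a red fox", prompt_3="a detailed photo of a red fox")
print(p1, p2, p3)
```

Here the second encoder receives the primary prompt, while the third receives its own text.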

forward

forward(
    req: OmniDiffusionRequest,
    prompt: str | list[str] = "",
    prompt_2: str | list[str] = "",
    prompt_3: str | list[str] = "",
    negative_prompt: str | list[str] = "",
    negative_prompt_2: str | list[str] = "",
    negative_prompt_3: str | list[str] = "",
    height: int | None = None,
    width: int | None = None,
    num_inference_steps: int = 28,
    sigmas: list[float] | None = None,
    num_images_per_prompt: int = 1,
    generator: Generator | list[Generator] | None = None,
    latents: Tensor | None = None,
    prompt_embeds: Tensor | None = None,
    negative_prompt_embeds: Tensor | None = None,
    pooled_prompt_embeds: Tensor | None = None,
    negative_pooled_prompt_embeds: Tensor | None = None,
    max_sequence_length: int = 256,
) -> DiffusionOutput

load_weights

load_weights(
    weights: Iterable[tuple[str, Tensor]],
) -> set[str]

prepare_latents

prepare_latents(
    batch_size,
    num_channels_latents,
    height,
    width,
    generator,
    latents=None,
) -> Tensor

prepare_timesteps

prepare_timesteps(
    num_inference_steps, sigmas, image_seq_len
)
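
The source gives only the signatures of `prepare_latents` and `prepare_timesteps`. The sketch below shows one common way the related sizes are derived in latent diffusion pipelines; both helpers are hypothetical, and the latent layout and the token-count formula are assumptions rather than documented behaviour of this class.

```python
def latent_shape(batch_size, num_channels_latents, height, width, vae_scale_factor=8):
    """Assumed latent layout: (batch, channels, height // factor, width // factor)."""
    return (
        batch_size,
        num_channels_latents,
        height // vae_scale_factor,
        width // vae_scale_factor,
    )


def image_seq_len(latent_height, latent_width, patch_size=2):
    """Assumed token count: one token per patch_size x patch_size latent patch."""
    return (latent_height // patch_size) * (latent_width // patch_size)


# Illustrative request: one 1024 x 1024 image with 16 latent channels.
shape = latent_shape(batch_size=1, num_channels_latents=16, height=1024, width=1024)
seq_len = image_seq_len(shape[2], shape[3])
print(shape, seq_len)
```

The defaults of 8 and 2 match the `vae_scale_factor` fallback and the `patch_size` attribute listed for this pipeline.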

get_sd3_image_post_process_func

get_sd3_image_post_process_func(
    od_config: OmniDiffusionConfig,
)