vllm_omni.diffusion.models.sd3 ¶
Stable Diffusion 3 model components.
Modules:
| Name | Description |
|---|---|
| pipeline_sd3 | |
| sd3_transformer | |
SD3Transformer2DModel ¶
Bases: Module
The Transformer model introduced in Stable Diffusion 3.
context_embedder instance-attribute ¶
dual_attention_layers instance-attribute ¶
dual_attention_layers = (
dual_attention_layers
if hasattr(model_config, "dual_attention_layers")
else ()
)
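The fallback above amounts to reading an optional config field with a default. A minimal sketch, assuming the value is read from `model_config` and that the config is an attribute-style object; `SimpleNamespace` and `resolve_dual_attention_layers` are hypothetical stand-ins, not part of the library:

```python
from types import SimpleNamespace


def resolve_dual_attention_layers(model_config):
    """Return the configured layer indices, or an empty tuple if the field is absent."""
    # Same effect as the hasattr(...) check shown above (assumed to read model_config).
    return getattr(model_config, "dual_attention_layers", ())


# Hypothetical configs: one defines the field, one does not.
with_field = SimpleNamespace(dual_attention_layers=(0, 1, 2))
without_field = SimpleNamespace()

layers_a = resolve_dual_attention_layers(with_field)
layers_b = resolve_dual_attention_layers(without_field)
```

With an empty tuple as the default, a later `i in dual_attention_layers` test is simply false for every layer.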
norm_out instance-attribute ¶
pos_embed instance-attribute ¶
pos_embed = PatchEmbed(
height=sample_size,
width=sample_size,
patch_size=patch_size,
in_channels=in_channels,
embed_dim=inner_dim,
pos_embed_max_size=pos_embed_max_size,
)
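To see how `sample_size` and `patch_size` relate to the image sequence length, here is a rough sketch. It assumes square, non-overlapping patches, so a `sample_size x sample_size` latent yields `(sample_size // patch_size) ** 2` tokens. The helper name and the example values are illustrative assumptions, not taken from the library:

```python
def num_patch_tokens(sample_size: int, patch_size: int) -> int:
    """Count non-overlapping square patches in a square latent (illustrative only)."""
    if sample_size % patch_size != 0:
        raise ValueError("sample_size must be divisible by patch_size")
    per_side = sample_size // patch_size
    return per_side * per_side


# Hypothetical values chosen only for the example.
tokens = num_patch_tokens(sample_size=128, patch_size=2)
```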
proj_out instance-attribute ¶
time_text_embed instance-attribute ¶
time_text_embed = CombinedTimestepTextProjEmbeddings(
embedding_dim=inner_dim,
pooled_projection_dim=pooled_projection_dim,
)
transformer_blocks instance-attribute ¶
transformer_blocks = ModuleList(
[
(
SD3TransformerBlock(
dim=inner_dim,
num_attention_heads=num_attention_heads,
attention_head_dim=attention_head_dim,
context_pre_only=i == num_layers - 1,
qk_norm=qk_norm,
use_dual_attention=True
if i in dual_attention_layers
else False,
)
)
for i in (range(num_layers))
]
)
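The per-layer flags in the comprehension above can be tabulated without building any modules. This sketch only mirrors the two index-dependent arguments (`context_pre_only` and `use_dual_attention`); `block_flags` is a hypothetical helper and the layer counts are example values:

```python
def block_flags(num_layers, dual_attention_layers=()):
    """Mirror the index-dependent arguments passed to each SD3TransformerBlock."""
    flags = []
    for i in range(num_layers):
        flags.append(
            {
                "index": i,
                # Only the final block is built with context_pre_only=True.
                "context_pre_only": i == num_layers - 1,
                # Dual attention is enabled only for the listed layer indices.
                "use_dual_attention": i in dual_attention_layers,
            }
        )
    return flags


# Example: 4 layers, dual attention on the first two.
flags = block_flags(num_layers=4, dual_attention_layers=(0, 1))
```

Note that `True if i in dual_attention_layers else False` and `i in dual_attention_layers` evaluate to the same boolean.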
forward ¶
forward(
hidden_states: Tensor,
encoder_hidden_states: Tensor,
pooled_projections: Tensor,
timestep: LongTensor,
return_dict: bool = True,
) -> Tensor | Transformer2DModelOutput
The `SD3Transformer2DModel` forward method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| hidden_states | `torch.Tensor` of shape `(batch_size, image_sequence_length, in_channels)` | Input hidden states. | required |
| encoder_hidden_states | `torch.Tensor` of shape `(batch_size, text_sequence_length, joint_attention_dim)` | Conditional embeddings, computed from input conditions such as prompts. | required |
| pooled_projections | `torch.Tensor` of shape `(batch_size, projection_dim)` | Embeddings projected from the embeddings of the input conditions. | required |
| timestep | `torch.LongTensor` | Indicates the current denoising step. | required |
| return_dict | `bool`, *optional*, defaults to `True` | Whether to return a `Transformer2DModelOutput` instead of a plain `Tensor`. | True |
Returns:
| Type | Description |
|---|---|
| Tensor \| Transformer2DModelOutput | Either a plain `Tensor` or a `Transformer2DModelOutput`, depending on `return_dict`. |
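Because `forward` can return either type, callers may want a single unwrapping step. The sketch below assumes the wrapper exposes the tensor as a `sample` field, as the upstream diffusers `Transformer2DModelOutput` does; `FakeOutput` and `unwrap_sample` are hypothetical stand-ins so the example runs without the library:

```python
from dataclasses import dataclass


@dataclass
class FakeOutput:
    """Hypothetical stand-in for Transformer2DModelOutput (assumed to carry `sample`)."""

    sample: object


def unwrap_sample(result):
    """Return the underlying tensor whether or not it was wrapped."""
    if isinstance(result, FakeOutput):
        return result.sample
    return result


raw = [0.1, 0.2]  # placeholder for a tensor
from_wrapper = unwrap_sample(FakeOutput(sample=raw))
from_plain = unwrap_sample(raw)
```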
StableDiffusion3Pipeline ¶
Bases: Module, CFGParallelMixin, DiffusionPipelineProfilerMixin
image_processor instance-attribute ¶
scheduler instance-attribute ¶
text_encoder instance-attribute ¶
text_encoder = from_pretrained_with_prefetch(
from_pretrained,
model,
subfolder="text_encoder",
prefetch_list=sd3_subfolders,
local_files_only=local_files_only,
torch_dtype=dtype,
)
text_encoder_2 instance-attribute ¶
text_encoder_2 = from_pretrained_with_prefetch(
from_pretrained,
model,
subfolder="text_encoder_2",
prefetch_list=sd3_subfolders,
local_files_only=local_files_only,
torch_dtype=dtype,
)
text_encoder_3 instance-attribute ¶
text_encoder_3 = from_pretrained_with_prefetch(
from_pretrained,
model,
subfolder="text_encoder_3",
prefetch_list=sd3_subfolders,
local_files_only=local_files_only,
torch_dtype=dtype,
)
tokenizer instance-attribute ¶
tokenizer_2 instance-attribute ¶
tokenizer_3 instance-attribute ¶
tokenizer_max_length instance-attribute ¶
tokenizer_max_length = (
model_max_length
if hasattr(self, "tokenizer") and tokenizer is not None
else 77
)
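The expression above falls back to 77 when no tokenizer is available. A minimal sketch of the same rule, assuming `model_max_length` is read from the tokenizer object; `resolve_tokenizer_max_length` and the `SimpleNamespace` tokenizer are hypothetical:

```python
from types import SimpleNamespace

DEFAULT_MAX_LENGTH = 77  # fallback value shown in the attribute definition above


def resolve_tokenizer_max_length(tokenizer):
    """Use the tokenizer's own limit when present, otherwise the default."""
    if tokenizer is not None:
        return tokenizer.model_max_length
    return DEFAULT_MAX_LENGTH


# Hypothetical tokenizer exposing only the attribute this sketch needs.
fake_tokenizer = SimpleNamespace(model_max_length=77)

with_tokenizer = resolve_tokenizer_max_length(fake_tokenizer)
without_tokenizer = resolve_tokenizer_max_length(None)
```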
vae_scale_factor instance-attribute ¶
weights_sources instance-attribute ¶
weights_sources = [
ComponentSource(
model_or_path=model,
subfolder="transformer",
revision=None,
prefix="transformer.",
fall_back_to_pt=True,
)
]
check_inputs ¶
check_inputs(
prompt,
prompt_2,
prompt_3,
height,
width,
negative_prompt=None,
negative_prompt_2=None,
negative_prompt_3=None,
prompt_embeds=None,
negative_prompt_embeds=None,
max_sequence_length=None,
)
diffuse ¶
diffuse(
latents: Tensor,
timesteps: Tensor,
prompt_embeds: Tensor,
pooled_prompt_embeds: Tensor | None,
negative_prompt_embeds: Tensor | None,
negative_pooled_prompt_embeds: Tensor | None,
do_true_cfg: bool,
guidance_scale: float,
cfg_normalize: bool = False,
) -> Tensor
Diffusion loop with optional classifier-free guidance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| latents | Tensor | Noise latents to denoise | required |
| timesteps | Tensor | Diffusion timesteps | required |
| prompt_embeds | Tensor | Positive prompt embeddings | required |
| pooled_prompt_embeds | Tensor \| None | Pooled positive prompt embeddings | required |
| negative_prompt_embeds | Tensor \| None | Negative prompt embeddings | required |
| negative_pooled_prompt_embeds | Tensor \| None | Pooled negative prompt embeddings | required |
| do_true_cfg | bool | Whether to apply CFG | required |
| guidance_scale | float | CFG scale factor | required |
| cfg_normalize | bool | Whether to normalize CFG output | False |
Returns:
| Type | Description |
|---|---|
| Tensor | Denoised latents |
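For readers unfamiliar with classifier-free guidance, the combination step inside such a loop is commonly written as `negative + guidance_scale * (positive - negative)`. The sketch below shows that standard formula on plain Python lists. The `cfg_normalize` branch is an assumption: it rescales the combined prediction to the norm of the positive prediction, which is one common choice, and the page does not say which rule the pipeline uses. `combine_cfg` is a hypothetical helper:

```python
import math


def combine_cfg(positive, negative, guidance_scale, cfg_normalize=False):
    """Combine conditional and unconditional predictions (illustrative sketch)."""
    # Standard CFG: start from the negative prediction and move toward the positive one.
    combined = [n + guidance_scale * (p - n) for p, n in zip(positive, negative)]
    if cfg_normalize:
        # Assumed rule: rescale so the result has the same L2 norm as `positive`.
        pos_norm = math.sqrt(sum(p * p for p in positive))
        comb_norm = math.sqrt(sum(c * c for c in combined))
        if comb_norm > 0:
            combined = [c * pos_norm / comb_norm for c in combined]
    return combined


guided = combine_cfg([1.0, 2.0], [0.0, 1.0], guidance_scale=3.0)
normalized = combine_cfg([1.0, 2.0], [0.0, 1.0], guidance_scale=3.0, cfg_normalize=True)
```

With `guidance_scale=1.0` the formula reduces to the positive prediction alone, so guidance only has an effect for other scale values.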
encode_prompt ¶
encode_prompt(
prompt: str | list[str],
prompt_2: str | list[str],
prompt_3: str | list[str],
prompt_embeds: Tensor | None = None,
max_sequence_length: int = 256,
num_images_per_prompt: int = 1,
)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prompt | `str` or `List[str]`, *optional* | Prompt to be encoded. | required |
| prompt_2 | `str` or `List[str]`, *optional* | The prompt or prompts to be sent to `tokenizer_2` and `text_encoder_2`. | required |
| prompt_3 | `str` or `List[str]`, *optional* | The prompt or prompts to be sent to `tokenizer_3` and `text_encoder_3`. | required |
| num_images_per_prompt | `int` | Number of images to generate per prompt. | 1 |
| prompt_embeds | `torch.FloatTensor`, *optional* | Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from `prompt`. | None |
forward ¶
forward(
req: OmniDiffusionRequest,
prompt: str | list[str] = "",
prompt_2: str | list[str] = "",
prompt_3: str | list[str] = "",
negative_prompt: str | list[str] = "",
negative_prompt_2: str | list[str] = "",
negative_prompt_3: str | list[str] = "",
height: int | None = None,
width: int | None = None,
num_inference_steps: int = 28,
sigmas: list[float] | None = None,
num_images_per_prompt: int = 1,
generator: Generator | list[Generator] | None = None,
latents: Tensor | None = None,
prompt_embeds: Tensor | None = None,
negative_prompt_embeds: Tensor | None = None,
pooled_prompt_embeds: Tensor | None = None,
negative_pooled_prompt_embeds: Tensor | None = None,
max_sequence_length: int = 256,
) -> DiffusionOutput
prepare_latents ¶
prepare_latents(
batch_size,
num_channels_latents,
height,
width,
generator,
latents=None,
) -> Tensor
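The page does not describe how `prepare_latents` sizes its output. As a rough guide, latent-diffusion pipelines typically divide the pixel height and width by the VAE scale factor. The sketch below assumes that convention and a scale factor of 8; both are assumptions, and `latent_shape` is a hypothetical helper rather than the pipeline's method:

```python
def latent_shape(batch_size, num_channels_latents, height, width, vae_scale_factor=8):
    """Compute an assumed latent tensor shape from pixel dimensions."""
    # Assumption: spatial dims shrink by the VAE scale factor (integer division).
    return (
        batch_size,
        num_channels_latents,
        height // vae_scale_factor,
        width // vae_scale_factor,
    )


# Example values only; the channel count is not taken from this page.
shape = latent_shape(batch_size=2, num_channels_latents=16, height=1024, width=1024)
```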
get_sd3_image_post_process_func ¶
get_sd3_image_post_process_func(
od_config: OmniDiffusionConfig,
)