vllm_omni.diffusion.models.flux2_klein

Flux2 klein diffusion model components.

Modules:

Name Description
flux2_klein_transformer
pipeline_flux2_klein

Flux2KleinPipeline

Bases: Module, CFGParallelMixin, SupportImageInput, DiffusionPipelineProfilerMixin

Flux2 klein pipeline for text-to-image generation.

attention_kwargs property

attention_kwargs

current_timestep property

current_timestep

default_sample_size instance-attribute

default_sample_size = 128

do_classifier_free_guidance property

do_classifier_free_guidance

guidance_scale property

guidance_scale

image_processor instance-attribute

image_processor = Flux2ImageProcessor(
    vae_scale_factor=vae_scale_factor * 2
)

interrupt property

interrupt

is_distilled instance-attribute

is_distilled = is_distilled

latent_channels instance-attribute

latent_channels = (
    latent_channels if hasattr(vae, "config") else 16
)

mask_processor instance-attribute

mask_processor = VaeImageProcessor(
    vae_scale_factor=vae_scale_factor * 2,
    vae_latent_channels=latent_channels,
    do_normalize=False,
    do_binarize=True,
    do_convert_grayscale=True,
)

num_timesteps property

num_timesteps

od_config instance-attribute

od_config = od_config

scheduler instance-attribute

scheduler = from_pretrained(
    model,
    subfolder="scheduler",
    local_files_only=local_files_only,
)

support_image_input class-attribute instance-attribute

support_image_input = True

text_encoder instance-attribute

text_encoder = to(_execution_device)

tokenizer instance-attribute

tokenizer = from_pretrained(
    model,
    subfolder="tokenizer",
    local_files_only=local_files_only,
)

tokenizer_max_length instance-attribute

tokenizer_max_length = 512

transformer instance-attribute

transformer = Flux2Transformer2DModel(
    quant_config=quantization_config, **transformer_kwargs
)

vae instance-attribute

vae = to(_execution_device)

vae_scale_factor instance-attribute

vae_scale_factor = (
    2 ** (len(block_out_channels) - 1)
    if getattr(self, "vae", None)
    else 8
)
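
The arithmetic behind vae_scale_factor can be sketched in plain Python. This is a simplified stand-in, not the pipeline's code: the helper name and the example block_out_channels tuple are hypothetical, and the helper tests the channel list directly rather than checking for a vae attribute. Multiplying default_sample_size by the scale factor to get a default resolution is an assumption based on the 1024-pixel default mentioned in the forward documentation.

```python
def vae_scale_factor(block_out_channels=None):
    """Simplified form of the documented expression.

    Each VAE level after the first halves the spatial resolution, so the
    overall factor is 2 ** (number of levels - 1). Falls back to 8 when
    no channel list is available.
    """
    if block_out_channels:
        return 2 ** (len(block_out_channels) - 1)
    return 8


# Hypothetical VAE config with four resolution levels.
example_channels = (128, 256, 512, 512)
factor = vae_scale_factor(example_channels)

default_sample_size = 128  # documented instance attribute

print(factor)                        # spatial downsampling of the VAE
print(factor * 2)                    # value handed to the image and mask processors
print(default_sample_size * factor)  # assumed default height/width in pixels
```

The image and mask processors receive vae_scale_factor * 2, which suggests that input sizes are aligned to twice the VAE downsampling factor.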

weights_sources instance-attribute

weights_sources = [
    ComponentSource(
        model_or_path=model,
        subfolder="transformer",
        revision=None,
        prefix="transformer.",
        fall_back_to_pt=True,
    )
]

check_inputs

check_inputs(
    prompt,
    height,
    width,
    prompt_embeds=None,
    callback_on_step_end_tensor_inputs=None,
    guidance_scale=None,
    strength=None,
    num_inference_steps=None,
)

encode_prompt

encode_prompt(
    prompt: str | list[str],
    device: device | None = None,
    num_images_per_prompt: int = 1,
    prompt_embeds: Tensor | None = None,
    max_sequence_length: int = 512,
    text_encoder_out_layers: tuple[int, ...] = (9, 18, 27),
)

forward

forward(
    req: OmniDiffusionRequest,
    image: Image | list[Image] | None = None,
    reference_image: Image | list[Image] | None = None,
    mask_image: Image | list[Image] | None = None,
    prompt: str | list[str] | None = None,
    height: int | None = None,
    width: int | None = None,
    num_inference_steps: int = 50,
    sigmas: list[float] | None = None,
    strength: float = 1.0,
    guidance_scale: float | None = 4.0,
    num_images_per_prompt: int = 1,
    generator: Generator | list[Generator] | None = None,
    latents: Tensor | None = None,
    prompt_embeds: Tensor | None = None,
    negative_prompt_embeds: Tensor | None = None,
    output_type: str | None = "pil",
    return_dict: bool = True,
    attention_kwargs: dict[str, Any] | None = None,
    callback_on_step_end: Callable[[int, int, dict], None]
    | None = None,
    callback_on_step_end_tensor_inputs: list[str] = [
        "latents"
    ],
    max_sequence_length: int = 512,
    text_encoder_out_layers: tuple[int, ...] = (9, 18, 27),
    padding_mask_crop: int | None = None,
) -> DiffusionOutput

Function invoked when calling the pipeline for generation.

Parameters:

Name Type Description Default
image `torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, or list of these

Image, NumPy array, or tensor representing an image batch to be used as the starting point. For both NumPy arrays and PyTorch tensors, the expected value range is [0, 1]. If it is a tensor or a list of tensors, the expected shape is (B, C, H, W) or (C, H, W). If it is a NumPy array or a list of arrays, the expected shape is (B, H, W, C) or (H, W, C). Image latents can also be passed as image; latents passed directly are not encoded again.

None
prompt `str` or `List[str]`, *optional*

The prompt or prompts to guide the image generation. If not defined, prompt_embeds must be passed instead.

None
guidance_scale `float`, *optional*, defaults to 4.0

Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale corresponds to w in equation 2 of the Imagen paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages the model to generate images closely linked to the text prompt, usually at the expense of image quality. For step-wise distilled models, guidance_scale is ignored.

4.0
height `int`, *optional*, defaults to self.default_sample_size * self.vae_scale_factor

The height in pixels of the generated image. This is set to 1024 by default for the best results.

None
width `int`, *optional*, defaults to self.default_sample_size * self.vae_scale_factor

The width in pixels of the generated image. This is set to 1024 by default for the best results.

None
num_inference_steps `int`, *optional*, defaults to 50

The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

50
sigmas `List[float]`, *optional*

Custom sigmas to use for the denoising process with schedulers which support a sigmas argument in their set_timesteps method. If not defined, the default behavior when num_inference_steps is passed will be used.

None
num_images_per_prompt `int`, *optional*, defaults to 1

The number of images to generate per prompt.

1
generator `torch.Generator` or `List[torch.Generator]`, *optional*

One or a list of torch generator(s) to make generation deterministic.

None
latents `torch.Tensor`, *optional*

Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.

None
prompt_embeds `torch.Tensor`, *optional*

Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from prompt input argument.

None
negative_prompt_embeds `torch.Tensor`, *optional*

Pre-generated negative text embeddings. Note that "" is used as the negative prompt in this pipeline. If not provided, will be generated from "".

None
output_type `str`, *optional*, defaults to `"pil"`

The output format of the generated image. Choose between PIL (PIL.Image.Image) and np.array.

'pil'
return_dict `bool`, *optional*, defaults to `True`

Whether or not to return a [~pipelines.flux2.Flux2PipelineOutput] instead of a plain tuple.

True
attention_kwargs `dict`, *optional*

A kwargs dictionary that if specified is passed along to the AttentionProcessor as defined under self.processor in diffusers.models.attention_processor.

None
callback_on_step_end `Callable`, *optional*

A function called at the end of each denoising step during inference. It is called with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict). callback_kwargs will include a list of all tensors as specified by callback_on_step_end_tensor_inputs.

None
callback_on_step_end_tensor_inputs `List`, *optional*

The list of tensor inputs for the callback_on_step_end function. The tensors specified in the list will be passed as callback_kwargs argument. You will only be able to include variables listed in the ._callback_tensor_inputs attribute of your pipeline class.

['latents']
max_sequence_length `int` defaults to 512

Maximum sequence length to use with the prompt.

512
text_encoder_out_layers `Tuple[int]`

Layer indices to use in the text_encoder to derive the final prompt embeddings.

(9, 18, 27)
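
The guidance_scale description above refers to classifier-free guidance. A minimal sketch of the standard combination rule follows, using plain Python lists in place of tensors. The function name cfg_combine and the sample values are illustrative assumptions; the pipeline's own implementation is not shown in this reference and may differ.

```python
def cfg_combine(cond, uncond, guidance_scale):
    """Standard classifier-free guidance: uncond + w * (cond - uncond).

    cond and uncond stand in for the model's predictions with and without
    the text prompt; guidance_scale is the weight w.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


cond = [1.0, 2.0, 3.0]    # prediction conditioned on the prompt
uncond = [0.5, 1.0, 1.5]  # prediction for the empty prompt ""

# With w = 1 the result equals the conditional prediction (no extra guidance).
print(cfg_combine(cond, uncond, 1.0))  # [1.0, 2.0, 3.0]

# With the documented default w = 4.0 the result is pushed further from uncond.
print(cfg_combine(cond, uncond, 4.0))  # [2.5, 5.0, 7.5]
```

This also illustrates why guidance only takes effect for guidance_scale > 1: at exactly 1 the unconditional term cancels out.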

Returns:

Type Description
DiffusionOutput

[~pipelines.flux2.Flux2PipelineOutput] or tuple: [~pipelines.flux2.Flux2PipelineOutput] if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images.

get_timesteps

get_timesteps(num_inference_steps, strength, device)
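
This reference does not describe how strength affects the schedule. The sketch below shows a convention commonly used by image-to-image diffusion pipelines, in which strength selects how much of the schedule is actually run. Whether get_timesteps follows this exactly is an assumption; the function name, the order parameter, and the toy timestep list are hypothetical.

```python
def get_timesteps_sketch(timesteps, num_inference_steps, strength, order=1):
    """Trim a descending timestep schedule according to strength.

    strength = 1.0 keeps the whole schedule (start from pure noise);
    smaller values skip the earliest, noisiest steps so more of the
    input image is preserved.
    """
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    kept = timesteps[t_start * order:]
    return kept, num_inference_steps - t_start


# Toy schedule standing in for the scheduler's timesteps.
schedule = list(range(10, 0, -1))  # [10, 9, ..., 1]

full, n_full = get_timesteps_sketch(schedule, 10, 1.0)
half, n_half = get_timesteps_sketch(schedule, 10, 0.5)

print(n_full, full)  # 10 [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(n_half, half)  # 5 [5, 4, 3, 2, 1]
```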

load_weights

load_weights(
    weights: Iterable[tuple[str, Tensor]],
) -> set[str]

prepare_image_latents

prepare_image_latents(
    images: list[Tensor],
    batch_size,
    generator: Generator,
    device,
    dtype,
)

prepare_latents

prepare_latents(
    batch_size,
    num_latents_channels,
    height,
    width,
    dtype,
    device,
    generator: Generator,
    latents: Tensor | None = None,
)

prepare_mask_latents

prepare_mask_latents(
    mask,
    masked_image,
    batch_size,
    num_channels_latents,
    num_images_per_prompt,
    height,
    width,
    dtype,
    device,
    generator,
)

Flux2Transformer2DModel

Bases: Module

The Transformer model introduced in Flux 2.

Supports Sequence Parallelism (Ulysses and Ring) when configured via OmniDiffusionConfig.

config instance-attribute

config = SimpleNamespace(
    patch_size=patch_size,
    in_channels=in_channels,
    out_channels=out_channels,
    num_layers=num_layers,
    num_single_layers=num_single_layers,
    attention_head_dim=attention_head_dim,
    num_attention_heads=num_attention_heads,
    joint_attention_dim=joint_attention_dim,
    timestep_guidance_channels=timestep_guidance_channels,
    mlp_ratio=mlp_ratio,
    axes_dims_rope=axes_dims_rope,
    rope_theta=rope_theta,
    eps=eps,
    guidance_embeds=guidance_embeds,
)

context_embedder instance-attribute

context_embedder = Linear(
    joint_attention_dim, inner_dim, bias=False
)

double_stream_modulation_img instance-attribute

double_stream_modulation_img = Flux2Modulation(
    inner_dim, mod_param_sets=2, bias=False
)

double_stream_modulation_txt instance-attribute

double_stream_modulation_txt = Flux2Modulation(
    inner_dim, mod_param_sets=2, bias=False
)

dtype property

dtype: dtype

inner_dim instance-attribute

inner_dim = num_attention_heads * attention_head_dim

norm_out instance-attribute

norm_out = AdaLayerNormContinuous(
    inner_dim,
    inner_dim,
    elementwise_affine=False,
    eps=eps,
    bias=False,
)

out_channels instance-attribute

out_channels = out_channels or in_channels

parallel_config instance-attribute

parallel_config = parallel_config

pos_embed instance-attribute

pos_embed = Flux2PosEmbed(
    theta=rope_theta, axes_dim=list(axes_dims_rope)
)

proj_out instance-attribute

proj_out = Linear(
    inner_dim,
    patch_size * patch_size * out_channels,
    bias=False,
)
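
The derived sizes documented for inner_dim, out_channels, and proj_out can be checked with a small standalone sketch. The helper name and the numeric values below are hypothetical and chosen only for illustration; they are not taken from any released checkpoint.

```python
from types import SimpleNamespace


def derived_dims(num_attention_heads, attention_head_dim, patch_size,
                 in_channels, out_channels=None):
    """Reproduce the documented size arithmetic of the transformer."""
    # inner_dim = num_attention_heads * attention_head_dim
    inner_dim = num_attention_heads * attention_head_dim
    # out_channels falls back to in_channels when not given
    out_channels = out_channels or in_channels
    # proj_out maps inner_dim to one flattened patch of output channels
    proj_out_features = patch_size * patch_size * out_channels
    return SimpleNamespace(
        inner_dim=inner_dim,
        out_channels=out_channels,
        proj_out_features=proj_out_features,
    )


# Hypothetical configuration values.
dims = derived_dims(
    num_attention_heads=24,
    attention_head_dim=128,
    patch_size=1,
    in_channels=128,
)

print(dims.inner_dim)          # 3072
print(dims.out_channels)       # 128
print(dims.proj_out_features)  # 128
```

Under these assumed values, x_embedder would map 128 input channels to 3072 features, and proj_out would map them back to 128.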

rope_prepare instance-attribute

rope_prepare = Flux2RopePrepare(pos_embed)

single_stream_modulation instance-attribute

single_stream_modulation = Flux2Modulation(
    inner_dim, mod_param_sets=1, bias=False
)

single_transformer_blocks instance-attribute

single_transformer_blocks = ModuleList(
    [
        (
            Flux2SingleTransformerBlock(
                parallel_config=parallel_config,
                dim=inner_dim,
                num_attention_heads=num_attention_heads,
                attention_head_dim=attention_head_dim,
                mlp_ratio=mlp_ratio,
                eps=eps,
                bias=False,
                quant_config=quant_config,
                prefix=f"single_transformer_blocks.{i}",
            )
        )
        for i in (range(num_single_layers))
    ]
)

time_guidance_embed instance-attribute

time_guidance_embed = Flux2TimestepGuidanceEmbeddings(
    in_channels=timestep_guidance_channels,
    embedding_dim=inner_dim,
    bias=False,
    guidance_embeds=guidance_embeds,
)

transformer_blocks instance-attribute

transformer_blocks = ModuleList(
    [
        (
            Flux2TransformerBlock(
                parallel_config=parallel_config,
                dim=inner_dim,
                num_attention_heads=num_attention_heads,
                attention_head_dim=attention_head_dim,
                mlp_ratio=mlp_ratio,
                eps=eps,
                bias=False,
                quant_config=quant_config,
                prefix=f"transformer_blocks.{i}",
            )
        )
        for i in (range(num_layers))
    ]
)

x_embedder instance-attribute

x_embedder = Linear(in_channels, inner_dim, bias=False)

forward

forward(
    hidden_states: Tensor,
    encoder_hidden_states: Tensor,
    timestep: LongTensor,
    img_ids: Tensor,
    txt_ids: Tensor,
    guidance: Tensor | None = None,
    joint_attention_kwargs: dict[str, Any] | None = None,
    return_dict: bool = True,
) -> Tensor | Transformer2DModelOutput

load_weights

load_weights(
    weights: Iterable[tuple[str, Tensor]],
) -> set[str]

get_flux2_klein_post_process_func

get_flux2_klein_post_process_func(
    od_config: OmniDiffusionConfig,
)