vllm_omni.diffusion.models.flux2_klein.pipeline_flux2_klein

logger module-attribute

logger = init_logger(__name__)

Flux2ImageProcessor

Bases: VaeImageProcessor

Image processor to preprocess the reference image for Flux2 klein.

check_image_input staticmethod

check_image_input(
    image: Image,
    max_aspect_ratio: int = 8,
    min_side_length: int = 64,
    max_area: int = 1024 * 1024,
) -> Image
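The defaults in this signature suggest three constraints on a reference image: a bound on aspect ratio, a minimum side length, and a cap on total pixel area. Below is a minimal sketch of such checks on a plain width and height pair. The name `check_size` and the choice to raise `ValueError` for every violation are assumptions made for illustration; the actual method may resize or handle violations differently.

```python
def check_size(
    width: int,
    height: int,
    max_aspect_ratio: int = 8,
    min_side_length: int = 64,
    max_area: int = 1024 * 1024,
) -> tuple[int, int]:
    """Hypothetical size check reusing the defaults of check_image_input."""
    # Reject images whose shorter side is too small.
    if min(width, height) < min_side_length:
        raise ValueError(f"shorter side must be at least {min_side_length}px")
    # Reject very elongated images (long side / short side).
    aspect_ratio = max(width / height, height / width)
    if aspect_ratio > max_aspect_ratio:
        raise ValueError(f"aspect ratio must not exceed {max_aspect_ratio}")
    # Reject images with too many pixels (assumption: raise, not resize).
    if width * height > max_area:
        raise ValueError(f"area must not exceed {max_area} pixels")
    return width, height
```

With these defaults, a 512 x 512 image passes, while a 32 x 512 image fails the minimum side check.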

concatenate_images staticmethod

concatenate_images(images: list[Image]) -> Image

Flux2KleinPipeline

Bases: Module, CFGParallelMixin, SupportImageInput, DiffusionPipelineProfilerMixin

Flux2 klein pipeline for text-to-image generation.

attention_kwargs property

attention_kwargs

current_timestep property

current_timestep

default_sample_size instance-attribute

default_sample_size = 128

do_classifier_free_guidance property

do_classifier_free_guidance

guidance_scale property

guidance_scale

image_processor instance-attribute

image_processor = Flux2ImageProcessor(
    vae_scale_factor=vae_scale_factor * 2
)

interrupt property

interrupt

is_distilled instance-attribute

is_distilled = is_distilled

latent_channels instance-attribute

latent_channels = (
    latent_channels if hasattr(vae, "config") else 16
)

mask_processor instance-attribute

mask_processor = VaeImageProcessor(
    vae_scale_factor=vae_scale_factor * 2,
    vae_latent_channels=latent_channels,
    do_normalize=False,
    do_binarize=True,
    do_convert_grayscale=True,
)

num_timesteps property

num_timesteps

od_config instance-attribute

od_config = od_config

scheduler instance-attribute

scheduler = from_pretrained(
    model,
    subfolder="scheduler",
    local_files_only=local_files_only,
)

support_image_input class-attribute instance-attribute

support_image_input = True

text_encoder instance-attribute

text_encoder = to(_execution_device)

tokenizer instance-attribute

tokenizer = from_pretrained(
    model,
    subfolder="tokenizer",
    local_files_only=local_files_only,
)

tokenizer_max_length instance-attribute

tokenizer_max_length = 512

transformer instance-attribute

transformer = Flux2Transformer2DModel(
    quant_config=quantization_config, **transformer_kwargs
)

vae instance-attribute

vae = to(_execution_device)

vae_scale_factor instance-attribute

vae_scale_factor = (
    2 ** (len(block_out_channels) - 1)
    if getattr(self, "vae", None)
    else 8
)
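As a worked example of the expression above, the sketch below evaluates the scale factor for an assumed four-entry `block_out_channels` tuple. The helper name and the example channel values are hypothetical; only the formula and the fallback of 8 come from the attribute definition.

```python
def compute_vae_scale_factor(block_out_channels=None) -> int:
    """Hypothetical helper: one factor of 2 per VAE block after the first."""
    if block_out_channels is None:
        # Fallback used when no VAE is available.
        return 8
    return 2 ** (len(block_out_channels) - 1)


# Assumed example config with four blocks: 2 ** (4 - 1) == 8.
example_channels = (128, 256, 512, 512)
scale = compute_vae_scale_factor(example_channels)

# With default_sample_size = 128, this gives a 1024-pixel default side.
default_side = 128 * scale
```

Note that `image_processor` and `mask_processor` are constructed with `vae_scale_factor * 2`, which would be 16 under the same assumption.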

weights_sources instance-attribute

weights_sources = [
    ComponentSource(
        model_or_path=model,
        subfolder="transformer",
        revision=None,
        prefix="transformer.",
        fall_back_to_pt=True,
    )
]

check_inputs

check_inputs(
    prompt,
    height,
    width,
    prompt_embeds=None,
    callback_on_step_end_tensor_inputs=None,
    guidance_scale=None,
    strength=None,
    num_inference_steps=None,
)

encode_prompt

encode_prompt(
    prompt: str | list[str],
    device: device | None = None,
    num_images_per_prompt: int = 1,
    prompt_embeds: Tensor | None = None,
    max_sequence_length: int = 512,
    text_encoder_out_layers: tuple[int, ...] = (9, 18, 27),
)

forward

forward(
    req: OmniDiffusionRequest,
    image: Image | list[Image] | None = None,
    reference_image: Image | list[Image] | None = None,
    mask_image: Image | list[Image] | None = None,
    prompt: str | list[str] | None = None,
    height: int | None = None,
    width: int | None = None,
    num_inference_steps: int = 50,
    sigmas: list[float] | None = None,
    strength: float = 1.0,
    guidance_scale: float | None = 4.0,
    num_images_per_prompt: int = 1,
    generator: Generator | list[Generator] | None = None,
    latents: Tensor | None = None,
    prompt_embeds: Tensor | None = None,
    negative_prompt_embeds: Tensor | None = None,
    output_type: str | None = "pil",
    return_dict: bool = True,
    attention_kwargs: dict[str, Any] | None = None,
    callback_on_step_end: Callable[[int, int, dict], None]
    | None = None,
    callback_on_step_end_tensor_inputs: list[str] = [
        "latents"
    ],
    max_sequence_length: int = 512,
    text_encoder_out_layers: tuple[int, ...] = (9, 18, 27),
    padding_mask_crop: int | None = None,
) -> DiffusionOutput

Function invoked when calling the pipeline for generation.

Parameters:

Name Type Description Default
image `torch.Tensor`, `PIL.Image.Image`, `np.ndarray`, or list of these

Image, numpy array, or tensor representing an image batch to be used as the starting point. For both numpy arrays and PyTorch tensors, the expected value range is [0, 1]. If it is a tensor or a list of tensors, the expected shape is (B, C, H, W) or (C, H, W). If it is a numpy array or a list of arrays, the expected shape is (B, H, W, C) or (H, W, C). Image latents can also be passed as image; latents passed directly are not encoded again.

None
prompt `str` or `List[str]`, *optional*

The prompt or prompts to guide the image generation. If not defined, prompt_embeds must be passed instead.

None
guidance_scale `float`, *optional*, defaults to 4.0

Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale corresponds to w in equation 2 of the Imagen paper. Guidance is enabled by setting guidance_scale > 1. A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality. For step-wise distilled models, guidance_scale is ignored.

4.0
height `int`, *optional*, defaults to self.default_sample_size * self.vae_scale_factor

The height in pixels of the generated image. This is set to 1024 by default for the best results.

None
width `int`, *optional*, defaults to self.default_sample_size * self.vae_scale_factor

The width in pixels of the generated image. This is set to 1024 by default for the best results.

None
num_inference_steps `int`, *optional*, defaults to 50

The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

50
sigmas `List[float]`, *optional*

Custom sigmas to use for the denoising process with schedulers which support a sigmas argument in their set_timesteps method. If not defined, the default behavior when num_inference_steps is passed will be used.

None
num_images_per_prompt `int`, *optional*, defaults to 1

The number of images to generate per prompt.

1
generator `torch.Generator` or `List[torch.Generator]`, *optional*

One or a list of torch generator(s) to make generation deterministic.

None
latents `torch.Tensor`, *optional*

Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.

None
prompt_embeds `torch.Tensor`, *optional*

Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings are generated from the prompt input argument.

None
negative_prompt_embeds `torch.Tensor`, *optional*

Pre-generated negative text embeddings. This pipeline uses the empty string "" as the negative prompt; if not provided, the embeddings are generated from "".

None
output_type `str`, *optional*, defaults to `"pil"`

The output format of the generated image. Choose between PIL (PIL.Image.Image) and np.array.

'pil'
return_dict `bool`, *optional*, defaults to `True`

Whether or not to return a [~pipelines.flux2.Flux2PipelineOutput] instead of a plain tuple.

True
attention_kwargs `dict`, *optional*

A kwargs dictionary that if specified is passed along to the AttentionProcessor as defined under self.processor in diffusers.models.attention_processor.

None
callback_on_step_end `Callable`, *optional*

A function that is called at the end of each denoising step during inference. The function is called with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict). callback_kwargs will include all tensors specified by callback_on_step_end_tensor_inputs.

None
callback_on_step_end_tensor_inputs `List`, *optional*

The list of tensor inputs for the callback_on_step_end function. The tensors specified in the list are passed in the callback_kwargs argument. Only variables listed in the ._callback_tensor_inputs attribute of the pipeline class can be included.

['latents']
max_sequence_length `int`, defaults to 512

Maximum sequence length to use with the prompt.

512
text_encoder_out_layers `Tuple[int]`

Layer indices to use in the text_encoder to derive the final prompt embeddings.

(9, 18, 27)
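To illustrate the guidance_scale parameter, classifier-free guidance is commonly written as uncond + w * (cond - uncond), where w is the guidance scale. The sketch below applies that formula to plain Python lists. The function name `cfg_combine` is hypothetical, and how this pipeline combines the two predictions internally is not documented on this page.

```python
def cfg_combine(
    cond: list[float], uncond: list[float], guidance_scale: float
) -> list[float]:
    """Hypothetical helper: blend conditional and unconditional predictions."""
    # Move from the unconditional prediction toward (and past) the
    # conditional one by a factor of guidance_scale.
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]


cond_pred = [1.0, 2.0]
uncond_pred = [0.0, 1.0]

# A scale of 1 reproduces the conditional prediction (no extra guidance).
no_guidance = cfg_combine(cond_pred, uncond_pred, 1.0)
# The default of 4.0 extrapolates beyond the conditional prediction.
guided = cfg_combine(cond_pred, uncond_pred, 4.0)
```

This matches the note that guidance only has an effect when guidance_scale > 1.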

Examples:
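A minimal sketch of a callback_on_step_end function, following the argument order given in that parameter's description. The driver `run_fake_denoising` is a hypothetical stand-in for the pipeline's denoising loop so the example runs without a model. Returning the kwargs dict is an assumption borrowed from the common diffusers convention and may not match this pipeline.

```python
from typing import Any

seen_steps: list[tuple[int, int]] = []


def log_step(
    pipeline: Any, step: int, timestep: int, callback_kwargs: dict
) -> dict:
    """Record each (step, timestep) pair; leave the tensors untouched."""
    seen_steps.append((step, timestep))
    return callback_kwargs


def run_fake_denoising(callback, timesteps, tensor_inputs=("latents",)):
    """Hypothetical stand-in for a denoising loop (no real model involved)."""
    latents = [0.0]  # placeholder for a latents tensor
    for step, timestep in enumerate(timesteps):
        available = {"latents": latents}
        # Only the requested tensor inputs are handed to the callback.
        callback_kwargs = {name: available[name] for name in tensor_inputs}
        callback(None, step, timestep, callback_kwargs)
    return latents


run_fake_denoising(log_step, timesteps=[999, 500, 1])
```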

Returns:

Type Description
DiffusionOutput

[~pipelines.flux2.Flux2PipelineOutput] or tuple: [~pipelines.flux2.Flux2PipelineOutput] if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images.

get_timesteps

get_timesteps(num_inference_steps, strength, device)
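In diffusers image-to-image pipelines, a method with this signature conventionally uses strength to skip the early part of the schedule, so that lower strength means fewer denoising steps applied to the input image. The sketch below shows that convention on a plain list. It is an assumption that this pipeline follows the same rule; the name `trim_timesteps` is hypothetical, and a scheduler order of 1 is assumed.

```python
def trim_timesteps(
    timesteps: list[int], num_inference_steps: int, strength: float
) -> tuple[list[int], int]:
    """Hypothetical helper: keep only the last `strength` share of steps."""
    # Number of steps that will actually run.
    init_timestep = min(num_inference_steps * strength, num_inference_steps)
    # Index of the first step to keep.
    t_start = int(max(num_inference_steps - init_timestep, 0))
    return timesteps[t_start:], num_inference_steps - t_start


schedule = list(range(10, 0, -1))  # toy schedule: 10, 9, ..., 1

# strength = 1.0 keeps the full schedule.
full, full_steps = trim_timesteps(schedule, 10, 1.0)
# strength = 0.5 keeps only the second half.
half, half_steps = trim_timesteps(schedule, 10, 0.5)
```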

load_weights

load_weights(
    weights: Iterable[tuple[str, Tensor]],
) -> set[str]

prepare_image_latents

prepare_image_latents(
    images: list[Tensor],
    batch_size,
    generator: Generator,
    device,
    dtype,
)

prepare_latents

prepare_latents(
    batch_size,
    num_latents_channels,
    height,
    width,
    dtype,
    device,
    generator: Generator,
    latents: Tensor | None = None,
)

prepare_mask_latents

prepare_mask_latents(
    mask,
    masked_image,
    batch_size,
    num_channels_latents,
    num_images_per_prompt,
    height,
    width,
    dtype,
    device,
    generator,
)

compute_empirical_mu

compute_empirical_mu(
    image_seq_len: int, num_steps: int
) -> float

get_flux2_klein_post_process_func

get_flux2_klein_post_process_func(
    od_config: OmniDiffusionConfig,
)