vllm_omni.diffusion.models.z_image.pipeline_z_image

logger module-attribute

logger = get_logger(__name__)

ZImagePipeline

Bases: Module, DiffusionPipelineProfilerMixin

do_classifier_free_guidance property

do_classifier_free_guidance

guidance_scale property

guidance_scale

image_processor instance-attribute

image_processor = VaeImageProcessor(
    vae_scale_factor=vae_scale_factor * 2,
    do_convert_rgb=True,
)

interrupt property

interrupt

joint_attention_kwargs property

joint_attention_kwargs

num_timesteps property

num_timesteps

od_config instance-attribute

od_config = od_config

scheduler instance-attribute

scheduler = from_pretrained(
    model,
    subfolder="scheduler",
    local_files_only=local_files_only,
)

text_encoder instance-attribute

text_encoder = to(_execution_device)

tokenizer instance-attribute

tokenizer = from_pretrained(
    model,
    subfolder="tokenizer",
    local_files_only=local_files_only,
)

transformer instance-attribute

transformer = ZImageTransformer2DModel(
    quant_config=quantization_config
)

vae instance-attribute

vae = to(_execution_device)

vae_scale_factor instance-attribute

vae_scale_factor = (
    2 ** (len(block_out_channels) - 1)
    if hasattr(self, "vae") and vae is not None
    else 8
)
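
As a worked instance of the expression above, here is a minimal sketch. The `block_out_channels` values are hypothetical, chosen only to represent a four-level VAE; they are not read from any Z-Image checkpoint.

```python
# Hypothetical VAE config with four levels (values are illustrative only).
block_out_channels = [128, 256, 512, 512]

# Same expression as above: one factor of 2 per level after the first.
vae_scale_factor = 2 ** (len(block_out_channels) - 1)

# image_processor is constructed with twice this factor.
processor_scale_factor = vae_scale_factor * 2

print(vae_scale_factor, processor_scale_factor)
```

With four levels the result equals the fallback value used when no VAE is attached.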

weights_sources instance-attribute

weights_sources = [
    ComponentSource(
        model_or_path=model,
        subfolder="text_encoder",
        revision=revision,
        prefix="text_encoder.",
    ),
    ComponentSource(
        model_or_path=model,
        subfolder="transformer",
        revision=revision,
        prefix="transformer.",
        fall_back_to_pt=True,
    ),
    ComponentSource(
        model_or_path=model,
        subfolder="vae",
        revision=revision,
        prefix="vae.",
    ),
]

encode_prompt

encode_prompt(
    prompt: str | list[str],
    device: device | None = None,
    do_classifier_free_guidance: bool = True,
    negative_prompt: str | list[str] | None = None,
    prompt_embeds: list[FloatTensor] | None = None,
    negative_prompt_embeds: FloatTensor | None = None,
    max_sequence_length: int = 512,
)

forward

forward(
    req: OmniDiffusionRequest,
    prompt: str | list[str] | None = None,
    image: PipelineImageInput = None,
    strength: float = 0.6,
    height: int = 1024,
    width: int = 1024,
    num_inference_steps: int = 50,
    sigmas: list[float] | None = None,
    guidance_scale: float = 5.0,
    cfg_normalization: bool = False,
    cfg_truncation: float = 1.0,
    negative_prompt: str | list[str] | None = None,
    num_images_per_prompt: int = 1,
    generator: Generator | list[Generator] | None = None,
    latents: FloatTensor | None = None,
    prompt_embeds: list[FloatTensor] | None = None,
    negative_prompt_embeds: list[FloatTensor] | None = None,
    output_type: str | None = "pil",
    return_dict: bool = True,
    joint_attention_kwargs: dict[str, Any] | None = None,
    callback_on_step_end: Callable[[int, int, dict], None]
    | None = None,
    callback_on_step_end_tensor_inputs: list[str] = [
        "latents"
    ],
    max_sequence_length: int = 512,
) -> DiffusionOutput

Function invoked when calling the pipeline for generation.

Parameters:

Name Type Description Default
prompt `str` or `list[str]`, *optional*

The prompt or prompts to guide the image generation. If not defined, one has to pass prompt_embeds instead.

None
image `PipelineImageInput`, *optional*

The image to use for img2img generation. If provided, the pipeline will perform img2img instead of text-to-image.

None
strength `float`, *optional*, defaults to 0.6

Indicates the extent to which the reference image is transformed. Must be between 0 and 1.

0.6
height `int`, *optional*, defaults to 1024

The height in pixels of the generated image.

1024
width `int`, *optional*, defaults to 1024

The width in pixels of the generated image.

1024
num_inference_steps `int`, *optional*, defaults to 50

The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference.

50
sigmas `list[float]`, *optional*

Custom sigmas to use for the denoising process with schedulers which support a sigmas argument in their set_timesteps method. If not defined, the default behavior when num_inference_steps is passed will be used.

None
guidance_scale `float`, *optional*, defaults to 5.0

Guidance scale as defined in Classifier-Free Diffusion Guidance. guidance_scale is defined as w of equation 2 of the Imagen paper. Guidance is enabled by setting guidance_scale > 0. A higher guidance scale encourages the model to generate images that are closely linked to the text prompt, usually at the expense of lower image quality.

5.0
cfg_normalization `bool`, *optional*, defaults to False

Whether to apply classifier-free guidance (CFG) normalization.

False
cfg_truncation `float`, *optional*, defaults to 1.0

The truncation value for classifier-free guidance (CFG).

1.0
negative_prompt `str` or `list[str]`, *optional*

The prompt or prompts not to guide the image generation. If not defined, one has to pass negative_prompt_embeds instead. Ignored when not using guidance (i.e., ignored if guidance_scale is less than or equal to 0).

None
num_images_per_prompt `int`, *optional*, defaults to 1

The number of images to generate per prompt.

1
generator `torch.Generator` or `list[torch.Generator]`, *optional*

One or a list of torch generator(s) to make generation deterministic.

None
latents `torch.FloatTensor`, *optional*

Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random generator.

None
prompt_embeds `list[torch.FloatTensor]`, *optional*

Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from prompt input argument.

None
negative_prompt_embeds `list[torch.FloatTensor]`, *optional*

Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from negative_prompt input argument.

None
output_type `str`, *optional*, defaults to `"pil"`

The output format of the generated image. Choose between PIL (PIL.Image.Image) and np.array.

'pil'
return_dict `bool`, *optional*, defaults to `True`

Whether or not to return a [~pipelines.z_image.ZImagePipelineOutput] instead of a plain tuple.

True
joint_attention_kwargs `dict`, *optional*

A kwargs dictionary that if specified is passed along to the AttentionProcessor as defined under self.processor in diffusers.models.attention_processor.

None
callback_on_step_end `Callable`, *optional*

A function that is called at the end of each denoising step during inference. The function is called with the following arguments: callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict). callback_kwargs will include a list of all tensors as specified by callback_on_step_end_tensor_inputs.

None
callback_on_step_end_tensor_inputs `list`, *optional*

The list of tensor inputs for the callback_on_step_end function. The tensors specified in the list will be passed as callback_kwargs argument. You will only be able to include variables listed in the ._callback_tensor_inputs attribute of your pipeline class.

['latents']
max_sequence_length `int`, *optional*, defaults to 512

Maximum sequence length to use with the prompt.

512

Examples:
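
A minimal sketch of a step-end callback, following the four-argument call form given in the callback_on_step_end description. The names `log_step` and `seen_steps` are hypothetical, and returning the kwargs unchanged is an assumption made so the function is harmless whether or not the pipeline reads a return value.

```python
from typing import Any

# Collected (step, timestep) pairs, for inspection after generation.
seen_steps: list[tuple[int, int]] = []


def log_step(pipeline: Any, step: int, timestep: int, callback_kwargs: dict) -> dict:
    """Record each (step, timestep) pair and report the latents shape if present."""
    seen_steps.append((step, timestep))
    latents = callback_kwargs.get("latents")
    if latents is not None:
        print(f"step {step} (t={timestep}): latents shape {tuple(latents.shape)}")
    # Assumption: handing the kwargs back unchanged leaves the latents untouched.
    return callback_kwargs
```

Such a function would be passed as callback_on_step_end=log_step, with callback_on_step_end_tensor_inputs left at its default of ["latents"].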

Returns:

Type Description
DiffusionOutput

[~pipelines.z_image.ZImagePipelineOutput] or tuple: [~pipelines.z_image.ZImagePipelineOutput] if return_dict is True, otherwise a tuple. When returning a tuple, the first element is a list with the generated images.

get_timesteps

get_timesteps(num_inference_steps, strength, device)
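
This page gives no description for get_timesteps. In diffusers-style img2img pipelines, a method with this name and signature usually trims the schedule so that only the final `strength` fraction of the steps is run. The sketch below shows that convention on a plain list. Whether ZImagePipeline.get_timesteps does exactly this is an assumption, and `sketch_get_timesteps` and `schedule` are hypothetical names.

```python
def sketch_get_timesteps(timesteps: list, num_inference_steps: int, strength: float):
    """Keep only the tail of the schedule, as is common for img2img (assumed behavior)."""
    # Number of denoising steps that will actually run.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Index of the first step that is kept.
    t_start = max(num_inference_steps - init_timestep, 0)
    return timesteps[t_start:], num_inference_steps - t_start


# Stand-in for scheduler.timesteps: 50 descending values.
schedule = list(range(50, 0, -1))
kept, steps = sketch_get_timesteps(schedule, num_inference_steps=50, strength=0.6)
print(steps, len(kept))
```

Under this convention, strength=1.0 keeps the whole schedule, while smaller values skip the earliest, noisiest steps and stay closer to the input image.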

load_weights

load_weights(
    weights: Iterable[tuple[str, Tensor]],
) -> set[str]

prepare_latents

prepare_latents(
    batch_size,
    num_channels_latents,
    height,
    width,
    dtype,
    device,
    generator,
    latents=None,
    image=None,
    timestep=None,
)

calculate_shift

calculate_shift(
    image_seq_len,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
)
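
The signature above matches the calculate_shift helper found in several diffusers pipelines, where it linearly interpolates a shift value between base_shift and max_shift according to the image sequence length. The sketch below assumes this module does the same, which the page does not confirm; `sketch_calculate_shift` is a hypothetical name.

```python
def sketch_calculate_shift(
    image_seq_len,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
):
    """Linear interpolation of the shift over sequence length (assumed behavior)."""
    slope = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    intercept = base_shift - slope * base_seq_len
    return image_seq_len * slope + intercept


# At the two anchor lengths the result is base_shift and max_shift respectively.
print(sketch_calculate_shift(256), sketch_calculate_shift(4096))
```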

get_post_process_func

get_post_process_func(od_config: OmniDiffusionConfig)

retrieve_latents

retrieve_latents(
    encoder_output: Tensor,
    generator: Generator | None = None,
    sample_mode: str = "sample",
)

retrieve_timesteps

retrieve_timesteps(
    scheduler,
    num_inference_steps: int | None = None,
    device: str | device | None = None,
    timesteps: list[int] | None = None,
    sigmas: list[float] | None = None,
    **kwargs,
) -> tuple[Tensor, int]

Calls the scheduler's set_timesteps method and retrieves timesteps from the scheduler after the call. Handles custom timesteps. Any kwargs will be supplied to scheduler.set_timesteps.

Parameters:

Name Type Description Default
scheduler `SchedulerMixin`

The scheduler to get timesteps from.

required
num_inference_steps `int`

The number of diffusion steps used when generating samples with a pre-trained model. If used, timesteps must be None.

None
device `str` or `torch.device`, *optional*

The device to which the timesteps should be moved. If None, the timesteps are not moved.

None
timesteps `list[int]`, *optional*

Custom timesteps used to override the timestep spacing strategy of the scheduler. If timesteps is passed, num_inference_steps and sigmas must be None.

None
sigmas `list[float]`, *optional*

Custom sigmas used to override the timestep spacing strategy of the scheduler. If sigmas is passed, num_inference_steps and timesteps must be None.

None

Returns:

Type Description
tuple[Tensor, int]

A tuple where the first element is the timestep schedule from the scheduler and the second element is the number of inference steps.
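
The dispatch described above can be sketched as follows. `ToyScheduler` is a hypothetical stand-in, not a real scheduler class, and it uses plain lists instead of tensors so the example runs without torch; `sketch_retrieve_timesteps` is likewise a hypothetical name rather than the function documented here.

```python
class ToyScheduler:
    """Stand-in scheduler with a set_timesteps method (illustrative only)."""

    def __init__(self):
        self.timesteps = []

    def set_timesteps(self, num_inference_steps=None, device=None, timesteps=None, sigmas=None):
        if timesteps is not None:
            self.timesteps = list(timesteps)
        elif sigmas is not None:
            self.timesteps = [int(s * 1000) for s in sigmas]
        else:
            n = num_inference_steps
            self.timesteps = [round(1000 * (1 - i / n)) for i in range(n)]


def sketch_retrieve_timesteps(scheduler, num_inference_steps=None, device=None,
                              timesteps=None, sigmas=None, **kwargs):
    # Custom timesteps and custom sigmas are mutually exclusive.
    if timesteps is not None and sigmas is not None:
        raise ValueError("Pass only one of `timesteps` or `sigmas`.")
    if timesteps is not None:
        scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)
    elif sigmas is not None:
        scheduler.set_timesteps(sigmas=sigmas, device=device, **kwargs)
    else:
        scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)
    # The step count is read back from the schedule the scheduler produced.
    return scheduler.timesteps, len(scheduler.timesteps)


steps, n = sketch_retrieve_timesteps(ToyScheduler(), num_inference_steps=4)
print(steps, n)
```

When custom sigmas or timesteps are given, the returned step count comes from the length of the resulting schedule rather than from num_inference_steps.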