vllm_omni.diffusion.models.z_image.pipeline_z_image
ZImagePipeline
Bases: Module, DiffusionPipelineProfilerMixin
image_processor instance-attribute
scheduler instance-attribute
tokenizer instance-attribute
transformer instance-attribute
transformer = ZImageTransformer2DModel(
    quant_config=quantization_config
)
vae_scale_factor instance-attribute
vae_scale_factor = (
    2 ** (len(block_out_channels) - 1)
    if hasattr(self, "vae") and vae is not None
    else 8
)
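The expression above can be checked by hand. The following is a minimal sketch, assuming each VAE stage after the first halves the spatial resolution; the helper name `vae_scale_factor_sketch` and the example channel tuple are hypothetical and are not values taken from this pipeline.

```python
def vae_scale_factor_sketch(block_out_channels=None):
    """Mirror the documented expression: 2 ** (len(block_out_channels) - 1), else 8."""
    if block_out_channels is None:
        # No VAE configuration available: use the documented fallback of 8.
        return 8
    return 2 ** (len(block_out_channels) - 1)


# Hypothetical channel configuration with four stages (illustration only).
example_channels = (128, 256, 512, 512)
factor = vae_scale_factor_sketch(example_channels)

# Pixel size divided by the scale factor gives the VAE latent size per side.
print(factor, 1024 // factor)
```

With four stages the factor matches the fallback value, so the default of 8 corresponds to a VAE with three downsampling steps under this assumption.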
weights_sources instance-attribute
weights_sources = [
    ComponentSource(
        model_or_path=model,
        subfolder="text_encoder",
        revision=revision,
        prefix="text_encoder.",
    ),
    ComponentSource(
        model_or_path=model,
        subfolder="transformer",
        revision=revision,
        prefix="transformer.",
        fall_back_to_pt=True,
    ),
    ComponentSource(
        model_or_path=model,
        subfolder="vae",
        revision=revision,
        prefix="vae.",
    ),
]
encode_prompt
encode_prompt(
    prompt: str | list[str],
    device: device | None = None,
    do_classifier_free_guidance: bool = True,
    negative_prompt: str | list[str] | None = None,
    prompt_embeds: list[FloatTensor] | None = None,
    negative_prompt_embeds: FloatTensor | None = None,
    max_sequence_length: int = 512,
)
forward
forward(
    req: OmniDiffusionRequest,
    prompt: str | list[str] | None = None,
    image: PipelineImageInput = None,
    strength: float = 0.6,
    height: int = 1024,
    width: int = 1024,
    num_inference_steps: int = 50,
    sigmas: list[float] | None = None,
    guidance_scale: float = 5.0,
    cfg_normalization: bool = False,
    cfg_truncation: float = 1.0,
    negative_prompt: str | list[str] | None = None,
    num_images_per_prompt: int = 1,
    generator: Generator | list[Generator] | None = None,
    latents: FloatTensor | None = None,
    prompt_embeds: list[FloatTensor] | None = None,
    negative_prompt_embeds: list[FloatTensor] | None = None,
    output_type: str | None = "pil",
    return_dict: bool = True,
    joint_attention_kwargs: dict[str, Any] | None = None,
    callback_on_step_end: Callable[[int, int, dict], None] | None = None,
    callback_on_step_end_tensor_inputs: list[str] = ["latents"],
    max_sequence_length: int = 512,
) -> DiffusionOutput
Function invoked when calling the pipeline for generation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| prompt | `str` or `list[str]`, *optional* | The prompt or prompts to guide the image generation. If not defined, one has to pass | None |
| image | `PipelineImageInput`, *optional* | The image to use for img2img generation. If provided, the pipeline performs img2img instead of text-to-image. | None |
| strength | `float`, *optional*, defaults to 0.6 | Indicates extent to transform the reference | 0.6 |
| height | `int`, *optional*, defaults to 1024 | The height in pixels of the generated image. | 1024 |
| width | `int`, *optional*, defaults to 1024 | The width in pixels of the generated image. | 1024 |
| num_inference_steps | `int`, *optional*, defaults to 50 | The number of denoising steps. More denoising steps usually lead to a higher-quality image at the expense of slower inference. | 50 |
| sigmas | `list[float]`, *optional* | Custom sigmas to use for the denoising process with schedulers which support a | None |
| guidance_scale | `float`, *optional*, defaults to 5.0 | Guidance scale as defined in Classifier-Free Diffusion Guidance. | 5.0 |
| cfg_normalization | `bool`, *optional*, defaults to False | Whether to apply classifier-free guidance (CFG) normalization. | False |
| cfg_truncation | `float`, *optional*, defaults to 1.0 | The truncation value for classifier-free guidance (CFG). | 1.0 |
| negative_prompt | `str` or `list[str]`, *optional* | The prompt or prompts not to guide the image generation. If not defined, one has to pass | None |
| num_images_per_prompt | `int`, *optional*, defaults to 1 | The number of images to generate per prompt. | 1 |
| generator | `torch.Generator` or `list[torch.Generator]`, *optional* | One or a list of torch generator(s) to make generation deterministic. | None |
| latents | `torch.FloatTensor`, *optional* | Pre-generated noisy latents, sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor will be generated by sampling using the supplied random | None |
| prompt_embeds | `list[torch.FloatTensor]`, *optional* | Pre-generated text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, text embeddings will be generated from | None |
| negative_prompt_embeds | `list[torch.FloatTensor]`, *optional* | Pre-generated negative text embeddings. Can be used to easily tweak text inputs, e.g. prompt weighting. If not provided, negative_prompt_embeds will be generated from | None |
| output_type | `str`, *optional*, defaults to `"pil"` | The output format of the generated image. Choose between PIL: | 'pil' |
| return_dict | `bool`, *optional*, defaults to `True` | Whether or not to return a [ | True |
| joint_attention_kwargs | `dict`, *optional* | A kwargs dictionary that if specified is passed along to the | None |
| callback_on_step_end | `Callable`, *optional* | A function called at the end of each denoising step during inference. The function is called with the following arguments: | None |
| callback_on_step_end_tensor_inputs | `list`, *optional* | The list of tensor inputs for the | ['latents'] |
| max_sequence_length | `int`, *optional*, defaults to 512 | Maximum sequence length to use with the | 512 |
Returns:
| Type | Description |
|---|---|
| DiffusionOutput | The generated images. |
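The `strength` and `num_inference_steps` parameters interact when `image` is supplied. The following is a minimal sketch, assuming this pipeline follows the convention used by diffusers img2img pipelines, where `strength` selects how much of the schedule is actually run; the helper `img2img_step_window` is hypothetical and the real pipeline may compute this differently.

```python
def img2img_step_window(num_inference_steps: int, strength: float) -> tuple[int, int]:
    """Return (start index into the schedule, number of steps actually run)."""
    # A higher strength means more of the schedule is run on the noised input image.
    init_steps = min(num_inference_steps * strength, num_inference_steps)
    t_start = int(max(num_inference_steps - init_steps, 0))
    return t_start, num_inference_steps - t_start


# strength=1.0 runs the full schedule; lower values skip the earliest steps.
for s in (1.0, 0.5, 0.0):
    print(s, img2img_step_window(50, s))
```

Under this assumption, a lower `strength` keeps the output closer to the input image because fewer denoising steps are applied to it.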
prepare_latents
prepare_latents(
    batch_size,
    num_channels_latents,
    height,
    width,
    dtype,
    device,
    generator,
    latents=None,
    image=None,
    timestep=None,
)
calculate_shift
calculate_shift(
    image_seq_len,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
)
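This page gives only the signature of `calculate_shift`. The following is a minimal sketch, assuming the helper follows the linear interpolation used by the same-named function in diffusers flow-matching pipelines: the shift grows linearly from `base_shift` at `base_seq_len` to `max_shift` at `max_seq_len`. The name `calculate_shift_sketch` is hypothetical.

```python
def calculate_shift_sketch(
    image_seq_len,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
):
    """Linearly interpolate a schedule shift from the image sequence length."""
    # Slope and intercept of the line through (base_seq_len, base_shift)
    # and (max_seq_len, max_shift).
    slope = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    intercept = base_shift - slope * base_seq_len
    return image_seq_len * slope + intercept


# The endpoints recover the base and maximum shift (up to float rounding).
print(calculate_shift_sketch(256), calculate_shift_sketch(4096))
```

Under this assumption, longer image token sequences (larger images) receive a larger shift.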
retrieve_latents
retrieve_latents(
    encoder_output: Tensor,
    generator: Generator | None = None,
    sample_mode: str = "sample",
)
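The behavior of `retrieve_latents` is not described on this page. The following is a minimal sketch, assuming it mirrors the same-named helper in diffusers, which reads latents from a VAE encoder output either by sampling its distribution, taking its mode, or returning precomputed latents. The `SimpleNamespace` objects are hypothetical stand-ins for real encoder outputs.

```python
from types import SimpleNamespace


def retrieve_latents_sketch(encoder_output, generator=None, sample_mode="sample"):
    """Pick latents from an encoder output according to sample_mode."""
    if hasattr(encoder_output, "latent_dist") and sample_mode == "sample":
        # Draw a random sample from the latent distribution.
        return encoder_output.latent_dist.sample(generator)
    if hasattr(encoder_output, "latent_dist") and sample_mode == "argmax":
        # Use the most likely latent instead of sampling.
        return encoder_output.latent_dist.mode()
    if hasattr(encoder_output, "latents"):
        # The encoder already produced deterministic latents.
        return encoder_output.latents
    raise AttributeError("Could not access latents of provided encoder_output")


# Stand-in distribution and outputs (illustration only).
fake_dist = SimpleNamespace(sample=lambda generator=None: "sampled", mode=lambda: "mode")
with_dist = SimpleNamespace(latent_dist=fake_dist)
with_latents = SimpleNamespace(latents="precomputed")

print(retrieve_latents_sketch(with_dist))
print(retrieve_latents_sketch(with_dist, sample_mode="argmax"))
print(retrieve_latents_sketch(with_latents))
```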
retrieve_timesteps
retrieve_timesteps(
    scheduler,
    num_inference_steps: int | None = None,
    device: str | device | None = None,
    timesteps: list[int] | None = None,
    sigmas: list[float] | None = None,
    **kwargs,
) -> tuple[Tensor, int]
Calls the scheduler's set_timesteps method and retrieves timesteps from the scheduler after the call. Handles custom timesteps. Any kwargs will be supplied to scheduler.set_timesteps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scheduler | `SchedulerMixin` | The scheduler to get timesteps from. | required |
| num_inference_steps | `int` | The number of diffusion steps used when generating samples with a pre-trained model. If used, | None |
| device | `str` or `torch.device`, *optional* | The device to which the timesteps should be moved. If | None |
| timesteps | `list[int]`, *optional* | Custom timesteps used to override the timestep spacing strategy of the scheduler. If | None |
| sigmas | `list[float]`, *optional* | Custom sigmas used to override the timestep spacing strategy of the scheduler. If | None |
Returns:
| Type | Description |
|---|---|
| Tensor | The first element is the timestep schedule retrieved from the scheduler. |
| int | The second element is the number of inference steps. |
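The control flow described for `retrieve_timesteps` can be sketched without any tensor library. This is a minimal illustration, assuming that `timesteps` and `sigmas` are mutually exclusive as in the same-named diffusers helper; `ToyScheduler` and `retrieve_timesteps_sketch` are hypothetical stand-ins that use plain lists instead of tensors.

```python
class ToyScheduler:
    """Hypothetical scheduler exposing set_timesteps() and a timesteps attribute."""

    def __init__(self):
        self.timesteps = []

    def set_timesteps(self, num_inference_steps=None, device=None,
                      timesteps=None, sigmas=None):
        if timesteps is not None:
            self.timesteps = list(timesteps)
        elif sigmas is not None:
            # Toy mapping from sigma to timestep (illustration only).
            self.timesteps = [s * 1000 for s in sigmas]
        else:
            step = 1000 / num_inference_steps
            self.timesteps = [1000 - i * step for i in range(num_inference_steps)]


def retrieve_timesteps_sketch(scheduler, num_inference_steps=None, device=None,
                              timesteps=None, sigmas=None, **kwargs):
    """Call set_timesteps, then return (schedule, number of steps)."""
    if timesteps is not None and sigmas is not None:
        raise ValueError("Pass only one of timesteps or sigmas.")
    if timesteps is not None:
        scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)
    elif sigmas is not None:
        scheduler.set_timesteps(sigmas=sigmas, device=device, **kwargs)
    else:
        scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)
    schedule = scheduler.timesteps
    # With a custom schedule, the step count comes from its length.
    return schedule, len(schedule)


schedule, n_steps = retrieve_timesteps_sketch(ToyScheduler(), sigmas=[1.0, 0.5])
print(schedule, n_steps)
```

When a custom schedule is supplied, the returned step count is derived from the schedule itself rather than from `num_inference_steps`.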