vllm_omni.diffusion.models.ernie_image
ErnieImage diffusion model for vLLM-Omni.
This module implements ERNIE-Image text-to-image generation with:
- ErnieImageTransformer2DModel: a custom DiT (Diffusion Transformer)
- ErnieImagePipeline: the full generation pipeline
Modules:
| Name | Description |
|---|---|
| ernie_image_transformer | |
| pipeline_ernie_image | |
ErnieImagePipeline
Bases: Module, CFGParallelMixin, SupportImageInput, ProgressBarMixin, DiffusionPipelineProfilerMixin
image_processor instance-attribute
pe_tokenizer instance-attribute
```python
pe_tokenizer = from_pretrained(
    pe_base_path,
    subfolder="pe_tokenizer",
    local_files_only=True,
    trust_remote_code=True,
    use_fast=False,
)
```
scheduler instance-attribute
tokenizer instance-attribute
transformer instance-attribute
```python
transformer = ErnieImageTransformer2DModel(
    quant_config=quantization_config, **transformer_kwargs
)
```
vae_scale_factor instance-attribute
weights_sources instance-attribute
```python
weights_sources = [
    ComponentSource(
        model_or_path=model,
        subfolder="transformer",
        revision=None,
        prefix="transformer.",
        fall_back_to_pt=True,
    )
]
```
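The `subfolder` and `prefix` fields of the `weights_sources` entry suggest that weights read from the checkpoint's `transformer/` folder are renamed with a `transformer.` prefix so they line up with the pipeline's own `transformer` attribute. The sketch below illustrates only that renaming idea under this assumption; `SourceSketch`, `remap_keys`, and the example key names are hypothetical, and the real vLLM-Omni loader may work differently.

```python
from dataclasses import dataclass


@dataclass
class SourceSketch:
    """Hypothetical stand-in for ComponentSource (only the fields used here)."""
    subfolder: str
    prefix: str


def remap_keys(state_dict_keys, source):
    """Prepend the component prefix so checkpoint keys match pipeline attribute paths."""
    return [source.prefix + key for key in state_dict_keys]


src = SourceSketch(subfolder="transformer", prefix="transformer.")
# Illustrative key names only; not taken from a real ERNIE-Image checkpoint.
keys = ["x_embedder.proj.weight", "layers.0.attn.to_q.weight"]
print(remap_keys(keys, src))
# ['transformer.x_embedder.proj.weight', 'transformer.layers.0.attn.to_q.weight']
```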
check_inputs
```python
check_inputs(
    prompt,
    height,
    width,
    prompt_embeds=None,
    callback_on_step_end_tensor_inputs=None,
    guidance_scale=None,
)
```
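The body of `check_inputs` is not shown here. As a rough sketch of the kind of validation its parameters imply, the hypothetical `check_inputs_sketch` below rejects sizes that do not map to whole latent patches, requires exactly one of `prompt` / `prompt_embeds`, and checks callback tensor names against an allow-list. The `multiple_of=16` value (standing in for `vae_scale_factor * patch_size`) and the `("latents",)` allow-list are assumptions, not values taken from vLLM-Omni.

```python
def check_inputs_sketch(
    prompt,
    height,
    width,
    prompt_embeds=None,
    callback_on_step_end_tensor_inputs=None,
    guidance_scale=None,
    multiple_of=16,  # assumed: vae_scale_factor * patch_size
    allowed_callback_inputs=("latents",),  # assumed allow-list
):
    # Height and width must map to a whole number of latent patches.
    if height % multiple_of or width % multiple_of:
        raise ValueError(
            f"height and width must be divisible by {multiple_of}, got {height}x{width}"
        )
    # Exactly one of prompt / prompt_embeds should be supplied.
    if prompt is not None and prompt_embeds is not None:
        raise ValueError("Pass either `prompt` or `prompt_embeds`, not both.")
    if prompt is None and prompt_embeds is None:
        raise ValueError("One of `prompt` or `prompt_embeds` is required.")
    if prompt is not None and not isinstance(prompt, (str, list)):
        raise TypeError(f"`prompt` must be str or list[str], got {type(prompt)}")
    # Callback tensors must be names the denoising loop can expose.
    if callback_on_step_end_tensor_inputs is not None:
        unknown = [
            k for k in callback_on_step_end_tensor_inputs
            if k not in allowed_callback_inputs
        ]
        if unknown:
            raise ValueError(f"Unsupported callback tensor inputs: {unknown}")
    # guidance_scale is accepted to mirror the signature; its real rule is unknown.


check_inputs_sketch("a cat", 1024, 1024)  # passes silently
```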
encode_prompt
```python
encode_prompt(
    prompt: str | list[str],
    device: device,
    num_images_per_prompt: int = 1,
    width: int = 1024,
    height: int = 1024,
    apply_pe: bool = True,
) -> list[Tensor]
```
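`encode_prompt` accepts one prompt or a batch and returns a `list[Tensor]` rather than a single padded tensor. A plausible reading (an assumption, not confirmed above) is that each entry is one prompt's embedding, repeated `num_images_per_prompt` times, and that a list lets entries keep different sequence lengths. The sketch below shows only that batching logic; `fake_encode` is a hypothetical stand-in for the text encoder, and `apply_pe`, `width`, `height`, and `device` are left out.

```python
def fake_encode(text: str, dim: int = 4) -> list[list[float]]:
    """Hypothetical encoder: one dim-wide vector per whitespace-separated word."""
    return [[float(len(word))] * dim for word in text.split()]


def encode_prompt_sketch(prompt, num_images_per_prompt: int = 1):
    # Normalize a single string to a one-element batch.
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)
    embeds = []
    for p in prompts:
        e = fake_encode(p)
        # Repeat each prompt's embedding once per requested image.
        embeds.extend([e] * num_images_per_prompt)
    # A list (not a padded tensor), so entries may differ in sequence length.
    return embeds


embeds = encode_prompt_sketch(["a cat", "a red dog"], num_images_per_prompt=2)
print([len(e) for e in embeds])  # [2, 2, 3, 3]
```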
forward
```python
forward(
    req: OmniDiffusionRequest,
    prompt: str | list[str] | None = None,
    negative_prompt: str | list[str] | None = "",
    height: int = 1024,
    width: int = 1024,
    num_inference_steps: int = 50,
    guidance_scale: float = 4.0,
    num_images_per_prompt: int = 1,
    generator: Generator | None = None,
    latents: Tensor | None = None,
    prompt_embeds: list[FloatTensor] | None = None,
    negative_prompt_embeds: list[FloatTensor] | None = None,
    output_type: str = "pil",
    return_dict: bool = True,
    callback_on_step_end: Callable[[int, int, dict], None] | None = None,
    callback_on_step_end_tensor_inputs: list[str] = ["latents"],
) -> DiffusionOutput
```
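`forward` takes a `guidance_scale` (default 4.0) and a `negative_prompt`, and the pipeline inherits from `CFGParallelMixin`, which points to classifier-free guidance. The sketch below shows the textbook CFG combine step on plain lists, assuming the negative prompt drives the unconditional branch; whether ErnieImagePipeline uses exactly this formula, and how `CFGParallelMixin` distributes the two branches, is not shown above. `cfg_combine` is a hypothetical helper name.

```python
def cfg_combine(noise_uncond, noise_cond, guidance_scale=4.0):
    """Textbook classifier-free guidance: uncond + scale * (cond - uncond), elementwise."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


# With scale 1.0 the result equals the conditional prediction.
print(cfg_combine([0.0, 1.0], [1.0, 1.0], guidance_scale=4.0))  # [4.0, 1.0]
print(cfg_combine([0.0, 1.0], [1.0, 1.0], guidance_scale=1.0))  # [1.0, 1.0]
```

Larger scales push the prediction further from the unconditional (negative-prompt) output and toward the prompt, which is why the two branches must both be evaluated at every step.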
ErnieImageTransformer2DModel
Bases: Module
adaLN_modulation instance-attribute
config instance-attribute
```python
config = SimpleNamespace(
    patch_size=patch_size,
    in_channels=in_channels,
    out_channels=out_channels,
    num_layers=num_layers,
    num_attention_heads=num_attention_heads,
    ffn_hidden_size=ffn_hidden_size,
    hidden_size=hidden_size,
    text_in_dim=text_in_dim,
    rope_theta=rope_theta,
    rope_axes_dim=rope_axes_dim,
    eps=eps,
    qk_layernorm=qk_layernorm,
)
```
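Because `config` is a `SimpleNamespace`, its fields are read as plain attributes. The `pos_embed` attribute is built with `dim=head_dim` and `axes_dim=rope_axes_dim`, so the head size and the RoPE axis split are linked. The sketch assumes (this is not shown above) that `head_dim = hidden_size // num_attention_heads` and that the RoPE axes must add up to `head_dim`; the numbers in the example are illustrative only and are not ERNIE-Image's published configuration. `check_head_dims` is a hypothetical helper.

```python
from types import SimpleNamespace


def check_head_dims(config) -> int:
    """Return the per-head size and check it matches the RoPE axis split (assumed rule)."""
    if config.hidden_size % config.num_attention_heads:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    head_dim = config.hidden_size // config.num_attention_heads
    # Assumed invariant: the RoPE axes together cover the whole attention head.
    if sum(config.rope_axes_dim) != head_dim:
        raise ValueError(
            f"sum(rope_axes_dim)={sum(config.rope_axes_dim)} != head_dim={head_dim}"
        )
    return head_dim


# Illustrative numbers only.
cfg = SimpleNamespace(hidden_size=3072, num_attention_heads=24, rope_axes_dim=[32, 48, 48])
print(check_head_dims(cfg))  # 128
```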
final_linear instance-attribute
layers instance-attribute
```python
layers = ModuleList(
    [
        ErnieImageSharedAdaLNBlock(
            parallel_config=parallel_config,
            hidden_size=hidden_size,
            num_heads=num_attention_heads,
            ffn_hidden_size=ffn_hidden_size,
            eps=eps,
            qk_layernorm=qk_layernorm,
            quant_config=quant_config,
        )
        for _ in range(num_layers)
    ]
)
```
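The block class is named `ErnieImageSharedAdaLNBlock`, and the model has a single top-level `adaLN_modulation` attribute. One reading (an assumption) is that the shift/scale modulation is computed once from the timestep embedding and shared by all `num_layers` blocks instead of each block owning its own. The sketch shows the standard adaLN modulation `x * (1 + scale) + shift` applied after a parameter-free LayerNorm, with the same `shift`/`scale` reused in every layer; attention and the feed-forward network are omitted, and all function names are hypothetical.

```python
import math


def layer_norm(x, eps=1e-6):
    """Parameter-free LayerNorm over one feature vector."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def modulate(x, shift, scale):
    """adaLN modulation: x * (1 + scale) + shift, elementwise."""
    return [v * (1.0 + s) + b for v, s, b in zip(x, scale, shift)]


def run_blocks(x, shift, scale, num_layers):
    # Assumed "shared" scheme: the same shift/scale feed every layer.
    for _ in range(num_layers):
        h = modulate(layer_norm(x), shift, scale)
        x = [a + b for a, b in zip(x, h)]  # residual add; attention/FFN omitted
    return x


print(modulate([1.0, 2.0], shift=[0.5, 0.5], scale=[0.0, 1.0]))  # [1.5, 4.5]
```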
pos_embed instance-attribute
```python
pos_embed = ErnieImageEmbedND3(
    dim=head_dim, theta=rope_theta, axes_dim=rope_axes_dim
)
```
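`pos_embed` takes a `theta` and a per-axis `axes_dim`, which matches multi-axis rotary position embedding (RoPE); the `ND3` suffix suggests three position axes, though that is an inference from the name. In multi-axis RoPE, each axis `k` of width `d_k` contributes `d_k / 2` rotation angles `pos_k * theta ** (-2i / d_k)`, and the per-axis angles are concatenated. The sketch computes those angles for one token. The `theta=10000.0` default is the common value from the original RoPE formulation, not necessarily `rope_theta` here, and `rope_angles` is a hypothetical helper.

```python
def rope_angles(position_id, axes_dim, theta=10000.0):
    """Rotation angles for one token under per-axis RoPE.

    Axis k with width d_k contributes d_k // 2 angles pos_k * theta**(-2i/d_k);
    the per-axis results are concatenated, giving sum(axes_dim) // 2 angles.
    """
    angles = []
    for pos, dim in zip(position_id, axes_dim):
        if dim % 2:
            raise ValueError(f"each axis dim must be even, got {dim}")
        for i in range(dim // 2):
            angles.append(pos * theta ** (-2.0 * i / dim))
    return angles


# Token at position (0, 1, 2) with a toy axes_dim of (2, 4, 4).
print(len(rope_angles((0, 1, 2), (2, 4, 4))))  # 5
```

Each angle would then rotate one pair of query/key channels via its cosine and sine, so relative offsets along every axis show up in the attention scores.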
text_proj instance-attribute
```python
text_proj = (
    Linear(text_in_dim, hidden_size, bias=False)
    if text_in_dim != hidden_size
    else None
)
```
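`text_proj` is a bias-free linear layer that exists only when the text encoder's width (`text_in_dim`) differs from the transformer's `hidden_size`; otherwise it is `None` and text tokens pass through unchanged. The sketch mirrors that rule in pure Python, with a weight laid out as `[out_dim][in_dim]` like a `torch.nn.Linear` weight. `linear_no_bias` and `project_text` are hypothetical helpers, not vLLM-Omni functions.

```python
def linear_no_bias(x, weight):
    """y = W x for one token; weight is [out_dim][in_dim] like nn.Linear.weight."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def project_text(tokens, text_proj_weight=None):
    # None corresponds to text_in_dim == hidden_size: identity, no projection.
    if text_proj_weight is None:
        return tokens
    return [linear_no_bias(t, text_proj_weight) for t in tokens]


# Project a 3-dim text token to a 2-dim hidden size.
print(project_text([[1.0, 2.0, 3.0]], [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]))  # [[1.0, 5.0]]
```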
time_proj instance-attribute
unified_prepare instance-attribute
```python
unified_prepare = UnifiedPrepare(
    x_embedder, text_proj, pos_embed
)
```
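`unified_prepare` is built from `x_embedder`, `text_proj`, and `pos_embed`, which suggests it assembles text and image tokens into one sequence together with their position ids. The sketch below shows one hypothetical layout: text tokens first with ids `(i, 0, 0)`, then image patches on a grid with ids `(num_text_tokens, row, col)`. Both the text-then-image order and this id scheme are assumptions for illustration; the actual layout used by `UnifiedPrepare` is not documented above, and `build_position_ids` is a hypothetical helper.

```python
def build_position_ids(num_text_tokens, grid_h, grid_w):
    """Hypothetical 3-axis position ids for a [text tokens | image patches] sequence."""
    # Text tokens advance along axis 0 only.
    text_ids = [(i, 0, 0) for i in range(num_text_tokens)]
    # Image patches share one axis-0 offset and use (row, col) on axes 1 and 2.
    image_ids = [
        (num_text_tokens, r, c) for r in range(grid_h) for c in range(grid_w)
    ]
    return text_ids + image_ids


ids = build_position_ids(num_text_tokens=3, grid_h=2, grid_w=2)
print(len(ids), ids[-1])  # 7 (3, 1, 1)
```

Each id tuple would then be fed to a multi-axis RoPE such as `pos_embed`, so text order and 2D image layout are encoded on separate axes.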
x_embedder instance-attribute
```python
x_embedder = ErnieImagePatchEmbedDynamic(
    in_channels, hidden_size, patch_size
)
```
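`x_embedder` turns a latent of `in_channels` channels into `hidden_size`-wide tokens using `patch_size`. The sketch shows only the rearrangement step of patch embedding: cutting a `[C][H][W]` latent into non-overlapping `p x p` patches and flattening each into a vector of length `C * p * p`, in row-major patch order. The learned projection to `hidden_size` is omitted. It also shows the token-count arithmetic from pixel size, assuming the latent is `height // vae_scale_factor` by `width // vae_scale_factor`. The example values `vae_scale_factor=8` and `patch_size=2` are illustrative, not ERNIE-Image's actual settings, and `patchify` / `num_image_tokens` are hypothetical helpers.

```python
def patchify(latent, patch_size):
    """[C][H][W] nested lists -> list of flattened p x p patches (row-major)."""
    c, h, w = len(latent), len(latent[0]), len(latent[0][0])
    if h % patch_size or w % patch_size:
        raise ValueError(f"latent {h}x{w} not divisible by patch_size={patch_size}")
    tokens = []
    for pr in range(0, h, patch_size):
        for pc in range(0, w, patch_size):
            tokens.append([
                latent[ch][pr + i][pc + j]
                for ch in range(c)
                for i in range(patch_size)
                for j in range(patch_size)
            ])
    return tokens


def num_image_tokens(height, width, vae_scale_factor, patch_size):
    """Pixel size -> latent size -> number of patch tokens (assumed layout)."""
    lh, lw = height // vae_scale_factor, width // vae_scale_factor
    return (lh // patch_size) * (lw // patch_size)


print(patchify([[[0, 1, 2, 3], [4, 5, 6, 7]]], patch_size=2))  # [[0, 1, 4, 5], [2, 3, 6, 7]]
print(num_image_tokens(1024, 1024, vae_scale_factor=8, patch_size=2))  # 4096
```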