vllm_omni.diffusion.models.ernie_image
ErnieImage diffusion model for vLLM-Omni.
This module implements ERNIE-Image text-to-image generation with:
- ErnieImageTransformer2DModel: a custom DiT transformer
- ErnieImagePipeline: the full generation pipeline
Modules:
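| Name | Description |
|---|---|
| ernie_image_transformer | ERNIE-Image DiT transformer (ErnieImageTransformer2DModel) |
| pipeline_ernie_image | ERNIE-Image generation pipeline (ErnieImagePipeline) |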
ErnieImagePipeline
Bases: Module, CFGParallelMixin, SupportImageInput, ProgressBarMixin, DiffusionPipelineProfilerMixin
image_processor instance-attribute
pe_model instance-attribute
pe_model = AutoModelForCausalLM.from_pretrained(
    pe_model_path,
    torch_dtype=od_config.dtype,
    local_files_only=True,
    trust_remote_code=True,
).to(self._execution_device)
pe_tokenizer instance-attribute
pe_tokenizer = AutoTokenizer.from_pretrained(
    pe_base_path,
    subfolder="pe_tokenizer",
    local_files_only=True,
    trust_remote_code=True,
    use_fast=False,
)
scheduler instance-attribute
scheduler = FlowMatchEulerDiscreteScheduler.from_pretrained(
    model,
    subfolder="scheduler",
    local_files_only=local_files_only,
)
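The scheduler is diffusers' `FlowMatchEulerDiscreteScheduler`. The framework-free sketch below shows the Euler update such a scheduler applies per step, assuming the transformer predicts a velocity and that sigmas run from 1 (noise) down to 0 (data). The linear sigma schedule and the constant toy velocity are illustrative only; the real scheduler can also shift its sigmas.

```python
def euler_flow_step(sample, velocity, sigma, sigma_next):
    """One Euler step of a flow-matching ODE: x_next = x + (sigma_next - sigma) * v."""
    return [x + (sigma_next - sigma) * v for x, v in zip(sample, velocity)]


def linear_sigmas(num_steps):
    """Hypothetical linear schedule from 1.0 down to 0.0 (num_steps + 1 points)."""
    return [1.0 - i / num_steps for i in range(num_steps + 1)]


# Toy example: noise x1 = [1.0, -1.0], target data x0 = [0.0, 0.0].
# On the straight path x_t = t * x1 + (1 - t) * x0 the velocity is x1 - x0.
sample = [1.0, -1.0]
sigmas = linear_sigmas(4)
for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
    velocity = [1.0, -1.0]  # constant for this toy path
    sample = euler_flow_step(sample, velocity, sigma, sigma_next)

print(sample)  # ends at the data point x0
```

Because the toy velocity is exact, four Euler steps land on `x0`; with a learned velocity, more steps (`num_inference_steps`) reduce discretization error.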
text_encoder instance-attribute
text_encoder = AutoModel.from_pretrained(
    model,
    subfolder="text_encoder",
    torch_dtype=od_config.dtype,
    local_files_only=local_files_only,
).to(self._execution_device)
tokenizer instance-attribute
tokenizer = AutoTokenizer.from_pretrained(
    model,
    subfolder="tokenizer",
    local_files_only=local_files_only,
)
transformer instance-attribute
transformer = ErnieImageTransformer2DModel(
    quant_config=od_config.quantization_config,
    **transformer_kwargs,
)
vae instance-attribute
vae = AutoencoderKLFlux2.from_pretrained(
    model,
    subfolder="vae",
    torch_dtype=od_config.dtype,
    local_files_only=local_files_only,
).to(self._execution_device)
vae_scale_factor instance-attribute
vae_scale_factor = (
    2 ** len(self.vae.config.block_out_channels)
    if getattr(self, "vae", None)
    else 16
)
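The expression above makes the VAE downsampling factor a power of two per entry in `block_out_channels`, with 16 as the fallback when no VAE is attached. The sketch mirrors that rule and adds a hypothetical `latent_hw` helper to show the resulting latent grid size; the VAE config is a stand-in object, and whether the real pipeline enforces exact divisibility is an assumption.

```python
from types import SimpleNamespace


def compute_vae_scale_factor(vae=None):
    """Mirror the attribute expression: 2 ** len(block_out_channels), else 16."""
    if vae is not None:
        return 2 ** len(vae.config.block_out_channels)
    return 16


def latent_hw(height, width, scale_factor):
    """Hypothetical helper: latent grid size for a pixel size (requires exact division)."""
    if height % scale_factor or width % scale_factor:
        raise ValueError(f"{height}x{width} is not divisible by {scale_factor}")
    return height // scale_factor, width // scale_factor


# Hypothetical VAE config with four blocks -> 2 ** 4 = 16.
fake_vae = SimpleNamespace(config=SimpleNamespace(block_out_channels=(128, 256, 512, 512)))
factor = compute_vae_scale_factor(fake_vae)
print(factor, latent_hw(1024, 1024, factor))  # 16 (64, 64)
```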
weights_sources instance-attribute
weights_sources = [
    DiffusersPipelineLoader.ComponentSource(
        model_or_path=od_config.model,
        subfolder="transformer",
        revision=None,
        prefix="transformer.",
        fall_back_to_pt=True,
    )
]
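`weights_sources` reads checkpoint files from the `transformer` subfolder and attaches the prefix `"transformer."`. Assuming the prefix exists so that checkpoint keys line up with the pipeline's `transformer` attribute, the hypothetical helpers below show the effect: keys gain the prefix and can then be routed back to the submodule by their first dotted component. The parameter names are illustrative, not real ERNIE-Image keys.

```python
def prefix_state_dict(state_dict, prefix):
    """Prepend `prefix` to every checkpoint key (hypothetical helper)."""
    return {prefix + key: value for key, value in state_dict.items()}


def split_by_module(state_dict, module_names):
    """Group prefixed keys by the top-level pipeline attribute they belong to."""
    grouped = {name: {} for name in module_names}
    for key, value in state_dict.items():
        top, _, rest = key.partition(".")
        if top in grouped:
            grouped[top][rest] = value
    return grouped


# Keys as they might appear in transformer/*.safetensors (illustrative names).
raw = {"x_embedder.proj.weight": 1, "layers.0.attn.qkv.weight": 2}
prefixed = prefix_state_dict(raw, "transformer.")
grouped = split_by_module(prefixed, ["transformer", "vae"])
print(sorted(prefixed))
print(grouped["transformer"] == raw)  # True
```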
check_inputs
check_inputs(
    prompt,
    height,
    width,
    prompt_embeds=None,
    callback_on_step_end_tensor_inputs=None,
    guidance_scale=None,
)
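The signature lists only the arguments, not the checks. As a hedged illustration, the hypothetical function below performs the kind of validation diffusers-style pipelines commonly do with these arguments: exactly one of `prompt` and `prompt_embeds`, a size constraint, and a whitelist for callback tensor names. The `multiple_of=16` constant, the allowed tensor names, and the guidance check are assumptions, not values read from `ErnieImagePipeline`.

```python
def check_inputs_sketch(prompt, height, width, prompt_embeds=None,
                        callback_on_step_end_tensor_inputs=None,
                        guidance_scale=None, multiple_of=16,
                        allowed_tensor_inputs=("latents",)):
    """Hypothetical validation modeled on common diffusers pipelines.

    `multiple_of` and `allowed_tensor_inputs` are assumptions, not values
    taken from ErnieImagePipeline.
    """
    if prompt is None and prompt_embeds is None:
        raise ValueError("Provide either `prompt` or `prompt_embeds`.")
    if prompt is not None and prompt_embeds is not None:
        raise ValueError("Pass only one of `prompt` and `prompt_embeds`.")
    if prompt is not None and not isinstance(prompt, (str, list)):
        raise TypeError(f"`prompt` must be str or list[str], got {type(prompt)}")
    if height % multiple_of or width % multiple_of:
        raise ValueError(f"height and width must be divisible by {multiple_of}")
    if callback_on_step_end_tensor_inputs is not None:
        unknown = [k for k in callback_on_step_end_tensor_inputs
                   if k not in allowed_tensor_inputs]
        if unknown:
            raise ValueError(f"Unsupported callback tensor inputs: {unknown}")
    if guidance_scale is not None and guidance_scale < 0:
        raise ValueError("`guidance_scale` must be non-negative.")


# Passes silently; omitting both prompt and prompt_embeds would raise ValueError.
check_inputs_sketch("a red fox in snow", 1024, 1024,
                    callback_on_step_end_tensor_inputs=["latents"])
```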
encode_prompt
encode_prompt(
    prompt: str | list[str],
    device: device,
    num_images_per_prompt: int = 1,
    width: int = 1024,
    height: int = 1024,
    apply_pe: bool = True,
) -> list[Tensor]
forward
forward(
    req: OmniDiffusionRequest,
    prompt: str | list[str] | None = None,
    negative_prompt: str | list[str] | None = "",
    height: int = 1024,
    width: int = 1024,
    num_inference_steps: int = 50,
    guidance_scale: float = 4.0,
    num_images_per_prompt: int = 1,
    generator: Generator | None = None,
    latents: Tensor | None = None,
    prompt_embeds: list[FloatTensor] | None = None,
    negative_prompt_embeds: list[FloatTensor] | None = None,
    output_type: str = "pil",
    return_dict: bool = True,
    callback_on_step_end: Callable[[int, int, dict], None] | None = None,
    callback_on_step_end_tensor_inputs: list[str] = ["latents"],
) -> DiffusionOutput
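The `guidance_scale` (default 4.0) and `negative_prompt` arguments, together with the `CFGParallelMixin` base, point to classifier-free guidance. Assuming the pipeline uses the standard CFG combination (not confirmed by this page), each denoising step mixes the prediction conditioned on the prompt embedding $c$ with the one conditioned on the negative-prompt embedding $c_{\varnothing}$:

$$
\hat{v}_t = v_\theta(x_t, t, c_{\varnothing}) + w\,\bigl(v_\theta(x_t, t, c) - v_\theta(x_t, t, c_{\varnothing})\bigr), \qquad w = \texttt{guidance\_scale}
$$

With $w = 1$ this reduces to the purely conditional prediction; larger $w$ pushes samples further away from the negative prompt.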
ErnieImageTransformer2DModel
Bases: Module
adaLN_modulation instance-attribute
config instance-attribute
config = SimpleNamespace(
    patch_size=patch_size,
    in_channels=in_channels,
    out_channels=self.out_channels,
    num_layers=num_layers,
    num_attention_heads=num_attention_heads,
    ffn_hidden_size=ffn_hidden_size,
    hidden_size=hidden_size,
    text_in_dim=text_in_dim,
    rope_theta=rope_theta,
    rope_axes_dim=rope_axes_dim,
    eps=eps,
    qk_layernorm=qk_layernorm,
)
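Several shapes follow from this config. The sketch derives them under stated assumptions: `head_dim = hidden_size // num_attention_heads` (`pos_embed` is built with `dim=self.head_dim`), the three RoPE axes in `rope_axes_dim` together cover `head_dim` as in Flux-style N-D rotary embeddings, and the image token count is the latent grid divided by `patch_size` per side. The `needs_text_proj` flag mirrors the `text_proj` condition shown further down. All numeric values are illustrative, not the released ERNIE-Image hyperparameters.

```python
from types import SimpleNamespace


def describe_config(cfg, latent_h, latent_w):
    """Derive head_dim, image-token count, and text-projection need from a config.

    Assumes head_dim = hidden_size // num_attention_heads and that the RoPE
    axes sum to head_dim (an assumption borrowed from Flux-style EmbedND).
    """
    if cfg.hidden_size % cfg.num_attention_heads:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    if sum(cfg.rope_axes_dim) != head_dim:
        raise ValueError(f"rope_axes_dim {cfg.rope_axes_dim} does not sum to {head_dim}")
    tokens = (latent_h // cfg.patch_size) * (latent_w // cfg.patch_size)
    needs_text_proj = cfg.text_in_dim != cfg.hidden_size  # mirrors text_proj condition
    return head_dim, tokens, needs_text_proj


# Illustrative values only.
cfg = SimpleNamespace(patch_size=2, hidden_size=3072, num_attention_heads=24,
                      rope_axes_dim=(32, 48, 48), text_in_dim=2560)
print(describe_config(cfg, 64, 64))  # (128, 1024, True)
```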
final_linear instance-attribute
layers instance-attribute
layers = nn.ModuleList(
    [
        ErnieImageSharedAdaLNBlock(
            parallel_config=self.parallel_config,
            hidden_size=hidden_size,
            num_heads=num_attention_heads,
            ffn_hidden_size=ffn_hidden_size,
            eps=eps,
            qk_layernorm=qk_layernorm,
            quant_config=quant_config,
        )
        for _ in range(num_layers)
    ]
)
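The block name `ErnieImageSharedAdaLNBlock`, together with a single model-level `adaLN_modulation` attribute, suggests that one set of timestep-derived modulation parameters is computed once and reused by every layer. Assuming the standard DiT adaLN-Zero form (an assumption, not taken from the source), each block would apply, with $e_t$ the timestep embedding and $\odot$ element-wise multiplication broadcast over tokens:

$$
\begin{aligned}
(\beta_1, \gamma_1, \alpha_1, \beta_2, \gamma_2, \alpha_2) &= \mathrm{adaLN}(e_t) \\
h &= x + \alpha_1 \odot \mathrm{Attn}\bigl(\mathrm{LN}(x) \odot (1 + \gamma_1) + \beta_1\bigr) \\
y &= h + \alpha_2 \odot \mathrm{FFN}\bigl(\mathrm{LN}(h) \odot (1 + \gamma_2) + \beta_2\bigr)
\end{aligned}
$$

Sharing the modulation across layers would save one modulation projection per block compared with per-layer adaLN.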
pos_embed instance-attribute
pos_embed = ErnieImageEmbedND3(
    dim=self.head_dim,
    theta=rope_theta,
    axes_dim=rope_axes_dim,
)
text_proj instance-attribute
text_proj = (
    nn.Linear(text_in_dim, hidden_size, bias=False)
    if text_in_dim != hidden_size
    else None
)
time_proj instance-attribute
unified_prepare instance-attribute
unified_prepare = UnifiedPrepare(
    self.x_embedder, self.text_proj, self.pos_embed
)
x_embedder instance-attribute
x_embedder = ErnieImagePatchEmbedDynamic(
    in_channels, hidden_size, patch_size
)
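`x_embedder` turns the latent image into a token sequence using `patch_size`. The internals of `ErnieImagePatchEmbedDynamic` are not shown here, so the pure-Python sketch below illustrates only the standard patchification step, assuming the usual pattern of cutting non-overlapping p x p patches (equivalent to a stride-p convolution before its linear map). The channel-major ordering inside each token is an arbitrary choice for illustration.

```python
def patchify(latent, patch_size):
    """Split a C x H x W nested-list latent into flattened p x p patches.

    Returns (H/p) * (W/p) tokens in row-major patch order, each of length C * p * p.
    """
    channels = len(latent)
    height, width = len(latent[0]), len(latent[0][0])
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            token = [
                latent[c][top + dy][left + dx]
                for c in range(channels)
                for dy in range(patch_size)
                for dx in range(patch_size)
            ]
            tokens.append(token)
    return tokens


# 1 channel, 4 x 4 latent holding 0..15; patch_size 2 -> 4 tokens of length 4.
latent = [[[r * 4 + c for c in range(4)] for r in range(4)]]
tokens = patchify(latent, 2)
print(len(tokens), tokens[0])  # 4 [0, 1, 4, 5]
```

In a real model each flattened token would then be projected to `hidden_size` by a learned linear layer.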