vllm_omni.diffusion.models.longcat_image.pipeline_longcat_image ¶
LongCatImagePipeline ¶
Bases: Module, CFGParallelMixin, DiffusionPipelineProfilerMixin
prompt_template_encode_prefix instance-attribute ¶
prompt_template_encode_prefix = "<|im_start|>system\nAs an image captioning expert, generate a descriptive text prompt based on an image content, suitable for input to a text-to-image model.<|im_end|>\n<|im_start|>user\n"
prompt_template_encode_suffix instance-attribute ¶
scheduler instance-attribute ¶
text_encoder instance-attribute ¶
text_encoder = from_pretrained_with_prefetch(
    from_pretrained,
    model,
    subfolder="text_encoder",
    prefetch_list=longcat_subfolders,
    local_files_only=local_files_only,
)
text_processor instance-attribute ¶
text_processor = from_pretrained(
    model,
    subfolder="tokenizer",
    local_files_only=local_files_only,
)
tokenizer instance-attribute ¶
transformer instance-attribute ¶
transformer = LongCatImageTransformer2DModel(
    od_config=od_config
)
vae_scale_factor instance-attribute ¶
weights_sources instance-attribute ¶
weights_sources = [
    ComponentSource(
        model_or_path=model,
        subfolder="transformer",
        revision=None,
        prefix="transformer.",
        fall_back_to_pt=True,
    )
]
cfg_normalize_function ¶
Normalize the combined noise prediction.
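The source does not show the body of this method. As a rough illustration of what normalizing a combined classifier-free-guidance prediction can look like, the sketch below rescales the combined prediction so its norm does not exceed the norm of the conditional prediction. The function name `cfg_renorm_sketch`, the `eps` term, the clamping to `[cfg_renorm_min, 1.0]`, and the use of plain Python lists instead of tensors are all assumptions made for illustration, not the pipeline's confirmed behavior.

```python
import math


def cfg_renorm_sketch(noise_pred, noise_pred_cond, cfg_renorm_min=0.0, eps=1e-8):
    """Hypothetical renormalization of a combined CFG prediction.

    noise_pred:      combined prediction (list of floats)
    noise_pred_cond: conditional prediction (list of floats)
    """
    cond_norm = math.sqrt(sum(x * x for x in noise_pred_cond))
    pred_norm = math.sqrt(sum(x * x for x in noise_pred))
    # Ratio of norms; eps guards against division by zero (assumption).
    scale = cond_norm / (pred_norm + eps)
    # Never amplify, and never shrink below cfg_renorm_min (assumption).
    scale = min(max(scale, cfg_renorm_min), 1.0)
    return [x * scale for x in noise_pred]


if __name__ == "__main__":
    combined = [3.0, 4.0]      # norm 5
    conditional = [0.6, 0.8]   # norm 1
    print(cfg_renorm_sketch(combined, conditional))
```

Under these assumptions, a combined prediction that is already smaller than the conditional one is returned unchanged, because the scale is capped at 1.0.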
check_inputs ¶
check_inputs(
    prompt,
    height,
    width,
    negative_prompt=None,
    prompt_embeds=None,
    negative_prompt_embeds=None,
)
encode_prompt ¶
encode_prompt(
    prompt: str | list[str] | None = None,
    num_images_per_prompt: int | None = 1,
    prompt_embeds: Tensor | None = None,
) -> tuple[Tensor, Tensor]
forward ¶
forward(
    req: OmniDiffusionRequest,
    prompt: str | list[str] | None = None,
    negative_prompt: str | list[str] | None = None,
    height: int | None = None,
    width: int | None = None,
    num_inference_steps: int = 50,
    sigmas: list[float] | None = None,
    guidance_scale: float = 4.5,
    num_images_per_prompt: int = 1,
    generator: Generator | list[Generator] | None = None,
    latents: FloatTensor | None = None,
    prompt_embeds: Tensor | None = None,
    negative_prompt_embeds: Tensor | None = None,
    output_type: str | None = "pil",
    return_dict: bool = True,
    joint_attention_kwargs: dict[str, Any] | None = None,
    enable_cfg_renorm: bool | None = True,
    cfg_renorm_min: float | None = 0.0,
    enable_prompt_rewrite: bool | None = True,
) -> DiffusionOutput
load_weights ¶
Load weights using AutoWeightsLoader for vLLM integration.
prepare_latents ¶
prepare_latents(
    batch_size,
    num_channels_latents,
    height,
    width,
    dtype,
    device,
    generator,
    latents=None,
)
calculate_shift ¶
calculate_shift(
    image_seq_len,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
)
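The signature alone does not say how the shift is computed. A common approach in flow-matching pipelines with the same parameter names is a linear interpolation between `base_shift` and `max_shift` over the sequence-length range; the sketch below assumes that behavior. The name `calculate_shift_sketch` is hypothetical, and the formula is an assumption rather than something the source confirms.

```python
def calculate_shift_sketch(
    image_seq_len,
    base_seq_len=256,
    max_seq_len=4096,
    base_shift=0.5,
    max_shift=1.15,
):
    """Linearly interpolate a shift value from the image sequence length.

    Assumed behavior: base_shift at base_seq_len, max_shift at max_seq_len.
    """
    # Slope and intercept of the line through the two anchor points.
    slope = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    intercept = base_shift - slope * base_seq_len
    return image_seq_len * slope + intercept


if __name__ == "__main__":
    for seq_len in (256, 1024, 4096):
        print(seq_len, calculate_shift_sketch(seq_len))
```

With the default arguments, this line passes through `base_shift` at 256 tokens and `max_shift` at 4096 tokens, so longer image sequences receive a larger shift.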
get_longcat_image_post_process_func ¶
get_longcat_image_post_process_func(
    od_config: OmniDiffusionConfig,
)
prepare_pos_ids ¶
prepare_pos_ids(
    modality_id=0,
    type="text",
    start=(0, 0),
    num_token=None,
    height=None,
    width=None,
) -> Tensor
retrieve_timesteps ¶
retrieve_timesteps(
    scheduler: SchedulerMixin,
    num_inference_steps: int | None = None,
    device: str | device | None = None,
    timesteps: list[int] | None = None,
    sigmas: list[float] | None = None,
    **kwargs,
) -> tuple[Tensor, int]
Calls the scheduler's set_timesteps method and retrieves timesteps from the scheduler after the call. Handles custom timesteps. Any kwargs will be supplied to scheduler.set_timesteps.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| scheduler | `SchedulerMixin` | The scheduler to get timesteps from. | required |
| num_inference_steps | `int` | The number of diffusion steps used when generating samples with a pre-trained model. If used, | None |
| device | `str` or `torch.device`, *optional* | The device to which the timesteps should be moved to. If | None |
| timesteps | `list[int]`, *optional* | Custom timesteps used to override the timestep spacing strategy of the scheduler. If | None |
| sigmas | `list[float]`, *optional* | Custom sigmas used to override the timestep spacing strategy of the scheduler. If | None |
Returns:
| Type | Description |
|---|---|
| Tensor | First element is the timestep schedule retrieved from the scheduler. |
| int | Second element is the number of inference steps. |
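The description above (call `set_timesteps`, then read the timesteps back) can be sketched without any real scheduler. In the example below, `ToyScheduler` is a hypothetical stand-in for a `SchedulerMixin`, `retrieve_timesteps_sketch` is an illustrative name, and the rule that `timesteps` and `sigmas` cannot both be passed is an assumption. Plain lists are used in place of tensors.

```python
class ToyScheduler:
    """Hypothetical stand-in for a SchedulerMixin; not a real scheduler."""

    def __init__(self):
        self.timesteps = []

    def set_timesteps(self, num_inference_steps=None, device=None,
                      timesteps=None, sigmas=None):
        if timesteps is not None:
            self.timesteps = list(timesteps)
        elif sigmas is not None:
            # Toy mapping from sigma to timestep (assumption).
            self.timesteps = [s * 1000.0 for s in sigmas]
        else:
            step = 1000.0 / num_inference_steps
            self.timesteps = [1000.0 - i * step for i in range(num_inference_steps)]


def retrieve_timesteps_sketch(scheduler, num_inference_steps=None, device=None,
                              timesteps=None, sigmas=None, **kwargs):
    # Assumed rule: only one custom schedule may be supplied.
    if timesteps is not None and sigmas is not None:
        raise ValueError("Pass only one of `timesteps` or `sigmas`.")
    if timesteps is not None:
        scheduler.set_timesteps(timesteps=timesteps, device=device, **kwargs)
    elif sigmas is not None:
        scheduler.set_timesteps(sigmas=sigmas, device=device, **kwargs)
    else:
        scheduler.set_timesteps(num_inference_steps, device=device, **kwargs)
    schedule = scheduler.timesteps
    # The step count is taken from the schedule the scheduler produced.
    return schedule, len(schedule)


if __name__ == "__main__":
    print(retrieve_timesteps_sketch(ToyScheduler(), num_inference_steps=4))
    print(retrieve_timesteps_sketch(ToyScheduler(), sigmas=[1.0, 0.5, 0.25]))
```

In this sketch the returned step count comes from the schedule itself, so a custom `sigmas` list determines the number of steps rather than `num_inference_steps`.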
split_quotation ¶
Splits a string with a regular expression into segments, flagging each segment that is enclosed in a pair of single or double quotes. Returns a list of (segment, is_quoted) tuples.
Examples:
>>> prompt_en = "Please write 'Hello' on the blackboard for me."
>>> print(split_quotation(prompt_en))
>>> # output: [('Please write ', False), ("'Hello'", True), (' on the blackboard for me.', False)]
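A minimal sketch of quote-based splitting that reproduces the example above is shown here. The name `split_quotation_sketch` and the regex are assumptions for illustration. The actual function may handle more cases, such as apostrophes inside words or other quote characters, which this sketch does not.

```python
import re

# Matches a run enclosed in matching single or double quotes (assumed pattern).
_QUOTED = re.compile(r"'[^']*'|\"[^\"]*\"")


def split_quotation_sketch(prompt):
    """Split `prompt` into (segment, is_quoted) tuples."""
    parts = []
    pos = 0
    for match in _QUOTED.finditer(prompt):
        # Unquoted text between the previous match and this one.
        if match.start() > pos:
            parts.append((prompt[pos:match.start()], False))
        # The quoted segment, quotes included.
        parts.append((match.group(0), True))
        pos = match.end()
    # Trailing unquoted text, if any.
    if pos < len(prompt):
        parts.append((prompt[pos:], False))
    return parts


if __name__ == "__main__":
    print(split_quotation_sketch("Please write 'Hello' on the blackboard for me."))
```

Note that this simple pattern would treat an apostrophe in a word such as "don't" as an opening quote, so it is only suitable for inputs like the one in the example.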