vllm_omni.diffusion.models.nextstep_1_1 ¶
Modules:
| Name | Description |
|---|---|
| modeling_flux_vae | |
| modeling_nextstep | |
| modeling_nextstep_heads | |
| modeling_nextstep_llama | |
| pipeline_nextstep_1_1 | |
NextStep11Pipeline ¶
Bases: Module, DiffusionPipelineProfilerMixin
NextStep-1.1 Pipeline for text-to-image generation.
This pipeline implements the autoregressive flow-based image generation model from StepFun. It uses an LLM backbone with a flow matching head to generate images autoregressively.
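To make the phrase "autoregressive with a flow matching head" concrete, here is a toy sketch in plain Python. It is not the NextStep-1.1 architecture: `toy_backbone` and `euler_flow_sample` are hypothetical stand-ins, tokens are single floats rather than latent patches, and the velocity field is a closed-form expression instead of a learned network. It only illustrates the control flow: the backbone produces a condition from the tokens so far, and a flow sampler integrates noise toward the next token.

```python
import random


def euler_flow_sample(noise, target, num_steps=20):
    """Integrate dx/dt = (target - x) / (1 - t) from t=0 to t=1 with Euler steps.

    In a real flow matching head the velocity would come from a network
    conditioned on the backbone's hidden state; this closed form is a toy.
    """
    x = noise
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t never reaches 1.0, so the division is safe
        velocity = (target - x) / (1.0 - t)
        x = x + dt * velocity
    return x


def toy_backbone(tokens):
    """Hypothetical stand-in for the LLM: maps the context to a condition."""
    if not tokens:
        return 1.0
    return 0.5 * (sum(tokens) / len(tokens)) + 1.0


def generate(num_tokens, seed=0, num_steps=20):
    """Generate tokens one at a time, each conditioned on the previous ones."""
    rng = random.Random(seed)
    tokens = []
    for _ in range(num_tokens):
        condition = toy_backbone(tokens)
        noise = rng.gauss(0.0, 1.0)
        tokens.append(euler_flow_sample(noise, condition, num_steps))
    return tokens


tokens = generate(3, seed=0)
```

Each generated token is fed back into the context before the next one is sampled, which is the autoregressive part; the inner Euler loop is the flow-based part.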
image_placeholder_id instance-attribute ¶
image_placeholder_id = getattr(
    config, "image_placeholder_id", None
)
pil2tensor instance-attribute ¶
tokenizer instance-attribute ¶
tokenizer: PreTrainedTokenizer = from_pretrained(
    model_path,
    local_files_only=True,
    model_max_length=512,
    padding_side="left",
    use_fast=True,
    trust_remote_code=True,
)
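The tokenizer is configured with `padding_side="left"` and `model_max_length=512`. The sketch below illustrates what left padding does to a batch of token id lists. `pad_left` is a hypothetical helper written for this illustration, not part of the tokenizer API, and its truncation rule (keep the first `max_length` ids) is an assumption.

```python
def pad_left(sequences, pad_id, max_length):
    """Left-pad token id lists to a common width.

    Assumption: sequences longer than max_length keep their first
    max_length ids. The real tokenizer's truncation settings may differ.
    """
    truncated = [seq[:max_length] for seq in sequences]
    width = max(len(seq) for seq in truncated)
    input_ids, attention_mask = [], []
    for seq in truncated:
        n_pad = width - len(seq)
        # Padding goes in front, so real tokens end at the same position.
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


ids, mask = pad_left([[11, 12, 13], [21]], pad_id=0, max_length=512)
```

With left padding every prompt ends at the last position of its row, so newly generated tokens can be appended directly after the prompt for every row of the batch.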
weights_sources instance-attribute ¶
weights_sources = [
    ComponentSource(
        model_or_path=model_path,
        subfolder=None,
        revision=None,
        prefix="model.",
        fall_back_to_pt=True,
        allow_patterns_overrides=[
            "model-*.safetensors",
            "model.safetensors",
        ],
    )
]
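The two `allow_patterns_overrides` entries are glob patterns. As a rough illustration of which checkpoint files they would select, the sketch below matches them against a made-up file listing using the standard library. The file names, `select_weight_files`, and `add_prefix` are hypothetical. Treating `prefix` as a string prepended to each tensor name is an assumption about `ComponentSource`, not something this page states.

```python
from fnmatch import fnmatchcase

ALLOW_PATTERNS = ["model-*.safetensors", "model.safetensors"]
PREFIX = "model."


def select_weight_files(filenames, patterns=ALLOW_PATTERNS):
    """Keep only the file names that match at least one glob pattern."""
    return [
        name
        for name in filenames
        if any(fnmatchcase(name, pattern) for pattern in patterns)
    ]


def add_prefix(tensor_name, prefix=PREFIX):
    """Assumed meaning of `prefix`: prepend it to each loaded tensor name."""
    return prefix + tensor_name


# Hypothetical directory listing, for illustration only.
files = [
    "model-00001-of-00002.safetensors",
    "model-00002-of-00002.safetensors",
    "model.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
    "config.json",
]
selected = select_weight_files(files)
```

Under these patterns, sharded and single-file checkpoints at the top level are picked up, while files in subfolders or with other names are ignored.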
decoding ¶
decoding(
    c: Tensor,
    attention_mask: Tensor,
    past_key_values,
    max_new_len: int,
    num_images_per_caption: int,
    use_norm: bool = False,
    cfg: float = 1.0,
    cfg_img: float = 1.0,
    cfg_mult: int = 1,
    cfg_schedule: Literal[
        "linear", "constant"
    ] = "constant",
    timesteps_shift: float = 1.0,
    num_sampling_steps: int = 20,
    progress: bool = True,
    hw: tuple[int, int] = (256, 256),
)
Autoregressive image token decoding with optional CFG-Parallel.
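The signature exposes `cfg`, `cfg_schedule`, and `timesteps_shift` without spelling out their formulas. The sketch below shows common definitions of these three ideas in plain Python. All of it is an assumption for illustration: the helper names are hypothetical, the linear ramp from 1.0 to `cfg` over token positions is one plausible reading of `"linear"`, and the shift formula is the one widely used in flow-based samplers, not confirmed for this pipeline.

```python
def cfg_scale_at(step, total_steps, cfg, schedule="constant"):
    """Guidance scale for a given decoding step (0-indexed)."""
    if schedule == "constant":
        return cfg
    if schedule == "linear":
        # Assumed ramp: start near 1.0 (no guidance) and reach cfg at the end.
        frac = (step + 1) / total_steps
        return 1.0 + (cfg - 1.0) * frac
    raise ValueError(f"unknown cfg_schedule: {schedule!r}")


def apply_cfg(cond, uncond, scale):
    """Standard classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def shift_timestep(t, shift=1.0):
    """Assumed time shift: shift * t / (1 + (shift - 1) * t), identity at 1.0."""
    return shift * t / (1.0 + (shift - 1.0) * t)


scale = cfg_scale_at(step=0, total_steps=4, cfg=5.0, schedule="linear")
guided = apply_cfg(cond=[1.0, 2.0], uncond=[0.0, 1.0], scale=scale)
```

With the defaults `cfg=1.0` and `timesteps_shift=1.0`, both helpers reduce to the identity: the guided prediction equals the conditional one and timesteps are left unchanged.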
forward ¶
forward(
    req: OmniDiffusionRequest,
    prompt: str | list[str] | None = None,
    height: int | None = None,
    width: int | None = None,
    num_inference_steps: int = 28,
    guidance_scale: float = 7.5,
    negative_prompt: str | list[str] | None = None,
    num_images_per_prompt: int = 1,
    generator: Generator | None = None,
    seed: int | None = None,
    **kwargs,
) -> DiffusionOutput
Generate images from text prompts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| req | `OmniDiffusionRequest` | OmniDiffusionRequest containing generation parameters | required |
| prompt | `str \| list[str] \| None` | Text prompt(s) for generation | None |
| height | `int \| None` | Output image height | None |
| width | `int \| None` | Output image width | None |
| num_inference_steps | `int` | Number of sampling steps (default 28 for NextStep-1.1) | 28 |
| guidance_scale | `float` | CFG scale | 7.5 |
| negative_prompt | `str \| list[str] \| None` | Negative prompt for CFG | None |
| num_images_per_prompt | `int` | Number of images per prompt | 1 |
| generator | `Generator \| None` | Random generator for reproducibility | None |
| seed | `int \| None` | Random seed | None |
Returns:
| Type | Description |
|---|---|
| `DiffusionOutput` | DiffusionOutput containing generated images |
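Since `prompt` and `negative_prompt` accept either a string or a list, and `num_images_per_prompt` multiplies the batch, the following sketch shows one way such arguments are typically normalized. `expand_prompts` is a hypothetical helper, not a function of this module. The real `forward` may behave differently, for example by reading prompts from `req` when `prompt` is `None`, which this page does not document.

```python
def expand_prompts(prompt, negative_prompt=None, num_images_per_prompt=1):
    """Normalize str-or-list prompts and repeat each per requested image."""
    if prompt is None:
        # Assumption for this sketch only; the pipeline may fall back to req.
        raise ValueError("prompt is required in this sketch")
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)

    if negative_prompt is None:
        negatives = [""] * len(prompts)
    elif isinstance(negative_prompt, str):
        negatives = [negative_prompt] * len(prompts)
    else:
        negatives = list(negative_prompt)
        if len(negatives) != len(prompts):
            raise ValueError("negative_prompt must match prompt in length")

    # Repeat each entry so every prompt yields num_images_per_prompt images.
    expanded = [p for p in prompts for _ in range(num_images_per_prompt)]
    expanded_neg = [n for n in negatives for _ in range(num_images_per_prompt)]
    return expanded, expanded_neg


pos, neg = expand_prompts(["a cat", "a dog"], "blurry", num_images_per_prompt=2)
```

Keeping the positive and negative lists aligned index by index makes it straightforward to pair each conditional branch with its unconditional counterpart when applying CFG.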
load_weights ¶
Load model weights.
get_nextstep11_post_process_func ¶
get_nextstep11_post_process_func(
    od_config: OmniDiffusionConfig,
)
Return post-processing function for NextStep-1.1 pipeline outputs.
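The function follows a factory pattern: it takes a config and returns a callable that is applied to pipeline outputs later. The sketch below shows that pattern in isolation. Everything in it is hypothetical: `make_post_process_func`, the dict-based config, the `clamp` key, and the choice of mapping values from [-1, 1] to 8-bit pixel values are illustrative assumptions, not the behavior of `get_nextstep11_post_process_func`.

```python
def make_post_process_func(config):
    """Return a closure that converts raw outputs using settings from config."""
    clamp = config.get("clamp", True)  # hypothetical config key

    def post_process(values):
        pixels = []
        for v in values:
            if clamp:
                v = max(-1.0, min(1.0, v))
            # Map [-1, 1] to [0, 255], rounding half up.
            pixels.append(int((v + 1.0) / 2.0 * 255.0 + 0.5))
        return pixels

    return post_process


post_process = make_post_process_func({"clamp": True})
pixels = post_process([-1.0, 0.0, 1.0, 2.0])
```

Building the callable once from the config keeps per-request code free of configuration lookups.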