vllm_omni.diffusion.models.nextstep_1_1

Modules:

Name Description
modeling_flux_vae
modeling_nextstep
modeling_nextstep_heads
modeling_nextstep_llama
pipeline_nextstep_1_1

NextStep11Pipeline

Bases: Module, DiffusionPipelineProfilerMixin

NextStep-1.1 Pipeline for text-to-image generation.

This pipeline implements StepFun's flow-based image generation model. An LLM backbone with a flow-matching head generates the image autoregressively.

boi instance-attribute

boi = getattr(config, 'boi', None)

config instance-attribute

config = config

device property

device

down_factor instance-attribute

down_factor = vae_factor * latent_patch_size
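As a worked example of how `down_factor` relates image size to the number of image tokens, the sketch below assumes `vae_factor = 8` and `latent_patch_size = 2`. Both values are illustrative assumptions, not read from the NextStep-1.1 config, and `token_grid` is a hypothetical helper rather than part of the pipeline.

```python
def token_grid(hw, vae_factor=8, latent_patch_size=2):
    """Return (rows, cols, n_tokens) for an image of size hw = (height, width).

    vae_factor and latent_patch_size are assumed values for illustration.
    """
    down_factor = vae_factor * latent_patch_size  # same product as the attribute above
    height, width = hw
    if height % down_factor or width % down_factor:
        raise ValueError(f"height and width must be multiples of {down_factor}")
    rows, cols = height // down_factor, width // down_factor
    return rows, cols, rows * cols


print(token_grid((256, 256)))  # (16, 16, 256)
```

Under these assumed values, every 16x16 pixel block corresponds to one image token, so doubling both sides of the image quadruples the sequence length.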

dtype property

dtype

eoi instance-attribute

eoi = getattr(config, 'eoi', None)

image_placeholder_id instance-attribute

image_placeholder_id = getattr(
    config, "image_placeholder_id", None
)

model instance-attribute

model = NextStepModel(config)

od_config instance-attribute

od_config = od_config

pil2tensor instance-attribute

pil2tensor = Compose(
    [
        ToTensor(),
        Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5]),
    ]
)
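The arithmetic behind this transform can be sketched without torchvision: `ToTensor()` scales 8-bit pixel values to [0, 1], and `Normalize` with mean 0.5 and standard deviation 0.5 then maps that range to [-1, 1]. The helper names below are hypothetical and operate on a single channel value.

```python
def normalize_pixel(value_uint8, mean=0.5, std=0.5):
    """Mimic ToTensor() followed by Normalize(mean, std) for one channel value."""
    scaled = value_uint8 / 255.0   # ToTensor: [0, 255] -> [0.0, 1.0]
    return (scaled - mean) / std   # Normalize: [0.0, 1.0] -> [-1.0, 1.0]


def denormalize_pixel(x, mean=0.5, std=0.5):
    """Invert the mapping and clamp back to a valid 8-bit value."""
    scaled = min(max(x * std + mean, 0.0), 1.0)
    return round(scaled * 255)


print(normalize_pixel(0), normalize_pixel(255))  # -1.0 1.0
```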

scaling_factor instance-attribute

scaling_factor = getattr(config, 'scaling_factor', 1.0)

shift_factor instance-attribute

shift_factor = getattr(config, 'shift_factor', 0.0)
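How these two factors are applied is not shown on this page. A common convention for Flux-style VAEs is to shift and then scale latents after encoding, and to invert that before decoding; the sketch below assumes that convention and uses hypothetical helper names. With the defaults above (`scaling_factor = 1.0`, `shift_factor = 0.0`) both functions are the identity.

```python
def normalize_latent(z, scaling_factor=1.0, shift_factor=0.0):
    """Assumed convention: applied to VAE latents after encoding."""
    return (z - shift_factor) * scaling_factor


def denormalize_latent(z, scaling_factor=1.0, shift_factor=0.0):
    """Assumed convention: applied to latents before VAE decoding."""
    return z / scaling_factor + shift_factor


# Illustrative values only, not taken from any real config.
z = normalize_latent(1.25, scaling_factor=0.5, shift_factor=0.25)
print(z, denormalize_latent(z, scaling_factor=0.5, shift_factor=0.25))  # 0.5 1.25
```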

tokenizer instance-attribute

tokenizer: PreTrainedTokenizer = from_pretrained(
    model_path,
    local_files_only=True,
    model_max_length=512,
    padding_side="left",
    use_fast=True,
    trust_remote_code=True,
)

vae instance-attribute

vae = from_pretrained(vae_path)

weights_sources instance-attribute

weights_sources = [
    ComponentSource(
        model_or_path=model_path,
        subfolder=None,
        revision=None,
        prefix="model.",
        fall_back_to_pt=True,
        allow_patterns_overrides=[
            "model-*.safetensors",
            "model.safetensors",
        ],
    )
]
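To illustrate which checkpoint files the two `allow_patterns_overrides` entries would select, the sketch below assumes they are matched as shell-style globs against file names. How vllm-omni actually applies them is not stated here, and the candidate file list is made up for the example.

```python
from fnmatch import fnmatchcase

ALLOW_PATTERNS = ["model-*.safetensors", "model.safetensors"]


def is_allowed(filename, patterns=ALLOW_PATTERNS):
    """Return True if filename matches any glob pattern (assumed semantics)."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# Hypothetical repository listing.
candidates = [
    "model.safetensors",
    "model-00001-of-00004.safetensors",
    "vae/diffusion_pytorch_model.safetensors",
    "pytorch_model.bin",
]
selected = [name for name in candidates if is_allowed(name)]
print(selected)  # ['model.safetensors', 'model-00001-of-00004.safetensors']
```

Under this reading, both single-file and sharded checkpoints are picked up, while VAE weights stored elsewhere are left out.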

decoding

decoding(
    c: Tensor,
    attention_mask: Tensor,
    past_key_values,
    max_new_len: int,
    num_images_per_caption: int,
    use_norm: bool = False,
    cfg: float = 1.0,
    cfg_img: float = 1.0,
    cfg_mult: int = 1,
    cfg_schedule: Literal[
        "linear", "constant"
    ] = "constant",
    timesteps_shift: float = 1.0,
    num_sampling_steps: int = 20,
    progress: bool = True,
    hw: tuple[int, int] = (256, 256),
)

Autoregressive image token decoding with optional CFG-Parallel.
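The `cfg` and `cfg_schedule` arguments suggest standard classifier-free guidance, where a conditional and an unconditional prediction are blended. The sketch below shows the usual blending formula on plain lists. The `"linear"` branch, which ramps the scale from 1.0 up to `cfg`, is only one possible interpretation and is an assumption, as are both helper names.

```python
def guidance_scale_at(step, total_steps, cfg, schedule="constant"):
    """Return the guidance scale to use at a given step."""
    if schedule == "constant":
        return cfg
    if schedule == "linear":
        # Assumption: ramp linearly from 1.0 (no guidance) to cfg.
        if total_steps <= 1:
            return cfg
        return 1.0 + (cfg - 1.0) * step / (total_steps - 1)
    raise ValueError(f"unknown schedule: {schedule!r}")


def apply_cfg(cond, uncond, scale):
    """Standard CFG blend: uncond + scale * (cond - uncond), elementwise."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


print(apply_cfg([2.0], [1.0], 3.0))  # [4.0]
```

With `scale = 1.0` the blend returns the conditional prediction unchanged, which is consistent with `cfg: float = 1.0` acting as "guidance off".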

forward

forward(
    req: OmniDiffusionRequest,
    prompt: str | list[str] | None = None,
    height: int | None = None,
    width: int | None = None,
    num_inference_steps: int = 28,
    guidance_scale: float = 7.5,
    negative_prompt: str | list[str] | None = None,
    num_images_per_prompt: int = 1,
    generator: Generator | None = None,
    seed: int | None = None,
    **kwargs,
) -> DiffusionOutput

Generate images from text prompts.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| req | OmniDiffusionRequest | OmniDiffusionRequest containing generation parameters | required |
| prompt | str \| list[str] \| None | Text prompt(s) for generation | None |
| height | int \| None | Output image height | None |
| width | int \| None | Output image width | None |
| num_inference_steps | int | Number of sampling steps (default 28 for NextStep-1.1) | 28 |
| guidance_scale | float | CFG scale | 7.5 |
| negative_prompt | str \| list[str] \| None | Negative prompt for CFG | None |
| num_images_per_prompt | int | Number of images per prompt | 1 |
| generator | Generator \| None | Random generator for reproducibility | None |
| seed | int \| None | Random seed | None |

Returns:

| Type | Description |
| --- | --- |
| DiffusionOutput | DiffusionOutput containing generated images |
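Because `prompt` accepts a single string, a list, or `None`, a pipeline typically normalizes it to a flat list before batching. The sketch below shows one way this could be done together with `num_images_per_prompt`. It is not the pipeline's actual code; `expand_prompts` is a hypothetical helper, and the handling of `None` (returning an empty list so the caller can fall back to prompts carried by `req`) is an assumption.

```python
def expand_prompts(prompt, num_images_per_prompt=1):
    """Normalize prompt to a list and repeat each entry per requested image."""
    if prompt is None:
        # Assumption: the caller would read prompts from the request object instead.
        return []
    prompts = [prompt] if isinstance(prompt, str) else list(prompt)
    return [p for p in prompts for _ in range(num_images_per_prompt)]


print(expand_prompts(["a cat", "a dog"], num_images_per_prompt=2))
# ['a cat', 'a cat', 'a dog', 'a dog']
```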

load_weights

load_weights(
    weights: Iterable[tuple[str, Tensor]],
) -> set[str]

Load model weights.
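The signature takes an iterable of `(name, tensor)` pairs and returns a set of names, which matches a common loader pattern: copy each tensor whose name is known and report which names were loaded. The sketch below illustrates that pattern with a plain dict standing in for module parameters. It is an assumption about the general shape, not the actual implementation, and `load_weights_sketch` is a hypothetical name.

```python
from typing import Iterable


def load_weights_sketch(params: dict, weights: Iterable[tuple[str, object]]) -> set[str]:
    """Copy matching entries into params and return the names that were loaded."""
    loaded: set[str] = set()
    for name, value in weights:
        if name not in params:
            continue  # skip checkpoint entries with no matching parameter
        params[name] = value
        loaded.add(name)
    return loaded


# Placeholder values stand in for tensors.
params = {"model.embed.weight": None, "model.head.weight": None}
checkpoint = [("model.embed.weight", [1.0]), ("unused.bias", [0.0])]
print(load_weights_sketch(params, checkpoint))  # {'model.embed.weight'}
```

Comparing the returned set against the full parameter list is one way a caller could detect weights that were never initialized.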

to

to(device=None, dtype=None)

get_nextstep11_post_process_func

get_nextstep11_post_process_func(
    od_config: OmniDiffusionConfig,
)

Return post-processing function for NextStep-1.1 pipeline outputs.