vllm_omni.diffusion.models.lance ¶

Lance (ByteDance) diffusion model components.

Lance is a unified autoregressive + diffusion multimodal model on a Qwen2.5-VL-3B backbone. Architecturally it is the BAGEL family (ByteDance Mixture-of-Transformers): the released Lance_3B checkpoint uses the exact same *_moe_gen MoT weight layout as BAGEL, plus vae2llm / llm2vae / time_embedder / latent_pos_embed connectors. The deltas vs BAGEL are:

backbone is Qwen2.5-VL (mRoPE) instead of Qwen2,
understanding ViT is Qwen2.5-VL vision (not SigLIP), loaded from the base Qwen/Qwen2.5-VL-3B-Instruct rather than from the Lance checkpoint,
VAE is Wan2.2 (reused from the vLLM-Omni WAN path) instead of the BAGEL AE,
video path adds 3D latent position embeddings (follow-up; this module implements the image path first).

Because Lance is BAGEL-lineage, the transformer core is reused verbatim from vllm_omni.diffusion.models.bagel.bagel_transformer and only the pipeline wiring (ViT / VAE / checkpoint layout) is specialized here.

Modules:

Name	Description
`lance_transformer`	Lance transformer pieces.
`pipeline_lance`	LancePipeline — Lance (ByteDance) packaged for the vLLM-Omni diffusion engine.
`prompts`	Lance chat / system prompts.
`wan_vae`	Wan2.2 VAE used by Lance, ported from upstream so `Wan2.2_VAE.pth` loads

LancePipeline ¶

Bases: BagelPipeline

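Lance pipeline. Inherits BAGEL's generation path and overrides construction (checkpoint layout, Qwen2.5-VL ViT, Wan2.2 VAE). Its forward additionally routes video and image-edit requests to Lance-specific handlers and defers everything else to BagelPipeline.forward.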

bagel `instance-attribute` ¶

bagel = LanceBagel(
    language_model=self.language_model,
    vit_model=self.vit_model,
    parallel_config=parallel_config,
    quant_config=quant_config,
    prefix="bagel",
    config=BagelConfig(
        llm_config=llm_config,
        vae_config=vae_cfg,
        vit_config=vit_cfg,
        vit_max_num_patch_per_side=LANCE_DEFAULTS.vit_max_num_patch_per_side,
        connector_act=LANCE_DEFAULTS.connector_act,
        interpolate_pos=False,
        latent_patch_size=LANCE_DEFAULTS.latent_patch_size_spatial,
        max_latent_size=LANCE_DEFAULTS.max_latent_size,
        timestep_shift=LANCE_DEFAULTS.timestep_shift,
        visual_gen=True,
        visual_und=und_enabled,
    ),
)
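The `timestep_shift` value passed to the config is not explained on this page. As a rough illustration only, the sketch below shows one common form of timestep shifting used by flow-matching samplers. It assumes that `t = 1` is the fully noisy end and that the shift is applied to a uniform grid; both are assumptions, and `shift_timestep` and `shifted_schedule` are hypothetical helpers, not names from vLLM-Omni.

```python
def shift_timestep(t: float, shift: float) -> float:
    """Warp t in [0, 1]; with shift > 1, interior values move toward 1."""
    return shift * t / (1.0 + (shift - 1.0) * t)


def shifted_schedule(num_steps: int, shift: float) -> list[float]:
    """Build a warped schedule from 1.0 down to 0.0 (needs num_steps >= 2)."""
    # Uniform grid from 1.0 (assumed noisy end) to 0.0 (assumed clean end).
    grid = [1.0 - i / (num_steps - 1) for i in range(num_steps)]
    return [shift_timestep(t, shift) for t in grid]
```

With `shift = 1.0` the schedule is unchanged. Larger values keep the endpoints fixed and spend more steps near the noisy end.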

device `instance-attribute` ¶

device = get_local_device()

image_processor `instance-attribute` ¶

image_processor = self._build_image_processor()

language_model `instance-attribute` ¶

language_model = Qwen2MoTForCausalLM(
    llm_config,
    parallel_config=parallel_config,
    quant_config=quant_config,
    prefix="bagel.language_model",
)

od_config `instance-attribute` ¶

od_config = od_config

scheduler `instance-attribute` ¶

scheduler = None

scheduler_kwargs `instance-attribute` ¶

scheduler_kwargs = {}

tokenizer `instance-attribute` ¶

tokenizer = self._load_tokenizer(ckpt_path)

transformer `instance-attribute` ¶

transformer = self.language_model.model

vae `instance-attribute` ¶

vae = self._build_wan22_vae(repo_root)

video_processor `instance-attribute` ¶

video_processor = self._build_video_processor()

vit_model `instance-attribute` ¶

vit_model = self._build_qwen2_5_vl_vit(repo_root)

weights_sources `instance-attribute` ¶

weights_sources = [
    DiffusersPipelineLoader.ComponentSource(
        model_or_path=weights_model,
        subfolder=ckpt_dir
        if ckpt_path != repo_root
        else None,
        revision=od_config.revision,
        prefix="bagel.",
        fall_back_to_pt=False,
    )
]
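The `subfolder` expression above is `None` when the checkpoint sits at the repository root and a directory name otherwise. The sketch below shows that rule in isolation. It assumes that `ckpt_dir` is the checkpoint path relative to `repo_root`, which this page does not confirm; `resolve_subfolder` is a hypothetical helper, not part of vLLM-Omni.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_subfolder(repo_root: str, ckpt_path: str) -> Optional[str]:
    """Return the checkpoint subfolder relative to repo_root, or None at the root."""
    root = PurePosixPath(repo_root)
    ckpt = PurePosixPath(ckpt_path)
    if ckpt == root:
        # Weights live directly in the repo root: no subfolder is needed.
        return None
    # Raises ValueError if ckpt_path is not located under repo_root.
    return str(ckpt.relative_to(root))
```

For example, `resolve_subfolder("/models/Lance", "/models/Lance/Lance_3B")` yields `"Lance_3B"`, while passing the same path twice yields `None`.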

forward ¶

forward(req)

Dispatch on prompt modality.

modalities == ["video"] (text-to-video) → _forward_t2v (3-D latents + LanceWanVAE.decode_video).
modalities == ["text"] + multi_modal_data.video (x2t_video) → _forward_x2t_video (multi-frame Qwen2.5-VL ViT prefill).
modalities == ["image"] + multi_modal_data.img2img (image_edit) → _forward_image_edit (Lance-native VAE+ViT prefill + image gen).
modalities == ["video"] + multi_modal_data.video (video_edit) → _forward_video_edit (Lance-native multi-frame VAE+ViT prefill + video gen).
Everything else falls through to BagelPipeline.forward (t2i, x2t_image).
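The routing rules above can be sketched as a small pure function. This is an illustration of the decision order, not the actual implementation: `SketchRequest` is a hypothetical stand-in for the real request object, and representing `multi_modal_data` as a dict is an assumption. Because text-to-video and video_edit both use modalities == ["video"], the sketch checks for an input video first.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchRequest:
    """Hypothetical stand-in for the request passed to forward."""
    modalities: list[str]
    multi_modal_data: dict[str, Any] = field(default_factory=dict)


def select_lance_route(req: SketchRequest) -> str:
    """Return the name of the handler that the rules above would pick."""
    mm = req.multi_modal_data
    has_video = mm.get("video") is not None
    has_img2img = mm.get("img2img") is not None

    if req.modalities == ["video"]:
        # An input video means editing; otherwise it is plain text-to-video.
        return "_forward_video_edit" if has_video else "_forward_t2v"
    if req.modalities == ["text"] and has_video:
        return "_forward_x2t_video"
    if req.modalities == ["image"] and has_img2img:
        return "_forward_image_edit"
    # t2i and x2t_image are left to the parent class.
    return "BagelPipeline.forward"
```

A request with modalities ["image"] and no img2img entry falls through to the BAGEL path, which matches the t2i case listed above.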

get_lance_post_process_func ¶

get_lance_post_process_func(od_config: OmniDiffusionConfig)

Lance returns PIL.Image.Image directly, same as BAGEL.
