vllm_omni.model_executor.models.glm_image.pipeline

GLM-Image pipeline topologies (frozen).

Two-stage (default):

    Stage 0: AR (multimodal understanding + token_ids generation)
    Stage 1: DiT (diffusion image generation)

GLM_IMAGE_PIPELINE module-attribute

GLM_IMAGE_PIPELINE = PipelineConfig(
    model_type="glm_image",
    model_arch="GlmImageForConditionalGeneration",
    hf_architectures=("GlmImageForConditionalGeneration",),
    diffusers_class_name="GlmImagePipeline",
    stages=(
        StagePipelineConfig(
            stage_id=0,
            model_stage="ar",
            execution_type=LLM_AR,
            requires_multimodal_data=True,
            input_sources=(),
            final_output=False,
            owns_tokenizer=True,
            model_arch="GlmImageForConditionalGeneration",
            engine_output_type="token_ids",
            model_subdir="vision_language_encoder",
            tokenizer_subdir="processor",
        ),
        StagePipelineConfig(
            stage_id=1,
            model_stage="dit",
            execution_type=DIFFUSION,
            input_sources=(0,),
            requires_multimodal_data=True,
            final_output=True,
            final_output_type="image",
            model_arch="GlmImagePipeline",
            custom_process_input_func="vllm_omni.model_executor.stage_input_processors.glm_image.ar2diffusion",
            omni_kv_config={"need_recv_cache": False},
        ),
    ),
)
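
The stage wiring above can be read as a small dependency graph: stage 1 lists stage 0 in `input_sources`, and only stage 1 sets `final_output=True`. The following is a minimal sketch of how that wiring could be sanity-checked. `StageSketch` and `check_topology` are hypothetical names introduced here for illustration and are not part of `vllm_omni`; the sketch copies only the wiring fields from the config and does not import the real `PipelineConfig` or `StagePipelineConfig`. The rule that exactly one stage is the final output is an assumption drawn from this two-stage config, not a documented constraint.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class StageSketch:
    """Hypothetical stand-in for StagePipelineConfig (wiring fields only)."""

    stage_id: int
    model_stage: str
    input_sources: Tuple[int, ...] = ()
    final_output: bool = False
    final_output_type: Optional[str] = None


def check_topology(stages):
    """Return a list of wiring problems; an empty list means none were found."""
    problems = []
    seen = set()
    for stage in stages:
        if stage.stage_id in seen:
            problems.append(f"duplicate stage_id {stage.stage_id}")
        # Each input source must refer to a stage declared earlier.
        for src in stage.input_sources:
            if src not in seen:
                problems.append(
                    f"stage {stage.stage_id} reads from unknown or later stage {src}"
                )
        seen.add(stage.stage_id)
    # Assumption: a pipeline has exactly one stage marked as final output.
    finals = [s.stage_id for s in stages if s.final_output]
    if len(finals) != 1:
        problems.append(
            f"expected exactly one final_output stage, found {len(finals)}"
        )
    return problems


# Wiring fields copied from GLM_IMAGE_PIPELINE above.
glm_image_stages = (
    StageSketch(stage_id=0, model_stage="ar"),
    StageSketch(
        stage_id=1,
        model_stage="dit",
        input_sources=(0,),
        final_output=True,
        final_output_type="image",
    ),
)

print(check_topology(glm_image_stages))
```

For the two stages shown, the check finds no problems: the AR stage has no upstream inputs, and the DiT stage consumes the AR stage's output and produces the final image.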