vllm_omni.model_executor.models.dynin_omni.pipeline

Dynin-Omni pipeline topology (frozen).

Stage 0 (token2text): multimodal understanding / text generation (comprehension)

Stage 1 (token2image): stage-0 tokens → image latents

Stage 2 (token2audio): stage-1 tokens → audio latents

All three stages run on the generation worker (execution_type=LLM_GENERATION). The inter-stage hand-off uses the worker-connector full-payload data plane: each upstream stage names a *_full_payload producer in custom_process_next_stage_input_func, and the downstream stage names the matching consumer (token2text_to_token2image or token2image_to_token2audio) in custom_process_input_func. Deploy knobs (devices, GPU memory, batched tokens, connectors) live in vllm_omni/deploy/dynin_omni*.yaml.
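The producer/consumer pairing described above appears to follow a simple naming pattern: the producer name is the consumer name plus a `_full_payload` suffix. The sketch below only encodes that pattern as read from the names on this page. `consumer_for`, `FULL_PAYLOAD_SUFFIX`, and `HANDOFFS` are hypothetical helpers for illustration, not part of vllm_omni, and the suffix rule is an inference rather than a documented contract.

```python
# Hypothetical helper: not part of vllm_omni. It only encodes the naming
# pattern visible in the pipeline definition on this page.
FULL_PAYLOAD_SUFFIX = "_full_payload"


def consumer_for(producer: str) -> str:
    """Return the consumer name paired with a *_full_payload producer."""
    if not producer.endswith(FULL_PAYLOAD_SUFFIX):
        raise ValueError(f"not a full-payload producer: {producer!r}")
    return producer[: -len(FULL_PAYLOAD_SUFFIX)]


# (producer named by the upstream stage, consumer named by the downstream stage)
HANDOFFS = [
    ("token2text_to_token2image_full_payload", "token2text_to_token2image"),
    ("token2image_to_token2audio_full_payload", "token2image_to_token2audio"),
]

for producer, consumer in HANDOFFS:
    # Each producer should map back to the consumer listed for the next stage.
    assert consumer_for(producer) == consumer
```

A check like this could catch a mismatched producer/consumer pair when editing the stage list, assuming the suffix convention holds.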

DYNIN_OMNI_PIPELINE module-attribute

DYNIN_OMNI_PIPELINE = PipelineConfig(
    model_type="dynin_omni",
    model_arch="DyninOmniForConditionalGeneration",
    hf_architectures=("DyninOmniForConditionalGeneration",),
    stages=(
        StagePipelineConfig(
            stage_id=0,
            model_stage="token2text",
            execution_type=LLM_GENERATION,
            input_sources=(),
            final_output=True,
            final_output_type="text",
            owns_tokenizer=True,
            engine_output_type="latent",
            custom_process_next_stage_input_func=f"{_PROC}.token2text_to_token2image_full_payload",
        ),
        StagePipelineConfig(
            stage_id=1,
            model_stage="token2image",
            execution_type=LLM_GENERATION,
            input_sources=(0,),
            final_output=True,
            final_output_type="image",
            engine_output_type="latent",
            custom_process_input_func=f"{_PROC}.token2text_to_token2image",
            custom_process_next_stage_input_func=f"{_PROC}.token2image_to_token2audio_full_payload",
        ),
        StagePipelineConfig(
            stage_id=2,
            model_stage="token2audio",
            execution_type=LLM_GENERATION,
            input_sources=(1,),
            final_output=True,
            final_output_type="audio",
            engine_output_type="latent",
            custom_process_input_func=f"{_PROC}.token2image_to_token2audio",
        ),
    ),
)
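To make the stage chain in the definition above easier to see, here is a minimal sketch that mirrors only four of its fields and checks that every `input_sources` entry points at an earlier stage. `StageSketch`, `check_topology`, and `describe` are hypothetical stand-ins written for this example. They are not the real `StagePipelineConfig` or any vllm_omni API, and the real classes may validate differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSketch:
    """Stand-in for StagePipelineConfig; carries only the fields used here."""
    stage_id: int
    model_stage: str
    input_sources: tuple[int, ...]
    final_output_type: str


# Values copied from the DYNIN_OMNI_PIPELINE definition above.
STAGES = (
    StageSketch(0, "token2text", (), "text"),
    StageSketch(1, "token2image", (0,), "image"),
    StageSketch(2, "token2audio", (1,), "audio"),
)


def check_topology(stages):
    """Raise ValueError unless every input source is an earlier stage."""
    seen = set()
    for stage in stages:
        for src in stage.input_sources:
            if src not in seen:
                raise ValueError(
                    f"stage {stage.stage_id} reads from unknown or later stage {src}"
                )
        seen.add(stage.stage_id)


def describe(stages):
    """Return one summary line per stage: upstream sources and output type."""
    lines = []
    for s in stages:
        srcs = ", ".join(f"stage {i}" for i in s.input_sources) or "no upstream stage"
        lines.append(f"stage {s.stage_id} ({s.model_stage}): {srcs} -> {s.final_output_type}")
    return lines


check_topology(STAGES)
for line in describe(STAGES):
    print(line)
```

Under these assumptions the three stages form a linear chain (0 → 1 → 2), and each stage also emits a final output of its own type, matching `final_output=True` on all three entries.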